WHAT, WHERE, WHEN

Biodiversity Informatics Training Curriculum: Biodiversity Diagnoses, Entebbe, Uganda, 2015. S5: 1-21

Temporal checks
Arturo H. Ariño, University of Navarra

Primary Biodiversity Data (PBD):

WHAT, WHERE, WHEN (okay, and a host of other things)

BITC: Biodiversity Diagnoses. – Entebbe, Uganda, January 2015. Session: Time Checks. A.H. Ariño


Humpbacks, again

Type of data     Number of records     %
Any data         25773
Taxon name       25773                 100%
Georeference     24871                 97%
Date             23441                 91%


GRID - Arendal


Counts by month and day of month (pivot table: row labels are the day of month, columns are the month). For each month, counts are listed in day order, starting at day 1.

Month 1: 597 4 6 4 3 4 9 6 32 17 6 9 6 40 11 12 23 31 5 23 24 21 27 5 18 11 16 17 74 62 57
Month 2: 40 49 35 40 63 40 68 82 61 59 56 42 82 109 109 24 31 40 73 25 16 27 10 61 25 42 86 95 37
Month 3: 52 73 69 34 45 115 26 65 73 70 41 37 43 31 62 22 5 23 28 35 27 12 11 9 17 8 5 7 9 7 3
Month 4: 14 18 9 4 5 6 10 6 5 13 8 11 23 12 7 8 6 34 29 21 18 20 14 8 25 16 20 9 15 33
Month 5: 34 12 39 23 23 26 19 13 56 53 58 41 17 32 29 27 41 8 42 30 29 28 36 36 20 38 41 39 54 31 52
Month 6: 36 57 74 26 14 57 39 45 66 43 106 100 126 72 144 124 68 112 89 124 102 132 111 112 79 60 82 77 90 68
Month 7: 88 84 67 65 64 109 142 111 157 140 100 126 134 102 177 155 124 141 238 204 175 135 129 217 190 270 310 229 167 174 247
Month 8: 239 185 196 173 183 251 194 221 185 225 212 210 222 194 181 164 153 177 119 239 174 175 228 223 265 255 201 177 172 118 126
Month 9: 132 194 147 143 171 107 114 140 84 106 78 145 133 137 93 108 95 66 117 62 129 90 97 108 88 63 124 66 53 50
Month 10: 74 93 34 55 50 77 73 71 28 40 41 21 38 50 21 22 42 20 30 15 51 26 25 25 26 19 26 42 14 21 20
Month 11: 18 14 23 14 14 40 8 11 15 16 12 20 12 12 12 11 16 9 13 9 7 5 17 8 10 3 1 7 4 (29 values for 30 days; one blank cell was lost, so day positions are uncertain)
Month 12: 4 6 6 8 8 6 9 12 2 10 4 9 2 5 9 8 6 17 6 2 9 6 4 2 8 2 1 2 4 5 (30 values for 31 days; one blank cell was lost, so day positions are uncertain)
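In the pivot table above, day 1 of month 1 holds 597 records, while no other day of that month exceeds 74. A minimal sketch of how such an outstanding frequency could be flagged automatically, using the month 1 counts from the table. The function name and the threshold of five times the median are arbitrary assumptions for illustration, not values from the source.

```python
from statistics import median

# Counts per day of month for month 1, copied from the pivot table.
january = [597, 4, 6, 4, 3, 4, 9, 6, 32, 17, 6, 9, 6, 40, 11, 12, 23, 31,
           5, 23, 24, 21, 27, 5, 18, 11, 16, 17, 74, 62, 57]


def outstanding_days(counts, factor=5.0):
    """Return (day, count) pairs whose count exceeds factor * median.

    `factor` is an illustrative threshold, not a value from the source.
    """
    typical = median(counts)
    return [(day, n) for day, n in enumerate(counts, start=1)
            if n > factor * typical]


print(outstanding_days(january))
```

With these counts the median is 16, so only day 1 is reported. A spike on 1 January is a typical sign of records whose real day and month were unknown and were filled with a default.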


Ariño A.H., Otegui J. (2009). Metaanálisis de los datos de biodiversidad suministrados a través de gbif.es. Universidad de Navarra. DOI: 10.13140/2.1.4661.5366
Otegui J, Ariño AH, Encinas MA, Pando F (2013) Assessing the Primary Data Hosted by the Spanish Node of the Global Biodiversity Information Facility (GBIF). PLoS ONE 8(1): e55144. doi:10.1371/journal.pone.0055144

Now: What’s in a date?

Thirty types of verbatim date, in decreasing order of incidence. Each type is described by the order of its components (D = day, M = month, Y = year, X = component that cannot be determined) and by a set of flags (Y = yes, N = no, ? = unclear). Rows with fewer than 30 values lost their blank cells in extraction.

Examples of verbatim dates:
06/11/1904
10 Apr 1974
08/09/2003 02:11 PM
1 Apr 1970
-- Apr 1859
-- -- ---
Locality: BLB
11 September 2003
1608
9 Jan 2006
1909/03/21/1909/03/21
February 09, 2014 03:22
[date unknown]
Aug 1925
-- -- 192
April 28, 2013 12:25:11 PM PDT
Mon Jun 18 2012 14:25:22 GMT-0400 (EDT)
-1984
2013-08-15
1906
-- --- ---
prior to 11 Jul 2007
October 12, 2008
October 1963 - Jul 1984
99 XXX 9999
2013-08-13 11:36am
2014-05-21 6:25:01 PM PDT
2014-05-31 10:26 am PST
2014-08-20 10:33AM
1884 1 1

Order: XXY DMY XXY DMY XMY XXY N/A DMY Y DMY YMD MDY N/A MY XXY MDY MDY Y YMD DMY DMY MDY MDY XMY XXX YMD YMD YMD YMD YXX
HasYear: Y Y Y Y Y N N Y Y Y Y Y N Y N Y Y Y Y N Y Y Y Y Y Y Y Y Y Y
HasMonth: Y Y Y Y Y N N Y N Y Y Y N Y N Y Y N Y N Y Y Y Y N Y Y Y Y Y
HasDay: Y Y Y Y N N N Y N Y Y Y N N N Y Y N Y N Y Y N N N Y Y Y Y Y
HasHour: N N Y N N N N N N N N Y N N N Y Y N Y N N N N N N Y N Y Y N
HasMinute: N N Y N N N N N N N N Y N N N Y Y N Y N N N N N N Y N Y Y N
HasSecond: N N N N N N N N N N N N N N N Y Y N Y N N N N N N N N N N N
AMPM: N N Y N N N N N N N N N N N N Y N N Y N N N N N N Y Y Y Y N
TZ: N N N N N N N N N N N N N N N Y Y N N N N N N N N N Y Y N N
UpTo: N N N N N N N N N N N N N N N N N Y N N Y N N Y N N N N N N
Range: N N N N N N N N N N Y N N N N N N N N N Y N N N N N N N N N
LeadingZeroDay: Y Y Y N N N N N N N ? N N N N N N N ? N N N N N Y ? ? ? ? N
leadingZeroMonth: Y Y Y Y Y Y Y Y N
MonthAbbrev: Y Y Y N Y N Y N Y Y N N Y -
DOW: N N N N N N N N N N N N N N N N Y N N N N N N N N N N N N N
Incidence: 39.1% 21.6% 14.0% 5.8% 3.5% 3.3% 2.3% 2.2% 1.8% 1.6% 1.1% 0.7% 0.6% 0.2% 0.2% 0.2% 0.2% 0.2% 0.2% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1%
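For purely numeric dates, an order code like those above can be derived mechanically. The following is a minimal sketch, assuming (as the examples suggest) that X marks a component that cannot be told apart from another. It handles only strings of two short numeric fields followed by a four-digit year; the function name `numeric_order` is hypothetical and this is not the classification procedure used for the table.

```python
import re


def numeric_order(verbatim):
    """Guess the component order of an 'a/b/yyyy'-style verbatim date.

    Returns 'DMY', 'MDY', 'XXY' (day and month cannot be told apart),
    or 'N/A' when the string does not match or the values are impossible.
    """
    m = re.fullmatch(r"\s*(\d{1,2})[/.\-](\d{1,2})[/.\-](\d{4})\s*", verbatim)
    if m is None:
        return "N/A"
    a, b = int(m.group(1)), int(m.group(2))
    if not (1 <= a <= 31 and 1 <= b <= 31):
        return "N/A"
    if a > 12 and b > 12:
        return "N/A"   # neither field can be a month
    if a > 12:
        return "DMY"   # first field can only be a day
    if b > 12:
        return "MDY"   # second field can only be a day
    return "XXY"       # both fields could be a month: ambiguous


for text in ("06/11/1904", "28/04/2010", "04/28/2010", "[date unknown]"):
    print(text, "->", numeric_order(text))
```

The first example, 06/11/1904, comes out as XXY: it could be 6 November or 11 June, and nothing in the string itself settles the question.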


2010.04.28
2011.14.09

Format: YYYY.MM.DD (Otegui, 2013)

2011.14.09 has month 14, which cannot be valid as written.
ErrCode 10: potential swap month/day (Otegui, 2013)
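A check of this kind can be sketched as follows. Only the code number 10 and its meaning come from the slide; the function name, the parsing rule and the use of -1 for any other problem are assumptions made for illustration.

```python
def check_ymd(date_string):
    """Validate a 'YYYY.MM.DD' string; return an error code or None.

    Code 10 ('potential swap month/day') follows the slide. Returning -1
    for any other problem is a placeholder of this sketch. Month lengths
    are not checked here.
    """
    parts = date_string.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return -1
    year, month, day = (int(p) for p in parts)
    if 1 <= month <= 12 and 1 <= day <= 31:
        return None    # plausible as written
    if 13 <= month <= 31 and 1 <= day <= 12:
        return 10      # valid only if month and day are swapped
    return -1


for s in ("2010.04.28", "2011.14.09"):
    print(s, check_ymd(s))
```

The first date passes and the second returns 10, because 14 cannot be a month but 09 can. Dates such as 2010.04.05 pass silently even if they were swapped, which is why the error is only a "potential" swap and why frequency checks are still needed.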


Humpbacks, again

Type of data     Number of records     %        With issues
Any data         25773
Taxon name       25773                 100%     0.2%
Georeference     24871                 97%      10.4%
Date             23441                 91%      1.8%


[Figure: records by date component (Day; Day & month; Month; Day & year; Year; Month & year; Day, month & year). Axis: Records, 1,000,000 to 5,000,000. Value labels as extracted, not matched to categories: 0.22% (0.6%); 7.81% (22.0%); 0.85% (2.4%); 0.31% (6.8%); 26.45% (74.5%); 0.26% (5.9%); 3.36% (74.3%); 0.41% (9.0%); 0.17% (3.7%); 0.12% (0.4%). Source: Otegui, Ariño et al., 2013, Biodiversity Informatics 8: 173-184.]


Chronhorogram plot (tempolar coordinates), proposed by Ariño & Otegui, Proceedings of TDWG, 2008. This rendering was published in Otegui J, Ariño AH, Encinas MA, Pando F (2013) Assessing the Primary Data Hosted by the Spanish Node of the Global Biodiversity Information Facility (GBIF). PLoS ONE 8(1): e55144. doi:10.1371/journal.pone.0055144
Plot labels: 1750, 2012, ~30 yrs.


Otegui, J., Robles, E., & Ariño, A. H. (2009). Noise in biodiversity data. Poster presented at the “e-biosphere” Conference on Biodiversity Informatics, London.


Otegui, J., Robles, E., & Ariño, A. H. (2009). Noise in biodiversity data. Poster presented at the “e-biosphere” Conference on Biodiversity Informatics, London.

Providers shown: Provider 10, Provider 16, Provider 24, Provider 38, Provider 169, Provider 82, Provider 90, Provider 111, Provider 220, Provider 139, Provider 255, Provider 172


SUMMARY: Main issue when checking time

George Pal Productions, 1960

Workflow
• Split months, days, years
  • Check outstanding date frequencies
  • Pivot over provider, date component
• Check impossible dates
  • Months over 12
  • Years beyond today
• Check potential voids
  • Look at first-day/first month frequencies
  • Look at day-of-year frequencies
• Homogenize dates in ISO format, keep components separate
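Two of the workflow steps, checking impossible dates and homogenizing to ISO format while keeping components separate, can be sketched in a few lines. This is a minimal illustration that assumes year, month and day are already split into integers (or None when missing); the function names and the problem labels are hypothetical.

```python
from datetime import date


def check_components(year, month, day, today=None):
    """Return a list of problems found in separately stored components.

    Components may be None when missing. The problem labels are
    illustrative, not codes from the source.
    """
    today = today or date.today()
    problems = []
    if month is not None and not 1 <= month <= 12:
        problems.append("month out of range")
    if year is not None and year > today.year:
        problems.append("year beyond today")
    if not problems and None not in (year, month, day):
        try:
            if date(year, month, day) > today:
                problems.append("date beyond today")
        except ValueError:
            # e.g. 30 February, or a day above 31
            problems.append("impossible calendar date")
    return problems


def to_iso(year, month, day):
    """Format the known components as ISO 8601: YYYY, YYYY-MM or YYYY-MM-DD."""
    if year is None:
        return None
    text = f"{year:04d}"
    if month is not None:
        text += f"-{month:02d}"
        if day is not None:
            text += f"-{day:02d}"
    return text


reference = date(2015, 1, 15)  # fixed "today" so the example is repeatable
print(check_components(2011, 14, 9, today=reference))
print(to_iso(2010, 4, 28), to_iso(1925, 8, None))
```

Keeping the three components alongside the ISO string means a partial date such as "Aug 1925" is stored as year 1925, month 8, day missing, rather than being padded to a first-of-month date that would later show up as a false spike.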


Try it!
• Over your downloaded sample data:
  • Pivot and get date type frequencies from verbatim data
  • Calculate date frequencies by day-of-month
  • Locate any 1/1 inconsistency
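Outside a spreadsheet, the day-of-month frequencies and the month-by-day pivot can be obtained with a couple of counters. A minimal sketch, assuming the dates have already been normalised to ISO strings; the sample values are invented for illustration and the function name is hypothetical.

```python
from collections import Counter


def day_frequencies(iso_dates):
    """Count records per day-of-month and per (month, day) pair."""
    by_day = Counter()
    by_month_day = Counter()
    for text in iso_dates:
        parts = text.split("-")
        if len(parts) != 3:
            continue   # skip partial dates such as '1925-08'
        _, month, day = (int(p) for p in parts)
        by_day[day] += 1
        by_month_day[(month, day)] += 1
    return by_day, by_month_day


# Invented sample; replace with the dates from your own download.
sample = ["2003-01-01", "1999-01-01", "1987-01-01", "2003-08-09",
          "2006-01-09", "1974-04-10", "1925-08"]
by_day, by_month_day = day_frequencies(sample)
print(by_day.most_common(3))
print("Records on 1/1:", by_month_day[(1, 1)])
```

Comparing the count for (1, 1) with the counts of neighbouring days is the quickest way to spot a 1/1 inconsistency.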


Sources and references
All figures, plots and analyses by the author except where otherwise noted.
Citation: Ariño A.H., 2015: Taxonomical checks. BITC: National Biodiversity Diagnoses, Entebbe (Uganda), S3: 1-31

• Ariño A.H., Otegui J. (2009). Metaanálisis de los datos de biodiversidad suministrados a través de gbif.es. Universidad de Navarra. DOI: 10.13140/2.1.4661.5366
• Ariño, A. H., Otegui, J. (2008). Sampling biodiversity sampling. Proceedings of TDWG, 2008, 77-78.
• Ariño A.H., Robles E. 2006: Variable-level nomenclators. Proceedings of TDWG, 2006: 84 pp. ISBN 1-930723-56-3.
• Global Biodiversity Information Facility (GBIF) data portal, www.gbif.org.
• Otegui J. (2013). Quality and fitness-for-use assessments on the primary data indexed at the Global Biodiversity Information Facility (GBIF). PhD Thesis, University of Navarra.
• Otegui, J., Ariño, A. H., Chavan, V., & Gaiji, S. (2013). On the dates of GBIF mobilised primary biodiversity records. Biodiversity Informatics, 8(2).
• Otegui J, Ariño AH, Encinas MA, Pando F (2013) Assessing the Primary Data Hosted by the Spanish Node of the Global Biodiversity Information Facility (GBIF). PLoS ONE 8(1): e55144. doi:10.1371/journal.pone.0055144
• Otegui, J., Robles, E., & Ariño, A. H. (2009). Noise in biodiversity data. Poster presented at the “e-biosphere” Conference on Biodiversity Informatics, London.
• Pravettoni R., 2011: The long migration of the Humpback Whale. UNEP/GRID-Arendal. http://www.grida.no/publications/rr/livingplanet/
