BITC: Biodiversity Diagnoses. Entebbe, Uganda, January 2015. Session: Time Checks. A.H. Ariño
Biodiversity Informatics Training Curriculum: Biodiversity Diagnoses, Entebbe, Uganda, 2015. S5: 1-21
Temporal checks
Arturo H. Ariño
University of Navarra
Primary Biodiversity Data (PBD):
WHAT, WHERE, WHEN (okay, and a host of other things)
Humpbacks, again
Type of data    Number of records    %
Any data        25773
Taxon name      25773                100%
Georeference    24871                97%
Date            23441                91%
GRID - Arendal
Records by month and day of month (pivot table; each row lists one month's counts in day order, empty cells omitted)
Month 1: 597 4 6 4 3 4 9 6 32 17 6 9 6 40 11 12 23 31 5 23 24 21 27 5 18 11 16 17 74 62 57
Month 2: 40 49 35 40 63 40 68 82 61 59 56 42 82 109 109 24 31 40 73 25 16 27 10 61 25 42 86 95 37
Month 3: 52 73 69 34 45 115 26 65 73 70 41 37 43 31 62 22 5 23 28 35 27 12 11 9 17 8 5 7 9 7 3
Month 4: 14 18 9 4 5 6 10 6 5 13 8 11 23 12 7 8 6 34 29 21 18 20 14 8 25 16 20 9 15 33
Month 5: 34 12 39 23 23 26 19 13 56 53 58 41 17 32 29 27 41 8 42 30 29 28 36 36 20 38 41 39 54 31 52
Month 6: 36 57 74 26 14 57 39 45 66 43 106 100 126 72 144 124 68 112 89 124 102 132 111 112 79 60 82 77 90 68
Month 7: 88 84 67 65 64 109 142 111 157 140 100 126 134 102 177 155 124 141 238 204 175 135 129 217 190 270 310 229 167 174 247
Month 8: 239 185 196 173 183 251 194 221 185 225 212 210 222 194 181 164 153 177 119 239 174 175 228 223 265 255 201 177 172 118 126
Month 9: 132 194 147 143 171 107 114 140 84 106 78 145 133 137 93 108 95 66 117 62 129 90 97 108 88 63 124 66 53 50
Month 10: 74 93 34 55 50 77 73 71 28 40 41 21 38 50 21 22 42 20 30 15 51 26 25 25 26 19 26 42 14 21 20
Month 11: 18 14 23 14 14 40 8 11 15 16 12 20 12 12 12 11 16 9 13 9 7 5 17 8 10 3 1 7 4
Month 12: 4 6 6 8 8 6 9 12 2 10 4 9 2 5 9 8 6 17 6 2 9 6 4 2 8 2 1 2 4 5
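A pivot like this one can be built by counting (month, day) pairs, and cells that stand out can then be flagged against the rest of their month. The following is a minimal sketch only: the function names and the threshold (five times the month's median) are assumptions chosen for illustration, not the method used for the slides. The January counts are copied from the pivot.

```python
from collections import Counter
from statistics import median

def pivot_month_day(dates):
    """Count records per (month, day) from an iterable of (year, month, day) tuples."""
    return Counter((m, d) for (_y, m, d) in dates)

def outlying_days(pivot, factor=5):
    """Return (month, day, count) cells whose count exceeds factor x the month's median.

    The factor is an arbitrary illustrative threshold (assumption).
    """
    by_month = {}
    for (m, _d), n in pivot.items():
        by_month.setdefault(m, []).append(n)
    return [(m, d, n) for (m, d), n in sorted(pivot.items())
            if n > factor * median(by_month[m])]

# January counts from the pivot (day 1 to day 31)
january = [597, 4, 6, 4, 3, 4, 9, 6, 32, 17, 6, 9, 6, 40, 11, 12, 23, 31, 5,
           23, 24, 21, 27, 5, 18, 11, 16, 17, 74, 62, 57]
pivot = Counter({(1, day): n for day, n in enumerate(january, start=1)})
print(outlying_days(pivot))  # only 1 January exceeds the threshold
```

With these numbers the median for January is 16 records per day, so the 597 records on 1 January are the only cell flagged.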
Ariño A.H., Otegui J. (2009). Metaanálisis de los datos de biodiversidad suministrados a través de gbif.es. Universidad de Navarra. DOI: 10.13140/2.1.4661.5366
Otegui J, Ariño AH, Encinas MA, Pando F (2013) Assessing the Primary Data Hosted by the Spanish Node of the Global Biodiversity Information Facility (GBIF). PLoS ONE 8(1): e55144. doi:10.1371/journal.pone.0055144
Now: What's in a date?
Date patterns found in verbatim data (30 patterns, in order of decreasing incidence)
Type (verbatim examples): 06/11/1904 10 Apr 1974 08/09/2003 02:11 PM 1 Apr 1970 -- Apr 1859 -- -- --- Locality: BLB 11 September 2003 1608 9 Jan 2006 1909/03/21/1909/03/21 February 09, 2014 03:22 [date unknown] Aug 1925 -- -- 192 April 28, 2013 12:25:11 PM PDT Mon Jun 18 2012 14:25:22 GMT-0400 (EDT) -1984 2013-08-15 1906 -- --- --- prior to 11 Jul 2007 October 12, 2008 October 1963 - Jul 1984 99 XXX 9999 2013-08-13 11:36am 2014-05-21 6:25:01 PM PDT 2014-05-31 10:26 am PST 2014-08-20 10:33AM 1884 1 1
Order: XXY DMY XXY DMY XMY XXY N/A DMY Y DMY YMD MDY N/A MY XXY MDY MDY Y YMD DMY DMY MDY MDY XMY XXX YMD YMD YMD YMD YXX
HasYear: Y Y Y Y Y N N Y Y Y Y Y N Y N Y Y Y Y N Y Y Y Y Y Y Y Y Y Y
HasMonth: Y Y Y Y Y N N Y N Y Y Y N Y N Y Y N Y N Y Y Y Y N Y Y Y Y Y
HasDay: Y Y Y Y N N N Y N Y Y Y N N N Y Y N Y N Y Y N N N Y Y Y Y Y
HasHour: N N Y N N N N N N N N Y N N N Y Y N Y N N N N N N Y N Y Y N
HasMinute: N N Y N N N N N N N N Y N N N Y Y N Y N N N N N N Y N Y Y N
HasSecond: N N N N N N N N N N N N N N N Y Y N Y N N N N N N N N N N N
AMPM: N N Y N N N N N N N N N N N N Y N N Y N N N N N N Y Y Y Y N
TZ: N N N N N N N N N N N N N N N Y Y N N N N N N N N N Y Y N N
UpTo: N N N N N N N N N N N N N N N N N Y N N Y N N Y N N N N N N
Range: N N N N N N N N N N Y N N N N N N N N N Y N N N N N N N N N
LeadingZeroDay: Y Y Y N N N N N N N ? N N N N N N N ? N N N N N Y ? ? ? ? N
leadingZeroMonth (partial): Y Y Y Y Y Y Y Y N
MonthAbbrev (partial): Y Y Y N Y N Y N Y Y N N Y -
DOW: N N N N N N N N N N N N N N N N Y N N N N N N N N N N N N N
Incidence: 39.1% 21.6% 14.0% 5.8% 3.5% 3.3% 2.3% 2.2% 1.8% 1.6% 1.1% 0.7% 0.6% 0.2% 0.2% 0.2% 0.2% 0.2% 0.2% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1%
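Flags of this kind can be approximated from a verbatim date string with a few regular expressions, and the resulting flag combinations tallied to estimate each pattern's incidence. This is a rough sketch under stated assumptions: the regular expressions, the short time zone list and both function names are illustrative guesses, not the classification rules behind the table (for instance, any isolated four-digit number is treated as a year).

```python
import re
from collections import Counter

MONTH_ABBREV = r"\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\b"
DAY_OF_WEEK = r"\b(mon|tue|wed|thu|fri|sat|sun)\b"

def describe_verbatim_date(text):
    """Return rough True/False flags for one verbatim date string (heuristic)."""
    t = text.lower()
    return {
        "HasYear": bool(re.search(r"(?<!\d)\d{4}(?!\d)", t)),   # any isolated 4-digit number
        "HasHour": bool(re.search(r"\d{1,2}:\d{2}", t)),
        "HasSecond": bool(re.search(r"\d{1,2}:\d{2}:\d{2}", t)),
        "AMPM": bool(re.search(r"\d\s*(am|pm)\b", t)),
        "TZ": bool(re.search(r"\b(gmt|utc|pdt|pst|edt|est)\b", t)),  # short, incomplete list
        "MonthAbbrev": bool(re.search(MONTH_ABBREV, t)),
        "DOW": bool(re.search(DAY_OF_WEEK, t)),
    }

def pattern_incidence(values):
    """Share of records per flag combination, as a dict {pattern: fraction}."""
    counts = Counter(tuple(describe_verbatim_date(v).items()) for v in values)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}

print(describe_verbatim_date("April 28, 2013 12:25:11 PM PDT"))
```

Flags such as the component order (DMY, MDY, YMD) need more context than a regular expression gives, since a string like 06/11/1904 is ambiguous on its own.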
2010.04.28
2011.14.09
YYYY.MM.DD Otegui, 2013
2010.04.28 2011.14.09
ErrCode: 10
>ErrCode10: Potential swap month/day Otegui, 2013
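The month/day swap check on the two example dates can be sketched as follows. The error code 10 comes from the slide; the function name, the use of 0 for "no issue flagged" and the rule itself (month above 12 while the day would be a valid month) are assumptions for illustration.

```python
def check_ymd(text, sep="."):
    """Split a 'YYYY.MM.DD' string and flag a potential month/day swap.

    Returns (year, month, day, err_code). err_code 10 follows the slide;
    0 meaning "nothing flagged" is an assumption of this sketch.
    """
    year, month, day = (int(part) for part in text.split(sep))
    err_code = 0
    if month > 12 and 1 <= day <= 12:
        err_code = 10  # potential swap month/day
    return year, month, day, err_code

for value in ("2010.04.28", "2011.14.09"):
    print(value, check_ymd(value))
```

Here 2010.04.28 passes, while 2011.14.09 is flagged because 14 cannot be a month but 09 can.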
Humpbacks, again
Type of data    Number of records    %       With issues
Any data        25773
Taxon name      25773                100%    0.2%
Georeference    24871                97%     10.4%
Date            23441                91%     1.8%
Figure: number of records (axis: Records, up to 5,000,000) by date component: Day; Day & month; Month; Day & year; Year; Month & year; Day, month & year. Value labels: 0.22% (0.6%), 7.81% (22.0%), 0.85% (2.4%), 0.31% (6.8%), 26.45% (74.5%), 0.26% (5.9%), 3.36% (74.3%), 0.41% (9.0%), 0.17% (3.7%), 0.12% (0.4%).
Otegui, Ariño et al., 2013 – Biodiversity Informatics 8: 173-184
Chronhorogram Plot (tempolar coordinates) proposed by Ariño & Otegui, Proceedings of TDWG, 2008. This rendering published in Otegui J, Ariño AH, Encinas MA, Pando F (2013) Assessing the Primary Data Hosted by the Spanish Node of the Global Biodiversity Information Facility (GBIF). PLoS ONE 8(1): e55144. doi:10.1371/journal.pone.0055144
Plot labels: 1750, 2012, ~30 yrs
Otegui, J., Robles, E., & Ariño, A. H. (2009). Noise in biodiversity data. Poster presented at the "e-biosphere" Conference on Biodiversity Informatics, London.
Figure panels: Provider 10, Provider 16, Provider 24, Provider 38, Provider 169, Provider 82, Provider 90, Provider 111, Provider 220, Provider 139, Provider 255, Provider 172
SUMMARY: Main issue when checking time
George Pal Productions, 1960
Workflow
• Split months, days, years
  • Check outstanding date frequencies
  • Pivot over provider, date component
• Check impossible dates
  • Months over 12
  • Years beyond today
• Check potential voids
  • Look at first-day/first month frequencies
  • Look at day-of-year frequencies
• Homogenize dates in ISO format, keep components separate
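The impossible-date checks and the ISO homogenization step can be sketched on already split components. This is a minimal illustration, not a prescribed implementation: the function names, the issue labels and the choice to compare only the year against today are assumptions.

```python
from datetime import date

def check_components(year, month, day, today=None):
    """Return a list of issue labels for one split date; empty if none found.

    Components may be None when missing. Labels are illustrative (assumption).
    """
    today = today or date.today()
    issues = []
    if month is not None and not 1 <= month <= 12:
        issues.append("month out of range")
    if year is not None and year > today.year:
        issues.append("year beyond today")
    if not issues and None not in (year, month, day):
        try:
            date(year, month, day)  # rejects e.g. 30 February
        except ValueError:
            issues.append("not a calendar date")
    return issues

def to_iso(year, month=None, day=None):
    """Format the known components as ISO 8601: YYYY, YYYY-MM or YYYY-MM-DD."""
    parts = [f"{year:04d}"]
    if month is not None:
        parts.append(f"{month:02d}")
        if day is not None:
            parts.append(f"{day:02d}")
    return "-".join(parts)

print(check_components(2011, 14, 9), to_iso(1925, 8))
```

Keeping year, month and day as separate values alongside the ISO string means a record with only a year or a month and year is not forced into a full date.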
Try it!
• Over your downloaded sample data:
  • Pivot and get date type frequencies from verbatim data
  • Calculate date frequencies by day-of-month
  • Locate any 1/1 inconsistency
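One possible way to approach the last two items, assuming the dates have already been split into (year, month, day) tuples, is sketched below. The function names and the ratio used to express the 1 January excess are assumptions for illustration; a spreadsheet pivot gives the same counts.

```python
from collections import Counter

def day_of_month_frequencies(dates):
    """Count records per day of month from (year, month, day) tuples."""
    return Counter(d for (_y, _m, d) in dates)

def first_of_january_excess(dates):
    """Ratio of 1 January records to the mean count of the other January days.

    Only days that have at least one record enter the mean (assumption).
    Returns None when there is no other January day to compare with.
    """
    january = Counter(d for (_y, m, d) in dates if m == 1)
    others = [n for day, n in january.items() if day != 1]
    if not others:
        return None
    return january[1] / (sum(others) / len(others))

# Hypothetical sample: six records on 1 Jan, two on 2 Jan, four on 3 Jan
sample = [(2000, 1, 1)] * 6 + [(2000, 1, 2)] * 2 + [(2000, 1, 3)] * 4
print(day_of_month_frequencies(sample), first_of_january_excess(sample))
```

A ratio well above 1 suggests that 1 January is standing in for unknown days or months rather than recording real observations.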
Sources and references
All figures, plots and analyses by the author except where otherwise noted.
Citation: Ariño A.H., 2015: Taxonomical checks. BITC: National Biodiversity Diagnoses, Entebbe (Uganda), S3: 1-31
• Ariño A.H., Otegui J. (2009). Metaanálisis de los datos de biodiversidad suministrados a través de gbif.es. Universidad de Navarra. DOI: 10.13140/2.1.4661.5366
• Ariño, A. H., Otegui, J. (2008). Sampling biodiversity sampling. Proceedings of TDWG, 2008, 77-78.
• Ariño A.H., Robles E. 2006: Variable-level nomenclators. Proceedings of TDWG, 2006: 84 pp. ISBN 1-930723-56-3.
• Global Biodiversity Information Facility (GBIF) data portal, www.gbif.org.
• Otegui J. (2013). Quality and fitness-for-use assessments on the primary data indexed at the Global Biodiversity Information Facility (GBIF). PhD Thesis, University of Navarra.
• Otegui, J., Ariño, A. H., Chavan, V., & Gaiji, S. (2013). On the dates of GBIF mobilised primary biodiversity records. Biodiversity Informatics, 8(2).
• Otegui J, Ariño AH, Encinas MA, Pando F (2013) Assessing the Primary Data Hosted by the Spanish Node of the Global Biodiversity Information Facility (GBIF). PLoS ONE 8(1): e55144. doi:10.1371/journal.pone.0055144
• Otegui, J., Robles, E., & Ariño, A. H. (2009). Noise in biodiversity data. Poster presented at the "e-biosphere" Conference on Biodiversity Informatics, London.
• Pravettoni R., 2011: The long migration of the Humpback Whale. UNEP/GRID-Arendal. http://www.grida.no/publications/rr/livingplanet/