Retrieving textual information from LiPD files#
Preamble#
PyLiPD is a Python package that lets you read, manipulate, and write LiPD-formatted datasets. In this tutorial, we demonstrate how to use pre-defined APIs that retrieve specific information from a LiPD file.
Goals#
Use existing APIs to get information about the datasets loaded in the workspace: their locations, the available variables, and the types of geologic archives.
Obtain a BibTeX file of references to properly credit scholarly contributions
Reading Time: 5 minutes
Keywords#
LiPD
Pre-requisites#
None. This tutorial assumes basic knowledge of Python and Pandas. If you are not familiar with this coding language and this particular library, check out this tutorial: http://linked.earth/ec_workshops_py/.
Relevant Packages#
pylipd
Data Description#
This notebook uses the following datasets, in LiPD format:
McCabe-Glynn, S., Johnson, K., Strong, C. et al. Variable North Pacific influence on drought in southwestern North America since AD 854. Nature Geosci 6, 617–621 (2013). https://doi.org/10.1038/ngeo1862
Lawrence, K. T., Liu, Z. H., & Herbert, T. D. (2006). Evolution of the eastern tropical Pacific through Plio-Pleistocene glaciation. Science, 312(5770), 79-83.
PAGES2k Consortium., Emile-Geay, J., McKay, N. et al. A global multiproxy database for temperature reconstructions of the Common Era. Sci Data 4, 170088 (2017). doi:10.1038/sdata.2017.88
Demonstration#
Extracting information about the content of a LiPD object#
Let’s start by importing our favorite package and loading our datasets.
from pylipd.lipd import LiPD
Let’s load some diverse datasets to highlight the capabilities:
path = '../data/Pages2k/'
D = LiPD()
D.load_from_dir(path)
Loading 16 LiPD files
100%|██████████| 16/16 [00:00<00:00, 42.60it/s]
Loaded..
data_path = ['../data/Crystal.McCabe-Glynn.2013.lpd', '../data/ODP846.Lawrence.2006.lpd', 'https://lipdverse.org/data/iso2k100_CO06MOPE/1_0_2//CO06MOPE.lpd']
D.load(data_path)
Loading 3 LiPD files
100%|██████████| 3/3 [00:01<00:00, 1.76it/s]
Loaded..
Getting information about Datasets#
From the introductory notebooks on loading LiPD datasets and working with LiPD objects, you should already be familiar with the function that returns the names of all the datasets.
D.get_all_dataset_names()
['Eur-NorthernSpain.Martin-Chivelet.2011',
'Eur-NorthernScandinavia.Esper.2012',
'Eur-Stockholm.Leijonhufvud.2009',
'Eur-LakeSilvaplana.Trachsel.2010',
'Eur-SpanishPyrenees.Dorado-Linan.2012',
'Arc-Kongressvatnet.D_Andrea.2012',
'Eur-CoastofPortugal.Abrantes.2011',
'Ocn-PedradeLume-CapeVerdeIslands.Moses.2006',
'Ocn-FeniDrift.Richter.2009',
'Ocn-SinaiPeninsula_RedSea.Moustafa.2000',
'Ant-WAIS-Divide.Severinghaus.2012',
'Asi-SourthAndMiddleUrals.Demezhko.2007',
'Ocn-AlboranSea436B.Nieto-Moreno.2013',
'Eur-SpannagelCave.Mangini.2005',
'Ocn-RedSea.Felis.2000',
'Eur-FinnishLakelands.Helama.2014',
'Crystal.McCabe-Glynn.2013',
'ODP846.Lawrence.2006',
'CO06MOPE']
len(D.get_all_dataset_names())
19
In fact, this function has been used throughout these notebooks to extract other types of information. An equivalent function returns all the datasetIDs. datasetIDs are unique identifiers for each LiPD dataset; the notion was introduced because a dataset name may not be unique enough for identification. All datasets from the LiPDGraph have an ID, but having one is not mandatory.
D.get_all_dataset_ids()
['WX0GIjmoc46FH1Oj4c5r',
'fyUORoSbcL0GP0J3wyoj',
'uOhCAmcuPO5Xo9rSniHn',
'23GDZxTEJsBQAH05hU4g',
'PPWjMBBkRAcCv6bkL58K',
'pwY7bQRstXsZc6iOpgRI',
'33wLrOlZRR8hw53DVKSr',
'HH7jd52QFWaBgs9OvMqP',
'IVVTVphliHduuTjQhlTM',
'wH1adV7y36OC0h3kwDRF',
'5oHqINxYpL0XCaLcIjhR',
'mE7P31hoHDXy1Q9yfQlq',
'fYUegig785BJMl3NrZcz',
'19nwWA48PSW3uSoDRiA4',
'4fZQAHmeuJn8ipLfurWv',
'ZDMEZiVVO4eFNwBA4D3o',
'iso2k100_CO06MOPE']
len(D.get_all_dataset_ids())
17
Notice that the function returned only 17 items (two fewer than the number of dataset names). The reason is that the files without an ID were created before datasetIDs became prevalent on the LiPDverse.
Another function that looks up information stored at the dataset level is get_all_archiveTypes. It works a little differently from the previous functions in that it returns only the unique names present in these datasets:
D.get_all_archiveTypes()
['Speleothem',
'Wood',
'Documents',
'Lake sediment',
'Marine sediment',
'Coral',
'Borehole']
This function is particularly useful for finding out which terms can be used in filtering queries. Archive names can appear with different capitalizations across files (for example, coral and Coral). For filtering, this won’t matter, as we will see in the next tutorial.
You can get information about the location of each dataset as follows:
df_loc = D.get_all_locations()
df_loc
| | dataSetName | geo_meanLat | geo_meanLon | geo_meanElev |
---|---|---|---|---|
0 | Eur-NorthernSpain.Martin-Chivelet.2011 | 42.9000 | -3.5000 | 1250.0 |
1 | Eur-NorthernScandinavia.Esper.2012 | 68.0000 | 25.0000 | 300.0 |
2 | Eur-Stockholm.Leijonhufvud.2009 | 59.3200 | 18.0600 | 10.0 |
3 | Eur-LakeSilvaplana.Trachsel.2010 | 46.5000 | 9.8000 | 1791.0 |
4 | Eur-SpanishPyrenees.Dorado-Linan.2012 | 42.5000 | 1.0000 | 1200.0 |
5 | Arc-Kongressvatnet.D'Andrea.2012 | 78.0217 | 13.9311 | 94.0 |
6 | Eur-CoastofPortugal.Abrantes.2011 | 41.1000 | -8.9000 | -80.0 |
7 | Ocn-PedradeLume-CapeVerdeIslands.Moses.2006 | 16.7600 | -22.8883 | -5.0 |
8 | Ocn-FeniDrift.Richter.2009 | 55.5000 | -13.9000 | -2543.0 |
9 | Ocn-SinaiPeninsula,RedSea.Moustafa.2000 | 27.8483 | 34.3100 | -3.0 |
10 | Ant-WAIS-Divide.Severinghaus.2012 | -79.4630 | -112.1250 | 1766.0 |
11 | Asi-SourthAndMiddleUrals.Demezhko.2007 | 55.0000 | 59.5000 | 1900.0 |
12 | Ocn-AlboranSea436B.Nieto-Moreno.2013 | 36.2053 | -4.3133 | -1108.0 |
13 | Eur-SpannagelCave.Mangini.2005 | 47.1000 | 11.6000 | 2347.0 |
14 | Ocn-RedSea.Felis.2000 | 27.8500 | 34.3200 | -6.0 |
15 | Eur-FinnishLakelands.Helama.2014 | 62.0000 | 28.3250 | 130.0 |
16 | Crystal.McCabe-Glynn.2013 | 36.5900 | -118.8200 | 1386.0 |
17 | ODP846.Lawrence.2006 | -3.1000 | -90.8000 | -3296.0 |
18 | CO06MOPE | 16.7500 | -22.8883 | -5.0 |
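Once the locations are available, they can be used to select sites by region. The following is a minimal sketch, not part of PyLiPD: it copies a few rows from the table above into plain dictionaries so that it runs without the loaded files, and the helper name `select_by_latitude` and the tropical bounds of ±23.5° are assumptions made for illustration.

```python
# A few rows copied from the table above. With the real DataFrame,
# df_loc.to_dict("records") would give records of the same shape.
rows = [
    {"dataSetName": "Eur-Stockholm.Leijonhufvud.2009", "geo_meanLat": 59.32,
     "geo_meanLon": 18.06, "geo_meanElev": 10.0},
    {"dataSetName": "Ocn-RedSea.Felis.2000", "geo_meanLat": 27.85,
     "geo_meanLon": 34.32, "geo_meanElev": -6.0},
    {"dataSetName": "ODP846.Lawrence.2006", "geo_meanLat": -3.1,
     "geo_meanLon": -90.8, "geo_meanElev": -3296.0},
    {"dataSetName": "CO06MOPE", "geo_meanLat": 16.75,
     "geo_meanLon": -22.8883, "geo_meanElev": -5.0},
]


def select_by_latitude(records, lat_min, lat_max):
    """Hypothetical helper: names of datasets with mean latitude in [lat_min, lat_max]."""
    return [r["dataSetName"] for r in records
            if lat_min <= r["geo_meanLat"] <= lat_max]


# Keep only the sites located between the tropics (bounds chosen for illustration).
tropical = select_by_latitude(rows, -23.5, 23.5)
print(tropical)
```

On the DataFrame itself, the same selection could likely be written as a boolean mask, for instance `df_loc[df_loc["geo_meanLat"].between(-23.5, 23.5)]`.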
Getting information about variables#
To get information about available variable names, you can do the following:
D.get_all_variable_names()
['year',
'd18O',
'MXD',
'temperature',
'trsgi',
'Uk37',
'Mg_Ca',
'depth_top',
'depth_bottom',
'notes',
'uncertainty_temperature',
'230th/232th_uncertainty',
'corr_age',
'd234uinitial',
'238u',
'depth_dating',
'd18o',
'corr_age_uncert',
'230th age',
'd234uinitial_uncertainty',
'depth',
'230th/238u_uncertainty',
'230th/232th',
'age',
'230th/238u',
'sample',
'Year',
'238u_uncertainty',
'230th age_uncertaity',
'd234u',
'232th',
'232th_uncertainty',
'd234u_undertainty',
'230th age_uncertainty',
'event',
'c37 total',
'u. peregrina d18o',
'section',
'depth cr',
'sample label',
'ukprime37',
'sst',
'c. wuellerstorfi d13c',
'd180',
'median',
'lower95',
'depth comp',
'temp prahl',
'c. wuellerstorfi d18o',
'site/hole',
'interval',
'upper95',
'u. peregrina d13c',
'temp muller']
Note that, like the function retrieving the archiveTypes, this function returns only the unique names. As we have explored previously, the Euro2k database contains more than one record corresponding to temperature. Again, this function can be used to figure out what to filter by.
For more detail about which variables are available in which datasets, along with their associated unique IDs, you can use the following function:
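The list above contains names that differ only by capitalization, such as year and Year, or d18O and d18o. A minimal sketch of how such variants could be spotted before building a filter is shown here; it uses a subset of the names returned above, and the helper `group_case_variants` is a hypothetical name, not a PyLiPD function.

```python
from collections import defaultdict

# A subset of the variable names returned above.
names = ["year", "d18O", "temperature", "d18o", "Year", "d180", "sst"]


def group_case_variants(variable_names):
    """Hypothetical helper: group names that differ only by capitalization."""
    groups = defaultdict(list)
    for name in variable_names:
        # casefold() gives a case-insensitive key for comparison
        groups[name.casefold()].append(name)
    # Keep only the keys that have more than one spelling
    return {key: sorted(spellings)
            for key, spellings in groups.items() if len(spellings) > 1}


variants = group_case_variants(names)
print(variants)
```

Names that differ by more than case, such as d180 (written with a zero), are not grouped by this approach and would need to be reviewed by hand.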
D.get_all_variables()
| | uri | TSID | variableName |
---|---|---|---|
0 | http://linked.earth/lipd/Eur-NorthernSpain.Mar... | PYTE7VH7UMO | year |
1 | http://linked.earth/lipd/Eur-NorthernScandinav... | PYTECO66XAD | year |
2 | http://linked.earth/lipd/Eur-Stockholm.Leijonh... | PYTWVH672OU | year |
3 | http://linked.earth/lipd/Eur-LakeSilvaplana.Tr... | PYT1E4X3DDF | year |
4 | http://linked.earth/lipd/Eur-SpanishPyrenees.D... | PYT2K8MIA3N | year |
... | ... | ... | ... |
93 | http://linked.earth/lipd/paleo0measurement0.PY... | PYTJ3PSH0LT | site/hole |
94 | http://linked.earth/lipd/paleo0measurement0.PY... | PYT2ZB6MLZ9 | interval |
95 | http://linked.earth/lipd/chron0model0summary0.... | PYTDIEKUM44 | upper95 |
96 | http://linked.earth/lipd/paleo0measurement1.PY... | PYTTUPVG4K3 | u. peregrina d13c |
97 | http://linked.earth/lipd/paleo0measurement0.PY... | PYTGO6NV72Y | temp muller |
98 rows × 3 columns
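Because this table has one row per time series, it can be summarized to see how often each variable name occurs. The sketch below is an illustration under assumptions: it uses only the rows visible in the display above, leaves out the uri column because it is truncated, and the helper `count_by_variable` is a hypothetical name.

```python
from collections import Counter

# (TSID, variableName) pairs copied from the rows visible in the table above.
# The uri column is truncated in the display, so it is left out here.
rows = [
    ("PYTE7VH7UMO", "year"),
    ("PYTECO66XAD", "year"),
    ("PYTWVH672OU", "year"),
    ("PYT1E4X3DDF", "year"),
    ("PYT2K8MIA3N", "year"),
    ("PYTJ3PSH0LT", "site/hole"),
    ("PYT2ZB6MLZ9", "interval"),
    ("PYTDIEKUM44", "upper95"),
    ("PYTTUPVG4K3", "u. peregrina d13c"),
    ("PYTGO6NV72Y", "temp muller"),
]


def count_by_variable(records):
    """Hypothetical helper: count how many time series carry each variable name."""
    return Counter(name for _tsid, name in records)


counts = count_by_variable(rows)
print(counts.most_common(1))
```

On the full DataFrame, a similar summary could probably be obtained with `D.get_all_variables()["variableName"].value_counts()`.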
Get a bibliography#
pylipd makes it easy to retrieve the publication information for the subset of records you used and export it to a .bib file:
bibs, df = D.get_bibtex(remote = True, save = True, path = '../data/mybiblio.bib', verbose = False)
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (http://nbn-resolving.de/urn:nbn:de:gbv:46-ep000102745), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Cannot find a matching record for the provided DOI (None), creating the entry manually
Let’s decompose the parameters for this function:
remote: If set to True, PyLiPD will use the crossref function in the doi2bib package to retrieve the bibliography. You can only use this option online. If the retrieval fails, the entry will be created from the information in the LiPD file. If set to False, only the information in the file will be used.
save, path: If save is set to True, PyLiPD will save the entries in a .bib file at the location given by path. In this example, we saved the file to the data folder contained in this repository.
verbose: If set to True, the bibliography will be printed on the screen.
In addition to saving the file, the function returns bibs, a list of the bibliography entries as text, and df, which presents the same information in a Pandas DataFrame.
df.head()
| | dsname | title | authors | doi | pubyear | year | journal | volume | issue | pages | type | publisher | report | citeKey | edition | institution | url | url2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Eur-NorthernSpain.Martin-Chivelet.2011 | World Data Center for Paleoclimatology | J. Martín-Chivelet | None | None | NaN | None | None | None | None | dataCitation | None | None | martin2011httpwwwncdcnoaagovpaleostudy12194Dat... | None | World Data Center for Paleoclimatology | None | http://www.ncdc.noaa.gov/paleo/study/12194 |
1 | Eur-NorthernSpain.Martin-Chivelet.2011 | Land surface temperature changes in northern I... | María J. Turrero and Ana I. Ortega and Javier ... | 10.1016/j.gloplacha.2011.02.002 | None | 2011.0 | Global and Planetary Change | 77 | None | 1-12 | article | Elsevier BV | None | martin2011landsurfacetemperaturecha | None | None | None | None |
2 | Eur-NorthernScandinavia.Esper.2012 | Orbital forcing of tree-ring data | Jan Esper and David C. Frank and Nils Fischer ... | 10.1038/nclimate1589 | None | 2012.0 | Nature Climate Change | 2 | None | 862-866 | article | Nature Publishing Group | None | esper2012orbitalforcingoftreeringd | None | None | None | None |
3 | Eur-NorthernScandinavia.Esper.2012 | World Data Center for Paleoclimatology | J. Esper | None | None | NaN | None | None | None | None | dataCitation | None | None | esper2012httpwwwncdcnoaagovpaleostudy1003406Da... | None | World Data Center for Paleoclimatology | None | http://www.ncdc.noaa.gov/paleo/study/1003406 |
4 | Eur-Stockholm.Leijonhufvud.2009 | Five centuries of Stockholm winter/spring temp... | Anders Moberg and Johan Söderberg and Ulrica S... | 10.1007/s10584-009-9650-y | None | 2009.0 | Climatic Change | 101 | None | 109-141 | article | Springer Science + Business Media | None | leijonhufvud2009fivecenturiesofstockholmw | None | None | None | None |
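The messages printed by get_bibtex indicate that entries without a DOI are created manually from the file contents. A minimal sketch of how such entries could be identified is given below; it copies the dsname, doi, and type columns from the five rows shown above into plain dictionaries, and the helper `split_by_doi` is a hypothetical name rather than part of PyLiPD.

```python
# dsname, doi, and type copied from the five rows displayed above.
entries = [
    {"dsname": "Eur-NorthernSpain.Martin-Chivelet.2011", "doi": None,
     "type": "dataCitation"},
    {"dsname": "Eur-NorthernSpain.Martin-Chivelet.2011",
     "doi": "10.1016/j.gloplacha.2011.02.002", "type": "article"},
    {"dsname": "Eur-NorthernScandinavia.Esper.2012",
     "doi": "10.1038/nclimate1589", "type": "article"},
    {"dsname": "Eur-NorthernScandinavia.Esper.2012", "doi": None,
     "type": "dataCitation"},
    {"dsname": "Eur-Stockholm.Leijonhufvud.2009",
     "doi": "10.1007/s10584-009-9650-y", "type": "article"},
]


def split_by_doi(records):
    """Hypothetical helper: separate entries that have a DOI from those that do not."""
    with_doi = [r for r in records if r["doi"]]
    without_doi = [r for r in records if not r["doi"]]
    return with_doi, without_doi


with_doi, without_doi = split_by_doi(entries)
print(len(with_doi), len(without_doi))
# Entry types found among the records lacking a DOI
print(sorted({r["type"] for r in without_doi}))
```

In these five rows, the entries without a DOI are all data citations. On the DataFrame, the same subset could likely be selected with `df[df["doi"].isna()]`.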