Reading LiPD formatted datasets with PyLiPD#
Preamble#
PyLiPD is a Python package that allows you to read, manipulate, and write LiPD formatted datasets.
Goals#
- Open LiPD formatted datasets from:
  - a local copy
  - a web URL
  - our LipdGraph database
- Load datasets in parallel
- Add more LiPD datasets to an existing object
- Merge two LiPD objects together
Reading Time: 5 minutes
Keywords#
LiPD
Pre-requisites#
None. This tutorial assumes basic knowledge of Python. If you are not familiar with this coding language, check out this tutorial: http://linked.earth/ec_workshops_py/.
Relevant Packages#
pylipd
Data Description#
This notebook uses the following datasets, in LiPD format:
Nurhati, I. S., Cobb, K. M., & Di Lorenzo, E. (2011). Decadal-scale SST and salinity variations in the central tropical Pacific: Signatures of natural and anthropogenic climate change. Journal of Climate, 24(13), 3294–3308. doi:10.1175/2011jcli3852.1
Moses, C. S., Swart, P. K., and Rosenheim, B. E. (2006), Evidence of multidecadal salinity variability in the eastern tropical North Atlantic, Paleoceanography, 21, PA3010, doi:10.1029/2005PA001257.
PAGES2k Consortium., Emile-Geay, J., McKay, N. et al. A global multiproxy database for temperature reconstructions of the Common Era. Sci Data 4, 170088 (2017). doi:10.1038/sdata.2017.88
Stott, L., Timmermann, A., & Thunell, R. (2007). Southern Hemisphere and deep-sea warming led deglacial atmospheric CO2 rise and tropical warming. Science (New York, N.Y.), 318(5849), 435–438. doi:10.1126/science.1143791
Tudhope, A. W., Chilcott, C. P., McCulloch, M. T., Cook, E. R., Chappell, J., Ellam, R. M., et al. (2001). Variability in the El Niño-Southern Oscillation through a glacial-interglacial cycle. Science, 291(1511), 1511-1517. doi:10.1126/science.1057969
Tierney, J. E., Abram, N. J., Anchukaitis, K. J., Evans, M. N., Giry, C., Kilbourne, K. H., et al. (2015). Tropical sea surface temperatures for the past four centuries reconstructed from coral archives. Paleoceanography, 30(3), 226–252. doi:10.1002/2014pa002717
Orsi, A. J., Cornuelle, B. D., and Severinghaus, J. P. (2012), Little Ice Age cold interval in West Antarctica: Evidence from borehole temperature at the West Antarctic Ice Sheet (WAIS) Divide, Geophys. Res. Lett., 39, L09710, doi:10.1029/2012GL051260.
Demonstration#
PyLiPD uses object-oriented programming (OOP). In OOP, an object contains data and associated parameters (e.g., metadata) for the object, together with code that represents procedures applicable to that object. OOP is ubiquitous in Python and presents several advantages over procedural programming: it follows the natural relationship between an object and a method, with each call representing a clearly defined action.
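To illustrate the idea with a minimal example unrelated to pylipd: a class bundles data and metadata with the methods that act on them. The `Dataset` class below is hypothetical, purely for illustration:

```python
class Dataset:
    """A toy container bundling data with the procedures that act on it."""

    def __init__(self, name, values):
        self.name = name        # metadata about the object
        self.values = values    # the data itself

    def mean(self):
        # a clearly defined action that applies to this object's own data
        return sum(self.values) / len(self.values)

d = Dataset("toy", [1.0, 2.0, 3.0])
print(d.mean())  # 2.0
```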
In PyLiPD you will only be dealing with the LiPD object, so you can import it directly:
from pylipd.lipd import LiPD
Loading LiPD formatted datasets from a local file#
First let’s create an empty object, in which we can load the dataset:
D = LiPD()
Now let’s load our data:
data_path = '../data/Ocn-Palmyra.Nurhati.2011.lpd'
D.load(data_path)
Loading 1 LiPD files
0%| | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 30.36it/s]
Loaded..
If you want to see the dataset names contained in your object, you can use this method, which returns a list of dataset names:
names = D.get_all_dataset_names()
print(names)
['Ocn-Palmyra.Nurhati.2011']
Loading a LiPD formatted dataset from a URL#
data_url = 'https://lipdverse.org/data/iso2k100_CO06MOPE/1_0_2//CO06MOPE.lpd'
D2 = LiPD()
D2.load(data_url)
Loading 1 LiPD files
0%| | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 2.39it/s]
100%|██████████| 1/1 [00:00<00:00, 2.39it/s]
Loaded..
names = D2.get_all_dataset_names()
print(names)
['CO06MOPE']
If you want to work with both files together, you can simply load the new dataset into your existing object:
D.load(data_url)
names = D.get_all_dataset_names()
print(names)
Loading 1 LiPD files
0%| | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 2.10it/s]
100%|██████████| 1/1 [00:00<00:00, 2.10it/s]
Loaded..
['Ocn-Palmyra.Nurhati.2011', 'CO06MOPE']
You can also load several sources at once by passing a list of paths and/or URLs:
data = ['../data/Ocn-Palmyra.Nurhati.2011.lpd', 'https://lipdverse.org/data/iso2k100_CO06MOPE/1_0_2//CO06MOPE.lpd']
D3 = LiPD()
D3.load(data)
names = D3.get_all_dataset_names()
print(names)
Loading 2 LiPD files
0%| | 0/2 [00:00<?, ?it/s]
100%|██████████| 2/2 [00:00<00:00, 6.47it/s]
100%|██████████| 2/2 [00:00<00:00, 6.46it/s]
Loaded..
['Ocn-Palmyra.Nurhati.2011', 'CO06MOPE']
Loading from a directory#
Let’s load some of the datasets contained in the PAGES2k database:
path = '../data/Pages2k/'
D_dir = LiPD()
D_dir.load_from_dir(path)
Loading 16 LiPD files
0%| | 0/16 [00:00<?, ?it/s]
38%|███▊ | 6/16 [00:00<00:00, 52.09it/s]
75%|███████▌ | 12/16 [00:00<00:00, 39.37it/s]
100%|██████████| 16/16 [00:00<00:00, 42.62it/s]
Loaded..
names = D_dir.get_all_dataset_names()
print(names)
['Eur-NorthernSpain.Martin-Chivelet.2011', 'Eur-NorthernScandinavia.Esper.2012', 'Eur-Stockholm.Leijonhufvud.2009', 'Eur-LakeSilvaplana.Trachsel.2010', 'Eur-SpanishPyrenees.Dorado-Linan.2012', 'Arc-Kongressvatnet.D_Andrea.2012', 'Eur-CoastofPortugal.Abrantes.2011', 'Ocn-PedradeLume-CapeVerdeIslands.Moses.2006', 'Ocn-FeniDrift.Richter.2009', 'Ocn-SinaiPeninsula_RedSea.Moustafa.2000', 'Ant-WAIS-Divide.Severinghaus.2012', 'Asi-SourthAndMiddleUrals.Demezhko.2007', 'Ocn-AlboranSea436B.Nieto-Moreno.2013', 'Eur-SpannagelCave.Mangini.2005', 'Ocn-RedSea.Felis.2000', 'Eur-FinnishLakelands.Helama.2014']
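Before loading a whole directory, you may want to check which .lpd files it actually contains. A small standard-library sketch (`list_lipd_files` is a hypothetical helper, not part of pylipd):

```python
from pathlib import Path

def list_lipd_files(path):
    """Return the sorted names of all LiPD (.lpd) files in a directory."""
    return sorted(p.name for p in Path(path).glob("*.lpd"))

# e.g. list_lipd_files('../data/Pages2k/') would list the 16 files loaded above
```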
You can still load single files using the method described above and append them:
D_dir.load(data_url)
names = D_dir.get_all_dataset_names()
print(names)
Loading 1 LiPD files
0%| | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 3.36it/s]
100%|██████████| 1/1 [00:00<00:00, 3.35it/s]
Loaded..
['Eur-NorthernSpain.Martin-Chivelet.2011', 'Eur-NorthernScandinavia.Esper.2012', 'Eur-Stockholm.Leijonhufvud.2009', 'Eur-LakeSilvaplana.Trachsel.2010', 'Eur-SpanishPyrenees.Dorado-Linan.2012', 'Arc-Kongressvatnet.D_Andrea.2012', 'Eur-CoastofPortugal.Abrantes.2011', 'Ocn-PedradeLume-CapeVerdeIslands.Moses.2006', 'Ocn-FeniDrift.Richter.2009', 'Ocn-SinaiPeninsula_RedSea.Moustafa.2000', 'Ant-WAIS-Divide.Severinghaus.2012', 'Asi-SourthAndMiddleUrals.Demezhko.2007', 'Ocn-AlboranSea436B.Nieto-Moreno.2013', 'Eur-SpannagelCave.Mangini.2005', 'Ocn-RedSea.Felis.2000', 'Eur-FinnishLakelands.Helama.2014', 'CO06MOPE']
Loading from the remote LipdGraph database#
Files stored on the LiPDverse are also available in a graph database, which supports complex querying through the SPARQL query language. PyLiPD essentially wraps these complex queries into Python calls to facilitate the manipulation of the datasets.
To load a file from the remote database, all you need to know is the dataset name:
lipd_remote = LiPD()
lipd_remote.set_endpoint("https://linkedearth.graphdb.mint.isi.edu/repositories/LiPDVerse-dynamic")
lipd_remote.load_remote_datasets(["Ocn-MadangLagoonPapuaNewGuinea.Kuhnert.2001", "MD98_2181.Stott.2007", "Ant-WAIS-Divide.Severinghaus.2012"])
print(lipd_remote.get_all_dataset_names())
Caching datasets from remote endpoint..
Making remote query to endpoint: https://linkedearth.graphdb.mint.isi.edu/repositories/LiPDVerse-dynamic
Done..
['Ocn-MadangLagoonPapuaNewGuinea.Kuhnert.2001', 'MD98_2181.Stott.2007', 'Ant-WAIS-Divide.Severinghaus.2012']
Loading in parallel#
If you plan on loading multiple LiPD files (hundreds to thousands), you may want to do so in parallel. If you choose to do so, you need to use the if __name__ == "__main__" notation:
if __name__ == "__main__":
    D_parallel = LiPD()
    D_parallel.load_from_dir(path, parallel=True)
Loading 16 LiPD files
0%| | 0/16 [00:00<?, ?it/s]
56%|█████▋ | 9/16 [00:00<00:00, 73.93it/s]
100%|██████████| 16/16 [00:00<00:00, 93.17it/s]
Loaded..
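The reason for the guard: when work is dispatched to parallel worker processes, each worker re-imports the main module, and any unguarded top-level code would run again in every worker. A generic sketch using Python's standard multiprocessing module (the `square` function and its inputs are made up for illustration):

```python
from multiprocessing import Pool

def square(x):
    return x * x

# Without this guard, each worker would re-execute the module's top-level
# code on re-import, spawning workers recursively on platforms that use
# the "spawn" start method.
if __name__ == "__main__":
    with Pool(2) as pool:
        print(pool.map(square, [1, 2, 3, 4]))  # [1, 4, 9, 16]
```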
After the initial loading, you can resume using your object directly:
print(D_parallel.get_all_dataset_names())
['Eur-LakeSilvaplana.Trachsel.2010', 'Eur-NorthernSpain.Martin-Chivelet.2011', 'Eur-NorthernScandinavia.Esper.2012', 'Eur-Stockholm.Leijonhufvud.2009', 'Eur-SpanishPyrenees.Dorado-Linan.2012', 'Arc-Kongressvatnet.D_Andrea.2012', 'Eur-CoastofPortugal.Abrantes.2011', 'Ocn-PedradeLume-CapeVerdeIslands.Moses.2006', 'Ant-WAIS-Divide.Severinghaus.2012', 'Ocn-SinaiPeninsula_RedSea.Moustafa.2000', 'Asi-SourthAndMiddleUrals.Demezhko.2007', 'Ocn-FeniDrift.Richter.2009', 'Ocn-AlboranSea436B.Nieto-Moreno.2013', 'Eur-SpannagelCave.Mangini.2005', 'Ocn-RedSea.Felis.2000', 'Eur-FinnishLakelands.Helama.2014']
Merging LiPD objects#
In the course of your work, you may need to merge two LiPD objects together. Let’s merge D into D_parallel:
D_merged = D_parallel.merge(D)
print(D_merged.get_all_dataset_names())
['Eur-LakeSilvaplana.Trachsel.2010', 'Eur-NorthernSpain.Martin-Chivelet.2011', 'Eur-NorthernScandinavia.Esper.2012', 'Eur-Stockholm.Leijonhufvud.2009', 'Eur-SpanishPyrenees.Dorado-Linan.2012', 'Arc-Kongressvatnet.D_Andrea.2012', 'Eur-CoastofPortugal.Abrantes.2011', 'Ocn-PedradeLume-CapeVerdeIslands.Moses.2006', 'Ant-WAIS-Divide.Severinghaus.2012', 'Ocn-SinaiPeninsula_RedSea.Moustafa.2000', 'Asi-SourthAndMiddleUrals.Demezhko.2007', 'Ocn-FeniDrift.Richter.2009', 'Ocn-AlboranSea436B.Nieto-Moreno.2013', 'Eur-SpannagelCave.Mangini.2005', 'Ocn-RedSea.Felis.2000', 'Eur-FinnishLakelands.Helama.2014', 'CO06MOPE', 'Ocn-Palmyra.Nurhati.2011']