Reading LiPD formatted datasets with PyLiPD#

Authors#

Deborah Khider

Preamble#

PyLiPD is a Python package that allows you to read, manipulate, and write LiPD formatted datasets.

Goals#

  • Open LiPD formatted datasets from a local file, a URL, a directory, or the remote LipdGraph database

  • Load datasets in parallel

  • Add more LiPD datasets to an existing object

  • Merge two LiPD objects together

Reading Time: 5 minutes

Keywords#

LiPD

Pre-requisites#

None. This tutorial assumes basic knowledge of Python. If you are not familiar with this coding language, check out this tutorial: http://linked.earth/ec_workshops_py/.

Relevant Packages#

pylipd

Data Description#

This notebook uses the following datasets, in LiPD format:

  • Nurhati, I. S., Cobb, K. M., & Di Lorenzo, E. (2011). Decadal-scale SST and salinity variations in the central tropical Pacific: Signatures of natural and anthropogenic climate change. Journal of Climate, 24(13), 3294–3308. doi:10.1175/2011jcli3852.1

  • Moses, C. S., Swart, P. K., and Rosenheim, B. E. (2006), Evidence of multidecadal salinity variability in the eastern tropical North Atlantic, Paleoceanography, 21, PA3010, doi:10.1029/2005PA001257.

  • PAGES2k Consortium., Emile-Geay, J., McKay, N. et al. A global multiproxy database for temperature reconstructions of the Common Era. Sci Data 4, 170088 (2017). doi:10.1038/sdata.2017.88

  • Stott, L., Timmermann, A., & Thunell, R. (2007). Southern Hemisphere and deep-sea warming led deglacial atmospheric CO2 rise and tropical warming. Science (New York, N.Y.), 318(5849), 435–438. doi:10.1126/science.1143791

  • Tudhope, A. W., Chilcott, C. P., McCulloch, M. T., Cook, E. R., Chappell, J., Ellam, R. M., et al. (2001). Variability in the El Niño-Southern Oscillation through a glacial-interglacial cycle. Science, 291(5508), 1511-1517. doi:10.1126/science.1057969

  • Tierney, J. E., Abram, N. J., Anchukaitis, K. J., Evans, M. N., Giry, C., Kilbourne, K. H., et al. (2015). Tropical sea surface temperatures for the past four centuries reconstructed from coral archives. Paleoceanography, 30(3), 226–252. doi:10.1002/2014pa002717

  • Orsi, A. J., Cornuelle, B. D., and Severinghaus, J. P. (2012), Little Ice Age cold interval in West Antarctica: Evidence from borehole temperature at the West Antarctic Ice Sheet (WAIS) Divide, Geophys. Res. Lett., 39, L09710, doi:10.1029/2012GL051260.

Demonstration#

PyLiPD uses object-oriented programming (OOP). In OOP, an object bundles data and associated parameters (e.g., metadata) with the code that represents the procedures applicable to that object. OOP is ubiquitous in Python and has several advantages over procedural programming: it follows the natural relationship between an object and its methods, with each call representing a clearly defined action.

In PyLiPD you will only be dealing with the LiPD object, so you can import it directly:

from pylipd.lipd import LiPD

Loading LiPD formatted datasets from a local file#

First let’s create an empty object, in which we can load the dataset:

D = LiPD()

Now let’s load our data:

data_path = '../data/Ocn-Palmyra.Nurhati.2011.lpd'
D.load(data_path)
Loading 1 LiPD files
  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 30.36it/s]
Loaded..

If you want to see the dataset names contained in your object, you can use the get_all_dataset_names method, which returns a list of dataset names:

names = D.get_all_dataset_names()
print(names)
['Ocn-Palmyra.Nurhati.2011']

Loading a LiPD formatted dataset from a URL#

You can also load a dataset directly from its URL:

data_url = 'https://lipdverse.org/data/iso2k100_CO06MOPE/1_0_2//CO06MOPE.lpd'

D2 = LiPD()
D2.load(data_url)
Loading 1 LiPD files
  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00,  2.39it/s]
100%|██████████| 1/1 [00:00<00:00,  2.39it/s]
Loaded..

names = D2.get_all_dataset_names()
print(names)
['CO06MOPE']

If you want to work with both files together, you can simply load the new dataset into your existing object:

D.load(data_url)

names = D.get_all_dataset_names()
print(names)
Loading 1 LiPD files
  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00,  2.10it/s]
100%|██████████| 1/1 [00:00<00:00,  2.10it/s]
Loaded..
['Ocn-Palmyra.Nurhati.2011', 'CO06MOPE']

You can also load several datasets at once by passing a list of paths and/or URLs:

data = ['../data/Ocn-Palmyra.Nurhati.2011.lpd', 'https://lipdverse.org/data/iso2k100_CO06MOPE/1_0_2//CO06MOPE.lpd']

D3 = LiPD()
D3.load(data)

names = D3.get_all_dataset_names()
print(names)
Loading 2 LiPD files
  0%|          | 0/2 [00:00<?, ?it/s]
100%|██████████| 2/2 [00:00<00:00,  6.47it/s]
100%|██████████| 2/2 [00:00<00:00,  6.46it/s]
Loaded..
['Ocn-Palmyra.Nurhati.2011', 'CO06MOPE']

Loading from a directory#

Let’s load some of the datasets contained in the Euro2k database:

path = '../data/Pages2k/'

D_dir = LiPD()
D_dir.load_from_dir(path)
Loading 16 LiPD files
  0%|          | 0/16 [00:00<?, ?it/s]
 38%|███▊      | 6/16 [00:00<00:00, 52.09it/s]
 75%|███████▌  | 12/16 [00:00<00:00, 39.37it/s]
100%|██████████| 16/16 [00:00<00:00, 42.62it/s]
Loaded..

names = D_dir.get_all_dataset_names()
print(names)
['Eur-NorthernSpain.Martin-Chivelet.2011', 'Eur-NorthernScandinavia.Esper.2012', 'Eur-Stockholm.Leijonhufvud.2009', 'Eur-LakeSilvaplana.Trachsel.2010', 'Eur-SpanishPyrenees.Dorado-Linan.2012', 'Arc-Kongressvatnet.D_Andrea.2012', 'Eur-CoastofPortugal.Abrantes.2011', 'Ocn-PedradeLume-CapeVerdeIslands.Moses.2006', 'Ocn-FeniDrift.Richter.2009', 'Ocn-SinaiPeninsula_RedSea.Moustafa.2000', 'Ant-WAIS-Divide.Severinghaus.2012', 'Asi-SourthAndMiddleUrals.Demezhko.2007', 'Ocn-AlboranSea436B.Nieto-Moreno.2013', 'Eur-SpannagelCave.Mangini.2005', 'Ocn-RedSea.Felis.2000', 'Eur-FinnishLakelands.Helama.2014']
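To build intuition for what load_from_dir does, here is a toy sketch of the file-discovery step using only the standard library. The find_lipd_files helper is hypothetical and purely illustrative; PyLiPD's own load_from_dir handles both the discovery and the parsing for you.

```python
from pathlib import Path
import tempfile

def find_lipd_files(directory):
    """Hypothetical helper: collect every .lpd file in a directory
    (non-recursive, sorted). Illustration only -- use
    LiPD.load_from_dir in practice."""
    return sorted(str(p) for p in Path(directory).glob("*.lpd"))

# Demonstrate on a throwaway directory containing two fake .lpd files
# and one unrelated file, which should be ignored.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["Ocn-A.lpd", "Eur-B.lpd", "notes.txt"]:
        (Path(tmp) / name).touch()
    files = find_lipd_files(tmp)
    print([Path(f).name for f in files])  # ['Eur-B.lpd', 'Ocn-A.lpd']
```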

You can still load single files using the method described above and append them:

D_dir.load(data_url)

names = D_dir.get_all_dataset_names()
print(names)
Loading 1 LiPD files
  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00,  3.36it/s]
100%|██████████| 1/1 [00:00<00:00,  3.35it/s]
Loaded..
['Eur-NorthernSpain.Martin-Chivelet.2011', 'Eur-NorthernScandinavia.Esper.2012', 'Eur-Stockholm.Leijonhufvud.2009', 'Eur-LakeSilvaplana.Trachsel.2010', 'Eur-SpanishPyrenees.Dorado-Linan.2012', 'Arc-Kongressvatnet.D_Andrea.2012', 'Eur-CoastofPortugal.Abrantes.2011', 'Ocn-PedradeLume-CapeVerdeIslands.Moses.2006', 'Ocn-FeniDrift.Richter.2009', 'Ocn-SinaiPeninsula_RedSea.Moustafa.2000', 'Ant-WAIS-Divide.Severinghaus.2012', 'Asi-SourthAndMiddleUrals.Demezhko.2007', 'Ocn-AlboranSea436B.Nieto-Moreno.2013', 'Eur-SpannagelCave.Mangini.2005', 'Ocn-RedSea.Felis.2000', 'Eur-FinnishLakelands.Helama.2014', 'CO06MOPE']

Loading from the remote LipdGraph database#

Files stored on the LiPDverse are also available in a graph database, which supports complex querying through the SPARQL query language. PyLiPD essentially wraps these complex queries into Python calls to facilitate the manipulation of the datasets.

To load a file from the remote database, all you need to know is the dataset name:

lipd_remote = LiPD()
lipd_remote.set_endpoint("https://linkedearth.graphdb.mint.isi.edu/repositories/LiPDVerse-dynamic")
lipd_remote.load_remote_datasets(["Ocn-MadangLagoonPapuaNewGuinea.Kuhnert.2001", "MD98_2181.Stott.2007", "Ant-WAIS-Divide.Severinghaus.2012"])

print(lipd_remote.get_all_dataset_names())
Caching datasets from remote endpoint..
Making remote query to endpoint: https://linkedearth.graphdb.mint.isi.edu/repositories/LiPDVerse-dynamic
Done..
['Ocn-MadangLagoonPapuaNewGuinea.Kuhnert.2001', 'MD98_2181.Stott.2007', 'Ant-WAIS-Divide.Severinghaus.2012']
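Under the hood, calls like load_remote_datasets are translated into SPARQL queries sent to the endpoint. As a purely illustrative sketch (PyLiPD's actual queries match dataset names and metadata in the LiPD ontology and are far more involved), here is what a minimal, schema-agnostic SPARQL request to the endpoint looks like, built with the standard library but not sent:

```python
import urllib.parse
import urllib.request

endpoint = "https://linkedearth.graphdb.mint.isi.edu/repositories/LiPDVerse-dynamic"

# A schema-agnostic query: fetch any ten triples from the graph.
query = """
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }
LIMIT 10
"""

# SPARQL endpoints accept the query as a form-encoded 'query' parameter;
# build (but do not send) the POST request to show the wire format.
params = urllib.parse.urlencode({"query": query}).encode()
request = urllib.request.Request(
    endpoint,
    data=params,
    headers={"Accept": "application/sparql-results+json"},
)
print(request.get_method(), request.full_url)
```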

Loading in parallel#

If you plan on loading multiple LiPD files (hundreds to thousands), you may want to do so in parallel. If you choose to do so, you need to place the loading code under an if __name__ == "__main__" guard:

if __name__ == "__main__" :
    D_parallel = LiPD()
    D_parallel.load_from_dir(path, parallel=True)
Loading 16 LiPD files
  0%|          | 0/16 [00:00<?, ?it/s]
 56%|█████▋    | 9/16 [00:00<00:00, 73.93it/s]
100%|██████████| 16/16 [00:00<00:00, 93.17it/s]
Loaded..
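The guard is needed because parallel loading typically relies on Python's multiprocessing module: under the spawn start method (the default on macOS and Windows), each worker re-imports the main script, so any unguarded top-level code would run again in every worker. This stdlib-only sketch shows the pattern with a stand-in load_one function (hypothetical, not part of PyLiPD):

```python
import multiprocessing as mp

def load_one(path):
    # Stand-in for parsing a single LiPD file; just derives a "dataset
    # name" from the path for illustration.
    return path.split("/")[-1].replace(".lpd", "")

if __name__ == "__main__":
    paths = ["../data/A.lpd", "../data/B.lpd", "../data/C.lpd"]
    # Without the __main__ guard, each spawned worker would re-execute
    # this top-level code and try to create its own pool.
    with mp.Pool(2) as pool:
        names = pool.map(load_one, paths)
    print(names)  # ['A', 'B', 'C']
```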

After the initial loading, you can resume using your object directly:

print(D_parallel.get_all_dataset_names())
['Eur-LakeSilvaplana.Trachsel.2010', 'Eur-NorthernSpain.Martin-Chivelet.2011', 'Eur-NorthernScandinavia.Esper.2012', 'Eur-Stockholm.Leijonhufvud.2009', 'Eur-SpanishPyrenees.Dorado-Linan.2012', 'Arc-Kongressvatnet.D_Andrea.2012', 'Eur-CoastofPortugal.Abrantes.2011', 'Ocn-PedradeLume-CapeVerdeIslands.Moses.2006', 'Ant-WAIS-Divide.Severinghaus.2012', 'Ocn-SinaiPeninsula_RedSea.Moustafa.2000', 'Asi-SourthAndMiddleUrals.Demezhko.2007', 'Ocn-FeniDrift.Richter.2009', 'Ocn-AlboranSea436B.Nieto-Moreno.2013', 'Eur-SpannagelCave.Mangini.2005', 'Ocn-RedSea.Felis.2000', 'Eur-FinnishLakelands.Helama.2014']
Note: Once datasets are loaded into the object with one of the methods described above, you can always append more using a different method.

Merging LiPD objects#

In the course of your work, you may need to merge two LiPD objects together. Let’s merge D into D_parallel:

D_merged = D_parallel.merge(D)

print(D_merged.get_all_dataset_names())
['Eur-LakeSilvaplana.Trachsel.2010', 'Eur-NorthernSpain.Martin-Chivelet.2011', 'Eur-NorthernScandinavia.Esper.2012', 'Eur-Stockholm.Leijonhufvud.2009', 'Eur-SpanishPyrenees.Dorado-Linan.2012', 'Arc-Kongressvatnet.D_Andrea.2012', 'Eur-CoastofPortugal.Abrantes.2011', 'Ocn-PedradeLume-CapeVerdeIslands.Moses.2006', 'Ant-WAIS-Divide.Severinghaus.2012', 'Ocn-SinaiPeninsula_RedSea.Moustafa.2000', 'Asi-SourthAndMiddleUrals.Demezhko.2007', 'Ocn-FeniDrift.Richter.2009', 'Ocn-AlboranSea436B.Nieto-Moreno.2013', 'Eur-SpannagelCave.Mangini.2005', 'Ocn-RedSea.Felis.2000', 'Eur-FinnishLakelands.Helama.2014', 'CO06MOPE', 'Ocn-Palmyra.Nurhati.2011']
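Conceptually, merging behaves like an order-preserving union of the two dataset collections, as the combined name list above suggests. The toy merge_names function below is a hypothetical illustration on plain name lists only (it assumes duplicates are not repeated, which may differ from PyLiPD's actual behavior; the real merge operates on the full underlying data, not just the names):

```python
def merge_names(a, b):
    """Toy illustration of merge semantics on dataset-name lists:
    keep everything in a, then append items of b not already present.
    Hypothetical sketch -- not PyLiPD's implementation."""
    merged = list(a)
    for name in b:
        if name not in merged:
            merged.append(name)
    return merged

left = ["Eur-LakeSilvaplana.Trachsel.2010", "Ocn-RedSea.Felis.2000"]
right = ["Ocn-RedSea.Felis.2000", "CO06MOPE", "Ocn-Palmyra.Nurhati.2011"]
print(merge_names(left, right))
```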