Working with ensembles in PyLiPD#

Authors#

Deborah Khider

Preamble#

Ensembles are key to uncertainty quantification and are a main reason why the LiPD format was created. LiPD stores tables of uncertainty ensembles (in particular, age ensembles), which PyLiPD can efficiently load for analysis. This notebook describes how PyLiPD handles such age ensembles.

Goals#

  • Reading an ensemble from a LiPD object

Reading Time: 5 minutes

Keywords#

LiPD, age uncertainty, age ensembles

Pre-requisites#

This tutorial assumes basic knowledge of Python and Pandas. If you are not familiar with this language and library, check out this tutorial: http://linked.earth/ec_workshops_py/.

Relevant Packages#

pylipd

Data Description#

This notebook uses the following datasets, in LiPD format:

  • McCabe-Glynn, S., Johnson, K., Strong, C. et al. Variable North Pacific influence on drought in southwestern North America since AD 854. Nature Geosci 6, 617–621 (2013). https://doi.org/10.1038/ngeo1862

  • Lawrence, K. T., Liu, Z. H., & Herbert, T. D. (2006). Evolution of the eastern tropical Pacific through Plio-Pleistocene glaciation. Science, 312(5770), 79-83.

Demonstration#

Extracting ensemble information#

from pylipd.lipd import LiPD
D = LiPD()
data_path = ['../data/Crystal.McCabe-Glynn.2013.lpd', '../data/ODP846.Lawrence.2006.lpd']
D.load(data_path)
Loading 2 LiPD files
100%|██████████| 2/2 [00:01<00:00,  1.29it/s]
Loaded..

names = D.get_all_dataset_names()
print(names)
['Crystal.McCabe-Glynn.2013', 'ODP846.Lawrence.2006']

To load the ensemble tables for all the files:

This may take a few minutes, since matrices with 1000 columns and a few hundred rows must be loaded into memory. Although it is possible to load every ensemble table from every dataset at once, we strongly advise against it, as you are likely to run out of memory. Instead, open the tables dataset by dataset, as demonstrated in this notebook.
# A bare %time on its own line times an empty statement, so put the call on the same line
%time df = D.get_ensemble_tables()

df
|   | datasetName | ensembleTable | ensembleVariableName | ensembleVariableValues | ensembleVariableUnits | ensembleDepthName | ensembleDepthValues | ensembleDepthUnits | notes |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Crystal.McCabe-Glynn.2013 | http://linked.earth/lipd/Crystal.McCabe-Glynn.... | Year | [[2007.0, 2007.0, 2008.0, 2007.0, 2007.0, 2007... | yr AD | depth | [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0... | mm | None |
| 1 | ODP846.Lawrence.2006 | http://linked.earth/lipd/chron0model0ensemble0 | age | [[4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0,... | kyr BP | depth | [0.12, 0.23, 0.33, 0.43, 0.53, 0.63, 0.73, 0.8... | m | None |

The returned DataFrame contains the following columns:

  • datasetName: The name of the dataset

  • ensembleTable: The ensemble table associated with the dataset. If more than one ensemble table is available for the record, each table appears on its own row

  • ensembleVariableName: The name of the ensemble variable. Most likely, it will be a variant of ‘age’ or ‘year’

  • ensembleVariableValues: The values of the ensemble

  • ensembleVariableUnits: The units associated with the time variable

  • ensembleDepthName: The name of the depth vector

  • ensembleDepthValues: The values for the depth axis. This is particularly useful when matching an ensemble table to a particular variable

  • ensembleDepthUnits: The units for the depth.

  • notes: Notes on how the model was generated.
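Once a row of this table is in hand, the ensemble can be condensed into summary statistics. The following is a minimal sketch, not part of PyLiPD: the function name `summarize_ensemble` and the toy values are hypothetical, and it assumes that `ensembleVariableValues` converts to a 2-D array with one row per depth and one column per ensemble member.

```python
import numpy as np

def summarize_ensemble(values, depth):
    """Return the median and 95% interval of an ensemble at each depth.

    Assumption: `values` has shape (n_depths, n_members).
    """
    values = np.asarray(values, dtype=float)
    depth = np.asarray(depth, dtype=float)
    if values.ndim != 2 or values.shape[0] != depth.size:
        raise ValueError("expected one row of ensemble values per depth")
    # Quantiles are taken across ensemble members (axis=1)
    lower, median, upper = np.quantile(values, [0.025, 0.5, 0.975], axis=1)
    return {"depth": depth, "lower": lower, "median": median, "upper": upper}

# Toy stand-in for one row of the table (3 depths, 4 members); not real data
toy_values = [[2007.0, 2007.0, 2008.0, 2007.0],
              [2006.0, 2005.0, 2007.0, 2006.0],
              [2004.0, 2003.0, 2005.0, 2004.0]]
toy_depth = [0.1, 0.15, 0.2]

summary = summarize_ensemble(toy_values, toy_depth)
print(summary["median"])
```

With a real table, the inputs would presumably be `df['ensembleVariableValues'].iloc[0]` and `df['ensembleDepthValues'].iloc[0]`; check the array shape before relying on the axis convention assumed here.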

If you are interested in only one dataset (see the warning above):

df = D.get_ensemble_tables(dsname=names[0])

df
|   | datasetName | ensembleTable | ensembleVariableName | ensembleVariableValues | ensembleVariableUnits | ensembleDepthName | ensembleDepthValues | ensembleDepthUnits | notes |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Crystal.McCabe-Glynn.2013 | http://linked.earth/lipd/Crystal.McCabe-Glynn.... | Year | [[2007.0, 2007.0, 2008.0, 2007.0, 2007.0, 2007... | yr AD | depth | [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0... | mm | None |
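The dataset-by-dataset approach recommended in the warning can be wrapped in a small generator. This is a sketch under stated assumptions: `ensemble_tables_by_dataset` is a hypothetical helper, not a PyLiPD function, and it relies only on the two methods used in this notebook, `get_all_dataset_names()` and `get_ensemble_tables(dsname=...)`.

```python
def ensemble_tables_by_dataset(lipd, names=None):
    """Yield (dataset name, ensemble table) pairs, one dataset at a time.

    `lipd` is expected to be a loaded LiPD object such as `D` above.
    """
    if names is None:
        names = lipd.get_all_dataset_names()
    for name in names:
        # Only the table for the current dataset is requested
        yield name, lipd.get_ensemble_tables(dsname=name)

# Example usage with the LiPD object loaded earlier:
# for name, table in ensemble_tables_by_dataset(D):
#     print(name, len(table))
```

Memory use stays low only if each table is processed and released before the next one is requested; collecting all the yielded tables into a list would defeat the purpose.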

If you know the variable name:

df = D.get_ensemble_tables(ensembleVarName='age')

df
|   | datasetName | ensembleTable | ensembleVariableName | ensembleVariableValues | ensembleVariableUnits | ensembleDepthName | ensembleDepthValues | ensembleDepthUnits | notes | methodobj | methods |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ODP846.Lawrence.2006 | http://linked.earth/lipd/chron0model0ensemble0 | age | [[4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0,... | kyr BP | depth | [0.12, 0.23, 0.33, 0.43, 0.53, 0.63, 0.73, 0.8... | m | None | None | None |
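Because the ensemble variable may be called 'age' in one dataset and 'Year' in another, it can help to try several spellings in turn. The sketch below is illustrative only: `first_matching_ensemble` is a hypothetical helper, the candidate names are guesses, and it assumes that `get_ensemble_tables` returns an empty table when nothing matches. Whether the name matching is case-sensitive is not stated here, so the spellings are listed explicitly.

```python
def first_matching_ensemble(lipd, candidates=("age", "Year", "year")):
    """Return (variable name, table) for the first candidate with results.

    Returns (None, None) if no candidate yields a non-empty table.
    """
    for var_name in candidates:
        table = lipd.get_ensemble_tables(ensembleVarName=var_name)
        # Assumption: an unmatched name gives a table of length zero
        if len(table) > 0:
            return var_name, table
    return None, None

# Example usage with the LiPD object loaded earlier:
# var_name, table = first_matching_ensemble(D)
```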

Working with the PaleoData#

For this part of the demo, let’s work with only the Crystal Cave record. We can pass the name of the dataset directly to the function through the dsname parameter. In this case, the name is the first entry in the names list that we obtained previously (index 0, since Python uses zero-based indexing):

df = D.get_ensemble_tables(dsname=names[0])
df
|   | datasetName | ensembleTable | ensembleVariableName | ensembleVariableValues | ensembleVariableUnits | ensembleDepthName | ensembleDepthValues | ensembleDepthUnits | notes |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Crystal.McCabe-Glynn.2013 | http://linked.earth/lipd/Crystal.McCabe-Glynn.... | Year | [[2007.0, 2007.0, 2008.0, 2007.0, 2007.0, 2007... | yr AD | depth | [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0... | mm | None |
df_ts = D.get_timeseries_essentials(dsnames=names[0])

df_ts
|   | dataSetName | archiveType | geo_meanLat | geo_meanLon | geo_meanElev | paleoData_variableName | paleoData_values | paleoData_units | paleoData_proxy | paleoData_proxyGeneral | time_variableName | time_values | time_units | depth_variableName | depth_values | depth_units |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Crystal.McCabe-Glynn.2013 | Speleothem | 36.59 | -118.82 | 1386.0 | d18o | [-8.01, -8.23, -8.61, -8.54, -8.6, -9.08, -8.9... | permil | None | None | age | [2007.7, 2007.0, 2006.3, 2005.6, 2004.9, 2004.... | yr AD | depth | [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0... | mm |
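Note that the two depth axes are not identical: the ensemble table starts at 0.1 mm, while the paleo data start at 0.05 mm. One way to reconcile them is to interpolate each ensemble member onto the paleo depths. The following is a rough sketch and not the method used by PyLiPD or Pyleoclim: `map_ensemble_to_paleo_depth` is a hypothetical helper, the toy numbers are invented, and it assumes that the ensemble depths are increasing and that the values array has one row per depth.

```python
import numpy as np

def map_ensemble_to_paleo_depth(ens_values, ens_depth, paleo_depth):
    """Linearly interpolate each ensemble member onto the paleo depths.

    Paleo depths outside the ensemble depth range are left as NaN
    rather than extrapolated.
    """
    ens_values = np.asarray(ens_values, dtype=float)
    ens_depth = np.asarray(ens_depth, dtype=float)
    paleo_depth = np.asarray(paleo_depth, dtype=float)
    if ens_values.ndim != 2 or ens_values.shape[0] != ens_depth.size:
        raise ValueError("expected one row of ensemble values per depth")
    n_members = ens_values.shape[1]
    mapped = np.full((paleo_depth.size, n_members), np.nan)
    inside = (paleo_depth >= ens_depth.min()) & (paleo_depth <= ens_depth.max())
    for j in range(n_members):
        mapped[inside, j] = np.interp(paleo_depth[inside], ens_depth, ens_values[:, j])
    return mapped

# Toy values only (3 ensemble depths, 2 members, 4 paleo depths)
ens_depth = [0.1, 0.2, 0.3]
ens_values = [[2007.0, 2008.0], [2005.0, 2006.0], [2003.0, 2004.0]]
paleo_depth = [0.05, 0.1, 0.15, 0.3]

mapped = map_ensemble_to_paleo_depth(ens_values, ens_depth, paleo_depth)
print(mapped.shape)
```

Leaving out-of-range depths as NaN is a deliberate choice here, since `np.interp` would otherwise silently repeat the first or last ensemble value at those depths.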

To learn how to use these two tables to create a Pyleoclim EnsembleSeries object and use it for analysis and visualization, see this tutorial