Working with ensembles in PyLiPD

Authors#

Preamble#

Ensembles are key to uncertainty quantification and are one of the main reasons the LiPD format was created. LiPD stores tables of uncertainty ensembles (in particular, age ensembles), which PyLiPD can load efficiently for analysis. This notebook describes how PyLiPD handles such age ensembles.

Goals#

Reading an ensemble from a LiPD object

Reading Time: 5 minutes

Keywords#

LiPD, age uncertainty, age ensembles

Pre-requisites#

None beyond basic knowledge of Python and Pandas. If you are not familiar with the language or the library, check out this tutorial: http://linked.earth/ec_workshops_py/.

Relevant Packages#

pylipd

Data Description#

This notebook uses the following datasets, in LiPD format:

McCabe-Glynn, S., Johnson, K., Strong, C. et al. Variable North Pacific influence on drought in southwestern North America since AD 854. Nature Geosci 6, 617–621 (2013). https://doi.org/10.1038/ngeo1862
Lawrence, K. T., Liu, Z. H., & Herbert, T. D. (2006). Evolution of the eastern tropical Pacific through Plio-Pleistocene glaciation. Science, 312(5770), 79-83.

Demonstration#

Extracting ensemble information#

from pylipd.lipd import LiPD

D = LiPD()
data_path = ['../data/Crystal.McCabe-Glynn.2013.lpd', '../data/ODP846.Lawrence.2006.lpd']
D.load(data_path)

Loading 2 LiPD files

100%|██████████| 2/2 [00:01<00:00,  1.29it/s]

Loaded..

names = D.get_all_dataset_names()
print(names)

['Crystal.McCabe-Glynn.2013', 'ODP846.Lawrence.2006']

To load the ensemble tables for all the files:

Warning: This may take a few minutes, since each ensemble table is a matrix of roughly 1000 columns and a few hundred rows that must be loaded into memory. Although it is possible to load every ensemble table from every dataset at once, we strongly advise against it, as you may run out of memory. Instead, open the tables dataset by dataset, as demonstrated in this notebook.

%%time
df = D.get_ensemble_tables()

df

	datasetName	ensembleTable	ensembleVariableName	ensembleVariableValues	ensembleVariableUnits	ensembleDepthName	ensembleDepthValues	ensembleDepthUnits	notes
0	Crystal.McCabe-Glynn.2013	http://linked.earth/lipd/Crystal.McCabe-Glynn....	Year	[[2007.0, 2007.0, 2008.0, 2007.0, 2007.0, 2007...	yr AD	depth	[0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0...	mm	None
1	ODP846.Lawrence.2006	http://linked.earth/lipd/chron0model0ensemble0	age	[[4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0,...	kyr BP	depth	[0.12, 0.23, 0.33, 0.43, 0.53, 0.63, 0.73, 0.8...	m	None

The returned DataFrame contains the following columns:

datasetName: The name of the dataset.
ensembleTable: The ensemble table associated with the dataset. If more than one ensemble table is available for a record, each table occupies its own row.
ensembleVariableName: The name of the ensemble variable. Most likely, it will be a variant of ‘age’ or ‘year’.
ensembleVariableValues: The ensemble values, stored as a matrix.
ensembleVariableUnits: The units of the ensemble (time) variable.
ensembleDepthName: The name of the depth vector.
ensembleDepthValues: The values of the depth axis. These are particularly useful when matching an ensemble table to a particular variable.
ensembleDepthUnits: The units of the depth axis.
notes: Notes on how the model was produced.
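To make the layout of ensembleVariableValues more concrete, the sketch below builds a small synthetic matrix (not taken from either dataset) and summarizes it depth by depth. It assumes the layout suggested by the output above, namely one row per depth and one column per ensemble member; the variable names and values are purely illustrative.

```python
import numpy as np

# Synthetic stand-in for the ensemble values of one table:
# 4 depths (rows) x 5 ensemble members (columns). Illustrative values only.
ens_values = np.array([
    [2007.0, 2007.0, 2008.0, 2007.0, 2006.0],
    [2005.0, 2006.0, 2006.0, 2004.0, 2005.0],
    [2003.0, 2004.0, 2005.0, 2002.0, 2003.0],
    [2000.0, 2002.0, 2003.0, 1999.0, 2001.0],
])
ens_depth = np.array([0.10, 0.15, 0.20, 0.25])  # one depth per row

n_depths, n_members = ens_values.shape

# Summarize the spread across members (axis=1) at each depth
median_age = np.median(ens_values, axis=1)
lower, upper = np.percentile(ens_values, [2.5, 97.5], axis=1)

for depth, low, med, high in zip(ens_depth, lower, median_age, upper):
    print(f"depth {depth:.2f}: median {med:.1f} (95% range {low:.1f} to {high:.1f})")
```

With a real table, the same calls should apply to a single entry of the ensembleVariableValues column after converting it with `np.array(...)`, provided that entry is a rectangular nested list or array.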

If you are interested in only one dataset (the recommended approach; see the warning above), pass its name through the dsname parameter:

df = D.get_ensemble_tables(dsname=names[0])

df

	datasetName	ensembleTable	ensembleVariableName	ensembleVariableValues	ensembleVariableUnits	ensembleDepthName	ensembleDepthValues	ensembleDepthUnits	notes
0	Crystal.McCabe-Glynn.2013	http://linked.earth/lipd/Crystal.McCabe-Glynn....	Year	[[2007.0, 2007.0, 2008.0, 2007.0, 2007.0, 2007...	yr AD	depth	[0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0...	mm	None
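The dataset-by-dataset approach recommended in the warning can be wrapped in a small generator, so that only one dataset's ensemble tables are requested at a time. This is a minimal sketch: `iter_ensemble_tables` is a hypothetical helper, not part of PyLiPD, and it only assumes that the object passed in exposes `get_ensemble_tables(dsname=...)` as used above.

```python
def iter_ensemble_tables(lipd, dataset_names):
    """Yield (name, table) pairs, one dataset at a time.

    `lipd` is any object exposing get_ensemble_tables(dsname=...), such as
    the LiPD object `D` used in this notebook. Hypothetical helper, not a
    PyLiPD function.
    """
    for name in dataset_names:
        # Request the ensemble tables of a single dataset per iteration
        yield name, lipd.get_ensemble_tables(dsname=name)


# Intended use with the objects defined earlier in this notebook:
# for name, table in iter_ensemble_tables(D, names):
#     print(name, len(table))
```

Because the function is a generator, a table from a previous iteration can be released by Python once the loop moves on, as long as no other reference to it is kept.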

If you know the name of the ensemble variable, you can filter on it with the ensembleVarName parameter:

df = D.get_ensemble_tables(ensembleVarName='age')

df

	datasetName	ensembleTable	ensembleVariableName	ensembleVariableValues	ensembleVariableUnits	ensembleDepthName	ensembleDepthValues	ensembleDepthUnits	notes	methodobj	methods
0	ODP846.Lawrence.2006	http://linked.earth/lipd/chron0model0ensemble0	age	[[4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0,...	kyr BP	depth	[0.12, 0.23, 0.33, 0.43, 0.53, 0.63, 0.73, 0.8...	m	None	None	None

Working with the PaleoData#

For this part of the demo, let’s work with only the Crystal Cave record. We can pass the name of the dataset directly to the function through the dsname parameter. Here, the name is the first entry in the names list obtained previously (index 0, since Python uses zero-based indexing):

df = D.get_ensemble_tables(dsname=names[0])
df

	datasetName	ensembleTable	ensembleVariableName	ensembleVariableValues	ensembleVariableUnits	ensembleDepthName	ensembleDepthValues	ensembleDepthUnits	notes
0	Crystal.McCabe-Glynn.2013	http://linked.earth/lipd/Crystal.McCabe-Glynn....	Year	[[2007.0, 2007.0, 2008.0, 2007.0, 2007.0, 2007...	yr AD	depth	[0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0...	mm	None

df_ts = D.get_timeseries_essentials(dsnames=names[0])

df_ts

	dataSetName	archiveType	geo_meanLat	geo_meanLon	geo_meanElev	paleoData_variableName	paleoData_values	paleoData_units	paleoData_proxy	paleoData_proxyGeneral	time_variableName	time_values	time_units	depth_variableName	depth_values	depth_units
0	Crystal.McCabe-Glynn.2013	Speleothem	36.59	-118.82	1386.0	d18o	[-8.01, -8.23, -8.61, -8.54, -8.6, -9.08, -8.9...	permil	None	None	age	[2007.7, 2007.0, 2006.3, 2005.6, 2004.9, 2004....	yr AD	depth	[0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0...	mm
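In the two tables above, the ensemble depths start at 0.1 mm while the PaleoData depths start at 0.05 mm, so the two have to be matched on depth before they can be combined. One possible approach, sketched here on synthetic arrays, is to interpolate each ensemble member linearly onto the PaleoData depth axis with `numpy.interp`. The function name `map_ensemble_to_depth` is hypothetical, the values are illustrative, and this is not necessarily how PyLiPD or Pyleoclim perform the mapping. Both depth axes are assumed to share the same units.

```python
import numpy as np


def map_ensemble_to_depth(ens_depth, ens_values, target_depth):
    """Linearly interpolate every ensemble member onto target_depth.

    ens_depth    : 1-D increasing depths of the ensemble table, shape (n,)
    ens_values   : ensemble ages, shape (n, n_members)
    target_depth : 1-D depths of the PaleoData variable, shape (m,)

    Returns an array of shape (m, n_members). Depths outside the range of
    ens_depth are set to NaN rather than extrapolated.
    """
    ens_depth = np.asarray(ens_depth, dtype=float)
    ens_values = np.asarray(ens_values, dtype=float)
    target_depth = np.asarray(target_depth, dtype=float)

    # Interpolate one member (column) at a time, then stack the columns
    return np.column_stack([
        np.interp(target_depth, ens_depth, ens_values[:, j],
                  left=np.nan, right=np.nan)
        for j in range(ens_values.shape[1])
    ])


# Synthetic example: 3 ensemble depths, 2 members, 4 PaleoData depths
ens_depth = [0.10, 0.20, 0.30]
ens_values = [[2007.0, 2008.0],
              [2005.0, 2006.0],
              [2003.0, 2002.0]]
paleo_depth = [0.05, 0.10, 0.15, 0.30]

mapped = map_ensemble_to_depth(ens_depth, ens_values, paleo_depth)
print(mapped.shape)
```

Returning NaN outside the ensemble's depth range is a deliberate choice in this sketch: it makes unsupported depths visible instead of silently extending the age model.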

To learn how to use these two tables to create a Pyleoclim EnsembleSeries object and use it for analysis and visualization, see this tutorial