Working with ensembles in PyLiPD#
Preamble#
Ensembles are key to uncertainty quantification and are one of the main reasons the LiPD format was created. LiPD stores tables of uncertainty ensembles (in particular, age ensembles), which PyLiPD can load efficiently for analysis. This notebook describes how PyLiPD handles such age ensembles.
Goals#
Reading an ensemble from a LiPD object
Reading Time: 5 minutes
Keywords#
LiPD, age uncertainty, age ensembles
Pre-requisites#
This tutorial assumes basic knowledge of Python and Pandas. If you are not familiar with the language or the library, check out this tutorial: http://linked.earth/ec_workshops_py/.
Relevant Packages#
pylipd
Data Description#
This notebook uses the following datasets, in LiPD format:
McCabe-Glynn, S., Johnson, K., Strong, C. et al. Variable North Pacific influence on drought in southwestern North America since AD 854. Nature Geosci 6, 617–621 (2013). https://doi.org/10.1038/ngeo1862
Lawrence, K. T., Liu, Z. H., & Herbert, T. D. (2006). Evolution of the eastern tropical Pacific through Plio-Pleistocene glaciation. Science, 312(5770), 79-83.
Demonstration#
Extracting ensemble information#
from pylipd.lipd import LiPD
D = LiPD()
data_path = ['../data/Crystal.McCabe-Glynn.2013.lpd', '../data/ODP846.Lawrence.2006.lpd']
D.load(data_path)
Loading 2 LiPD files
Loaded..
names = D.get_all_dataset_names()
print(names)
['Crystal.McCabe-Glynn.2013', 'ODP846.Lawrence.2006']
To load the ensemble tables for all the files:
%%time
df = D.get_ensemble_tables()
df
| | datasetName | ensembleTable | ensembleVariableName | ensembleVariableValues | ensembleVariableUnits | ensembleDepthName | ensembleDepthValues | ensembleDepthUnits | notes |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Crystal.McCabe-Glynn.2013 | http://linked.earth/lipd/Crystal.McCabe-Glynn.... | Year | [[2007.0, 2007.0, 2008.0, 2007.0, 2007.0, 2007... | yr AD | depth | [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0... | mm | None |
| 1 | ODP846.Lawrence.2006 | http://linked.earth/lipd/chron0model0ensemble0 | age | [[4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0,... | kyr BP | depth | [0.12, 0.23, 0.33, 0.43, 0.53, 0.63, 0.73, 0.8... | m | None |
The returned DataFrame contains the following columns:
datasetName: The name of the dataset.
ensembleTable: The ensemble table associated with the dataset. If more than one ensemble table is available for the record, each table is returned on a separate row.
ensembleVariableName: The name of the ensemble variable. Most likely, it will be a variant of ‘age’ or ‘year’.
ensembleVariableValues: The values of the ensemble members.
ensembleVariableUnits: The units associated with the ensemble (time) variable.
ensembleDepthName: The name of the depth vector.
ensembleDepthValues: The values for the depth axis. This is particularly useful when matching an ensemble table to a particular variable.
ensembleDepthUnits: The units for the depth.
notes: Notes regarding how the model was obtained.
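The nested lists in ensembleVariableValues are easier to work with as arrays. The sketch below uses small stand-in values (hypothetical, not taken from either dataset) and assumes one row per depth and one column per ensemble member, which is what the truncated output above suggests; it then summarizes the spread of the ensemble at each depth.

```python
import numpy as np

# Hypothetical stand-in data: 4 depths x 3 ensemble members, mimicking the
# contents of ensembleVariableValues and ensembleDepthValues for one row.
ens_values = [
    [2007.0, 2007.0, 2008.0],
    [2006.0, 2005.0, 2007.0],
    [2004.0, 2003.0, 2006.0],
    [2001.0, 2000.0, 2004.0],
]
ens_depth = [0.1, 0.15, 0.2, 0.25]

ens = np.asarray(ens_values, dtype=float)
depth = np.asarray(ens_depth, dtype=float)

# Assumption: rows are depths and columns are ensemble members.
assert ens.shape[0] == depth.size
n_depths, n_members = ens.shape

# Median and 95% range across members, computed at each depth.
median = np.median(ens, axis=1)
lower, upper = np.percentile(ens, [2.5, 97.5], axis=1)

for d, lo, med, hi in zip(depth, lower, median, upper):
    print(f"depth {d:.2f}: median {med:.1f} (95% range {lo:.1f} to {hi:.1f})")
```

With a real table, the two lists would presumably come from a single row, for example `df.iloc[0]['ensembleVariableValues']` and `df.iloc[0]['ensembleDepthValues']`.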
To retrieve the ensemble tables for a single dataset, pass its name through the dsname parameter:
df = D.get_ensemble_tables(dsname=names[0])
df
| | datasetName | ensembleTable | ensembleVariableName | ensembleVariableValues | ensembleVariableUnits | ensembleDepthName | ensembleDepthValues | ensembleDepthUnits | notes |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Crystal.McCabe-Glynn.2013 | http://linked.earth/lipd/Crystal.McCabe-Glynn.... | Year | [[2007.0, 2007.0, 2008.0, 2007.0, 2007.0, 2007... | yr AD | depth | [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0... | mm | None |
If you know the name of the ensemble variable, you can filter on it through the ensembleVarName parameter:
df = D.get_ensemble_tables(ensembleVarName='age')
df
| | datasetName | ensembleTable | ensembleVariableName | ensembleVariableValues | ensembleVariableUnits | ensembleDepthName | ensembleDepthValues | ensembleDepthUnits | notes | methodobj | methods |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ODP846.Lawrence.2006 | http://linked.earth/lipd/chron0model0ensemble0 | age | [[4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0,... | kyr BP | depth | [0.12, 0.23, 0.33, 0.43, 0.53, 0.63, 0.73, 0.8... | m | None | None | None |
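In the two records used here, the ensemble variable is called ‘Year’ in one and ‘age’ in the other, so filtering on ‘age’ returns only one of them. One possible alternative is to load the table for all datasets and filter it with Pandas afterwards. The sketch below uses a small stand-in DataFrame with the same column names; the set of accepted names is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the table returned for all datasets
# (only the columns needed here).
df_all = pd.DataFrame({
    "datasetName": ["Crystal.McCabe-Glynn.2013", "ODP846.Lawrence.2006"],
    "ensembleVariableName": ["Year", "age"],
    "ensembleVariableUnits": ["yr AD", "kyr BP"],
})

# Names treated as time axes, compared in lowercase (an assumption).
time_names = {"age", "year"}

# Case-insensitive match on the ensemble variable name.
is_time = df_all["ensembleVariableName"].str.lower().isin(time_names)
df_time = df_all[is_time]

print(df_time[["datasetName", "ensembleVariableName"]])
```

Checking ensembleVariableUnits alongside the name is worthwhile, since ‘yr AD’ and ‘kyr BP’ run in opposite directions and differ by a factor of 1000.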
Working with the PaleoData#
For this part of the demo, let’s work with only the Crystal Cave record. We can pass the name of the dataset directly to the function through the dsname parameter. In this case, the name is the first entry in the names list obtained previously (index 0, since Python uses zero-based indexing):
df = D.get_ensemble_tables(dsname=names[0])
df
| | datasetName | ensembleTable | ensembleVariableName | ensembleVariableValues | ensembleVariableUnits | ensembleDepthName | ensembleDepthValues | ensembleDepthUnits | notes |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Crystal.McCabe-Glynn.2013 | http://linked.earth/lipd/Crystal.McCabe-Glynn.... | Year | [[2007.0, 2007.0, 2008.0, 2007.0, 2007.0, 2007... | yr AD | depth | [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0... | mm | None |
Next, retrieve the essential time series information for the same dataset, which includes the paleo data values and their depth axis:
df_ts = D.get_timeseries_essentials(dsnames=names[0])
df_ts
| | dataSetName | archiveType | geo_meanLat | geo_meanLon | geo_meanElev | paleoData_variableName | paleoData_values | paleoData_units | paleoData_proxy | paleoData_proxyGeneral | time_variableName | time_values | time_units | depth_variableName | depth_values | depth_units |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Crystal.McCabe-Glynn.2013 | Speleothem | 36.59 | -118.82 | 1386.0 | d18o | [-8.01, -8.23, -8.61, -8.54, -8.6, -9.08, -8.9... | permil | None | None | age | [2007.7, 2007.0, 2006.3, 2005.6, 2004.9, 2004.... | yr AD | depth | [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0... | mm |
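The depth axes of the two tables need not coincide: here the paleo data depths start at 0.05 mm, while the ensemble depths start at 0.1 mm. One way to put each ensemble member on the paleo data depths is linear interpolation along depth. The sketch below uses stand-in values (hypothetical) and assumes one row per depth in the ensemble; linear interpolation is a choice made for illustration and is not necessarily what other tools do.

```python
import numpy as np

# Hypothetical stand-in values; real ones would come from the two tables.
ens_depth = np.array([0.10, 0.20, 0.30, 0.40])   # like ensembleDepthValues
ens_age = np.array([                              # like ensembleVariableValues
    [2007.0, 2006.0],
    [2005.0, 2003.0],
    [2003.0, 2001.0],
    [2001.0, 1998.0],
])                                                # shape (n_depths, n_members)
paleo_depth = np.array([0.05, 0.10, 0.15, 0.25, 0.40])  # like depth_values

# Interpolate each member (column) onto the paleo data depths.
# Depths outside the range covered by the ensemble are set to NaN
# instead of being extrapolated.
mapped = np.column_stack([
    np.interp(paleo_depth, ens_depth, ens_age[:, k], left=np.nan, right=np.nan)
    for k in range(ens_age.shape[1])
])

print(mapped.shape)  # (number of paleo depths, number of members)
```

Note that `np.interp` expects the ensemble depths to be increasing, so a depth axis stored in another order would need sorting first.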
To learn how to use these two tables to create a Pyleoclim EnsembleSeries object and use it for analysis and visualization, see this tutorial.