The Dataset class#

Authors#

Deborah Khider

Preamble#

The next set of tutorials goes through editing and creating LiPD files from Python. Before we delve into the details of how to do so, it is good to remind ourselves of two important facts:

  1. PyLiPD uses object-oriented programming (OOP). In OOP, an object contains the data, the associated parameters (e.g., metadata) for the object, and code that represents procedures applicable to that object. So far, we have seen two objects: the LiPD object and the LiPDSeries object. Both of these objects contain a graph that follows an ontology.

  2. The LinkedEarth Ontology describes paleoclimate datasets and was created from the LiPD format. Ontologies list the types of objects, called classes (e.g., Dataset, Publication, Variable), the relationships that connect them (e.g., Dataset publishedIn Publication), and constraints on the ways that classes and relationships can be combined. Here is a snippet of the LinkedEarth Ontology:

[Figure: snippet of the LinkedEarth Ontology]

As you can see, the top class is the Dataset class.

Why is this information relevant now?#

At first glance, OOP and ontologies serve two different purposes. However, they function in a very similar fashion: a class (or object) can be manipulated through its properties (or methods). We used this resemblance to help us create editing functions for PyLiPD. In short, each class in the ontology was made into an object in PyLiPD, and each property was given functionality to read/write/edit the property value.

This is how the Dataset object was created, and it is your entry point to editing LiPD files.
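To make this concrete, here is a minimal sketch of the read/write pattern, using the same file as in the Demonstration section below. The getter getName is shown later in this tutorial; the setter setName is an assumption that follows the same set + property naming convention, so verify the exact method name in the pylipd documentation:

from pylipd.lipd import LiPD

# Load a LiPD file and convert it into a Dataset object (detailed below)
D = LiPD()
D.load('../data/Ocn-Palmyra.Nurhati.2011.lpd')
ds = D.get_datasets()[0]

# Read a property value (getter names follow get + property)
print(ds.getName())

# Edit a property value (assumed setter, following set + property;
# check the pylipd documentation for the exact method name)
ds.setName('Ocn-Palmyra.Nurhati.2011-edited')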

How is Dataset different from the LiPD class?#

At first glance, the two classes are very similar, as they both contain the data and metadata for paleoclimate datasets as a graph. However, the functions attached to the Dataset class are meant for editing, while those associated with the LiPD class are meant for querying and manipulation. Separating the two also ensures that files are not overwritten by mistake.

However, if you prefer to use the Python APIs for each property to loop over various files, the Dataset class may be more useful to you. This option requires knowledge of the ontology and the LiPD structure.
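As a sketch of that workflow, the snippet below loops over several LiPD files (the file paths are hypothetical placeholders) and reads a property from each dataset through the Dataset API:

from pylipd.lipd import LiPD

# Loop over several LiPD files (placeholder paths) and read a property from each
for path in ['../data/file1.lpd', '../data/file2.lpd']:
    D = LiPD()
    D.load(path)
    for ds in D.get_datasets():
        print(ds.getName())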

Goals#

  • Create a Dataset class from an existing file

  • Retrieve information from the file

Reading Time: 5 minutes

Keywords#

LiPD, LinkedEarth Ontology, Object-Oriented Programming

Pre-requisites#

An understanding of OOP and the LinkedEarth Ontology:

  • The Linked Earth Core Ontology provides the main concepts and relationships to describe a paleoclimate dataset and its values.

  • The Archive Type Ontology describes a taxonomy of the most common types of archives (e.g., Coral, Glacier Ice).

  • The Paleo Variables Ontology describes a taxonomy of the most common types of paleo variables.

  • The Paleo Proxy Ontology describes a taxonomy of the most common types of paleo proxies.

  • The Paleo Units Ontology describes a taxonomy of the most common types of paleo units.

  • The Interpretation Ontology describes a taxonomy of the most common interpretations.

  • The Instrument Ontology describes a taxonomy of the most common instruments for taking measurements.

  • The Chron Variables Ontology describes a taxonomy of the most common types of chron variables. Under Construction.

  • The Chron Proxy Ontology describes a taxonomy of the most common types of chron proxies. Under Construction.

  • The Chron Units Ontology describes a taxonomy of the most common types of chron units. Under Construction.

Relevant Packages#

pylipd

Data Description#

This notebook uses the following datasets, in LiPD format:

  • Nurhati, I. S., Cobb, K. M., & Di Lorenzo, E. (2011). Decadal-scale SST and salinity variations in the central tropical Pacific: Signatures of natural and anthropogenic climate change. Journal of Climate, 24(13), 3294–3308. doi:10.1175/2011jcli3852.1

  • Lawrence, K. T., Liu, Z. H., & Herbert, T. D. (2006). Evolution of the eastern tropical Pacific through Plio-Pleistocene glaciation. Science, 312(5770), 79–83.

Demonstration#

Let’s import the LiPD and Dataset classes:

from pylipd.classes.dataset import Dataset
from pylipd.lipd import LiPD

# Pandas for data
import pandas as pd
import pyleoclim as pyleo

For the purpose of this demonstration, let’s open the dataset from Nurhati et al. (2011).

D = LiPD()

data_path = '../data/Ocn-Palmyra.Nurhati.2011.lpd'
D.load(data_path)
Loading 1 LiPD files
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 35.86it/s]
Loaded..

Creating a Dataset object#

Convert to a Dataset object using the get_datasets method. By default, this method returns a list. Since we have only one dataset, we select the first item:

ds = D.get_datasets()[0]

Obtaining information about the Dataset#

Get Dataset level information#

You can now access information about the dataset using the Python APIs. The methods to do so are named get + the name of the class/property. For instance, to get the name of the dataset, use the getName function:

name = ds.getName()
name
'Ocn-Palmyra.Nurhati.2011'

When in doubt, consult the documentation for the various objects. The documentation is light on detail for these methods since they were created directly from the ontology.
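You can also use standard Python introspection to list the getters available on any of these objects; for example:

# List the getter methods that were auto-generated from the ontology
getters = [m for m in dir(ds) if m.startswith('get')]
print(getters[:10])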

Location#

Remember that data properties such as Name have a range of string, float, or integer and represent the leaves of the graph. However, you may need to dig deeper into the graph to obtain the value you need. For instance, let’s have a look at the geographical coordinates for the site:

geo = ds.getLocation()
type(geo)
pylipd.classes.location.Location

As you can see, the function returns another object called Location with its own functions that correspond to the properties attached to the Location class in the LinkedEarth Ontology. Let’s get the latitude and longitude information:

lat = geo.getLatitude()
lon = geo.getLongitude()

coord = [lon, lat]
print(coord)
[-162.13, 5.87]

Data#

Let’s access the data contained in PaleoData and load them into Pandas DataFrames:

data_tables = []

for paleoData in ds.getPaleoData(): # loop over the various PaleoData objects
    for table in paleoData.getMeasurementTables(): #get the measurement tables
        df = table.getDataFrame(use_standard_names=True) # grab the data and standardize the variable names
        data_tables.append(df)
print("There are", len(data_tables), " tables in the dataset")
There are 2  tables in the dataset
data1 = data_tables[0]
data1.head()
   d18O     year
0  0.39  1998.21
1  0.35  1998.13
2  0.35  1998.04
3  0.35  1997.96
4  0.36  1997.88

Note that the basic information about the variables is stored in the attributes of the DataFrame. The dictionary key for each variable corresponds to the data header.

data_tables[0].attrs
{'d18O': {'@id': 'http://linked.earth/lipd/Ocn-Palmyra.Nurhati.2011.paleo2.measurementTable1.Ocean2kHR_162.d18O',
  'archiveType': 'Coral',
  'number': 1,
  'hasMaxValue': 1.26,
  'hasMeanValue': 0.7670059435,
  'hasMedianValue': 0.78,
  'hasMinValue': 0.07,
  'missingValue': 'NaN',
  'variableName': 'd18O',
  'notes': 'd18Osw (residuals calculated from coupled SrCa and d18O measurements)',
  'proxy': 'd18O',
  'resolution': {'@id': 'http://linked.earth/lipd/Ocn-Palmyra.Nurhati.2011.paleo2.measurementTable1.Ocean2kHR_162.d18O.Resolution',
   'hasMaxValue': 0.09,
   'hasMeanValue': 0.08333085502,
   'hasMedianValue': 0.08,
   'hasMinValue': 0.08,
   'units': 'yr AD'},
  'hasStandardVariable': 'd18O',
  'units': 'permil',
  'TSid': 'Ocean2kHR_162',
  'variableType': 'measured',
  'proxyObservationType': 'd18O',
  'measurementTableMD5': '3d028342178e079acb4366bfedf54a77',
  'sensorSpecies': 'lutea',
  'useInGlobalTemperatureAnalysis': False,
  'wDSPaleoUrl': 'https://www1.ncdc.noaa.gov/pub/data/paleo/pages2k/pages2k-temperature-v2-2017/data-version-2.0.0/Ocn-Palmyra.Nurhati.2011-2.txt',
  'sensorGenus': 'Porites'},
 'year': {'@id': 'http://linked.earth/lipd/Ocn-Palmyra.Nurhati.2011.paleo2.measurementTable1.PYTEBCDC4GO.year',
  'archiveType': 'Coral',
  'number': 2,
  'description': 'Year AD',
  'hasMaxValue': 1998.21,
  'hasMeanValue': 1942.168336,
  'hasMedianValue': 1942.17,
  'hasMinValue': 1886.13,
  'missingValue': 'NaN',
  'variableName': 'year',
  'resolution': {'@id': 'http://linked.earth/lipd/Ocn-Palmyra.Nurhati.2011.paleo2.measurementTable1.PYTEBCDC4GO.year.Resolution',
   'hasMaxValue': 0.09,
   'hasMeanValue': 0.08333085502,
   'hasMedianValue': 0.08,
   'hasMinValue': 0.08,
   'units': 'yr AD'},
  'hasStandardVariable': 'year',
  'units': 'yr AD',
  'TSid': 'PYTEBCDC4GO',
  'variableType': 'inferred',
  'wDSPaleoUrl': 'https://www1.ncdc.noaa.gov/pub/data/paleo/pages2k/pages2k-temperature-v2-2017/data-version-2.0.0/Ocn-Palmyra.Nurhati.2011-2.txt',
  'inferredVariableType': 'Year',
  'measurementTableMD5': '3d028342178e079acb4366bfedf54a77',
  'dataType': 'float'}}
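Since the attributes are plain Python dictionaries, you can pull out specific metadata directly. For example, the units and proxy recorded for the d18O column:

# Access selected metadata fields from the DataFrame attributes
d18O_meta = data1.attrs['d18O']
print(d18O_meta['units'])  # 'permil'
print(d18O_meta['proxy'])  # 'd18O'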

You can use Pyleoclim to plot the data and conduct further analyses:

ts = pyleo.Series(time = data1['year'], value = data1['d18O'],
                 time_name = 'year', time_unit = 'CE',
                 value_name = 'd18O', value_unit = 'per mil')

ts.plot()
Time axis values sorted in ascending order
(<Figure size 1000x400 with 1 Axes>,
 <Axes: xlabel='Time [years CE]', ylabel='d18O [per mil]'>)
[Figure: plot of d18O [per mil] against Time [years CE]]
You can use the attributes associated with each DataFrame to loop over several tables and create `pyleoclim.Series` objects.
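A minimal sketch of that loop, assuming each measurement table contains a 'year' column and using the attributes to label the value axis (the column and attribute names follow the tables shown above):

# Hedged sketch: build a pyleoclim.Series for each non-time variable in each table
series_list = []
for df in data_tables:
    for col in df.columns:
        if col == 'year':
            continue
        meta = df.attrs.get(col, {})  # metadata dictionary for this variable
        ts = pyleo.Series(time=df['year'], value=df[col],
                          time_name='year', time_unit='CE',
                          value_name=meta.get('variableName', col),
                          value_unit=meta.get('units', 'unknown'))
        series_list.append(ts)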

Understanding the relationships among classes#

As mentioned, the classes and methods present in the pylipd.classes module are derived from the LinkedEarth Ontology. You can always refer to it when in doubt. We also provide a handy diagram here illustrating the relationships:

[Diagram: relationships among the classes derived from the LinkedEarth Ontology]

Working with age ensembles#

For the purpose of this demonstration, let’s load the record from Lawrence et al. (2006), which contains an ensemble table, and convert it into a Dataset object:

lipd = LiPD()
lipd.load('../data/ODP846.Lawrence.2006.lpd')

ds_ens = lipd.get_datasets()[0]
Loading 1 LiPD files
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  1.36it/s]
Loaded..

Now let’s grab the ensemble data information:

df_ens = [] # create an empty list to store all ensemble tables across models. 

for cd in ds_ens.getChronData():
    for model in cd.getModeledBy():
        for etable in model.getEnsembleTables():
            df_ens.append(etable.getDataFrame())

Let’s have a look at the resulting DataFrame:

df_ens[0].head()
   depth                                                age
0   0.12  [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, ...
1   0.23  [9.03, 8.64, 8.64, 10.58, 7.09, 10.58, 6.71, 9...
2   0.33  [11.74, 11.35, 10.96, 12.12, 10.58, 11.74, 11....
3   0.43  [13.28, 13.28, 12.51, 13.67, 12.9, 13.28, 13.2...
4   0.53  [14.83, 14.83, 15.22, 15.6, 14.83, 15.6, 14.83...

As will be the case with all EnsembleTables, the resulting DataFrame contains two columns: (1) depth and (2) age. The possible age values for each depth are stored as a numpy vector.

The DataFrame also contains relevant metadata information stored as attributes:

df_ens[0].attrs
{'depth': {'@id': 'http://linked.earth/lipd/chron0model0ensemble0.PYTGOFY4KZD.depth',
  'number': 1,
  'variableName': 'depth',
  'hasStandardVariable': 'depth',
  'units': 'm',
  'TSid': 'PYTGOFY4KZD'},
 'age': {'@id': 'http://linked.earth/lipd/chron0model0ensemble0.PYTUHE3XLGQ.age',
  'number': '[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 
731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795, 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859, 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908, 909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972, 973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987, 988, 989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000, 1001]',
  'variableName': 'age',
  'hasStandardVariable': 'age',
  'TSid': 'PYTUHE3XLGQ'}}
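If you want to work with the ensemble numerically, one option is to stack the per-depth age vectors into a two-dimensional numpy array; a minimal sketch, assuming every depth has the same number of ensemble members (as is the case here):

import numpy as np

# Stack the age ensembles into an array of shape (n_depths, n_members)
ens = df_ens[0]
age_array = np.vstack(ens['age'].values)

# Summarize the ensemble, e.g., the median age at each depth
median_age = np.median(age_array, axis=1)
print(age_array.shape, median_age[:5])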