Custom queries on LiPD objects#

Authors#

Deborah Khider

Preamble#

PyLiPD is a Python package that allows you to read, manipulate, and write LiPD formatted datasets. While PyLiPD comes pre-loaded with certain functions to filter datasets and extract information (see other tutorials), in some cases one might want to design more specific queries. In this tutorial, we will demonstrate how to generate such custom queries using SPARQL.

Goals#

  • Learn how to build SPARQL queries

Reading Time: 15 minutes

Keywords#

LiPD, SPARQL

Pre-requisites#

Relevant Packages#

pylipd

Data Description#

This notebook uses the following datasets, in LiPD format:

  • McCabe-Glynn, S., Johnson, K., Strong, C. et al. Variable North Pacific influence on drought in southwestern North America since AD 854. Nature Geosci 6, 617–621 (2013). https://doi.org/10.1038/ngeo1862

  • Lawrence, K. T., Liu, Z. H., & Herbert, T. D. (2006). Evolution of the eastern tropical Pacific through Plio-Pleistocene glaciation. Science, 312(5770), 79-83.

  • PAGES2k Consortium., Emile-Geay, J., McKay, N. et al. A global multiproxy database for temperature reconstructions of the Common Era. Sci Data 4, 170088 (2017). doi:10.1038/sdata.2017.88

Demonstration#

Understanding the LiPD object#

Let’s start by importing our favorite package and loading our datasets.

from pylipd.lipd import LiPD
path = '../data/Pages2k/'

D = LiPD()
D.load_from_dir(path)
Loading 16 LiPD files
100%|██████████| 16/16 [00:00<00:00, 42.46it/s]
Loaded..

By now, you should be familiar with our load functionalities. But to truly appreciate this tutorial, and why everything behaves the way it does, you need to understand a little bit about what’s happening under the hood.

  • The first step is to expand the dataset stored in LiPD format. Remember that LiPD essentially consists of data tables stored in csv format and a JSON-LD file that contains the metadata.

  • The second step is to map the JSON file to RDF using our ontology as the schema.

  • The third (and unique) step is to load the data contained in the csv files directly into the graph. This is quite unique to LinkedEarth: if you work with other knowledge bases, you’ll find that most of the time the actual data are stored in another format that you will need to learn to handle in conjunction with the metadata.

Also, each dataset gets its own graph! So you can think of the LiPD object as a collection of graphs, each of which represents a particular dataset.

Let’s have a close look at our object, shall we?

D.copy().__dict__
{'graph': <Graph identifier=N7e673dc520984009819be5b1e4bed605 (<class 'rdflib.graph.ConjunctiveGraph'>)>}

In particular, our graph object is a ConjunctiveGraph from the rdflib library. Why should you care?

  1. If you isolate the graph, then you can use the rdflib package directly. Since this is a well-maintained community package with a larger user base than PyLiPD, chances are that Stack Overflow or the rdflib documentation will provide you with answers on how to work with your graph.

  2. Why a ConjunctiveGraph? Although each dataset is stored in its own graph, we often want to query across datasets (i.e., across graphs), and this particular rdflib object allows us to do so, as the sketch below shows.
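Because each dataset sits in its own named graph, nothing stops you from grabbing the graph and working with rdflib directly. Below is a minimal sketch, assuming the graph attribute shown in the __dict__ output further down (the loop variables are just illustrative names):

g = D.copy().graph  # the underlying rdflib ConjunctiveGraph

# Each dataset lives in its own named graph; list their identifiers
for ctx in g.contexts():
    print(ctx.identifier)

# Query with rdflib directly, bypassing PyLiPD's wrapper
res = g.query("""
PREFIX le:<http://linked.earth/ontology#>
SELECT ?dsname WHERE { ?ds a le:Dataset . ?ds le:hasName ?dsname . }
""")
for row in res:
    print(row.dsname)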

Let’s try to create our own queries from this.

Constructing a query#

Let’s return to the query described in this primer, in which we try to gather all the dataset names. Let’s modify it to also return the variable ?ds so you can see the difference between the object and its name:

query ="""

PREFIX le:<http://linked.earth/ontology#>

SELECT ?ds ?dsname WHERE{

?ds a le:Dataset .
?ds le:hasName ?dsname .}

"""

Notice that this is just a string containing the query in the SPARQL language. Now, let’s run it:

res_sparql, res_df = D.query(query)

Which returns two objects:

  • res_sparql, which is a SPARQLResult object from rdflib

  • res_df, which converts the SPARQLResult into a pandas DataFrame, which is often what you will be most interested in. Let’s have a look at it:

res_df.head()
ds dsname
0 http://linked.earth/lipd/Eur-NorthernSpain.Mar... Eur-NorthernSpain.Martin-Chivelet.2011
1 http://linked.earth/lipd/Eur-NorthernScandinav... Eur-NorthernScandinavia.Esper.2012
2 http://linked.earth/lipd/Eur-Stockholm.Leijonh... Eur-Stockholm.Leijonhufvud.2009
3 http://linked.earth/lipd/Eur-LakeSilvaplana.Tr... Eur-LakeSilvaplana.Trachsel.2010
4 http://linked.earth/lipd/Eur-SpanishPyrenees.D... Eur-SpanishPyrenees.Dorado-Linan.2012

First, notice that the columns of the dataframe correspond to the names of the variables that you specified in the SPARQL query. You can choose any name you want, as long as you use a question mark at the beginning to indicate to SPARQL that it is a variable whose value the database should fill in. Second, notice the difference between ds and dsname. ds is the dataset object, and the query returns the URI of that object. dsname is a string containing the name of the dataset. In general, we form our URIs from the names of the objects, so don’t be surprised if you sometimes end up with the URI instead of the name (we all sometimes forget to ask for the name).
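Since the URIs are formed from the dataset names, you can always recover a readable label from ?ds with a little pandas string handling. A convenience sketch (ds_label is just an illustrative column name):

# Strip the URI down to its last path segment to get a readable label
res_df['ds_label'] = res_df['ds'].astype(str).str.split('/').str[-1]
res_df[['ds_label', 'dsname']].head()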

One thing to consider is the speed of the query. If you’re doing exploratory work with SPARQL, you may wish to return only a few results to check that you’re on the right track. In this case, the number of datasets is quite small:

len(res_df.index)
16

But if you have thousands of rows in that dataframe, you might not want to wait for the query to complete, as this may take a few minutes. If you only want to return the first 10 rows, just add a LIMIT clause to the query:

query ="""

PREFIX le:<http://linked.earth/ontology#>

SELECT ?ds ?dsname WHERE{

?ds a le:Dataset .
?ds le:hasName ?dsname .}

LIMIT 10

"""

res, res_df = D.query(query)

len(res_df.index)
10
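SPARQL also supports OFFSET alongside LIMIT, which lets you page through a large result set in chunks. A sketch (strictly speaking, pagination is only well-defined if you also add an ORDER BY):

query ="""

PREFIX le:<http://linked.earth/ontology#>

SELECT ?ds ?dsname WHERE{

?ds a le:Dataset .
?ds le:hasName ?dsname .}

LIMIT 10
OFFSET 10

"""

res, res_df = D.query(query)  # skips the first 10 rows; only 6 of our 16 remain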

Please note that the number of rows doesn’t necessarily correspond to the number of datasets. For instance, if a dataset contains several variables (as most do) and I ask for each variable in a dataset, then each variable gets its own row.
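When in doubt, count distinct dataset URIs rather than rows. A one-line pandas check on the res_df from above:

# Rows are not datasets: count distinct dataset URIs instead
res_df['ds'].nunique()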

Optional variables#

Let’s pop one of the PAGES2k datasets into a new LiPD object, D2, and add the ODP846 dataset for the purpose of this demonstration. Why ODP846? The file was created before LiPD had datasetIDs, so I know this dataset doesn’t have one.

D.get_all_dataset_names()
['Eur-NorthernSpain.Martin-Chivelet.2011',
 'Eur-NorthernScandinavia.Esper.2012',
 'Eur-Stockholm.Leijonhufvud.2009',
 'Eur-LakeSilvaplana.Trachsel.2010',
 'Eur-SpanishPyrenees.Dorado-Linan.2012',
 'Arc-Kongressvatnet.D_Andrea.2012',
 'Eur-CoastofPortugal.Abrantes.2011',
 'Ocn-PedradeLume-CapeVerdeIslands.Moses.2006',
 'Ocn-FeniDrift.Richter.2009',
 'Ocn-SinaiPeninsula_RedSea.Moustafa.2000',
 'Ant-WAIS-Divide.Severinghaus.2012',
 'Asi-SourthAndMiddleUrals.Demezhko.2007',
 'Ocn-AlboranSea436B.Nieto-Moreno.2013',
 'Eur-SpannagelCave.Mangini.2005',
 'Ocn-RedSea.Felis.2000',
 'Eur-FinnishLakelands.Helama.2014']
D2 = D.pop('Ocn-RedSea.Felis.2000')
D2.load('../data/ODP846.Lawrence.2006.lpd')
D2.get_all_dataset_names()
Loading 1 LiPD files
100%|██████████| 1/1 [00:00<00:00,  1.12it/s]
Loaded..

['Ocn-RedSea.Felis.2000', 'ODP846.Lawrence.2006']

Let’s perform a similar query, but asking for the datasetID in addition to returning the dataset name:

query ="""

PREFIX le:<http://linked.earth/ontology#>

SELECT ?ds ?dsname ?datasetID WHERE{

?ds a le:Dataset .
?ds le:hasName ?dsname .
?ds le:hasDatasetId ?datasetID}

"""

res, res_df = D2.query(query)

res_df
ds dsname datasetID
0 http://linked.earth/lipd/Ocn-RedSea.Felis.2000 Ocn-RedSea.Felis.2000 4fZQAHmeuJn8ipLfurWv

As you can see, only the PAGES2k dataset was returned by this query. The ODP846 dataset was omitted because it doesn’t have a datasetID. If you want to make this pattern optional, all you need to do is wrap it in an OPTIONAL clause:

query ="""

PREFIX le:<http://linked.earth/ontology#>

SELECT ?ds ?dsname ?datasetID WHERE{

?ds a le:Dataset .
?ds le:hasName ?dsname .
OPTIONAL{?ds le:hasDatasetId ?datasetID.}}

"""

res, res_df = D2.query(query)

res_df
ds dsname datasetID
0 http://linked.earth/lipd/Ocn-RedSea.Felis.2000 Ocn-RedSea.Felis.2000 4fZQAHmeuJn8ipLfurWv
1 http://linked.earth/lipd/ODP846.Lawrence.2006 ODP846.Lawrence.2006 None

In this case, you at least get the dataset name. When to use OPTIONAL is really up to you. If the value of a specific property is critical to your analysis, leaving the pattern required filters out the datasets that lack it; if the piece of information is not critical, use OPTIONAL so those datasets are still returned.
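If you went the OPTIONAL route and want to separate the two cases after the fact, pandas makes that easy. A sketch (with_id and without_id are just illustrative names):

# Split the results on whether the OPTIONAL value was found
with_id = res_df.dropna(subset=['datasetID'])
without_id = res_df[res_df['datasetID'].isna()]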

Querying for predicates#

Constructing these queries begs the question: do I need to know the ontology? To some extent, yes. You would need to know the schema of any database to be able to query it. However, you do not need to know it in excruciating detail.

First, as you may have surmised, most of the PyLiPD functionalities designed to return information about the datasets are essentially wrappers around these queries. So if you’re interested, you can actually have a look at queries.py within the PyLiPD package to get some inspiration.

Second, you can query all the predicates attached to an object directly. Let’s do so, shall we?

query ="""

PREFIX le:<http://linked.earth/ontology#>

SELECT DISTINCT ?p WHERE {

?ds a le:Dataset .
?ds ?p ?o .
}
"""

Let’s break that query down. The SELECT line contains the DISTINCT directive, which returns only unique predicates. The last triple pattern, ?ds ?p ?o, makes the predicate itself a variable that we are querying.

_,df = D2.query(query)

df
p
0 http://linked.earth/ontology#hasPaleoData
1 http://linked.earth/ontology#inCompilation3_
2 http://linked.earth/ontology#minYear
3 http://linked.earth/ontology#createdBy
4 http://linked.earth/ontology#hasPublication
5 http://linked.earth/ontology#lipdVersion
6 http://linked.earth/ontology#googleMetadataWor...
7 http://linked.earth/ontology#hasSpreadsheetLink
8 http://linked.earth/ontology#hasName
9 http://linked.earth/ontology#studyName
10 http://linked.earth/ontology#googleDataURL
11 http://linked.earth/ontology#hasContributor
12 http://linked.earth/ontology#hasArchiveType
13 http://linked.earth/ontology#hasLocation
14 http://linked.earth/ontology#inCompilation1_
15 http://linked.earth/ontology#hasDatasetId
16 http://linked.earth/ontology#hasOriginalDataUrl
17 http://linked.earth/ontology#hasChangeLog
18 http://linked.earth/ontology#hasUrl
19 http://www.w3.org/1999/02/22-rdf-syntax-ns#type
20 http://linked.earth/ontology#maxYear
21 http://linked.earth/ontology#inCompilation2_
22 http://linked.earth/ontology#hasInvestigator
23 http://linked.earth/ontology#lipdComplete
24 http://linked.earth/ontology#hasChronData
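You can push this discovery trick one hop further down the graph. For instance, here is a sketch that lists the predicates attached to publication objects, reusing le:hasPublication from the table above (the next section applies the same pattern to le:hasLocation):

query ="""

PREFIX le:<http://linked.earth/ontology#>

SELECT DISTINCT ?p WHERE {

?ds a le:Dataset .
?ds le:hasPublication ?pub .
?pub ?p ?o .
}
"""

_,df = D2.query(query)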

Geoquery#

Let’s try a geospatial query on our original database. From the list of properties above, hasLocation links to the coordinates. So let’s have a look at this object:

query ="""

PREFIX le:<http://linked.earth/ontology#>

SELECT DISTINCT ?p WHERE {

?ds a le:Dataset .
?ds le:hasLocation ?geo .
?geo ?p ?o .
}
"""
_,df = D.query(query)

df
p
0 http://www.w3.org/2003/01/geo/wgs84_pos#lat
1 http://linked.earth/ontology#region
2 http://linked.earth/ontology#pages2kRegion
3 http://linked.earth/ontology#coordinatesFor
4 http://www.w3.org/1999/02/22-rdf-syntax-ns#type
5 http://www.w3.org/2003/01/geo/wgs84_pos#long
6 http://linked.earth/ontology#coordinates
7 http://www.w3.org/2003/01/geo/wgs84_pos#alt
8 http://linked.earth/ontology#hasSiteName
9 http://linked.earth/ontology#hasOcean

From these properties, lat, long, and alt look promising. Notice that the URI is different: these are not predicates from the ontology created by LinkedEarth, but from another ontology with a different base URI, so we should declare it with its own PREFIX.

query ="""

PREFIX le:<http://linked.earth/ontology#>
PREFIX wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT ?dsname ?lat ?lon  WHERE {

?ds a le:Dataset .
?ds le:hasName ?dsname .

?ds le:hasLocation ?geo .
OPTIONAL{?geo wgs84:lat ?lat .
        ?geo wgs84:long ?lon .}

}
"""

Let’s have a look at our new query:

  • Notice the new prefix since we are reusing the wgs84 ontology.

  • The SELECT line defines the variables that we wish to return in our query, namely the dataset name, latitude, and longitude.

  • The last block concerns our geographical query. Note that these lines are made OPTIONAL in case a file doesn’t contain geographical coordinates. This is unlikely to be the case, but it is a quick way to surface errors in our database or in your LiPD files.

_,df = D.query(query)

df
dsname lat lon
0 Eur-NorthernSpain.Martin-Chivelet.2011 42.9000 -3.5000
1 Eur-NorthernScandinavia.Esper.2012 68.0000 25.0000
2 Eur-Stockholm.Leijonhufvud.2009 59.3200 18.0600
3 Eur-LakeSilvaplana.Trachsel.2010 46.5000 9.8000
4 Eur-SpanishPyrenees.Dorado-Linan.2012 42.5000 1.0000
5 Arc-Kongressvatnet.D'Andrea.2012 78.0217 13.9311
6 Eur-CoastofPortugal.Abrantes.2011 41.1000 -8.9000
7 Ocn-PedradeLume-CapeVerdeIslands.Moses.2006 16.7600 -22.8883
8 Ocn-FeniDrift.Richter.2009 55.5000 -13.9000
9 Ocn-SinaiPeninsula,RedSea.Moustafa.2000 27.8483 34.3100
10 Ant-WAIS-Divide.Severinghaus.2012 -79.4630 -112.1250
11 Asi-SourthAndMiddleUrals.Demezhko.2007 55.0000 59.5000
12 Ocn-AlboranSea436B.Nieto-Moreno.2013 36.2053 -4.3133
13 Eur-SpannagelCave.Mangini.2005 47.1000 11.6000
14 Eur-FinnishLakelands.Helama.2014 62.0000 28.3250

Let’s take this query a little bit further: let’s say I only want to keep the data between 40°N and 70°N and between 0°E and 30°E. Welcome to the FILTER functionality.

Your query is going to look much the same with the added FILTER:

query ="""

PREFIX le:<http://linked.earth/ontology#>
PREFIX wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT ?dsname ?lat ?lon  WHERE {

?ds a le:Dataset .
?ds le:hasName ?dsname .

?ds le:hasLocation ?geo .
?geo wgs84:lat ?lat .
?geo wgs84:long ?lon .

FILTER ( ?lat >= 40 && ?lat < 70 && ?lon >= 0 && ?lon < 30 ) .

}
"""

I also removed the OPTIONAL clause since we are doing a geospatial query: if a dataset doesn’t have lat/lon, then it shouldn’t be included in the results anyway.

_,df = D.query(query)

df
dsname lat lon
0 Eur-NorthernScandinavia.Esper.2012 68.00 25.000
1 Eur-Stockholm.Leijonhufvud.2009 59.32 18.060
2 Eur-LakeSilvaplana.Trachsel.2010 46.50 9.800
3 Eur-SpanishPyrenees.Dorado-Linan.2012 42.50 1.000
4 Eur-SpannagelCave.Mangini.2005 47.10 11.600
5 Eur-FinnishLakelands.Helama.2014 62.00 28.325

Congratulations! You’ve performed your first filter! In case you are wondering, this exact query is used in the function filter_by_geo_box.
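For comparison, you could have obtained the same subset by running the earlier unfiltered lat/lon query and filtering the DataFrame in pandas; pushing the FILTER into SPARQL simply avoids pulling rows you would immediately discard. A hypothetical equivalent, where df_all stands in for the unfiltered result:

# Pandas equivalent of the SPARQL FILTER; cast first in case the
# coordinates come back as strings rather than floats
df_all = df_all.astype({'lat': float, 'lon': float})
box = df_all[(df_all.lat >= 40) & (df_all.lat < 70)
             & (df_all.lon >= 0) & (df_all.lon < 30)]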

Filtering by name#

You must now be dying to know how you can look for specific variable names and filter by them. In the interest of time, I’ll create the query that returns all unique variable names first (this is what’s behind get_all_variable_names), but I encourage you to spend some time following it line by line, and even to use the predicate search shown above to see how I got the names of the properties.

query ="""

PREFIX le:<http://linked.earth/ontology#>

SELECT DISTINCT ?varname WHERE{

?ds a le:Dataset .

?ds le:hasPaleoData ?data .
?data le:hasMeasurementTable ?table .
?table le:hasVariable ?var .
?var le:hasName ?varname .
}
"""
_,df = D.query(query)

df
varname
0 year
1 d18O
2 MXD
3 temperature
4 trsgi
5 Uk37
6 depth_bottom
7 Mg_Ca
8 depth_top
9 notes
10 uncertainty_temperature

This returned all the unique variable names present across all the PaleoData measurement tables in the LiPD object. This is the power of SPARQL: you do not need to know how many tables/columns/data are present in the dataset to query across them.
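As a sanity check, this hand-written query should agree with the built-in wrapper it mirrors (a sketch using the get_all_variable_names method named above):

# The custom query and the wrapper should return the same set of names
set(df['varname']) == set(D.get_all_variable_names())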

Let’s filter by temperature. In this case, I really do mean just temperature, ignoring the uncertainty.

query ="""

PREFIX le:<http://linked.earth/ontology#>

SELECT ?dsname ?varname WHERE{

?ds a le:Dataset .
?ds le:hasName ?dsname .

?ds le:hasPaleoData ?data .
?data le:hasMeasurementTable ?table .
?table le:hasVariable ?var .
?var le:hasName ?varname .

FILTER (?varname = 'temperature')

}
"""
_,df = D.query(query)

df
dsname varname
0 Eur-Stockholm.Leijonhufvud.2009 temperature
1 Eur-LakeSilvaplana.Trachsel.2010 temperature
2 Arc-Kongressvatnet.D'Andrea.2012 temperature
3 Eur-CoastofPortugal.Abrantes.2011 temperature
4 Ocn-FeniDrift.Richter.2009 temperature
5 Ocn-FeniDrift.Richter.2009 temperature
6 Ant-WAIS-Divide.Severinghaus.2012 temperature
7 Asi-SourthAndMiddleUrals.Demezhko.2007 temperature
8 Ocn-AlboranSea436B.Nieto-Moreno.2013 temperature
9 Eur-FinnishLakelands.Helama.2014 temperature

If you want to relax this assumption, you have two choices:

  1. Explicitly go look for uncertainty_temperature

  2. Use regular expressions. Although as you may discover, you now have two problems: SPARQL and regex.

Let’s have a look at the first option:

query ="""

PREFIX le:<http://linked.earth/ontology#>

SELECT ?dsname ?varname WHERE{

?ds a le:Dataset .
?ds le:hasName ?dsname .

?ds le:hasPaleoData ?data .
?data le:hasMeasurementTable ?table .
?table le:hasVariable ?var .
?var le:hasName ?varname .

FILTER (?varname = 'temperature' || ?varname = 'uncertainty_temperature')

}
"""

Much like other languages, SPARQL uses || to indicate a logical “or” and && for a logical “and”.

_,df = D.query(query)

df
dsname varname
0 Eur-Stockholm.Leijonhufvud.2009 temperature
1 Eur-LakeSilvaplana.Trachsel.2010 temperature
2 Arc-Kongressvatnet.D'Andrea.2012 temperature
3 Eur-CoastofPortugal.Abrantes.2011 temperature
4 Ocn-FeniDrift.Richter.2009 temperature
5 Ocn-FeniDrift.Richter.2009 temperature
6 Ant-WAIS-Divide.Severinghaus.2012 temperature
7 Ant-WAIS-Divide.Severinghaus.2012 uncertainty_temperature
8 Asi-SourthAndMiddleUrals.Demezhko.2007 temperature
9 Ocn-AlboranSea436B.Nieto-Moreno.2013 temperature
10 Eur-FinnishLakelands.Helama.2014 temperature

Using regular expressions:

query ="""

PREFIX le:<http://linked.earth/ontology#>

SELECT ?dsname ?varname WHERE{

?ds a le:Dataset .
?ds le:hasName ?dsname .

?ds le:hasPaleoData ?data .
?data le:hasMeasurementTable ?table .
?table le:hasVariable ?var .
?var le:hasName ?varname .

FILTER regex (?varname, "temperature.*", "i" ) .

}
"""

_,df = D.query(query)

df
dsname varname
0 Eur-Stockholm.Leijonhufvud.2009 temperature
1 Eur-LakeSilvaplana.Trachsel.2010 temperature
2 Arc-Kongressvatnet.D'Andrea.2012 temperature
3 Eur-CoastofPortugal.Abrantes.2011 temperature
4 Ocn-FeniDrift.Richter.2009 temperature
5 Ocn-FeniDrift.Richter.2009 temperature
6 Ant-WAIS-Divide.Severinghaus.2012 temperature
7 Ant-WAIS-Divide.Severinghaus.2012 uncertainty_temperature
8 Asi-SourthAndMiddleUrals.Demezhko.2007 temperature
9 Ocn-AlboranSea436B.Nieto-Moreno.2013 temperature
10 Eur-FinnishLakelands.Helama.2014 temperature
As you have seen in previous tutorials, PyLiPD natively uses regex to query for variables.
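Note that the unanchored pattern "temperature.*" matches anywhere in the string, which is why uncertainty_temperature came back too. If you want an exact (but still case-insensitive) match with regex, anchor the pattern with ^ and $; only the FILTER line changes:

query ="""

PREFIX le:<http://linked.earth/ontology#>

SELECT ?dsname ?varname WHERE{

?ds a le:Dataset .
?ds le:hasName ?dsname .

?ds le:hasPaleoData ?data .
?data le:hasMeasurementTable ?table .
?table le:hasVariable ?var .
?var le:hasName ?varname .

FILTER regex (?varname, "^temperature$", "i") .

}
"""

_,df = D.query(query)  # now returns only exact 'temperature' matches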

Concluding remarks#

  • We are expanding PyLiPD to include more and more of the common queries in paleoclimatology. If you’d like to see one added, please open an issue. Make sure you consult the open issues before opening your own, and upvote an issue if it already exists. If an issue is upvoted, it is more likely to be slated for quick resolution.

Below are some FAQs, with their answers:

  • Can I create my own properties in a LiPD file and search with PyLiPD? The short answer depends on whether you are creating a data property (a property whose value is a literal such as a string, float, int, or boolean) or an object property. Data properties will immediately be recognized by PyLiPD and can be used. For object properties, please contact us, as this will change the underlying classes in the ontology.

  • Can you create APIs for my properties? The short answer is no, unless this is a property that will be re-used extensively by others. As you navigate datasets, you will notice that some properties are essentially used as flags for analysis. Such a property may only serve one person and will therefore not be integrated at the API level. However, if you cannot create your own query using this tutorial, please start a discussion on Discourse.