My last task as a PhD student at USC was to generate a record of Holocene sea surface and deep ocean temperature and salinity variability within the Indo-Pacific Warm Pool. As a geochemist, I spent hours in the lab picking foraminifera, cleaning them, and performing isotope and trace metal analyses.
I didn’t get to publish this work before I left for a postdoc at UT Austin. In retrospect, this became more of a blessing than a curse. Although my PhD had focused on generating these time series, my postdoc with Charles Jackson allowed me to explore the data analysis aspect of the field. Using this one record, we asked the following question: was solar variability responsible for the observed periodic millennial cycles given the large time uncertainties inherent to paleoceanographic reconstructions?
The conclusion of this study was: “we cannot entirely discount the solar hypothesis.” Refuting, or failing to refute, the hypothesis would require many more records than the one we had. Several Holocene records had already been published and we had already written the code to process the data. So why not do it?
Easier said than done. Everyone involved in synthesis work knows how frustrating looking for records can be. Furthermore, once the series are identified, they come in disparate formats that are not easily readable by a computer, and the code we had written was tailored to my record.
The hours spent in the lab picking the foraminifera, cleaning them, and performing isotope and trace metal analyses would have to be replaced by hours spent in front of a computer picking appropriate time series based on several selection criteria (e.g., covering the Holocene with a resolution better than 200 years), cleaning the available records so they can be read by a computer, and finally performing the analysis.
Charles and I started working on a proposal in 2013 and had allocated about 80% of the first-year salaries on the budget to the first two tasks: identifying and cleaning the records. As Julien mentioned in his first post, the thought of PhD-level scientists spending up to 80% of their time and money on a federal budget doing a job a computer can do should make everyone pause.
It did make us pause. We never submitted this proposal, thinking it would never get funded if the main activity (i.e., the one taking much of the time and budget) amounted to gathering data that had already been generated.
Having spent time in a lab, I appreciate the time it takes to select the proper shells for analysis, remove contaminants, and run the various instruments. I wouldn’t conceive of giving this task to a machine. As a human being living in the world of Siri, however, I can’t wrap my head around spending 80% of my time opening files one by one to check for basic metadata: whether the raw radiocarbon data had been archived or whether the record met my resolution criteria. Furthermore, the basic search I performed to filter the thousands of records present in the databases may or may not have found all the records that I knew existed from the literature (I learned in this process that searching for Mg/Ca vs MgCa can return astonishingly different results).
When Siri came out in 2011, it revolutionized the way people interacted with their phone. Most of the functionalities of the iPhone became accessible with a simple verbal query. Nowadays, I use Siri for navigation, opening apps, making calls, sending text messages, or asking for a vegetarian restaurant within a one-mile radius – which comes in handy when traveling to conferences.
So where is my Siri for paleo? Closer than you may think. LinkedEarth can deliver on its promise to reduce the time spent looking for records. I gave a demo of the search capabilities at AGU in December and the LinkedEarth team is working on offering a GUI-based approach.
However, although I had been an early Siri adopter, I didn’t truly appreciate the behind-the-scenes work needed to make it a reality. Besides the technological aspect of the app, the concept rests on the notion that the data it sifts through are formatted in a standard fashion. For instance, iOS is capable of downloading local public transit information. Such a simple task would not be possible without a standard way of expressing public transit information.
What is a standard anyway? EarthCube defines it as “a public specification documenting some practice or technology that is adopted and used by a community. [..] There is a continuum starting with any documented practice in some community. If lots of people use a particular documented practice it could be adopted as a best practice. If almost everyone uses some documented practice, then it is a de facto standard.”
Notice the emphasis on community and on practice. If only one person uses a technical specification, it’s not a standard. If it’s voted on but not applied in practice, it’s not much use either. LinkedEarth can help with the implementation phase. But we need the community to decide on the specification and to adopt the standard. So where should we start with specifying a standard? Like everything else we do, it is motivated by science questions.
Let’s go back to my original goal: investigating millennial-scale variability in the Holocene. Expanding the database to include Holocene records of sea surface temperature variability would require the following queries: (1) finding records that span the Holocene; (2) finding the subset of those that primarily reflect sea surface temperature; and (3) finding the subset of that subset with a resolution finer than 200 years. To do that, you need the following metadata:
| Query | Metadata needed |
| --- | --- |
| Spanning the Holocene (0-10 ka) | Concept of age (time); min and max values of the series |
| Recording sea surface temperature | Concept of SST (as an inferred variable) and/or Mg/Ca, Uk37, and TEX86 (as the measured variables) |
| With a resolution finer than 200 years | Concept of resolution |
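To make the three queries concrete, here is a minimal Python sketch of how they could be run once the metadata exist in a structured form. Everything in it is an assumption made for illustration: the field names (`age_yr_bp`, `measured_variable`, `inferred_variable`), the toy records, and the choice of median sample spacing as the definition of "resolution" are hypothetical and are not the LinkedEarth schema or API.

```python
import re
import statistics

# Measured variables treated as SST proxies, stored in normalized form.
SST_PROXIES = {"mgca", "uk37", "tex86"}

def normalize(name):
    """Lowercase and drop punctuation so 'Mg/Ca' and 'MgCa' compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def median_resolution(ages):
    """Median spacing (in years) between consecutive samples."""
    ordered = sorted(ages)
    return statistics.median(b - a for a, b in zip(ordered, ordered[1:]))

def matches(record, youngest=0.0, oldest=10000.0, max_resolution=200.0):
    """True if the record spans the interval, records SST, and is fine enough."""
    ages = record["age_yr_bp"]  # hypothetical field name
    spans = min(ages) <= youngest and max(ages) >= oldest
    is_sst = (
        normalize(record.get("inferred_variable", "")) == "sst"
        or normalize(record.get("measured_variable", "")) in SST_PROXIES
    )
    return spans and is_sst and median_resolution(ages) < max_resolution

# Toy records, invented for this example only.
records = [
    {"name": "core-A", "measured_variable": "Mg/Ca", "age_yr_bp": list(range(0, 10001, 100))},
    {"name": "core-B", "measured_variable": "MgCa", "age_yr_bp": list(range(0, 10001, 100))},
    {"name": "core-C", "measured_variable": "Uk'37", "age_yr_bp": list(range(0, 10001, 500))},
    {"name": "core-D", "measured_variable": "TEX86", "age_yr_bp": list(range(0, 5001, 50))},
    {"name": "core-E", "measured_variable": "Ba/Ca", "age_yr_bp": list(range(0, 10001, 100))},
]

selected = [r["name"] for r in records if matches(r)]
print(selected)
```

Only the first two toy records pass all three filters: core-C is too coarse, core-D stops at 5 ka, and core-E does not carry an SST proxy. Normalizing variable names before comparing them is one way to avoid the Mg/Ca versus MgCa mismatch; a community standard would make that step unnecessary.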
Other types of basic queries include: searching for a particular publication, using either the DOI, title, journal, authors; and searching by archive type. The latter is currently used on the LinkedEarth wiki to obtain the maps under each archive category (for example, marine sediments).
A standard not only helps us with the menial task of searching for records in a database. It can also assist with doing the science we want to do in the first place. Going back to my current study on Holocene millennial-scale variability, answering the question “Can solar variability account for the observed sea surface temperature cycles?” requires us to:
- Make sure that the periodicities found in these records are robust in the face of age uncertainties. To do so, we used the Bchron probability model to obtain ensemble age models and applied the Lomb-Scargle periodogram for unevenly-spaced time series to each member of the age model ensembles.
- Use cross-wavelet analysis to determine the phasing between solar and climate variability.
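The first step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in the study: the proxy series is synthetic (a 1000-year sine plus noise), and the age ensemble is produced by adding Gaussian jitter to the ages as a crude stand-in for a Bchron ensemble. The periodogram is the classical Lomb-Scargle formula written out by hand so the example has no dependencies. The cross-wavelet step is not covered here.

```python
import math
import random
import statistics

def lomb_scargle_power(times, values, period):
    """Classical Lomb-Scargle power at one trial period (uneven sampling)."""
    omega = 2.0 * math.pi / period
    mean = statistics.fmean(values)
    y = [v - mean for v in values]
    # The offset tau makes the sine and cosine terms orthogonal.
    s2 = sum(math.sin(2.0 * omega * t) for t in times)
    c2 = sum(math.cos(2.0 * omega * t) for t in times)
    tau = math.atan2(s2, c2) / (2.0 * omega)
    cos_t = [math.cos(omega * (t - tau)) for t in times]
    sin_t = [math.sin(omega * (t - tau)) for t in times]
    yc = sum(a * b for a, b in zip(y, cos_t))
    ys = sum(a * b for a, b in zip(y, sin_t))
    cc = sum(c * c for c in cos_t)
    ss = sum(s * s for s in sin_t)
    return 0.5 * (yc * yc / cc + ys * ys / ss)

def peak_period(times, values, trial_periods):
    """Trial period with the highest Lomb-Scargle power."""
    return max(trial_periods, key=lambda p: lomb_scargle_power(times, values, p))

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Synthetic, unevenly sampled record with a 1000-year cycle (assumption).
true_ages = sorted(rng.uniform(0.0, 10000.0) for _ in range(120))
proxy = [math.sin(2.0 * math.pi * t / 1000.0) + rng.gauss(0.0, 0.2) for t in true_ages]

trial_periods = list(range(500, 3001, 50))  # years

# Stand-in for an age-model ensemble: jitter each age by ~50 years (assumption).
ensemble_peaks = []
for _ in range(20):
    member_ages = [t + rng.gauss(0.0, 50.0) for t in true_ages]
    ensemble_peaks.append(peak_period(member_ages, proxy, trial_periods))

median_peak = statistics.median(ensemble_peaks)
print("median peak period across ensemble:", median_peak)
```

A periodicity is considered robust in this sense when the peak stays in roughly the same place across ensemble members. With real age models the uncertainties are larger and correlated with depth, so the spread of peaks across members is the quantity to inspect.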
Collectively, 11 records from the Indo-Pacific Warm Pool suggest that these periodicities are in fact robust when considering all records together, despite large uncertainties in individual records. On the other hand, these 11 records are too few to confirm or refute the solar forcing hypothesis. We would need many more records of Holocene climate variability to do so. LinkedEarth offers the platform and the tools (see here for some details) to do that; all we need to move forward are community standards. Once those are in place, we (and you!) will be able to add more records and analyze them with a flick of a coding wand. Indeed, having highly structured data allows us to write efficient, cutting-edge tools, which we are already sharing with the community: GeoChronR for age-uncertain series, Pyleoclim for a lot of other good things, and more Matlab code than you would ever want to stare at.
This is how a standard should be conceived: by having a discussion on what the community needs in order to do the most rigorous and exciting science. The discussion has already started at the Workshop on Paleoclimate Data Standards and is now continuing on the wiki within the various working groups. Your voice needs to be heard! Please join a group today and start writing about the queries you want to do.
Welcome to the future,