Developing data standards is a key goal of EarthCube, and doing so for paleoclimate was one of LinkedEarth’s core missions. Over the life of this project, we have come to realize that a data standard is really three things:
- a standard format for the data
- a standard terminology for metadata
- standard guidelines for reporting data (i.e. a reporting standard)
I like to think of this as the organization of library cards in an old-fashioned file cabinet (to give you an idea, this is the library I have in mind). For this card system to function, one needs: (1) a set of compartments and drawers to house the cards; (2) labels to identify and classify the books; and (3) disciplined adherence to the classification system. That is, one must include the essential information on each card and put cards back where they belong; otherwise, the classification falls apart.
We now have a standard digital format (LiPD) and a budding research ecosystem around it. We have also standardized the terminology via the LinkedEarth Ontology. And as of last month, we have the beginnings of a reporting standard: PaCTS.
PaCTS is the PaleoClimate reporTing Standard, a grassroots effort to standardize the data and metadata that would best ensure the long-term value of digitally archived paleoclimate datasets, past, present and future. We just submitted a paper describing the first version of the standard to Paleoceanography & Paleoclimatology, and are now turning the page and taking stock of lessons learned. To me, there are five major ones:
- There can never be enough community discussion. Although we used a mix of in-person and digital means (winding email threads, Skype calls, Twitter back-and-forths, and Working Group wiki pages), it is clear that other avenues could be beneficial. To that end, I hereby introduce the LinkedEarth forums, which we hope will facilitate more transparent community discussions in a central place. In the rest of this post, I link to specific threads to seed discussion, but other threads are of course welcome too.
- The paleo community really does care about standards. The main obstacle is engaging enough people to gather all the input needed to build a truly representative consensus. Ideas on how to broaden participation are welcome from everyone. See also “Who speaks for the trees?”
- The folks who did provide input to PaCTS v1.0 really do seem to want a whole lot more metadata, having voted most of the proposed categories “essential” and nearly all the rest “recommended”. However, when Deborah tried to apply the PaCTS v1.0 recommendations to a marine sediment dataset she had generated herself (at least in part) less than a decade ago, it became clear that many of the metadata properties were difficult to obtain, even in this ideal scenario. Imagine trying to do the same for a dataset you did not generate, having to ferret out the information from pre-digital publications, old notebooks, gray literature, and the like! This means that PaCTS v1.0 is, at best, an aspirational standard. How do we get from PaCTS v1.0 to a workable PaCTS?
- Both of these points relate to the question of defining a more inclusive, clear and interactive process by which to update and adopt this community standard. Here is a forum to discuss that.
- This brings us to the topic of incentives, without which adherence to the standard is likely to remain extremely limited. Given the reward structure of academia, I see two levers: funding agencies and publishers. This thread discusses how to push them both.
What is your take on all this? What do you see as the next steps for data standardization? We hope the forum will be a small step toward inclusive, transparent discussions, and that you will make use of it.