Featured Partnership: Cscience and LinkedEarth

Guest blog post by Liz Bradley of Cscience. Learn more about how AI can help you construct age models!

The critical first step in the analysis of paleoclimate records like ice or sediment cores is the construction of an age model, which relates the depth in a core to the calendar age of the material at that point. The reasoning involved in age-model construction is complex, subtle, and scientifically demanding because the processes that control the rate of material accumulation over time, and that affect the core between formation and sampling, are unknown. Geoscientists approach this problem by treating the core like a crime scene and asking the question: "What physical and chemical processes could have produced this situation, and what does that say about the timeline?" The sheer number of possibilities, however, coupled with the volume and complexity of the climatology data that are available nowadays, severely limits the scope of these investigations. Simply put, it is hard, and becoming much harder, to examine all of the potential relationships that may, or may not, be lurking in the data.

The goal of the Cscience project, which is underway at the University of Colorado/Boulder under the direction of computer scientists Liz Bradley and Ken Anderson and geoscientists Jim White and Tom Marchitto, is to build an automated reasoning tool that helps human experts navigate efficiently and effectively through this complex problem landscape. The underlying ideas draw on the fields of software engineering and artificial intelligence (AI).

The centerpiece of the Cscience project is an integrated open-source software system called CSciBox that makes it easy for non-programmers to compose scientific analysis workflows, to build, evaluate, and use age models, and to import and export data. CSciBox's toolset includes CALIB-style 14C calibrations, reservoir-age corrections, various kinds of interpolations (including BACON), ice-flow models, and layer-counting software. Below is a screenshot of CSciBox in action on a marine sediment core. The scientist has used CSciBox to build two different age models. In both cases, she began by using the IntCal 2013 calibration curve to calibrate the raw 14C data. To build the first age model (purple), she used CSciBox to fit a regression line to the results; for the other (aqua), she instructed CSciBox to run BACON.

[Image: CSciBox screenshot showing the two age models for a marine sediment core]
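For readers who have not built an age model before, the regression-line approach mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea, not CSciBox code: the function names are made up for this post, the dated samples are invented numbers, and a real analysis would start from calibrated 14C ages and carry their uncertainties along.

```python
# Illustrative only: NOT CSciBox code. All names and numbers are hypothetical.
# Each sample is (depth in cm, calibrated age in years BP).
dated_samples = [(10.0, 1200.0), (50.0, 2000.0), (90.0, 2800.0), (130.0, 3600.0)]


def fit_linear_age_model(samples):
    """Ordinary least-squares fit of age = slope * depth + intercept."""
    n = len(samples)
    mean_depth = sum(depth for depth, _ in samples) / n
    mean_age = sum(age for _, age in samples) / n
    # Covariance of depth and age, and variance of depth (both unnormalized).
    sxy = sum((depth - mean_depth) * (age - mean_age) for depth, age in samples)
    sxx = sum((depth - mean_depth) ** 2 for depth, _ in samples)
    slope = sxy / sxx                      # years per cm
    intercept = mean_age - slope * mean_depth
    return slope, intercept


def age_at_depth(depth, slope, intercept):
    """Evaluate the fitted age model at an arbitrary depth."""
    return slope * depth + intercept


slope, intercept = fit_linear_age_model(dated_samples)
estimated_age = age_at_depth(70.0, slope, intercept)
```

A straight line assumes a constant accumulation rate, which is exactly the kind of assumption that an expert (or an automated reasoner) needs to question before trusting the result.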

The AI engine wraps around that core, using automated reasoning techniques to explore the space of possible age models, evaluating each one and reporting upon its results. This amplifies the capabilities of the human expert, reducing the onerous repetitive part of the work of building age models and freeing him or her to focus on the science.

The CSciBox source code is available on GitHub, as are one-click installer versions for Mac OS X and Windows. Please visit the Cscience website for tutorials, citations, and links to the code, and do let us know if you have comments or suggestions about the system.

The technical challenges involved in automating scientific reasoning are substantial. Capturing the knowledge that human experts employ to solve problems is famously hard. We do not follow explicit, algorithm-like rules when we look at data; rather, we make judgements based on implicit rules: we "know it when we see it." This kind of reasoning is very hard to formalize. Just imagine coming up with a really crisp definition of a bump in a signal (a definition that you could program up for a computer to execute) and you'll see what I mean. Extracting and formalizing this kind of implicit knowledge requires extended collaborations between domain scientists and AI practitioners. That is the primary computer science research issue in this project.
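To make the "bump" example concrete, here is one attempt at a crisp definition, written as a small Python sketch. Everything in it is an assumption made for this illustration (the choice of the median as a baseline, the two thresholds, and the toy signal); it is not how CSciBox reasons about data. The point is that equally defensible parameter choices disagree about how many bumps there are.

```python
from statistics import median


def find_bumps(signal, min_height, min_width):
    """Return (start, end) index pairs for runs of samples that sit at least
    min_height above the median and last for at least min_width samples.
    This is one arbitrary definition of a 'bump' among many possible ones."""
    baseline = median(signal)
    bumps = []
    start = None
    for i, value in enumerate(signal):
        above = value - baseline >= min_height
        if above and start is None:
            start = i                       # a candidate bump begins here
        elif not above and start is not None:
            if i - start >= min_width:      # keep it only if it is wide enough
                bumps.append((start, i - 1))
            start = None
    # Handle a bump that runs to the end of the signal.
    if start is not None and len(signal) - start >= min_width:
        bumps.append((start, len(signal) - 1))
    return bumps


# Hypothetical toy signal, invented for this example.
signal = [0, 0, 0, 1, 3, 4, 3, 1, 0, 0, 0, 2, 0, 0, 0, 1, 1, 1, 0, 0]

strict = find_bumps(signal, min_height=2, min_width=2)
moderate = find_bumps(signal, min_height=1, min_width=2)
lenient = find_bumps(signal, min_height=1, min_width=1)
```

With these three settings the same signal contains one, two, or three bumps. A human expert would resolve the ambiguity using context that the code does not have, and that context is the knowledge that has to be captured.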

(As an aside, the field of AI is divided into two camps: statistical AI and symbolic AI. Statistical AI completely ducks the knowledge engineering problem by "learning" patterns from corpora of examples that have been labeled by human experts, typically with various kinds of sophisticated regressions. Statistical AIs work really, really well, but they can't explain their reasoning, which does not go over well at all with scientists and engineers. That's why we have taken the symbolic-AI route in our work.)

Why is this post on the LinkedEarth blog page? Because CSciBox uses LiPD, and it has benefited greatly as a result. Building LiPD capabilities into CSciBox's importer eliminated the need to ask the user about file formats, variable names, and semantic links between columns in the input file (e.g., the fact that the values in column C of a csv file were the uncertainties of the values in column B). Building LiPD capabilities into the exporter made it easy to store complete details about the input data (including descriptions of any preprocessing steps), along with full descriptions of any analyses that were performed using the CSciBox tool and citations to appropriate references, all linked in a semantically meaningful, machine-readable way to those data and methods. This kind of "metadata" (data about the data) is critical in scientific analysis software. Without LiPD, we would have had to come up with our own handcrafted export format for that metadata: a format that no other software tool would be able to understand.
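As a rough illustration of what a machine-readable "semantic link between columns" buys you, consider the toy Python sketch below. The field names (role, describes) and the column names are invented for this post and are deliberately not the LiPD schema; see the LiPD documentation for the real structure. The idea is simply that once the link is recorded in the metadata, software can look it up instead of asking the user.

```python
# Hypothetical column metadata, invented for illustration.
# This is NOT the LiPD schema; the keys below are assumptions.
columns = [
    {"name": "depth", "units": "cm", "role": "measurement"},
    {"name": "age14c", "units": "yr BP", "role": "measurement"},
    {"name": "age14c_sigma", "units": "yr", "role": "uncertainty",
     "describes": "age14c"},
]


def uncertainty_column_for(columns, name):
    """Return the name of the column that holds the uncertainties for the
    given column, or None if the metadata records no such link."""
    for column in columns:
        if column.get("role") == "uncertainty" and column.get("describes") == name:
            return column["name"]
    return None


sigma_for_age = uncertainty_column_for(columns, "age14c")
sigma_for_depth = uncertainty_column_for(columns, "depth")
```

Here the lookup finds the uncertainty column for the 14C ages and reports that none is recorded for depth, with no question put to the user in either case.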

All in all, LiPD has been a primary enabler of CSciBox's goals of usability, interoperability, and reproducibility. Not only can any software that understands LiPD share data and metadata seamlessly with CSciBox, and vice versa; equally importantly, the systematic, meaningful, machine-readable storage of metadata that is afforded by LiPD amounts to complete documentation of every CSciBox analysis.

Please visit the Cscience website for more information.

The Cscience tutorials are also available on YouTube.

If you'd like to do a guest post for LinkedEarth, please contact us!
