Welcome to PaleoHack - Python introduction workshop

These short modules will walk you through an introduction to Python and the scientific Python stack; namely, NumPy, Pandas, Matplotlib, and Cartopy. These packages form the foundation upon which Pyleoclim is built. The last two modules cover the use of GitHub and FAIR principles.

After these modules, you will be able to use Python more effectively for your research and work with Pyleoclim more efficiently. Remember that Pyleoclim is built upon these tools, so learning to make a plot in Matplotlib, for instance, will give you a better idea of how Pyleoclim works under the hood and how you can modify your plots.

This page runs on a python3 kernel.

Module 1: Introduction to Python

In this module, you will learn how to write basic Python code. Make sure that you go through the material provided by Project Pythia before attempting the exercises. Note that building the Binder environment may take some time the first time you launch it. Also, do not navigate away from a code block (e.g., by opening another exercise), as this will stop its execution.

Module 2: Introduction to Jupyter

This module will teach you how to use Jupyter Notebook/JupyterLab, which we will use for the remainder of the hackathon. There are no LinkedEarth-specific exercises for this module.

Module 3: Introduction to NumPy

This module will teach you about NumPy. NumPy arrays are the most basic data structure for Pyleoclim, underlying all the other packages. Virtually all operations made on the data involve NumPy arrays, so it is important to understand a bit about how they work.
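As a small illustration of why arrays matter, the sketch below shows the two NumPy habits that come up most often when handling a paleoclimate record: element-wise (vectorized) arithmetic and boolean masking. The `age` and `proxy` values are made-up toy numbers, not a real dataset.

```python
import numpy as np

# Hypothetical toy record: ages in years BP and one proxy value per age.
# These numbers are made up purely for illustration.
age = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
proxy = np.array([1.2, 0.8, 1.5, 1.1, 0.9])

# Arithmetic on arrays is element-wise ("vectorized"): no Python loop needed.
anomaly = proxy - proxy.mean()

# A boolean mask selects only the elements that satisfy a condition.
recent = age < 250
recent_ages = age[recent]
recent_proxy = proxy[recent]

print(anomaly)
print(recent_ages, recent_proxy)
```

Vectorized operations like these are both faster and shorter than explicit loops, which is why array-based code is the norm throughout the scientific Python stack.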

Module 4: Introduction to Pandas

This module will teach you how to use Pandas. Some functionalities in Pyleoclim are supported by Pandas (and we will add more in the coming year). For now, Pandas represents the easiest way to get non-LiPD formatted data into Python for use with Pyleoclim. One of the main reasons we do not rely more heavily on Pandas is that most of its advanced capabilities require a datetime format incompatible with paleoclimate data.
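The sketch below illustrates both points under simple assumptions: it reads a small CSV into a DataFrame and pulls out plain NumPy arrays, then prints the range of pandas' default nanosecond-resolution `Timestamp`. The CSV text and its column names (`age_bp`, `d18O`) are hypothetical stand-ins for a real non-LiPD file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content standing in for a non-LiPD data file; the column
# names and values are made up for illustration.
csv_text = """age_bp,d18O
0,-3.1
500,-3.4
1000,-3.0
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Extract plain NumPy arrays from the DataFrame for downstream analysis.
time = df["age_bp"].to_numpy()
value = df["d18O"].to_numpy()

# The default nanosecond-resolution Timestamp only spans roughly 1677 to
# 2262 CE, far too narrow for most paleoclimate time scales.
print(pd.Timestamp.min, pd.Timestamp.max)
```

With a real file, you would pass its path to `pd.read_csv` instead of the `io.StringIO` wrapper used here to keep the sketch self-contained.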

Module 5: Introduction to Matplotlib

This module will teach you how to use Matplotlib, a comprehensive library for creating static, animated, and interactive visualizations in Python. The module is structured around the visualization of two contemporaneous monthly time series, the NINO3 SST dataset and All India Rainfall, to explore potential connections between them.
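One common way to compare two series with different units over the same months is a shared x-axis with two y-axes. The sketch below assumes that approach; the two series are synthetic placeholders, NOT the actual NINO3 or All India Rainfall data, and the module itself may build its figures differently.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Synthetic placeholder series (NOT real NINO3 or rainfall data): 20 years of
# monthly values on a decimal-year time axis.
time = 1900 + np.arange(240) / 12.0
sst_like = np.sin(2 * np.pi * time / 4.0)
rain_like = -0.5 * np.sin(2 * np.pi * time / 4.0) + 0.1 * np.cos(2 * np.pi * time)

fig, ax1 = plt.subplots(figsize=(8, 3))
ax1.plot(time, sst_like, color="tab:red")
ax1.set_xlabel("Year")
ax1.set_ylabel("SST anomaly (placeholder)", color="tab:red")

# twinx() adds a second y-axis that shares the x-axis of ax1, so both series
# are drawn against the same months with independent vertical scales.
ax2 = ax1.twinx()
ax2.plot(time, rain_like, color="tab:blue")
ax2.set_ylabel("Rainfall anomaly (placeholder)", color="tab:blue")

fig.tight_layout()
```

In an interactive notebook you would typically drop the `matplotlib.use("Agg")` line and let the figure display inline.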

Module 6: Introduction to Cartopy

This module will teach you about Cartopy, a mapping package widely used in the geosciences, including by Pyleoclim.

Module 7: Introduction to GitHub

This module will teach you about version control and how to use Git and GitHub.

Module 8: Development best-practices including FAIR data principles

This module will teach you about different aspects of data workflows, including handling filenames, and best practices such as testing, continuous integration, and licensing.

References and Resources

This section contains reusable scripts and links to additional references and topics to explore.

About this course

This is a free, open-source course on how to use data science tools such as Jupyter Notebooks, relevant Python libraries, and GitHub, along with information about making your science FAIR. It's made possible by a long and fruitful collaboration with NSF and EarthCube. Contributions and comments on how to improve the course are welcome! To file an issue, go to: https://github.com/LinkedEarth/ec_workshops_py/issues

About me

For nearly a decade, the EarthCube community has been transforming the conduct of geosciences research by developing and maintaining a well-connected and facile environment that improves access, sharing, visualization, and analysis of data and related resources. While sharable tools, methods, and cyberinfrastructure have been critical achievements for EarthCube, we find that our dedicated community is what makes our program successful. LinkedEarth builds upon EarthCube's success, specifically targeting the paleogeosciences community.