The Philosophy Behind PaleoPal
PaleoPal is an AI-powered assistant designed to accelerate paleoclimate research by combining retrieval-augmented generation (RAG), specialized AI agents, and a comprehensive vector knowledge base derived from notebooks, research papers, and ontologies.
Core Principles
- Domain-Specific Intelligence: PaleoPal understands the unique challenges of paleoclimate research, from SPARQL queries for LinkedEarth datasets to complex time series analysis workflows.
- Multi-Agent Architecture: Specialized agents handle different aspects of research:
- SPARQL Agent: Generates and refines queries for paleoclimate databases
- Code Agent: Writes Python code for data analysis and visualization
- Workflow Agent: Plans multi-step research workflows
- Knowledge-Driven: Built on a vector knowledge base that includes:
- Notebook snippets and workflows
- Research paper methods
- API documentation
- Ontology entities and relationships
- SPARQL query templates
- Transparency and Collaboration: Real-time progress visualization, clarification dialogues, and context-aware responses ensure researchers understand and control the AI's reasoning process.
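The multi-agent routing described above can be sketched as a tiny dispatcher. Note this is a toy illustration only: PaleoPal's real routing is LLM-driven, and the keyword rules and agent names below are assumptions made for demonstration.

```python
# Toy sketch of multi-agent routing: pick a specialized agent based on
# simple keyword heuristics. PaleoPal's actual routing is LLM-driven;
# these rules and agent names are illustrative assumptions.

def route_request(message: str) -> str:
    """Return the name of the agent best suited to handle a message."""
    text = message.lower()
    if "sparql" in text or "query" in text:
        return "sparql_agent"    # generates and refines SPARQL queries
    if "plot" in text or "code" in text or "analyze" in text:
        return "code_agent"      # writes Python analysis/visualization code
    return "workflow_agent"      # plans multi-step research workflows

print(route_request("Write a SPARQL query for LinkedEarth datasets"))
# -> sparql_agent
```

The point of the sketch is the division of labor, not the heuristics: each agent owns one aspect of the research workflow, and a router decides which one a request belongs to.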
Why PaleoPal?
Paleoclimate research involves complex, multi-step workflows that require deep domain knowledge. Traditional AI assistants lack the specialized understanding needed for tasks like:
- Querying LinkedEarth GraphDB with proper SPARQL syntax
- Understanding paleoclimate data structures and ontologies
- Generating code that follows established analysis patterns
- Planning workflows that integrate multiple data sources and methods
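As a rough illustration of the first task above, generating a SPARQL query for a paleoclimate database can be as simple as filling a template. The prefix URI, class, and property names below are assumptions for demonstration, not verified LinkedEarth identifiers.

```python
# Illustrative sketch of templating a SPARQL query for a paleoclimate
# database. The prefix URI and the le:Dataset / le:name /
# le:includesVariable identifiers are assumptions, not verified
# LinkedEarth ontology terms.

def build_dataset_query(variable: str, limit: int = 10) -> str:
    return f"""
PREFIX le: <http://linked.earth/ontology#>
SELECT ?dataset ?name WHERE {{
  ?dataset a le:Dataset ;
           le:name ?name ;
           le:includesVariable "{variable}" .
}} LIMIT {limit}
""".strip()

print(build_dataset_query("temperature"))
```

In practice the SPARQL Agent goes well beyond templating: it drafts a query, runs it, inspects the result, and refines the query until it matches the researcher's intent.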
PaleoPal bridges this gap by combining general-purpose AI capabilities with domain-specific knowledge retrieval, making it a true research partner rather than just a code generator.
Installation Instructions
Quick Start with Docker (Recommended)
The easiest way to get started with PaleoPal is using Docker Compose:
# Clone the repository
git clone https://github.com/yourusername/paleopal.git
cd paleopal
# Create environment file
cp backend/env.example backend/.env
# Edit backend/.env with your API keys:
# OPENAI_API_KEY=your_key_here
# ANTHROPIC_API_KEY=your_key_here
# GOOGLE_API_KEY=your_key_here
# XAI_API_KEY=your_key_here
# Start all services
docker compose up -d
# Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# Qdrant Dashboard: http://localhost:6333/dashboard
Local Development Setup
Prerequisites
- Python 3.11+
- Node.js 18+ and npm
- Docker (for Qdrant vector database)
Backend Setup
# Create virtual environment
cd backend
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Start Qdrant (Docker)
docker run -p 6333:6333 -p 6334:6334 \
-v qdrant-data:/qdrant/storage \
--name paleopal-qdrant qdrant/qdrant:v1.7.0
# Configure environment variables
cp env.example .env
# Edit .env with your API keys
# Launch API server
uvicorn main:app --reload
Frontend Setup
# Install dependencies
cd frontend
npm install
# Start development server
npm start
# Frontend will be available at http://localhost:3000
Populating the Knowledge Base
To enable semantic search, you need to populate Qdrant with embeddings:
# Set environment variables
export QDRANT_HOST=localhost
export QDRANT_PORT=6333
# Run the indexing script
cd backend/libraries
bash index_everything.sh
# This indexes:
# - SPARQL queries
# - Ontology entities
# - Notebook snippets and workflows
# - ReadTheDocs documentation
# - Literature methods
VS Code Extension
PaleoPal includes a VS Code extension for integrated research workflows:
- Open VS Code → Extensions view
- Click "..." menu → Install from VSIX
- Select vscode-extension/paleopal-vscode-extension-0.1.0.vsix
- Configure settings:
  - paleopal.backendUrl: http://localhost:8000/api
  - paleopal.defaultProvider: openai | anthropic | google | ollama | grok
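In your VS Code settings.json, the two settings above might look like this (the URL is the local-development default; pick any supported provider):

```json
{
  "paleopal.backendUrl": "http://localhost:8000/api",
  "paleopal.defaultProvider": "openai"
}
```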
System Requirements
- RAM: Minimum 4GB, recommended 8GB+
- Disk Space: 10GB+ for Docker volumes and dependencies
- API Keys: At least one LLM provider API key (OpenAI, Anthropic, Google, or xAI)
Frequently Asked Questions
What is PaleoPal?
PaleoPal is an AI-powered assistant specifically designed for paleoclimate research. It helps researchers by generating SPARQL queries, writing analysis code, and planning multi-step workflows using a knowledge base of notebooks, papers, and ontologies.
Do I need to know programming to use PaleoPal?
While PaleoPal can help generate code, some familiarity with Python and data analysis concepts is helpful. The system is designed to assist researchers who already work with paleoclimate data, making their workflows more efficient rather than replacing domain expertise.
Which LLM providers are supported?
PaleoPal supports multiple LLM providers:
- OpenAI (GPT-4o)
- Anthropic (Claude 3.5)
- Google (Gemini 2.5)
- xAI (Grok 3)
- Ollama (DeepSeek-R1, for local deployment)
How does the knowledge base work?
PaleoPal uses semantic search (via Qdrant vector database) to retrieve relevant context from:
- Notebook snippets and workflows
- Research paper methods
- API documentation
- Ontology entities and relationships
- SPARQL query templates
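Conceptually, retrieval works like the following toy example: documents and the query are embedded as vectors, and the most similar vectors win. The three-dimensional vectors and document names below are made up; real embeddings are high-dimensional model outputs, and the ranking at scale is done by Qdrant.

```python
import math

# Toy illustration of semantic search: rank stored items by cosine
# similarity between a query embedding and document embeddings.
# The 3-d vectors and titles are made up for demonstration; real
# embeddings come from a language model and live in Qdrant.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

knowledge_base = {
    "notebook: age-model calibration": [0.9, 0.1, 0.0],
    "paper: d18O interpretation":      [0.2, 0.8, 0.1],
    "sparql: list datasets":           [0.1, 0.2, 0.9],
}

query_vec = [0.85, 0.15, 0.05]  # pretend embedding of the user's question
ranked = sorted(knowledge_base,
                key=lambda k: cosine(query_vec, knowledge_base[k]),
                reverse=True)
print(ranked[0])  # -> notebook: age-model calibration
```

The retrieved snippets are then injected into the LLM prompt as context, which is what makes the responses domain-aware.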
Can I use PaleoPal without Docker?
Yes, you can run the backend and frontend locally, but you'll still need Qdrant running (either via Docker or a local installation). The Docker setup is recommended for easier deployment and data persistence.
How do I add my own notebooks to the knowledge base?
Place your Jupyter notebooks in backend/libraries/notebook_library/my_notebooks/ and run the indexing script:
cd backend/libraries
python notebook_library/index_notebooks.py --keep-invalid --no-synth-imports notebook_library/my_notebooks
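The indexing script above is the supported way to add notebooks. As a rough illustration of what its first step involves, code cells can be pulled out of a .ipynb file, which is plain JSON; the real indexer does much more (chunking, embedding, metadata extraction), so treat this as a simplification.

```python
import json

# Rough sketch of the first step of notebook indexing: extract the
# source of each code cell from a .ipynb file (plain JSON on disk).
# The actual index_notebooks.py script also chunks, embeds, and
# attaches metadata; this is a simplification for illustration.

def extract_code_cells(notebook_json: str) -> list[str]:
    nb = json.loads(notebook_json)
    return [
        "".join(cell.get("source", []))
        for cell in nb.get("cells", [])
        if cell.get("cell_type") == "code"
    ]

demo = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Load data\n"]},
        {"cell_type": "code", "source": ["import pyleoclim as pyleo\n"]},
    ]
})
print(extract_code_cells(demo))  # -> ['import pyleoclim as pyleo\n']
```

After re-running the indexing script, the new snippets become retrievable through the same semantic search as the built-in knowledge base.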
Is my data stored securely?
PaleoPal stores conversations and execution state locally (in SQLite) or in Docker volumes. Vector embeddings are stored in your local Qdrant instance. No data is sent to external services except the LLM providers you configure. Always review your API key permissions and data handling policies.
Can I use PaleoPal offline?
Partially. While you can run PaleoPal locally with Ollama for LLM inference, you'll still need internet access for:
- Initial setup and dependency installation
- Accessing LinkedEarth GraphDB (if using SPARQL queries)
- Downloading embedding models (first time only)
How do I report bugs or request features?
Please open an issue on the GitHub repository with:
- A clear description of the problem or feature request
- Steps to reproduce (for bugs)
- Your system configuration (OS, Python version, etc.)
Is PaleoPal free to use?
PaleoPal itself is open-source software. However, you'll need API keys from LLM providers, which may have usage costs depending on your provider and usage volume. Some providers offer free tiers for development and research.
Demos
Watch these videos to see PaleoPal in action:
Getting Started with PaleoPal
Video coming soon
Learn how to set up PaleoPal and start your first conversation
SPARQL Query Generation
Video coming soon
See how PaleoPal generates and refines SPARQL queries for LinkedEarth datasets
Code Generation for Analysis
Video coming soon
Watch PaleoPal generate Python code for paleoclimate data analysis
Workflow Planning
Video coming soon
Explore how PaleoPal plans multi-step research workflows
Adding Your Own Demo Videos
To add demo videos to this page, replace the video placeholders with embedded video iframes. For example:
<div class="demo-item">
<h3>Your Demo Title</h3>
<div class="video-container">
<iframe
src="https://www.youtube.com/embed/YOUR_VIDEO_ID"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen>
</iframe>
</div>
</div>