The Philosophy Behind PaleoPal
PaleoPal is an AI-powered assistant designed to accelerate paleoclimate research by combining retrieval-augmented generation (RAG), specialized AI agents, and a comprehensive vector knowledge base derived from notebooks, research papers, and ontologies.
Core Principles
- Domain-Specific Intelligence: PaleoPal understands the unique challenges of paleoclimate research, from SPARQL queries for LinkedEarth datasets to complex time series analysis workflows.
- Multi-Agent Architecture: Specialized agents handle different aspects of research:
- SPARQL Agent: Generates and refines queries for paleoclimate databases
- Code Agent: Writes Python code for data analysis and visualization
- Workflow Agent: Plans multi-step research workflows
- Knowledge-Driven: Built on a vector knowledge base that includes:
- Notebook snippets and workflows
- Research paper methods
- API documentation
- Ontology entities and relationships
- SPARQL query templates
- Transparency and Collaboration: Real-time progress visualization, clarification dialogues, and context-aware responses ensure researchers understand and control the AI's reasoning process.
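The multi-agent routing described above can be sketched as a tiny dispatcher. Note this is a toy illustration only: PaleoPal's real routing is LLM-driven, and the keyword rules and agent names below are assumptions made for demonstration.

```python
# Toy sketch of multi-agent routing: pick a specialized agent based on
# simple keyword heuristics. PaleoPal's actual routing is LLM-driven;
# these rules and agent names are illustrative assumptions.

def route_request(message: str) -> str:
    """Return the name of the agent best suited to handle a message."""
    text = message.lower()
    if "sparql" in text or "query" in text:
        return "sparql_agent"    # generates and refines SPARQL queries
    if "plot" in text or "code" in text or "analyze" in text:
        return "code_agent"      # writes Python analysis/visualization code
    return "workflow_agent"      # plans multi-step research workflows

print(route_request("Write a SPARQL query for LinkedEarth datasets"))
# -> sparql_agent
```

The point of the sketch is the division of labor, not the heuristics: each agent owns one aspect of the research workflow, and a router decides which one a request belongs to.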
Why PaleoPal?
Paleoclimate research involves complex, multi-step workflows that require deep domain knowledge. Traditional AI assistants lack the specialized understanding needed for tasks like:
- Querying LinkedEarth GraphDB with proper SPARQL syntax
- Understanding paleoclimate data structures and ontologies
- Generating code that follows established analysis patterns
- Planning workflows that integrate multiple data sources and methods
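As a rough illustration of the first task above, generating a SPARQL query for a paleoclimate database can be as simple as filling a template. The prefix URI, class, and property names below are assumptions for demonstration, not verified LinkedEarth identifiers.

```python
# Illustrative sketch of templating a SPARQL query for a paleoclimate
# database. The prefix URI and the le:Dataset / le:name /
# le:includesVariable identifiers are assumptions, not verified
# LinkedEarth ontology terms.

def build_dataset_query(variable: str, limit: int = 10) -> str:
    return f"""
PREFIX le: <http://linked.earth/ontology#>
SELECT ?dataset ?name WHERE {{
  ?dataset a le:Dataset ;
           le:name ?name ;
           le:includesVariable "{variable}" .
}} LIMIT {limit}
""".strip()

print(build_dataset_query("temperature"))
```

In practice the SPARQL Agent goes well beyond templating: it drafts a query, runs it, inspects the result, and refines the query until it matches the researcher's intent.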
PaleoPal bridges this gap by combining general-purpose AI capabilities with domain-specific knowledge retrieval, making it a true research partner rather than just a code generator.
Installation Instructions
Quick Start with Docker (Recommended)
The easiest way to get started with PaleoPal is using Docker Compose:
# Clone the repository
git clone https://github.com/yourusername/paleopal.git
cd paleopal
# Create environment file
cp backend/env.example backend/.env
# Edit backend/.env with your API keys:
# OPENAI_API_KEY=your_key_here
# ANTHROPIC_API_KEY=your_key_here
# GOOGLE_API_KEY=your_key_here
# XAI_API_KEY=your_key_here
# Start all services
docker compose up -d
# Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# Qdrant Dashboard: http://localhost:6333/dashboard
Local Development Setup
Prerequisites
- Python 3.11+
- Node.js 18+ and npm
- Docker (for Qdrant vector database)
Backend Setup
# Create virtual environment
cd backend
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Start Qdrant (Docker)
docker run -p 6333:6333 -p 6334:6334 \
-v qdrant-data:/qdrant/storage \
--name paleopal-qdrant qdrant/qdrant:v1.7.0
# Configure environment variables
cp env.example .env
# Edit .env with your API keys
# Launch API server
uvicorn main:app --reload
Frontend Setup
# Install dependencies
cd frontend
npm install
# Start development server
npm start
# Frontend will be available at http://localhost:3000
Populating the Knowledge Base
To enable semantic search, you need to populate Qdrant with embeddings:
# Set environment variables
export QDRANT_HOST=localhost
export QDRANT_PORT=6333
# Run the indexing script
cd backend/libraries
bash index_everything.sh
# This indexes:
# - SPARQL queries
# - Ontology entities
# - Notebook snippets and workflows
# - ReadTheDocs documentation
# - Literature methods
VS Code Extension
PaleoPal includes a VS Code extension for integrated research workflows:
- Open VS Code → Extensions view
- Click "..." menu → Install from VSIX
- Select vscode-extension/paleopal-vscode-extension-0.1.0.vsix
- Configure settings:
  - paleopal.backendUrl: http://localhost:8000/api
  - paleopal.defaultProvider: openai | anthropic | google | ollama | grok
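In your VS Code settings.json, the two settings above might look like this (the URL is the local-development default; pick any supported provider):

```json
{
  "paleopal.backendUrl": "http://localhost:8000/api",
  "paleopal.defaultProvider": "openai"
}
```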
System Requirements
- RAM: Minimum 4GB, recommended 8GB+
- Disk Space: 10GB+ for Docker volumes and dependencies
- API Keys: At least one LLM provider API key (OpenAI, Anthropic, Google, or xAI)
Frequently Asked Questions
What is PaleoPal?
PaleoPal is an AI-powered assistant specifically designed for paleoclimate research. It helps researchers by generating SPARQL queries, writing analysis code, and planning multi-step workflows using a knowledge base of notebooks, papers, and ontologies.
Do I need to know programming to use PaleoPal?
While PaleoPal can help generate code, some familiarity with Python and data analysis concepts is helpful. The system is designed to assist researchers who already work with paleoclimate data, making their workflows more efficient rather than replacing domain expertise.
Which LLM providers are supported?
PaleoPal supports multiple LLM providers:
- OpenAI (GPT-4o)
- Anthropic (Claude 3.5)
- Google (Gemini 2.5)
- xAI (Grok 3)
- Ollama (DeepSeek-R1, for local deployment)
How does the knowledge base work?
PaleoPal uses semantic search (via Qdrant vector database) to retrieve relevant context from:
- Notebook snippets and workflows
- Research paper methods
- API documentation
- Ontology entities and relationships
- SPARQL query templates
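Conceptually, retrieval works like the following toy example: documents and the query are embedded as vectors, and the most similar vectors win. The three-dimensional vectors and document names below are made up; real embeddings are high-dimensional model outputs, and the ranking at scale is done by Qdrant.

```python
import math

# Toy illustration of semantic search: rank stored items by cosine
# similarity between a query embedding and document embeddings.
# The 3-d vectors and titles are made up for demonstration; real
# embeddings come from a language model and live in Qdrant.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

knowledge_base = {
    "notebook: age-model calibration": [0.9, 0.1, 0.0],
    "paper: d18O interpretation":      [0.2, 0.8, 0.1],
    "sparql: list datasets":           [0.1, 0.2, 0.9],
}

query_vec = [0.85, 0.15, 0.05]  # pretend embedding of the user's question
ranked = sorted(knowledge_base,
                key=lambda k: cosine(query_vec, knowledge_base[k]),
                reverse=True)
print(ranked[0])  # -> notebook: age-model calibration
```

The retrieved snippets are then injected into the LLM prompt as context, which is what makes the responses domain-aware.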
Can I use PaleoPal without Docker?
Yes, you can run the backend and frontend locally, but you'll still need Qdrant running (either via Docker or a local installation). The Docker setup is recommended for easier deployment and data persistence.
How do I add my own notebooks to the knowledge base?
Place your Jupyter notebooks in backend/libraries/notebook_library/my_notebooks/ and run the indexing script:
cd backend/libraries
python notebook_library/index_notebooks.py --keep-invalid --no-synth-imports notebook_library/my_notebooks
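The indexing script above is the supported way to add notebooks. As a rough illustration of what its first step involves, code cells can be pulled out of a .ipynb file, which is plain JSON; the real indexer does much more (chunking, embedding, metadata extraction), so treat this as a simplification.

```python
import json

# Rough sketch of the first step of notebook indexing: extract the
# source of each code cell from a .ipynb file (plain JSON on disk).
# The actual index_notebooks.py script also chunks, embeds, and
# attaches metadata; this is a simplification for illustration.

def extract_code_cells(notebook_json: str) -> list[str]:
    nb = json.loads(notebook_json)
    return [
        "".join(cell.get("source", []))
        for cell in nb.get("cells", [])
        if cell.get("cell_type") == "code"
    ]

demo = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Load data\n"]},
        {"cell_type": "code", "source": ["import pyleoclim as pyleo\n"]},
    ]
})
print(extract_code_cells(demo))  # -> ['import pyleoclim as pyleo\n']
```

After re-running the indexing script, the new snippets become retrievable through the same semantic search as the built-in knowledge base.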
Is my data stored securely?
PaleoPal stores conversations and execution state locally (in SQLite) or in Docker volumes. Vector embeddings are stored in your local Qdrant instance. No data is sent to external services except the LLM providers you configure. Always review your API key permissions and data handling policies.
Can I use PaleoPal offline?
Partially. While you can run PaleoPal locally with Ollama for LLM inference, you'll still need internet access for:
- Initial setup and dependency installation
- Accessing LinkedEarth GraphDB (if using SPARQL queries)
- Downloading embedding models (first time only)
How do I report bugs or request features?
Please open an issue on the GitHub repository with:
- A clear description of the problem or feature request
- Steps to reproduce (for bugs)
- Your system configuration (OS, Python version, etc.)
Is PaleoPal free to use?
PaleoPal itself is open-source software. However, you'll need API keys from LLM providers, which may have usage costs depending on your provider and usage volume. Some providers offer free tiers for development and research.
Demos
Watch these videos to see PaleoPal in action:
Getting Started with PaleoPal
Video coming soon
Learn how to set up PaleoPal and start your first conversation
SPARQL Query Generation
Video coming soon
See how PaleoPal generates and refines SPARQL queries for LinkedEarth datasets
Code Generation for Analysis
Video coming soon
Watch PaleoPal generate Python code for paleoclimate data analysis
Workflow Planning
Video coming soon
Explore how PaleoPal plans multi-step research workflows
Adding Your Own Demo Videos
To add demo videos to this page, replace the video placeholders with embedded video iframes. For example:
<div class="demo-item">
<h3>Your Demo Title</h3>
<div class="video-container">
<iframe
src="https://www.youtube.com/embed/YOUR_VIDEO_ID"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen>
</iframe>
</div>
</div>