Optional Dependencies Guide
This guide covers optional software components that enhance Lobster AI with specialized capabilities. None of these are required for basic functionality, but they unlock advanced features for specific use cases.
Table of Contents
- Overview
- PyMOL - Protein Structure Visualization
- Docling - Advanced PDF Parsing
- Semantic Search - Ontology Matching
- System Libraries by Platform
- Testing Optional Dependencies
Overview
Lobster AI works out-of-the-box for most bioinformatics workflows. Optional dependencies add capabilities for specialized analyses:
| Dependency | Purpose | When Needed | Installation Effort |
|---|---|---|---|
| PyMOL | 3D protein structure visualization | Protein structure analysis, linking to omics data | Medium (macOS/Linux), High (Windows) |
| Docling | Advanced PDF parsing | Extracting methods from complex publications | Low (pip install) |
| Semantic Search | Ontology term matching via vector embeddings | Disease, tissue, cell type standardization | Low (pip install) |
| System Libraries | Compilation support | Native installation on Linux | Low (apt/dnf install) |
Installation Strategy:
- Start with core Lobster AI installation
- Add optional dependencies as needed for your analyses
- Use Docker if optional dependencies are difficult to install natively
PyMOL - Protein Structure Visualization
What is PyMOL?
PyMOL is an industry-standard molecular visualization system for displaying and analyzing 3D protein structures. Lobster AI integrates with PyMOL to:
- Fetch protein structures from PDB database
- Visualize structures with customizable styles
- Highlight specific residues or mutations
- Link protein structures to omics data (e.g., RNA-seq expression levels)
- Generate publication-quality structure images
Version Required: PyMOL 2.4+
When Do You Need PyMOL?
PyMOL is optional but recommended if you:
- Analyze protein-coding genes and want to visualize their 3D structures
- Study mutations and their structural context
- Need to link transcriptomics/proteomics data to protein structure
- Create figures showing protein structure for publications
Without PyMOL, Lobster AI can still:
- Perform all RNA-seq and proteomics analyses
- Download sequence data
- Run differential expression and enrichment
- Fetch protein structure metadata
Installation
macOS
Option 1: Automated (Recommended)
cd lobster
make install-pymol
Option 2: Homebrew
# Install from the brewsci/bio tap (Homebrew adds the tap automatically)
brew install brewsci/bio/pymol
# Verify installation
pymol -c -Q
which pymol
Option 3: Open-Source Build
# Install dependencies
brew install glew glm freetype libpng python@3.12
# Build from source (advanced)
git clone https://github.com/schrodinger/pymol-open-source.git
cd pymol-open-source
python setup.py build install
Linux (Ubuntu/Debian)
Option 1: Package Manager (Easy)
sudo apt-get update
sudo apt-get install pymol
# Verify
which pymol
pymol --version
Option 2: Conda/Mamba
# If you use conda/mamba environments
conda install -c conda-forge pymol-open-source
# Or with mamba (faster)
mamba install -c conda-forge pymol-open-source
Option 3: Build from Source
# Install dependencies
sudo apt-get install build-essential python3-dev \
libglew-dev libpng-dev libfreetype6-dev \
libxml2-dev libmsgpack-dev python3-pyqt5.qtopengl
# Clone and build
git clone https://github.com/schrodinger/pymol-open-source.git
cd pymol-open-source
python setup.py build install --prefix=$HOME/.local
Linux (Fedora/RHEL/CentOS)
# Enable EPEL repository (if needed)
sudo dnf install epel-release
# Install PyMOL
sudo dnf install pymol
# Verify
pymol --version
Windows
⚠️ PyMOL installation on Windows is complex. We recommend using:
Option 1: Docker (Recommended)
- PyMOL is pre-installed in Lobster Docker images
- No manual setup needed
- Run:
docker-compose run --rm lobster-cli
Option 2: Windows Subsystem for Linux (WSL)
- Install WSL 2 with Ubuntu
- Follow Linux installation instructions above
- Requires X11 server (VcXsrv or Xming) for GUI
Option 3: Commercial PyMOL
- Purchase from pymol.org
- Windows installer included
- Educational licenses available
Option 4: Conda (Windows)
# Install Miniconda if not already installed
# Download from: https://docs.conda.io/en/latest/miniconda.html
# Create environment with PyMOL
conda create -n pymol-env python=3.12
conda activate pymol-env
conda install -c conda-forge pymol-open-source
Verification
After installation, verify PyMOL works:
# Test command-line mode
pymol -c -Q -d "fetch 1AKE; quit"
# Check version
pymol --version
# Test from Lobster
lobster chat
🦞 You: /status
# Should show: "PyMOL: Available (version X.X.X)"
Usage in Lobster
Once installed, PyMOL integrates seamlessly:
# Start Lobster
lobster chat
# Fetch and visualize protein structure
🦞 You: "Fetch protein structure 1AKE"
🦞 You: "Visualize 1AKE with cartoon representation"
# Link to omics data
🦞 You: "Show expression levels of ADK gene on 1AKE structure"
# Advanced styling
🦞 You: "Highlight residues 50-100 on 1AKE structure"
🦞 You: "Color 1AKE by conservation score"
Troubleshooting PyMOL
Issue: pymol: command not found
Solutions:
# Check if installed
which pymol
dpkg -l | grep pymol # Ubuntu/Debian
rpm -qa | grep pymol # Fedora/RHEL
# Add to PATH (if installed but not found)
export PATH=$PATH:$HOME/.local/bin
echo 'export PATH=$PATH:$HOME/.local/bin' >> ~/.bashrc
# Reinstall
sudo apt-get install --reinstall pymol
Issue: ImportError: No module named pymol
Solutions:
- Ensure virtual environment is activated
- PyMOL must be installed in same Python environment as Lobster
- Try installing via conda in the same environment
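To confirm which interpreter is active and whether it can see PyMOL, a small diagnostic along these lines may help. This is a sketch using only the standard library; the helper name `describe_environment` is illustrative and is not part of Lobster.

```python
import importlib.util
import sys


def describe_environment(module_name: str = "pymol") -> dict:
    """Report the active interpreter and whether it can locate a module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        # True inside a venv/virtualenv; conda environments report False here.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "module": module_name,
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter that runs Lobster. If `module_found` is False there but True elsewhere, PyMOL is probably installed in a different environment.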
Issue: Graphics/OpenGL errors
Solutions:
# Test command-line mode (no GUI)
pymol -c -Q
# On remote servers, use headless mode
export DISPLAY=:0
Xvfb :0 -screen 0 1024x768x24 &
See Protein Structure Visualization Guide for complete usage details.
Docling - Advanced PDF Parsing
What is Docling?
Docling is a professional PDF parsing library that excels at extracting structured content from scientific publications. It provides:
- >90% accuracy for Methods section detection (vs 30% with basic parsers)
- Table extraction from complex multi-column layouts
- Formula recognition and LaTeX conversion
- Figure caption extraction with context
- Multi-language support for international publications
Version Required: Docling 2.60+
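One way to check the minimum version programmatically is sketched below. It assumes Docling was installed with pip, so its version is visible to `importlib.metadata`. The helper names are illustrative. Versions are compared numerically because a plain string comparison would rank "2.9" above "2.60".

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn '2.60.1' into (2, 60, 1), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so '2.9' sorts below '2.60'."""
    return parse_version(installed) >= parse_version(minimum)


def docling_is_recent_enough(minimum: str = "2.60") -> bool:
    """Return False when Docling is missing or older than the minimum."""
    try:
        return meets_minimum(version("docling"), minimum)
    except PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("Docling 2.60+ available:", docling_is_recent_enough())
```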
When Do You Need Docling?
Docling is optional but highly recommended if you:
- Frequently extract analysis parameters from publications
- Work with complex, multi-column scientific PDFs
- Need to extract tables or figures programmatically
- Analyze methods from large sets of papers
Without Docling, Lobster AI falls back to PyPDF2:
- Basic text extraction works
- Methods section detection ~30% accurate
- No table or formula extraction
- Simple single-column PDFs work fine
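The fallback described above can be pictured roughly as follows. This is a minimal sketch of the prefer-Docling, fall-back-to-PyPDF2 pattern, not Lobster's actual implementation. The function names `pick_pdf_backend` and `extract_text` are hypothetical; the Docling and PyPDF2 calls are those libraries' public entry points.

```python
import importlib.util
from typing import Optional


def pick_pdf_backend() -> Optional[str]:
    """Prefer Docling; fall back to PyPDF2; None if neither is installed."""
    if importlib.util.find_spec("docling") is not None:
        return "docling"
    if importlib.util.find_spec("PyPDF2") is not None:
        return "pypdf2"
    return None


def extract_text(pdf_path: str) -> str:
    """Extract text with the best available backend (illustrative only)."""
    backend = pick_pdf_backend()
    if backend == "docling":
        from docling.document_converter import DocumentConverter

        result = DocumentConverter().convert(pdf_path)
        # Markdown export keeps headings and tables where Docling finds them.
        return result.document.export_to_markdown()
    if backend == "pypdf2":
        from PyPDF2 import PdfReader

        reader = PdfReader(pdf_path)
        # extract_text() can return nothing for image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    raise RuntimeError("No PDF backend installed; try: pip install docling")


if __name__ == "__main__":
    print("Selected PDF backend:", pick_pdf_backend())
```

The imports are deferred into each branch so the module loads even when only one of the two libraries is present.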
Installation
Docling is a Python package, easy to install:
Basic Installation:
# Activate Lobster virtual environment
source .venv/bin/activate
# Install Docling
pip install docling
Full Installation (All Features):
# With all optional features
pip install "docling[all]"
# With specific features
pip install "docling[table]" # Table extraction
pip install "docling[ocr]" # OCR support
Docker: Docling is pre-installed in Lobster Docker images - no additional setup needed.
Verification
# Test import
python -c "from docling.document_converter import DocumentConverter; print('✅ Docling installed')"
# Check version
python -c "from importlib.metadata import version; print(version('docling'))"
# Test in Lobster
lobster chat
🦞 You: /status
# Should show: "Docling: Available (version X.X.X)"
Usage in Lobster
Docling works automatically when installed:
lobster chat
# Extract methods from publication
🦞 You: "Extract methods from PMID:38448586"
# With Docling: Returns detailed Methods section, parameters, tables
# Without Docling: Returns basic text extraction
# Read full publication
🦞 You: "Read full text of PMID:35042229"
# Extract from local PDF
🦞 You: "Extract methods from paper.pdf in my workspace"
Troubleshooting Docling
Issue: Import errors after installation
Solutions:
# Ensure in correct environment
source .venv/bin/activate
# Reinstall
pip uninstall docling
pip install --no-cache-dir "docling[all]"
# Check dependencies
pip list | grep docling
Issue: Memory errors with large PDFs
Solutions:
- Increase available RAM
- Process PDFs in smaller batches
- Use cloud mode for large-scale extraction
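Processing in smaller batches can be as simple as slicing the file list. The sketch below assumes a hypothetical `papers/` folder and leaves the conversion step as a placeholder. The batch size is something to tune against available RAM.

```python
from pathlib import Path
from typing import Iterator, List, Sequence


def batched(items: Sequence[Path], batch_size: int) -> Iterator[List[Path]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


if __name__ == "__main__":
    pdfs = sorted(Path("papers").glob("*.pdf"))  # hypothetical folder
    for batch in batched(pdfs, batch_size=5):
        print(f"Processing {len(batch)} PDFs: {[p.name for p in batch]}")
        # Convert each file here and write results to disk
        # before moving on, so memory is released between batches.
```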
Issue: Poor extraction quality
Solutions:
- Ensure PDF is text-based (not scanned image)
- For scanned PDFs, install OCR support:
pip install "docling[ocr]"
- Try different extraction modes in Docling settings
See Publication Intelligence Guide for advanced usage.
Semantic Search - Ontology Matching
What is Semantic Search?
Lobster AI includes an optional semantic vector search infrastructure for matching biomedical terms against standardized ontology concepts. When installed, it upgrades disease, tissue, and cell type matching from simple keyword lookup to embedding-based semantic search using SapBERT (a PubMedBERT-based model trained on 4M+ UMLS synonym pairs).
Three pre-built ontologies included:
| Ontology | Source | Coverage |
|---|---|---|
| MONDO | Monarch Disease Ontology | ~30K disease concepts |
| UBERON | Uber-anatomy Ontology | ~15K anatomy/tissue terms |
| Cell Ontology | Cell Ontology (CL) | ~2.5K cell types |
When Do You Need Semantic Search?
Semantic search is optional but recommended if you:
- Standardize disease annotations across datasets (e.g., mapping "glioblastoma" to MONDO:0018177)
- Harmonize tissue labels from different studies to UBERON terms
- Match cell type annotations to Cell Ontology concepts
- Work with datasets that use inconsistent or non-standard terminology
Without Semantic Search, Lobster AI falls back to keyword matching:
- Disease matching limited to 4 hardcoded terms (CRC, UC, CD, Healthy)
- No tissue or cell type ontology matching
- No errors — agents work normally with reduced matching quality
Installation
# Via pip
pip install 'lobster-ai[vector-search]'
# As part of full install (includes all extras)
pip install 'lobster-ai[full]'
# Via lobster init (interactive prompt)
lobster init
# → Prompts: "Semantic Search (Optional)" → Choose 1 to install
# Via lobster init (non-interactive)
lobster init --install-vector-search --anthropic-key sk-...
# Via uv tool
uv tool install 'lobster-ai[vector-search,anthropic]'
Verification
# Test imports
python -c "import chromadb; import sentence_transformers; print('Semantic Search available')"
# Check in Lobster
lobster status
# Should show: "Optional Capabilities: ✓ Semantic Search"
Usage in Lobster
Once installed, semantic search is used automatically by agents:
lobster chat
# Disease matching (metadata_assistant)
You: "Match these disease terms to MONDO ontology: glioblastoma, lung adenocarcinoma, T2D"
# Tissue standardization (metadata_assistant)
You: "Standardize my tissue annotations to UBERON terms"
# Cell type matching (annotation_expert)
You: "What cell types match 'CD8+ T cells' in Cell Ontology?"
Disk Space Requirements
| Component | Size |
|---|---|
| chromadb + sentence-transformers packages | ~150 MB |
| SapBERT model (downloaded on first use) | ~420 MB |
| Ontology data (3 tarballs from S3) | ~150 MB |
| ChromaDB vector store | ~200 MB |
| Total first-use download | ~800 MB |
Data is cached at ~/.lobster/ontology_cache/ and ~/.lobster/vector_store/ — subsequent runs reuse the cache.
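To see how much space these caches occupy on a given machine, a short standard-library sketch such as the following can be used. The two paths are the cache locations named above; the helper name `directory_size_mb` is illustrative.

```python
from pathlib import Path


def directory_size_mb(path: Path) -> float:
    """Sum file sizes under path in MiB; 0.0 if the directory is absent."""
    if not path.is_dir():
        return 0.0
    total = sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
    return total / (1024 * 1024)


if __name__ == "__main__":
    home = Path.home()
    for cache in (home / ".lobster" / "ontology_cache",
                  home / ".lobster" / "vector_store"):
        print(f"{cache}: {directory_size_mb(cache):.1f} MB")
```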
Troubleshooting Semantic Search
Issue: Model download fails or times out
Solutions:
# Check network connectivity
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('cambridgeltl/SapBERT-from-PubMedBERT-fulltext')"
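# If behind a proxy, set HTTPS_PROXY for your network before retrying
# To store the model elsewhere, point the Hugging Face cache at a directory with enough space
export HF_HOME=/path/with/space/.cache/huggingface
Issue: Out of disk space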
Solutions:
- Ensure at least 1 GB free space
- Check cache locations: ~/.cache/huggingface/ and ~/.lobster/
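Free space can be checked before the first download with `shutil.disk_usage`. The sketch below is illustrative: the 1 GB threshold mirrors the guidance above, and the helper walks up to the nearest existing parent because the cache directories may not exist yet.

```python
import shutil
from pathlib import Path


def free_space_gb(path: Path) -> float:
    """Free space (GB) on the filesystem holding path or its nearest existing parent."""
    probe = path
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return shutil.disk_usage(probe).free / 1e9


def has_room(path: Path, required_gb: float = 1.0) -> bool:
    """True when at least required_gb is free where path lives."""
    return free_space_gb(path) >= required_gb


if __name__ == "__main__":
    for cache in (Path.home() / ".cache" / "huggingface",
                  Path.home() / ".lobster"):
        status = "OK" if has_room(cache) else "LOW"
        print(f"{cache}: {free_space_gb(cache):.1f} GB free [{status}]")
```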
Issue: Slow on CPU (no GPU)
Note: SapBERT runs on CPU by default. Embedding queries take ~50ms per term — fast enough for interactive use. No GPU required.
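To measure per-term latency on your own hardware rather than rely on the figure above, a timing helper like the one below can be used. It is a sketch: the stand-in encoder only keeps the example runnable without the model. With semantic search installed, you would pass the `encode` method of a loaded `SentenceTransformer` instead.

```python
import time
from typing import Callable, Sequence


def mean_latency_ms(encode: Callable[[str], object], terms: Sequence[str]) -> float:
    """Average wall-clock time per term, in milliseconds."""
    if not terms:
        raise ValueError("terms must not be empty")
    start = time.perf_counter()
    for term in terms:
        encode(term)
    return (time.perf_counter() - start) * 1000 / len(terms)


def stand_in_encode(term: str) -> list:
    """Placeholder encoder (assumption); replace with model.encode."""
    return [float(len(term))]


if __name__ == "__main__":
    terms = ["glioblastoma", "lung adenocarcinoma", "T2D"]
    print(f"{mean_latency_ms(stand_in_encode, terms):.3f} ms per term")
```

The first call to a real model usually includes load time, so discard a warm-up run before timing.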
See Semantic Search Guide for the full reference.
System Libraries by Platform
macOS
Most dependencies handled by Xcode Command Line Tools:
# Install Xcode tools (if not already installed)
xcode-select --install
# Optional: Homebrew for easier package management
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install HDF5 (optional, for larger datasets)
brew install hdf5
Ubuntu/Debian
Required for native Python package compilation:
sudo apt-get update
sudo apt-get install -y \
build-essential \
python3.12-dev \
libhdf5-dev \
libxml2-dev \
libxslt1-dev \
libffi-dev \
libssl-dev \
libblas-dev \
liblapack-dev
Why These Are Needed:
- build-essential: gcc, g++, make compilers
- python3.12-dev: Python header files for C extensions
- libhdf5-dev: HDF5 file format (AnnData, MuData)
- libblas/liblapack-dev: Linear algebra (NumPy, SciPy)
- libxml2/libxslt-dev: XML parsing (web scraping, GEO)
- libffi/libssl-dev: Cryptography and foreign function interface
Fedora/RHEL/CentOS
sudo dnf install -y \
gcc gcc-c++ make \
python3.12-devel \
hdf5-devel \
libxml2-devel \
libxslt-devel \
openssl-devel \
libffi-devel \
blas-devel \
lapack-devel
Windows
Native installation may require:
- Visual Studio Build Tools: For compiling C extensions
- Download: https://visualstudio.microsoft.com/downloads/#build-tools-for-visual-studio-2022
- Select: "Desktop development with C++"
- Size: ~2-3 GB
Recommendation: Use Docker to avoid compilation requirements on Windows.
Testing Optional Dependencies
Check Installation Status
Lobster provides a built-in system checker:
# Run pre-installation check
python check-system.py
# Check from within Lobster
lobster chat
🦞 You: /status
# Detailed system information
🦞 You: /dashboard
Test Individual Components
PyMOL:
# Command line test
pymol -c -Q -d "fetch 1AKE; quit"
# Test in Lobster
lobster chat
🦞 You: "Test PyMOL by fetching structure 1AKE"
Docling:
# Python test
python -c "from docling.document_converter import DocumentConverter; print('✅ OK')"
# Test in Lobster
lobster chat
🦞 You: "Test Docling by extracting methods from a sample publication"
System Libraries (Linux):
# Check installed packages
dpkg -l | grep -E 'libhdf5|libblas|liblapack|libxml2' # Ubuntu/Debian
rpm -qa | grep -E 'hdf5|blas|lapack|libxml2' # Fedora/RHEL
# Verify pkg-config can find them
pkg-config --modversion hdf5
pkg-config --libs libxml-2.0
Verification Script
Create a test script to check all optional dependencies:
#!/usr/bin/env python3
"""Test optional dependencies"""
import sys
from importlib.metadata import version


def check_pymol():
    try:
        import pymol  # noqa: F401
        print("✅ PyMOL: Available")
        return True
    except ImportError:
        print("⚠️ PyMOL: Not installed")
        return False


def check_docling():
    try:
        from docling.document_converter import DocumentConverter  # noqa: F401
        # importlib.metadata reports the installed version even if the
        # package does not define __version__
        print(f"✅ Docling: Available (version {version('docling')})")
        return True
    except ImportError:
        print("⚠️ Docling: Not installed")
        return False


def check_vector_search():
    try:
        import chromadb
        import sentence_transformers  # noqa: F401
        print(f"✅ Semantic Search: Available (chromadb {chromadb.__version__})")
        return True
    except ImportError:
        print("⚠️ Semantic Search: Not installed")
        print(" Install: pip install 'lobster-ai[vector-search]'")
        return False


def check_system_libs():
    # h5py and lxml only import cleanly when HDF5 and libxml2 are usable
    try:
        import h5py  # noqa: F401
        import lxml  # noqa: F401
        print("✅ System libraries: Available")
        return True
    except ImportError as e:
        print(f"⚠️ System libraries: Missing ({e.name})")
        return False


if __name__ == "__main__":
    print("Checking optional dependencies...\n")
    pymol_ok = check_pymol()
    docling_ok = check_docling()
    vector_ok = check_vector_search()
    libs_ok = check_system_libs()
    print("\n" + "=" * 50)
    if pymol_ok and docling_ok and vector_ok and libs_ok:
        print("✅ All optional dependencies available")
    else:
        print("⚠️ Some optional dependencies missing")
        print("Lobster will work with reduced functionality")
    # Missing optional dependencies are not an error, so always exit 0
    sys.exit(0)
Save as check-optional.py and run: python check-optional.py
Getting Help
If you encounter issues installing optional dependencies:
- Check platform-specific installation guide:
  - macOS: See Installation Guide section on macOS
  - Ubuntu/Linux: Run ./install-ubuntu.sh
  - Windows: Use Docker Desktop (recommended) or see Installation Guide
- Consider Docker:
  - All optional dependencies pre-installed
  - No compilation required
  - Run: docker-compose run --rm lobster-cli
- Community Support:
  - GitHub Issues: https://github.com/the-omics-os/lobster/issues
  - Email: info@omics-os.com
  - Documentation: Troubleshooting Guide
Summary
Quick Decision Guide:
| Your Situation | Recommended Setup |
|---|---|
| Standard RNA-seq/proteomics analysis | Core Lobster only (no optional deps) |
| + Literature mining with complex PDFs | + Docling |
| + Disease/tissue/cell type standardization | + Semantic Search |
| + Protein structure analysis | + PyMOL |
| Windows user | Use Docker (includes everything) |
| Can't install PyMOL | Use cloud mode or Docker |
| Production deployment | Docker (consistent environment) |
Installation Priority:
- Start: Core Lobster AI (required)
- Add if needed: Docling for better PDF parsing (easy install)
- Add if needed: Semantic Search for ontology matching (~800 MB download)
- Add if needed: PyMOL for structure visualization (moderate effort)
- Alternative: Use Docker and get everything pre-installed
Related Documentation:
- Installation Guide - Main installation instructions
- Protein Structure Visualization - Using PyMOL
- Publication Intelligence - Using Docling
- Troubleshooting - Common issues
Last Updated: 2025-01-16