User Guide Overview
How Lobster AI Works
Lobster AI is a multi-agent bioinformatics analysis platform that combines specialized AI agents with proven scientific tools to analyze complex multi-omics data. Instead of requiring users to learn complex software interfaces or programming languages, Lobster AI allows researchers to interact with their data using natural language.
Core Philosophy
Natural Language Interface: Simply describe what you want to do with your data:
- "Analyze the single-cell RNA-seq data and identify cell types"
- "Compare gene expression between treatment groups using DESeq2"
- "Generate a quality control report for my proteomics data"
- "Find datasets similar to mine in the GEO database"
Agent-Based Architecture: Each agent specializes in specific analysis types:
- Single-Cell Expert: Handles scRNA-seq analysis, clustering, cell annotation
- Bulk RNA-seq Expert: Performs differential expression with pyDESeq2
- MS Proteomics Expert: Analyzes mass spectrometry data with database search
- Affinity Proteomics Expert: Processes Olink and antibody array data
- Protein Structure Expert (v0.2+): Fetches and visualizes 3D protein structures with PyMOL
- Data Expert: Manages file loading, format conversion, and GEO downloads
- Research Agent: Mines literature and identifies relevant datasets with Docling parsing (v0.2+)
How It Works
- Load Your Data: Use simple commands like /read data.h5ad or ask "Load my single-cell data"
- Natural Language Analysis: Describe your analysis goals in plain English
- Agent Coordination: The system routes your request to the appropriate specialist agent
- Scientific Processing: Agents use established bioinformatics tools (scanpy, pyDESeq2, etc.)
- Interactive Results: View results, generate plots, and iterate on analysis
Key Features
Multi-Modal Data Support
- Single-cell RNA-seq: 10X, H5AD, CSV formats
- Bulk RNA-seq: Count matrices, normalized data
- Mass Spectrometry Proteomics: MaxQuant, Spectronaut output
- Affinity Proteomics: Olink NPX, antibody arrays
- Multi-omics: Integrated analysis across data types
Professional Analysis Workflows
- Quality Control: Automated QC metrics and visualizations
- Normalization: Method-appropriate normalization strategies
- Statistical Analysis: Proper statistical testing with FDR correction
- Visualization: Publication-quality interactive plots
- Reproducibility: Complete analysis provenance tracking
Advanced Capabilities
- Literature Integration: Automatic parameter extraction from publications (v0.2+ with Docling parsing)
- GEO Database Access: Download and analyze public datasets through a robust download queue system (v0.2+)
- Cloud/Local Flexibility: Seamless switching between execution modes
- Formula-Guided Analysis: R-style statistical formulas for complex designs (v0.2+)
- Protein Structure Visualization: 3D structure analysis with PyMOL integration (v0.2+)
- Two-Tier Caching: 30-50x speedup on repeat content access (v0.2+)
What's New in v0.2: ContentAccessService with provider infrastructure, protein structure visualization, download queue system, and enhanced caching. See Migration Guide for details.
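To make the two-tier caching idea concrete, here is a minimal sketch in Python. It is not Lobster AI's implementation: the guide does not say what the two tiers are, so this assumes an in-memory dictionary in front of an on-disk directory. The class name `TwoTierCache`, the `fetch` callback, and the JSON file layout are all hypothetical.

```python
import json
import tempfile
from hashlib import sha256
from pathlib import Path


class TwoTierCache:
    """Hypothetical two-tier cache: a memory dict in front of a disk directory."""

    def __init__(self, cache_dir):
        self.memory = {}
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Hash the key so arbitrary strings (URLs, accessions) are safe file names.
        digest = sha256(key.encode("utf-8")).hexdigest()
        return self.cache_dir / (digest + ".json")

    def get(self, key, fetch):
        """Return (value, source), where source says which tier answered."""
        # Tier 1: in-memory lookup, the fastest path.
        if key in self.memory:
            return self.memory[key], "memory"
        # Tier 2: disk lookup, which survives restarts; promote the hit to memory.
        path = self._path(key)
        if path.exists():
            value = json.loads(path.read_text(encoding="utf-8"))
            self.memory[key] = value
            return value, "disk"
        # Miss in both tiers: do the expensive fetch once and fill both tiers.
        value = fetch(key)
        path.write_text(json.dumps(value), encoding="utf-8")
        self.memory[key] = value
        return value, "fetched"


if __name__ == "__main__":
    cache = TwoTierCache(tempfile.mkdtemp())
    slow_fetch = lambda key: {"id": key}  # stand-in for a slow download
    print(cache.get("example-key", slow_fetch)[1])
    print(cache.get("example-key", slow_fetch)[1])
```

The second call is served from memory without invoking the fetch function again, which is where any speedup on repeat access would come from in a design like this.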
Understanding Agent Responses
Agent Communication Patterns
Clarifying Questions: Agents may ask for clarification:
"I see you have single-cell data. Would you like me to:
1. Perform quality control analysis
2. Identify cell clusters and types
3. Find differentially expressed genes
4. All of the above in a complete workflow?"
Status Updates: Agents provide progress information:
"Loading data... ✓
Calculating QC metrics... ✓
Filtering low-quality cells... ✓
Normalizing expression data... ✓"
Recommendations: Agents suggest next steps:
"Analysis complete! Based on your data characteristics, I recommend:
- Examining cluster markers for cell type annotation
- Running trajectory analysis for developmental processes
- Performing differential expression between conditions"
Understanding Analysis Results
Data Summaries
Agents provide structured summaries of your data:
- Shape: Number of observations (cells/samples) × variables (genes/proteins)
- Quality Metrics: Missing values, outliers, batch effects
- Processing Status: What analysis steps have been completed
Statistical Results
Results include appropriate statistical context:
- Significance Testing: P-values with multiple testing correction
- Effect Sizes: Log fold changes, confidence intervals
- Sample Sizes: Power calculations and adequacy assessments
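As an illustration of what "multiple testing correction" means in practice, the sketch below implements the Benjamini-Hochberg procedure, a common way to control the false discovery rate. The guide does not state which correction Lobster AI's agents apply, so treat this as one standard option rather than a description of the platform; the function name `benjamini_hochberg` is chosen here for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that each adjusted
    # value is capped by the ones above it (keeps the result monotone).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p={p:.3f}  adjusted={q:.3f}")
```

A gene is then called significant when its adjusted value falls below the chosen FDR threshold (0.05 is a common choice), rather than when its raw p-value does.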
Visualizations
Plots are automatically generated with:
- Scientific Accuracy: Proper scaling, error bars, statistical annotations
- Publication Quality: High-resolution, well-labeled plots
- Interactivity: Zoom, pan, hover information in HTML plots
Natural Language Interaction Patterns
Effective Communication
Be Specific About Goals:
- ✅ "Compare gene expression between control and treatment groups"
- ❌ "Analyze my data"
Provide Context:
- ✅ "I have single-cell RNA-seq data from mouse liver samples"
- ❌ "Here's my data file"
Ask for Explanations:
- ✅ "Why did you choose these normalization parameters?"
- ✅ "Can you explain the statistical test you used?"
Common Request Types
Exploratory Analysis:
- "Give me an overview of this dataset"
- "What does the data quality look like?"
- "Show me the main patterns in the data"
Specific Analysis:
- "Find differentially expressed genes between conditions"
- "Identify cell types in this single-cell data"
- "Perform pathway enrichment analysis"
Comparative Analysis:
- "Compare my results to similar studies"
- "Find public datasets like mine"
- "How do these results compare to the literature?"
Method Guidance:
- "What's the best normalization method for this data?"
- "How should I handle batch effects?"
- "What statistical test is appropriate here?"
Working with Results
Data Management
- Modalities: Data is organized by biological modality (transcriptomics, proteomics, etc.)
- Provenance: Complete history of analysis steps and parameters
- Versioning: Multiple processing stages saved with descriptive names
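The provenance idea above can be pictured as an append-only log of steps and their parameters, kept per modality. The following Python sketch is only an illustration under that assumption; `ProvenanceStep`, `ProvenanceLog`, and the field names are hypothetical and are not taken from Lobster AI.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _now():
    # UTC timestamp in ISO 8601 format for each recorded step.
    return datetime.now(timezone.utc).isoformat()


@dataclass
class ProvenanceStep:
    name: str          # e.g. "normalize"
    parameters: dict   # the settings used for this step
    timestamp: str = field(default_factory=_now)


@dataclass
class ProvenanceLog:
    modality: str      # e.g. "transcriptomics"
    steps: list = field(default_factory=list)

    def record(self, name, **parameters):
        """Append one analysis step with the parameters it was run with."""
        self.steps.append(ProvenanceStep(name, parameters))

    def to_json(self):
        """Serialize the full history so it can be saved alongside the data."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    log = ProvenanceLog("transcriptomics")
    log.record("quality_control", min_genes=200)   # illustrative values only
    log.record("normalize", target_sum=10000)
    print(log.to_json())
```

Keeping parameters next to each step name is what lets a later reader reproduce the analysis in the same order with the same settings.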
Visualization System
- Interactive Plots: HTML plots with zoom, pan, hover information
- Static Exports: PNG versions for publications
- Plot History: All generated plots saved and accessible
- Custom Styling: Scientific color schemes and layouts
Export and Sharing
- Data Packages: Complete analysis bundles with data, plots, and metadata
- Session Export: Save and restore analysis sessions
- Publication Formats: Export in formats suitable for papers and presentations
Getting Started Tips
First Steps
- Load Data: Start with /read filename or describe your data
- Explore: Ask "What does this data look like?" or use the /data command
- Analyze: Describe your research question in natural language
- Iterate: Refine analysis based on results and agent suggestions
Best Practices
- Start Broad: Begin with exploratory analysis before specific tests
- Ask Questions: Agents are designed to explain their methods and reasoning
- Iterate Gradually: Build analysis step-by-step rather than all at once
- Save Progress: Use /save to preserve important analysis states
Common Workflows
- Data Loading → Quality Control → Normalization → Analysis → Visualization
- Literature Review → Parameter Selection → Statistical Testing → Validation
- Exploratory Analysis → Hypothesis Formation → Targeted Testing → Results Integration
When Things Go Wrong
- Check Data Format: Ensure files are in supported formats
- Verify File Paths: Use absolute paths or check current directory
- Review Error Messages: Agents provide detailed error explanations
- Ask for Help: Use /help or ask "How do I..." questions
Advanced Features
Multi-Agent Coordination
Agents automatically hand off tasks to specialists:
- Data loading requests go to the Data Expert
- Statistical analysis goes to the appropriate domain expert
- Literature searches go to the Research Agent
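The handoff rules above can be sketched as a small routing table. This is a deliberately naive toy: it matches keywords by substring, whereas the real system presumably relies on its AI agents to interpret requests. The agent identifiers, keyword lists, and the `supervisor` fallback are all assumptions made for this example.

```python
# Hypothetical routing table: (agent identifier, trigger keywords).
# Earlier entries win, so data loading is checked before analysis keywords.
ROUTES = [
    ("data_expert", ("load", "read", "download", "convert")),
    ("research_agent", ("literature", "paper", "publication", "dataset")),
    ("singlecell_expert", ("single-cell", "scrna", "cluster", "cell type")),
    ("bulk_rnaseq_expert", ("deseq2", "differential expression", "bulk")),
    ("proteomics_expert", ("proteomics", "olink", "mass spectrometry")),
]


def route(request):
    """Return the first agent whose keywords appear in the request text."""
    text = request.lower()
    for agent, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return agent
    # Nothing matched: hand the request back to a coordinating agent.
    return "supervisor"


if __name__ == "__main__":
    for prompt in (
        "Load my single-cell data",
        "Find public datasets like mine",
        "Identify cell types in this single-cell data",
    ):
        print(f"{prompt!r} -> {route(prompt)}")
```

The ordering of the table matters in this sketch: "Load my single-cell data" mentions single-cell data but is still a loading request, so it goes to the data expert first.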
Cloud Integration
- Seamless Switching: Same interface for local and cloud execution
- Scalability: Handle larger datasets in a cloud environment
- Collaboration: Share analyses across teams
Extensibility
- Custom Workflows: Combine multiple analysis types
- Parameter Optimization: Agents suggest optimal settings
- Method Comparison: Evaluate different analytical approaches
This overview provides the conceptual foundation for using Lobster AI. For detailed command references and specific workflows, see the following sections of this user guide.
Optional Dependencies Guide
This guide covers optional software components that enhance Lobster AI with specialized capabilities. None of these are required for basic functionality, but...
Semantic Search & Ontology Matching
Match biomedical terms to standardized ontology concepts using vector embeddings