Developer Overview - Lobster AI Architecture
🏗️ Overview
This guide provides a comprehensive introduction to developing within the Lobster AI codebase, covering architecture patterns, design principles, and development workflows. Lobster AI is a professional multi-agent bioinformatics analysis platform that combines specialized AI agents with proven scientific tools.
🎯 Core Design Principles
1. Agent-Based Architecture
- Specialized Agents: Each agent handles specific bioinformatics domains (transcriptomics, proteomics)
- Entry Point Discovery: Agents are discovered via ComponentRegistry and lobster.agents entry points
- Natural Language Interface: Users describe analyses in plain English
2. Modular Service Design
- Stateless Services: All analysis services are stateless and return (processed_adata, statistics_dict)
- Separation of Concerns: Agents coordinate workflows, services handle computation
- Reusable Components: Services can be used independently or composed in workflows
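Because every service returns a (processed_data, statistics) pair, services compose naturally: each step's output feeds the next, and the statistics are collected per step. The sketch below assumes nothing about Lobster's internals; run_pipeline, drop_negative and scale_to_max are hypothetical, and a plain list of floats stands in for an AnnData object so the example runs without extra dependencies.

```python
from typing import Any, Callable, Dict, List, Tuple

# A service takes data and returns (new_data, statistics) without mutating its input
Service = Callable[[Any], Tuple[Any, Dict[str, Any]]]


def run_pipeline(
    data: Any, steps: List[Tuple[str, Service]]
) -> Tuple[Any, Dict[str, Dict[str, Any]]]:
    """Chain stateless services, keeping each step's statistics under its name."""
    all_stats: Dict[str, Dict[str, Any]] = {}
    for name, service in steps:
        data, stats = service(data)
        all_stats[name] = stats
    return data, all_stats


# Toy services on a list of floats (stand-in for AnnData)
def drop_negative(values: List[float]) -> Tuple[List[float], Dict[str, Any]]:
    kept = [v for v in values if v >= 0]
    return kept, {"removed": len(values) - len(kept)}


def scale_to_max(values: List[float]) -> Tuple[List[float], Dict[str, Any]]:
    peak = max(values, default=0.0) or 1.0  # avoid dividing by zero
    return [v / peak for v in values], {"scale_factor": peak}


raw = [2.0, -1.0, 4.0]
final, stats = run_pipeline(raw, [("filter", drop_negative), ("scale", scale_to_max)])
# final == [0.5, 1.0]
# stats == {"filter": {"removed": 1}, "scale": {"scale_factor": 4.0}}
# raw is unchanged: [2.0, -1.0, 4.0]
```

Since no step holds state, any service can be reused on its own or dropped into a different sequence.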
3. Multi-Modal Data Management
- DataManagerV2: Centralized orchestrator for multi-omics data with modality management
- Professional Naming: Consistent naming conventions for dataset versions and analysis stages
- Provenance Tracking: W3C PROV-compliant analysis history for reproducibility
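Naming and provenance work together: every derived modality gets a descriptive name, and every derivation is recorded as an activity that used one entity and generated another, loosely following the W3C PROV vocabulary. MiniModalityStore and ProvenanceRecord are hypothetical stand-ins written for illustration; DataManagerV2's real interface and record format may differ, and min_genes in the usage lines is an invented parameter.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class ProvenanceRecord:
    activity: str               # the operation that ran (PROV Activity)
    used: str                   # input modality (PROV "used")
    generated: str              # output modality (PROV "wasGeneratedBy")
    parameters: Dict[str, Any]
    ended_at: str               # ISO-8601 UTC timestamp


@dataclass
class MiniModalityStore:
    """Hypothetical stand-in for DataManagerV2: named modalities plus a provenance trail."""
    modalities: Dict[str, Any] = field(default_factory=dict)
    history: List[ProvenanceRecord] = field(default_factory=list)

    def derive(self, source: str, stage: str, data: Any, activity: str, **params: Any) -> str:
        if source not in self.modalities:
            raise KeyError(f"Modality '{source}' not found")
        new_name = f"{source}_{stage}"  # e.g. geo_gse12345 -> geo_gse12345_quality_assessed
        self.modalities[new_name] = data
        self.history.append(
            ProvenanceRecord(
                activity=activity,
                used=source,
                generated=new_name,
                parameters=dict(params),
                ended_at=datetime.now(timezone.utc).isoformat(),
            )
        )
        return new_name


store = MiniModalityStore(modalities={"geo_gse12345": "raw counts"})
name = store.derive("geo_gse12345", "quality_assessed", "qc counts",
                    "assess_data_quality", min_genes=200)
```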
4. Cloud/Local Hybrid Architecture
- BaseClient Interface: Consistent API for local and cloud execution
- Seamless Switching: Automatic detection and fallback between cloud and local modes
- Unified CLI: Single interface supporting both execution environments
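Seamless switching can be pictured as a small factory that always hands back a BaseClient: a cloud client when LOBSTER_CLOUD_KEY is non-empty and the client reports healthy, a local one otherwise. This is a sketch under assumptions, not Lobster's detection logic. BaseClient is re-declared with the two abstract methods this guide lists for lobster/core/interfaces/base_client.py so the block runs on its own; LocalStubClient, CloudStubClient, make_client and the "healthy" status key are hypothetical.

```python
import logging
import os
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Mapping, Optional

logger = logging.getLogger(__name__)


class BaseClient(ABC):
    @abstractmethod
    def query(self, user_input: str, stream: bool = False) -> Dict[str, Any]: ...

    @abstractmethod
    def get_status(self) -> Dict[str, Any]: ...


class LocalStubClient(BaseClient):
    def query(self, user_input: str, stream: bool = False) -> Dict[str, Any]:
        return {"mode": "local", "reply": f"(local) {user_input}"}

    def get_status(self) -> Dict[str, Any]:
        return {"mode": "local", "healthy": True}


class CloudStubClient(BaseClient):
    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def query(self, user_input: str, stream: bool = False) -> Dict[str, Any]:
        return {"mode": "cloud", "reply": f"(cloud) {user_input}"}

    def get_status(self) -> Dict[str, Any]:
        return {"mode": "cloud", "healthy": True}  # a real client would contact the service


def make_client(
    env: Optional[Mapping[str, str]] = None,
    cloud_factory: Callable[[str], BaseClient] = CloudStubClient,
) -> BaseClient:
    """Prefer cloud mode when LOBSTER_CLOUD_KEY is non-empty; fall back to local."""
    env = os.environ if env is None else env
    key = env.get("LOBSTER_CLOUD_KEY", "").strip()
    if key:
        try:
            client = cloud_factory(key)
            if client.get_status().get("healthy"):
                return client
            logger.warning("Cloud client unhealthy; using local mode")
        except Exception as exc:  # e.g. network or authentication failure
            logger.warning("Cloud client failed (%s); using local mode", exc)
    return LocalStubClient()
```

Callers only ever see BaseClient, so the CLI code path is identical in both modes. Injecting env and cloud_factory keeps the selection logic testable without network access.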
🏛️ Architecture Components
Core Directories
lobster/
├── agents/ # Specialized AI agents for bioinformatics domains
├── core/ # Data management, client infrastructure, interfaces
├── tools/ # Stateless analysis services
├── config/ # Configuration management and agent registry
├── cli.py # Modern terminal interface with autocomplete
└── utils/ # Shared utilities and logging
Key Architectural Patterns
1. Agent Discovery Pattern (v1.0.0+)
# Each agent package defines AGENT_CONFIG at module top
from lobster.core.registry import AgentRegistryConfig

AGENT_CONFIG = AgentRegistryConfig(
    name='data_expert_agent',
    display_name='Data Expert',
    description='Handles data loading and management',
    factory_function='lobster_research.agents.data_expert:create_data_expert',
    tier_requirement='free',
    package_name='lobster-research',
    handoff_tool_name='handoff_to_data_expert',
)

# Registered via pyproject.toml entry points:
# [project.entry-points."lobster.agents"]
# data_expert_agent = "lobster_research.agents.data_expert:AGENT_CONFIG"
2. Service Pattern
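from typing import Any, Dict, Tuple
import anndata

class QualityService:
    """Stateless service for data quality assessment."""
    def assess_quality(
        self, adata: anndata.AnnData, **params: Any
    ) -> Tuple[anndata.AnnData, Dict[str, Any]]:
        """
        Returns:
            Tuple of (processed_adata, statistics_dict)
        """
        # Stateless processing: work on a copy, never mutate the caller's object
        processed_adata = adata.copy()
        # ... compute QC metrics on processed_adata and collect them in statistics ...
        statistics: Dict[str, Any] = {}
        return processed_adata, statistics
3. Agent Tool Pattern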
@tool
def assess_data_quality(modality_name: str, **params) -> str:
    """Standard pattern for all agent tools."""
    # data_manager (DataManagerV2) and service (QualityService) are assumed to be
    # in scope, e.g. captured from the enclosing agent factory function
    # 1. Validate modality exists
    if modality_name not in data_manager.list_modalities():
        raise ModalityNotFoundError(f"Modality '{modality_name}' not found")
    # 2. Get data and call stateless service
    adata = data_manager.get_modality(modality_name)
    result_adata, stats = service.assess_quality(adata, **params)
    # 3. Store results with descriptive naming
    new_modality = f"{modality_name}_quality_assessed"
    data_manager.modalities[new_modality] = result_adata
    # 4. Log operation for provenance
    data_manager.log_tool_usage("assess_data_quality", params, stats)
    # formatted_response builds the user-facing summary string
    return formatted_response(stats, new_modality)
4. Client Adapter Pattern
# lobster/core/interfaces/base_client.py
from abc import ABC, abstractmethod
from typing import Any, Dict

class BaseClient(ABC):
    @abstractmethod
    def query(self, user_input: str, stream: bool = False) -> Dict[str, Any]:
        pass

    @abstractmethod
    def get_status(self) -> Dict[str, Any]:
        pass

# Implementations: AgentClient (local), CloudLobsterClient (cloud)
🔧 Development Setup
1. Environment Setup
# Clone repository
git clone <repository-url>
cd lobster
# Install development dependencies
make dev-install
# Activate environment
source .venv/bin/activate
# Verify installation
python -m lobster --help
2. Required Environment Variables
# Required API Keys
export AWS_BEDROCK_ACCESS_KEY="your-aws-access-key"
export AWS_BEDROCK_SECRET_ACCESS_KEY="your-aws-secret-key"
# Optional
export NCBI_API_KEY="your-ncbi-api-key"
export LOBSTER_CLOUD_KEY="your-cloud-api-key" # Enables cloud mode
3. Development Commands
# Run all tests
make test
# Fast parallel testing
make test-fast
# Code formatting
make format
# Linting
make lint
# Type checking
make type-check
# Start CLI
lobster chat
🧪 Scientific Workflows
Professional Naming Convention
geo_gse12345 # Raw downloaded data
├── geo_gse12345_quality_assessed # QC metrics added
├── geo_gse12345_filtered_normalized # Preprocessed data
├── geo_gse12345_doublets_detected # Doublet annotations
├── geo_gse12345_clustered # Leiden clustering + UMAP
├── geo_gse12345_markers # Differential expression
├── geo_gse12345_annotated # Cell type annotations
└── geo_gse12345_pseudobulk # Aggregated for DE analysis
Data Flow Architecture
User Input (CLI)
↓
LobsterClientAdapter → BaseClient (AgentClient | CloudLobsterClient)
↓
Agent Registry → Specialized Agent (data_expert, transcriptomics_expert, etc.)
↓
Agent Tools → Stateless Services (QualityService, ClusteringService, etc.)
↓
DataManagerV2 → Modality Management → Storage Backends (H5AD, MuData)
↓
Results → CLI Response with Visualizations
🎨 Code Style Guidelines
1. Python Standards
- Follow PEP 8 style guidelines
- Use type hints for all functions and methods
- Line length: 88 characters (Black formatting)
- Comprehensive docstrings for all public functions
2. Scientific Accuracy
- Prioritize scientific accuracy over performance optimizations
- Include comprehensive QC metrics at each analysis step
- Support batch effect detection and correction
- Implement proper missing value handling strategies
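As one concrete reading of "QC metrics at each step" and "missing value handling", the sketch below computes per-sample total signal, detected-feature count and missing fraction, then drops samples whose missing fraction exceeds a threshold. The function names and the 0.5 default threshold are illustrative assumptions, not Lobster defaults; real services would work on AnnData matrices (for example with scanpy.pp.calculate_qc_metrics) rather than nested lists.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]


def is_missing(value: Optional[float]) -> bool:
    """Treat both None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def sample_qc(matrix: Sequence[Row]) -> List[Dict[str, float]]:
    """Per-sample QC: total observed signal, detected features, missing fraction."""
    report = []
    for row in matrix:
        observed = [v for v in row if not is_missing(v)]
        report.append({
            "total": float(sum(observed)),
            "n_detected": float(sum(1 for v in observed if v > 0)),
            "missing_fraction": (len(row) - len(observed)) / len(row) if row else 0.0,
        })
    return report


def filter_by_missingness(
    matrix: Sequence[Row], max_missing: float = 0.5
) -> Tuple[List[Row], List[Dict[str, float]]]:
    """Keep samples at or below the missingness threshold; also return the full QC report."""
    report = sample_qc(matrix)
    kept = [row for row, qc in zip(matrix, report) if qc["missing_fraction"] <= max_missing]
    return kept, report


data = [[1.0, 0.0, 3.0], [None, 2.0, float("nan")]]
kept, report = filter_by_missingness(data)
# report[0] == {"total": 4.0, "n_detected": 2.0, "missing_fraction": 0.0}
# report[1] has missing_fraction 2/3, so only the first sample is kept
```

Returning the report alongside the filtered data keeps the QC evidence available for the statistics dictionary a service hands back.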
3. Error Handling
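from lobster.utils.logger import get_logger

logger = get_logger(__name__)

# Use specific exceptions
class ModalityNotFoundError(Exception):
    """Raised when a requested modality is not loaded."""

class ServiceError(Exception):
    """Raised when an analysis service fails."""

# Proper error handling in tools (run_service_step is an illustrative name)
def run_service_step(service, data) -> str:
    try:
        result = service.process(data)
    except ServiceError as e:
        logger.error(f"Service error: {e}")
        return f"Analysis failed: {e}"
    return f"Analysis complete: {result}"
🚀 Development Workflow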
1. Adding New Features
- Design First: Consider how the feature fits into existing patterns
- Use Entry Points: For agents, register via entry points instead of manual graph edits
- Follow Patterns: Use established service, tool, and adapter patterns
- Test Thoroughly: Include unit, integration, and scientific validation tests
- Document: Update relevant documentation files
2. Code Quality Checklist
- Type hints on all functions
- Comprehensive docstrings
- Error handling with specific exceptions
- Unit tests with 80%+ coverage
- Integration tests with real data
- Scientific validation where applicable
- CLI compatibility (local and cloud)
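Several checklist items can be encoded directly as tests. The sketch below shows pytest-style tests for a toy stateless service: one checks that the input is not mutated and that the (result, statistics) contract holds, the other that degenerate input raises a specific exception. normalize_totals is hypothetical; the tests use plain assert statements so they also run without pytest, although pytest.raises would be the more idiomatic choice for the second one.

```python
import copy
from typing import Any, Dict, List, Tuple


def normalize_totals(values: List[float], target: float = 1.0) -> Tuple[List[float], Dict[str, Any]]:
    """Toy stateless service: rescale values so they sum to target."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize an all-zero vector")
    return [v * target / total for v in values], {"original_total": total}


def test_service_is_stateless_and_returns_tuple():
    data = [1.0, 3.0]
    snapshot = copy.deepcopy(data)
    result, stats = normalize_totals(data)
    assert data == snapshot  # input left untouched
    assert result == [0.25, 0.75]
    assert stats == {"original_total": 4.0}


def test_service_rejects_all_zero_input():
    try:
        normalize_totals([0.0, 0.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for all-zero input")
```

pytest collects functions named test_* automatically, so a file like this needs no extra registration to run under make test.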
3. Pre-commit Hooks
# Install pre-commit hooks
pre-commit install
# Run manually
pre-commit run --all-files
📊 Performance Considerations
1. Memory Management
- Use memory-efficient data loading for large datasets
- Implement lazy loading where possible
- Monitor memory usage in long-running analyses
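Lazy loading means paying the cost of reading data only when it is first needed. A minimal sketch, assuming a hypothetical LazyModality wrapper around any zero-argument loader, can rely on functools.cached_property so the loader runs at most once. For H5AD files specifically, anndata.read_h5ad(path, backed="r") is an existing option that keeps the expression matrix on disk rather than in memory.

```python
from functools import cached_property
from typing import Any, Callable


class LazyModality:
    """Defer an expensive load until the data is first accessed (hypothetical wrapper)."""

    def __init__(self, name: str, loader: Callable[[], Any]) -> None:
        self.name = name
        self._loader = loader
        self.load_count = 0

    @cached_property
    def data(self) -> Any:
        # Runs only on first access; the result is then cached on the instance
        self.load_count += 1
        return self._loader()

    @property
    def is_loaded(self) -> bool:
        # cached_property stores its value in the instance __dict__ under the attribute name
        return "data" in self.__dict__


modality = LazyModality("geo_gse12345", loader=lambda: [1, 2, 3])  # stand-in for a file read
```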
2. Computation Optimization
- Leverage GPU acceleration when available (e.g., scVI, RAPIDS)
- Use efficient algorithms for large-scale data
- Implement progress tracking for long operations
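Two of these points, using a GPU only when one is present and reporting progress during long loops, can be sketched with the standard library plus an optional PyTorch check. pick_device and map_with_progress are hypothetical helpers; torch.cuda.is_available() is PyTorch's availability check, and the code falls back to "cpu" when PyTorch is not installed. How scVI or RAPIDS are actually configured inside Lobster is not shown here.

```python
import importlib.util
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch  # imported lazily so this module works without PyTorch

    return "cuda" if torch.cuda.is_available() else "cpu"


def map_with_progress(
    func: Callable[[T], R],
    items: Iterable[T],
    total: int,
    report: Optional[Callable[[int, int], None]] = None,
    every: int = 1,
) -> List[R]:
    """Apply func to each item, calling report(done, total) every `every` items and at the end."""
    results: List[R] = []
    for done, item in enumerate(items, start=1):
        results.append(func(item))
        if report is not None and (done % every == 0 or done == total):
            report(done, total)
    return results


progress: List[int] = []
doubled = map_with_progress(lambda x: x * 2, range(5), total=5,
                            report=lambda d, t: progress.append(d), every=2)
# doubled == [0, 2, 4, 6, 8]; progress == [2, 4, 5]
```

Passing the reporter as a callback keeps the computation independent of how progress is displayed (CLI spinner, log line, or nothing).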
3. Caching Strategy
- File operations: 60s cache for cloud, 10s for local
- Cache the results of expensive computations
- Define explicit cache invalidation strategies
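A time-to-live cache is one simple way to realize the 60-second (cloud) and 10-second (local) file-operation caching described above. TTLCache is a hypothetical sketch, not Lobster's cache: it accepts an injectable clock so expiry can be tested without sleeping, and invalidate() gives callers an explicit way to drop a stale entry, for example right after writing a file.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny time-to-live cache: entries expire ttl seconds after they were stored."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (stored_at, value)

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh
        value = compute()  # missing or expired: recompute and restamp
        self._store[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        self._store.pop(key, None)


# TTLs mirroring the guide: 60 s for cloud file operations, 10 s for local ones
cloud_files_cache = TTLCache(ttl=60.0)
local_files_cache = TTLCache(ttl=10.0)
```

time.monotonic is used as the default clock because, unlike wall-clock time, it never jumps backwards.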
🔍 Debugging and Troubleshooting
1. Common Issues
- Import Errors: Check environment activation and dependencies
- Agent Registry: Verify factory function paths are correct
- Data Loading: Check file permissions and formats
- Cloud Integration: Verify API keys and network connectivity
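For the agent-registry issue, a quick check is to resolve a factory_function string yourself. The sketch below assumes the "module.path:attribute" format used in the AGENT_CONFIG example and reports which half of the path is wrong; resolve_factory is a hypothetical helper, not part of Lobster's API.

```python
import importlib
from typing import Any


def resolve_factory(path: str) -> Any:
    """Resolve a 'package.module:attribute' string, failing with a readable message."""
    module_name, sep, attr_path = path.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {path!r}")
    try:
        obj: Any = importlib.import_module(module_name)
    except ImportError as exc:  # also covers ModuleNotFoundError
        raise ImportError(f"cannot import module {module_name!r} from {path!r}: {exc}") from exc
    for part in attr_path.split("."):  # allow dotted attributes such as Class.method
        try:
            obj = getattr(obj, part)
        except AttributeError as exc:
            raise AttributeError(f"{module_name!r} has no attribute path {attr_path!r}") from exc
    return obj


# Example: resolve_factory("lobster_research.agents.data_expert:create_data_expert")
```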
2. Debugging Tools
# Use structured logging
import logging

from lobster.utils.logger import get_logger

logger = get_logger(__name__)

# Enable debug mode
logger.setLevel(logging.DEBUG)

# Check system status (shell command, then inside the chat session)
lobster chat
/status
3. Testing Connectivity
# Test agent discovery
lobster agents list
# Test in Python
python -c "from lobster.core.registry import ComponentRegistry; print(ComponentRegistry().list_agents())"
# Test CLI with both clients
LOBSTER_CLOUD_KEY="" python -m lobster chat # Local mode
LOBSTER_CLOUD_KEY="key" python -m lobster chat # Cloud mode
📚 Further Reading
- Creating Agents Guide - Detailed agent development
- Creating Services Guide - Service implementation patterns
- Creating Adapters Guide - Data adapter development
- Testing Guide - Comprehensive testing framework
- CLAUDE.md - Complete architectural documentation
🎯 Quick Reference
Key Files to Know
- lobster/core/registry.py - ComponentRegistry for agent discovery
- lobster/core/interfaces/base_client.py - Client interface definition
- lobster/core/data_manager_v2.py - Multi-modal data orchestrator
- lobster/cli.py - CLI implementation with autocomplete
- tests/conftest.py - Test configuration and fixtures
Essential Commands
make dev-install # Development setup
make test # Run all tests
lobster chat # Start interactive CLI
/help # Show available commands
/status # System status
/files # List workspace files
This overview provides the foundation for contributing to Lobster AI. Each component follows established patterns that promote consistency, maintainability, and scientific rigor.