Examples Cookbook
This comprehensive cookbook provides practical code snippets, analysis recipes, and real-world solutions for common bioinformatics tasks using Lobster AI. Each example includes complete workflows, expected outputs, and troubleshooting tips.
Table of Contents
- Quick Start Recipes
- Data Loading & Management
- Example Datasets Reference
- Single-Cell Analysis Recipes
- Bulk RNA-seq Workflows
- Proteomics Analysis Patterns
- Multi-Omics Integration
- Visualization Recipes
- Advanced Analysis Techniques
- Automation & Scripting
- Performance Optimization
Quick Start Recipes
🚀 Basic Analysis Pipeline
# Complete single-cell analysis in 5 commands
🦞 You: "Download GSE109564 from GEO"
🦞 You: "Assess quality and filter low-quality cells"
🦞 You: "Normalize, find variable genes, and cluster cells"
🦞 You: "Find marker genes and annotate cell types"
🦞 You: "Create comprehensive visualization dashboard"
🧬 Proteomics Quick Analysis
# MS proteomics analysis pipeline
🦞 You: "Load MaxQuant proteinGroups.txt file"
🦞 You: "Perform quality control with missing value analysis"
🦞 You: "Apply log2 transformation and normalization"
🦞 You: "Run differential expression analysis treatment vs control"
🦞 You: "Generate volcano plots and pathway analysis"
📊 Bulk RNA-seq Differential Expression
# Bulk RNA-seq with complex design
🦞 You: "Load counts.csv and metadata.csv with treatment, batch, and time factors"
🦞 You: "Design matrix using formula: ~treatment + batch + time + treatment:time"
🦞 You: "Run PyDESeq2 differential expression analysis"
🦞 You: "Test specific contrasts and create visualizations"
Data Loading & Management
Loading Different Data Formats
GEO Datasets
# Download and load GEO datasets
🦞 You: "Download GSE12345 from GEO and show dataset metadata"
🦞 You: "Download multiple datasets: GSE11111, GSE22222, GSE33333"
🦞 You: "Search GEO for single-cell datasets related to cancer immunotherapy"
Pre-Download Metadata Validation (v0.2+)
Recommended Practice: Validate dataset metadata before downloading to save time and ensure datasets contain required fields.
Basic Validation - Check Required Fields
# Validate that a dataset has required metadata fields
🦞 You: "Validate GSE200997 for required fields: cell_type, tissue"
# Expected output:
## Metadata Validation Report for GSE200997
**Recommendation:** ✅ **PROCEED**
**Confidence Score:** 1.00/1.00
**Total Samples:** 23
### Field Analysis:
- **cell_type**: ✅ 100.0% coverage (values: 'Colon,Right,Cecum', 'Colon,Left,Sigmoid', ...)
- **tissue**: ✅ 100.0% coverage (values: 'Colorectal cancer')
### 💡 Recommendation Rationale:
All required fields are present with sufficient coverage. Dataset is suitable for analysis.
Validation with Specific Values - Drug Discovery
# Check if dataset has treatment response field with specific values
🦞 You: "Check if GSE179994 has treatment_response field with responder and non-responder values"
# This validates both field presence AND value content
# Useful for drug discovery and biomarker studies
Comparing Multiple Datasets
# Real-world scenario: Find best dataset for smoking study
🦞 You: "Search GEO for lung cancer single-cell datasets"
# Returns: GSE131907, GSE139555, GSE148071
# Validate each dataset for required metadata
🦞 You: "Validate GSE131907 for required fields: smoking_status, cancer_stage, treatment_history"
# Result: ⚠️ MANUAL_CHECK - Only 60% samples have smoking_status
🦞 You: "Validate GSE139555 for required fields: smoking_status, cancer_stage, treatment_history"
# Result: ✅ PROCEED - 100% coverage for all fields
🦞 You: "Validate GSE148071 for required fields: smoking_status, cancer_stage, treatment_history"
# Result: ❌ SKIP - Missing smoking_status field entirely
# Decision: Download GSE139555 based on metadata validation
🦞 You: "Download GSE139555 and prepare for analysis"
Time-Series Study Example
# Validate dataset has required time point information
🦞 You: "Validate GSE145281 for required fields: time_point, treatment, replicate"
# Check specific time point values are present
🦞 You: "Check if GSE145281 has time_point field with values: 0h, 6h, 12h, 24h"
Understanding Validation Results
Recommendation Types:
- ✅ PROCEED (Confidence ≥0.8): All required fields present with ≥80% coverage
- ⚠️ MANUAL_CHECK (Confidence 0.5-0.8): Partial coverage between 50-80%
- ❌ SKIP (Confidence <0.5): Missing critical fields or <50% coverage
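The three recommendation bands can be written as a small lookup. This is a minimal sketch, not Lobster's implementation: the function name `recommend` is hypothetical, and treating a score of exactly 0.8 as PROCEED is an assumption based on the "≥0.8" wording.

```python
def recommend(confidence: float) -> str:
    """Map a validation confidence score (0-1) to a recommendation label.

    Hypothetical helper: the thresholds follow the bands listed in this
    section, with the 0.8 boundary assigned to PROCEED (an assumption).
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.8:
        return "PROCEED"
    if confidence >= 0.5:
        return "MANUAL_CHECK"
    return "SKIP"


for score in (1.00, 0.60, 0.30):
    print(f"{score:.2f} -> {recommend(score)}")
```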
Benefits:
- ⏱️ Save time: 2-5 seconds validation vs 5-30 minutes full download
- 💾 Save storage: Avoid downloading datasets missing critical metadata
- 🎯 Better selection: Compare metadata across multiple candidates
- 📊 Field coverage: See actual sample-level completeness
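Field coverage can be read as the share of samples that carry a non-empty value for a field. The sketch below illustrates that reading only; `field_coverage` and the sample records are invented for the example, and Lobster's own calculation may differ.

```python
def field_coverage(samples: list, field: str) -> float:
    """Return the percentage of samples with a non-empty value for `field`.

    Hypothetical helper; missing keys, None, and empty strings all count
    as "not covered".
    """
    if not samples:
        return 0.0
    present = sum(1 for s in samples if s.get(field) not in (None, ""))
    return 100.0 * present / len(samples)


# Invented sample-level metadata records (not taken from any real dataset).
samples = [
    {"smoking_status": "never", "cancer_stage": "II"},
    {"smoking_status": "current", "cancer_stage": "III"},
    {"smoking_status": "", "cancer_stage": "I"},
    {"cancer_stage": "IV"},
    {"smoking_status": "former", "cancer_stage": "II"},
]

print(f"smoking_status: {field_coverage(samples, 'smoking_status'):.1f}%")  # 60.0%
print(f"cancer_stage: {field_coverage(samples, 'cancer_stage'):.1f}%")      # 100.0%
```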
Common Use Cases:
- Drug discovery: Validate treatment response fields
- Biomarker studies: Check clinical outcome metadata
- Multi-dataset analysis: Filter by metadata completeness
- Time series: Verify timepoint field exists
Local Files
# Load various file formats
🦞 You: "Load the H5AD file from /path/to/data.h5ad"
🦞 You: "Load 10X data from /path/to/10x/directory with matrix.mtx, barcodes.tsv, features.tsv"
🦞 You: "Load CSV file with first column as gene names and samples as columns"
🦞 You: "Load Excel file from sheet 'RNAseq_counts' with genes as rows"
Proteomics Files
# Load proteomics data
🦞 You: "Load MaxQuant proteinGroups.txt file from /path/to/file"
🦞 You: "Load Olink NPX data from olink_results.xlsx"
🦞 You: "Load Spectronaut output with protein intensity values"
Data Management Commands
# Workspace management
🦞 You: "/files" # List all loaded files
🦞 You: "/data" # Show current dataset info
🦞 You: "/workspace" # Show workspace status
🦞 You: "/tree" # Directory tree view
# Data operations
🦞 You: "/read filename.csv" # Read and display file contents
🦞 You: "/plots" # List generated visualizations
🦞 You: "/export results" # Export analysis results
Example Datasets Reference
This section provides curated, publicly accessible datasets for learning Lobster AI. Each dataset includes quickstart commands, suggested analyses, and cross-references to detailed tutorials. The GEO datasets (GSE accessions) can be downloaded directly through Lobster; the proteomics examples (a ProteomeXchange PXD accession and an Olink panel) are loaded from processed result files.
Single-Cell RNA-seq Datasets
GSE109564 - Peripheral Blood Mononuclear Cells (PBMC)
Details:
- Organism: Human
- Technology: 10X Chromium (3' v2)
- Cells: ~15,000 cells across 8 samples
- Samples: Healthy donors, replicate experiments
- Description: High-quality PBMC dataset ideal for learning cell type annotation and clustering
Quickstart:
🦞 You: "Download GSE109564 from GEO"
🦞 You: "Assess quality and filter low-quality cells"
🦞 You: "Normalize, find variable genes, and cluster cells"
🦞 You: "Find marker genes and annotate cell types"
Suggested Analyses:
- Cell type annotation (T cells, B cells, monocytes, NK cells)
- Quality control and filtering
- Batch effect assessment across replicates
- UMAP visualization with cell type labels
Tutorial Reference: Single-Cell Tutorial
GSE131907 - Lung Cancer with Smoking Status
Details:
- Organism: Human
- Technology: 10X Chromium
- Cells: ~52,000 cells
- Samples: 44 patients (lung cancer vs normal)
- Description: Comprehensive lung cancer dataset with smoking status metadata
Quickstart:
🦞 You: "Validate GSE131907 for required fields: smoking_status, cancer_stage"
🦞 You: "Download GSE131907 from GEO"
🦞 You: "Compare cell type composition between cancer and normal samples"
🦞 You: "Find differentially expressed genes in tumor-infiltrating immune cells"
Suggested Analyses:
- Cell type annotation (epithelial, immune, stromal)
- Differential composition analysis (cancer vs normal)
- Smoking-associated gene expression changes
- Tumor microenvironment characterization
Tutorial Reference: Single-Cell Tutorial
GSE139555 - Colorectal Cancer Organoids
Details:
- Organism: Human
- Technology: 10X Chromium
- Cells: ~23,000 cells
- Samples: Patient-derived organoids
- Description: Cancer organoid model with treatment response metadata
Quickstart:
🦞 You: "Validate GSE139555 for required fields: cell_type, tissue"
🦞 You: "Download GSE139555 from GEO"
🦞 You: "Identify stem cell populations in organoids"
🦞 You: "Analyze differentiation trajectories using pseudotime"
Suggested Analyses:
- Trajectory analysis (stem cell → differentiated)
- Cell cycle analysis
- Treatment response biomarkers
- Organoid vs primary tissue comparison
Tutorial Reference: Single-Cell Tutorial, Section on Trajectory Analysis
Bulk RNA-seq Datasets
GSE180759 - Time Series Treatment Response
Details:
- Organism: Human
- Technology: Illumina HiSeq
- Samples: 32 samples (4 timepoints × 2 conditions × 4 replicates)
- Description: Drug treatment time series with matched controls
Quickstart:
🦞 You: "Download GSE180759 from GEO"
🦞 You: "Design matrix for time series: ~treatment + time + treatment:time"
🦞 You: "Run PyDESeq2 differential expression analysis"
🦞 You: "Identify genes with different temporal patterns between treatment and control"
Suggested Analyses:
- Time series differential expression
- Treatment × time interaction effects
- Temporal gene clustering
- Pathway enrichment analysis
Tutorial Reference: Bulk RNA-seq Tutorial, Section on Time Series Analysis
GSE165595 - Multi-Factor Experimental Design
Details:
- Organism: Mouse
- Technology: Illumina NovaSeq
- Samples: 48 samples (3 genotypes × 2 treatments × 8 replicates)
- Description: Complex factorial design with genotype and treatment factors
Quickstart:
🦞 You: "Download GSE165595 from GEO"
🦞 You: "Create design matrix: ~genotype + treatment + genotype:treatment"
🦞 You: "Test genotype×treatment interaction effects"
🦞 You: "Generate volcano plots for each contrast"
Suggested Analyses:
- Factorial design analysis
- Interaction effect testing
- Genotype-specific treatment responses
- Multi-contrast comparisons
Tutorial Reference: Bulk RNA-seq Tutorial, Section on Complex Designs
Mass Spectrometry Proteomics Datasets
PXD020394 - DIA Plasma Proteomics
Details:
- Organism: Human
- Technology: DIA (Orbitrap)
- Samples: 40 samples (disease vs control)
- Description: Data-independent acquisition workflow with moderate missing values
Quickstart:
🦞 You: "Load Spectronaut output from PXD020394 processed files"
🦞 You: "Analyze missing value patterns and apply MNAR imputation"
🦞 You: "Perform TMM normalization"
🦞 You: "Run limma differential expression disease vs control"
Suggested Analyses:
- Missing value analysis and imputation
- Batch effect correction
- Differential protein expression
- Pathway enrichment analysis
Tutorial Reference: Proteomics Tutorial, Section on MS Proteomics
Affinity Proteomics Datasets
Olink Explore Panel - Inflammation Study
Details:
- Organism: Human
- Technology: Olink Explore 3072
- Samples: 100 samples (multiple inflammatory conditions)
- Description: High-throughput targeted proteomics with low missing values
Quickstart:
🦞 You: "Load Olink NPX data from inflammation_study.xlsx"
🦞 You: "Calculate coefficient of variation for all proteins"
🦞 You: "Run ANOVA across multiple conditions"
🦞 You: "Create heatmap of significant proteins"
Suggested Analyses:
- Quality assessment (CV, detection frequency)
- Multi-group comparison (ANOVA)
- Protein correlation networks
- Inflammation pathway analysis
Tutorial Reference: Proteomics Tutorial, Section on Affinity Proteomics
Dataset Selection Tips
For Learning Basics:
- Start with GSE109564 (PBMC) for single-cell analysis
- Use GSE180759 for bulk RNA-seq time series
- Try Olink data for proteomics (low missing values, easier QC)
For Advanced Workflows:
- GSE131907 (lung cancer) for complex metadata validation
- GSE165595 (multi-factor) for interaction effects
- PXD020394 (DIA) for missing value handling
For Multi-Omics Integration:
- Look for paired datasets with matching samples
- Search GEO for studies with both transcriptomics and proteomics
- Use the /search command:
🦞 You: "Search GEO for paired RNA-seq and proteomics data"
Validation Before Download:
# Always validate metadata first (v0.2+)
🦞 You: "Validate GSE##### for required fields: field1, field2"
# Expected output shows:
# - Recommendation: ✅ PROCEED, ⚠️ MANUAL_CHECK, or ❌ SKIP
# - Confidence score (0-1)
# - Field coverage percentages
# - Sample unique values
Validating first avoids spending 5-30 minutes downloading a dataset that turns out to lack the required metadata.
Single-Cell Analysis Recipes
Quality Control Patterns
Standard QC Pipeline
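The outlier rule used in this recipe (more than 25% mitochondrial content, or fewer than 200 detected genes) amounts to a two-condition filter. The sketch below shows that logic on invented per-cell metrics; the function and field names are hypothetical and do not describe Lobster's internals.

```python
def passes_qc(pct_mito: float, n_genes: int,
              max_pct_mito: float = 25.0, min_genes: int = 200) -> bool:
    """Keep a cell only if it is within both thresholds.

    A cell is an outlier when pct_mito > max_pct_mito OR n_genes < min_genes.
    """
    return pct_mito <= max_pct_mito and n_genes >= min_genes


# Invented per-cell QC metrics for illustration.
cells = {
    "cell_a": {"pct_mito": 4.2, "n_genes": 1850},
    "cell_b": {"pct_mito": 31.0, "n_genes": 900},   # too much mitochondrial signal
    "cell_c": {"pct_mito": 8.5, "n_genes": 150},    # too few detected genes
}

kept = [name for name, m in cells.items()
        if passes_qc(m["pct_mito"], m["n_genes"])]
print(kept)  # ['cell_a']
```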
🦞 You: "Calculate QC metrics including mitochondrial genes, ribosomal genes, and total UMI counts"
🦞 You: "Generate QC violin plots showing distributions across samples"
🦞 You: "Identify outlier cells with >25% mitochondrial genes or <200 total genes"
🦞 You: "Filter cells and genes based on QC thresholds"
Advanced QC
🦞 You: "Detect doublets using scrublet algorithm"
🦞 You: "Analyze batch effects using PCA and show batch contributions"
🦞 You: "Calculate and visualize sample mixing scores"
🦞 You: "Generate comprehensive QC report with all metrics"
Preprocessing Recipes
Basic Preprocessing
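"Normalize to 10,000 UMI per cell and log-transform" usually means scaling each cell's counts so they sum to 10,000 and then applying log(1 + x). The sketch below shows that arithmetic for a single cell in plain Python; the natural-log `log1p` convention is an assumption, and `normalize_log1p` is a hypothetical name.

```python
import math


def normalize_log1p(counts, target_sum=10_000):
    """Scale one cell's counts to `target_sum`, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        # An empty cell stays all-zero instead of dividing by zero.
        return [0.0] * len(counts)
    return [math.log1p(c * target_sum / total) for c in counts]


cell = [0, 3, 1, 6]  # invented UMI counts for four genes
normalized = normalize_log1p(cell)
print([round(v, 3) for v in normalized])
```

Undoing the log with `math.expm1` recovers values that sum to the 10,000 target, which is a quick sanity check on the scaling step.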
🦞 You: "Normalize to 10,000 UMI per cell and log-transform"
🦞 You: "Find highly variable genes using seurat method with 2000 genes"
🦞 You: "Scale data and regress out mitochondrial gene effects"
Batch Correction
🦞 You: "Apply Harmony batch correction for samples from different batches"
🦞 You: "Use ComBat for batch correction and compare before/after PCA plots"
🦞 You: "Apply scanorama integration for multiple samples"
Clustering & Annotation
Basic Clustering
🦞 You: "Perform PCA with 50 components and generate elbow plot"
🦞 You: "Build neighbor graph with 15 neighbors and compute UMAP"
🦞 You: "Run Leiden clustering with resolution 0.5 and evaluate cluster stability"
Advanced Clustering
🦞 You: "Test multiple clustering resolutions from 0.1 to 2.0 and compare results"
🦞 You: "Perform hierarchical clustering and cut dendrogram at different levels"
🦞 You: "Use Louvain clustering and compare with Leiden results"
Cell Type Annotation
🦞 You: "Find marker genes for each cluster using Wilcoxon test"
🦞 You: "Annotate clusters using canonical immune cell markers"
🦞 You: "Use automated cell type annotation with CellTypist"
🦞 You: "Create manual annotation based on expert knowledge"
Trajectory Analysis
🦞 You: "Infer pseudotime using diffusion pseudotime (DPT)"
🦞 You: "Perform RNA velocity analysis to show differentiation dynamics"
🦞 You: "Create trajectory plots showing cellular transitions"
🦞 You: "Identify genes that change along the trajectory"
Bulk RNA-seq Workflows
Experimental Design Recipes
Simple Two-Group Comparison
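The export step in this recipe filters on "log2FC > 1 and FDR < 0.05". Read literally, that keeps only up-regulated genes; many analyses apply the cutoff to the absolute fold change instead. The sketch below shows both readings on an invented results table. The column names `log2fc` and `padj` and the helper `significant` are hypothetical.

```python
# Invented differential expression results for illustration.
results = [
    {"gene": "GENE_A", "log2fc": 2.3, "padj": 0.001},
    {"gene": "GENE_B", "log2fc": 0.4, "padj": 0.0005},  # significant but small effect
    {"gene": "GENE_C", "log2fc": 1.8, "padj": 0.20},    # large effect, not significant
    {"gene": "GENE_D", "log2fc": -1.6, "padj": 0.01},   # down-regulated
]


def significant(rows, min_log2fc=1.0, max_padj=0.05, two_sided=False):
    """Return gene names passing the fold-change and FDR cutoffs.

    two_sided=False applies the cutoff as written (log2FC > 1);
    two_sided=True applies it to |log2FC| so down-regulated genes are kept.
    """
    kept = []
    for row in rows:
        effect = abs(row["log2fc"]) if two_sided else row["log2fc"]
        if effect > min_log2fc and row["padj"] < max_padj:
            kept.append(row["gene"])
    return kept


print(significant(results))                  # ['GENE_A']
print(significant(results, two_sided=True))  # ['GENE_A', 'GENE_D']
```

Stating in the prompt which of the two readings is intended avoids silently dropping down-regulated genes.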
🦞 You: "Design matrix for treatment vs control comparison"
🦞 You: "Run DESeq2 with formula ~condition"
🦞 You: "Generate MA plot and volcano plot"
🦞 You: "Export significant genes with log2FC > 1 and FDR < 0.05"
Multi-Factor Design
🦞 You: "Create design matrix: ~treatment + sex + age + treatment:sex"
🦞 You: "Test main effect of treatment controlling for sex and age"
🦞 You: "Test treatment×sex interaction term"
🦞 You: "Generate contrast for treatment effect in females only"
Time Course Analysis
🦞 You: "Model time as continuous variable: ~condition + time + condition:time"
🦞 You: "Identify genes with linear temporal changes"
🦞 You: "Find genes with different temporal patterns between conditions"
🦞 You: "Cluster genes by temporal expression profiles"
Batch Effect Handling
🦞 You: "Include batch in design: ~batch + condition"
🦞 You: "Apply ComBat batch correction before analysis"
🦞 You: "Use RUVSeq to identify and remove unwanted variation"
🦞 You: "Compare results with and without batch correction"
Statistical Analysis Patterns
Multiple Contrasts
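Testing all pairwise comparisons between 4 conditions means 6 contrasts, which is worth knowing before choosing an FDR threshold. A minimal sketch of the enumeration, using placeholder condition names:

```python
from itertools import combinations

conditions = ["A", "B", "C", "D"]  # placeholder condition labels

# Every unordered pair of conditions is one contrast: n * (n - 1) / 2 in total.
contrasts = list(combinations(conditions, 2))

for first, second in contrasts:
    print(f"{first} vs {second}")
print(len(contrasts))  # 6
```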
🦞 You: "Define custom contrasts: early_treatment, late_treatment, time_effect"
🦞 You: "Test all pairwise comparisons between 4 conditions"
🦞 You: "Apply different FDR thresholds: 0.01, 0.05, 0.1"
🦞 You: "Compare results across different statistical methods"
Effect Size Analysis
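Cohen's d, which the prompts in this recipe request, is the difference between two group means divided by their pooled standard deviation. The sketch below is the textbook two-sample formula in plain Python, offered as an illustration only; the expression values are invented.

```python
import math
from statistics import mean, variance


def cohens_d(group1, group2):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((n1 - 1) * variance(group1) +
                  (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)


# Invented log-expression values for one gene.
treated = [8.1, 7.9, 8.4, 8.0]
control = [7.0, 7.2, 6.9, 7.1]
print(f"Cohen's d = {cohens_d(treated, control):.2f}")
```

With only a few replicates per group, d is a noisy estimate, which is one reason to filter on significance and effect size together.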
🦞 You: "Calculate effect sizes (Cohen's d) for all significant genes"
🦞 You: "Filter results by both significance and effect size"
🦞 You: "Generate effect size distribution plots"
🦞 You: "Identify genes with large effects but moderate significance"
Proteomics Analysis Patterns
MS Proteomics Workflows
Data Preprocessing
🦞 You: "Load MaxQuant data and assess missing value patterns"
🦞 You: "Apply MNAR imputation for low-abundance proteins"
🦞 You: "Perform TMM normalization and batch correction"
🦞 You: "Filter proteins present in >50% of samples"
Differential Analysis
🦞 You: "Run limma differential analysis with empirical Bayes"
🦞 You: "Test multiple contrasts with appropriate FDR correction"
🦞 You: "Generate volcano plots with protein annotations"
🦞 You: "Export results with UniProt annotations"
Affinity Proteomics (Olink)
Quality Assessment
🦞 You: "Calculate coefficient of variation for all proteins"
🦞 You: "Assess antibody performance metrics"
🦞 You: "Generate QC dashboard with detection frequencies"
🦞 You: "Identify failed samples and low-quality antibodies"
Statistical Analysis
🦞 You: "Perform ANOVA across multiple conditions"
🦞 You: "Run pairwise t-tests with multiple testing correction"
🦞 You: "Generate heatmap of significant proteins"
🦞 You: "Create protein correlation network"
Multi-Omics Integration
RNA-seq + Proteomics Integration
Correlation Analysis
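The r > 0.7 cutoff in this recipe refers to a per-gene correlation between mRNA and protein abundance across matched samples. The sketch below computes a Pearson correlation by hand and applies the cutoff; Pearson (rather than Spearman) is an assumption, and the abundance values are invented.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Invented (mRNA, protein) abundances across five matched samples.
pairs = {
    "GENE_A": ([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 2.1, 2.9, 4.2, 5.0]),
    "GENE_B": ([1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 1.0, 4.0, 2.0, 3.5]),
}

for gene, (rna, protein) in pairs.items():
    r = pearson_r(rna, protein)
    label = "high RNA-protein correlation" if r > 0.7 else "low correlation"
    print(f"{gene}: r = {r:.2f} ({label})")
```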
🦞 You: "Correlate mRNA and protein levels for matched genes"
🦞 You: "Identify genes with high RNA-protein correlation (r > 0.7)"
🦞 You: "Find proteins regulated post-translationally (low correlation)"
🦞 You: "Generate RNA vs protein scatter plots"
Pathway-Level Integration
🦞 You: "Perform pathway analysis on both RNA and protein data"
🦞 You: "Identify pathways significant in both datasets"
🦞 You: "Create integrated pathway heatmaps"
🦞 You: "Generate multi-omics pathway networks"
Single-Cell + Spatial Integration
🦞 You: "Map single-cell clusters to spatial locations"
🦞 You: "Identify spatial patterns of cell type distribution"
🦞 You: "Analyze cell-cell communication in spatial context"
🦞 You: "Generate integrated spatial-molecular visualizations"
Visualization Recipes
Basic Plotting Commands
Single-Cell Visualizations
🦞 You: "Create UMAP plot colored by cell type annotations"
🦞 You: "Generate violin plots of marker genes by cluster"
🦞 You: "Create feature plots showing gene expression on UMAP"
🦞 You: "Make dotplot of top marker genes by cell type"
Bulk RNA-seq Plots
🦞 You: "Create volcano plot with gene labels for top hits"
🦞 You: "Generate MA plot showing fold-change vs abundance"
🦞 You: "Make heatmap of top 50 differentially expressed genes"
🦞 You: "Create PCA plot colored by experimental conditions"
Proteomics Visualizations
🦞 You: "Generate missing value heatmap for MS data"
🦞 You: "Create volcano plot for protein differential expression"
🦞 You: "Make correlation network of significantly changed proteins"
🦞 You: "Generate QC dashboard for Olink data"
Advanced Visualization Techniques
Interactive Dashboards
🦞 You: "Create interactive dashboard with multiple panels"
🦞 You: "Generate plotly-based exploration interface"
🦞 You: "Make filterable data tables with visualizations"
🦞 You: "Create animated plots showing temporal changes"
Publication-Ready Figures
🦞 You: "Export high-resolution figures as 300 DPI PNG or vector SVG"
🦞 You: "Create multi-panel figures with consistent styling"
🦞 You: "Generate figures with publication-appropriate fonts and colors"
🦞 You: "Export figure data for manual customization"
Advanced Analysis Techniques
Machine Learning Integration
Dimensionality Reduction
🦞 You: "Apply t-SNE with different perplexity values"
🦞 You: "Use UMAP with custom distance metrics"
🦞 You: "Perform diffusion maps for trajectory inference"
🦞 You: "Apply autoencoders for non-linear dimension reduction"
Classification and Prediction
🦞 You: "Train classifier to predict cell types from expression"
🦞 You: "Build regression model for continuous phenotypes"
🦞 You: "Perform cross-validation and assess model performance"
🦞 You: "Use feature selection to identify predictive genes"
Network Analysis
🦞 You: "Build gene co-expression networks using WGCNA"
🦞 You: "Create protein-protein interaction networks"
🦞 You: "Analyze network topology and identify hubs"
🦞 You: "Perform network-based pathway analysis"
Functional Analysis
Pathway Enrichment
🦞 You: "Run Gene Ontology enrichment analysis"
🦞 You: "Perform KEGG pathway analysis"
🦞 You: "Use Reactome for pathway annotation"
🦞 You: "Create enrichment plots with significance levels"
Gene Set Analysis
🦞 You: "Perform GSEA with custom gene sets"
🦞 You: "Test enrichment of MSigDB collections"
🦞 You: "Create leading edge analysis plots"
🦞 You: "Generate enrichment heatmaps"
Automation & Scripting
Batch Processing Workflows
Process Multiple Datasets
#!/usr/bin/env python3
# Python script for batch processing
from pathlib import Path

from lobster.core.client import AgentClient
from lobster.core.data_manager_v2 import DataManagerV2


def batch_process_datasets(dataset_paths, output_dir):
    """Process multiple datasets with standard pipeline."""
    for dataset_path in dataset_paths:
        print(f"Processing {dataset_path}")
        # Initialize a fresh workspace per dataset so results do not mix
        workspace = Path(output_dir) / f"analysis_{dataset_path.stem}"
        data_manager = DataManagerV2(workspace_path=workspace)
        client = AgentClient(data_manager=data_manager)
        # Standard analysis pipeline, run in order
        queries = [
            f"Load data from {dataset_path}",
            "Perform quality control and filtering",
            "Normalize and find variable genes",
            "Run clustering analysis",
            "Find marker genes and annotate cell types",
            "Export results and visualizations",
        ]
        for query in queries:
            result = client.query(query)
            if not result['success']:
                # Stop this dataset at the first failed step
                print(f"Failed: {query}")
                break
        else:
            # Reached only when every query succeeded (no break)
            print(f"Completed analysis for {dataset_path}")


# Usage
dataset_paths = [
    Path("data/dataset1.h5ad"),
    Path("data/dataset2.h5ad"),
    Path("data/dataset3.h5ad"),
]
batch_process_datasets(dataset_paths, "batch_results")
Automated Report Generation
🦞 You: "Generate automated analysis report for all loaded datasets"
🦞 You: "Create summary statistics table comparing all samples"
🦞 You: "Export standardized figure set for publication"
🦞 You: "Generate methods description for manuscript"
Parameter Optimization
Systematic Parameter Testing
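A sweep "from 0.1 to 2.0 in steps of 0.1" covers 20 resolution values. When building such a grid in a script, repeatedly adding 0.1 accumulates floating-point error, so this sketch derives each value from an integer index instead. The variable names are illustrative.

```python
# Build the grid from integer steps, then round, so values are exactly
# 0.1, 0.2, ..., 2.0 rather than 0.30000000000000004 and similar.
resolutions = [round(0.1 * i, 1) for i in range(1, 21)]

print(len(resolutions))                  # 20
print(resolutions[0], resolutions[-1])   # 0.1 2.0

# Each value can then be placed into a natural-language prompt.
prompts = [f"Run Leiden clustering with resolution {r}" for r in resolutions]
print(prompts[4])  # Run Leiden clustering with resolution 0.5
```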
🦞 You: "Test clustering resolutions from 0.1 to 2.0 in steps of 0.1"
🦞 You: "Compare different normalization methods and show results"
🦞 You: "Optimize PCA components by testing 10, 20, 30, 40, 50"
🦞 You: "Test different QC thresholds and compare cell numbers"
Performance Optimization
Memory-Efficient Processing
Large Dataset Handling
🦞 You: "Process large dataset >100k cells using chunked analysis"
🦞 You: "Use memory-efficient file formats (H5AD, Zarr)"
🦞 You: "Apply subsampling for initial exploration"
🦞 You: "Use sparse matrix operations for memory efficiency"
Parallel Processing
🦞 You: "Run analysis using multiple CPU cores"
🦞 You: "Process samples in parallel for batch analysis"
🦞 You: "Use GPU acceleration for large matrix operations"
🦞 You: "Optimize I/O operations for network storage"
Cloud Integration Patterns
# Set up cloud processing
export LOBSTER_CLOUD_KEY="your-api-key"
🦞 You: "Upload large dataset to cloud for processing"
🦞 You: "Run memory-intensive analysis on cloud infrastructure"
🦞 You: "Download results and visualizations locally"
🦞 You: "Switch between local and cloud processing seamlessly"
Common Analysis Combinations
Complete Single-Cell Pipeline
# Full single-cell analysis workflow
🦞 You: "Download GSE datasets for single-cell immune atlas"
🦞 You: "Merge multiple samples and batch correct"
🦞 You: "Perform comprehensive quality control"
🦞 You: "Apply clustering and cell type annotation"
🦞 You: "Generate trajectory analysis and pseudotime"
🦞 You: "Create publication dashboard"
Integrated Multi-Omics Analysis
# Multi-omics integration workflow
🦞 You: "Load paired RNA-seq and proteomics data"
🦞 You: "Perform quality control on both datasets"
🦞 You: "Run differential analysis for each platform"
🦞 You: "Correlate changes across omics layers"
🦞 You: "Perform integrated pathway analysis"
🦞 You: "Generate multi-omics summary report"
Time Series Analysis
# Temporal analysis workflow
🦞 You: "Load time series data with multiple time points"
🦞 You: "Model temporal patterns using spline regression"
🦞 You: "Identify genes with significant time trends"
🦞 You: "Cluster genes by temporal expression patterns"
🦞 You: "Create animated visualizations of changes"
Troubleshooting Recipes
Common Issues and Solutions
Data Loading Problems
# File format issues
🦞 You: "My CSV file has gene names in the first row - how to load correctly?"
🦞 You: "The H5AD file seems corrupted - can you validate and repair it?"
🦞 You: "Excel file has multiple sheets - load from specific sheet 'RNAseq'"
# Memory issues
🦞 You: "Dataset too large for memory - use chunked processing"
🦞 You: "Convert dense matrix to sparse format to save memory"
Analysis Issues
# Quality control
🦞 You: "No cells pass QC filters - adjust thresholds more liberally"
🦞 You: "Too many cells filtered out - show QC distribution plots"
# Clustering problems
🦞 You: "Clusters look poorly separated - try different resolution"
🦞 You: "Getting too many/few clusters - optimize parameters"
# Statistical issues
🦞 You: "No significant genes found - check sample sizes and effect sizes"
🦞 You: "P-value distribution looks biased - investigate data quality"
Performance Issues
# Speed optimization
🦞 You: "Analysis taking too long - use faster approximate methods"
🦞 You: "Enable parallel processing to speed up computation"
🦞 You: "Use cloud processing for large dataset analysis"
Tips and Best Practices
Natural Language Best Practices
Effective Query Patterns
# ✅ Good queries - specific and clear
🦞 You: "Load single-cell data from GSE12345 and perform quality control"
🦞 You: "Run differential expression between condition A and B using DESeq2"
🦞 You: "Create UMAP plot colored by cell type with cluster labels"
# ❌ Avoid vague queries
🦞 You: "Analyze my data"
🦞 You: "Make some plots"
🦞 You: "Fix the problem"
Providing Context
# ✅ Include relevant context
🦞 You: "I have Visium spatial data with 2000 spots - cluster into tissue regions"
🦞 You: "This is proteomics data from MaxQuant with 40% missing values"
🦞 You: "Time series RNA-seq with samples at 0h, 6h, 12h, 24h timepoints"
Analysis Strategy Tips
- Start Simple: Begin with basic analyses before complex workflows
- Check Quality: Always perform QC before downstream analysis
- Validate Results: Cross-check findings with different methods
- Document Parameters: Keep track of analysis settings
- Save Checkpoints: Export intermediate results regularly
Performance Tips
- Use Appropriate Data Types: Sparse matrices for single-cell, dense for bulk
- Optimize Memory: Filter unnecessary genes/cells early
- Parallel Processing: Leverage multiple cores when available
- Cloud Resources: Use cloud for large-scale analyses
- Caching: Reuse preprocessed data when possible
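The advice to use sparse matrices for single-cell data comes down to simple arithmetic. The sketch below estimates the memory of a dense array against a CSR (compressed sparse row) layout; the 5% density, 4-byte values, and 4-byte indices are illustrative assumptions, and the helper names are hypothetical.

```python
def dense_bytes(n_rows: int, n_cols: int, itemsize: int = 4) -> int:
    """Memory for a dense matrix: one value per entry."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows: int, n_cols: int, density: float,
              itemsize: int = 4, index_size: int = 4) -> int:
    """Approximate memory for a CSR matrix.

    CSR stores one value and one column index per non-zero entry,
    plus a row-pointer array of length n_rows + 1.
    """
    nnz = round(n_rows * n_cols * density)
    return nnz * (itemsize + index_size) + (n_rows + 1) * index_size


n_cells, n_genes = 100_000, 20_000
dense = dense_bytes(n_cells, n_genes)
sparse = csr_bytes(n_cells, n_genes, density=0.05)  # assumed 5% non-zero

print(f"dense:  {dense / 1e9:.1f} GB")   # dense:  8.0 GB
print(f"sparse: {sparse / 1e9:.1f} GB")  # sparse: 0.8 GB
```

Because each stored non-zero costs a value plus an index, CSR only pays off while most entries are zero. Bulk RNA-seq matrices are typically much denser, which is why the dense layout is suggested for them.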
This cookbook provides a comprehensive collection of practical examples for using Lobster AI effectively. Each recipe can be adapted to your specific datasets and analysis needs. For more detailed tutorials, see the individual tutorial documents for single-cell, bulk RNA-seq, and proteomics analysis.