Machine Learning

In Development — This package is not yet published to PyPI. APIs, tool signatures, and agent behavior will change before release.

lobster-ml

Free, Advanced

ML data preparation: feature selection, survival analysis, cross-validation, and model interpretability for omics data

Input

AnnData, Expression Matrices, Survival Data, Multi-Omics

Output

Selected Features, Cox Models, SHAP Values, MOFA Factors, Enriched Pathways

Agents (3)

machine_learning_expert - ML preparation and sub-agent routing
├── feature_selection_expert - Biomarker discovery and feature ranking
└── survival_analysis_expert - Cox models and Kaplan-Meier analysis

pip install lobster-ml

Agents

machine_learning_expert

The main orchestrator for machine learning workflows, coordinating between specialized sub-agents.

Capabilities:

ML data preparation and feature engineering
Data splitting and framework export (PyTorch, TensorFlow)
Delegation to feature selection and survival analysis sub-agents
Multi-omics integration via MOFA
Pathway enrichment analysis via INDRA

feature_selection_expert

Specialized agent for biomarker discovery and feature selection in high-dimensional omics data.

Capabilities:

Stability selection (Meinshausen & Bühlmann selection probabilities)
LASSO and Elastic Net regularization
Variance-based filtering with chunked computation for large matrices
Importance ranking and automatic feature detection
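
The chunked variance filter listed above can be sketched in plain Python. This is an illustration of the general technique, not the FeatureSelectionService implementation: the function names and the chunk_size and threshold parameters are assumptions. Per-feature statistics are merged chunk by chunk with a pairwise variance update, so only one chunk of rows is processed at a time.

```python
from typing import List, Sequence


def chunked_feature_variance(rows: Sequence[Sequence[float]],
                             chunk_size: int = 1000) -> List[float]:
    """Per-feature population variance, accumulated one row chunk at a time."""
    n_features = len(rows[0])
    count = 0
    mean = [0.0] * n_features
    m2 = [0.0] * n_features  # running sum of squared deviations per feature

    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        n_b = len(chunk)
        total = count + n_b
        for j in range(n_features):
            col = [row[j] for row in chunk]
            mean_b = sum(col) / n_b
            m2_b = sum((x - mean_b) ** 2 for x in col)
            delta = mean_b - mean[j]
            # Merge the chunk statistics into the running statistics
            m2[j] += m2_b + delta * delta * count * n_b / total
            mean[j] += delta * n_b / total
        count = total

    return [v / count for v in m2]


def variance_filter(rows: Sequence[Sequence[float]],
                    threshold: float = 0.0,
                    chunk_size: int = 1000) -> List[int]:
    """Indices of features whose variance exceeds the threshold."""
    variances = chunked_feature_variance(rows, chunk_size)
    return [j for j, v in enumerate(variances) if v > threshold]
```

With a real expression matrix the same update would normally be vectorized over features; the loop form is kept here only to make the merge step explicit.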

survival_analysis_expert

Specialized agent for time-to-event analysis and risk stratification.

Capabilities:

Cox proportional hazards models (unregularized and regularized)
Kaplan-Meier survival curves with median survival and RMST
C-index reporting with three-tier validation (test, CV, training)
Threshold optimization with censoring-aware handling
Risk stratification and hazard ratio computation
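
For readers unfamiliar with the Kaplan-Meier estimator and the median survival it yields, here is a minimal plain-Python sketch. It is not the code used by survival_analysis_expert; the function names are assumptions, and RMST is left out for brevity.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, S(t)) at each distinct time with at least one event.

    events[i] is 1 if the event was observed and 0 if the subject was censored.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(order):
        t = times[order[i]]
        deaths = 0
        leaving = 0
        # Group all subjects that share this time point
        while i < len(order) and times[order[i]] == t:
            deaths += events[order[i]]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set
        at_risk -= leaving
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which S(t) is at or below 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

Censored subjects do not cause a drop in the curve, but they shrink the risk set for later time points, which is why censoring has to be tracked explicitly.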

Example Workflows

ML Feature Preparation

User: Prepare my scRNA-seq data for machine learning classification

[machine_learning_expert]
- Loads AnnData expression matrix
- Scales features and handles missing values
- Applies SMOTE for class imbalance (marks synthetic samples)
- Splits into train/test sets
- Exports to PyTorch-compatible format
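
The split and scaling steps of this workflow can be sketched as follows. This is an illustration under stated assumptions, not the MLPreprocessingService code: the helper names are hypothetical, SMOTE and missing-value handling are omitted, and the scaler is fitted on the training rows only (a design choice of this sketch, which the workflow above does not specify).

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) that roughly preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_fraction))
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_scaler(rows):
    """Per-feature mean and standard deviation of the given rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard constant features
    return means, stds


def transform(rows, means, stds):
    """Apply z-score scaling with previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy data: 12 samples, 2 features, two imbalanced classes
X = [[float(i), float(i % 3)] for i in range(12)]
y = ["alpha"] * 8 + ["beta"] * 4

train_idx, test_idx = stratified_split(y)
means, stds = fit_scaler([X[i] for i in train_idx])
X_train = transform([X[i] for i in train_idx], means, stds)
X_test = transform([X[i] for i in test_idx], means, stds)
```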

Biomarker Discovery (Feature Selection)

User: Find the most stable biomarkers that distinguish alpha
      cells from beta cells in my pancreas scRNA-seq data

[machine_learning_expert delegates to feature_selection_expert]
- Runs stability selection (50 bootstrap rounds)
- Uses Random Forest or XGBoost importance scoring
- Applies variance filter to remove low-information features
- Reports top stable features with selection probabilities
- Expects biologically meaningful genes (e.g., INS, GCG, SST)
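
The idea behind the reported selection probabilities can be sketched as follows. This is a simplified illustration, not the agent's implementation: it swaps in a toy base selector (ranking by difference in class means) in place of Random Forest or XGBoost importance, draws stratified half-subsamples rather than bootstrap samples, and all function names and parameters are assumptions.

```python
import random
from statistics import mean


def top_k_by_mean_difference(rows, labels, k):
    """Toy base selector: rank features by |mean(class 1) - mean(class 0)|."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def stability_selection(rows, labels, k=2, n_rounds=50, seed=0):
    """Fraction of resampling rounds in which each feature is selected."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    counts = [0] * len(rows[0])
    for _ in range(n_rounds):
        # Draw half of each class so both classes are always represented
        sample = (rng.sample(pos_idx, max(1, len(pos_idx) // 2))
                  + rng.sample(neg_idx, max(1, len(neg_idx) // 2)))
        selected = top_k_by_mean_difference(
            [rows[i] for i in sample], [labels[i] for i in sample], k)
        for j in selected:
            counts[j] += 1
    return [c / n_rounds for c in counts]
```

A feature that is selected in nearly every round is considered stable; one that is picked only occasionally is likely to be an artifact of a particular subsample.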

Survival Analysis

User: Run survival analysis using the selected biomarkers
      with my clinical outcome data

[machine_learning_expert delegates to survival_analysis_expert]
- Fits Cox PH model on selected features
- Validates with C-index (test set preferred)
- Generates Kaplan-Meier curves for risk groups
- Reports hazard ratios and confidence intervals
- Saves model to workspace/models/
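
As background for the C-index validation step, here is a minimal sketch of Harrell's concordance index in plain Python. It is illustrative rather than the library's implementation: the function name is an assumption, and pairs with tied event times are simply not counted as comparable.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that the risk scores order correctly.

    A pair (i, j) is comparable when subject i has an observed event and
    times[i] < times[j]. A higher risk score should mean a shorter survival time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. Computing it on held-out data rather than on the training set avoids an optimistic estimate.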

Multi-Omics Integration

User: Integrate my transcriptomics and proteomics data
      and run feature selection on the combined space

[machine_learning_expert]
- Validates sample overlap between modalities
- Runs MOFA-based integration (factors in adata.obsm['X_mofa'])
- Delegates to feature_selection_expert with feature_space_key="X_mofa"
- Reports top factors and pathway enrichment via INDRA
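
The sample-overlap check in the first step can be sketched as a set intersection over sample IDs. The function name, the min_shared parameter, and the dictionary layout are assumptions made for illustration; the actual validation in MultiOmicsIntegrationService may differ.

```python
def validate_sample_overlap(modalities, min_shared=2):
    """Return the sorted sample IDs present in every modality.

    modalities maps a modality name to its list of sample IDs.
    """
    if len(modalities) < 2:
        raise ValueError("need at least two modalities to integrate")
    shared = set.intersection(*(set(ids) for ids in modalities.values()))
    if len(shared) < min_shared:
        raise ValueError(
            f"only {len(shared)} shared samples across {sorted(modalities)}"
        )
    return sorted(shared)


shared_ids = validate_sample_overlap({
    "transcriptomics": ["s1", "s2", "s3"],
    "proteomics": ["s2", "s3", "s4"],
})
```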

Dependencies

lobster-ml requires lobster-ai as its core dependency. Domain-specific libraries are organized as optional extras:

Extra	Libraries	Purpose
ml	torch, scvi-tools	Deep learning, scVI embeddings
survival	scikit-survival	Cox models, Kaplan-Meier
interpretability	shap, interpret	SHAP values, model explanations
imbalanced	imbalanced-learn	SMOTE, class balancing
tuning	hyperopt	Hyperparameter optimization
full	All of the above	Complete ML stack

Install with extras:

pip install lobster-ml[survival]          # Just survival analysis
pip install lobster-ml[full]              # Everything
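
To check which optional libraries are importable in the current environment, a small helper along these lines may be useful. The helper is a hypothetical convenience, not part of lobster-ml. The import names (scvi for scvi-tools, sksurv for scikit-survival, imblearn for imbalanced-learn) are those used by the libraries in the table above.

```python
from importlib.util import find_spec

# Extra name -> top-level import names of the libraries it provides
EXTRA_MODULES = {
    "ml": ["torch", "scvi"],
    "survival": ["sksurv"],
    "interpretability": ["shap", "interpret"],
    "imbalanced": ["imblearn"],
    "tuning": ["hyperopt"],
}


def missing_modules(extra):
    """Return the modules of an extra that cannot be found for import."""
    return [name for name in EXTRA_MODULES[extra] if find_spec(name) is None]


for extra in EXTRA_MODULES:
    missing = missing_modules(extra)
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"{extra}: {status}")
```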

Services

lobster-ml includes specialized ML services:

Service	Purpose
FeatureSelectionService	Stability selection, LASSO, variance filtering
SurvivalAnalysisService	Cox PH models, Kaplan-Meier, threshold optimization
CrossValidationService	Stratified k-fold, nested CV, time series CV
InterpretabilityService	SHAP values, per-class explanations
MLPreprocessingService	SMOTE balancing, scaling, missing value handling
MultiOmicsIntegrationService	MOFA-based multi-omics factor analysis
PathwayEnrichmentBridgeService	GO/Reactome enrichment via INDRA Discovery API

Services follow the standard 3-tuple return pattern and are accessed internally by the agents.
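
This page does not spell out the three elements of the return tuple. The sketch below assumes a layout of (processed data, summary statistics, provenance record) purely for illustration; the function name, the dictionary keys, and the element order are all hypothetical and should be checked against the actual service code.

```python
from statistics import pvariance
from typing import Any, Dict, List, Tuple

Matrix = List[List[float]]


def filter_low_variance(
    matrix: Matrix, threshold: float
) -> Tuple[Matrix, Dict[str, Any], Dict[str, Any]]:
    """Hypothetical service method returning (data, stats, provenance)."""
    columns = list(zip(*matrix))
    keep = [j for j, col in enumerate(columns) if pvariance(col) > threshold]
    filtered = [[row[j] for j in keep] for row in matrix]
    # Assumed second element: human-readable summary statistics
    stats = {"n_features_in": len(columns), "n_features_kept": len(keep)}
    # Assumed third element: a record of what was run and with which parameters
    provenance = {
        "operation": "filter_low_variance",
        "parameters": {"threshold": threshold},
    }
    return filtered, stats, provenance


data, stats, provenance = filter_low_variance([[1.0, 2.0], [1.0, 4.0]], 0.5)
```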

Configuration

Enable ML agents in your workspace config:

# .lobster_workspace/config.toml
enabled = ["machine_learning_expert", "feature_selection_expert", "survival_analysis_expert"]

Or use a preset:

preset = "ml-full"

Sub-Agent Architecture

machine_learning_expert (supervisor, accessible from main supervisor)
|-- feature_selection_expert (sub-agent, not directly accessible)
|-- survival_analysis_expert (sub-agent, not directly accessible)

Sub-agents are reached only through delegation from machine_learning_expert, not directly from the main supervisor. This architecture allows the orchestrator to:

Route tasks - Determine which sub-agent is needed based on the analysis type
Manage state - Track ML pipeline progress across preprocessing, selection, and modeling
Synthesize results - Combine outputs from feature selection and survival analysis into unified reports
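
The routing responsibility can be pictured as a lookup from analysis type to sub-agent, with the orchestrator handling everything else itself. This is a deliberately simplified, hypothetical sketch: the analysis-type keys and the route function are assumptions, and the real agent may decide how to delegate in a different way.

```python
# Hypothetical mapping from analysis type to the sub-agent that handles it
SUB_AGENTS = {
    "feature_selection": "feature_selection_expert",
    "survival_analysis": "survival_analysis_expert",
}


def route(analysis_type: str) -> str:
    """Return the agent that should handle the given analysis type."""
    # Anything without a dedicated sub-agent stays with the orchestrator
    return SUB_AGENTS.get(analysis_type, "machine_learning_expert")
```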
