Omics-OS Docs

Machine Learning

ML data preparation, feature selection, survival analysis, and interpretability

In Development — This package is not yet published to PyPI. APIs, tool signatures, and agent behavior will change before release.

lobster-ml
Free · Advanced

ML data preparation: feature selection, survival analysis, cross-validation, and model interpretability for omics data

Input: AnnData, Expression Matrices, Survival Data, Multi-Omics
Output: Selected Features, Cox Models, SHAP Values, MOFA Factors, Enriched Pathways
Agents (3)
machine_learning_expert: ML preparation and sub-agent routing
|-- feature_selection_expert: Biomarker discovery and feature ranking
|-- survival_analysis_expert: Cox models and Kaplan-Meier analysis
pip install lobster-ml

Agents

machine_learning_expert

The main orchestrator for machine learning workflows, coordinating between specialized sub-agents.

Capabilities:

  • ML data preparation and feature engineering
  • Data splitting and framework export (PyTorch, TensorFlow)
  • Delegation to feature selection and survival analysis sub-agents
  • Multi-omics integration via MOFA
  • Pathway enrichment analysis via INDRA

feature_selection_expert

Specialized agent for biomarker discovery and feature selection in high-dimensional omics data.

Capabilities:

  • Stability selection (Meinshausen & Bühlmann selection probabilities)
  • LASSO and Elastic Net regularization
  • Variance-based filtering with chunked computation for large matrices
  • Importance ranking and automatic feature detection
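
The chunked variance filter listed above can be sketched as follows. This is a minimal illustration, not the lobster-ml implementation: the function names, the chunk_size default, and the top_k parameter are assumptions made for the example. The idea is to process one block of columns at a time, so only that block is cast to float64 in memory at once.

```python
import numpy as np

def chunked_feature_variance(X, chunk_size=1000):
    """Per-feature sample variance, computed one block of columns at a time."""
    n_samples, n_features = X.shape
    variances = np.empty(n_features, dtype=np.float64)
    for start in range(0, n_features, chunk_size):
        stop = min(start + chunk_size, n_features)
        # Only this block of columns is converted to float64 at once.
        block = np.asarray(X[:, start:stop], dtype=np.float64)
        variances[start:stop] = block.var(axis=0, ddof=1)
    return variances

def variance_filter(X, top_k, chunk_size=1000):
    """Return the sorted column indices of the top_k highest-variance features."""
    variances = chunked_feature_variance(X, chunk_size)
    top_k = min(top_k, variances.size)
    keep = np.argsort(variances)[::-1][:top_k]
    return np.sort(keep), variances

# Synthetic demo data (hypothetical): 200 samples x 5000 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5000))
X[:, :10] *= 5.0  # inflate the variance of the first ten features
keep, variances = variance_filter(X, top_k=10, chunk_size=512)
```

Because each feature's variance depends only on its own column, chunking over columns gives the same result as a single full-matrix call.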

survival_analysis_expert

Specialized agent for time-to-event analysis and risk stratification.

Capabilities:

  • Cox proportional hazards models (unregularized and regularized)
  • Kaplan-Meier survival curves with median survival and RMST
  • C-index reporting with three-tier validation (test, CV, training)
  • Threshold optimization with censoring-aware handling
  • Risk stratification and hazard ratio computation
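
To make the Kaplan-Meier quantities above concrete, here is a small NumPy sketch of the estimator together with median survival and restricted mean survival time (RMST). It is written from the textbook definitions under simple assumptions (right-censoring only, event coded as 1) and is not taken from the survival_analysis_expert code; all function names are illustrative.

```python
import numpy as np

def kaplan_meier(time, event):
    """Return (distinct event times, survival probability just after each)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv = np.empty(event_times.size)
    s = 1.0
    for i, t in enumerate(event_times):
        at_risk = np.sum(time >= t)             # subjects still under observation at t
        deaths = np.sum((time == t) & event)    # events observed exactly at t
        s *= 1.0 - deaths / at_risk
        surv[i] = s
    return event_times, surv

def median_survival(event_times, surv):
    """First time at which S(t) <= 0.5, or nan if the curve never gets there."""
    below = np.nonzero(surv <= 0.5)[0]
    return float(event_times[below[0]]) if below.size else float("nan")

def rmst(event_times, surv, tau):
    """Area under the Kaplan-Meier step function on [0, tau]."""
    mask = event_times < tau
    grid = np.concatenate(([0.0], event_times[mask], [tau]))
    heights = np.concatenate(([1.0], surv[mask]))
    return float(np.sum(np.diff(grid) * heights))

# Toy cohort (made up for illustration): event = 1 is an observed event, 0 is censored.
time = [5, 6, 6, 2, 4, 4]
event = [1, 0, 1, 1, 1, 0]
t, s = kaplan_meier(time, event)
median = median_survival(t, s)
area = rmst(t, s, tau=6.0)
```

Censored subjects contribute to the at-risk count until their censoring time but never trigger a drop in the curve, which is what makes the estimate censoring-aware.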

Example Workflows

ML Feature Preparation

User: Prepare my scRNA-seq data for machine learning classification

[machine_learning_expert]
- Loads AnnData expression matrix
- Scales features and handles missing values
- Applies SMOTE for class imbalance (marks synthetic samples)
- Splits into train/test sets
- Exports to PyTorch-compatible format
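
The preparation steps above can be sketched in plain NumPy. This is an illustrative outline under stated assumptions, not the MLPreprocessingService code: the function names are hypothetical, imputation uses the training median, and smote_like is a simplified SMOTE-style interpolation rather than the imbalanced-learn implementation. Synthetic rows are flagged so they can be excluded from evaluation later.

```python
import numpy as np

def stratified_split(y, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) that preserve class proportions."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.nonzero(y == label)[0])
        n_test = max(1, int(round(test_fraction * idx.size)))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return np.sort(train), np.sort(test)

def fit_preprocessor(X_train):
    """Learn imputation medians and scaling statistics from training data only."""
    median = np.nanmedian(X_train, axis=0)
    filled = np.where(np.isnan(X_train), median, X_train)
    mean, std = filled.mean(axis=0), filled.std(axis=0)
    std[std == 0] = 1.0  # constant features: avoid division by zero
    return median, mean, std

def apply_preprocessor(X, params):
    median, mean, std = params
    filled = np.where(np.isnan(X), median, X)
    return (filled - mean) / std

def smote_like(X_min, n_new, k=3, seed=0):
    """Interpolate between minority samples and their k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, X_min.shape[0], size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[partner] - X_min[base])

# Synthetic demo data (hypothetical): 60 samples, 8 features, 4:1 class imbalance.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))
X[rng.random(X.shape) < 0.05] = np.nan
y = np.array([0] * 48 + [1] * 12)

train_idx, test_idx = stratified_split(y)
params = fit_preprocessor(X[train_idx])
X_train = apply_preprocessor(X[train_idx], params)
X_test = apply_preprocessor(X[test_idx], params)
y_train = y[train_idx]

# Oversample the minority class in the training set only, and mark synthetic rows.
n_new = int((y_train == 0).sum() - (y_train == 1).sum())
X_bal = np.vstack([X_train, smote_like(X_train[y_train == 1], n_new)])
y_bal = np.concatenate([y_train, np.ones(n_new, dtype=int)])
is_synthetic = np.concatenate([np.zeros(len(y_train), bool), np.ones(n_new, bool)])
```

The sketch splits before imputing, scaling, or oversampling, so that no statistic or synthetic sample derived from the test set leaks into training. The order of the workflow bullets above does not necessarily reflect the order the agent uses internally.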

Biomarker Discovery (Feature Selection)

User: Find the most stable biomarkers that distinguish alpha
      cells from beta cells in my pancreas scRNA-seq data

[machine_learning_expert delegates to feature_selection_expert]
- Runs stability selection (50 bootstrap rounds)
- Uses Random Forest or XGBoost importance scoring
- Applies variance filter to remove low-information features
- Reports top stable features with selection probabilities
- Expects biologically meaningful genes (e.g., INS, GCG, SST)
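
As a rough illustration of how selection probabilities arise, the sketch below repeats a base selector over 50 random half-subsamples and reports how often each feature is chosen. Several things here are assumptions for the example: the base selector is a dependency-free stand-in (absolute Pearson correlation with the label) rather than Random Forest or XGBoost importance, subsamples are drawn without replacement as in the original Meinshausen & Bühlmann procedure, and the top_k and threshold parameters are hypothetical.

```python
import numpy as np

def stability_selection(X, y, n_rounds=50, top_k=5, threshold=0.6, seed=0):
    """Selection probability per feature over random half-subsamples."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    counts = np.zeros(n_features)
    for _ in range(n_rounds):
        idx = rng.choice(n_samples, size=n_samples // 2, replace=False)
        Xs, ys = X[idx], y[idx]
        # Stand-in base selector: absolute Pearson correlation with the label.
        Xc = Xs - Xs.mean(axis=0)
        yc = ys - ys.mean()
        denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
        score = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
        counts[np.argsort(score)[-top_k:]] += 1   # features selected this round
    prob = counts / n_rounds
    stable = np.nonzero(prob >= threshold)[0]
    return prob, stable

# Synthetic demo data (hypothetical): two groups, three informative features.
rng = np.random.default_rng(42)
y = np.repeat([0.0, 1.0], 60)
X = rng.normal(size=(120, 300))
X[:, :3] += 3.0 * y[:, None]
prob, stable = stability_selection(X, y)
```

Features that are selected in most rounds regardless of which samples were drawn get probabilities near 1, while features picked up by chance in a few rounds stay well below the threshold.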

Survival Analysis

User: Run survival analysis using the selected biomarkers
      with my clinical outcome data

[machine_learning_expert delegates to survival_analysis_expert]
- Fits Cox PH model on selected features
- Validates with C-index (test set preferred)
- Generates Kaplan-Meier curves for risk groups
- Reports hazard ratios and confidence intervals
- Saves model to workspace/models/
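
The C-index used for validation above can be computed directly from its definition. The sketch below implements Harrell's concordance index in plain Python/NumPy under simplifying assumptions (pairs with tied times are skipped), and adds a hypothetical helper, preferred_c_index, that reflects one reading of "test set preferred": report the test score if available, otherwise the CV score, otherwise the training score. Neither function is taken from lobster-ml.

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell's C: share of comparable pairs in which higher risk fails earlier."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a comparable pair needs an observed event for the earlier time
        for j in range(len(time)):
            if time[j] > time[i]:  # j is known to have outlived i
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / comparable if comparable else float("nan")

def preferred_c_index(scores):
    """Pick the most trustworthy available score (hypothetical tier order)."""
    for tier in ("test", "cv", "training"):
        if scores.get(tier) is not None:
            return tier, scores[tier]
    raise ValueError("no C-index available")

# Toy data (made up): the third subject is censored at t=6.
time = [2, 4, 6, 8]
event = [1, 1, 0, 1]
c_perfect = concordance_index(time, event, risk=[4, 3, 2, 1])
c_mixed = concordance_index(time, event, risk=[4, 1, 2, 3])
tier, value = preferred_c_index({"test": None, "cv": 0.71, "training": 0.85})
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Training-set scores tend to be optimistic, which is why they sit last in the tier order.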

Multi-Omics Integration

User: Integrate my transcriptomics and proteomics data
      and run feature selection on the combined space

[machine_learning_expert]
- Validates sample overlap between modalities
- Runs MOFA-based integration (factors in adata.obsm['X_mofa'])
- Delegates to feature_selection_expert with feature_space_key="X_mofa"
- Reports top factors and pathway enrichment via INDRA
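
The sample-overlap check in the first step can be sketched as below. The function name, the min_overlap threshold, and the error message are assumptions for illustration; a plain dict stands in for adata.obsm so the example runs without anndata installed.

```python
import numpy as np

def align_modalities(samples_a, samples_b, min_overlap=0.5):
    """Return shared sample IDs plus row indices that put both modalities in the same order."""
    set_a, set_b = set(samples_a), set(samples_b)
    shared = sorted(set_a & set_b)
    smaller = min(len(set_a), len(set_b))
    if smaller == 0 or len(shared) / smaller < min_overlap:
        raise ValueError(f"only {len(shared)} shared samples; modalities cannot be integrated")
    pos_a = {s: i for i, s in enumerate(samples_a)}
    pos_b = {s: i for i, s in enumerate(samples_b)}
    idx_a = np.array([pos_a[s] for s in shared])
    idx_b = np.array([pos_b[s] for s in shared])
    return shared, idx_a, idx_b

# Hypothetical sample IDs for the two modalities.
rna_ids = ["s1", "s2", "s3", "s4"]
prot_ids = ["s3", "s1", "s5", "s4"]
shared, rna_idx, prot_idx = align_modalities(rna_ids, prot_ids)

# Stand-in for adata.obsm after integration: 3 shared samples x 2 factors.
obsm = {"X_mofa": np.zeros((len(shared), 2))}
feature_space_key = "X_mofa"
feature_space = obsm[feature_space_key]  # matrix handed to feature selection
```

Indexing each modality's matrix with its index array yields rows in identical sample order, which factor models such as MOFA need when views are stacked.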

Dependencies

lobster-ml requires lobster-ai as its core dependency. Domain-specific libraries are organized as optional extras:

Extra             Libraries           Purpose
ml                torch, scvi-tools   Deep learning, scVI embeddings
survival          scikit-survival     Cox models, Kaplan-Meier
interpretability  shap, interpret     SHAP values, model explanations
imbalanced        imbalanced-learn    SMOTE, class balancing
tuning            hyperopt            Hyperparameter optimization
full              All of the above    Complete ML stack

Install with extras:

pip install "lobster-ml[survival]"        # Just survival analysis
pip install "lobster-ml[full]"            # Everything (quotes keep shells like zsh from expanding the brackets)

Services

lobster-ml includes specialized ML services:

Service                         Purpose
FeatureSelectionService         Stability selection, LASSO, variance filtering
SurvivalAnalysisService         Cox PH models, Kaplan-Meier, threshold optimization
CrossValidationService          Stratified k-fold, nested CV, time series CV
InterpretabilityService         SHAP values, per-class explanations
MLPreprocessingService          SMOTE balancing, scaling, missing value handling
MultiOmicsIntegrationService    MOFA-based multi-omics factor analysis
PathwayEnrichmentBridgeService  GO/Reactome enrichment via INDRA Discovery API

Services follow the standard 3-tuple return pattern and are accessed internally by the agents.

Configuration

Enable ML agents in your workspace config:

# .lobster_workspace/config.toml
enabled = ["machine_learning_expert", "feature_selection_expert", "survival_analysis_expert"]

Or use a preset:

preset = "ml-full"

Sub-Agent Architecture

machine_learning_expert (supervisor, accessible from main supervisor)
|-- feature_selection_expert (sub-agent, not directly accessible)
|-- survival_analysis_expert (sub-agent, not directly accessible)

Sub-agents are accessed through machine_learning_expert delegation, not directly from the supervisor. This architecture allows the orchestrator to:

  1. Route tasks - Determine which sub-agent is needed based on the analysis type
  2. Manage state - Track ML pipeline progress across preprocessing, selection, and modeling
  3. Synthesize results - Combine outputs from feature selection and survival analysis into unified reports
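
The three responsibilities above can be pictured with a toy orchestrator. This is purely a conceptual sketch: the real machine_learning_expert presumably routes via its agent framework rather than keyword matching, and every name below except the three agent names (SUB_AGENTS, PipelineState, route, record, synthesize) is hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical keyword table standing in for the orchestrator's routing decision.
SUB_AGENTS = {
    "feature_selection_expert": ("biomarker", "feature selection", "stability", "lasso"),
    "survival_analysis_expert": ("survival", "cox", "kaplan-meier", "hazard"),
}

@dataclass
class PipelineState:
    """Tracks which stages have run and what each one reported."""
    completed: list = field(default_factory=list)
    results: dict = field(default_factory=dict)

def route(task: str) -> str:
    """1. Route tasks: pick a sub-agent, or keep the task in the orchestrator."""
    text = task.lower()
    for agent, keywords in SUB_AGENTS.items():
        if any(keyword in text for keyword in keywords):
            return agent
    return "machine_learning_expert"

def record(state: PipelineState, agent: str, summary: str) -> None:
    """2. Manage state: remember the stage order and each stage's output."""
    state.completed.append(agent)
    state.results[agent] = summary

def synthesize(state: PipelineState) -> str:
    """3. Synthesize results: merge the stage summaries into one report."""
    return "\n".join(f"[{agent}] {state.results[agent]}" for agent in state.completed)

state = PipelineState()
record(state, route("Find the most stable biomarkers"), "3 stable features")
record(state, route("Fit a Cox model on my clinical outcome data"), "C-index on test set")
report = synthesize(state)
```

Keeping routing, state, and synthesis in the orchestrator means the sub-agents stay narrow: each receives a task and returns a result, without needing to know about the other.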
