Machine Learning
ML data preparation, feature selection, survival analysis, and interpretability
In Development — This package is not yet published to PyPI. APIs, tool signatures, and agent behavior will change before release.
lobster-ml covers ML data preparation, feature selection, survival analysis, cross-validation, and model interpretability for omics data.
Agents
machine_learning_expert
The main orchestrator for machine learning workflows, coordinating between specialized sub-agents.
Capabilities:
- ML data preparation and feature engineering
- Data splitting and framework export (PyTorch, TensorFlow)
- Delegation to feature selection and survival analysis sub-agents
- Multi-omics integration via MOFA
- Pathway enrichment analysis via INDRA
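The data-splitting capability listed above can be illustrated with a stratified train/test split. This is a minimal standard-library sketch, not the agent's actual implementation; the function name `stratified_split` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), keeping class proportions roughly equal."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Hold out at least one sample per class when the class has 2+ members.
        if len(indices) > 1:
            n_test = max(1, round(len(indices) * test_fraction))
        else:
            n_test = 0
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy example: an imbalanced two-class label vector.
labels = ["alpha"] * 80 + ["beta"] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
```

Splitting per class rather than over the whole dataset keeps the minority class represented in the test set, which matters for imbalanced omics cohorts.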
feature_selection_expert
Specialized agent for biomarker discovery and feature selection in high-dimensional omics data.
Capabilities:
- Stability selection (Meinshausen & Bühlmann selection probabilities)
- LASSO and Elastic Net regularization
- Variance-based filtering with chunked computation for large matrices
- Importance ranking and automatic feature detection
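To make the idea of a selection probability concrete, here is a small standard-library sketch of stability selection. It is an assumption-laden illustration, not the agent's code: the base selector is a simple top-k ranking by absolute Pearson correlation (a stand-in for LASSO or tree-based importance), and the names `pearson` and `stability_selection` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def stability_selection(X, y, n_rounds=50, top_k=2, seed=0):
    """Fraction of rounds in which each feature is selected.

    X is a list of rows (samples x features); y is a list of numeric labels.
    Each round draws half of the samples without replacement.
    """
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    counts = [0] * n_features
    for _ in range(n_rounds):
        rows = rng.sample(range(n_samples), n_samples // 2)
        y_sub = [y[i] for i in rows]
        scores = [
            abs(pearson([X[i][j] for i in rows], y_sub))
            for j in range(n_features)
        ]
        # Base selector (stand-in for LASSO): keep the top_k scoring features.
        ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    return [c / n_rounds for c in counts]


# Toy data: features 0 and 1 track the label, features 2-5 are pure noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [
    [y_i + data_rng.gauss(0, 0.2), -y_i + data_rng.gauss(0, 0.2)]
    + [data_rng.gauss(0, 1) for _ in range(4)]
    for y_i in y
]
probs = stability_selection(X, y, n_rounds=50, top_k=2, seed=0)
```

Features whose selection probability stays high across subsamples are the "stable" ones; a cutoff on `probs` (for example 0.6, an arbitrary choice here) would then define the reported set.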
survival_analysis_expert
Specialized agent for time-to-event analysis and risk stratification.
Capabilities:
- Cox proportional hazards models (unregularized and regularized)
- Kaplan-Meier survival curves with median survival and RMST
- C-index reporting with three-tier validation (test, CV, training)
- Threshold optimization with censoring-aware handling
- Risk stratification and hazard ratio computation
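The C-index mentioned above measures how often a model ranks pairs of subjects in the right order, counting only pairs whose order is known despite censoring. The following is a plain-Python sketch of Harrell's C-index under that definition; it is not the agent's implementation, and the function name is hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    times: observed time per subject
    events: 1 if the event occurred, 0 if censored
    risk_scores: higher score means higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on a subject with an observed event
        for j in range(n):
            if times[i] < times[j]:
                # Subject i is known to have failed before subject j.
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort: three observed events, two censored subjects.
times = [5, 8, 12, 3, 9]
events = [1, 0, 1, 1, 0]
risk = [0.9, 0.4, 0.2, 0.8, 0.5]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Computing it on a held-out test set, as the three-tier validation prefers, avoids the optimism of a training-set estimate.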
Example Workflows
ML Feature Preparation
User: Prepare my scRNA-seq data for machine learning classification
[machine_learning_expert]
- Loads AnnData expression matrix
- Scales features and handles missing values
- Applies SMOTE for class imbalance (marks synthetic samples)
- Splits into train/test sets
- Exports to PyTorch-compatible format
Biomarker Discovery (Feature Selection)
User: Find the most stable biomarkers that distinguish alpha
cells from beta cells in my pancreas scRNA-seq data
[machine_learning_expert delegates to feature_selection_expert]
- Runs stability selection (50 bootstrap rounds)
- Uses Random Forest or XGBoost importance scoring
- Applies variance filter to remove low-information features
- Reports top stable features with selection probabilities
- Expects biologically meaningful genes (e.g., INS, GCG, SST)
Survival Analysis
User: Run survival analysis using the selected biomarkers
with my clinical outcome data
[machine_learning_expert delegates to survival_analysis_expert]
- Fits Cox PH model on selected features
- Validates with C-index (test set preferred)
- Generates Kaplan-Meier curves for risk groups
- Reports hazard ratios and confidence intervals
- Saves model to workspace/models/
Multi-Omics Integration
User: Integrate my transcriptomics and proteomics data
and run feature selection on the combined space
[machine_learning_expert]
- Validates sample overlap between modalities
- Runs MOFA-based integration (factors in adata.obsm['X_mofa'])
- Delegates to feature_selection_expert with feature_space_key="X_mofa"
- Reports top factors and pathway enrichment via INDRA
Dependencies
lobster-ml requires lobster-ai as its core dependency. Domain-specific libraries are organized as optional extras:
| Extra | Libraries | Purpose |
|---|---|---|
| ml | torch, scvi-tools | Deep learning, scVI embeddings |
| survival | scikit-survival | Cox models, Kaplan-Meier |
| interpretability | shap, interpret | SHAP values, model explanations |
| imbalanced | imbalanced-learn | SMOTE, class balancing |
| tuning | hyperopt | Hyperparameter optimization |
| full | All of the above | Complete ML stack |
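One way to see which extras are usable in the current environment is to check whether their libraries can be imported. The sketch below assumes the usual top-level import names of the libraries in the table (`scvi` for scvi-tools, `sksurv` for scikit-survival, `imblearn` for imbalanced-learn); the helper `available_extras` is hypothetical and not part of lobster-ml.

```python
from importlib.util import find_spec

# Import names for the libraries in the extras table. These are the
# libraries' own top-level module names, not lobster-ml identifiers.
EXTRA_MODULES = {
    "ml": ["torch", "scvi"],
    "survival": ["sksurv"],
    "interpretability": ["shap", "interpret"],
    "imbalanced": ["imblearn"],
    "tuning": ["hyperopt"],
}


def available_extras():
    """Map each extra to True when all of its modules are importable."""
    return {
        extra: all(find_spec(name) is not None for name in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


status = available_extras()
for extra, ok in status.items():
    print(f"{extra:18s} {'available' if ok else 'missing'}")
```

`find_spec` only locates a module without importing it, so the check stays fast even when heavy packages such as torch are installed.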
Install with extras:
pip install lobster-ml[survival] # Just survival analysis
pip install lobster-ml[full] # Everything
Services
lobster-ml includes specialized ML services:
| Service | Purpose |
|---|---|
| FeatureSelectionService | Stability selection, LASSO, variance filtering |
| SurvivalAnalysisService | Cox PH models, Kaplan-Meier, threshold optimization |
| CrossValidationService | Stratified k-fold, nested CV, time series CV |
| InterpretabilityService | SHAP values, per-class explanations |
| MLPreprocessingService | SMOTE balancing, scaling, missing value handling |
| MultiOmicsIntegrationService | MOFA-based multi-omics factor analysis |
| PathwayEnrichmentBridgeService | GO/Reactome enrichment via INDRA Discovery API |
Services follow the standard 3-tuple return pattern and are accessed internally by the agents.
Configuration
Enable ML agents in your workspace config:
# .lobster_workspace/config.toml
enabled = ["machine_learning_expert", "feature_selection_expert", "survival_analysis_expert"]
Or use a preset:
preset = "ml-full"
Sub-Agent Architecture
machine_learning_expert (supervisor, accessible from main supervisor)
|-- feature_selection_expert (sub-agent, not directly accessible)
|-- survival_analysis_expert (sub-agent, not directly accessible)
Sub-agents are accessed through machine_learning_expert delegation, not directly from the supervisor. This architecture allows the orchestrator to:
- Route tasks - Determine which sub-agent is needed based on the analysis type
- Manage state - Track ML pipeline progress across preprocessing, selection, and modeling
- Synthesize results - Combine outputs from feature selection and survival analysis into unified reports
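The three responsibilities above can be pictured with a toy orchestrator. This document does not describe the real routing logic, so everything below is a hypothetical sketch: the keyword table, `PipelineState`, `route`, `record`, and `synthesize` are illustrative names only.

```python
from dataclasses import dataclass, field

# Hypothetical keyword table; the real orchestrator's routing logic is not
# described in this document.
ROUTES = {
    "feature_selection_expert": ("biomarker", "feature selection", "lasso", "stability"),
    "survival_analysis_expert": ("survival", "cox", "kaplan-meier", "hazard"),
}


@dataclass
class PipelineState:
    """Tracks which stages have run and what each one returned."""
    completed: list = field(default_factory=list)
    results: dict = field(default_factory=dict)


def route(task: str) -> str:
    """Pick a sub-agent by keyword; fall back to the orchestrator itself."""
    text = task.lower()
    for agent, keywords in ROUTES.items():
        if any(keyword in text for keyword in keywords):
            return agent
    return "machine_learning_expert"


def record(state: PipelineState, agent: str, result: dict) -> None:
    """Store one stage's output so later stages and the report can use it."""
    state.completed.append(agent)
    state.results[agent] = result


def synthesize(state: PipelineState) -> dict:
    """Combine per-agent outputs into one report, in execution order."""
    return {"stages": list(state.completed), "results": dict(state.results)}


state = PipelineState()
first = route("Find the most stable biomarkers in my pancreas data")
record(state, first, {"top_features": ["INS", "GCG"]})
second = route("Fit a Cox model on my clinical outcome data")
record(state, second, {"model": "cox_ph"})
report = synthesize(state)
```

Keyword matching is deliberately naive here: a request that mentions both biomarkers and survival would match the first table entry, which is one reason a real orchestrator would need richer task understanding than this sketch.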