# Machine Learning

**In Development:** This package is not yet published to PyPI. APIs, tool signatures, and agent behavior will change before release.

## Agents [#agents]

### machine_learning_expert [#machine_learning_expert]

The main orchestrator for machine learning workflows. It coordinates the specialized sub-agents.

**Capabilities:**

* ML data preparation and feature engineering
* Data splitting and framework export (PyTorch, TensorFlow)
* Delegation to feature selection and survival analysis sub-agents
* Multi-omics integration via MOFA
* Pathway enrichment analysis via INDRA

### feature_selection_expert [#feature_selection_expert]

Specialized agent for biomarker discovery and feature selection in high-dimensional omics data.

**Capabilities:**

* Stability selection (Meinshausen & Bühlmann selection probabilities)
* LASSO and Elastic Net regularization
* Variance-based filtering with chunked computation for large matrices
* Importance ranking and automatic feature detection

### survival_analysis_expert [#survival_analysis_expert]

Specialized agent for time-to-event analysis and risk stratification.
**Capabilities:**

* Cox proportional hazards models (unregularized and regularized)
* Kaplan-Meier survival curves with median survival and RMST
* C-index reporting with three-tier validation (test, CV, training)
* Threshold optimization with censoring-aware handling
* Risk stratification and hazard ratio computation

## Example Workflows [#example-workflows]

### ML Feature Preparation [#ml-feature-preparation]

```text
User: Prepare my scRNA-seq data for machine learning classification

[machine_learning_expert]
- Loads AnnData expression matrix
- Scales features and handles missing values
- Applies SMOTE for class imbalance (marks synthetic samples)
- Splits into train/test sets
- Exports to PyTorch-compatible format
```

### Biomarker Discovery (Feature Selection) [#biomarker-discovery-feature-selection]

```text
User: Find the most stable biomarkers that distinguish alpha cells from beta cells in my pancreas scRNA-seq data

[machine_learning_expert delegates to feature_selection_expert]
- Runs stability selection (50 bootstrap rounds)
- Uses Random Forest or XGBoost importance scoring
- Applies variance filter to remove low-information features
- Reports top stable features with selection probabilities
- Expects biologically meaningful genes (e.g., INS, GCG, SST)
```

### Survival Analysis [#survival-analysis]

```text
User: Run survival analysis using the selected biomarkers with my clinical outcome data

[machine_learning_expert delegates to survival_analysis_expert]
- Fits Cox PH model on selected features
- Validates with C-index (test set preferred)
- Generates Kaplan-Meier curves for risk groups
- Reports hazard ratios and confidence intervals
- Saves model to workspace/models/
```

### Multi-Omics Integration [#multi-omics-integration]

```text
User: Integrate my transcriptomics and proteomics data and run feature selection on the combined space

[machine_learning_expert]
- Validates sample overlap between modalities
- Runs MOFA-based integration (factors in adata.obsm['X_mofa'])
- Delegates to feature_selection_expert with feature_space_key="X_mofa"
- Reports top factors and pathway enrichment via INDRA
```

## Dependencies [#dependencies]

lobster-ml requires `lobster-ai` as its core dependency. Domain-specific libraries are organized as optional extras:

| Extra                | Libraries         | Purpose                         |
| -------------------- | ----------------- | ------------------------------- |
| **ml**               | torch, scvi-tools | Deep learning, scVI embeddings  |
| **survival**         | scikit-survival   | Cox models, Kaplan-Meier        |
| **interpretability** | shap, interpret   | SHAP values, model explanations |
| **imbalanced**       | imbalanced-learn  | SMOTE, class balancing          |
| **tuning**           | hyperopt          | Hyperparameter optimization     |
| **full**             | All of the above  | Complete ML stack               |

Install with extras (the quotes keep shells such as zsh from interpreting the square brackets):

```bash
pip install "lobster-ml[survival]"  # Just survival analysis
pip install "lobster-ml[full]"      # Everything
```

## Services [#services]

lobster-ml includes specialized ML services:

| Service                            | Purpose                                             |
| ---------------------------------- | --------------------------------------------------- |
| **FeatureSelectionService**        | Stability selection, LASSO, variance filtering      |
| **SurvivalAnalysisService**        | Cox PH models, Kaplan-Meier, threshold optimization |
| **CrossValidationService**         | Stratified k-fold, nested CV, time series CV        |
| **InterpretabilityService**        | SHAP values, per-class explanations                 |
| **MLPreprocessingService**         | SMOTE balancing, scaling, missing value handling    |
| **MultiOmicsIntegrationService**   | MOFA-based multi-omics factor analysis              |
| **PathwayEnrichmentBridgeService** | GO/Reactome enrichment via INDRA Discovery API      |

Services follow the standard 3-tuple return pattern and are accessed internally by the agents.
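
Because the services are internal, their signatures are not shown here. To make one of the listed computations concrete, the following is a minimal standard-library sketch of a Kaplan-Meier estimate with median survival, the kind of output attributed to SurvivalAnalysisService above. The function names `kaplan_meier` and `median_survival` and the toy data are hypothetical and are not part of lobster-ml, which lists scikit-survival for this purpose.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per subject
    events: 1 if the event was observed, 0 if the subject was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every subject leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving this event time.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Subjects censored at t still count as at risk at t, so remove them afterwards.
        at_risk -= exits[t]
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within follow-up


# Toy data, made up for illustration: six subjects, two of them censored.
durations = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
print(curve)
print("median survival:", median_survival(curve))
```

Censored subjects contribute to the risk set up to their last follow-up time but never trigger a drop in the curve, which is why the estimate only changes at observed event times.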
## Configuration [#configuration]

Enable ML agents in your workspace config:

```toml
# .lobster_workspace/config.toml
enabled = ["machine_learning_expert", "feature_selection_expert", "survival_analysis_expert"]
```

Or use a preset:

```toml
preset = "ml-full"
```

## Sub-Agent Architecture [#sub-agent-architecture]

```text
machine_learning_expert (supervisor, accessible from main supervisor)
|-- feature_selection_expert (sub-agent, not directly accessible)
|-- survival_analysis_expert (sub-agent, not directly accessible)
```

Sub-agents are accessed through machine\_learning\_expert delegation, not directly from the supervisor. This architecture allows the orchestrator to:

1. **Route tasks** - Determine which sub-agent is needed based on the analysis type
2. **Manage state** - Track ML pipeline progress across preprocessing, selection, and modeling
3. **Synthesize results** - Combine outputs from feature selection and survival analysis into unified reports
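
The task-routing role of the orchestrator can be pictured with a small sketch. It is purely illustrative: the keyword rules and the `route_task` helper are hypothetical, and the actual delegation logic of machine\_learning\_expert is not documented here and may work quite differently.

```python
# Hypothetical keyword rules; the real delegation logic is not documented.
ROUTES = {
    "feature_selection_expert": ("biomarker", "feature selection", "lasso", "stability"),
    "survival_analysis_expert": ("survival", "kaplan-meier", "cox", "hazard"),
}


def route_task(request: str) -> str:
    """Pick the first agent whose keywords appear in the request.

    Falls back to the orchestrator itself when no sub-agent matches.
    """
    text = request.lower()
    for agent, keywords in ROUTES.items():
        if any(keyword in text for keyword in keywords):
            return agent
    return "machine_learning_expert"


print(route_task("Find the most stable biomarkers in my pancreas data"))
print(route_task("Fit a Cox model with my clinical outcome data"))
print(route_task("Prepare my scRNA-seq data for machine learning classification"))
```

In this sketch the rules are checked in dictionary order, so a request that mentions both biomarkers and survival goes to the first match. A real orchestrator would need a less brittle way to resolve such overlaps.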