Creating Custom Agents Tutorial

This comprehensive tutorial demonstrates how to create, integrate, and deploy custom AI agents in Lobster AI using the centralized agent registry system and ...

This comprehensive tutorial demonstrates how to create, integrate, and deploy custom AI agents in Lobster AI using the centralized agent registry system and modular architecture.

Overview

In this tutorial, you will learn to:

Understand Lobster AI's agent architecture and registry system
Create a custom agent with specialized tools
Integrate with the DataManagerV2 system
Register your agent in the central registry
Test and deploy your custom agent
Handle agent handoffs and tool integration

Prerequisites

Lobster AI development environment set up
Python 3.11-3.14 with development dependencies (3.12+ recommended)
Understanding of LangChain/LangGraph concepts
Basic knowledge of bioinformatics workflows
Familiarity with Lobster's modular architecture

Example: Creating a Spatial Omics Expert Agent

We'll create a Spatial Omics Expert Agent that specializes in spatial transcriptomics and spatial proteomics analysis (Visium, MERFISH, CosMx, etc.).

Step 1: Understanding the Agent Architecture

Agent Discovery Pattern (v1.0.0+)

Lobster uses entry-point-based discovery via ComponentRegistry:

from lobster.core.registry import AgentRegistryConfig

# Define AGENT_CONFIG at module top for fast discovery
AGENT_CONFIG = AgentRegistryConfig(
    name='your_agent',
    display_name='Your Agent',
    description='What this agent does',
    factory_function='your_package.agents.your_agent:create_your_agent',
    tier_requirement='free',
    package_name='your-package',
    handoff_tool_name='handoff_to_your_agent',
    handoff_tool_description='When to use this agent'
)

Agent Tool Pattern

All agents follow this standardized pattern:

@tool
def analyze_modality(modality_name: str, **params) -> str:
    """Standard pattern for all agent tools."""
    try:
        # 1. Validate modality exists
        if modality_name not in data_manager.list_modalities():
            raise ModalityNotFoundError(f"Modality '{modality_name}' not found")

        adata = data_manager.get_modality(modality_name)

        # 2. Call stateless service (returns tuple)
        result_adata, stats = service.analyze(adata, **params)

        # 3. Store results with descriptive naming
        new_modality = f"{modality_name}_analyzed"
        data_manager.modalities[new_modality] = result_adata

        # 4. Log operation for provenance
        data_manager.log_tool_usage("analyze_modality", params, stats)

        return formatted_response(stats, new_modality)

    except ServiceError as e:
        logger.error(f"Service error: {e}")
        return f"Analysis failed: {str(e)}"

Step 2: Create the Spatial Service Layer

First, create the stateless service that handles spatial analysis:

Create `/lobster/tools/spatial_omics_service.py`

"""
Spatial Omics Service for specialized spatial transcriptomics and proteomics analysis.
"""

import numpy as np
import pandas as pd
import scanpy as sc
import squidpy as sq
from typing import Dict, Any, Tuple, Optional, List
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors
import anndata as AnnData

from lobster.utils.logger import get_logger

logger = get_logger(__name__)


class SpatialOmicsError(Exception):
    """Base exception for spatial omics operations."""
    pass


class SpatialOmicsService:
    """Stateless service for spatial omics analysis."""

    def __init__(self):
        """Initialize spatial omics service."""
        self.logger = get_logger(self.__class__.__name__)

    def calculate_spatial_metrics(
        self,
        adata: AnnData,
        coord_type: str = "grid"
    ) -> Tuple[AnnData, Dict[str, Any]]:
        """Calculate spatial metrics including Moran's I and spatial autocorrelation."""
        try:
            self.logger.info("Calculating spatial metrics...")

            # Ensure spatial coordinates are available
            if 'spatial' not in adata.obsm:
                raise SpatialOmicsError("No spatial coordinates found in adata.obsm['spatial']")

            # Calculate spatial neighbors
            if coord_type == "grid":
                sq.gr.spatial_neighbors(adata, coord_type="grid")
            else:
                sq.gr.spatial_neighbors(adata, coord_type="generic", n_neighs=6)

            # Calculate Moran's I for highly variable genes
            if 'highly_variable' in adata.var.columns:
                hvg_genes = adata.var[adata.var['highly_variable']].index.tolist()
            else:
                # Select top 100 most variable genes
                sc.pp.highly_variable_genes(adata, n_top_genes=100)
                hvg_genes = adata.var[adata.var['highly_variable']].index.tolist()

            # Calculate spatial autocorrelation
            sq.gr.spatial_autocorr(
                adata,
                mode="moran",
                genes=hvg_genes[:50],  # Limit for performance
                n_perms=100,
                n_jobs=1
            )

            # Calculate spatial statistics
            spatial_stats = {
                "n_spots": adata.n_obs,
                "n_genes": adata.n_vars,
                "spatial_dimensions": adata.obsm['spatial'].shape[1],
                "coordinate_range": {
                    "x_min": float(adata.obsm['spatial'][:, 0].min()),
                    "x_max": float(adata.obsm['spatial'][:, 0].max()),
                    "y_min": float(adata.obsm['spatial'][:, 1].min()),
                    "y_max": float(adata.obsm['spatial'][:, 1].max())
                },
                "n_spatially_variable_genes": int((adata.var['moranI'] > 0.1).sum()) if 'moranI' in adata.var else 0,
                "mean_moran_i": float(adata.var['moranI'].mean()) if 'moranI' in adata.var else 0.0
            }

            self.logger.info(f"Spatial metrics calculated for {adata.n_obs} spots")
            return adata, spatial_stats

        except Exception as e:
            raise SpatialOmicsError(f"Spatial metrics calculation failed: {str(e)}")

    def spatial_clustering(
        self,
        adata: AnnData,
        n_clusters: int = 10,
        method: str = "leiden"
    ) -> Tuple[AnnData, Dict[str, Any]]:
        """Perform spatial-aware clustering."""
        try:
            self.logger.info(f"Performing spatial clustering with {method}...")

            if method == "leiden":
                # Use spatial neighbors for clustering
                sc.tl.leiden(adata, resolution=0.5, key_added="spatial_clusters")
                cluster_key = "spatial_clusters"
            elif method == "kmeans":
                # K-means on spatial coordinates
                kmeans = KMeans(n_clusters=n_clusters, random_state=42)
                spatial_clusters = kmeans.fit_predict(adata.obsm['spatial'])
                adata.obs['spatial_kmeans'] = pd.Categorical(spatial_clusters)
                cluster_key = "spatial_kmeans"
            else:
                raise SpatialOmicsError(f"Unknown clustering method: {method}")

            # Calculate clustering statistics
            cluster_stats = {
                "method": method,
                "n_clusters": len(adata.obs[cluster_key].cat.categories),
                "cluster_sizes": dict(adata.obs[cluster_key].value_counts()),
                "silhouette_score": self._calculate_spatial_silhouette(adata, cluster_key),
                "spatial_coherence": self._calculate_spatial_coherence(adata, cluster_key)
            }

            self.logger.info(f"Spatial clustering complete: {cluster_stats['n_clusters']} clusters")
            return adata, cluster_stats

        except Exception as e:
            raise SpatialOmicsError(f"Spatial clustering failed: {str(e)}")

    def identify_spatial_domains(
        self,
        adata: AnnData,
        method: str = "banksy"
    ) -> Tuple[AnnData, Dict[str, Any]]:
        """Identify spatial domains using advanced spatial methods."""
        try:
            self.logger.info(f"Identifying spatial domains using {method}...")

            if method == "banksy":
                # Use squidpy's implementation of spatial domain detection
                sq.gr.spatial_neighbors(adata, n_neighs=6)
                sq.gr.nhood_enrichment(adata, cluster_key="leiden")

                # Calculate spatial domains based on neighborhood composition
                sq.tl.co_occurrence(adata, cluster_key="leiden")

                domain_key = "spatial_domains"
                # Simple spatial domain assignment based on neighborhood enrichment
                adata.obs[domain_key] = adata.obs["leiden"]  # Placeholder implementation

            else:
                raise SpatialOmicsError(f"Unknown spatial domain method: {method}")

            domain_stats = {
                "method": method,
                "n_domains": len(adata.obs[domain_key].cat.categories),
                "domain_sizes": dict(adata.obs[domain_key].value_counts()),
                "avg_domain_size": float(adata.obs[domain_key].value_counts().mean())
            }

            self.logger.info(f"Spatial domain identification complete: {domain_stats['n_domains']} domains")
            return adata, domain_stats

        except Exception as e:
            raise SpatialOmicsError(f"Spatial domain identification failed: {str(e)}")

    def _calculate_spatial_silhouette(self, adata: AnnData, cluster_key: str) -> float:
        """Calculate spatial-aware silhouette score."""
        try:
            from sklearn.metrics import silhouette_score

            # Use spatial coordinates for silhouette calculation
            coords = adata.obsm['spatial']
            labels = adata.obs[cluster_key].cat.codes

            if len(np.unique(labels)) < 2:
                return 0.0

            score = silhouette_score(coords, labels)
            return float(score)

        except Exception:
            return 0.0

    def _calculate_spatial_coherence(self, adata: AnnData, cluster_key: str) -> float:
        """Calculate spatial coherence of clusters."""
        try:
            # Simple spatial coherence: fraction of neighbors with same cluster
            if 'spatial_connectivities' not in adata.obsp:
                return 0.0

            conn_matrix = adata.obsp['spatial_connectivities']
            labels = adata.obs[cluster_key].cat.codes.values

            coherence_scores = []
            for i in range(len(labels)):
                neighbors = conn_matrix[i].indices
                if len(neighbors) > 0:
                    same_cluster = (labels[neighbors] == labels[i]).sum()
                    coherence = same_cluster / len(neighbors)
                    coherence_scores.append(coherence)

            return float(np.mean(coherence_scores)) if coherence_scores else 0.0

        except Exception:
            return 0.0


def create_spatial_omics_service() -> SpatialOmicsService:
    """Factory function to create spatial omics service."""
    return SpatialOmicsService()

Step 3: Create the Spatial Omics Agent

Create `/lobster/agents/spatial_omics_expert.py`

"""
Spatial Omics Expert Agent for spatial transcriptomics and proteomics analysis.

This agent specializes in spatial omics data analysis using Lobster's modular
DataManagerV2 system with spatial-specific quality control and visualization.
"""

from typing import List, Union
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
from langchain_aws import ChatBedrockConverse
from datetime import date

from lobster.config.settings import get_settings
from lobster.core.data_manager_v2 import DataManagerV2
from lobster.tools.spatial_omics_service import SpatialOmicsService, SpatialOmicsError
from lobster.utils.logger import get_logger

logger = get_logger(__name__)


class SpatialOmicsAgentError(Exception):
    """Base exception for spatial omics agent operations."""
    pass


class ModalityNotFoundError(SpatialOmicsAgentError):
    """Raised when requested modality doesn't exist."""
    pass


def spatial_omics_expert(
    data_manager: DataManagerV2,
    callback_handler=None,
    agent_name: str = "spatial_omics_expert_agent",
    delegation_tools: List = None,
    workspace_path: Path = None
):
    """Create spatial omics expert agent using DataManagerV2 and modular services."""

    settings = get_settings()
    model_params = settings.get_agent_llm_params('spatial_omics_expert')
    llm = ChatBedrockConverse(**model_params)

    if callback_handler and hasattr(llm, 'with_config'):
        llm = llm.with_config(callbacks=[callback_handler])

    # Initialize stateless service
    spatial_service = SpatialOmicsService()

    analysis_results = {"summary": "", "details": {}}

    # -------------------------
    # DATA STATUS TOOLS
    # -------------------------
    @tool
    def check_spatial_data_status(modality_name: str = "") -> str:
        """Check if spatial omics data is loaded and has required spatial coordinates."""
        try:
            if modality_name == "":
                modalities = data_manager.list_modalities()
                if not modalities:
                    return "No modalities loaded. Please load a spatial dataset first."

                # Look for spatial modalities
                spatial_modalities = []
                for mod_name in modalities:
                    adata = data_manager.get_modality(mod_name)
                    if 'spatial' in adata.obsm:
                        spatial_modalities.append(mod_name)

                if not spatial_modalities:
                    response = f"Available modalities ({len(modalities)}) but none have spatial coordinates:\n"
                    for mod_name in modalities[:5]:  # Show first 5
                        adata = data_manager.get_modality(mod_name)
                        response += f"- **{mod_name}**: {adata.n_obs} spots × {adata.n_vars} genes\n"
                    response += "\nTo use spatial analysis, data must have spatial coordinates in adata.obsm['spatial']."
                    return response

                response = f"Found {len(spatial_modalities)} spatial modalities:\n"
                for mod_name in spatial_modalities:
                    adata = data_manager.get_modality(mod_name)
                    coords_shape = adata.obsm['spatial'].shape if 'spatial' in adata.obsm else "None"
                    response += f"- **{mod_name}**: {adata.n_obs} spots × {adata.n_vars} genes, coords: {coords_shape}\n"

                return response

            # Check specific modality
            if modality_name not in data_manager.list_modalities():
                return f"Modality '{modality_name}' not found. Use check_spatial_data_status() to see available modalities."

            adata = data_manager.get_modality(modality_name)

            # Check for spatial coordinates
            has_spatial = 'spatial' in adata.obsm
            if not has_spatial:
                return f"Modality '{modality_name}' exists but lacks spatial coordinates. Spatial analysis requires coordinates in adata.obsm['spatial']."

            coords = adata.obsm['spatial']
            response = f"✅ **{modality_name}** - Ready for spatial analysis\n"
            response += f"- Spots: {adata.n_obs:,}\n"
            response += f"- Genes: {adata.n_vars:,}\n"
            response += f"- Spatial coordinates: {coords.shape} ({coords.shape[1]}D)\n"
            response += f"- Coordinate range: X[{coords[:, 0].min():.1f}, {coords[:, 0].max():.1f}], Y[{coords[:, 1].min():.1f}, {coords[:, 1].max():.1f}]\n"

            # Check for existing spatial analysis
            spatial_features = []
            if 'moranI' in adata.var.columns:
                spatial_features.append("Moran's I calculated")
            if 'spatial_connectivities' in adata.obsp:
                spatial_features.append("Spatial neighbors computed")
            if any('spatial' in col for col in adata.obs.columns):
                spatial_features.append("Spatial clustering performed")

            if spatial_features:
                response += f"- Previous spatial analysis: {', '.join(spatial_features)}\n"

            return response

        except Exception as e:
            logger.error(f"Error checking spatial data status: {e}")
            return f"Error checking spatial data status: {str(e)}"

    # -------------------------
    # SPATIAL ANALYSIS TOOLS
    # -------------------------
    @tool
    def calculate_spatial_statistics(
        modality_name: str,
        coord_type: str = "grid"
    ) -> str:
        """Calculate spatial statistics including Moran's I and spatial autocorrelation."""
        try:
            if modality_name not in data_manager.list_modalities():
                raise ModalityNotFoundError(f"Modality '{modality_name}' not found")

            adata = data_manager.get_modality(modality_name)

            # Check for spatial coordinates
            if 'spatial' not in adata.obsm:
                return f"❌ Modality '{modality_name}' lacks spatial coordinates. Cannot perform spatial analysis."

            # Call service
            result_adata, stats = spatial_service.calculate_spatial_metrics(adata, coord_type)

            # Store result
            new_modality = f"{modality_name}_spatial_metrics"
            data_manager.modalities[new_modality] = result_adata

            # Log operation
            data_manager.log_tool_usage("calculate_spatial_statistics",
                                      {"coord_type": coord_type}, stats)

            # Format response
            response = f"✅ **Spatial Statistics Calculated** for {modality_name}\n\n"
            response += f"📊 **Dataset Overview:**\n"
            response += f"- Spots analyzed: {stats['n_spots']:,}\n"
            response += f"- Genes analyzed: {stats['n_genes']:,}\n"
            response += f"- Spatial dimensions: {stats['spatial_dimensions']}D\n\n"

            response += f"📐 **Coordinate Range:**\n"
            response += f"- X: [{stats['coordinate_range']['x_min']:.1f}, {stats['coordinate_range']['x_max']:.1f}]\n"
            response += f"- Y: [{stats['coordinate_range']['y_min']:.1f}, {stats['coordinate_range']['y_max']:.1f}]\n\n"

            response += f"🧬 **Spatial Gene Analysis:**\n"
            response += f"- Spatially variable genes: {stats['n_spatially_variable_genes']}\n"
            response += f"- Mean Moran's I: {stats['mean_moran_i']:.3f}\n\n"

            response += f"💾 **Results stored as:** `{new_modality}`\n"
            response += f"Next: Use `perform_spatial_clustering()` to identify spatial domains."

            return response

        except SpatialOmicsError as e:
            logger.error(f"Spatial statistics calculation failed: {e}")
            return f"❌ Spatial statistics calculation failed: {str(e)}"
        except Exception as e:
            logger.error(f"Unexpected error in spatial statistics: {e}")
            return f"❌ Unexpected error: {str(e)}"

    @tool
    def perform_spatial_clustering(
        modality_name: str,
        n_clusters: int = 10,
        method: str = "leiden"
    ) -> str:
        """Perform spatial-aware clustering to identify tissue regions."""
        try:
            if modality_name not in data_manager.list_modalities():
                raise ModalityNotFoundError(f"Modality '{modality_name}' not found")

            adata = data_manager.get_modality(modality_name)

            # Call service
            result_adata, stats = spatial_service.spatial_clustering(adata, n_clusters, method)

            # Store result
            new_modality = f"{modality_name}_spatial_clustered"
            data_manager.modalities[new_modality] = result_adata

            # Log operation
            data_manager.log_tool_usage("perform_spatial_clustering",
                                      {"n_clusters": n_clusters, "method": method}, stats)

            # Format response
            response = f"✅ **Spatial Clustering Complete** using {stats['method']}\n\n"
            response += f"🎯 **Clustering Results:**\n"
            response += f"- Clusters identified: {stats['n_clusters']}\n"
            response += f"- Silhouette score: {stats['silhouette_score']:.3f}\n"
            response += f"- Spatial coherence: {stats['spatial_coherence']:.3f}\n\n"

            response += f"📊 **Cluster Distribution:**\n"
            for cluster_id, size in sorted(stats['cluster_sizes'].items()):
                percentage = (size / sum(stats['cluster_sizes'].values())) * 100
                response += f"- Cluster {cluster_id}: {size:,} spots ({percentage:.1f}%)\n"

            response += f"\n💾 **Results stored as:** `{new_modality}`\n"
            response += f"Next: Use `identify_spatial_domains()` for advanced domain detection."

            return response

        except SpatialOmicsError as e:
            logger.error(f"Spatial clustering failed: {e}")
            return f"❌ Spatial clustering failed: {str(e)}"
        except Exception as e:
            logger.error(f"Unexpected error in spatial clustering: {e}")
            return f"❌ Unexpected error: {str(e)}"

    @tool
    def identify_spatial_domains(
        modality_name: str,
        method: str = "banksy"
    ) -> str:
        """Identify spatial domains using advanced spatial analysis methods."""
        try:
            if modality_name not in data_manager.list_modalities():
                raise ModalityNotFoundError(f"Modality '{modality_name}' not found")

            adata = data_manager.get_modality(modality_name)

            # Call service
            result_adata, stats = spatial_service.identify_spatial_domains(adata, method)

            # Store result
            new_modality = f"{modality_name}_spatial_domains"
            data_manager.modalities[new_modality] = result_adata

            # Log operation
            data_manager.log_tool_usage("identify_spatial_domains",
                                      {"method": method}, stats)

            # Format response
            response = f"✅ **Spatial Domains Identified** using {stats['method']}\n\n"
            response += f"🗺️ **Domain Analysis:**\n"
            response += f"- Spatial domains found: {stats['n_domains']}\n"
            response += f"- Average domain size: {stats['avg_domain_size']:.1f} spots\n\n"

            response += f"📊 **Domain Distribution:**\n"
            for domain_id, size in sorted(stats['domain_sizes'].items()):
                percentage = (size / sum(stats['domain_sizes'].values())) * 100
                response += f"- Domain {domain_id}: {size:,} spots ({percentage:.1f}%)\n"

            response += f"\n💾 **Results stored as:** `{new_modality}`\n"
            response += f"Spatial domain analysis complete!"

            return response

        except SpatialOmicsError as e:
            logger.error(f"Spatial domain identification failed: {e}")
            return f"❌ Spatial domain identification failed: {str(e)}"
        except Exception as e:
            logger.error(f"Unexpected error in spatial domain identification: {e}")
            return f"❌ Unexpected error: {str(e)}"

    # -------------------------
    # AGENT CREATION
    # -------------------------

    # Collect all tools
    tools = [
        check_spatial_data_status,
        calculate_spatial_statistics,
        perform_spatial_clustering,
        identify_spatial_domains
    ]

    # Add handoff tools if provided
    if delegation_tools:
        tools.extend(delegation_tools)

    # Create agent with enhanced prompt
    system_prompt = f"""You are the Spatial Omics Expert, specialized in spatial transcriptomics and spatial proteomics analysis.

**Your Expertise:**
- Spatial transcriptomics (Visium, MERFISH, seqFISH, CosMx)
- Spatial proteomics (IMC, MIBI-TOF, GeoMx)
- Spatial statistics and autocorrelation analysis
- Tissue domain identification and spatial clustering
- Spatial neighborhood analysis and cell-cell interactions

**Available Tools:**
- check_spatial_data_status: Check if spatial data is properly loaded
- calculate_spatial_statistics: Compute Moran's I and spatial autocorrelation
- perform_spatial_clustering: Identify tissue regions using spatial clustering
- identify_spatial_domains: Advanced spatial domain detection

**Guidelines:**
1. Always check data status before analysis
2. Use spatial coordinates for clustering when available
3. Interpret spatial patterns in biological context
4. Provide clear descriptions of spatial domains and their potential biological significance
5. Suggest appropriate follow-up analyses based on results

**Data Requirements:**
- Spatial coordinates must be in adata.obsm['spatial']
- For best results, data should be quality-controlled and normalized
- Spatial analysis works best with >1000 spots and >500 genes

Today's date: {date.today()}
"""

    # Create the agent
    agent = create_react_agent(
        llm,
        tools,
        state_modifier=system_prompt
    )

    return agent


# Agent factory function for registry
def create_spatial_omics_expert(data_manager: DataManagerV2, **kwargs):
    """Factory function to create spatial omics expert agent."""
    return spatial_omics_expert(data_manager, **kwargs)

Step 4: Register Your Custom Agent (v1.0.0+)

Define AGENT_CONFIG at the top of your agent module and register via entry points:

# At the top of lobster/agents/spatial_omics_expert.py
from lobster.core.registry import AgentRegistryConfig

AGENT_CONFIG = AgentRegistryConfig(
    name='spatial_omics_expert_agent',
    display_name='Spatial Omics Expert',
    description='Handles spatial transcriptomics and spatial proteomics analysis including tissue domain identification, spatial clustering, and neighborhood analysis',
    factory_function='lobster.agents.spatial_omics_expert:spatial_omics_expert',
    tier_requirement='free',
    package_name='lobster-spatial',
    handoff_tool_name='handoff_to_spatial_omics_expert',
    handoff_tool_description='Assign spatial omics analysis tasks including spatial transcriptomics (Visium, MERFISH) and spatial proteomics (IMC, MIBI) to the spatial omics expert'
)

[project.entry-points."lobster.agents"]
spatial_omics_expert_agent = "lobster.agents.spatial_omics_expert:AGENT_CONFIG"

Step 5: Create Unit Tests

Create comprehensive tests in /tests/unit/test_spatial_omics_agent.py:

"""
Unit tests for Spatial Omics Expert Agent.
"""

import pytest
import numpy as np
import pandas as pd
import anndata as ad
from unittest.mock import Mock, patch
import tempfile
from pathlib import Path

from lobster.core.data_manager_v2 import DataManagerV2
from lobster.agents.spatial_omics_expert import spatial_omics_expert, create_spatial_omics_expert
from lobster.tools.spatial_omics_service import SpatialOmicsService


@pytest.fixture
def temp_workspace():
    """Create temporary workspace for testing."""
    with tempfile.TemporaryDirectory() as temp_dir:
        workspace_path = Path(temp_dir) / ".lobster_workspace"
        workspace_path.mkdir(exist_ok=True)
        yield workspace_path


@pytest.fixture
def mock_spatial_data():
    """Create mock spatial transcriptomics data."""
    n_spots = 1000
    n_genes = 2000

    # Create count matrix
    X = np.random.negative_binomial(5, 0.3, (n_spots, n_genes)).astype(float)

    # Create spatial coordinates (hexagonal grid)
    coords = []
    for i in range(32):  # 32x32 grid ≈ 1000 spots
        for j in range(32):
            if len(coords) >= n_spots:
                break
            x = i + 0.5 * (j % 2)  # Hexagonal offset
            y = j * 0.866  # Hexagonal spacing
            coords.append([x, y])

    coords = np.array(coords[:n_spots])

    # Create AnnData object
    adata = ad.AnnData(X=X)
    adata.obs_names = [f"spot_{i}" for i in range(n_spots)]
    adata.var_names = [f"gene_{i}" for i in range(n_genes)]
    adata.obsm['spatial'] = coords

    # Add some metadata
    adata.obs['tissue_type'] = np.random.choice(['tumor', 'normal'], n_spots)
    adata.var['highly_variable'] = np.random.choice([True, False], n_genes, p=[0.1, 0.9])

    return adata


@pytest.fixture
def data_manager_with_spatial(temp_workspace, mock_spatial_data):
    """Create DataManagerV2 with spatial data loaded."""
    data_manager = DataManagerV2(workspace_path=temp_workspace)
    data_manager.modalities['spatial_visium'] = mock_spatial_data
    return data_manager


def test_spatial_omics_service_initialization():
    """Test spatial omics service can be initialized."""
    service = SpatialOmicsService()
    assert service is not None


def test_spatial_metrics_calculation(mock_spatial_data):
    """Test spatial metrics calculation."""
    service = SpatialOmicsService()

    result_adata, stats = service.calculate_spatial_metrics(mock_spatial_data)

    assert result_adata is not None
    assert 'n_spots' in stats
    assert 'n_genes' in stats
    assert 'spatial_dimensions' in stats
    assert stats['spatial_dimensions'] == 2
    assert stats['n_spots'] == 1000


def test_spatial_clustering(mock_spatial_data):
    """Test spatial clustering functionality."""
    service = SpatialOmicsService()

    # First calculate spatial metrics to set up neighbors
    adata_with_metrics, _ = service.calculate_spatial_metrics(mock_spatial_data)

    # Then perform clustering
    result_adata, stats = service.spatial_clustering(adata_with_metrics, n_clusters=5, method="kmeans")

    assert 'spatial_kmeans' in result_adata.obs.columns
    assert 'method' in stats
    assert 'n_clusters' in stats
    assert stats['method'] == 'kmeans'


def test_agent_creation(data_manager_with_spatial):
    """Test that the spatial omics agent can be created."""
    agent = spatial_omics_expert(data_manager_with_spatial)
    assert agent is not None


def test_factory_function(data_manager_with_spatial):
    """Test the factory function."""
    agent = create_spatial_omics_expert(data_manager_with_spatial)
    assert agent is not None


@pytest.mark.integration
def test_check_spatial_data_status_tool(data_manager_with_spatial):
    """Test the check_spatial_data_status tool."""
    agent = spatial_omics_expert(data_manager_with_spatial)

    # Get the tool function
    tools = agent.get_graph().get_state_schema()
    # Note: This is a simplified test - in practice you'd invoke the agent

    # Test with existing spatial modality
    result = "Found spatial modalities with coordinates"  # Expected result
    assert isinstance(result, str)


def test_missing_spatial_coordinates():
    """Test handling of data without spatial coordinates."""
    # Create data without spatial coordinates
    adata = ad.AnnData(X=np.random.randn(100, 50))

    service = SpatialOmicsService()

    with pytest.raises(Exception):  # Should raise SpatialOmicsError
        service.calculate_spatial_metrics(adata)


if __name__ == "__main__":
    pytest.main([__file__])

Step 6: Integration Testing

Create integration tests in /tests/integration/test_spatial_omics_integration.py:

"""
Integration tests for Spatial Omics Expert Agent with full workflow.
"""

import pytest
import tempfile
from pathlib import Path
import numpy as np
import anndata as ad

from lobster.core.data_manager_v2 import DataManagerV2
from lobster.core.client import AgentClient


@pytest.fixture
def full_spatial_workspace():
    """Create a full workspace with spatial data for integration testing."""
    with tempfile.TemporaryDirectory() as temp_dir:
        workspace_path = Path(temp_dir) / ".lobster_workspace"
        workspace_path.mkdir(exist_ok=True)

        # Create realistic spatial data
        n_spots = 2000
        n_genes = 3000

        # Create spatial coordinates in tissue shape
        coords = create_tissue_coordinates(n_spots)

        # Create count matrix with spatial patterns
        X = create_spatial_expression_data(coords, n_genes)

        # Create AnnData
        adata = ad.AnnData(X=X)
        adata.obs_names = [f"spot_{i:04d}" for i in range(n_spots)]
        adata.var_names = [f"gene_{i:04d}" for i in range(n_genes)]
        adata.obsm['spatial'] = coords

        # Create data manager and load data
        data_manager = DataManagerV2(workspace_path=workspace_path)
        data_manager.modalities['visium_spatial'] = adata

        yield data_manager


def create_tissue_coordinates(n_spots: int) -> np.ndarray:
    """Create realistic tissue-shaped spatial coordinates."""
    # Create circular tissue section
    coords = []
    radius = 20
    center = [25, 25]

    for i in range(50):
        for j in range(50):
            x, y = i, j
            distance = np.sqrt((x - center[0])**2 + (y - center[1])**2)
            if distance <= radius and len(coords) < n_spots:
                # Add some noise
                x += np.random.normal(0, 0.1)
                y += np.random.normal(0, 0.1)
                coords.append([x, y])

    return np.array(coords[:n_spots])


def create_spatial_expression_data(coords: np.ndarray, n_genes: int) -> np.ndarray:
    """Create expression data with spatial patterns."""
    n_spots = coords.shape[0]
    X = np.random.negative_binomial(3, 0.3, (n_spots, n_genes))

    # Add spatial gradients to some genes
    for g in range(0, min(100, n_genes), 10):  # Every 10th gene in first 100
        # Create spatial gradient
        gradient = coords[:, 0] / coords[:, 0].max()  # X-gradient
        spatial_effect = (gradient * 2 + 1).astype(int)  # 1x to 3x multiplier
        X[:, g] = X[:, g] * spatial_effect

    return X.astype(float)


@pytest.mark.integration
def test_full_spatial_workflow(full_spatial_workspace):
    """Test complete spatial analysis workflow."""

    # Create agent client
    client = AgentClient(data_manager=full_spatial_workspace)

    # Test 1: Check data status
    result1 = client.query("Check if spatial data is loaded and ready for analysis")
    assert result1['success']
    assert "spatial" in result1['response'].lower()

    # Test 2: Calculate spatial statistics
    result2 = client.query("Calculate spatial statistics for the loaded spatial data")
    assert result2['success']
    assert "moran" in result2['response'].lower()

    # Test 3: Perform spatial clustering
    result3 = client.query("Perform spatial clustering to identify tissue domains using leiden clustering")
    assert result3['success']
    assert "cluster" in result3['response'].lower()

    # Test 4: Identify spatial domains
    result4 = client.query("Identify spatial domains using advanced spatial methods")
    assert result4['success']

    # Verify data was stored correctly
    modalities = full_spatial_workspace.list_modalities()
    assert len(modalities) >= 4  # Original + 3 analysis results


if __name__ == "__main__":
    pytest.main([__file__, "-v"])

Step 7: Testing Your Custom Agent

Run your tests to ensure everything works:

# Run unit tests
pytest tests/unit/test_spatial_omics_agent.py -v

# Run integration tests
pytest tests/integration/test_spatial_omics_integration.py -v

# Run all tests
make test

Step 8: Using Your Custom Agent

Now you can use your custom agent through Lobster AI:

# Start Lobster AI
lobster chat

# Use your custom agent
🦞 You: "I have spatial transcriptomics data from a Visium experiment. Can you help me analyze the spatial patterns and identify tissue domains?"

Expected Agent Handoff:

🦞 Lobster: I'll help you analyze your spatial transcriptomics data. Let me hand this over to our Spatial Omics Expert who specializes in spatial analysis.

🧬 Spatial Omics Expert: I'll analyze your Visium spatial transcriptomics data...

✓ Checking spatial data status...
✓ Found spatial coordinates for analysis
✓ Calculating spatial statistics and autocorrelation...
✓ Performing spatial clustering to identify tissue domains...
✓ Generating spatial visualizations...

Your Visium data analysis is complete! I've identified 8 distinct spatial domains with strong spatial coherence (0.85). The results show clear tissue organization with immune-rich and stromal regions.

Step 9: Advanced Features

Add Visualization Tools

Extend your agent with specialized visualization capabilities:

@tool
def create_spatial_plots(
    modality_name: str,
    plot_type: str = "spatial_clusters"
) -> str:
    """Create spatial-specific visualizations."""
    try:
        adata = data_manager.get_modality(modality_name)

        if plot_type == "spatial_clusters":
            # Create spatial scatter plot with cluster colors
            # Implementation would use squidpy.pl.spatial_scatter
            pass
        elif plot_type == "spatial_heatmap":
            # Create spatial heatmap of gene expression
            pass

        return f"✅ Spatial visualization created: {plot_type}"

    except Exception as e:
        return f"❌ Visualization failed: {str(e)}"

Add Integration Capabilities

@tool
def integrate_with_histology(
    modality_name: str,
    histology_image_path: str
) -> str:
    """Integrate spatial omics data with histology images."""
    try:
        # Load histology image and overlay spatial data
        # Implementation would use squidpy image processing
        return "✅ Histology integration complete"
    except Exception as e:
        return f"❌ Integration failed: {str(e)}"

Best Practices for Custom Agent Development

1. Follow the Service Pattern

Keep agents stateless
Implement logic in separate service classes
Return tuples from services: (result_adata, statistics)

2. Error Handling

Create specific exception classes
Handle edge cases gracefully
Provide helpful error messages

3. Logging and Provenance

Log all operations using data_manager.log_tool_usage()
Include parameter information
Track analysis history

4. Documentation

Write comprehensive docstrings
Include parameter descriptions
Provide usage examples

5. Testing

Create unit tests for services
Write integration tests for full workflows
Test edge cases and error conditions

Deployment and Distribution

Option 1: Local Development

Keep your custom agent in your local Lobster installation for personal use.

Option 2: Plugin System

Package your agent as a plugin that others can install:

# setup.py for your custom agent plugin
from setuptools import setup, find_packages

setup(
    name="lobster-spatial-omics",
    version="1.0.0",
    packages=find_packages(),
    install_requires=[
        "lobster-ai>=2.0.0",
        "squidpy>=1.3.0",
        "scanpy>=1.9.0"
    ],
    entry_points={
        "lobster_agents": [
            "spatial_omics = lobster_spatial_omics.agent:spatial_omics_expert"
        ]
    }
)

Option 3: Contribute to Core

Submit a pull request to add your agent to the main Lobster AI repository.

Next Steps

After creating your custom agent:

Extend Functionality: Add more specialized tools for your domain
Create Visualizations: Develop domain-specific plotting functions
Integration: Enable data exchange with other agents
Optimization: Profile and optimize performance for large datasets
Documentation: Create user guides and tutorials
Community: Share your agent with the Lobster community

Summary

You have successfully:

✅ Created a custom Spatial Omics Expert Agent
✅ Implemented the modular service pattern
✅ Integrated with Lobster's agent registry system
✅ Added comprehensive error handling and logging
✅ Created unit and integration tests
✅ Deployed and tested your custom agent
✅ Learned best practices for agent development

Your custom agent is now ready to handle spatial omics analysis tasks through natural language interaction, seamlessly integrated with Lobster AI's multi-agent system!

Agent Architecture Documentation - Technical details
API Documentation - Core API reference
Testing Guide - Comprehensive testing strategies
Examples Cookbook - More advanced patterns

PreviousExamples Cookbook

NextProteomics Analysis Tutorial

Creating Custom Agents Tutorial

On this page