Testing Agents

Reference Document: This page provides detailed reference for testing agent packages.

For step-by-step agent creation, see Creating Agent Packages.

This guide covers testing patterns for Lobster agent packages. The lobster.testing module provides mock objects, fixtures, and contract validation mixins for verifying that your agents behave correctly.

The lobster.testing Module

The lobster.testing module provides everything you need to test agents without real LLM calls or data persistence:

from lobster.testing import (
    # Mock objects
    MockDataManager,
    MockLLM,
    MockLLMResponse,
    MockProvenanceTracker,
    # Contract validation
    AgentContractTestMixin,
    # Fixtures
    create_test_workspace,
)

MockDataManager

MockDataManager is an in-memory stand-in for DataManagerV2:

from pathlib import Path

import anndata as ad
import numpy as np

from lobster.testing import MockDataManager

# Create mock data manager (workspace_path MUST be Path, not str)
dm = MockDataManager(workspace_path=Path("/tmp/test_workspace"))

# Add test data
adata = ad.AnnData(X=np.random.randn(100, 50))
dm.add_modality("test_data", adata)

# Use in tests
assert dm.has_modality("test_data")
assert "test_data" in dm.list_modalities()
retrieved = dm.get_modality("test_data")
assert retrieved.shape == (100, 50)

Key Features

In-memory storage: No disk I/O, fast tests
Tool usage tracking: Records all log_tool_usage() calls
Provenance tracking: Mock ProvenanceTracker for IR validation
Plot management: Stores plot data in memory

Critical: workspace_path must be a Path instance, not a string; passing a str raises TypeError. This enforces the Phase 5 decision (05-02) that workspace_path is always a Path.

# CORRECT
dm = MockDataManager(workspace_path=Path("/tmp/test"))

# WRONG - raises TypeError
dm = MockDataManager(workspace_path="/tmp/test")
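
When a workspace location reaches a test as a string (for example from an environment variable), one option is to convert it before constructing the mock. The helper below is a minimal sketch of that idea; as_workspace_path is a hypothetical name and is not part of lobster.testing.

```python
from pathlib import Path


def as_workspace_path(value):
    """Return value as a Path (hypothetical helper, not part of lobster.testing)."""
    if isinstance(value, Path):
        return value
    if isinstance(value, str):
        # Wrap plain strings so downstream code always receives a Path
        return Path(value)
    raise TypeError(f"workspace_path must be str or Path, got {type(value).__name__}")


workspace = as_workspace_path("/tmp/test")
print(isinstance(workspace, Path))  # True
```

The result can then be passed as workspace_path without tripping the type check.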

Tracking Tool Usage

Verify your agent tools log provenance correctly:

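from pathlib import Path
from unittest.mock import MagicMock

from lobster.testing import MockDataManager

dm = MockDataManager(workspace_path=Path("/tmp/test"))

# Placeholder for the IR object your tool builds. MagicMock is used here only
# for illustration; swap in a real IR object if the mock validates IR types.
mock_analysis_step = MagicMock(name="analysis_step")

# Your tool calls dm.log_tool_usage(...)
dm.log_tool_usage(
    tool_name="cluster_modality",
    params={"resolution": 0.5},
    stats={"n_clusters": 10},
    ir=mock_analysis_step,
)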

# Verify in tests
history = dm.get_tool_usage_history()
assert len(history) == 1
assert history[0]["tool_name"] == "cluster_modality"
assert history[0]["ir"] is not None  # IR is mandatory
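
Since every logged entry is expected to carry an IR, a test can check the whole history rather than only the first entry. The sketch below assumes history entries are dicts with tool_name and ir keys, as in the example above. tools_missing_ir is a hypothetical helper, and the sample history (including the annotate_modality tool name) is made-up data standing in for dm.get_tool_usage_history().

```python
def tools_missing_ir(history):
    """Return the names of logged tools whose entry has no IR attached."""
    return [entry["tool_name"] for entry in history if entry.get("ir") is None]


# Hand-written stand-in for dm.get_tool_usage_history() (illustrative data only)
sample_history = [
    {"tool_name": "cluster_modality", "ir": object()},
    {"tool_name": "annotate_modality", "ir": None},
]

missing = tools_missing_ir(sample_history)
print(missing)  # ['annotate_modality']
```

In a real test, asserting that the returned list is empty reports every offending tool by name in the failure message.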

MockLLM

MockLLM simulates LLM responses without API calls:

from lobster.testing import MockLLM

# Basic usage
llm = MockLLM(default_response="Analysis complete")
response = llm.invoke("Cluster the data")
assert response.content == "Analysis complete"

# Keyword-based responses
llm = MockLLM(default_response="Unknown request")
llm.add_response("cluster", "Clustering started")
llm.add_response("annotate", "Annotation complete")

r1 = llm.invoke("Please cluster cells")  # Contains "cluster"
assert r1.content == "Clustering started"

r2 = llm.invoke("Annotate cell types")  # Contains "annotate"
assert r2.content == "Annotation complete"
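
The second assertion passes even though the prompt spells "Annotate" with a capital A, which suggests that keyword matching is case-insensitive. The stand-in below sketches that behavior. It is not the lobster.testing implementation: the class name KeywordResponder, the rule that the first registered keyword wins, and returning a plain string instead of a response object are all assumptions made for illustration.

```python
class KeywordResponder:
    """Illustrative stand-in for keyword-based mock responses (not MockLLM itself)."""

    def __init__(self, default_response=""):
        self.default_response = default_response
        self._responses = []  # (keyword, response) pairs in registration order

    def add_response(self, keyword, response):
        # Store keywords lowercased so lookups can ignore case
        self._responses.append((keyword.lower(), response))

    def invoke(self, prompt):
        lowered = prompt.lower()
        for keyword, response in self._responses:
            if keyword in lowered:  # case-insensitive substring match
                return response
        return self.default_response


responder = KeywordResponder(default_response="Unknown request")
responder.add_response("cluster", "Clustering started")
responder.add_response("annotate", "Annotation complete")

print(responder.invoke("Annotate cell types"))  # Annotation complete
print(responder.invoke("Export the results"))   # Unknown request
```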

Response Sequences

For multi-turn conversations:

from lobster.testing import MockLLM

llm = MockLLM()
llm.set_response_sequence([
    "First, I'll check the data quality.",
    "Now clustering the cells.",
    "Found 12 clusters. Running annotation.",
    "Analysis complete!"
])

# Each invoke() returns the next response in sequence
r1 = llm.invoke("Start analysis")
assert r1.content == "First, I'll check the data quality."

r2 = llm.invoke("Continue")
assert r2.content == "Now clustering the cells."

# Once the sequence is exhausted, invoke() falls back to default_response
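
The sequence-then-fallback behavior can be sketched with a small stand-in. This illustrates the documented behavior only and is not MockLLM's code; SequenceResponder is a hypothetical class that returns plain strings.

```python
from collections import deque


class SequenceResponder:
    """Illustrative stand-in for scripted multi-turn responses (not MockLLM itself)."""

    def __init__(self, default_response):
        self.default_response = default_response
        self._queue = deque()

    def set_response_sequence(self, responses):
        self._queue = deque(responses)

    def invoke(self, prompt):
        # Return the next scripted reply; use the default once the queue is empty
        if self._queue:
            return self._queue.popleft()
        return self.default_response


responder = SequenceResponder(default_response="Analysis complete")
responder.set_response_sequence(["Checking data quality.", "Clustering cells."])

replies = [responder.invoke("step") for _ in range(3)]
print(replies)
# ['Checking data quality.', 'Clustering cells.', 'Analysis complete']
```

Scripting one more turn than the agent is expected to take makes an unexpected extra call visible as the default response.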

Call Tracking

Verify your agent sends correct prompts:

from lobster.testing import MockLLM

llm = MockLLM()
llm.invoke("Analyze GSE12345")
llm.invoke("Find marker genes")

# Check call history
assert llm.get_call_count() == 2
assert "GSE12345" in llm.get_last_prompt()

# Full history
for call in llm.call_history:
    print(f"Prompt: {call['prompt'][:50]}...")

AgentContractTestMixin

The AgentContractTestMixin validates that your agent follows the Plugin Contract:

from lobster.testing import AgentContractTestMixin


class TestMyAgent(AgentContractTestMixin):
    """Test my_agent follows the plugin contract."""

    # Required: module containing AGENT_CONFIG
    agent_module = "lobster.agents.my_domain.my_agent"

    # Required: factory function name
    factory_name = "my_agent"

    # Optional: if factory is in different module than AGENT_CONFIG
    # factory_module = "lobster.agents.my_domain.my_agent_impl"

    # Optional: verify specific tier
    # expected_tier = "free"

What It Validates

The mixin provides these test methods (run automatically with pytest):

Test Method	What It Checks
`test_factory_has_standard_params`	Factory has `data_manager`, `callback_handler`, `agent_name`, `delegation_tools`, `workspace_path`
`test_no_deprecated_handoff_tools`	Factory doesn't use deprecated `handoff_tools` param
`test_agent_config_exists`	`AGENT_CONFIG` is defined at module level
`test_agent_config_has_name`	`AGENT_CONFIG.name` is set
`test_agent_config_has_tier_requirement`	`AGENT_CONFIG.tier_requirement` is valid ("free", "premium", "enterprise")
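
A check like test_factory_has_standard_params can be approximated with inspect.signature from the standard library. The sketch below is not the mixin's implementation: missing_standard_params, uses_deprecated_handoff_tools, example_factory, and legacy_factory are hypothetical names, and the parameter list is taken from the table above.

```python
import inspect

STANDARD_PARAMS = {
    "data_manager",
    "callback_handler",
    "agent_name",
    "delegation_tools",
    "workspace_path",
}


def missing_standard_params(factory):
    """Return the standard parameter names absent from the factory's signature."""
    return STANDARD_PARAMS - set(inspect.signature(factory).parameters)


def uses_deprecated_handoff_tools(factory):
    """Return True if the factory still accepts the deprecated handoff_tools param."""
    return "handoff_tools" in inspect.signature(factory).parameters


def example_factory(data_manager, callback_handler=None, agent_name="example",
                    delegation_tools=None, workspace_path=None):
    """Hypothetical factory used only to exercise the checks."""
    return object()


def legacy_factory(data_manager, handoff_tools=None):
    """Hypothetical factory that violates the contract."""
    return object()


print(missing_standard_params(example_factory))       # set()
print(uses_deprecated_handoff_tools(legacy_factory))  # True
```

Inspecting the signature means the factory is never called, so the check needs no data manager or LLM.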

Running Contract Tests

# Run contract tests for your package
pytest tests/test_contract.py -v

# Example output:
# test_contract.py::TestMyAgent::test_factory_has_standard_params PASSED
# test_contract.py::TestMyAgent::test_no_deprecated_handoff_tools PASSED
# test_contract.py::TestMyAgent::test_agent_config_exists PASSED
# test_contract.py::TestMyAgent::test_agent_config_has_name PASSED
# test_contract.py::TestMyAgent::test_agent_config_has_tier_requirement PASSED

Split Module Pattern

When AGENT_CONFIG and the factory live in the same agent file (the standard pattern), only agent_module and factory_name are needed:

class TestTranscriptomicsExpert(AgentContractTestMixin):
    # AGENT_CONFIG in lobster/agents/transcriptomics/transcriptomics_expert.py
    agent_module = "lobster.agents.transcriptomics.transcriptomics_expert"

    # Factory in same module as AGENT_CONFIG
    factory_name = "transcriptomics_expert"

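If AGENT_CONFIG is re-exported from the package's __init__.py (also valid), point agent_module at the package and set factory_module to the module that defines the factory: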

class TestTranscriptomicsExpert(AgentContractTestMixin):
    # AGENT_CONFIG imported in lobster/agents/transcriptomics/__init__.py
    agent_module = "lobster.agents.transcriptomics"

    # Factory module where it's actually defined
    factory_module = "lobster.agents.transcriptomics.transcriptomics_expert"
    factory_name = "transcriptomics_expert"

Per-Package Test Structure

Each agent package should have its own test suite:

lobster-{domain}/
├── lobster/
│   └── agents/
│       └── {domain}/
│           └── ...
└── tests/
    ├── __init__.py
    ├── conftest.py          # Shared fixtures
    ├── test_contract.py     # Plugin contract validation
    ├── test_{agent}.py      # Agent-specific tests
    └── test_integration.py  # Integration tests

conftest.py Pattern

# tests/conftest.py
import pytest
from pathlib import Path
from lobster.testing import MockDataManager, MockLLM, create_test_workspace


@pytest.fixture
def workspace(tmp_path):
    """Create a test workspace."""
    return create_test_workspace(tmp_path)


@pytest.fixture
def mock_data_manager(workspace):
    """Create MockDataManager with test workspace."""
    return MockDataManager(workspace_path=workspace)


@pytest.fixture
def mock_llm():
    """Create MockLLM with default responses."""
    llm = MockLLM(default_response="Analysis complete")
    llm.add_response("error", "I encountered an error")
    return llm


@pytest.fixture
def sample_adata():
    """Create sample AnnData for testing."""
    from lobster.testing.fixtures import synthetic_single_cell_data
    return synthetic_single_cell_data(n_cells=100, n_genes=200)

Agent Test Pattern

# tests/test_my_agent.py
import pytest
from lobster.testing import MockDataManager, MockLLM
from lobster.agents.my_domain.my_agent import my_agent, AGENT_CONFIG


class TestMyAgent:
    """Test my_agent functionality."""

    def test_agent_config_fields(self):
        """Verify AGENT_CONFIG has expected values."""
        assert AGENT_CONFIG.name == "my_agent"
        assert AGENT_CONFIG.tier_requirement == "free"
        assert AGENT_CONFIG.factory_function.endswith("my_agent")

    def test_agent_creation(self, mock_data_manager):
        """Verify agent can be created."""
        agent = my_agent(
            data_manager=mock_data_manager,
            callback_handler=None,
            agent_name="test_my_agent",
            delegation_tools=[],
            workspace_path=mock_data_manager.workspace_path,
        )
        assert agent is not None

    def test_tool_logging(self, mock_data_manager, sample_adata):
        """Verify tools log usage correctly."""
        mock_data_manager.add_modality("test_data", sample_adata)

        # Run your tool here; until a tool logs usage, the assertions below fail
        # my_tool(mock_data_manager, "test_data")

        # Verify logging
        history = mock_data_manager.get_tool_usage_history()
        assert len(history) >= 1
        assert history[0]["ir"] is not None  # IR is mandatory

Synthetic Test Data

The lobster.testing.fixtures module provides realistic synthetic data:

from lobster.testing.fixtures import (
    synthetic_single_cell_data,
    synthetic_bulk_rnaseq_data,
    synthetic_proteomics_data,
)

# Single-cell RNA-seq (sparse, cell types, batches)
adata_sc = synthetic_single_cell_data(
    n_cells=1000,
    n_genes=2000,
    sparsity=0.7,  # 70% zeros (typical for scRNA-seq)
)
assert "cell_type" in adata_sc.obs.columns
assert "batch" in adata_sc.obs.columns

# Bulk RNA-seq (balanced design, conditions)
adata_bulk = synthetic_bulk_rnaseq_data(
    n_samples=24,
    n_genes=2000,
)
assert "condition" in adata_bulk.obs.columns

# Proteomics (missing values, intensities)
adata_prot = synthetic_proteomics_data(
    n_samples=48,
    n_proteins=500,
    missing_rate=0.2,  # 20% NaN values
)

Running Tests and CI

Local Testing

# Install package in development mode
cd lobster-{domain}
pip install -e ".[dev]"

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=lobster.agents.{domain} --cov-report=term-missing

CI Integration

Add a test job to your GitHub Actions workflow:

# .github/workflows/test.yml
name: Test

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: |
          pip install -e ".[dev]"

      - name: Run tests
        run: |
          pytest tests/ -v --cov=lobster.agents.{domain}

      - name: Contract validation
        run: |
          pytest tests/test_contract.py -v

Next Steps

Review Package Structure for directory layout
Follow the Plugin Contract for API requirements
See Creating Agents for a quick start guide
