Testing Guide - Lobster AI Testing Framework
🎯 Overview
This guide provides comprehensive documentation for the Lobster AI testing framework, targeting 95%+ code coverage with scientifically accurate testing scenarios. The testing infrastructure covers unit tests, integration tests, system tests, and performance benchmarks across all bioinformatics workflows.
🏗️ Testing Architecture
Test Categories
1. Unit Tests (tests/unit/)
- Purpose: Test individual functions, classes, and methods in isolation
- Duration: ~2 minutes for full suite
- Coverage: Individual components (services, agents, utilities)
- Execution:
pytest tests/unit/
2. Integration Tests (tests/integration/)
- Purpose: Test component interactions and workflows
- Duration: ~15 minutes for full suite
- Coverage: Agent-service integration, data flow validation
- Execution:
pytest tests/integration/
3. System Tests (tests/system/)
- Purpose: Test complete end-to-end workflows
- Duration: ~30 minutes for full suite
- Coverage: Full analysis pipelines, CLI interactions
- Execution:
pytest tests/system/
4. Performance Tests (tests/performance/)
- Purpose: Benchmark performance and memory usage
- Duration: ~45 minutes for full suite
- Coverage: Large dataset handling, algorithmic efficiency
- Execution:
pytest tests/performance/
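Tests opt into these categories through the markers declared in pytest.ini (shown later in this guide). A minimal sketch with hypothetical test bodies:

```python
import pytest

# Hypothetical micro-tests showing how the markers declared in pytest.ini are applied
@pytest.mark.unit
def test_fast_unit():
    assert 1 + 1 == 2

@pytest.mark.integration
@pytest.mark.slow
def test_slow_integration():
    assert sum(range(5)) == 10
```

With these markers in place, `pytest -m "unit and not slow"` selects only the first test.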
Directory Structure
tests/
├── 📁 unit/ # Unit tests (20+ files)
│ ├── core/ # Core system components
│ │ ├── test_data_manager_v2.py
│ │ ├── test_client.py
│ │ └── test_adapters.py
│ ├── agents/ # AI agent functionality
│ │ ├── test_data_expert.py (tests agents/data_expert/ module)
│ │ └── test_transcriptomics_expert.py
│ ├── services/ # Analysis services (7+ files)
│ │ ├── test_quality_service.py
│ │ ├── test_clustering_service.py
│ │ └── test_differential_service.py
│ └── tools/ # Analysis tools (12+ files)
├── 📁 integration/ # Integration tests (5 files)
│ ├── test_agent_workflows.py
│ ├── test_data_pipelines.py
│ └── test_service_chains.py
├── 📁 system/ # System tests (3 files)
│ ├── test_end_to_end.py
│ ├── test_cli_commands.py
│ └── test_multi_modal.py
├── 📁 performance/ # Performance tests (3 files)
├── 📁 mock_data/ # Synthetic data generation
│ ├── generators.py # High-level data generators
│ ├── factories.py # Data factory classes
│ └── base.py # Base configurations
├── conftest.py # Global fixtures and configuration
└── README.md # Testing documentation
🔧 Test Configuration
Pytest Configuration (pytest.ini)
[tool:pytest]
minversion = 6.0
addopts =
--strict-markers
--disable-warnings
--verbose
--tb=short
--cov=lobster
--cov-report=html:htmlcov
--cov-report=xml:coverage.xml
--cov-report=term-missing
--cov-fail-under=80
--durations=10
testpaths = tests
markers =
unit: mark test as a unit test
integration: mark test as an integration test
system: mark test as a system test
performance: mark test as a performance benchmark
slow: mark test as slow running
requires_gpu: mark test as requiring GPU
requires_network: mark test as requiring network access
filterwarnings =
ignore::DeprecationWarning
ignore::PendingDeprecationWarning
ignore::FutureWarning:scanpy
Global Test Fixtures (conftest.py)
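The fixtures below build on tempfile.TemporaryDirectory for workspace isolation; its automatic cleanup behavior can be verified in a standalone sketch (file name is illustrative):

```python
import tempfile
from pathlib import Path

# Same pattern as the temp_workspace fixture: the directory exists only inside the context
with tempfile.TemporaryDirectory(prefix="lobster_test_") as tmpdir:
    workspace = Path(tmpdir)
    (workspace / "counts.csv").write_text("gene,count\nGAPDH,10\n")
    assert (workspace / "counts.csv").exists()

# On exit the whole tree is removed, so fixtures need no manual teardown
assert not workspace.exists()
```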
# Core fixtures available to all tests
import pytest
import tempfile
from pathlib import Path
from unittest.mock import Mock
from lobster.core.data_manager_v2 import DataManagerV2
from lobster.core.client import AgentClient
from tests.mock_data.generators import (
generate_synthetic_single_cell,
generate_synthetic_bulk_rnaseq,
generate_synthetic_proteomics
)
@pytest.fixture
def temp_workspace():
"""Create temporary workspace for testing."""
with tempfile.TemporaryDirectory(prefix="lobster_test_") as tmpdir:
workspace_path = Path(tmpdir)
yield workspace_path
@pytest.fixture
def mock_data_manager(temp_workspace):
"""Create DataManagerV2 instance with temporary workspace."""
return DataManagerV2(workspace_path=temp_workspace)
@pytest.fixture
def sample_single_cell_data():
"""Generate synthetic single-cell data for testing."""
return generate_synthetic_single_cell(
n_cells=100,
n_genes=50,
n_cell_types=3
)
@pytest.fixture
def sample_bulk_data():
"""Generate synthetic bulk RNA-seq data."""
return generate_synthetic_bulk_rnaseq(
n_samples=12,
n_genes=100
)
@pytest.fixture
def sample_proteomics_data():
"""Generate synthetic proteomics data."""
return generate_synthetic_proteomics(
n_samples=20,
n_proteins=80,
missing_rate=0.3
)
@pytest.fixture
def mock_agent_client(mock_data_manager):
"""Create mock AgentClient for testing."""
client = Mock(spec=AgentClient)
client.data_manager = mock_data_manager
return client
@pytest.fixture(scope="session")
def test_config():
"""Test configuration parameters."""
return {
'timeout': 300,
'max_memory': '2GB',
'test_data_size': 'medium',
'enable_gpu': False
}
🧪 Writing Unit Tests
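The service template below asserts a progress-callback convention (a percentage plus a message). A minimal stand-in class, hypothetical but mirroring the behavior the tests check, shows the expected arithmetic:

```python
class ProgressReporter:
    """Minimal stand-in (hypothetical) for the progress convention services follow."""

    def __init__(self):
        self.progress_callback = None
        self.current_progress = 0
        self.total_steps = 1

    def set_progress_callback(self, callback):
        self.progress_callback = callback

    def _update_progress(self, message):
        # Each call advances one step and reports percent complete plus the message
        self.current_progress += 1
        percent = int(self.current_progress / self.total_steps * 100)
        if self.progress_callback:
            self.progress_callback(percent, message)

calls = []
reporter = ProgressReporter()
reporter.set_progress_callback(lambda pct, msg: calls.append((pct, msg)))
reporter.total_steps = 2
reporter._update_progress("Test step")
assert calls == [(50, "Test step")]  # halfway through two steps
```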
Service Unit Test Template
# tests/unit/tools/test_your_service.py
import pytest
import numpy as np
import pandas as pd
from unittest.mock import Mock, patch, MagicMock
from lobster.tools.your_service import YourService, YourServiceError
from tests.mock_data.generators import generate_synthetic_single_cell
class TestYourService:
"""Comprehensive unit tests for YourService."""
@pytest.fixture
def service(self):
"""Create service instance for testing."""
return YourService()
@pytest.fixture
def mock_adata(self):
"""Create mock AnnData for testing."""
return generate_synthetic_single_cell(n_cells=50, n_genes=30)
def test_service_initialization(self, service):
"""Test service initializes correctly."""
assert service is not None
assert service.progress_callback is None
assert service.current_progress == 0
assert hasattr(service, 'total_steps')
def test_progress_callback_setting(self, service):
"""Test progress callback functionality."""
callback_calls = []
def mock_callback(progress, message):
callback_calls.append((progress, message))
service.set_progress_callback(mock_callback)
assert service.progress_callback is not None
# Test progress update
service.total_steps = 2
service._update_progress("Test step")
assert len(callback_calls) == 1
assert callback_calls[0][0] == 50 # 50% progress
assert "Test step" in callback_calls[0][1]
def test_main_analysis_success(self, service, mock_adata):
"""Test successful analysis execution."""
# Test with default parameters
result_adata, statistics = service.main_analysis_method(
mock_adata,
parameter1=1.0,
parameter2="default"
)
# Validate results structure
assert result_adata is not None
assert isinstance(statistics, dict)
# Validate AnnData structure preservation
assert result_adata.n_obs == mock_adata.n_obs
assert result_adata.n_vars == mock_adata.n_vars
# Validate statistics content
required_stats = ['n_observations', 'n_features', 'analysis_timestamp']
for stat in required_stats:
assert stat in statistics
# Validate analysis metadata stored
assert 'your_analysis' in result_adata.uns
assert result_adata.uns['your_analysis']['method'] == 'default'
def test_parameter_validation(self, service, mock_adata):
"""Test comprehensive parameter validation."""
# Test invalid parameter1 (should be positive)
with pytest.raises(ValueError, match="Parameter1 must be positive"):
service.main_analysis_method(mock_adata, parameter1=-1.0)
with pytest.raises(ValueError, match="Parameter1 must be positive"):
service.main_analysis_method(mock_adata, parameter1=0.0)
# Test invalid parameter2 (should be from allowed options)
with pytest.raises(ValueError, match="Invalid parameter2"):
service.main_analysis_method(mock_adata, parameter2="invalid_option")
# Test empty parameter3 list
with pytest.raises(ValueError, match="Parameter3 cannot be empty list"):
service.main_analysis_method(mock_adata, parameter3=[])
def test_empty_data_handling(self, service):
"""Test handling of edge cases in data."""
# Empty observations
empty_obs_adata = generate_synthetic_single_cell(n_cells=0, n_genes=10)
with pytest.raises(ValueError, match="Input data is empty"):
service.main_analysis_method(empty_obs_adata)
# Empty features
empty_vars_adata = generate_synthetic_single_cell(n_cells=10, n_genes=0)
with pytest.raises(ValueError, match="Input data has no features"):
service.main_analysis_method(empty_vars_adata)
def test_statistical_accuracy(self, service, mock_adata):
"""Test statistical calculations are mathematically correct."""
result_adata, statistics = service.main_analysis_method(mock_adata)
# Verify basic statistics match input data
assert statistics['n_observations'] == mock_adata.n_obs
assert statistics['n_features'] == mock_adata.n_vars
# Test statistical calculations if applicable
if 'mean_expression' in statistics:
expected_mean = np.mean(mock_adata.X)
np.testing.assert_almost_equal(
statistics['mean_expression'],
expected_mean,
decimal=5
)
def test_error_propagation(self, service, mock_adata, monkeypatch):
"""Test error handling and propagation."""
# Mock internal method to raise exception
def mock_preprocess_error(*args, **kwargs):
raise RuntimeError("Preprocessing failed")
monkeypatch.setattr(service, '_preprocess_data', mock_preprocess_error)
# Should wrap in service-specific error
with pytest.raises(YourServiceError, match="Unexpected error"):
service.main_analysis_method(mock_adata)
def test_reproducibility(self, service, mock_adata):
"""Test that analyses are reproducible."""
# Run the same analysis twice with identical parameters
result1_adata, stats1 = service.main_analysis_method(
mock_adata.copy(), parameter1=1.5, parameter2="option1"
)
result2_adata, stats2 = service.main_analysis_method(
mock_adata.copy(), parameter1=1.5, parameter2="option1"
)
# Results should be deterministic for identical inputs
np.testing.assert_array_almost_equal(result1_adata.X, result2_adata.X)
@pytest.mark.parametrize("param1,param2,expected_error", [
(-1.0, "default", ValueError),
(1.0, "invalid", ValueError),
(1.0, "default", None), # Should succeed
])
def test_parameter_combinations(self, service, mock_adata, param1, param2, expected_error):
"""Test various parameter combinations."""
if expected_error:
with pytest.raises(expected_error):
service.main_analysis_method(mock_adata, parameter1=param1, parameter2=param2)
else:
result_adata, statistics = service.main_analysis_method(
mock_adata, parameter1=param1, parameter2=param2
)
assert result_adata is not None
assert isinstance(statistics, dict)
def test_memory_efficiency(self, service):
"""Test memory usage doesn't grow excessively."""
import psutil
import os
process = psutil.Process(os.getpid())
initial_memory = process.memory_info().rss
# Run analysis on moderately sized data
large_adata = generate_synthetic_single_cell(n_cells=1000, n_genes=500)
result_adata, statistics = service.main_analysis_method(large_adata)
final_memory = process.memory_info().rss
memory_growth = final_memory - initial_memory
# Memory growth should be reasonable (less than 500MB for this test)
assert memory_growth < 500 * 1024 * 1024 # 500MB threshold
def test_concurrent_usage(self, service, mock_adata):
"""Test service can be used concurrently (stateless requirement)."""
import threading
results = []
errors = []
def run_analysis(data):
try:
result = service.main_analysis_method(data.copy())
results.append(result)
except Exception as e:
errors.append(e)
# Run multiple analyses concurrently
threads = []
for i in range(3):
thread = threading.Thread(target=run_analysis, args=(mock_adata,))
threads.append(thread)
thread.start()
# Wait for all to complete
for thread in threads:
thread.join()
# Should have no errors and expected number of results
assert len(errors) == 0
assert len(results) == 3
Agent Unit Test Template
# tests/unit/agents/test_your_agent.py
import pytest
from unittest.mock import Mock, patch, MagicMock
from lobster.agents.your_agent import your_agent_factory
from lobster.core.data_manager_v2 import DataManagerV2
class TestYourAgent:
"""Unit tests for YourAgent."""
@pytest.fixture
def mock_data_manager(self):
"""Create mock DataManagerV2."""
mock_dm = Mock(spec=DataManagerV2)
mock_dm.list_modalities.return_value = []
mock_dm.get_modality.return_value = None
mock_dm.log_tool_usage = Mock()
return mock_dm
@pytest.fixture
def agent(self, mock_data_manager):
"""Create agent instance for testing."""
with patch('lobster.agents.your_agent.get_settings') as mock_settings:
mock_settings.return_value.get_agent_llm_params.return_value = {
'model': 'test-model',
'temperature': 0.1
}
return your_agent_factory(mock_data_manager)
def test_agent_creation(self, mock_data_manager):
"""Test agent factory creates agent successfully."""
with patch('lobster.agents.your_agent.get_settings') as mock_settings:
mock_settings.return_value.get_agent_llm_params.return_value = {}
agent = your_agent_factory(mock_data_manager)
assert agent is not None
def test_agent_tools_available(self, agent):
"""Test that agent has expected tools."""
# Tool access depends on the LangGraph structure; adjust to the actual agent API
tools = getattr(agent, 'tools', [])
tool_names = [tool.name for tool in tools if hasattr(tool, 'name')]
expected_tools = ['check_available_modalities', 'perform_domain_analysis']
if tools: # skip the check when tools are not exposed directly on the agent
for expected_tool in expected_tools:
assert expected_tool in tool_names
@patch('lobster.agents.your_agent.YourService')
def test_tool_service_integration(self, mock_service_class, agent, mock_data_manager):
"""Test that agent tools properly integrate with services."""
# Setup mock service
mock_service = Mock()
mock_service.perform_analysis.return_value = (Mock(), {'metric': 1.0})
mock_service_class.return_value = mock_service
# Setup data manager
mock_adata = Mock()
mock_data_manager.list_modalities.return_value = ['test_data']
mock_data_manager.get_modality.return_value = mock_adata
# This test would need to be implemented based on how tools are exposed
# in the actual LangGraph agent structure
🔗 Writing Integration Tests
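Integration tests mock fewer components than unit tests, but where a boundary is stubbed, Mock(spec=...) keeps the stub honest. A self-contained sketch using a hypothetical DataManagerLike interface in place of the real DataManagerV2:

```python
from unittest.mock import Mock

class DataManagerLike:
    """Hypothetical stand-in mirroring the DataManagerV2 surface used in tests."""
    def list_modalities(self): ...
    def get_modality(self, name): ...

dm = Mock(spec=DataManagerLike)
dm.list_modalities.return_value = ["test_data"]
assert dm.list_modalities() == ["test_data"]

# A spec'd mock rejects attributes the real interface lacks, catching drift early
rejected = False
try:
    dm.delete_everything()
except AttributeError:
    rejected = True
assert rejected
```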
Agent-Service Integration Template
# tests/integration/test_agent_service_integration.py
import pytest
from lobster.core.client import AgentClient
from lobster.core.data_manager_v2 import DataManagerV2
from tests.mock_data.generators import generate_synthetic_single_cell
class TestAgentServiceIntegration:
"""Integration tests for agent-service workflows."""
@pytest.fixture
def client_with_data(self, temp_workspace, sample_single_cell_data):
"""Create client with loaded data."""
data_manager = DataManagerV2(workspace_path=temp_workspace)
data_manager.modalities['test_data'] = sample_single_cell_data
client = AgentClient(data_manager=data_manager)
return client
def test_quality_assessment_workflow(self, client_with_data):
"""Test complete quality assessment workflow."""
# Request quality assessment
response = client_with_data.query(
"Assess the quality of test_data modality"
)
# Validate response structure
assert response['success'] is True
assert 'quality_assessed' in response['response']
# Check that new modality was created
modalities = client_with_data.data_manager.list_modalities()
assert any('quality_assessed' in mod for mod in modalities)
def test_clustering_analysis_workflow(self, client_with_data):
"""Test complete clustering workflow."""
# First, assess quality
quality_response = client_with_data.query(
"Assess quality of test_data"
)
assert quality_response['success']
# Then perform clustering
cluster_response = client_with_data.query(
"Perform clustering analysis on the quality-assessed data"
)
assert cluster_response['success']
assert 'clustered' in cluster_response['response']
def test_multi_agent_handoff(self, client_with_data):
"""Test handoffs between different agents."""
# Start with data expert
data_response = client_with_data.query(
"Load and prepare the test data for transcriptomics analysis"
)
assert data_response['success']
# Should handoff to transcriptomics expert
analysis_response = client_with_data.query(
"Now perform clustering and find marker genes"
)
assert analysis_response['success']
def test_error_handling_integration(self, client_with_data):
"""Test error handling across agent-service boundaries."""
# Request analysis on non-existent data
error_response = client_with_data.query(
"Analyze the modality called 'nonexistent_data'"
)
# Should handle error gracefully
assert error_response['success'] is False
assert 'not found' in error_response['response'].lower()
🌐 Writing System Tests
End-to-End Test Template
# tests/system/test_end_to_end.py
import pytest
from pathlib import Path
import tempfile
from lobster.core.client import AgentClient
from lobster.core.data_manager_v2 import DataManagerV2
class TestEndToEndWorkflows:
"""System tests for complete analysis workflows."""
@pytest.fixture
def full_client_setup(self):
"""Set up complete client with real configuration."""
with tempfile.TemporaryDirectory() as tmpdir:
workspace = Path(tmpdir)
data_manager = DataManagerV2(workspace_path=workspace)
client = AgentClient(data_manager=data_manager)
yield client
def test_complete_single_cell_pipeline(self, full_client_setup):
"""Test complete single-cell analysis pipeline."""
client = full_client_setup
# Step 1: Load synthetic data
response1 = client.query(
"Generate synthetic single-cell data with 500 cells and 100 genes"
)
assert response1['success']
# Step 2: Quality assessment
response2 = client.query(
"Assess the quality of the generated data"
)
assert response2['success']
# Step 3: Preprocessing
response3 = client.query(
"Filter and normalize the data based on quality metrics"
)
assert response3['success']
# Step 4: Clustering
response4 = client.query(
"Perform clustering analysis and generate UMAP visualization"
)
assert response4['success']
# Step 5: Marker gene analysis
response5 = client.query(
"Find marker genes for each cluster"
)
assert response5['success']
# Validate final state
modalities = client.data_manager.list_modalities()
expected_stages = ['generated', 'quality_assessed', 'normalized', 'clustered']
for stage in expected_stages:
assert any(stage in mod for mod in modalities)
def test_bulk_rnaseq_differential_expression(self, full_client_setup):
"""Test bulk RNA-seq differential expression workflow."""
client = full_client_setup
# Generate bulk RNA-seq data with conditions
response1 = client.query(
"Generate bulk RNA-seq data with treatment and control conditions"
)
assert response1['success']
# Perform differential expression
response2 = client.query(
"Perform differential expression analysis between conditions"
)
assert response2['success']
# Validate results contain expected elements
assert 'differential' in response2['response']
assert any('differential' in mod for mod in client.data_manager.list_modalities())
@pytest.mark.slow
def test_large_dataset_handling(self, full_client_setup):
"""Test system performance with larger datasets."""
client = full_client_setup
# Generate larger dataset
response = client.query(
"Generate single-cell data with 10000 cells and 2000 genes"
)
assert response['success']
# Perform computationally intensive analysis
cluster_response = client.query(
"Perform clustering with high resolution and generate comprehensive visualizations"
)
assert cluster_response['success']
📊 Performance Testing
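The templates below rely on pytest-benchmark and psutil. As a dependency-free alternative for bounding allocations, the stdlib tracemalloc module works too, as in this sketch with a toy workload:

```python
import tracemalloc

tracemalloc.start()
# Stand-in for an analysis step: allocate a moderate amount of Python objects
data = [list(range(1_000)) for _ in range(1_000)]
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

assert len(data) == 1_000
assert peak > 0
assert peak < 500 * 1024 * 1024  # generous upper bound for this toy workload
```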
Performance Test Template
# tests/performance/test_service_performance.py
import pytest
import time
import psutil
import os
from pathlib import Path
from lobster.tools.clustering_service import ClusteringService
from tests.mock_data.generators import generate_synthetic_single_cell
class TestServicePerformance:
"""Performance benchmarks for services."""
@pytest.mark.performance
def test_clustering_performance_small(self, benchmark):
"""Benchmark clustering on small dataset."""
service = ClusteringService()
adata = generate_synthetic_single_cell(n_cells=1000, n_genes=500)
def run_clustering():
return service.cluster_and_visualize(adata)
result = benchmark(run_clustering)
# Validate performance
assert benchmark.stats.stats.mean < 30.0 # Should complete in <30 seconds
@pytest.mark.performance
@pytest.mark.slow
def test_clustering_performance_large(self, benchmark):
"""Benchmark clustering on larger dataset."""
service = ClusteringService()
adata = generate_synthetic_single_cell(n_cells=10000, n_genes=2000)
def run_clustering():
return service.cluster_and_visualize(adata, demo_mode=True)
result = benchmark(run_clustering)
# Should still complete within reasonable time in demo mode
assert benchmark.stats.stats.mean < 120.0 # 2 minutes max
def test_memory_usage_monitoring(self):
"""Monitor memory usage during analysis."""
service = ClusteringService()
process = psutil.Process(os.getpid())
initial_memory = process.memory_info().rss
# Run analysis
adata = generate_synthetic_single_cell(n_cells=5000, n_genes=1000)
result_adata, stats = service.cluster_and_visualize(adata)
peak_memory = process.memory_info().rss
memory_growth = peak_memory - initial_memory
# Memory growth should be reasonable (less than 2GB)
assert memory_growth < 2 * 1024**3
# Clean up and check memory returns close to initial
del adata, result_adata
import gc
gc.collect()
final_memory = process.memory_info().rss
memory_leak = final_memory - initial_memory
# Should not have significant memory leaks (less than 100MB)
assert memory_leak < 100 * 1024**2
🛠️ Mock Data Generation
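The generators below build count matrices from negative-binomial draws and then upweight marker genes. The core idea in isolation (numpy only, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(42)
# Raw counts for 5 cells x 4 genes; negative binomial mimics overdispersed RNA-seq counts
counts = rng.negative_binomial(10, 0.3, size=(5, 4)).astype(np.float32)

# Upweight a "marker" gene in the first two cells, as the generators below do per cell type
counts[:2, 0] *= 3.0

assert counts.shape == (5, 4)
assert (counts >= 0).all()
```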
Synthetic Data Generators
# tests/mock_data/custom_generators.py
"""Custom generators for specific test scenarios."""
import numpy as np
import pandas as pd
import anndata as ad
from typing import Optional, Dict, Any
def generate_realistic_single_cell(
n_cells: int = 1000,
n_genes: int = 500,
cell_types: Optional[list] = None,
batch_effects: bool = False,
doublet_rate: float = 0.05
) -> ad.AnnData:
"""
Generate realistic single-cell data with biological features.
Args:
n_cells: Number of cells
n_genes: Number of genes
cell_types: List of cell type names
batch_effects: Whether to include batch effects
doublet_rate: Proportion of doublet cells
Returns:
AnnData with realistic single-cell features
"""
if cell_types is None:
cell_types = ['T_cells', 'B_cells', 'NK_cells', 'Monocytes']
# Generate base expression matrix with biological structure
np.random.seed(42)
# Create cell type-specific expression patterns
n_cell_types = len(cell_types)
cells_per_type = n_cells // n_cell_types
X = np.zeros((n_cells, n_genes))
cell_type_labels = []
for i, cell_type in enumerate(cell_types):
start_idx = i * cells_per_type
end_idx = start_idx + cells_per_type if i < n_cell_types - 1 else n_cells
# Base expression for this cell type
base_expression = np.random.negative_binomial(10, 0.3, size=(end_idx - start_idx, n_genes)).astype(np.float64) # float dtype so in-place marker scaling below cannot raise a casting error
# Add cell type-specific marker genes
marker_genes = slice(i * 50, min((i + 1) * 50, n_genes)) # up to 50 markers per type
base_expression[:, marker_genes] *= np.random.uniform(2, 5, size=base_expression[:, marker_genes].shape)
X[start_idx:end_idx, :] = base_expression
cell_type_labels.extend([cell_type] * (end_idx - start_idx))
# Create AnnData object
adata = ad.AnnData(X=X.astype(np.float32))
# Add cell metadata
adata.obs['cell_type'] = cell_type_labels
adata.obs['n_genes'] = (adata.X > 0).sum(axis=1)
adata.obs['total_counts'] = adata.X.sum(axis=1)
# Add mitochondrial genes
mt_genes = np.random.choice(n_genes, size=int(0.05 * n_genes), replace=False)
adata.var['mt'] = False
adata.var.iloc[mt_genes, adata.var.columns.get_loc('mt')] = True
# Calculate mitochondrial percentage
adata.obs['pct_counts_mt'] = (
adata[:, adata.var['mt']].X.sum(axis=1) / adata.obs['total_counts'] * 100
)
# Add batch effects if requested
if batch_effects:
n_batches = 3
batch_assignments = np.random.choice(n_batches, size=n_cells)
adata.obs['batch'] = [f'batch_{i}' for i in batch_assignments]
# Apply batch-specific scaling
for batch_id in range(n_batches):
batch_mask = batch_assignments == batch_id
batch_effect = np.random.uniform(0.8, 1.2)
adata.X[batch_mask, :] *= batch_effect
# Add doublets if requested
if doublet_rate > 0:
n_doublets = int(n_cells * doublet_rate)
doublet_indices = np.random.choice(n_cells, size=n_doublets, replace=False)
adata.obs['is_doublet'] = False
adata.obs.iloc[doublet_indices, adata.obs.columns.get_loc('is_doublet')] = True
# Doublets have higher total counts
adata.X[doublet_indices, :] *= np.random.uniform(1.5, 2.0, size=(n_doublets, 1))
# Add gene metadata
gene_names = [f'Gene_{i:04d}' for i in range(n_genes)]
adata.var.index = gene_names
adata.var['highly_variable'] = False
# Mark some genes as highly variable
hv_genes = np.random.choice(n_genes, size=int(0.2 * n_genes), replace=False)
adata.var.iloc[hv_genes, adata.var.columns.get_loc('highly_variable')] = True
return adata
def generate_differential_expression_data(
n_samples_per_group: int = 6,
n_genes: int = 1000,
n_de_genes: int = 100,
effect_size: float = 2.0
) -> ad.AnnData:
"""Generate bulk RNA-seq data with known differential expression."""
n_samples = n_samples_per_group * 2
# Generate base counts
X = np.random.negative_binomial(20, 0.3, size=(n_samples, n_genes)).astype(np.float64) # float so the effect-size scaling below works in place
# Add differential expression
de_gene_indices = np.random.choice(n_genes, size=n_de_genes, replace=False)
# Treatment group (second half of samples) gets higher expression for DE genes
X[n_samples_per_group:, de_gene_indices] *= effect_size
# Create AnnData
adata = ad.AnnData(X=X.astype(np.float32))
# Add sample metadata
conditions = ['control'] * n_samples_per_group + ['treatment'] * n_samples_per_group
adata.obs['condition'] = conditions
adata.obs['sample_id'] = [f'sample_{i:02d}' for i in range(n_samples)]
# Add gene metadata
gene_names = [f'Gene_{i:04d}' for i in range(n_genes)]
adata.var.index = gene_names
adata.var['is_de'] = False
adata.var.iloc[de_gene_indices, adata.var.columns.get_loc('is_de')] = True
return adata
🚀 Running Tests
Basic Test Execution
# Run all tests
make test
# Run specific test categories
pytest tests/unit/ # Unit tests only
pytest tests/integration/ # Integration tests only
pytest tests/system/ # System tests only
pytest tests/performance/ # Performance tests only
# Run tests with specific markers
pytest -m "unit and not slow" # Fast unit tests only
pytest -m "integration" # Integration tests
pytest -m "performance" # Performance benchmarks
# Run specific test files
pytest tests/unit/test_clustering_service.py
pytest tests/integration/test_agent_workflows.py
# Run with coverage reporting
pytest --cov=lobster --cov-report=html
# Run with performance benchmarks
pytest --benchmark-only # Only benchmark tests
pytest --benchmark-compare # Compare with previous runs
Advanced Test Options
# Run tests in parallel (requires pytest-xdist)
pytest -n auto # Auto-detect CPU cores
pytest -n 4 # Use 4 processes
# Run with specific output formats
pytest --tb=long # Long traceback format
pytest --tb=short # Short traceback format
pytest -v # Verbose output
pytest -s # Don't capture stdout
# Run tests matching pattern
pytest -k "test_clustering" # Tests with 'clustering' in name
pytest -k "not slow" # Skip slow tests
# Run with timeout (requires pytest-timeout)
pytest --timeout=300 # 5 minute timeout per test
# Generate test reports
pytest --html=reports/report.html # HTML report
pytest --junitxml=reports/junit.xml # JUnit XML report
🎯 End-to-End Acceptance Testing System
Lobster AI features a sophisticated 3-tier testing architecture that includes a comprehensive End-to-End acceptance testing system for validating complete user workflows.
🌟 Testing Architecture Overview
- Traditional pytest Framework - Unit/Integration/System/Performance tests
- End-to-End Acceptance Testing - Natural language workflow validation
- Hybrid Execution - Combined reporting and analytics
🚀 E2E Testing Components
Core Files
- tests/test_cases.json - 30+ realistic user scenarios with validation criteria
- tests/run_integration_tests.py - Advanced test runner with performance monitoring
- tests/run_tests.sh - User-friendly bash wrapper for easy execution
Key Features
- Natural Language Testing: Validates actual conversational interface users experience
- Performance Monitoring: Real-time CPU, memory, disk I/O tracking during execution
- Scalable Architecture: Tag-based filtering, priorities, parallel execution
- Response Validation: Keyword matching, length checks, error detection
- Workspace Management: Isolated test environments with automatic cleanup
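The response-validation criteria (required/forbidden keywords, minimum length, error detection) can be sketched as a small checker. The function below is illustrative, not the runner's actual implementation:

```python
def validate_response(text: str, criteria: dict) -> bool:
    """Illustrative checker mirroring the validation_criteria shape in test_cases.json."""
    low = text.lower()
    if any(kw.lower() not in low for kw in criteria.get("required_keywords", [])):
        return False
    if any(kw.lower() in low for kw in criteria.get("forbidden_keywords", [])):
        return False
    if len(text) < criteria.get("min_length", 0):
        return False
    # no_errors: a crude scan for failure markers in the AI's reply
    if criteria.get("no_errors") and ("error" in low or "traceback" in low):
        return False
    return True

assert validate_response(
    "Dataset downloaded; quality control complete.",
    {"required_keywords": ["downloaded", "quality control"], "no_errors": True},
)
assert not validate_response("Error: download failed", {"no_errors": True})
```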
🎮 Quick Start Commands
# User-friendly bash wrapper (recommended)
./tests/run_tests.sh # Run all scenarios sequentially
./tests/run_tests.sh --parallel # Run in parallel
./tests/run_tests.sh --parallel -w 8 # 8 parallel workers
# Advanced Python runner with full control
python tests/run_integration_tests.py --categories basic,advanced --parallel
python tests/run_integration_tests.py --performance-monitoring --workers 4
python tests/run_integration_tests.py --run-pytest-integration --output results.json
📊 Test Categorization & Filtering
# Filter by categories
python tests/run_integration_tests.py --categories basic,advanced,performance,error_handling
# Filter by biological domains
python tests/run_integration_tests.py --tags geo,proteomics,multiomics,spatial,qc
# Filter by priority levels (1-5)
python tests/run_integration_tests.py --priorities 1,2,3
# Combine filters for targeted testing
python tests/run_integration_tests.py --categories advanced --tags geo,qc --parallel
🧪 Test Scenarios (30+ Realistic Workflows)
Categories:
- basic - Simple workflows (GEO download, basic QC)
- advanced - Complex analysis (multi-omics, trajectory analysis)
- performance - Large dataset processing
- error_handling - Edge cases and error recovery
Biological Domain Tags:
- geo - GEO dataset workflows
- qc - Quality control processes
- visualization - Plotting and visual analysis
- multiomics - Cross-platform integration
- spatial - Spatial transcriptomics
- proteomics - Mass spec and affinity proteomics
- clustering - Cell/sample grouping analysis
Example Test Scenarios:
{
"test_geo_download_with_qc_umap": {
"inputs": [
"Download GEO dataset GSE291670 and do the quality control",
"Generate the UMAP with resolution 0.7"
],
"category": "basic",
"description": "Test complete workflow from download to UMAP visualization",
"expected_duration": 120.0,
"timeout": 400.0,
"tags": ["geo", "qc", "umap", "visualization"],
"priority": 3,
"validation_criteria": {
"input_0": {
"required_keywords": ["quality control", "downloaded"],
"no_errors": true
},
"input_1": {
"required_keywords": ["UMAP", "resolution"],
"no_errors": true
}
}
}
}
⚡ Performance Monitoring
The E2E system includes comprehensive performance monitoring:
# Enable performance monitoring
python tests/run_integration_tests.py --performance-monitoring
# Features monitored:
# - CPU usage percentage (average and peak)
# - Memory consumption (RSS, peak usage)
# - Disk I/O operations (read/write MB)
# - Network activity (sent/received MB)
# - Test execution duration vs expected
# - Resource usage trends across test categories
🔄 Hybrid pytest Integration
Combine traditional pytest tests with E2E scenarios for comprehensive validation:
# Run both pytest and E2E tests together
python tests/run_integration_tests.py --run-pytest-integration
# Features:
# - Unified success/failure reporting
# - Combined coverage analytics
# - Category-wise performance breakdowns
# - Comprehensive JSON output with both test types
🎯 Advanced E2E Features
Dependency Resolution
Tests can specify dependencies for automatic ordering:
{
"dependencies": ["test_geo_download", "test_basic_qc"],
"priority": 4
}
Retry Logic
Configurable retry attempts for flaky tests:
{
"retry_count": 2,
"timeout": 300.0
}
Response Validation
Sophisticated validation of AI responses:
{
"validation_criteria": {
"input_0": {
"required_keywords": ["downloaded", "GSE109564"],
"forbidden_keywords": ["error", "failed"],
"min_length": 50,
"no_errors": true
}
}
}
📋 Adding New E2E Test Scenarios
Add realistic user scenarios to tests/test_cases.json:
{
  "test_my_custom_workflow": {
    "inputs": [
      "Download GSE123456 and perform quality control",
      "Apply batch correction using Harmony",
      "Create publication-ready UMAP plot"
    ],
    "category": "advanced",
    "description": "Test batch correction workflow",
    "tags": ["geo", "batch_correction", "visualization"],
    "priority": 3,
    "timeout": 600.0,
    "expected_duration": 240.0,
    "validation_criteria": {
      "input_0": {
        "required_keywords": ["downloaded", "quality control"],
        "no_errors": true
      },
      "input_1": {
        "required_keywords": ["batch correction", "Harmony"],
        "no_errors": true
      },
      "input_2": {
        "required_keywords": ["UMAP", "publication"],
        "no_errors": true
      }
    }
  }
}

📊 E2E Test Results & Analytics
The E2E system generates comprehensive reports:
{
  "summary": {
    "test_execution_summary": {
      "total_tests": 25,
      "passed_tests": 23,
      "failed_tests": 2,
      "success_rate": 0.92,
      "total_duration": 1800.0,
      "average_duration": 72.0
    },
    "category_breakdown": {
      "basic": {"passed": 8, "failed": 0, "total": 8},
      "advanced": {"passed": 12, "failed": 2, "total": 14},
      "performance": {"passed": 3, "failed": 0, "total": 3}
    },
    "performance_summary": {
      "avg_cpu_percent": 15.2,
      "avg_memory_mb": 1024.5,
      "max_memory_mb": 2048.0
    }
  }
}

📈 Coverage and Quality Metrics
Coverage Requirements
- Minimum Coverage: 80% (enforced by CI)
- Target Coverage: 95%
- Critical Components: 100% coverage required for core services and agents
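As a rough sketch of how the tiered thresholds above could be enforced in CI, the snippet below walks a coverage.py JSON report (as produced by `coverage json`) and applies a stricter floor to critical modules. The report layout (`"files"` → `"summary"` → `"percent_covered"`) and the `lobster/core/`, `lobster/agents/` path prefixes are assumptions for illustration, not part of the documented toolchain.

```python
# Hedged sketch: enforce coverage floors from a coverage.py JSON report.
# In CI you would load the report with json.load(open("coverage.json")).
CRITICAL_MODULES = ("lobster/core/", "lobster/agents/")  # hypothetical paths

def check_coverage(report, floor=80.0, critical_floor=100.0):
    """Return (path, actual, required) tuples for modules below their floor."""
    failures = []
    for path, info in report["files"].items():
        pct = info["summary"]["percent_covered"]
        required = critical_floor if path.startswith(CRITICAL_MODULES) else floor
        if pct < required:
            failures.append((path, pct, required))
    return failures

# Usage with a synthetic report:
report = {"files": {
    "lobster/core/client.py": {"summary": {"percent_covered": 100.0}},
    "lobster/tools/plots.py": {"summary": {"percent_covered": 72.5}},
}}
print(check_coverage(report))  # → [('lobster/tools/plots.py', 72.5, 80.0)]
```

A check like this can run after `pytest --cov` as an extra gate beyond the global `--cov-fail-under` threshold.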
Coverage Analysis
# Generate coverage report
pytest --cov=lobster --cov-report=html --cov-report=term
# View coverage in browser
open htmlcov/index.html
# Check coverage for specific modules
pytest --cov=lobster.tools --cov-report=term-missing
# Fail build if coverage below threshold
pytest --cov=lobster --cov-fail-under=80

Quality Metrics
- Test Execution Time: Unit tests <2min, Integration <15min, System <30min
- Memory Usage: No test should use >2GB RAM
- Test Reliability: <1% flaky test rate
- Scientific Accuracy: All biological algorithms must be validated
🔍 Debugging Tests
Common Debugging Techniques
# Add debug logging in tests
import logging
logging.basicConfig(level=logging.DEBUG)
# Use pytest debugging
pytest --pdb # Drop to debugger on failures
pytest --pdbcls=IPython.terminal.debugger:TerminalPdb # Use IPython debugger
# Add debug prints (use capsys to capture)
def test_with_debug(capsys):
    print("Debug information here")
    # ... test code ...
    captured = capsys.readouterr()
    print(f"Captured output: {captured.out}")

# Temporary test isolation
pytest -x   # Stop on first failure
pytest --lf # Run only last failed tests
pytest --ff # Run failures first

Mock Data Debugging
# Inspect generated data
def test_inspect_mock_data():
    adata = generate_synthetic_single_cell()
    print(f"Shape: {adata.shape}")
    print(f"Obs columns: {adata.obs.columns.tolist()}")
    print(f"Var columns: {adata.var.columns.tolist()}")
    print(f"Uns keys: {list(adata.uns.keys())}")

    # Save for manual inspection
    adata.write_h5ad('/tmp/debug_data.h5ad')

🎯 Best Practices Summary
Test Design Principles
- Test Isolation: Each test should be independent
- Realistic Data: Use biologically plausible synthetic data
- Scientific Validation: Verify biological correctness, not just code correctness
- Performance Awareness: Monitor memory and time usage
- Error Coverage: Test both success and failure paths
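The error-coverage principle above can be sketched with a deliberately tiny helper; `normalize_counts` is hypothetical, not part of the Lobster API:

```python
def normalize_counts(counts):
    """Hypothetical helper: scale raw counts to per-cell proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize an all-zero count vector")
    return [c / total for c in counts]

class TestNormalizeCounts:
    def test_success_case(self):
        # Happy path: proportions sum to 1
        assert normalize_counts([2, 2]) == [0.5, 0.5]

    def test_error_all_zero(self):
        # Failure path: the error must surface, not be silently swallowed
        try:
            normalize_counts([0, 0])
            assert False, "expected ValueError"
        except ValueError:
            pass
```

In a real suite the failure path would use `pytest.raises(ValueError)`; a plain try/except is shown here only to keep the snippet dependency-free.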
Naming Conventions
class TestServiceName:
    def test_method_success_case(self):           # Happy path
        pass

    def test_method_edge_case_empty_data(self):   # Edge cases
        pass

    def test_method_error_invalid_params(self):   # Error conditions
        pass

    def test_method_performance_large_data(self): # Performance
        pass

Test Organization
- One test class per component being tested
- Group related tests in the same class
- Use descriptive test names that explain the scenario
- Keep tests focused - one concept per test
- Use fixtures for common setup/teardown
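The fixture guideline above might look like this in practice; the fixture name, class name, and toy count matrix are illustrative, not part of the Lobster test suite:

```python
import pytest

@pytest.fixture
def small_counts():
    # Shared, deterministic setup: a 2-cell x 3-gene count matrix
    return [[0, 1, 2], [3, 0, 1]]

class TestQualityMetrics:
    def test_total_counts_per_cell(self, small_counts):
        # One concept per test: library size per cell
        assert [sum(row) for row in small_counts] == [3, 4]

    def test_detected_genes_per_cell(self, small_counts):
        # One concept per test: number of nonzero genes per cell
        detected = [sum(1 for c in row if c > 0) for row in small_counts]
        assert detected == [2, 2]
```

Because the fixture owns the setup, each test stays independent and focused on a single metric.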
This comprehensive testing guide ensures that the Lobster AI platform maintains high quality, reliability, and scientific accuracy across all bioinformatics workflows.
Developer Overview - Lobster AI Architecture
This guide provides a comprehensive introduction to developing within the Lobster AI codebase, covering architecture patterns, design principles, and develop...
25. Download Queue System
The Download Queue System is a structured handoff contract between research_agent and data_expert, enabling coordinated dataset downloads with pre-valida...