10-category taxonomy for classifying Lobster AI tools — making the system introspectable, enforceable, and teachable

AQUADIF Tool Taxonomy

AQUADIF is the 10-category taxonomy for Lobster AI tools. Every tool declares what it does (category) and whether it must produce provenance — making the system introspectable, enforceable, and teachable to coding agents.

When designing tools for a new agent, internalize these categories first. Tools should be designed through the AQUADIF lens, not retrofitted with categories afterward.

Quick Reference

Category	Definition	Provenance Required	Example Tools
IMPORT	Load external data formats into workspace	Yes	`import_bulk_counts`, `load_10x_data`
QUALITY	Assess data integrity, calculate QC metrics	Yes	`assess_data_quality`, `detect_doublets`
FILTER	Subset data by removing samples or features	Yes	`filter_cells`, `filter_genes`
PREPROCESS	Transform data representation (normalize, scale, impute)	Yes	`normalize_counts`, `integrate_batches`
ANALYZE	Extract patterns, statistical tests, embeddings	Yes	`run_pca`, `cluster_cells`, `run_differential_expression`
ANNOTATE	Add biological meaning (cell types, gene labels)	Yes	`annotate_cell_types_auto`, `score_gene_signatures`
DELEGATE	Hand off to a specialist child agent	No	`handoff_to_annotation_expert`
SYNTHESIZE	Combine results across analyses	Yes	(future — no implementations yet)
UTILITY	Workspace management, status checks, export	No	`list_modalities`, `export_results`, `list_files`, `read_file`, `glob_files`, `grep_files`, `shell_execute`
CODE_EXEC	Custom code execution escape hatch	Conditional	`execute_custom_analysis`

Provenance required (7): IMPORT, QUALITY, FILTER, PREPROCESS, ANALYZE, ANNOTATE, SYNTHESIZE

Provenance not required (3): DELEGATE, UTILITY, CODE_EXEC (conditional — required if code modifies data)
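The provenance rules above amount to a lookup keyed on a tool's primary category. Below is a minimal sketch. `PROVENANCE_REQUIRED` and `provenance_required` are hypothetical names, not part of the Lobster API. The `code_modifies_data` flag is an assumed way of modelling the CODE_EXEC condition.

```python
# Hypothetical lookup: does a tool whose PRIMARY category is X need provenance?
# None marks the conditional case (CODE_EXEC).
PROVENANCE_REQUIRED = {
    "IMPORT": True,
    "QUALITY": True,
    "FILTER": True,
    "PREPROCESS": True,
    "ANALYZE": True,
    "ANNOTATE": True,
    "SYNTHESIZE": True,
    "DELEGATE": False,
    "UTILITY": False,
    "CODE_EXEC": None,
}


def provenance_required(primary: str, code_modifies_data: bool = False) -> bool:
    """Return True if a tool with this primary category must log provenance."""
    rule = PROVENANCE_REQUIRED[primary]  # raises KeyError for unknown categories
    if rule is None:
        # CODE_EXEC: provenance is required only when the executed code modifies data
        return code_modifies_data
    return rule


print(provenance_required("ANALYZE"))                              # True
print(provenance_required("CODE_EXEC", code_modifies_data=True))   # True
```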

Metadata Assignment Pattern

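After defining a tool with the @tool decorator, assign its AQUADIF metadata immediately after the function definition. The assignment must come after the decorator, not before. This is because @tool replaces the function with a LangChain tool object, and .metadata and .tags are attributes of that object.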

from langchain_core.tools import tool

@tool
def assess_quality(modality_name: str) -> str:
    """Assess data quality for a modality.

    Args:
        modality_name: Name of the dataset to assess

    Returns:
        Summary of QC metrics and data fitness
    """
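    # Assumed in scope: data_manager and quality_service (e.g., provided by the agent factory)
    adata = data_manager.get_modality(modality_name)
    result, stats, ir = quality_service.assess(adata)
    # Pass the service's AnalysisStep as ir= so provenance is recorded
    data_manager.log_tool_usage("assess_quality", {"modality_name": modality_name}, stats, ir=ir)
    return f"QC complete: {stats}"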

# AQUADIF metadata assignment — MUST happen AFTER @tool decorator
assess_quality.metadata = {
    "categories": ["QUALITY"],  # 1-3 categories, first = primary
    "provenance": True           # True if primary category requires provenance
}
assess_quality.tags = ["QUALITY"]  # Same as categories — required for callback propagation
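Because .metadata and .tags must stay identical, one option is to derive both from a single list. The helper below, `assign_aquadif`, is a hypothetical convenience and is not part of Lobster. It works on any object with assignable attributes. A `SimpleNamespace` stands in for a decorated LangChain tool so the sketch runs without dependencies.

```python
from types import SimpleNamespace


def assign_aquadif(tool_obj, categories, provenance):
    """Hypothetical helper: set .metadata and .tags from one category list."""
    cats = list(categories)  # copy so later changes to the caller's list have no effect
    tool_obj.metadata = {"categories": cats, "provenance": provenance}
    tool_obj.tags = list(cats)  # separate list with identical contents
    return tool_obj


# Stand-in for the object returned by @tool (assumption: only attributes matter here)
assess_quality = SimpleNamespace(name="assess_quality")
assign_aquadif(assess_quality, ["QUALITY"], provenance=True)
print(assess_quality.metadata)  # {'categories': ['QUALITY'], 'provenance': True}
print(assess_quality.tags)      # ['QUALITY']
```

Direct assignment, as shown in the pattern above, remains the documented approach. A helper like this only guards against the two fields drifting apart.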

Three key rules:

Max 3 categories per tool — Use the primary category plus up to 2 secondary categories for substantial additional functionality. Primary category is always first and determines the provenance requirement.
String literals only — Use "ANALYZE" not AquadifCategory.ANALYZE. Importing the enum into tool files adds unnecessary coupling.
.metadata and .tags must match — LangChain callbacks receive .tags but not .metadata. Both fields must always contain the same category list.
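The three rules, together with the provenance check, can be approximated with a small validator. This is a hedged sketch of the kind of checks involved, not the AgentContractTestMixin implementation. `metadata_problems` and the stand-in tool objects are hypothetical.

```python
from types import SimpleNamespace

VALID = {
    "IMPORT", "QUALITY", "FILTER", "PREPROCESS", "ANALYZE",
    "ANNOTATE", "DELEGATE", "SYNTHESIZE", "UTILITY", "CODE_EXEC",
}
# Primary categories that always require provenance (CODE_EXEC is conditional, so skipped)
NEEDS_PROVENANCE = VALID - {"DELEGATE", "UTILITY", "CODE_EXEC"}


def metadata_problems(tool_obj) -> list[str]:
    """Return rule violations for one tool; an empty list means it passes."""
    meta = getattr(tool_obj, "metadata", None) or {}
    cats = meta.get("categories", [])
    problems = []
    if not 1 <= len(cats) <= 3:
        problems.append("must declare 1-3 categories")
    if len(set(map(str, cats))) != len(cats):
        problems.append("duplicate categories")
    if any(not isinstance(c, str) or c not in VALID for c in cats):
        problems.append("categories must be AQUADIF string literals")
    if getattr(tool_obj, "tags", None) != cats:
        problems.append(".tags does not match .metadata['categories']")
    primary = cats[0] if cats and isinstance(cats[0], str) else None
    if primary in NEEDS_PROVENANCE and meta.get("provenance") is not True:
        problems.append("primary category requires provenance=True")
    if primary in {"DELEGATE", "UTILITY"} and meta.get("provenance") is not False:
        problems.append("primary category expects provenance=False")
    return problems


ok = SimpleNamespace(metadata={"categories": ["QUALITY"], "provenance": True}, tags=["QUALITY"])
bad = SimpleNamespace(metadata={"categories": ["ANALYZE", "ANALYZE"], "provenance": False},
                      tags=["ANALYZE"])
print(metadata_problems(ok))  # []
print(metadata_problems(bad))
```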

Provenance Rules

Provenance tracks what each tool did to the data, enabling reproducibility and audit trails. Tools declare whether they produce provenance via the provenance boolean in .metadata.

Which tools require provenance

Any tool that transforms, loads, or analyzes data must call log_tool_usage() with an ir parameter (an AnalysisStep object from the service):

# CORRECT: IR passed explicitly
result, stats, ir = service.analyze(adata)
data_manager.log_tool_usage("analyze_modality", params, stats, ir=ir)

# INCORRECT: IR missing — contract tests will fail
result, stats, ir = service.analyze(adata)
data_manager.log_tool_usage("analyze_modality", params, stats)  # missing ir=

Which tools do not require provenance

DELEGATE and UTILITY tools are not required to log provenance. They either hand off to another agent (which tracks its own provenance) or perform read-only operations such as listing datasets or checking status:

@tool
def list_modalities() -> str:
    """List available datasets."""
    modalities = data_manager.list_modalities()
    return f"Available: {modalities}"

list_modalities.metadata = {"categories": ["UTILITY"], "provenance": False}
list_modalities.tags = ["UTILITY"]

Hollow provenance (`ir=None`)

When a tool is categorized as provenance-required (e.g., ANALYZE) but the service does not yet return a full AnalysisStep, pass ir=None as a bridge:

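# Inside the create_umap tool body; the service does not yet return a usable AnalysisStep
result, stats, _ = service.visualize(adata)
data_manager.log_tool_usage("create_umap", params, stats, ir=None)  # explicit hollow IR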

create_umap.metadata = {"categories": ["ANALYZE"], "provenance": True}
create_umap.tags = ["ANALYZE"]

This satisfies the contract test's AST check (the ir= keyword is present) and makes the provenance gap explicit. At runtime, the monitor records such calls as hollow_ir. Wire in the full IR once the service returns an AnalysisStep.
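Because the check is syntactic, `ir=None` passes and a call that omits `ir=` fails. The following is a minimal sketch of such a check, built on Python's standard `ast` module. The real contract test's logic may differ, and `has_ir_keyword` is a hypothetical name.

```python
import ast

# Hypothetical tool source using the hollow-provenance pattern
SOURCE = '''
def create_umap(modality_name):
    result, stats, _ = service.visualize(adata)
    data_manager.log_tool_usage("create_umap", params, stats, ir=None)
'''


def has_ir_keyword(source: str) -> bool:
    """True if some call to <obj>.log_tool_usage(...) passes an ir= keyword."""
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "log_tool_usage"
            and any(kw.arg == "ir" for kw in node.keywords)
        ):
            return True
    return False


print(has_ir_keyword(SOURCE))                         # True: ir=None still counts
print(has_ir_keyword('dm.log_tool_usage("x", p, s)'))  # False: ir= missing
```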

Contract Testing

Lobster AI enforces AQUADIF compliance via automated contract tests that run on every CI push.

AgentContractTestMixin

AgentContractTestMixin bundles 14 test methods that check your agent's AQUADIF compliance. Use it by subclassing it in your test module:

from lobster.testing import AgentContractTestMixin


class TestMyExpert(AgentContractTestMixin):
    """AQUADIF contract tests for my_expert agent."""

    agent_module = "lobster.agents.mydomain.my_expert"
    factory_name = "my_expert"
    is_parent_agent = True  # Set False for child agents or data-prep agents

The 14 tests fall into these groups:

Check	What it validates
Metadata presence	`.metadata` dict exists with `categories` and `provenance` keys
Category validity	All categories are from the AQUADIF 10-category set
Category constraints	Max 3 categories per tool; no duplicates
Provenance compliance	`provenance` boolean matches the primary category's requirement
Provenance call	Provenance-required tools call `log_tool_usage(...)` with an `ir=` keyword argument (AST check; `ir=None` also passes)
Tags consistency	`.tags` matches `.metadata["categories"]`
Parent agent MVP	Parent agents have at least one IMPORT tool, one QUALITY tool, and one ANALYZE or DELEGATE tool
Ordering bypass	Primary category is actually provenance-required (prevents ordering tricks)
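The parent-agent MVP rule can be sketched as a set check over an agent's tools. This is illustrative only, and `meets_parent_mvp` is a hypothetical name. The sketch assumes that only each tool's primary (first) category counts. The real test may consider secondary categories as well.

```python
def meets_parent_mvp(tool_categories: list[list[str]]) -> bool:
    """Parent-agent MVP: >=1 IMPORT, >=1 QUALITY, and >=1 ANALYZE or DELEGATE tool."""
    # Assumption: only each tool's primary (first) category is counted
    primaries = {cats[0] for cats in tool_categories if cats}
    return (
        "IMPORT" in primaries
        and "QUALITY" in primaries
        and bool(primaries & {"ANALYZE", "DELEGATE"})
    )


print(meets_parent_mvp([["IMPORT"], ["QUALITY"], ["DELEGATE"], ["UTILITY"]]))  # True
print(meets_parent_mvp([["IMPORT"], ["ANALYZE"]]))  # False: no QUALITY tool
```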

Running Contract Tests

# Run contract tests for all agents (CI uses this)
pytest -m contract

# Run for a specific package
pytest -m contract -k transcriptomics

# Run with verbose output
pytest -m contract -v

For setup details and fixture patterns, see Testing.

Runtime Monitoring

The AquadifMonitor service (lobster/core/aquadif_monitor.py) tracks AQUADIF compliance at runtime. It is wired into the callback chain automatically — plugin authors do not need to interact with it directly. Your tools are monitored once .metadata and .tags are assigned.

What it tracks per session:

Category distribution — count of tool invocations by category
Provenance status — per-tool status: real_ir (genuine AnalysisStep), hollow_ir (ir=None), or missing (no provenance call observed)
CODE_EXEC log — bounded circular buffer of custom code executions with agent attribution
Session summary — structured dict for cloud observability (get_session_summary())

How it's wired:

client.py constructs AquadifMonitor at session start
graph.py builds a tool_name → categories lookup from all agent tools and populates the monitor
TokenTrackingCallback.on_tool_start calls monitor.record_tool_invocation() (single injection point — no double-counting)
DataManagerV2.log_tool_usage calls monitor.record_provenance_call() — provenance detected by observation

Design: Pure stdlib (threading, deque), fail-open (monitor errors never crash tool invocations), thread-safe, bounded data structures.
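These design properties can be illustrated with a small stdlib-only monitor: a lock for thread safety, a bounded `deque` for the CODE_EXEC log, and try/except for fail-open behaviour. This is a hypothetical sketch, not the real AquadifMonitor. The method names mirror those mentioned above, but the signatures, the primary-category counting, and the "missing" bookkeeping are assumptions.

```python
import threading
from collections import Counter, deque

NEEDS_PROVENANCE = {"IMPORT", "QUALITY", "FILTER", "PREPROCESS",
                    "ANALYZE", "ANNOTATE", "SYNTHESIZE"}


class SketchMonitor:
    """Hypothetical monitor; not the AquadifMonitor API."""

    def __init__(self, tool_categories: dict[str, list[str]], max_code_exec: int = 100):
        self._lock = threading.Lock()
        self._categories = dict(tool_categories)       # tool_name -> categories
        self._counts = Counter()                        # invocations per primary category
        self._provenance = {}                           # tool_name -> real_ir | hollow_ir | missing
        self._code_exec = deque(maxlen=max_code_exec)   # bounded: oldest entries drop off

    def record_tool_invocation(self, tool_name: str, agent: str = "") -> None:
        try:
            cats = self._categories.get(tool_name, [])
            primary = cats[0] if cats else "UNKNOWN"
            with self._lock:
                self._counts[primary] += 1
                if primary in NEEDS_PROVENANCE:
                    # Treated as missing until a provenance call is observed
                    self._provenance.setdefault(tool_name, "missing")
                if primary == "CODE_EXEC":
                    self._code_exec.append({"tool": tool_name, "agent": agent})
        except Exception:
            pass  # fail-open: monitoring must never break a tool call

    def record_provenance_call(self, tool_name: str, ir) -> None:
        try:
            with self._lock:
                self._provenance[tool_name] = "hollow_ir" if ir is None else "real_ir"
        except Exception:
            pass  # fail-open

    def get_session_summary(self) -> dict:
        with self._lock:
            return {
                "category_counts": dict(self._counts),
                "provenance": dict(self._provenance),
                "code_exec_log": list(self._code_exec),
            }


monitor = SketchMonitor({"assess_quality": ["QUALITY"], "create_umap": ["ANALYZE"],
                         "list_modalities": ["UTILITY"]})
monitor.record_tool_invocation("assess_quality")
monitor.record_provenance_call("assess_quality", ir=object())
monitor.record_tool_invocation("create_umap")
monitor.record_provenance_call("create_umap", ir=None)
monitor.record_tool_invocation("list_modalities")
print(monitor.get_session_summary()["provenance"])
# {'assess_quality': 'real_ir', 'create_umap': 'hollow_ir'}
```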

Category Decision Guide

Use these rules to choose the right category when the answer isn't obvious.

The 80% Rule

If 80% or more of a tool's logic belongs to one category, use only that category. Secondary categories should only be added for substantial additional functionality.

A normalization tool that also filters out zero-variance genes: PREPROCESS primary, FILTER secondary
A quality assessment tool that calculates metrics and makes a plot: QUALITY primary (the plot is incidental infrastructure)
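Applied to the two examples above, the metadata might look like the following. The tool names are hypothetical, and `SimpleNamespace` objects stand in for decorated tools.

```python
from types import SimpleNamespace

# Mostly normalization, with substantial gene filtering: PREPROCESS primary, FILTER secondary
normalize_and_filter = SimpleNamespace()
normalize_and_filter.metadata = {"categories": ["PREPROCESS", "FILTER"], "provenance": True}
normalize_and_filter.tags = ["PREPROCESS", "FILTER"]

# QC metrics plus an incidental plot: QUALITY only
qc_with_plot = SimpleNamespace()
qc_with_plot.metadata = {"categories": ["QUALITY"], "provenance": True}
qc_with_plot.tags = ["QUALITY"]
```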

Boundary Cases

FILTER vs PREPROCESS

FILTER: Removes data elements (rows/columns). Output has fewer elements than input.
PREPROCESS: Transforms values within elements. Output has same elements, but values change.
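The FILTER/PREPROCESS distinction can be checked mechanically by comparing shapes before and after a step. This sketch assumes AnnData-style `(n_obs, n_vars)` tuples, and `filter_or_preprocess` is a hypothetical helper.

```python
def filter_or_preprocess(in_shape: tuple[int, int], out_shape: tuple[int, int]) -> str:
    """Hypothetical heuristic: compare (n_obs, n_vars) before and after a step."""
    if out_shape[0] < in_shape[0] or out_shape[1] < in_shape[1]:
        return "FILTER"      # rows or columns were removed
    if out_shape == in_shape:
        return "PREPROCESS"  # same elements, values transformed
    raise ValueError("output gained elements; neither rule applies")


print(filter_or_preprocess((5000, 20000), (4800, 20000)))  # FILTER
print(filter_or_preprocess((5000, 20000), (5000, 20000)))  # PREPROCESS
```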

QUALITY vs ANALYZE

QUALITY: Assesses fitness for purpose. Answers "Is this data good enough to analyze?"
ANALYZE: Extracts scientific patterns. Answers "What biological structure exists in the data?"

ANALYZE vs ANNOTATE

ANALYZE: Computes patterns, clusters, or statistical relationships from data.
ANNOTATE: Assigns biological meaning using external knowledge (ontologies, references, markers).

When to use UTILITY

Use UTILITY for read-only tools that do not transform or analyze scientific data: listing, status checks, workspace inspection, file export. If a tool calls log_tool_usage() but only reads (not transforms), it is still UTILITY — the provenance: False declaration is what matters.

When to use DELEGATE

Use DELEGATE only for inter-agent handoff tools created by graph.py's _create_lazy_delegation_tool. These are auto-tagged at creation time; you do not need to assign them manually.
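Auto-tagging at creation time means the factory attaches DELEGATE metadata itself. The sketch below illustrates the idea. `make_delegation_tool` is hypothetical and is not graph.py's `_create_lazy_delegation_tool`. Building the child agent on first use is an assumption about what "lazy" refers to.

```python
from types import SimpleNamespace


def make_delegation_tool(child_agent: str, build_child):
    """Hypothetical: handoff tool that builds its child agent on first use
    and carries DELEGATE metadata from the moment it is created."""
    cache = {}

    def handoff(task: str) -> str:
        if "agent" not in cache:  # build the child agent only once, when first needed
            cache["agent"] = build_child()
        return cache["agent"](task)

    tool_obj = SimpleNamespace(name=f"handoff_to_{child_agent}", run=handoff)
    tool_obj.metadata = {"categories": ["DELEGATE"], "provenance": False}
    tool_obj.tags = ["DELEGATE"]
    return tool_obj


builds = []


def build_annotation_expert():
    builds.append("built")  # record each construction
    return lambda task: f"annotation_expert handled: {task}"


t = make_delegation_tool("annotation_expert", build_annotation_expert)
t.run("label clusters")
t.run("refine labels")
print(t.name, len(builds))  # handoff_to_annotation_expert 1
```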

Quick Decision Table

If your tool...	Category
Reads files from disk or downloads data into AnnData	IMPORT
Calculates QC metrics, checks data fitness, detects artifacts	QUALITY
Removes rows/columns, subsets observations or features	FILTER
Normalizes, batch corrects, scales, imputes, reshapes values	PREPROCESS
Clusters, embeds, runs statistics, computes trajectories	ANALYZE
Assigns labels using ontologies, references, or ID mappings	ANNOTATE
Hands off to a child agent	DELEGATE
Combines results from multiple analyses into interpretation	SYNTHESIZE
Lists datasets, shows status, exports files (read-only ops)	UTILITY
Executes arbitrary user code	CODE_EXEC

Next Steps

Testing

Set up AgentContractTestMixin and run AQUADIF contract tests locally

Plugin Contract

Full API contract: AGENT_CONFIG, factory signature, entry points

Package Structure

Directory layout, PEP 420 namespace, and scaffold generator
