AQUADIF Tool Taxonomy
10-category taxonomy for classifying Lobster AI tools — making the system introspectable, enforceable, and teachable
AQUADIF is the 10-category taxonomy for Lobster AI tools. Every tool declares what it does (category) and whether it must produce provenance — making the system introspectable, enforceable, and teachable to coding agents.
When designing tools for a new agent, internalize these categories first. Tools should be designed through the AQUADIF lens, not retrofitted with categories afterward.
Quick Reference
| Category | Definition | Provenance Required | Example Tools |
|---|---|---|---|
| IMPORT | Load external data formats into workspace | Yes | import_bulk_counts, load_10x_data |
| QUALITY | Assess data integrity, calculate QC metrics | Yes | assess_data_quality, detect_doublets |
| FILTER | Subset data by removing samples or features | Yes | filter_cells, filter_genes |
| PREPROCESS | Transform data representation (normalize, scale, impute) | Yes | normalize_counts, integrate_batches |
| ANALYZE | Extract patterns, statistical tests, embeddings | Yes | run_pca, cluster_cells, run_differential_expression |
| ANNOTATE | Add biological meaning (cell types, gene labels) | Yes | annotate_cell_types_auto, score_gene_signatures |
| DELEGATE | Hand off to a specialist child agent | No | handoff_to_annotation_expert |
| SYNTHESIZE | Combine results across analyses | Yes | (future — no implementations yet) |
| UTILITY | Workspace management, status checks, export | No | list_modalities, export_results, list_files, read_file, glob_files, grep_files, shell_execute |
| CODE_EXEC | Custom code execution escape hatch | Conditional | execute_custom_analysis |
Provenance required (7): IMPORT, QUALITY, FILTER, PREPROCESS, ANALYZE, ANNOTATE, SYNTHESIZE
Provenance not required (3): DELEGATE, UTILITY, CODE_EXEC (conditional — required if code modifies data)
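The split above can be captured in a small lookup. This is a minimal sketch, not Lobster AI's implementation: the names `PROVENANCE_RULES` and `requires_provenance` are hypothetical, and CODE_EXEC is modeled as `None` to mark it as conditional.

```python
from typing import Dict, Optional

# Hypothetical lookup mirroring the Quick Reference table.
# True = provenance required, False = not required, None = conditional.
PROVENANCE_RULES: Dict[str, Optional[bool]] = {
    "IMPORT": True,
    "QUALITY": True,
    "FILTER": True,
    "PREPROCESS": True,
    "ANALYZE": True,
    "ANNOTATE": True,
    "SYNTHESIZE": True,
    "DELEGATE": False,
    "UTILITY": False,
    "CODE_EXEC": None,
}


def requires_provenance(category: str, modifies_data: bool = False) -> bool:
    """Return whether a tool with this primary category must log provenance."""
    rule = PROVENANCE_RULES[category]  # raises KeyError for unknown categories
    if rule is None:
        # Conditional category: required only if the executed code modifies data
        return modifies_data
    return rule
```

Under these assumptions, `requires_provenance("CODE_EXEC", modifies_data=True)` evaluates to `True`, while `requires_provenance("UTILITY")` evaluates to `False`.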
Metadata Assignment Pattern
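After creating a tool with the @tool decorator, assign AQUADIF metadata immediately after the decorated function definition. The assignment must come after the decorator has run, because .metadata and .tags are attributes of the tool object that the decorator returns, not of the undecorated function.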
```python
from langchain_core.tools import tool

@tool
def assess_quality(modality_name: str) -> str:
    """Assess data quality for a modality.

    Args:
        modality_name: Name of the dataset to assess

    Returns:
        Summary of QC metrics and data fitness
    """
    adata = data_manager.get_modality(modality_name)
    result, stats, ir = quality_service.assess(adata)
    data_manager.log_tool_usage("assess_quality", {"modality_name": modality_name}, stats, ir=ir)
    return f"QC complete: {stats}"

# AQUADIF metadata assignment: MUST happen AFTER the @tool decorator
assess_quality.metadata = {
    "categories": ["QUALITY"],  # 1-3 categories, first = primary
    "provenance": True,         # True if primary category requires provenance
}
assess_quality.tags = ["QUALITY"]  # Same as categories; required for callback propagation
```

Three key rules:
- Max 3 categories per tool: Use the primary category plus up to 2 secondary categories for substantial additional functionality. The primary category is always first and determines the provenance requirement.
- String literals only: Use "ANALYZE", not AquadifCategory.ANALYZE. Enum imports in tool files are unnecessary coupling.
- .metadata and .tags must match: LangChain callbacks receive .tags but not .metadata. Both fields must always contain the same category list.
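One way to keep `.metadata` and `.tags` from drifting apart is to set both from a single list. The helper below is a sketch under assumptions: `assign_aquadif` is a hypothetical name, not part of Lobster AI, and a `SimpleNamespace` stands in for the tool object so the example runs without LangChain installed.

```python
from types import SimpleNamespace


def assign_aquadif(tool_obj, categories, provenance):
    """Hypothetical helper: set .metadata and .tags from one category list."""
    if not 1 <= len(categories) <= 3:
        raise ValueError("a tool needs 1-3 categories, primary first")
    if any(type(c) is not str for c in categories):
        raise TypeError("categories must be plain string literals")
    tool_obj.metadata = {"categories": list(categories), "provenance": provenance}
    tool_obj.tags = list(categories)  # same list, so callbacks see the categories
    return tool_obj


# Stand-in for the object that @tool would return
normalize_counts = SimpleNamespace(name="normalize_counts")
assign_aquadif(normalize_counts, ["PREPROCESS", "FILTER"], provenance=True)
```

Because both attributes are copies of the same input list, the match rule holds by construction rather than by convention.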
Provenance Rules
Provenance tracks what each tool did to the data, enabling reproducibility and audit trails. Tools declare whether they produce provenance via the provenance boolean in .metadata.
Which tools require provenance
Any tool that transforms, loads, or analyzes data must call log_tool_usage() with an ir parameter (an AnalysisStep object from the service):
```python
# CORRECT: IR passed explicitly
result, stats, ir = service.analyze(adata)
data_manager.log_tool_usage("analyze_modality", params, stats, ir=ir)

# INCORRECT: IR missing, so contract tests will fail
result, stats, ir = service.analyze(adata)
data_manager.log_tool_usage("analyze_modality", params, stats)  # missing ir=
```

Which tools do not require provenance
DELEGATE and UTILITY tools do not log provenance. They either hand off to another agent (which tracks its own provenance) or provide read-only information:
```python
@tool
def list_modalities() -> str:
    """List available datasets."""
    modalities = data_manager.list_modalities()
    return f"Available: {modalities}"

list_modalities.metadata = {"categories": ["UTILITY"], "provenance": False}
list_modalities.tags = ["UTILITY"]
```

Hollow provenance (ir=None)
When a tool is categorized as provenance-required (e.g., ANALYZE) but the service does not yet return a full AnalysisStep, pass ir=None as a bridge:
```python
# Inside the tool body: log usage with an explicit, empty IR
result, stats, _ = service.visualize(adata)
data_manager.log_tool_usage("create_umap", params, stats, ir=None)

# After the @tool definition: declare the category as usual
create_umap.metadata = {"categories": ["ANALYZE"], "provenance": True}
create_umap.tags = ["ANALYZE"]
```

This satisfies the contract test's AST check (ir= keyword present) while acknowledging the provenance gap. Full IR should be wired when the service is updated.
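To illustrate why `ir=None` passes a keyword-presence check, here is a sketch of how such an AST check could work using Python's standard `ast` module. It is an assumption about the general technique, not the actual contract test code, and `calls_log_tool_usage_with_ir` is a hypothetical name.

```python
import ast


def calls_log_tool_usage_with_ir(source: str) -> bool:
    """Return True if any log_tool_usage(...) call in `source` passes an ir= keyword."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both data_manager.log_tool_usage(...) and a bare log_tool_usage(...)
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name == "log_tool_usage" and any(kw.arg == "ir" for kw in node.keywords):
            return True
    return False


with_ir = 'data_manager.log_tool_usage("create_umap", params, stats, ir=None)'
without_ir = 'data_manager.log_tool_usage("create_umap", params, stats)'

print(calls_log_tool_usage_with_ir(with_ir))     # True
print(calls_log_tool_usage_with_ir(without_ir))  # False
```

A check of this kind inspects only the call syntax, so it cannot tell a genuine AnalysisStep from `None`. That distinction has to be made by observing the call at runtime.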
Contract Testing
Lobster AI enforces AQUADIF compliance via automated contract tests that run on every CI push.
AgentContractTestMixin
The AgentContractTestMixin (14 test methods) validates every aspect of your agent's AQUADIF compliance. Use it by subclassing:
```python
from lobster.testing import AgentContractTestMixin

class TestMyExpert(AgentContractTestMixin):
    """AQUADIF contract tests for my_expert agent."""
    agent_module = "lobster.agents.mydomain.my_expert"
    factory_name = "my_expert"
    is_parent_agent = True  # Set False for child agents or data-prep agents
```

What the 14 tests validate:
| Check | What it validates |
|---|---|
| Metadata presence | .metadata dict exists with categories and provenance keys |
| Category validity | All categories are from the AQUADIF 10-category set |
| Category constraints | Max 3 categories per tool; no duplicates |
| Provenance compliance | provenance boolean matches the primary category's requirement |
| Provenance call | Provenance-required tools contain log_tool_usage(ir=ir) call (AST check) |
| Tags consistency | .tags matches .metadata["categories"] |
| Parent agent MVP | Parent agents have at least one IMPORT + QUALITY + (ANALYZE or DELEGATE) |
| Ordering bypass | Primary category is actually provenance-required (prevents ordering tricks) |
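Several of the checks in the table can be expressed as plain functions over a tool's metadata. The sketch below is illustrative only and does not reproduce AgentContractTestMixin: `check_tool` and `meets_parent_mvp` are hypothetical names, CODE_EXEC is treated as accepting either provenance value, and the parent-agent check assumes a category counts if it appears anywhere in a tool's category list.

```python
VALID_CATEGORIES = {
    "IMPORT", "QUALITY", "FILTER", "PREPROCESS", "ANALYZE",
    "ANNOTATE", "DELEGATE", "SYNTHESIZE", "UTILITY", "CODE_EXEC",
}
PROVENANCE_REQUIRED = VALID_CATEGORIES - {"DELEGATE", "UTILITY", "CODE_EXEC"}


def check_tool(metadata, tags):
    """Return a list of violations for one tool (an empty list means compliant)."""
    categories = metadata.get("categories")
    if not categories or "provenance" not in metadata:
        return ["metadata needs 'categories' and 'provenance' keys"]
    problems = []
    unknown = [c for c in categories if c not in VALID_CATEGORIES]
    if unknown:
        problems.append(f"unknown categories: {unknown}")
    if len(categories) > 3:
        problems.append("more than 3 categories")
    if len(set(categories)) != len(categories):
        problems.append("duplicate categories")
    primary = categories[0]
    # CODE_EXEC is conditional, so this sketch accepts either value for it
    if primary != "CODE_EXEC" and metadata["provenance"] != (primary in PROVENANCE_REQUIRED):
        problems.append(f"provenance flag does not match primary category {primary}")
    if list(tags) != list(categories):
        problems.append(".tags does not match metadata['categories']")
    return problems


def meets_parent_mvp(tool_categories):
    """Check the parent-agent minimum: IMPORT + QUALITY + (ANALYZE or DELEGATE)."""
    seen = {c for categories in tool_categories for c in categories}
    return "IMPORT" in seen and "QUALITY" in seen and bool(seen & {"ANALYZE", "DELEGATE"})


print(check_tool({"categories": ["UTILITY"], "provenance": True}, ["UTILITY"]))
# ['provenance flag does not match primary category UTILITY']
print(meets_parent_mvp([["IMPORT"], ["QUALITY"], ["DELEGATE"]]))  # True
```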
Running Contract Tests
```bash
# Run contract tests for all agents (CI uses this)
pytest -m contract

# Run for a specific package
pytest -m contract -k transcriptomics

# Run with verbose output
pytest -m contract -v
```

For setup details and fixture patterns, see Testing.
Runtime Monitoring
The AquadifMonitor service (lobster/core/aquadif_monitor.py) tracks AQUADIF compliance at runtime. It is wired into the callback chain automatically — plugin authors do not need to interact with it directly. Your tools are monitored once .metadata and .tags are assigned.
What it tracks per session:
- Category distribution: count of tool invocations by category
- Provenance status: per-tool status, one of real_ir (genuine AnalysisStep), hollow_ir (ir=None), or missing (no provenance call observed)
- CODE_EXEC log: bounded circular buffer of custom code executions with agent attribution
- Session summary: structured dict for cloud observability (get_session_summary())
How it's wired:
- client.py constructs AquadifMonitor at session start
- graph.py builds a tool_name → categories lookup from all agent tools and populates the monitor
- TokenTrackingCallback.on_tool_start calls monitor.record_tool_invocation() (single injection point, no double-counting)
- DataManagerV2.log_tool_usage calls monitor.record_provenance_call(), so provenance is detected by observation
Design: Pure stdlib (threading, deque), fail-open (monitor errors never crash tool invocations), thread-safe, bounded data structures.
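The design properties listed above (stdlib only, thread-safe, bounded, fail-open) can be illustrated with a much smaller stand-in. `MiniMonitor` below is hypothetical and is not the AquadifMonitor source: the method names come from this page, but their signatures, the summary keys, and the choice to count every category of a tool are assumptions.

```python
import threading
from collections import Counter, deque


class MiniMonitor:
    """Hypothetical stand-in showing the design properties, not AquadifMonitor itself."""

    def __init__(self, tool_categories, max_code_exec_entries=100):
        self._tool_categories = dict(tool_categories)  # tool_name -> categories
        self._lock = threading.Lock()                  # guards all mutable state
        self._category_counts = Counter()
        self._code_exec_log = deque(maxlen=max_code_exec_entries)  # bounded buffer

    def record_tool_invocation(self, tool_name, agent_name=None):
        try:
            categories = self._tool_categories.get(tool_name, [])
            with self._lock:
                for category in categories:
                    self._category_counts[category] += 1
                if "CODE_EXEC" in categories:
                    self._code_exec_log.append((agent_name, tool_name))
        except Exception:
            # Fail-open: a monitoring error must never break the tool call
            pass

    def get_session_summary(self):
        with self._lock:
            return {
                "category_distribution": dict(self._category_counts),
                "code_exec_entries": len(self._code_exec_log),
            }


monitor = MiniMonitor(
    {"run_pca": ["ANALYZE"], "execute_custom_analysis": ["CODE_EXEC"]},
    max_code_exec_entries=2,
)
for _ in range(3):
    monitor.record_tool_invocation("execute_custom_analysis", agent_name="demo_agent")
monitor.record_tool_invocation("run_pca")
summary = monitor.get_session_summary()
```

Here the category count reaches 3 for CODE_EXEC while the log holds only the 2 most recent entries, because `deque(maxlen=...)` discards the oldest item when full. That is what keeps memory bounded in a long session.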
Category Decision Guide
Use these rules to choose the right category when the answer isn't obvious.
The 80% Rule
If 80% or more of a tool's logic belongs to one category, use only that category. Secondary categories should only be added for substantial additional functionality.
- A normalization tool that also filters out zero-variance genes: PREPROCESS primary, FILTER secondary
- A quality assessment tool that calculates metrics and makes a plot: QUALITY primary (the plot is incidental infrastructure)
Boundary Cases
FILTER vs PREPROCESS
- FILTER: Removes data elements (rows/columns). Output has fewer elements than input.
- PREPROCESS: Transforms values within elements. Output has same elements, but values change.
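The FILTER/PREPROCESS distinction can be phrased as a before/after comparison. The following is a rough heuristic sketch on plain nested lists, offered only to make the rule concrete: `suggest_category` is a hypothetical name, and real tools operate on AnnData objects rather than lists.

```python
def suggest_category(before, after):
    """Suggest FILTER or PREPROCESS from a before/after pair of row-major matrices."""
    shape_before = (len(before), len(before[0]) if before else 0)
    shape_after = (len(after), len(after[0]) if after else 0)
    if shape_after[0] < shape_before[0] or shape_after[1] < shape_before[1]:
        return "FILTER"      # rows or columns were removed
    if shape_after == shape_before and after != before:
        return "PREPROCESS"  # same elements, different values
    return None              # unchanged or larger: the heuristic does not apply


counts = [[0, 4], [2, 6], [0, 0]]

# Dropping all-zero rows removes elements
non_empty = [row for row in counts if sum(row) > 0]
print(suggest_category(counts, non_empty))  # FILTER

# Halving every value keeps the shape but changes values
halved = [[value / 2 for value in row] for row in counts]
print(suggest_category(counts, halved))  # PREPROCESS
```

A tool that does both, such as normalization that also drops zero-variance genes, falls back to the 80% Rule: pick the dominant behavior as primary and list the other as secondary.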
QUALITY vs ANALYZE
- QUALITY: Assesses fitness for purpose. Answers "Is this data good enough to analyze?"
- ANALYZE: Extracts scientific patterns. Answers "What biological structure exists in the data?"
ANALYZE vs ANNOTATE
- ANALYZE: Computes patterns, clusters, or statistical relationships from data.
- ANNOTATE: Assigns biological meaning using external knowledge (ontologies, references, markers).
When to use UTILITY
Use UTILITY for read-only tools that do not transform or analyze scientific data: listing, status checks, workspace inspection, file export. If a tool calls log_tool_usage() but only reads (not transforms), it is still UTILITY — the provenance: False declaration is what matters.
When to use DELEGATE
Use DELEGATE only for inter-agent handoff tools created by graph.py's _create_lazy_delegation_tool. These are auto-tagged at creation time; you do not need to assign them manually.
Quick Decision Table
| If your tool... | Category |
|---|---|
| Reads files from disk or downloads data into AnnData | IMPORT |
| Calculates QC metrics, checks data fitness, detects artifacts | QUALITY |
| Removes rows/columns, subsets observations or features | FILTER |
| Normalizes, batch corrects, scales, imputes, reshapes values | PREPROCESS |
| Clusters, embeds, runs statistics, computes trajectories | ANALYZE |
| Assigns labels using ontologies, references, or ID mappings | ANNOTATE |
| Hands off to a child agent | DELEGATE |
| Combines results from multiple analyses into interpretation | SYNTHESIZE |
| Lists datasets, shows status, exports files (read-only ops) | UTILITY |
| Executes arbitrary user code | CODE_EXEC |