Omics-OS Docs

AQUADIF Tool Taxonomy

10-category taxonomy for classifying Lobster AI tools — making the system introspectable, enforceable, and teachable


AQUADIF is the 10-category taxonomy for Lobster AI tools. Every tool declares what it does (category) and whether it must produce provenance — making the system introspectable, enforceable, and teachable to coding agents.

When designing tools for a new agent, internalize these categories first. Tools should be designed through the AQUADIF lens, not retrofitted with categories afterward.

Quick Reference

| Category | Definition | Provenance required | Example tools |
|---|---|---|---|
| IMPORT | Load external data formats into workspace | Yes | import_bulk_counts, load_10x_data |
| QUALITY | Assess data integrity, calculate QC metrics | Yes | assess_data_quality, detect_doublets |
| FILTER | Subset data by removing samples or features | Yes | filter_cells, filter_genes |
| PREPROCESS | Transform data representation (normalize, scale, impute) | Yes | normalize_counts, integrate_batches |
| ANALYZE | Extract patterns, statistical tests, embeddings | Yes | run_pca, cluster_cells, run_differential_expression |
| ANNOTATE | Add biological meaning (cell types, gene labels) | Yes | annotate_cell_types_auto, score_gene_signatures |
| DELEGATE | Hand off to a specialist child agent | No | handoff_to_annotation_expert |
| SYNTHESIZE | Combine results across analyses | Yes | (future; no implementations yet) |
| UTILITY | Workspace management, status checks, export | No | list_modalities, export_results, list_files, read_file, glob_files, grep_files, shell_execute |
| CODE_EXEC | Custom code execution escape hatch | Conditional | execute_custom_analysis |

Provenance required (7): IMPORT, QUALITY, FILTER, PREPROCESS, ANALYZE, ANNOTATE, SYNTHESIZE

Provenance not required (3): DELEGATE, UTILITY, CODE_EXEC (conditional — required if code modifies data)
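To make the provenance column concrete, here is a minimal Python sketch that encodes it as a lookup. The names PROVENANCE_REQUIRED and requires_provenance are hypothetical (not part of Lobster AI), and the modifies_data flag is an assumption standing in for however the CODE_EXEC condition is actually decided.

```python
# Hypothetical lookup mirroring the "Provenance required" column above.
PROVENANCE_REQUIRED = {
    "IMPORT": True,
    "QUALITY": True,
    "FILTER": True,
    "PREPROCESS": True,
    "ANALYZE": True,
    "ANNOTATE": True,
    "SYNTHESIZE": True,
    "DELEGATE": False,
    "UTILITY": False,
}


def requires_provenance(category: str, modifies_data: bool = False) -> bool:
    """Return whether a tool with this primary category must log provenance.

    CODE_EXEC is conditional: provenance is required only when the executed
    code modifies data (modifies_data is an assumed, illustrative flag).
    """
    if category == "CODE_EXEC":
        return modifies_data
    try:
        return PROVENANCE_REQUIRED[category]
    except KeyError:
        raise ValueError(f"Unknown AQUADIF category: {category!r}") from None


print(requires_provenance("FILTER"))                         # True
print(requires_provenance("CODE_EXEC"))                      # False: read-only custom code
print(requires_provenance("CODE_EXEC", modifies_data=True))  # True
```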

Metadata Assignment Pattern

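After defining a tool with the @tool decorator, assign its AQUADIF metadata immediately after the decorated function definition. The assignment cannot go before the decorator: @tool turns the function into a LangChain tool object, and .metadata and .tags are attributes of that object, not of the plain function.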

from langchain_core.tools import tool

@tool
def assess_quality(modality_name: str) -> str:
    """Assess data quality for a modality.

    Args:
        modality_name: Name of the dataset to assess

    Returns:
        Summary of QC metrics and data fitness
    """
    # data_manager and quality_service are assumed to be in scope,
    # e.g. captured from the enclosing agent factory
    adata = data_manager.get_modality(modality_name)
    result, stats, ir = quality_service.assess(adata)
    # Provenance: forward the service's IR (AnalysisStep) explicitly via ir=
    data_manager.log_tool_usage("assess_quality", {"modality_name": modality_name}, stats, ir=ir)
    return f"QC complete: {stats}"

# AQUADIF metadata assignment: MUST happen AFTER the @tool-decorated definition
assess_quality.metadata = {
    "categories": ["QUALITY"],  # 1-3 categories, first = primary
    "provenance": True           # True if primary category requires provenance
}
assess_quality.tags = ["QUALITY"]  # Same list as categories; required for callback propagation

Three key rules:

  1. Max 3 categories per tool — Use the primary category plus up to 2 secondary categories for substantial additional functionality. Primary category is always first and determines the provenance requirement.

  2. String literals only — Use "ANALYZE" not AquadifCategory.ANALYZE. Enum imports in tool files are unnecessary coupling.

  3. .metadata and .tags must match — LangChain callbacks receive .tags but not .metadata. Both fields must always contain the same category list.
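The three rules are mechanical enough to check in a few lines. The sketch below is a hypothetical helper (check_tool_metadata is not a Lobster AI API) that returns a list of violations for one tool's .metadata and .tags; Lobster AI's own contract tests remain the authoritative check and may differ in detail. The lower bound of one category comes from the "1-3 categories" comment in the example above.

```python
VALID_CATEGORIES = frozenset({
    "IMPORT", "QUALITY", "FILTER", "PREPROCESS", "ANALYZE",
    "ANNOTATE", "DELEGATE", "SYNTHESIZE", "UTILITY", "CODE_EXEC",
})


def check_tool_metadata(metadata: dict, tags: list) -> list[str]:
    """Hypothetical checker: return rule violations (empty list = compliant)."""
    problems: list[str] = []
    categories = metadata.get("categories", [])
    if not isinstance(metadata.get("provenance"), bool):
        problems.append("metadata['provenance'] must be a bool")
    # Rule 1: between 1 and 3 categories, no duplicates
    if not 1 <= len(categories) <= 3:
        problems.append(f"expected 1-3 categories, got {len(categories)}")
    if len(set(categories)) != len(categories):
        problems.append("duplicate categories")
    for cat in categories:
        # Rule 2: plain string literals; type() also rejects str-based Enum members
        if type(cat) is not str:
            problems.append(f"{cat!r} is not a plain string")
        elif cat not in VALID_CATEGORIES:
            problems.append(f"unknown category {cat!r}")
    # Rule 3: .tags must carry the same category list as .metadata
    if list(tags or []) != list(categories):
        problems.append(".tags does not match metadata['categories']")
    return problems


ok = check_tool_metadata({"categories": ["QUALITY"], "provenance": True}, ["QUALITY"])
bad = check_tool_metadata({"categories": ["QUALITY", "QUALITY"], "provenance": True}, ["QUALITY"])
print(ok)   # []
print(bad)  # ['duplicate categories', ".tags does not match metadata['categories']"]
```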

Provenance Rules

Provenance tracks what each tool did to the data, enabling reproducibility and audit trails. Tools declare whether they produce provenance via the provenance boolean in .metadata.

Which tools require provenance

Any tool that transforms, loads, or analyzes data must call log_tool_usage() with an ir parameter (an AnalysisStep object from the service):

# CORRECT: IR passed explicitly
result, stats, ir = service.analyze(adata)
data_manager.log_tool_usage("analyze_modality", params, stats, ir=ir)

# INCORRECT: IR missing — contract tests will fail
result, stats, ir = service.analyze(adata)
data_manager.log_tool_usage("analyze_modality", params, stats)  # missing ir=
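The convention behind ir= is that a service method returns a (result, stats, ir) triple and the tool forwards ir untouched. The sketch below illustrates that contract with hypothetical stand-ins: this AnalysisStep dataclass, NormalizationService, the "normalize_total" operation name, and RecordingDataManager are illustrative only, and the real AnalysisStep and DataManagerV2.log_tool_usage have their own fields and signatures.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AnalysisStep:
    """Hypothetical stand-in for the IR type; the real AnalysisStep has its own fields."""
    operation: str
    parameters: dict[str, Any] = field(default_factory=dict)


class NormalizationService:
    """Hypothetical service following the (result, stats, ir) return convention."""

    def normalize(self, counts: list[list[float]], target_sum: float = 1e4):
        # Scale each row so that it sums to target_sum (all-zero rows stay zero)
        result = []
        for row in counts:
            total = sum(row)
            result.append([v * target_sum / total if total else 0.0 for v in row])
        stats = {"n_rows": len(result), "target_sum": target_sum}
        # The IR records what was done and with which parameters
        ir = AnalysisStep(operation="normalize_total", parameters={"target_sum": target_sum})
        return result, stats, ir


class RecordingDataManager:
    """Hypothetical stand-in that just records log_tool_usage() calls."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def log_tool_usage(self, tool_name, params, stats, ir=None):
        self.calls.append((tool_name, params, stats, ir))


# Tool-side usage: unpack the triple and forward the IR explicitly
service = NormalizationService()
data_manager = RecordingDataManager()
result, stats, ir = service.normalize([[1.0, 3.0], [2.0, 2.0]], target_sum=10.0)
data_manager.log_tool_usage("normalize_counts", {"target_sum": 10.0}, stats, ir=ir)
print(result)  # [[2.5, 7.5], [5.0, 5.0]]
```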

Which tools do not require provenance

DELEGATE and UTILITY tools do not log provenance. They either hand off to another agent (which tracks its own provenance) or provide read-only information:

@tool
def list_modalities() -> str:
    """List available datasets."""
    modalities = data_manager.list_modalities()
    return f"Available: {modalities}"

list_modalities.metadata = {"categories": ["UTILITY"], "provenance": False}
list_modalities.tags = ["UTILITY"]

Hollow provenance (ir=None)

When a tool is categorized as provenance-required (e.g., ANALYZE) but the service does not yet return a full AnalysisStep, pass ir=None as a bridge:

result, stats, _ = service.visualize(adata)
data_manager.log_tool_usage("create_umap", params, stats, ir=None)

create_umap.metadata = {"categories": ["ANALYZE"], "provenance": True}
create_umap.tags = ["ANALYZE"]

This satisfies the contract test's AST check (the ir= keyword is present) while making the provenance gap explicit. Replace ir=None with the real AnalysisStep once the service returns one.
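An AST check of this kind only needs to find log_tool_usage calls and inspect their keyword arguments. Below is a minimal sketch using Python's standard ast module; missing_ir_calls is a hypothetical name, and the real contract test may scope the search differently (for example, to each tool function's body). Note that ir=None passes, which is why hollow provenance satisfies the check.

```python
import ast


def missing_ir_calls(source: str) -> list[int]:
    """Return line numbers of log_tool_usage(...) calls that have no ir= keyword."""
    bad_lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match data_manager.log_tool_usage(...) as well as a bare log_tool_usage(...)
        if isinstance(func, ast.Attribute):
            name = func.attr
        elif isinstance(func, ast.Name):
            name = func.id
        else:
            name = None
        if name == "log_tool_usage" and not any(kw.arg == "ir" for kw in node.keywords):
            bad_lines.append(node.lineno)
    return sorted(bad_lines)


sample = '''
data_manager.log_tool_usage("run_pca", params, stats, ir=ir)
data_manager.log_tool_usage("analyze_modality", params, stats)
data_manager.log_tool_usage("create_umap", params, stats, ir=None)
'''
print(missing_ir_calls(sample))  # [3]: only the call without ir= is flagged
```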

Contract Testing

Lobster AI enforces AQUADIF compliance via automated contract tests that run on every CI push.

AgentContractTestMixin

The AgentContractTestMixin bundles 14 test methods that check your agent's AQUADIF compliance. To use it, subclass it and point it at your agent module and factory:

from lobster.testing import AgentContractTestMixin


class TestMyExpert(AgentContractTestMixin):
    """AQUADIF contract tests for my_expert agent."""

    agent_module = "lobster.agents.mydomain.my_expert"
    factory_name = "my_expert"
    is_parent_agent = True  # Set False for child agents or data-prep agents

The 14 test methods cover the following checks:

| Check | What it validates |
|---|---|
| Metadata presence | .metadata dict exists with categories and provenance keys |
| Category validity | All categories are from the AQUADIF 10-category set |
| Category constraints | Max 3 categories per tool; no duplicates |
| Provenance compliance | provenance boolean matches the primary category's requirement |
| Provenance call | Provenance-required tools contain a log_tool_usage() call with an ir= keyword (AST check) |
| Tags consistency | .tags matches .metadata["categories"] |
| Parent agent MVP | Parent agents have at least one IMPORT + QUALITY + (ANALYZE or DELEGATE) tool |
| Ordering bypass | Primary category is actually provenance-required (prevents ordering tricks) |
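As an illustration of the Parent agent MVP check, the sketch below tests whether an agent's tool category lists cover IMPORT, QUALITY, and ANALYZE or DELEGATE. The function name is hypothetical, and counting every declared category (rather than only the primary one) is an assumption; the mixin's actual rule may be stricter.

```python
def parent_mvp_ok(tool_categories: list[list[str]]) -> bool:
    """Hypothetical check: True if the tools cover IMPORT, QUALITY, and ANALYZE or DELEGATE.

    Assumption: every declared category counts, not only the primary one.
    """
    present = {cat for categories in tool_categories for cat in categories}
    return (
        "IMPORT" in present
        and "QUALITY" in present
        and bool(present & {"ANALYZE", "DELEGATE"})
    )


# Each inner list is one tool's .metadata["categories"]
print(parent_mvp_ok([["IMPORT"], ["QUALITY"], ["ANALYZE"], ["UTILITY"]]))  # True
print(parent_mvp_ok([["IMPORT"], ["ANALYZE"]]))                          # False: no QUALITY tool
```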

Running Contract Tests

# Run contract tests for all agents (CI uses this)
pytest -m contract

# Run for a specific package
pytest -m contract -k transcriptomics

# Run with verbose output
pytest -m contract -v

For setup details and fixture patterns, see Testing.

Runtime Monitoring

The AquadifMonitor service (lobster/core/aquadif_monitor.py) tracks AQUADIF compliance at runtime. It is wired into the callback chain automatically — plugin authors do not need to interact with it directly. Your tools are monitored once .metadata and .tags are assigned.

What it tracks per session:

  • Category distribution — count of tool invocations by category
  • Provenance status — per-tool status: real_ir (genuine AnalysisStep), hollow_ir (ir=None), or missing (no provenance call observed)
  • CODE_EXEC log — bounded circular buffer of custom code executions with agent attribution
  • Session summary — structured dict for cloud observability (get_session_summary())

How it's wired:

  1. client.py constructs AquadifMonitor at session start
  2. graph.py builds a tool_name → categories lookup from all agent tools and populates the monitor
  3. TokenTrackingCallback.on_tool_start calls monitor.record_tool_invocation() (single injection point — no double-counting)
  4. DataManagerV2.log_tool_usage calls monitor.record_provenance_call() — provenance detected by observation

Design: Pure stdlib (threading, deque), fail-open (monitor errors never crash tool invocations), thread-safe, bounded data structures.
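The design points above (a lock for thread safety, a bounded deque for the CODE_EXEC log, try/except around each recording call so the monitor fails open) fit into a small class. This is a hypothetical, simplified sketch, not the contents of lobster/core/aquadif_monitor.py: the method names mirror those mentioned above, but their signatures, the summary keys, and counting invocations by primary category are assumptions.

```python
import threading
from collections import Counter, deque
from typing import Any, Optional


class MiniAquadifMonitor:
    """Hypothetical, simplified sketch of the runtime monitoring ideas above."""

    def __init__(self, tool_categories: dict[str, list[str]], max_code_exec: int = 100) -> None:
        self._lock = threading.Lock()
        # tool_name -> categories (graph.py builds this lookup in the real system)
        self._tool_categories = dict(tool_categories)
        self._category_counts: Counter = Counter()
        self._provenance: dict[str, str] = {}
        # Bounded buffer: once full, the oldest CODE_EXEC entries are dropped
        self._code_exec_log: deque = deque(maxlen=max_code_exec)

    def record_tool_invocation(self, tool_name: str, agent: str = "unknown") -> None:
        try:
            with self._lock:
                categories = self._tool_categories.get(tool_name, [])
                if categories:
                    # Assumption: count each invocation under its primary category
                    self._category_counts[categories[0]] += 1
                # "missing" until a provenance call for this tool is observed
                self._provenance.setdefault(tool_name, "missing")
                if "CODE_EXEC" in categories:
                    self._code_exec_log.append({"tool": tool_name, "agent": agent})
        except Exception:
            pass  # fail-open: monitoring must never break a tool invocation

    def record_provenance_call(self, tool_name: str, ir: Optional[Any]) -> None:
        try:
            with self._lock:
                self._provenance[tool_name] = "real_ir" if ir is not None else "hollow_ir"
        except Exception:
            pass  # fail-open

    def get_session_summary(self) -> dict:
        with self._lock:
            return {
                "category_counts": dict(self._category_counts),
                "provenance_status": dict(self._provenance),
                "code_exec_log": list(self._code_exec_log),
            }


monitor = MiniAquadifMonitor(
    {"run_pca": ["ANALYZE"], "create_umap": ["ANALYZE"],
     "execute_custom_analysis": ["CODE_EXEC"]},
    max_code_exec=2,
)
monitor.record_tool_invocation("run_pca")
monitor.record_provenance_call("run_pca", ir=object())  # genuine IR
monitor.record_tool_invocation("create_umap")
monitor.record_provenance_call("create_umap", ir=None)  # hollow IR
for _ in range(3):
    monitor.record_tool_invocation("execute_custom_analysis", agent="my_expert")
summary = monitor.get_session_summary()
```

One simplification: every invoked tool starts as missing, including UTILITY tools that never log provenance; a real monitor would likely report that status only for provenance-required tools.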

Category Decision Guide

Use these rules to choose the right category when the answer isn't obvious.

The 80% Rule

If 80% or more of a tool's logic belongs to one category, use only that category. Secondary categories should only be added for substantial additional functionality.

  • A normalization tool that also filters out zero-variance genes: PREPROCESS primary, FILTER secondary
  • A quality assessment tool that calculates metrics and makes a plot: QUALITY primary (the plot is incidental infrastructure)

Boundary Cases

FILTER vs PREPROCESS

  • FILTER: Removes data elements (rows/columns). Output has fewer elements than input.
  • PREPROCESS: Transforms values within elements. Output has same elements, but values change.
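The FILTER vs PREPROCESS distinction can be phrased as a shape test on a before/after matrix. The hypothetical helper below (plain nested lists stand in for AnnData) reports FILTER when rows or columns disappear and PREPROCESS when the shape is unchanged but values differ.

```python
def classify_transform(before: list[list[float]], after: list[list[float]]) -> str:
    """Hypothetical helper applying the FILTER vs PREPROCESS rule to a matrix pair."""
    rows_b, cols_b = len(before), (len(before[0]) if before else 0)
    rows_a, cols_a = len(after), (len(after[0]) if after else 0)
    if rows_a < rows_b or cols_a < cols_b:
        return "FILTER"       # rows/columns were removed
    if (rows_a, cols_a) == (rows_b, cols_b) and after != before:
        return "PREPROCESS"   # same elements, different values
    return "UNCLEAR"          # e.g. the shape grew, or nothing changed


counts = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
print(classify_transform(counts, counts[:2]))                                # FILTER
print(classify_transform(counts, [[2 * v for v in row] for row in counts]))  # PREPROCESS
```

A shape test only shows which kinds of operations happened, not which one dominates: the normalization tool that also drops zero-variance genes would be reported as FILTER here, yet under the 80% rule its primary category is PREPROCESS.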

QUALITY vs ANALYZE

  • QUALITY: Assesses fitness for purpose. Answers "Is this data good enough to analyze?"
  • ANALYZE: Extracts scientific patterns. Answers "What biological structure exists in the data?"

ANALYZE vs ANNOTATE

  • ANALYZE: Computes patterns, clusters, or statistical relationships from data.
  • ANNOTATE: Assigns biological meaning using external knowledge (ontologies, references, markers).

When to use UTILITY

Use UTILITY for read-only tools that do not transform or analyze scientific data: listing, status checks, workspace inspection, and file export. A tool that calls log_tool_usage() but only reads data (without transforming it) is still UTILITY; what matters is its provenance: False declaration.

When to use DELEGATE

Use DELEGATE only for inter-agent handoff tools created by graph.py's _create_lazy_delegation_tool. These are auto-tagged at creation time; you do not need to assign them manually.

Quick Decision Table

| If your tool... | Category |
|---|---|
| Reads files from disk or downloads data into AnnData | IMPORT |
| Calculates QC metrics, checks data fitness, detects artifacts | QUALITY |
| Removes rows/columns, subsets observations or features | FILTER |
| Normalizes, batch corrects, scales, imputes, reshapes values | PREPROCESS |
| Clusters, embeds, runs statistics, computes trajectories | ANALYZE |
| Assigns labels using ontologies, references, or ID mappings | ANNOTATE |
| Hands off to a child agent | DELEGATE |
| Combines results from multiple analyses into interpretation | SYNTHESIZE |
| Lists datasets, shows status, exports files (read-only ops) | UTILITY |
| Executes arbitrary user code | CODE_EXEC |
