Omics-OS Docs

Publication Content Access & Provider Architecture

Version: 2.4.0+ (Phase 1-6 Refactoring Complete)
Status: Production-ready
Implementation: ContentAccessService with Provider Infrastructure (January 2025)

Overview

The ContentAccessService provides publication and dataset access through a capability-based provider architecture. It replaced the legacy PublicationService and UnifiedContentService, adding modular provider infrastructure, a three-tier content cascade, and literature-mining capabilities.

What Changed?

Before (UnifiedContentService - Phase 3, Archived):

  • ❌ Direct provider delegation without capability routing
  • ❌ Manual provider selection logic in service code
  • ❌ Limited to 3 providers (Abstract, PMC, Webpage)
  • ❌ No dataset discovery capabilities
  • ❌ No validation or metadata extraction tools

After (ContentAccessService - Phase 2+, Current):

  • Provider Registry: Capability-based routing with priority system
  • 5 Specialized Providers: Abstract, PubMed, GEO, PMC, Webpage (with Docling)
  • 10 Core Methods: Discovery (3), Metadata (2), Content (3), System (1), Validation (1)
  • Three-Tier Cascade: PMC XML → Webpage → PDF with automatic fallback
  • Dataset Integration: GEO/SRA/PRIDE dataset discovery and validation
  • Session Caching: DataManager-first with W3C-PROV provenance

Performance Impact

| Metric | UnifiedContentService | ContentAccessService | Improvement |
| Abstract retrieval | 200-500ms (AbstractProvider) | 200-500ms (AbstractProvider) | Same (optimized path) |
| PMC full text | 500ms-2s (PMCProvider) | 500ms-2s (PMCProvider, priority path) | Same (PMC XML is ~10x faster than HTML scraping) |
| Dataset discovery | N/A (not available) | 2-5s (GEOProvider) | New capability |
| Literature search | N/A (not available) | 1-3s (PubMedProvider) | New capability |
| Provider selection | Manual logic | Automatic routing | Better maintainability |
| Extensibility | Hard-coded providers | Registry-based | Easy to add providers |

Architecture

Capability-Based Provider System

┌─────────────────────────────────────────────────────────────┐
│                    ContentAccessService                     │
│                   (Coordination Layer)                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  10 Core Methods:                                           │
│  ┌───────────────────────────────────────────────────┐     │
│  │ Discovery (3):                                    │     │
│  │  - search_literature                              │     │
│  │  - discover_datasets                              │     │
│  │  - find_linked_datasets                           │     │
│  │                                                    │     │
│  │ Metadata (2):                                     │     │
│  │  - extract_metadata                               │     │
│  │  - validate_metadata                              │     │
│  │                                                    │     │
│  │ Content (3):                                      │     │
│  │  - get_abstract                                   │     │
│  │  - get_full_content                               │     │
│  │  - extract_methods                                │     │
│  │                                                    │     │
│  │ System (1):                                       │     │
│  │  - query_capabilities                             │     │
│  └───────────────────────────────────────────────────┘     │
│                         ↓                                   │
│                  ProviderRegistry                           │
│              (Capability-Based Routing)                     │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                    Provider Layer                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Provider 1: AbstractProvider (Priority: 10)               │
│  └─ Capability: GET_ABSTRACT                               │
│     Performance: 200-500ms                                  │
│                                                             │
│  Provider 2: PubMedProvider (Priority: 10)                 │
│  └─ Capabilities: SEARCH_LITERATURE, FIND_LINKED_DATASETS, │
│                   EXTRACT_METADATA                          │
│     Performance: 1-3s                                       │
│                                                             │
│  Provider 3: GEOProvider (Priority: 10)                    │
│  └─ Capabilities: DISCOVER_DATASETS, EXTRACT_METADATA,     │
│                   VALIDATE_METADATA                         │
│     Performance: 2-5s                                       │
│                                                             │
│  Provider 4: PMCProvider (Priority: 10)                    │
│  └─ Capability: GET_FULL_CONTENT (PMC XML API)            │
│     Performance: 500ms-2s (PRIORITY PATH)                  │
│                                                             │
│  Provider 5: WebpageProvider (Priority: 50)                │
│  └─ Capabilities: GET_FULL_CONTENT (Webpage + PDF)        │
│     Performance: 2-8s (FALLBACK)                           │
│     Uses: DoclingService (internal composition)            │
│                                                             │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                   DataManagerV2                             │
│              (Session Caching + Provenance)                 │
└─────────────────────────────────────────────────────────────┘

System Design

    User → research_agent (10 tools)
                    ↓
    ContentAccessService (10 methods)
                    ↓
  ProviderRegistry (capability routing)
                    ↓
┌─────────┬─────────┼─────────┬─────────┐
↓         ↓         ↓         ↓         ↓
Abstract  PubMed    GEO       PMC       Webpage
Provider  Provider  Provider  Provider  Provider
↓         ↓         ↓         ↓         ↓
NCBI      PubMed    GEO API   PMC XML   DoclingService
E-utils   API                 API       (Webpage + PDF)

Key Components

1. ContentAccessService (Coordination Layer)

Location: lobster/tools/content_access_service.py

Responsibilities:

  • Method routing to appropriate providers via ProviderRegistry
  • Capability-based provider selection
  • DataManager-first caching coordination
  • Error handling and fallback orchestration
  • W3C-PROV provenance tracking
  • Lightweight IR (Intermediate Representation) for non-exportable research operations

Public API (10 Methods):

Discovery (3 methods):

def search_literature(
    self,
    query: str,
    max_results: int = 5,
    sources: Optional[list[str]] = None,
    filters: Optional[dict[str, Any]] = None
) -> Tuple[str, Dict[str, Any], AnalysisStep]:
    """Search PubMed, bioRxiv, medRxiv for literature."""

def discover_datasets(
    self,
    query: str,
    dataset_type: "DatasetType",
    max_results: int = 5,
    filters: Optional[dict[str, str]] = None
) -> Tuple[str, Dict[str, Any], AnalysisStep]:
    """Search GEO, SRA, PRIDE for omics datasets."""

def find_linked_datasets(
    self,
    identifier: str,
    dataset_types: Optional[list["DatasetType"]] = None,
    include_related: bool = True
) -> str:
    """Find datasets linked to a publication."""

Metadata (2 methods):

def extract_metadata(
    self,
    identifier: str,
    source: Optional[str] = None
) -> Union["PublicationMetadata", str]:
    """Extract publication/dataset metadata."""

def validate_metadata(
    self,
    dataset_id: str,
    required_fields: Optional[List[str]] = None,
    required_values: Optional[Dict[str, List[str]]] = None,
    threshold: float = 0.8
) -> str:
    """Validate dataset metadata completeness."""

Content (3 methods):

def get_abstract(
    self,
    identifier: str,
    force_refresh: bool = False
) -> dict[str, Any]:
    """Tier 1: Fast abstract retrieval (200-500ms)."""

def get_full_content(
    self,
    source: str,
    prefer_webpage: bool = True,
    keywords: Optional[list[str]] = None,
    max_paragraphs: int = 100,
    max_retries: int = 2
) -> dict[str, Any]:
    """Tier 2: Full content with PMC-first cascade."""

def extract_methods(
    self,
    content_result: dict[str, Any],
    llm: Optional[Any] = None,
    include_tables: bool = True
) -> dict[str, Any]:
    """Extract structured methods from content."""

System (1 method):

def query_capabilities(self) -> str:
    """Query available providers and capabilities."""

2. ProviderRegistry (Routing Layer)

Location: lobster/tools/providers/provider_registry.py

Responsibilities:

  • Provider registration and lifecycle management
  • Capability-based routing to best-fit provider
  • Priority-based provider ordering
  • Dataset type mapping to providers
  • Capability matrix generation for debugging

Key Methods:

def register_provider(self, provider: BaseProvider) -> None:
    """Register a provider with its capabilities."""

def get_providers_for_capability(
    self,
    capability: ProviderCapability
) -> List[BaseProvider]:
    """Get all providers supporting a capability (sorted by priority)."""

def get_provider_for_dataset_type(
    self,
    dataset_type: DatasetType
) -> Optional[BaseProvider]:
    """Get provider for specific dataset type."""

def get_capability_matrix(self) -> str:
    """Generate debug matrix of providers and capabilities."""
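The routing behaviour described above (filter providers by capability, then order by priority) can be sketched in a few lines. This is a minimal, self-contained illustration, not Lobster's ProviderRegistry: the enum members mirror capability names used on this page, while `Capability`, `StubProvider`, and `MiniRegistry` are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Capability(Enum):
    # Hypothetical stand-in for ProviderCapability; members mirror this page.
    GET_ABSTRACT = auto()
    SEARCH_LITERATURE = auto()
    GET_FULL_CONTENT = auto()


@dataclass
class StubProvider:
    name: str
    priority: int  # lower value = tried first (10 before 50)
    capabilities: set = field(default_factory=set)


class MiniRegistry:
    """Hypothetical capability-based registry (illustration only)."""

    def __init__(self) -> None:
        self._providers: list[StubProvider] = []

    def register_provider(self, provider: StubProvider) -> None:
        self._providers.append(provider)

    def get_providers_for_capability(self, capability: Capability) -> list[StubProvider]:
        # Keep providers that declare the capability, sorted by priority.
        matches = [p for p in self._providers if capability in p.capabilities]
        return sorted(matches, key=lambda p: p.priority)

    def get_capability_matrix(self) -> str:
        # One row per provider: name, priority, and sorted capability names.
        rows = []
        for p in sorted(self._providers, key=lambda p: (p.priority, p.name)):
            caps = ", ".join(sorted(c.name for c in p.capabilities))
            rows.append(f"{p.name} (priority {p.priority}): {caps}")
        return "\n".join(rows)


registry = MiniRegistry()
registry.register_provider(StubProvider("WebpageProvider", 50, {Capability.GET_FULL_CONTENT}))
registry.register_provider(StubProvider("PMCProvider", 10, {Capability.GET_FULL_CONTENT}))
registry.register_provider(StubProvider("AbstractProvider", 10, {Capability.GET_ABSTRACT}))

order = [p.name for p in registry.get_providers_for_capability(Capability.GET_FULL_CONTENT)]
print(order)  # ['PMCProvider', 'WebpageProvider']
```

Registration order does not matter here: PMCProvider is returned ahead of WebpageProvider purely because of its lower priority value, which is the property the cascade relies on.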

3. Provider Layer (Specialized Data Access)

Provider Architecture:

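# Base provider interface
from abc import ABC, abstractmethod
from typing import Optional, Set

# ProviderCapability and DatasetType are Lobster enums (import path not shown here)
class BaseProvider(ABC):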
    name: str
    priority: int  # Lower = higher priority (10 = high, 50 = low)
    capabilities: Set[ProviderCapability]
    supported_dataset_types: Set[DatasetType]

    @abstractmethod
    def search_publications(
        self,
        query: str,
        max_results: int = 5,
        filters: Optional[dict] = None
    ) -> str:
        """Search for publications/datasets."""

5 Registered Providers:

| Provider | Priority | Capabilities | Performance | Coverage |
| AbstractProvider | 10 (high) | GET_ABSTRACT | 200-500ms | All PubMed |
| PubMedProvider | 10 (high) | SEARCH_LITERATURE, FIND_LINKED_DATASETS, EXTRACT_METADATA | 1-3s | All PubMed-indexed |
| GEOProvider | 10 (high) | DISCOVER_DATASETS, EXTRACT_METADATA, VALIDATE_METADATA | 2-5s | All GEO/SRA datasets |
| PMCProvider | 10 (high) | GET_FULL_CONTENT | 500ms-2s | 30-40% (NIH-funded + open access) |
| WebpageProvider | 50 (low) | GET_FULL_CONTENT | 2-8s | Major publishers + PDFs |

Provider Details:

AbstractProvider (Fast Path):

# Location: lobster/tools/providers/abstract_provider.py
class AbstractProvider(BaseProvider):
    """Fast abstract retrieval via NCBI E-utilities."""

    capabilities = {ProviderCapability.GET_ABSTRACT}
    priority = 10  # High priority (fast)

    def get_abstract(self, identifier: str) -> PublicationMetadata:
        """Retrieve abstract metadata without full-text download."""

PubMedProvider (Literature & Linking):

# Location: lobster/tools/providers/pubmed_provider.py
class PubMedProvider(BaseProvider):
    """PubMed literature search and dataset linking."""

    capabilities = {
        ProviderCapability.SEARCH_LITERATURE,
        ProviderCapability.FIND_LINKED_DATASETS,
        ProviderCapability.EXTRACT_METADATA,
    }
    priority = 10

    def search_publications(self, query: str, **kwargs) -> str:
        """Search PubMed with E-utilities."""

    def find_datasets_from_publication(self, identifier: str) -> str:
        """Find GEO/SRA datasets linked via PubMed."""

GEOProvider (Dataset Discovery):

# Location: lobster/tools/providers/geo_provider.py
class GEOProvider(BaseProvider):
    """GEO dataset discovery and validation."""

    capabilities = {
        ProviderCapability.DISCOVER_DATASETS,
        ProviderCapability.EXTRACT_METADATA,
        ProviderCapability.VALIDATE_METADATA,
    }
    supported_dataset_types = {DatasetType.GEO}
    priority = 10

    def search_publications(self, query: str, **kwargs) -> str:
        """Search GEO datasets."""

    def search_by_accession(
        self,
        accession: str,
        include_parent_series: bool = False
    ) -> str:
        """Direct accession lookup with enhanced GSM handling."""

PMCProvider (Priority Full-Text):

# Location: lobster/tools/providers/pmc_provider.py
class PMCProvider(BaseProvider):
    """PMC full-text extraction via XML API (PRIORITY PATH)."""

    capabilities = {ProviderCapability.GET_FULL_CONTENT}
    priority = 10  # High priority (10x faster than webpage)

    def extract_full_text(self, identifier: str) -> PMCFullTextResult:
        """
        Extract full-text from PMC XML with semantic tags.

        Benefits:
        - 10x faster (500ms vs 2-5s HTML scraping)
        - 95% accuracy for methods extraction
        - 100% table parsing success
        - Structured sections with <sec sec-type="methods">
        - 30-40% coverage (NIH-funded + open access)
        """
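PMC articles are JATS XML, where methods sections are commonly tagged with a `sec-type` attribute such as `methods` or `materials|methods`. The sketch below shows how such sections can be pulled out with the standard library; it illustrates the idea behind the "semantic tags" benefit and is not PMCProvider's code. `extract_methods_sections` and the inline sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written JATS-like sample (hypothetical, for illustration).
SAMPLE_JATS = """<article>
  <body>
    <sec sec-type="intro"><title>Introduction</title><p>Background.</p></sec>
    <sec sec-type="methods">
      <title>Methods</title>
      <p>Reads were aligned with STAR.</p>
      <p>Clustering used Scanpy.</p>
    </sec>
  </body>
</article>"""


def extract_methods_sections(xml_text: str) -> list[str]:
    """Return paragraph text of every <sec> whose sec-type mentions methods."""
    root = ET.fromstring(xml_text)
    sections = []
    for sec in root.iter("sec"):
        # Matches "methods" and combined values like "materials|methods".
        if "methods" in (sec.get("sec-type") or ""):
            # Direct <p> children only; the <title> is skipped.
            paragraphs = ["".join(p.itertext()).strip() for p in sec.findall("p")]
            sections.append("\n".join(paragraphs))
    return sections


print(extract_methods_sections(SAMPLE_JATS))
```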

WebpageProvider (Fallback Path):

# Location: lobster/tools/providers/webpage_provider.py
class WebpageProvider(BaseProvider):
    """Webpage scraping and PDF extraction (FALLBACK)."""

    capabilities = {ProviderCapability.GET_FULL_CONTENT}
    priority = 50  # Low priority (slower fallback)

    def __init__(self, data_manager: DataManagerV2):
        self.docling_service = DoclingService(data_manager)  # Composition

    def extract_content(
        self,
        url: str,
        keywords: Optional[List[str]] = None,
        max_paragraphs: int = 100
    ) -> dict:
        """
        Extract content via webpage or PDF (uses DoclingService).

        Automatically detects format and routes to appropriate parser.
        """

DoclingService (Internal, Not Registered):

  • Used internally by WebpageProvider via composition
  • Not registered as separate provider
  • Handles both webpage HTML and PDF parsing
  • Structure-aware parsing with table extraction
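WebpageProvider is described as detecting whether a source is HTML or PDF before handing it to DoclingService. One plausible way to make that decision is sketched below: check the `%PDF-` magic bytes, then the HTTP Content-Type header, then the URL suffix. The function name `detect_format` and the order of these rules are assumptions for illustration, not the provider's actual logic.

```python
from typing import Optional
from urllib.parse import urlparse


def detect_format(url: str, content_type: Optional[str] = None,
                  first_bytes: bytes = b"") -> str:
    """Hypothetical helper: classify a source as 'pdf' or 'html'."""
    # 1. Strongest signal: PDF files begin with the "%PDF-" magic bytes.
    if first_bytes.startswith(b"%PDF-"):
        return "pdf"
    # 2. Next, an explicit Content-Type header (parameters like charset ignored).
    if content_type:
        mime = content_type.split(";")[0].strip().lower()
        if mime == "application/pdf":
            return "pdf"
        if mime in ("text/html", "application/xhtml+xml"):
            return "html"
    # 3. Fall back to the URL path suffix; query strings are not part of the path.
    if urlparse(url).path.lower().endswith(".pdf"):
        return "pdf"
    return "html"
```

Checking content before the URL matters because many publisher PDF links (for example download endpoints with query parameters) do not end in `.pdf`.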

Three-Tier Content Cascade

The system implements intelligent fallback for full-text retrieval:

Cascade Flow

User Request: get_full_content("PMID:35042229")

Step 1: Check DataManager cache
    ├─ Cache hit? → Return immediately (<100ms)
    └─ Cache miss → Continue to Tier 1

Tier 1: PMC XML API (Priority 10)
    ├─ Provider: PMCProvider
    ├─ Duration: 500ms-2s
    ├─ Coverage: 30-40% of biomedical literature
    ├─ Success? → Cache + Return ✅
    └─ PMCNotAvailableError → Continue to Tier 2

Tier 2: Resolve to URL (if identifier)
    ├─ Use PublicationResolver
    ├─ PMID/DOI → Accessible URL
    ├─ Check accessibility
    └─ If paywalled → Return error with suggestions

Tier 3: Webpage/PDF Extraction (Priority 50)
    ├─ Provider: WebpageProvider
    ├─ Auto-detect: Webpage HTML or PDF
    ├─ Duration: 2-8s
    ├─ Uses: DoclingService internally
    ├─ Success? → Cache + Return ✅
    └─ Failure → Return error
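The flow above can be read as an ordered sequence of attempts. The sketch below mirrors it under stated assumptions: the cache is a plain dict, `fetch_pmc`, `resolve_url`, and `fetch_webpage` are hypothetical callables standing in for PMCProvider, PublicationResolver, and WebpageProvider, and `PMCNotAvailableError` is a local class. It is not the service's actual source.

```python
from typing import Callable, Optional


class PMCNotAvailableError(Exception):
    """Local stand-in for the error that triggers the fallback."""


def get_full_content_sketch(
    identifier: str,
    cache: dict,
    fetch_pmc: Callable[[str], str],
    resolve_url: Callable[[str], Optional[str]],
    fetch_webpage: Callable[[str], str],
) -> dict:
    """Hypothetical cache -> PMC -> URL resolution -> webpage/PDF cascade."""
    # Step 1: cache lookup.
    if identifier in cache:
        return {**cache[identifier], "tier_used": "full_cached"}

    try:
        # Tier 1: PMC XML (fast path, ~30-40% coverage).
        result = {"content": fetch_pmc(identifier), "tier_used": "full_pmc_xml"}
    except PMCNotAvailableError:
        # Tier 2: resolve the identifier to an accessible URL.
        url = resolve_url(identifier)
        if url is None:  # e.g. paywalled: report instead of scraping
            return {"error": f"No accessible full text for {identifier}", "tier_used": None}
        # Tier 3: webpage/PDF extraction (slow fallback).
        try:
            result = {"content": fetch_webpage(url), "tier_used": "full_webpage"}
        except Exception as exc:  # extraction failed: report, do not cache
            return {"error": str(exc), "tier_used": None}

    cache[identifier] = result  # only successful results are cached
    return result
```

For brevity the sketch reports every Tier 3 success as `full_webpage`; distinguishing `full_pdf` would need a format check like the one WebpageProvider performs.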

Performance Characteristics

| Tier | Path | Duration | Success Rate | Coverage |
| Cache | DataManager lookup | <100ms | 100% (if cached) | Previously accessed |
| Tier 1 | PMC XML API | 500ms-2s | 95% | 30-40% (open access) |
| Tier 2 | URL resolution | Variable | 70-80% | Depends on accessibility |
| Tier 3 | Webpage/PDF | 2-8s | 70% | Major publishers + preprints |

Code Example

from lobster.tools.content_access_service import ContentAccessService

# data_manager: an existing DataManagerV2 instance for the current session
service = ContentAccessService(data_manager)

# Automatic three-tier cascade
content = service.get_full_content("PMID:35042229")

# Check which tier was used
print(f"Tier used: {content['tier_used']}")
# Possible values:
# - 'full_cached' (cache hit)
# - 'full_pmc_xml' (Tier 1: PMC)
# - 'full_webpage' (Tier 3: webpage HTML)
# - 'full_pdf' (Tier 3: PDF via Docling)

print(f"Source type: {content['source_type']}")
print(f"Extraction time: {content['extraction_time']:.2f}s")
print(f"Content length: {len(content['content'])} characters")

Method Categories & Usage

Discovery Methods (3)

search_literature()

Search PubMed, bioRxiv, medRxiv for publications.

Example:

results, stats, ir = service.search_literature(
    query="BRCA1 breast cancer",
    max_results=10,
    sources=["pubmed"],  # Optional: filter to specific sources
    filters={"publication_year": "2023"}  # Optional: date filters
)

print(f"Found {stats['results_count']} papers")
print(f"Provider: {stats['provider_used']}")  # PubMedProvider
print(f"Time: {stats['execution_time_ms']}ms")

discover_datasets()

Search for omics datasets with automatic accession detection.

Example:

# Direct accession (auto-detected)
results, stats, ir = service.discover_datasets(
    query="GSM6204600",  # GEO sample ID
    dataset_type=DatasetType.GEO
)

# Text search
results, stats, ir = service.discover_datasets(
    query="single-cell RNA-seq breast cancer",
    dataset_type=DatasetType.GEO,
    max_results=5
)

print(f"Found {stats['results_count']} datasets")
print(f"Accession detected: {stats.get('accession_detected', False)}")
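Accession detection can be approximated with regular expressions over public identifier formats: GEO (GSE, GSM, GPL, GDS prefixes), SRA-style run/experiment/study accessions (e.g. SRR, SRX, ERR, DRP), and PRIDE (PXD plus six digits). The patterns and the `detect_accession` name below are illustrative assumptions, not the rules `discover_datasets()` actually applies.

```python
import re
from typing import Optional

# Illustrative patterns for public accession formats (assumption, not Lobster's rules).
ACCESSION_PATTERNS = {
    "geo": re.compile(r"^(GSE|GSM|GPL|GDS)\d+$"),
    "sra": re.compile(r"^[SED]R[APRSXZ]\d+$"),  # e.g. SRR, SRX, ERR, DRP
    "pride": re.compile(r"^PXD\d{6}$"),
}


def detect_accession(query: str) -> Optional[tuple[str, str]]:
    """Return (database, accession) if the whole query is one accession, else None."""
    token = query.strip().upper()
    for database, pattern in ACCESSION_PATTERNS.items():
        if pattern.match(token):
            return database, token
    return None


print(detect_accession("GSM6204600"))                         # ('geo', 'GSM6204600')
print(detect_accession("single-cell RNA-seq breast cancer"))  # None
```

A query that is not a bare accession falls through to `None`, which is the point where a text search would take over.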

find_linked_datasets()

Find datasets associated with a publication.

Example:

results = service.find_linked_datasets(
    identifier="PMID:35042229",
    dataset_types=[DatasetType.GEO, DatasetType.SRA]
)

print(results)  # Formatted list of linked datasets
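Publication-to-dataset links of this kind are exposed by NCBI's E-utilities `elink` endpoint, for example `dbfrom=pubmed` with `db=gds` for GEO DataSets. The helper below only builds such a request URL so the mechanism is visible; whether PubMedProvider issues exactly this request is an assumption, and `build_elink_url` is a hypothetical name.

```python
from urllib.parse import urlencode

EUTILS_ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(identifier: str, target_db: str = "gds") -> str:
    """Build an E-utilities elink URL linking a PubMed ID to another Entrez database."""
    pmid = identifier.split(":", 1)[-1].strip()  # accept "PMID:123" or "123"
    if not pmid.isdigit():
        raise ValueError(f"expected a numeric PMID, got {identifier!r}")
    params = {"dbfrom": "pubmed", "db": target_db, "id": pmid, "retmode": "json"}
    return f"{EUTILS_ELINK}?{urlencode(params)}"


print(build_elink_url("PMID:35042229"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gds&id=35042229&retmode=json
```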

Metadata Methods (2)

extract_metadata()

Extract publication or dataset metadata.

Example:

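# Publication metadata (return type is Union[PublicationMetadata, str])
metadata = service.extract_metadata("PMID:35042229")

if isinstance(metadata, str):
    print(metadata)  # plain-text result instead of a metadata object
else:
    print(f"Title: {metadata.title}")
    print(f"Authors: {metadata.authors}")
    print(f"Abstract: {metadata.abstract[:200]}...")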

# Dataset metadata
metadata = service.extract_metadata("GSE180759", source="geo")

validate_metadata()

Validate dataset metadata completeness before download.

Example:

report = service.validate_metadata(
    dataset_id="GSE180759",
    required_fields=["smoking_status", "treatment_response"],
    threshold=0.8  # 80% of samples must have fields
)

print(report)
# Formatted validation report with:
# - Completeness scores
# - Missing fields
# - Sample coverage
# - Recommendations (PROCEED/COHORT/SKIP)
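The completeness check behind `threshold=0.8` can be illustrated as follows: for each required field, compute the fraction of samples with a non-empty value and compare it with the threshold. The sketch works on plain dicts; the `completeness_report` name and the mapping to PROCEED/COHORT/SKIP (all fields pass, some pass, none pass) are assumptions, not the documented decision rule.

```python
def _has_value(sample: dict, name: str) -> bool:
    """True if the field exists and is neither None nor blank."""
    value = sample.get(name)
    return value is not None and str(value).strip() != ""


def completeness_report(samples: list[dict], required_fields: list[str],
                        threshold: float = 0.8) -> dict:
    """Hypothetical per-field completeness check over sample metadata dicts."""
    if not samples:
        return {"scores": {}, "missing": list(required_fields), "recommendation": "SKIP"}
    # Fraction of samples carrying each required field.
    scores = {
        name: sum(_has_value(s, name) for s in samples) / len(samples)
        for name in required_fields
    }
    missing = [name for name in required_fields if scores[name] < threshold]
    if not missing:
        recommendation = "PROCEED"  # every field meets the threshold
    elif len(missing) < len(required_fields):
        recommendation = "COHORT"   # partial coverage (assumed mapping)
    else:
        recommendation = "SKIP"     # no field meets the threshold
    return {"scores": scores, "missing": missing, "recommendation": recommendation}
```

With five samples, a field present in four of them scores 0.8 and just meets the default threshold.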

Content Methods (3)

get_abstract()

Fast abstract retrieval (Tier 1: 200-500ms).

Example:

abstract = service.get_abstract("PMID:35042229")

print(f"Title: {abstract['title']}")
print(f"Authors: {abstract['authors']}")
print(f"Abstract: {abstract['abstract']}")
print(f"Keywords: {abstract['keywords']}")
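Examples on this page pass identifiers such as `PMID:35042229`, and the cascade also mentions DOIs. A minimal normaliser for those forms might look like the sketch below; `normalize_identifier`, the accepted prefixes, and the returned labels are hypothetical and only make the input conventions explicit.

```python
import re


def normalize_identifier(raw: str) -> tuple[str, str]:
    """Hypothetical helper: classify an identifier as ('pmid'|'pmcid'|'doi', value)."""
    text = raw.strip()
    # "PMID:123", "pmid: 123", or a bare number -> PubMed ID digits.
    if re.fullmatch(r"(?i)pmid:\s*\d+", text) or text.isdigit():
        return "pmid", re.sub(r"\D", "", text)
    # "PMC123" or "PMCID: PMC123" -> canonical "PMC" + digits.
    if re.fullmatch(r"(?i)(pmcid:\s*)?pmc\d+", text):
        return "pmcid", "PMC" + re.sub(r"\D", "", text)
    # Strip "doi:" or a doi.org URL prefix; DOIs start with "10.".
    doi = re.sub(r"(?i)^(doi:\s*|https?://(dx\.)?doi\.org/)", "", text)
    if doi.startswith("10."):
        return "doi", doi
    raise ValueError(f"unrecognised identifier: {raw!r}")
```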

get_full_content()

Full-text extraction with three-tier cascade.

Example:

# Automatic cascade: PMC → Webpage → PDF
content = service.get_full_content("PMID:35042229")

print(f"Tier used: {content['tier_used']}")
print(f"Methods section: {content.get('methods_text', 'N/A')[:200]}...")
print(f"Tables: {content['metadata']['tables']}")
print(f"Software detected: {content['metadata']['software']}")

extract_methods()

Extract structured methods from full content.

Example:

# Get full content first
content = service.get_full_content("PMID:35042229")

# Extract methods
methods = service.extract_methods(content, include_tables=True)

print(f"Software: {methods['software_used']}")
print(f"GitHub repos: {methods['github_repos']}")

System Methods (1)

query_capabilities()

Query available providers and their capabilities.

Example:

capabilities = service.query_capabilities()

print(capabilities)
# Returns formatted matrix showing:
# - Available operations
# - Registered providers
# - Supported dataset types
# - Performance tiers
# - Cascade logic

Integration with Research Agent

The research_agent uses ContentAccessService through 10 tools:

Tool Mapping

| Agent Tool | ContentAccessService Method | Category |
| search_literature | search_literature() | Discovery |
| fast_dataset_search | discover_datasets() | Discovery |
| find_related_entries | find_linked_datasets() | Discovery |
| get_dataset_metadata | extract_metadata() | Metadata |
| fast_abstract_search | get_abstract() | Content |
| read_full_publication | get_full_content() | Content |
| extract_methods | extract_methods() | Content |
| validate_dataset_metadata | validate_metadata() | Metadata |

Example Agent Workflow

# User: "Find breast cancer datasets with smoking status"

# Step 1: Literature search (PubMedProvider)
results, stats, ir = service.search_literature("breast cancer smoking")

# Step 2: Discover datasets (GEOProvider)
datasets, stats, ir = service.discover_datasets(
    "breast cancer",
    DatasetType.GEO,
    filters={"organism": "human"}
)

# Step 3: Validate metadata (GEOProvider)
report = service.validate_metadata(
    "GSE180759",
    required_fields=["smoking_status"]
)

# Step 4: Get full publication (PMC → Webpage → PDF cascade)
content = service.get_full_content("PMID:35042229")

# All operations tracked in W3C-PROV provenance

Performance Benchmarks

Benchmark Metadata:

  • Date Measured: 2025-01-15
  • Lobster Version: v0.2.0
  • Network: Residential broadband (100 Mbps)
  • Sample Size: 100 operations per provider
  • Test Conditions: Mixed cache hit/miss scenarios

Provider Performance

| Provider | Operation | Mean Duration | P95 | P99 | Success Rate |
| AbstractProvider | get_abstract() | 350ms | 450ms | 500ms | 95%+ |
| PubMedProvider | search_literature() | 2.1s | 3.5s | 5s | 99%+ |
| GEOProvider | discover_datasets() | 3.2s | 4.8s | 6s | 95%+ |
| PMCProvider | get_full_content() | 1.2s | 2s | 2.5s | 95% (of eligible) |
| WebpageProvider | get_full_content() | 4.5s | 7s | 10s | 70-80% |

Note: Performance varies with network conditions and external API load. P95/P99 represent 95th and 99th percentile latencies.
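For reference, mean/P95/P99 figures like those in the table can be computed from raw latency samples. This sketch uses the nearest-rank percentile definition; the benchmark above does not state which definition it used, and `summarize_latencies` is a hypothetical helper.

```python
import math
from statistics import mean


def nearest_rank(sorted_values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the value at 1-based rank ceil(pct/100 * n)."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize_latencies(samples_ms: list[float]) -> dict:
    """Mean, P95, and P99 of raw latency samples (milliseconds)."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    values = sorted(samples_ms)
    return {
        "mean_ms": mean(values),
        "p95_ms": nearest_rank(values, 95),
        "p99_ms": nearest_rank(values, 99),
    }
```

With 100 operations per provider, as in the benchmark setup, P99 is effectively the second-slowest sample, so a single slow outlier can move it noticeably.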

Cascade Performance

| Scenario | Tier Used | Duration | Frequency |
| Cache hit | Cache | <100ms | High (repeated access) |
| PMC available | Tier 1 | 500ms-2s | 30-40% of requests |
| PMC unavailable | Tier 3 | 2-8s | 60-70% of requests |
| Paywalled | Error | Variable | 10-15% of requests |

Optimization Strategies

  1. DataManager-first caching - All operations check cache before API calls
  2. Capability-based routing - Optimal provider selected automatically
  3. Priority ordering - Fast providers tried first (Priority 10 before 50)
  4. Graceful degradation - Automatic fallback on provider failures
  5. Session persistence - Workspace caching for handoffs

DataManager-First Caching

All caching goes through DataManagerV2 (architectural requirement).

Cache Flow

Service Method Call

1. Check DataManager cache
    ├─ Cache hit? → Return immediately
    └─ Cache miss → Continue

2. Execute provider operation
    ├─ Success? → Store in DataManager + Return
    └─ Error? → Return error (no cache)

3. DataManager stores:
    ├─ In-memory cache (session-scoped)
    ├─ Workspace filesystem (persistent)
    └─ W3C-PROV provenance log
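The flow above is the cache-aside pattern. The sketch below shows it with an in-memory dict in front of a JSON file store, loosely modelled on the `~/.lobster/literature_cache/{identifier}.json` layout described on this page. `LiteratureCache`, the filename sanitisation, and the `fetch` callable are assumptions, and provenance logging is omitted.

```python
import json
import re
from pathlib import Path
from typing import Callable


class LiteratureCache:
    """Hypothetical two-level cache: session dict first, then JSON files."""

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self._memory: dict[str, dict] = {}

    def _path(self, identifier: str) -> Path:
        # Replace characters such as ":" that are unsafe on some filesystems.
        return self.cache_dir / (re.sub(r"[^A-Za-z0-9._-]", "_", identifier) + ".json")

    def get_or_fetch(self, identifier: str, fetch: Callable[[str], dict]) -> dict:
        if identifier in self._memory:  # 1. session cache hit
            return self._memory[identifier]
        path = self._path(identifier)
        if path.exists():  # 1b. persistent workspace hit
            result = json.loads(path.read_text())
        else:  # 2. miss: call the provider; exceptions propagate, nothing cached
            result = fetch(identifier)
            path.write_text(json.dumps(result))  # 3. persist the success
        self._memory[identifier] = result
        return result
```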

Cache Methods

# ContentAccessService automatically caches all operations

# Cache publication content
data_manager.cache_publication_content(
    identifier="PMID:38448586",
    content=content_result,
    format="json"
)

# Retrieve cached content
cached = data_manager.get_cached_publication("PMID:38448586")

# Cache location
# ~/.lobster/literature_cache/{identifier}.json

Troubleshooting

Issue: "No providers available for capability"

Symptom:

ERROR: No available providers for literature search.

Cause: Provider not registered or capability not declared.

Solution:

# Check capability matrix
capabilities = service.query_capabilities()
print(capabilities)

# Verify provider registration
providers = service.registry.get_all_providers()
print(f"Registered providers: {len(providers)}")

Issue: PMC Full-Text Not Available

Symptom:

INFO: PMC full text not available for PMID:12345, falling back...

Cause: Paper not in the PMC open-access collection (roughly 60-70% of papers, given PMC's 30-40% coverage).

Expected: Automatic fallback to Tier 3 (Webpage/PDF).

Verification:

content = service.get_full_content("PMID:12345")
print(f"Tier used: {content['tier_used']}")  # Should be 'full_webpage' or 'full_pdf'

Issue: Dataset Validation Failed

Symptom:

WARNING: Dataset GSE12345 missing required metadata

Solution:

# Check validation report
report = service.validate_metadata(
    "GSE12345",
    required_fields=["condition", "sample_id"]
)
print(report)

# Review recommendations:
# - PROCEED: Full integration possible
# - COHORT: Cohort-level only
# - SKIP: Insufficient metadata

Best Practices

1. Use Capability-Based Routing

✅ GOOD: Let the registry route

# System automatically selects PubMedProvider
results, stats, ir = service.search_literature("BRCA1")

❌ BAD: Manual provider selection

# Don't access providers directly
providers = service.registry.get_providers_for_capability(...)

2. Leverage Three-Tier Cascade

✅ GOOD: Trust the cascade

# Automatically tries PMC → Webpage → PDF
content = service.get_full_content("PMID:35042229")

❌ BAD: Force specific tier

# Don't try to manually control cascade

3. Validate Before Download

✅ GOOD: Pre-download validation

# Check metadata first
report = service.validate_metadata("GSE180759", required_fields=["condition"])

if "PROCEED" in report:
    # Then download dataset
    pass

4. Check Capabilities

✅ GOOD: Query capabilities first

# Check what's available
capabilities = service.query_capabilities()
print(capabilities)

Version History

v0.2.0 (January 2025) - Phase 1-6 Complete:

  • ✅ Phase 1: Provider infrastructure (5 providers)
  • ✅ Phase 2: ContentAccessService consolidation (10 methods)
  • ✅ Phase 3: metadata_assistant agent (4 tools)
  • ✅ Phase 4: research_agent enhancements (10 tools)
  • ✅ Phase 5: Multi-agent handoff patterns (3 workflows)
  • ✅ Phase 6: Integration testing (127 tests, 3988 lines)
  • Added: ProviderRegistry with capability-based routing
  • Added: GEOProvider for dataset discovery
  • Added: Validation and metadata standardization
  • Enhanced: Three-tier cascade with PMC priority
  • Deprecated: UnifiedContentService (archived)
  • Deprecated: PublicationService (replaced)

v0.2.0 (January 2025) - Phase 3:

  • ✅ UnifiedContentService (coordination layer)
  • ✅ PMC-first access strategy
  • ✅ DoclingService integration
  • ✅ PublicationIntelligenceService deletion

v0.2.0 (November 2024):

  • Initial: PublicationIntelligenceService with Docling

References

  • ContentAccessService API: See 16-services-api.md
  • Provider Architecture: Source code in lobster/tools/providers/
  • Research Agent: See 15-agents-api.md
  • Metadata Assistant: Phase 3 documentation in code
  • Integration Tests: tests/integration/test_*_real_api.py (127 tests)
