Omics-OS Docs

Biological Database Search

Search 8 major biological databases from the command bar — UniProt, NCBI Gene, PDB, PubMed, ChEMBL, GEO, KEGG, and NCBI Nucleotide

Overview

The command bar connects to 8 major biological databases through a unified search interface. Select a tag (e.g., Protein, Gene, Structure), then type your query to search that database without leaving the canvas.

Each search result creates a canvas node tailored to its data type — 3D protein structures rendered in Mol*, gene information cards, literature citations, interactive KEGG pathway maps, and circular genome visualizations. Results are cached locally for fast repeat access and can be arranged, connected, and annotated directly on the canvas.

Quick Reference

| Database | Tag | Best For | Example Query | Creates |
|---|---|---|---|---|
| UniProt | Protein | Protein info, sequences, structures | TP53, P04637 | 3D structure or InfoCard |
| NCBI Gene | Gene | Gene metadata, chromosome location | BRCA1, EGFR | Gene InfoCard |
| RCSB PDB | Structure | 3D molecular structures | 1CRN, ribosome | 3D Mol* viewer |
| PubMed | Literature | Scientific papers, reviews | CRISPR, "gene therapy" | Citation InfoCard |
| ChEMBL | Compound | Drugs, bioactive molecules | aspirin, CHEMBL25 | Compound InfoCard |
| NCBI GEO | Dataset | Gene expression datasets | scRNA PBMC | Dataset InfoCard |
| KEGG | Pathway | Metabolic and signaling pathways | glycolysis, MAPK | Interactive pathway |
| NCBI Nucleotide | Sequence | Genomes, plasmids, gene sequences | pUC19, SARS-CoV-2 | Genome map |

UniProt

UniProt searches query the reviewed Swiss-Prot database, returning curated protein entries with function annotations, domain architecture, disease associations, and cross-references. When a protein has an associated PDB structure, the result node renders a 3D viewer; otherwise, it displays a detailed InfoCard.

Example Queries

| Query | Type | Expected Result |
|---|---|---|
| TP53 | Gene symbol | Human tumor protein p53 with 3D structure |
| P04637 | UniProt accession | Direct lookup, the fastest path to a specific entry |
| insulin receptor | Free text | Top matches ranked by annotation score |
| BRCA2_HUMAN | Entry name | Exact entry for human BRCA2 |
| kinase AND organism:mouse | Advanced | Mouse kinases filtered by organism |

Tips

UniProt accession IDs (e.g., P04637, Q9Y6K9) return results instantly because they bypass full-text search and resolve directly.

  • Gene symbols like TP53 or EGFR work well for human proteins. For other organisms, append the species: TP53 mouse.
  • Partial protein names use fuzzy matching — tumor suppressor p53 finds the same entry as TP53.
  • Results include GO annotations, subcellular location, and links to external databases (PDB, InterPro, Pfam).
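The distinction between direct lookups and full-text search in the tips above can be sketched as a small classifier. This is only an illustration, not the command bar's actual implementation: the function name `classify_uniprot_query` and the returned labels are hypothetical, and the entry-name pattern is a simplification. The accession pattern is the one UniProt publishes in its help pages.

```python
import re

# Accession pattern as documented by UniProt (6 or 10 characters).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Simplified entry-name shape, e.g. BRCA2_HUMAN (mnemonic, underscore, species code).
ENTRY_NAME_RE = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")


def classify_uniprot_query(query: str) -> str:
    """Return 'accession', 'entry_name', or 'search' (hypothetical labels)."""
    text = query.strip().upper()
    if ACCESSION_RE.fullmatch(text):
        return "accession"   # can resolve directly, no full-text search needed
    if ENTRY_NAME_RE.fullmatch(text):
        return "entry_name"  # exact entry such as BRCA2_HUMAN
    return "search"          # gene symbols, protein names, advanced queries


for example in ["P04637", "Q9Y6K9", "BRCA2_HUMAN", "TP53", "insulin receptor"]:
    print(example, "->", classify_uniprot_query(example))
```

Gene symbols such as TP53 deliberately fall through to the search branch, since a symbol alone does not identify a single entry.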

NCBI Gene

NCBI Gene searches return gene metadata including chromosomal location, aliases, RefSeq identifiers, and functional summaries. Results appear as Gene InfoCards with direct links to the NCBI Gene page.

Example Queries

| Query | Type | Expected Result |
|---|---|---|
| BRCA1 | HGNC symbol | Breast cancer type 1, chr17 |
| EGFR | HGNC symbol | Epidermal growth factor receptor, chr7 |
| HER2 | Alias | Resolves to ERBB2 |
| 7157 | Gene ID | Direct lookup for TP53 |
| apolipoprotein E | Full name | APOE gene card |

Tips

  • Use HGNC-approved gene symbols for the most reliable results. Aliases (e.g., HER2 for ERBB2) are resolved but may return multiple matches.
  • The search defaults to human genes. Specify the organism explicitly if you need a different species: BRCA1 rat.
  • Gene IDs (numeric) perform direct lookups and are the fastest query type.
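As a rough sketch of the behavior described in these tips (numeric IDs as direct lookups, human as the default organism, an optional trailing species), a query could be normalized as follows. Everything here is an assumption for illustration: `build_gene_term`, the `KNOWN_ORGANISMS` set, and the returned dictionary shape are hypothetical. Only the `[Organism]` field tag is standard NCBI Entrez query syntax.

```python
# Hypothetical list of trailing species words the parser recognizes.
KNOWN_ORGANISMS = {"human", "mouse", "rat", "zebrafish"}


def build_gene_term(query: str, default_organism: str = "human") -> dict:
    """Turn a command-bar query into either a Gene ID lookup or an Entrez term."""
    text = query.strip()
    if text.isdigit():
        # Numeric input is treated as an NCBI Gene ID (e.g. 7157 for TP53).
        return {"mode": "lookup", "gene_id": text}

    words = text.split()
    organism = default_organism
    # A trailing species word overrides the human default: "BRCA1 rat".
    if len(words) > 1 and words[-1].lower() in KNOWN_ORGANISMS:
        organism = words.pop().lower()

    term = f"({' '.join(words)}) AND {organism}[Organism]"
    return {"mode": "search", "term": term}


print(build_gene_term("7157"))
print(build_gene_term("BRCA1"))
print(build_gene_term("BRCA1 rat"))
```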

Gene InfoCards show the official symbol, full name, chromosome band, and a one-paragraph functional summary pulled from NCBI's curated RefSeq records.

RCSB PDB

PDB searches query the RCSB Protein Data Bank for experimentally determined 3D structures. Results create a MolstarNode, a fully interactive molecular viewer powered by Mol* with rotation, zoom, surface/cartoon toggles, and chain highlighting.

Example Queries

| Query | Type | Expected Result |
|---|---|---|
| 1CRN | PDB ID | Crambin crystal structure (direct load) |
| ribosome | Free text | Top ribosome structures by resolution |
| kinase X-ray | Text + method | X-ray crystallography kinase structures |
| 6LU7 | PDB ID | SARS-CoV-2 main protease |
| hemoglobin NMR | Text + method | NMR-resolved hemoglobin structures |

Tips

  • PDB IDs are exactly 4 characters (one digit followed by three alphanumeric characters, e.g., 1CRN, 6LU7). When you enter a valid PDB ID, the structure loads directly without a search step.
  • Add an experimental method to narrow results: kinase X-ray, antibody cryo-EM, peptide NMR.
  • The Mol* viewer supports multiple representations (cartoon, ball-and-stick, surface) and can highlight individual chains, ligands, or residue ranges.
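The ID rule and the method keywords from the tips above can be expressed in a few lines. This is a minimal sketch under stated assumptions: `parse_pdb_query` and `METHOD_KEYWORDS` are hypothetical names, the keyword list only covers the three methods mentioned on this page, and the real command bar may parse queries differently.

```python
import re

# One digit followed by three alphanumeric characters, e.g. 1CRN or 6LU7.
PDB_ID_RE = re.compile(r"[0-9][A-Za-z0-9]{3}")

# Hypothetical mapping of trailing keywords to experimental methods.
METHOD_KEYWORDS = {"x-ray": "X-ray", "nmr": "NMR", "cryo-em": "cryo-EM"}


def parse_pdb_query(query: str) -> dict:
    """Split a query into a direct ID load or a text search with optional method."""
    text = query.strip()
    if PDB_ID_RE.fullmatch(text):
        return {"mode": "lookup", "pdb_id": text.upper()}

    words = text.split()
    method = None
    if words and words[-1].lower() in METHOD_KEYWORDS:
        method = METHOD_KEYWORDS[words.pop().lower()]
    return {"mode": "search", "text": " ".join(words), "method": method}


print(parse_pdb_query("6lu7"))
print(parse_pdb_query("kinase X-ray"))
```

Note that any four-character string starting with a digit matches the ID pattern, so a free-text query such as a year would be treated as an ID under this sketch.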

Double-click any residue in the 3D viewer to center and highlight it. Right-click for options including distance measurement and surface coloring.

PubMed

PubMed searches query titles, abstracts, and MeSH (Medical Subject Headings) terms across the full MEDLINE database. Results appear as Citation InfoCards showing the title, authors, journal, year, and abstract excerpt.

Example Queries

| Query | Type | Expected Result |
|---|---|---|
| CRISPR | Keyword | Recent CRISPR papers ranked by relevance |
| "single cell sequencing" | Exact phrase | Papers containing the exact phrase |
| machine learning drug discovery | Multi-word | Papers matching all terms |
| PMID:32015507 | PMID lookup | Direct retrieval of a specific paper |
| scRNA-seq AND pancreas NOT cancer | Boolean | Filtered pancreas single-cell papers |

Tips

Wrap multi-word phrases in double quotes for exact matching. "gene therapy" finds papers with that exact phrase, while gene therapy matches papers containing both words anywhere in the text.

  • Boolean operators (AND, OR, NOT) work as expected. Use them to refine broad topics.
  • MeSH terms improve precision for established concepts. PubMed automatically maps common terms to their MeSH equivalents.
  • Prefix a PMID with PMID: for direct paper lookup without a search round-trip.
  • Results are sorted by relevance, with recent and highly cited papers tending to appear first.

ChEMBL

ChEMBL searches query the EMBL-EBI database of bioactive molecules with drug-like properties. Results include molecular structure, clinical development phase, mechanism of action, and target information.

Example Queries

| Query | Type | Expected Result |
|---|---|---|
| aspirin | Drug name | Acetylsalicylic acid (Approved) |
| imatinib | Drug name | Gleevec (Approved), BCR-ABL inhibitor |
| CHEMBL25 | ChEMBL ID | Direct lookup for aspirin |
| kinase inhibitor | Mechanism | Compounds targeting kinases |
| CHEMBL941 | ChEMBL ID | Direct lookup for imatinib |

Tips

  • Both common drug names and ChEMBL accession IDs work. Accession IDs (e.g., CHEMBL25) perform direct lookups.
  • Compound InfoCards display the clinical phase (Approved, Phase I-III, Preclinical), molecular formula, and key physicochemical properties.
  • Target information links compounds to their protein targets, enabling cross-referencing with UniProt results on the same canvas.

Search for a drug target protein in UniProt, then search for compounds against that target in ChEMBL. Place both nodes on the canvas to build a target-compound relationship map.

NCBI GEO

GEO searches query the Gene Expression Omnibus for publicly available gene expression, methylation, and other functional genomics datasets. Results appear as Dataset InfoCards showing the GSE accession, title, organism, sample count, and platform.

Example Queries

| Query | Type | Expected Result |
|---|---|---|
| scRNA PBMC | Keywords | Single-cell RNA-seq datasets from PBMCs |
| GSE198765 | GSE accession | Direct lookup of a specific dataset |
| cancer methylation human | Keywords + organism | Human cancer methylation arrays |
| ATAC-seq mouse brain | Method + tissue | Chromatin accessibility in mouse brain |
| COVID-19 bulk RNA-seq | Disease + method | COVID transcriptomics datasets |

Tips

  • Be specific about organism, method, and tissue type. scRNA PBMC human returns more relevant results than single cell.
  • GSE accession numbers (e.g., GSE198765) perform direct lookups and are the fastest way to retrieve a known dataset.
  • Dataset InfoCards show the number of samples and platform, helping you assess dataset suitability before downloading.

Found a dataset you want to analyze? Ask Lobster to download it directly: "Download GSE198765 and run QC." The data expert agent handles GEO downloads and format conversion automatically.

KEGG

KEGG searches query the Kyoto Encyclopedia of Genes and Genomes for metabolic pathways, signaling cascades, and disease pathways. Results create a PathwayNode, an interactive, zoomable pathway map with clickable entities (genes, compounds, reactions).

Example Queries

| Query | Type | Expected Result |
|---|---|---|
| glycolysis | Pathway name | Glycolysis / Gluconeogenesis (hsa00010) |
| MAPK | Pathway name | MAPK signaling pathway (hsa04010) |
| mTOR | Pathway name | mTOR signaling pathway (hsa04150) |
| hsa04110 | KEGG ID | Direct lookup for Cell cycle pathway |
| purine metabolism | Pathway name | Purine metabolism (hsa00230) |

Tips

  • Use standard pathway names as they appear in KEGG. Common abbreviations (MAPK, mTOR, TCA cycle) are recognized.
  • KEGG pathway IDs (e.g., hsa04110) perform direct lookups. The hsa prefix indicates human pathways.
  • Double-click any entity (gene box, compound circle) in the pathway map to view its details or search for it in the corresponding database.
  • Pathway maps are interactive: zoom with scroll, pan by dragging, and click entities to highlight connected reactions.

For a full walkthrough of pathway navigation, entity resolution, and cross-database linking from pathway maps, see the Pathway Exploration page.

NCBI Nucleotide

NCBI Nucleotide searches query the GenBank and RefSeq sequence databases for genomes, plasmids, chromosomes, and individual gene sequences. Results create a CgviewNode, an interactive circular or linear genome map rendered with CGView.js, showing annotated features like genes, regulatory elements, and restriction sites.

Example Queries

| Query | Type | Expected Result |
|---|---|---|
| pUC19 | Plasmid name | pUC19 cloning vector (circular map) |
| E. coli K-12 | Organism | E. coli K-12 reference genome |
| SARS-CoV-2 | Organism | SARS-CoV-2 reference genome (linear) |
| NC_000913 | RefSeq accession | E. coli K-12 MG1655 genome |
| lambda phage | Common name | Bacteriophage lambda genome |

Tips

  • Well-annotated sequences (reference genomes, common plasmids) produce the richest genome maps with full feature annotations.
  • RefSeq accession numbers (e.g., NC_000913) and GenBank accessions perform direct lookups.
  • Circular maps are generated for circular molecules (plasmids, bacterial chromosomes). Linear maps are used for linear genomes and chromosomal segments.
  • Zoom into specific regions of the genome map to see individual gene annotations, reading frames, and regulatory features.

Search Behavior

Caching

Search results are cached locally to speed up repeat queries and reduce load on external databases.

| Cache Type | TTL / Limit | Description |
|---|---|---|
| Search results | 60 seconds | Same query returns instantly within the window |
| Detail results | 300 seconds | Expanded node data (structures, full records) |
| Maximum entries | 500 | LRU eviction removes least-recently-used entries first |
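The combination of a time-to-live and LRU eviction can be illustrated with a small in-memory cache. This is a sketch of the general technique, not the product's implementation: the `TTLCache` class is hypothetical, and whether the 500-entry limit applies per cache or across both caches is not stated, so the sketch assumes one limit per cache instance.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Cache with per-entry expiry and least-recently-used eviction."""

    def __init__(self, ttl_seconds: float, max_entries: int = 500, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self.clock = clock              # injectable so expiry can be tested
        self._entries = OrderedDict()   # key -> (expires_at, value), oldest first

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._entries[key]      # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._entries[key] = (self.clock() + self.ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used


search_cache = TTLCache(ttl_seconds=60)    # search results
detail_cache = TTLCache(ttl_seconds=300)   # expanded node data
search_cache.put(("Protein", "TP53"), ["P04637"])
print(search_cache.get(("Protein", "TP53")))
```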

Rate Limits

| Database Group | Limit | Notes |
|---|---|---|
| NCBI (Gene, PubMed, GEO, Nucleotide) | 3 req/s default, 10 req/s with API key | All NCBI databases share the same rate limit pool |
| UniProt, PDB, ChEMBL | No strict rate limits | Fair-use policies apply |
| Per-user unified endpoint | 20 searches/minute | Applies across all databases combined |

If you have an NCBI API key, configure it in your account settings to increase the NCBI rate limit from 3 to 10 requests per second. Get a free key at ncbi.nlm.nih.gov/account/settings.
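A per-user limit of 20 searches per minute can be enforced in several ways. The sketch below uses a sliding window, which is one common choice; the page does not say which algorithm the unified endpoint uses, and `SlidingWindowLimiter` is a hypothetical class for illustration only.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_calls per user within any window_seconds span."""

    def __init__(self, max_calls: int = 20, window_seconds: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock                 # injectable for testing
        self._calls = defaultdict(deque)   # user_id -> timestamps of recent calls

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        calls = self._calls[user_id]
        # Forget calls that have aged out of the window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False                   # over the limit: reject this search
        calls.append(now)
        return True


limiter = SlidingWindowLimiter()
results = [limiter.allow("alice") for _ in range(21)]
print(results.count(True), "allowed,", results.count(False), "rejected")
```

Because the counter is keyed by user rather than by database, searches against UniProt and PubMed draw from the same budget, matching the "across all databases combined" note in the table.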

Error Handling

  • If an external database is temporarily unavailable, the search returns "No results found" rather than an error. This is graceful degradation, so an empty result does not always mean the query has no matches; retry after a few seconds.
  • Each database query has a timeout of 3-5 seconds. Slow upstream responses are terminated cleanly.
  • No 500 errors propagate to the UI. All failures are caught and displayed as user-friendly messages.
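The pattern described in this list, where timeouts and upstream failures are converted into an empty result, might look roughly like the wrapper below. It is a sketch under stated assumptions: `safe_search`, the `fetch` callable with its `timeout` keyword, and the response shape are all hypothetical, and the 5-second default is the upper end of the range quoted above.

```python
def safe_search(fetch, query: str, timeout_seconds: float = 5.0) -> dict:
    """Call an upstream fetcher and degrade to an empty result on any failure."""
    try:
        # fetch is any callable that queries one database and honors a timeout.
        results = fetch(query, timeout=timeout_seconds)
    except Exception:
        # Timeouts, connection errors, and bad responses all end up here,
        # so the UI never sees a raw server error.
        return {"results": [], "message": "No results found"}
    if not results:
        return {"results": [], "message": "No results found"}
    return {"results": results, "message": None}


def unavailable_database(query, timeout):
    # Stand-in for an upstream service that is too slow to answer.
    raise TimeoutError("upstream did not answer in time")


print(safe_search(unavailable_database, "TP53"))
```

One consequence of this design is that a genuine "no matches" and a temporary outage look identical to the caller, which is why retrying after a few seconds is worthwhile.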
