Omics-OS Docs

Data Formats

Supported file formats and data types for Omics-OS Cloud

Supported Data Formats

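Omics-OS Cloud accepts single-cell, bulk RNA-seq, proteomics, and sample metadata files in the formats listed below. It can also load public datasets directly from database accessions.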

Single-Cell Data

Format | Extension | Description | Max Size
--- | --- | --- | ---
AnnData | .h5ad | Scanpy/AnnData format (recommended) | 500MB
10X Genomics | .h5 | Cell Ranger HDF5 output | 500MB
10X MTX | .mtx.gz + .tsv.gz | Sparse matrix plus barcodes/features | 500MB
Seurat RDS | .rds | R Seurat object (converted to AnnData) | 500MB
Loom | .loom | Loom file (loompy) | 500MB

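Recommended: Use .h5ad (AnnData) for best compatibility. To convert a Seurat object, save it with SeuratDisk's SaveH5Seurat() and then run Convert(..., dest = "h5ad"). To convert a Bioconductor SingleCellExperiment object, use zellkonverter's writeH5AD().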

10X Genomics Directory Structure

Upload a ZIP archive containing the standard Cell Ranger matrix directory:

sample_filtered_feature_bc_matrix/
├── matrix.mtx.gz
├── barcodes.tsv.gz
└── features.tsv.gz
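Before uploading, it can help to confirm that all three files are present and package them in one step. A minimal sketch using only the Python standard library; the function name `zip_cellranger_matrix` is hypothetical, and keeping the folder name as the top-level entry inside the ZIP is an assumption about what the uploader expects, not documented behavior.

```python
import zipfile
from pathlib import Path

# The three files from the directory listing above.
REQUIRED_FILES = ("matrix.mtx.gz", "barcodes.tsv.gz", "features.tsv.gz")


def zip_cellranger_matrix(matrix_dir: str, zip_path: str) -> list[str]:
    """Zip a Cell Ranger matrix directory, keeping the folder name as the top-level entry.

    Raises FileNotFoundError if a required file is missing; returns the archive member names.
    """
    src = Path(matrix_dir)
    missing = [name for name in REQUIRED_FILES if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing from {src}: {', '.join(missing)}")

    members = []
    with zipfile.ZipFile(zip_path, "w") as zf:
        for name in REQUIRED_FILES:
            arcname = f"{src.name}/{name}"  # e.g. sample_filtered_feature_bc_matrix/matrix.mtx.gz
            # The files are already gzip-compressed, so store them as-is instead of recompressing.
            zf.write(src / name, arcname=arcname, compress_type=zipfile.ZIP_STORED)
            members.append(arcname)
    return members
```

For example, `zip_cellranger_matrix("sample_filtered_feature_bc_matrix", "sample.zip")` produces a ZIP whose entries sit under `sample_filtered_feature_bc_matrix/`.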

Bulk RNA-seq

Format | Extension | Description | Max Size
--- | --- | --- | ---
Count Matrix | .csv, .tsv | Genes (rows) x Samples (columns) | 100MB
DESeq2 Object | .rds | R DESeqDataSet | 100MB
Excel | .xlsx | Count matrix with gene IDs | 50MB

Count Matrix Format

gene_id,sample1,sample2,sample3,sample4
ENSG00000141510,1234,1456,1123,1345
ENSG00000134323,567,623,589,612
ENSG00000157764,89,102,95,88

Requirements:

  • First column: Gene IDs (Ensembl IDs, HGNC symbols, or Entrez IDs)
  • Header row: Sample names
  • Values: Raw counts (integers), not normalized
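These requirements can be checked locally before upload. A minimal sketch, assuming the matrix is read as comma- or tab-separated text; `validate_count_matrix` is a hypothetical helper, not part of Omics-OS Cloud, and treating any value other than a non-negative integer as a problem is an assumption drawn from the "raw counts" rule.

```python
import csv
import io


def validate_count_matrix(text: str, delimiter: str = ",") -> list[str]:
    """Return problems found in a genes x samples count matrix (an empty list means no problems)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows or len(rows[0]) < 2:
        return ["header row must contain a gene ID column plus at least one sample"]

    n_cols = len(rows[0])
    problems = []
    seen = set()
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != n_cols:
            problems.append(f"line {line_no}: expected {n_cols} fields, got {len(row)}")
            continue
        gene_id = row[0].strip()
        if not gene_id:
            problems.append(f"line {line_no}: empty gene ID")
        elif gene_id in seen:
            problems.append(f"line {line_no}: duplicate gene ID {gene_id}")
        seen.add(gene_id)
        for value in row[1:]:
            # Raw counts are non-negative integers; "12.5" or "-3" suggests normalized or transformed data.
            if not value.strip().isdigit():
                problems.append(f"line {line_no}: non-integer count {value!r} for {gene_id}")
                break  # report at most one bad value per gene
    return problems
```

For a .tsv file, pass `delimiter="\t"`.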

Proteomics

Format | Extension | Description | Max Size
--- | --- | --- | ---
Spectronaut | .tsv | Spectronaut report export | 200MB
DIA-NN | .tsv | DIA-NN main output | 200MB
MaxQuant | proteinGroups.txt | MaxQuant protein groups | 200MB
Olink | .xlsx | Olink NPX data | 50MB
Generic | .csv | Protein x Sample matrix | 100MB

Spectronaut Export

Export from Spectronaut with these columns:

  • PG.ProteinGroups — Protein identifiers
  • PG.Genes — Gene symbols
  • [Sample].PG.Quantity — Quantification per sample
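To sanity-check an export, the sketch below pulls the protein and gene columns plus every per-sample quantity column into a protein x sample table. It assumes a tab-separated report with one row per protein group and sample columns ending in `.PG.Quantity`; `parse_spectronaut_report` is a hypothetical helper, and mapping non-numeric cells such as "Filtered" or "NaN" to missing values is an illustrative choice.

```python
import csv
import io
import math

QUANT_SUFFIX = ".PG.Quantity"


def parse_spectronaut_report(text: str) -> tuple[list[str], list[dict]]:
    """Return (sample_names, rows) from a tab-separated Spectronaut protein-group report.

    Each row dict has 'protein_groups', 'genes', and 'quantities' (sample -> float or None).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    for required in ("PG.ProteinGroups", "PG.Genes"):
        if required not in header:
            raise ValueError(f"missing required column: {required}")

    quant_cols = [c for c in header if c.endswith(QUANT_SUFFIX)]
    if not quant_cols:
        raise ValueError(f"no per-sample columns ending in {QUANT_SUFFIX}")
    samples = [c[: -len(QUANT_SUFFIX)] for c in quant_cols]

    rows = []
    for rec in reader:
        quantities = {}
        for sample, col in zip(samples, quant_cols):
            raw = (rec[col] or "").strip()
            try:
                value = float(raw)
                if math.isnan(value):
                    value = None  # treat explicit NaN as missing
            except ValueError:
                value = None  # non-numeric placeholders such as "Filtered" become missing
            quantities[sample] = value
        rows.append({
            "protein_groups": rec["PG.ProteinGroups"],
            "genes": rec["PG.Genes"],
            "quantities": quantities,
        })
    return samples, rows
```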

Metadata

Format | Extension | Description | Max Size
--- | --- | --- | ---
Sample Sheet | .csv, .tsv | Sample metadata | 10MB
Excel | .xlsx | Sample metadata | 10MB

Sample Metadata Format

sample_id,condition,batch,sex,age
sample1,control,batch1,M,45
sample2,control,batch1,F,52
sample3,treatment,batch2,M,48
sample4,treatment,batch2,F,51

Requirements:

  • A sample_id column whose values match the count matrix sample headers
  • A condition (group) column defining the comparisons
  • Optional: batch and other covariate columns (e.g. sex, age) for correction
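A local check of these requirements might look like the following sketch. `check_sample_sheet` is hypothetical; it assumes a comma-separated sheet with a `condition` column as in the example above, and requiring at least two condition groups is an assumption about what a comparison needs rather than a documented rule.

```python
import csv
import io


def check_sample_sheet(text: str, condition_col: str = "condition", delimiter: str = ",") -> dict:
    """Validate a sample sheet; return the samples in each group and the remaining covariate columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = reader.fieldnames or []
    for required in ("sample_id", condition_col):
        if required not in header:
            raise ValueError(f"missing required column: {required}")

    groups: dict[str, list[str]] = {}
    seen = set()
    for rec in reader:
        sample = (rec["sample_id"] or "").strip()
        if sample in seen:
            raise ValueError(f"duplicate sample_id: {sample}")
        seen.add(sample)
        groups.setdefault((rec[condition_col] or "").strip(), []).append(sample)

    # Assumption: a comparison needs at least two distinct groups.
    if len(groups) < 2:
        raise ValueError(f"column {condition_col!r} needs at least two groups to compare")
    covariates = [c for c in header if c not in ("sample_id", condition_col)]
    return {"groups": groups, "covariates": covariates}
```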

Database Accessions

Instead of uploading files, provide accession numbers:

Database | Format | Example
--- | --- | ---
GEO | GSE* | GSE198765
SRA | SRR* / SRP* | SRR12345678
ArrayExpress | E-MTAB-* | E-MTAB-12345
PRIDE | PXD* | PXD012345
For example, asking the Data Expert Agent to fetch a GEO series:

You: Download GEO dataset GSE198765

[Data Expert Agent]
Downloading GSE198765...
- Title: "Single-cell RNA-seq of human pancreatic islets"
- Samples: 12
- Platform: 10X Genomics
- Size: 234MB

Download complete. Loaded as 'gse198765.h5ad'
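Accessions can be checked against the formats in the table before you ask the agent to fetch them. A minimal sketch: the regular expressions are assumptions derived from the table's Format column (only the prefixes listed there, followed by digits), and `identify_accession` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed patterns based on the accession table: a listed prefix followed by digits.
ACCESSION_PATTERNS = {
    "GEO": re.compile(r"GSE\d+"),
    "SRA": re.compile(r"SR[RP]\d+"),
    "ArrayExpress": re.compile(r"E-MTAB-\d+"),
    "PRIDE": re.compile(r"PXD\d+"),
}


def identify_accession(accession: str) -> Optional[str]:
    """Return the database name for an accession, or None if no pattern matches."""
    acc = accession.strip().upper()
    for database, pattern in ACCESSION_PATTERNS.items():
        if pattern.fullmatch(acc):
            return database
    return None
```

Sample-level GEO accessions such as GSM IDs are not in the table, so the sketch returns None for them.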

File Size Limits

Tier | Per File | Total Storage
--- | --- | ---
Trial | 100MB | 500MB
Starter | 500MB | 10GB
Professional | 1GB | 100GB
Enterprise | Custom | Custom

Large datasets: For files over 1GB, contact support@omics-os.com for enterprise options or consider preprocessing locally first.
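A sketch of checking an upload against the tier table. It assumes sizes are compared in decimal units (1 MB = 1,000,000 bytes) and that a file must fit both the per-file limit and the remaining storage; neither detail is stated on this page, and `check_upload` is hypothetical. Enterprise limits are custom, so the sketch rejects that tier rather than guessing.

```python
from typing import Optional

MB = 1_000_000  # assumption: decimal megabytes; the docs do not specify
GB = 1_000 * MB

# (per-file limit, total storage) in bytes, from the tier table; Enterprise is custom and omitted.
TIER_LIMITS = {
    "Trial": (100 * MB, 500 * MB),
    "Starter": (500 * MB, 10 * GB),
    "Professional": (1 * GB, 100 * GB),
}


def check_upload(tier: str, file_size: int, storage_used: int) -> Optional[str]:
    """Return None if the upload fits the tier, otherwise a short reason."""
    if tier not in TIER_LIMITS:
        raise ValueError(f"unknown or custom tier: {tier}")
    per_file, total = TIER_LIMITS[tier]
    if file_size > per_file:
        return f"file exceeds the {tier} per-file limit"
    if storage_used + file_size > total:
        return f"upload would exceed {tier} total storage"
    return None
```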

Compression

Compressed files are automatically extracted:

  • .gz — gzip compression
  • .zip — ZIP archives
  • .tar.gz — Tarball archives
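If you want to compress a text file yourself before upload, Python's standard `gzip` module is enough. A minimal sketch; `gzip_for_upload` is a hypothetical helper that writes `<name>.gz` next to the original and leaves the original in place.

```python
import gzip
import shutil
from pathlib import Path


def gzip_for_upload(path: str) -> Path:
    """Write <path>.gz next to the original file and return its path; the original is kept."""
    src = Path(path)
    if src.suffix in (".gz", ".zip"):
        raise ValueError(f"{src.name} is already compressed")
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        # copyfileobj streams in chunks, so large files are not read into memory at once.
        shutil.copyfileobj(fin, fout)
    return dst
```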

Troubleshooting

"Unsupported format"

Check that your file:

  1. Has a supported extension
  2. Is not corrupted (try opening locally first)
  3. Uses UTF-8 encoding for text files

"File too large"

Options:

  1. Compress the file (.gz)
  2. Subset to fewer samples/genes
  3. Upgrade to a higher tier
  4. Use a database accession (GEO, SRA) instead, if the data are public

"Missing gene IDs"

For count matrices, ensure:

  • First column contains gene identifiers
  • Use Ensembl, HGNC symbols, or Entrez IDs
  • No duplicate gene IDs
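A quick way to see which identifier type a gene column uses, and whether any IDs repeat, is sketched below. The patterns are heuristics and assumptions: Ensembl gene IDs are matched as "ENS", an optional species code, "G", and 11 digits with an optional version suffix; purely numeric IDs are counted as Entrez; everything else is treated as a symbol. `classify_gene_ids` is hypothetical.

```python
import re
from collections import Counter

# Heuristic patterns (assumptions), e.g. ENSG00000141510, ENSG00000141510.17, ENSMUSG00000000001.
ENSEMBL_GENE = re.compile(r"ENS[A-Z]*G\d{11}(\.\d+)?")
ENTREZ = re.compile(r"\d+")


def classify_gene_ids(gene_ids: list[str]) -> dict:
    """Count identifier types and list duplicated IDs in a gene ID column."""
    kinds = Counter()
    for gid in gene_ids:
        gid = gid.strip()
        if not gid:
            kinds["missing"] += 1
        elif ENSEMBL_GENE.fullmatch(gid):
            kinds["ensembl"] += 1
        elif ENTREZ.fullmatch(gid):
            kinds["entrez"] += 1
        else:
            kinds["symbol"] += 1
    counts = Counter(g.strip() for g in gene_ids if g.strip())
    duplicates = sorted(g for g, n in counts.items() if n > 1)
    return {"kinds": dict(kinds), "duplicates": duplicates}
```

A column that mixes several kinds, or has any entries under "missing" or "duplicates", is worth cleaning before upload.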

"Sample mismatch"

When uploading metadata:

  • Sample IDs must exactly match count matrix headers
  • Check for whitespace or case differences
  • Ensure every sample appears in both files
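These checks can be run before upload by comparing the sample columns of the count matrix (excluding the gene ID column) with the sample_id values in the metadata. A minimal sketch; `compare_sample_ids` is hypothetical, and flagging IDs that differ only by surrounding whitespace or letter case as near misses is an illustrative choice.

```python
def compare_sample_ids(matrix_headers: list[str], metadata_ids: list[str]) -> dict:
    """Compare count matrix sample headers with metadata sample IDs using exact string matching."""
    matrix = set(matrix_headers)
    meta = set(metadata_ids)
    missing_in_metadata = sorted(matrix - meta)
    missing_in_matrix = sorted(meta - matrix)

    # Pair unmatched IDs that become equal after trimming whitespace and lowercasing.
    normalized = {m.strip().lower(): m for m in missing_in_matrix}
    near_misses = [
        (h, normalized[h.strip().lower()])
        for h in missing_in_metadata
        if h.strip().lower() in normalized
    ]
    return {
        "match": not missing_in_metadata and not missing_in_matrix,
        "missing_in_metadata": missing_in_metadata,
        "missing_in_matrix": missing_in_matrix,
        "near_misses": near_misses,
    }
```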
