Data Formats
Supported file formats and data types for Omics-OS Cloud
Supported Data Formats
Omics-OS Cloud supports single-cell, bulk RNA-seq, proteomics, and sample metadata files, and can also load public datasets directly from database accessions.
Single-Cell Data
| Format | Extension | Description | Max Size |
|---|---|---|---|
| AnnData | .h5ad | Scanpy/AnnData format (recommended) | 500MB |
| 10X Genomics | .h5 | CellRanger output | 500MB |
| 10X MTX | .mtx.gz + .tsv.gz | Sparse matrix + barcodes/features | 500MB |
| Seurat RDS | .rds | R Seurat object (converted to AnnData) | 500MB |
| Loom | .loom | Loompy format | 500MB |
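Recommended: Use .h5ad (AnnData) format for best compatibility. Convert a Seurat object with SeuratDisk (`SaveH5Seurat()` followed by `Convert(..., dest = "h5ad")`), or convert a Bioconductor SingleCellExperiment with zellkonverter (`writeH5AD()`).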
10X Genomics Directory Structure
Upload a ZIP containing the standard CellRanger output:
```text
sample_filtered_feature_bc_matrix/
├── matrix.mtx.gz
├── barcodes.tsv.gz
└── features.tsv.gz
```
Bulk RNA-seq
| Format | Extension | Description | Max Size |
|---|---|---|---|
| Count Matrix | .csv, .tsv | Genes (rows) x Samples (columns) | 100MB |
| DESeq2 Object | .rds | R DESeqDataSet | 100MB |
| Excel | .xlsx | Count matrix with gene IDs | 50MB |
Count Matrix Format
```csv
gene_id,sample1,sample2,sample3,sample4
ENSG00000141510,1234,1456,1123,1345
ENSG00000134323,567,623,589,612
ENSG00000157764,89,102,95,88
```
Requirements:
- First column: Gene IDs (Ensembl, HGNC, or Entrez)
- Header row: Sample names
- Values: Raw counts (integers), not normalized
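A minimal pre-upload check against these three requirements might look like the Python sketch below. `validate_count_matrix` is a hypothetical helper written for illustration, not part of Omics-OS Cloud; it uses only the standard library and flags a missing sample header, rows with the wrong number of fields, and any value that is not a non-negative integer.

```python
import csv
import io


def validate_count_matrix(text: str, delimiter: str = ",") -> list:
    """Hypothetical pre-upload check for a genes x samples count matrix.

    Returns a list of problem descriptions; an empty list means none were found.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    samples = header[1:]  # the first header cell labels the gene ID column
    problems = []
    if not samples:
        problems.append("header has no sample columns")
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields, got {len(row)}")
            continue
        for sample, value in zip(samples, row[1:]):
            try:
                count = int(value)  # "12.5" or "1e3" raise ValueError: not raw counts
            except ValueError:
                problems.append(f"line {line_no}, {sample}: {value!r} is not an integer count")
                continue
            if count < 0:
                problems.append(f"line {line_no}, {sample}: negative count {count}")
    return problems


example = """gene_id,sample1,sample2,sample3,sample4
ENSG00000141510,1234,1456,1123,1345
ENSG00000134323,567,623,589,612
ENSG00000157764,89,102,95,88
"""
print(validate_count_matrix(example))  # prints []
```

For a `.tsv` file, pass `delimiter="\t"`. Rejecting decimals is intentional: normalized values such as TPM or CPM typically contain fractions, while the upload expects raw integer counts.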
Proteomics
| Format | Extension | Description | Max Size |
|---|---|---|---|
| Spectronaut | .tsv | Spectronaut report export | 200MB |
| DIA-NN | .tsv | DIA-NN main output | 200MB |
| MaxQuant | proteinGroups.txt | MaxQuant protein groups | 200MB |
| Olink | .xlsx | Olink NPX data | 50MB |
| Generic | .csv | Protein x Sample matrix | 100MB |
Spectronaut Export
Export from Spectronaut with these columns:
- `PG.ProteinGroups`: protein identifiers
- `PG.Genes`: gene symbols
- `[Sample].PG.Quantity`: quantification per sample
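To confirm an export has these columns before uploading, the Python sketch below reshapes a protein-group report into a protein × sample table. It assumes a tab-separated file with one row per protein group and sample columns named `<sample>.PG.Quantity`; column naming can vary with the Spectronaut version and report scheme, so adjust `QUANT_SUFFIX` if needed. `read_spectronaut_pg` is a hypothetical helper, not an Omics-OS API.

```python
import csv
import io
import math

QUANT_SUFFIX = ".PG.Quantity"  # assumption: per-sample quantity columns end with this


def _to_float(cell):
    """Convert a quantity cell to float; blanks, 'Filtered', 'NaN' and similar become None."""
    try:
        value = float(cell)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(value) else value


def read_spectronaut_pg(text: str):
    """Hypothetical parser: tab-separated protein-group report -> (samples, matrix).

    Assumes one row per protein group, a PG.ProteinGroups column, and one
    '<sample>.PG.Quantity' column per sample.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    quant_cols = [c for c in (reader.fieldnames or []) if c.endswith(QUANT_SUFFIX)]
    samples = [c[: -len(QUANT_SUFFIX)] for c in quant_cols]
    matrix = {}
    for row in reader:
        matrix[row["PG.ProteinGroups"]] = [_to_float(row[c]) for c in quant_cols]
    return samples, matrix


report = (
    "PG.ProteinGroups\tPG.Genes\tS1.PG.Quantity\tS2.PG.Quantity\n"
    "P04637\tTP53\t1520.5\tNaN\n"
)
print(read_spectronaut_pg(report))  # (['S1', 'S2'], {'P04637': [1520.5, None]})
```

Missing quantities come back as `None` rather than zero, so they are not mistaken for a measured low abundance.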
Metadata
| Format | Extension | Description | Max Size |
|---|---|---|---|
| Sample Sheet | .csv, .tsv | Sample metadata | 10MB |
| Excel | .xlsx | Sample metadata | 10MB |
Sample Metadata Format
```csv
sample_id,condition,batch,sex,age
sample1,control,batch1,M,45
sample2,control,batch1,F,52
sample3,treatment,batch2,M,48
sample4,treatment,batch2,F,51
```
Requirements:
- `sample_id` column matching count matrix headers
- Condition/group column for comparisons
- Optional: batch, covariates for correction
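The sample sheet can be checked locally in a similar way. This Python sketch is illustrative only: `check_metadata` is a hypothetical helper, the group column defaults to `condition` as in the example above, and the rule that a comparison needs at least two groups is an assumption rather than a documented platform requirement.

```python
import csv
import io
from collections import Counter


def check_metadata(text: str, group_column: str = "condition") -> list:
    """Hypothetical check of a sample sheet; returns a list of problems (empty = OK)."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    problems = [f"missing required column {c!r}"
                for c in ("sample_id", group_column) if c not in columns]
    if problems:
        return problems
    rows = list(reader)
    id_counts = Counter(row["sample_id"] for row in rows)
    duplicates = sorted(s for s, n in id_counts.items() if n > 1)
    if duplicates:
        problems.append(f"duplicate sample_id values: {duplicates}")
    # Assumption: a comparison needs at least two distinct groups.
    groups = Counter(row[group_column] for row in rows)
    if len(groups) < 2:
        problems.append(f"{group_column!r} has fewer than two groups: {sorted(groups)}")
    return problems


metadata = """sample_id,condition,batch,sex,age
sample1,control,batch1,M,45
sample2,control,batch1,F,52
sample3,treatment,batch2,M,48
sample4,treatment,batch2,F,51
"""
print(check_metadata(metadata))  # prints []
```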
Database Accessions
Instead of uploading files, provide accession numbers:
| Database | Format | Example |
|---|---|---|
| GEO | GSE* | GSE198765 |
| SRA | SRR* / SRP* | SRR12345678 |
| ArrayExpress | E-MTAB-* | E-MTAB-12345 |
| PRIDE | PXD* | PXD012345 |
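Accession prefixes are easy to mistype. The Python sketch below classifies an accession using only the prefixes listed in the table; `identify_accession` is hypothetical, and other identifiers (for example sample-level GEO `GSM` IDs) are not recognized by it.

```python
import re
from typing import Optional

# Prefixes taken from the accession table; other prefixes (e.g. ERR/DRR runs) are not covered.
ACCESSION_PATTERNS = {
    "GEO": re.compile(r"GSE\d+"),
    "SRA": re.compile(r"SR[RP]\d+"),
    "ArrayExpress": re.compile(r"E-MTAB-\d+"),
    "PRIDE": re.compile(r"PXD\d+"),
}


def identify_accession(accession: str) -> Optional[str]:
    """Hypothetical helper: return the database an accession belongs to, or None."""
    acc = accession.strip().upper()
    for database, pattern in ACCESSION_PATTERNS.items():
        if pattern.fullmatch(acc):
            return database
    return None


for acc in ["GSE198765", "SRR12345678", "E-MTAB-12345", "PXD012345", "GSM123"]:
    print(acc, identify_accession(acc))
```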
```text
You: Download GEO dataset GSE198765
[Data Expert Agent]
Downloading GSE198765...
- Title: "Single-cell RNA-seq of human pancreatic islets"
- Samples: 12
- Platform: 10X Genomics
- Size: 234MB
Download complete. Loaded as 'gse198765.h5ad'
```
File Size Limits
| Tier | Per File | Total Storage |
|---|---|---|
| Trial | 100MB | 500MB |
| Starter | 500MB | 10GB |
| Professional | 1GB | 100GB |
| Enterprise | Custom | Custom |
Large datasets: For files over 1GB, contact support@omics-os.com for enterprise options or consider preprocessing locally first.
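As a worked example of how these limits might combine, the Python sketch below checks a planned upload against a tier. Two assumptions are labeled in the code: that "MB" means binary megabytes (1 MB = 1024 × 1024 bytes), and that the effective per-file cap is the smaller of the tier limit and the format's Max Size from the format tables. `check_upload` is a hypothetical helper.

```python
from typing import Optional

MB = 1024 ** 2  # assumption: limits are binary megabytes
GB = 1024 ** 3

# (per-file limit, total storage) in bytes, copied from the tier table; Enterprise is custom.
TIER_LIMITS = {
    "trial": (100 * MB, 500 * MB),
    "starter": (500 * MB, 10 * GB),
    "professional": (1 * GB, 100 * GB),
}


def check_upload(tier: str, file_size: int, used_storage: int = 0,
                 format_limit: Optional[int] = None) -> Optional[str]:
    """Hypothetical pre-flight check; returns None if the upload fits, else a reason."""
    per_file, total = TIER_LIMITS[tier.lower()]
    # Assumption: the effective cap is the smaller of the tier and format limits.
    limit = per_file if format_limit is None else min(per_file, format_limit)
    if file_size > limit:
        return f"file exceeds the {limit // MB} MB per-file limit"
    if used_storage + file_size > total:
        return f"upload would exceed the {total // MB} MB total storage limit"
    return None


# A 300MB Spectronaut report (200MB format limit) on the Starter tier (500MB per file):
print(check_upload("Starter", 300 * MB, format_limit=200 * MB))
```

Under these assumptions, the report above would be rejected by the format limit even though it fits the Starter tier's per-file limit.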
Compression
Compressed files are automatically extracted:
- `.gz`: gzip compression
- `.zip`: ZIP archives
- `.tar.gz`: tarball archives
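For the 10X MTX layout, you can build the ZIP with Python's standard `zipfile` module. This sketch assumes the platform expects the matrix folder at the top level of the archive, matching the CellRanger directory structure described earlier on this page; `zip_cellranger_matrix` is a hypothetical helper.

```python
import zipfile
from pathlib import Path

EXPECTED_FILES = ("matrix.mtx.gz", "barcodes.tsv.gz", "features.tsv.gz")


def zip_cellranger_matrix(matrix_dir, zip_path) -> Path:
    """Hypothetical helper: package a CellRanger matrix folder into a ZIP for upload."""
    matrix_dir = Path(matrix_dir)
    missing = [n for n in EXPECTED_FILES if not (matrix_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"{matrix_dir} is missing {missing}")
    # ZIP_STORED: the members are already gzip-compressed, so deflating again gains little.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for name in EXPECTED_FILES:
            # Keep the top-level folder name inside the archive.
            zf.write(matrix_dir / name, arcname=f"{matrix_dir.name}/{name}")
    return Path(zip_path)


# Usage (paths are examples):
# zip_cellranger_matrix("sample_filtered_feature_bc_matrix", "sample.zip")
```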
Troubleshooting
"Unsupported format"
Check that your file:
- Has a supported extension
- Is not corrupted (try opening locally first)
- Uses UTF-8 encoding for text files
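The checklist above can be partly automated. The Python sketch below (a hypothetical `diagnose_upload` helper, not a platform tool) checks the extension against the formats listed on this page, reads text files as UTF-8, and fully decompresses `.gz` files to detect truncation or corruption. It does not validate binary formats such as `.h5ad` or `.rds`.

```python
import gzip
import zlib
from pathlib import Path

# Suffixes from the format tables on this page. ".txt" is broader than needed
# (only MaxQuant's proteinGroups.txt is listed); ".gz" also covers .mtx.gz,
# .tsv.gz and .tar.gz, which are extracted after upload.
SUPPORTED_SUFFIXES = (".h5ad", ".h5", ".rds", ".loom", ".csv", ".tsv",
                      ".txt", ".xlsx", ".zip", ".gz")
TEXT_SUFFIXES = (".csv", ".tsv", ".txt")
CHUNK = 1 << 20  # read in ~1M chunks so large files are never loaded whole


def _drain(fh) -> None:
    """Read a file object to the end, discarding the data."""
    while fh.read(CHUNK):
        pass


def diagnose_upload(path) -> list:
    """Hypothetical local check; returns a list of problems (empty = none found)."""
    path = Path(path)
    name = path.name.lower()
    problems = []
    if not name.endswith(SUPPORTED_SUFFIXES):
        problems.append(f"unsupported extension: {path.suffix or '(none)'}")
    elif name.endswith(TEXT_SUFFIXES):
        try:
            with open(path, encoding="utf-8") as fh:
                _drain(fh)
        except UnicodeDecodeError as err:
            problems.append(f"not valid UTF-8: {err.reason}")
    elif name.endswith(".gz"):
        try:
            with gzip.open(path, "rb") as fh:  # full decompression exposes truncation
                _drain(fh)
        except (OSError, EOFError, zlib.error) as err:
            problems.append(f"corrupted gzip file: {err}")
    return problems
```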
"File too large"
Options:
- Compress the file (`.gz`)
- Subset to fewer samples/genes
- Upgrade to a higher tier
- Use database accession (GEO/SRA) instead
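If you compress before uploading, Python's `gzip` and `shutil` modules can stream a file to `.gz` without loading it into memory. This sketch assumes the per-file limit applies to the uploaded (compressed) size, since compressed files are extracted after upload; `gzip_file` is a hypothetical helper.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src, level: int = 6) -> Path:
    """Hypothetical helper: write <src>.gz next to src and return its path."""
    src = Path(src)
    dst = src.with_name(src.name + ".gz")
    # Stream in chunks so the whole file is never held in memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=level) as fout:
        shutil.copyfileobj(fin, fout)
    return dst


# Usage (path is an example): gzip_file("counts.csv") writes counts.csv.gz
```

Text matrices with many repeated values often compress well, but the actual ratio depends on the data.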
"Missing gene IDs"
For count matrices, ensure:
- First column contains gene identifiers
- IDs are Ensembl gene IDs, HGNC symbols, or Entrez IDs
- No duplicate gene IDs
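To see which identifier type a column actually contains, the heuristic Python sketch below classifies each ID and reports blanks and duplicates. The regular expressions are assumptions (Ensembl gene IDs as `ENS`, an optional species code, `G`, then 11 digits with an optional version; Entrez IDs as all digits; anything else starting with a letter treated as a symbol), and both helpers are hypothetical.

```python
import re
from collections import Counter

# Heuristic patterns (assumptions, not an official specification):
ENSEMBL_GENE = re.compile(r"ENS[A-Z]*G\d{11}(?:\.\d+)?")  # ENSG..., ENSMUSG..., optional version
ENTREZ = re.compile(r"\d+")                               # Entrez Gene IDs are plain integers
SYMBOL = re.compile(r"[A-Za-z][A-Za-z0-9.\-]*")           # loose match for TP53, HLA-A, etc.


def classify_gene_id(gene_id: str) -> str:
    """Hypothetical helper: guess the identifier type of one gene ID."""
    if ENSEMBL_GENE.fullmatch(gene_id):
        return "ensembl"
    if ENTREZ.fullmatch(gene_id):
        return "entrez"
    if SYMBOL.fullmatch(gene_id):
        return "symbol"
    return "unrecognized"


def summarize_gene_ids(gene_ids) -> dict:
    """Hypothetical helper: count ID types and report blank and duplicate IDs."""
    cleaned = [g.strip() for g in gene_ids]
    counts = Counter(cleaned)
    return {
        "types": dict(Counter(classify_gene_id(g) for g in cleaned if g)),
        "blank": sum(1 for g in cleaned if not g),
        "duplicates": sorted(g for g, n in counts.items() if g and n > 1),
    }


print(summarize_gene_ids(["ENSG00000141510", "ENSG00000141510", "7157", "TP53", ""]))
```

A column that mixes types, or a non-zero `unrecognized` count, can point to merged files or spreadsheet artifacts such as `#N/A`.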
"Sample mismatch"
When uploading metadata:
- Sample IDs must exactly match count matrix headers
- Check for whitespace or case differences
- Ensure no missing samples
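A quick way to find the offending IDs is to compare the two sets directly. In this Python sketch, `diagnose_sample_mismatch` is a hypothetical helper; it lists IDs present on only one side and flags pairs that differ only in surrounding whitespace or letter case.

```python
def diagnose_sample_mismatch(matrix_samples, metadata_ids) -> dict:
    """Hypothetical helper: compare count-matrix headers with metadata sample_id values."""
    matrix_set, meta_set = set(matrix_samples), set(metadata_ids)
    only_matrix = sorted(matrix_set - meta_set)
    only_meta = sorted(meta_set - matrix_set)
    # Index unmatched metadata IDs by a normalized form to spot near-misses.
    normalized_meta = {m.strip().lower(): m for m in only_meta}
    hints = []
    for sample in only_matrix:
        match = normalized_meta.get(sample.strip().lower())
        if match is not None:
            hints.append(f"{sample!r} vs {match!r}: differs only in whitespace or case")
    return {"only_in_matrix": only_matrix, "only_in_metadata": only_meta, "hints": hints}


report = diagnose_sample_mismatch(
    ["sample1", "Sample2", "sample3"],   # count matrix header (without gene_id)
    ["sample1", "sample2 ", "sample4"],  # metadata sample_id column
)
print(report["hints"])
```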