
Single-Cell RNA-seq Data Analysis Beyond H5ad

Single-cell Data Storage: Currently Used Formats

By Elucidata. Published about a year ago. 3 min read.

Single-cell RNA sequencing (scRNA-seq) technology has come a long way since its first successful implementation in 2009. It has provided unprecedented insights into biological processes and improved our understanding at the cellular level. Over the last decade, a range of scRNA-seq technologies has been developed, revolutionizing every step of the workflow: sample collection, single-cell capture, barcoded reverse transcription, cDNA amplification, library preparation, sequencing, and streamlined bioinformatics analysis.

As more scRNA-seq methods have emerged, the volume of data generated has grown in proportion. However, the analysis tools and techniques for single-cell RNA sequencing data have not advanced at the same pace.

In this blog, we’ll take a look at some of the tools and formats that are available for single-cell RNA sequencing and single-cell omics data, why H5ad is a preferred format, and why the Python-Anndata-H5ad ecosystem is widely adopted.

Currently, the Gene Cluster Text (GCT) file format, or “.gct”, from the Broad Institute is one of the most common formats for storing processed gene expression data and metadata. However, GCT is not well suited to higher-dimensional data such as scRNA-seq. It is a plain-text format that writes out every value, so the sparse nature of single-cell matrices (most entries are zero) and the high sample count (the number of cells captured) make it unsuitable for single-cell omics data. To address this, research groups have proposed several on-disk storage formats for single-cell data. Some of the most commonly used formats are:

h5Seurat: Developed by Paul Hoffman of the Satija Lab to store a Seurat object on disk as a file that can be read back into R as an S4 object.

RDS: R's native serialization format, which can store any single R object. RDS is used to store Seurat and SingleCellExperiment objects in R.

Loom: An HDF5-based file format with I/O support in both R and Python. It can also be read as an S4 object in R.

H5ad: An HDF5-based file format developed by the Theis Lab, with extensive support in Python.
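To make the sparsity argument concrete, here is a minimal, standard-library-only sketch that compares how many values a dense layout and a compressed sparse row (CSR) layout have to store for the same matrix. The matrix size, the 5% density, and the helper name `to_csr` are assumptions chosen for illustration; real pipelines would typically rely on a library such as `scipy.sparse` rather than hand-rolled lists.

```python
import random


def to_csr(dense):
    """Convert a list-of-lists matrix into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # the non-zero value itself
                indices.append(col)   # its column position
        indptr.append(len(data))      # where the next row starts in `data`
    return data, indices, indptr


random.seed(0)
n_cells, n_genes = 200, 500
density = 0.05  # assumed fraction of non-zero counts, for illustration only

# Toy count matrix: most entries are zero, as in typical scRNA-seq data.
dense = [
    [random.randint(1, 20) if random.random() < density else 0
     for _ in range(n_genes)]
    for _ in range(n_cells)
]

data, indices, indptr = to_csr(dense)

dense_values = n_cells * n_genes
sparse_values = len(data) + len(indices) + len(indptr)
print(f"dense layout stores {dense_values} values, CSR stores {sparse_values}")
```

A dense text format pays for every zero, while the CSR layout only pays for the non-zero entries plus one pointer per row, which is why sparse-aware binary formats scale better as the number of cells grows.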

The format that comes closest to wide adoption as a persistent on-disk storage format is H5ad. H5ad is built on HDF5, the Hierarchical Data Format (commonly seen with the “.h5” extension), which is designed to store large amounts of data as multidimensional arrays and is used primarily for scientific data that must be well organized for quick retrieval and analysis. A host of interactive Python tools can process, analyze, and visualize data in the H5ad format, including Scanpy, MUON, and STREAM, and this plays a major role in the wide adoption of the format. Further, once data is stored in the H5ad format, it is consumed downstream through the anndata data structure and the Python-Anndata-H5ad ecosystem. Now, we’ll look at the anndata data structure and the Python-Anndata-H5ad ecosystem one by one.

Why Anndata?

Anndata (a Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray) is a popular data structure with good community adoption. At the time of writing, anndata has about 2M downloads in total and 51K downloads per month, 345 GitHub stars, and 1K dependent repositories.

There are multiple tools for analysis and visualization in Python that rely on the anndata structure:

CellOracle: infers Gene Regulatory Networks (GRNs) and performs in silico gene perturbations to simulate cell fate changes.

STREAM: trajectory analysis for scRNA-seq data.

MUON: a multi-modal omics analysis toolkit built on anndata.

CellxGene: a state-of-the-art visualization application by CZI for scRNA-seq data.

Squidpy: analysis and visualization of spatial omics data.

This is where the Python-Anndata-H5ad ecosystem comes into the picture: it supports an extensive set of tools that can process data stored in the H5ad format downstream.



