async-hdf5 (Python)

Warning — Experimental. This library is under active development and not ready for production use. The API may change without notice. Known limitations:

  • Metadata only — does not decompress or decode array data; designed for serving HDF5 via Zarr's data protocol.
  • Incomplete HDF5 coverage — object header v0, HDF5 Time datatype (class 2), virtual dataset layout, and external data links are not supported. Some compound/array dtype edge cases produce incorrect numpy dtype mappings.
  • Limited testing on real-world files — validated against the HDF5 library test suite (59% pass rate), GDAL autotest files, and a small set of NASA/NOAA data. Many exotic HDF5 features remain untested.
  • No fuzz testing — the binary parser has not been fuzz-tested against adversarial inputs. While known panics have been fixed, corrupt files may trigger unexpected errors.
  • Sparse array performance — fixed array chunk indexing reads the entire dense index into memory, which can be expensive for very large, mostly-empty datasets.
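To get a feel for the sparse-array limitation above, the size of a dense chunk index can be estimated from the dataset shape and chunk shape alone. The sketch below is only back-of-the-envelope arithmetic: the dataset dimensions are hypothetical, and the 16 bytes per index entry is an assumed figure for illustration, not a value taken from async-hdf5 or the HDF5 specification.

```python
from math import ceil, prod


def chunk_grid_entries(shape, chunk_shape):
    """Number of slots in a dense chunk index: one per chunk-grid cell."""
    return prod(ceil(s / c) for s, c in zip(shape, chunk_shape))


# Hypothetical dataset: 1,000,000 x 1,000,000 elements in 100 x 100 chunks.
entries = chunk_grid_entries((1_000_000, 1_000_000), (100, 100))

# ASSUMPTION: 16 bytes per index entry, chosen only for illustration.
ASSUMED_BYTES_PER_ENTRY = 16
index_bytes = entries * ASSUMED_BYTES_PER_ENTRY

print(f"{entries:,} entries, about {index_bytes / 1e9:.1f} GB of index")
```

Under these assumptions the index alone has 100 million slots, regardless of how many chunks actually contain data.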

Python bindings for the async-hdf5 Rust crate. Read HDF5 file metadata asynchronously from local disk or cloud storage (S3, GCS, Azure) without libhdf5.

Install

pip install async-hdf5

Requires Python 3.11+.

Usage

Opening an HDF5 file

Any object implementing the obspec GetRangeAsync and GetRangesAsync protocols works as a store. This includes all obstore backends (S3, GCS, Azure, local, HTTP) and the stores compiled into async_hdf5 (available from async_hdf5.store).
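As an illustration of what such a store might look like, here is a minimal in-memory sketch. The method names and keyword arguments (get_range_async with start and end or length, get_ranges_async with starts and ends or lengths) reflect my understanding of the obspec protocols and should be checked against the obspec documentation; the InMemoryStore class and the "demo.bin" key are hypothetical.

```python
import asyncio


class InMemoryStore:
    """Toy store that serves byte ranges from a dict of path -> bytes.

    ASSUMPTION: the method names and keyword arguments mirror the obspec
    GetRangeAsync / GetRangesAsync protocols; verify against obspec itself.
    """

    def __init__(self, objects):
        self._objects = objects

    async def get_range_async(self, path, *, start, end=None, length=None):
        # Exactly one of end (treated as exclusive) or length must be given.
        if (end is None) == (length is None):
            raise ValueError("pass exactly one of end or length")
        stop = end if end is not None else start + length
        return self._objects[path][start:stop]

    async def get_ranges_async(self, path, *, starts, ends=None, lengths=None):
        if (ends is None) == (lengths is None):
            raise ValueError("pass exactly one of ends or lengths")
        if ends is None:
            ends = [s + n for s, n in zip(starts, lengths)]
        data = self._objects[path]
        return [data[s:e] for s, e in zip(starts, ends)]


demo_store = InMemoryStore({"demo.bin": bytes(range(16))})
first = asyncio.run(demo_store.get_range_async("demo.bin", start=4, length=4))
many = asyncio.run(
    demo_store.get_ranges_async("demo.bin", starts=[0, 8], ends=[2, 10])
)
print(first, many)
```

A real store would issue ranged reads against a file or object storage instead of slicing an in-memory buffer.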

import asyncio

from async_hdf5.store import LocalStore
from async_hdf5 import HDF5File

store = LocalStore()
filepath = "path/to/sample.h5"  # placeholder: location of the HDF5 file to inspect


async def inspect():
    file = await HDF5File.open(filepath, store=store)
    root = await file.root_group()

    # Group attributes
    attrs = await root.attributes()
    print(f"Title: {attrs['title']}")
    print(f"Children: {await root.children()}")

    # Dataset metadata
    ds = await root.dataset("temperature")
    print(f"\nShape: {ds.shape}")
    print(f"Dtype: {ds.numpy_dtype}")
    print(f"Chunk shape: {ds.chunk_shape}")
    print(f"Filters: {ds.filters}")

    # Chunk index — maps grid coordinates to byte ranges
    chunk_index = await ds.chunk_index()
    print(f"\nGrid shape: {chunk_index.grid_shape}")
    print(f"Number of chunks: {len(chunk_index)}")
    for loc in chunk_index:
        print(
            f"  Chunk {loc.indices}: offset={loc.byte_offset}, length={loc.byte_length}"
        )


asyncio.run(inspect())

Output:

Title: Sample temperature dataset
Children: ['lat', 'lon', 'temperature', 'time']

Shape: [12, 18, 36]
Dtype: <f4
Chunk shape: [4, 18, 36]
Filters: [{'id': 1, 'name': 'deflate', 'client_data': [4]}]

Grid shape: [3, 1, 1]
Number of chunks: 3
  Chunk [0, 0, 0]: offset=11872, length=8383
  Chunk [1, 0, 0]: offset=20255, length=8382
  Chunk [2, 0, 0]: offset=28637, length=8371
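Because the library reports only metadata, turning a chunk location into values is left to the consumer. The sketch below shows one way this could work for the dataset above: it derives the grid shape from the shape and chunk shape, then slices a byte range and reverses the deflate filter with zlib. The decode_deflate_chunk helper and the fake_file buffer are hypothetical stand-ins; a real reader would fetch the range from the store, and would also need to handle any other filters listed in ds.filters.

```python
import struct
import zlib
from math import ceil, prod

shape = [12, 18, 36]
chunk_shape = [4, 18, 36]

# One grid cell per chunk along each axis, rounding up for partial chunks.
grid_shape = [ceil(s / c) for s, c in zip(shape, chunk_shape)]
print(grid_shape)  # [3, 1, 1]


def decode_deflate_chunk(file_bytes, byte_offset, byte_length, chunk_shape):
    """Slice one chunk out of the file bytes and undo the deflate filter.

    Assumes little-endian float32 values (numpy dtype '<f4') and a single
    deflate filter, matching the metadata printed above.
    """
    raw = file_bytes[byte_offset : byte_offset + byte_length]
    payload = zlib.decompress(raw)
    count = prod(chunk_shape)
    return struct.unpack(f"<{count}f", payload)


# Stand-in for a real file: 32 bytes of padding, then one compressed chunk.
values = [float(i) for i in range(prod(chunk_shape))]
compressed = zlib.compress(struct.pack(f"<{len(values)}f", *values), 4)
fake_file = b"\x00" * 32 + compressed

decoded = decode_deflate_chunk(fake_file, 32, len(compressed), chunk_shape)
print(len(decoded), decoded[:3])
```

In practice this decoding is what a Zarr codec pipeline does once it has been given the byte range for a chunk.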

xarray backend

Open any HDF5 file as an xarray Dataset using the async_hdf5 engine:

import xarray as xr

ds = xr.open_dataset(filepath, engine="async_hdf5")

Note: the sample file's time variable uses the units 'months since 2020-01-01', which xarray cannot decode with the default calendar unless cftime is installed. If this call raises a ValueError about time units, install cftime or open the dataset with decode_times=False, as the xarray error message suggests.

For cloud storage, pass an ObjectStore:

import xarray as xr
from async_hdf5.store import S3Store

s3 = S3Store(bucket="noaa-goes16", region="us-east-1", skip_signature=True)
ds = xr.open_dataset(
    "ABI-L2-MCMIPF/2024/099/18/OR_ABI-L2-MCMIPF-M6_G16_s20240991800204_e20240991809524_c20240991810005.nc",
    engine="async_hdf5",
    store=s3,
)

Zarr store

Under the hood, the xarray backend uses open_hdf5, which returns an HDF5Store: a read-only Zarr v3 store backed by async-hdf5. You can also use it directly:

# zarr_store is the HDF5Store returned by open_hdf5
ds = xr.open_dataset(zarr_store, engine="zarr", consolidated=False, zarr_format=3)

VirtualiZarr integration

async_hdf5.virtualizarr.open_virtual_hdf5 returns a ManifestStore containing virtual chunk references. No array data is read, only metadata and byte offsets:

from async_hdf5.virtualizarr import open_virtual_hdf5

manifest_store = asyncio.run(
    open_virtual_hdf5(filepath, store=store, url=f"file://{filepath}")
)
vds = manifest_store.to_virtual_dataset()
print(vds)

Output:

<xarray.Dataset> Size: 32kB
Dimensions:      (phony_dim_0: 18, phony_dim_1: 36, phony_dim_2: 12)
Dimensions without coordinates: phony_dim_0, phony_dim_1, phony_dim_2
Data variables:
    lat          (phony_dim_0) float64 144B ManifestArray<shape=(18,), dtype=...
    lon          (phony_dim_1) float64 288B ManifestArray<shape=(36,), dtype=...
    temperature  (phony_dim_2, phony_dim_0, phony_dim_1) float32 31kB Manifes...
    time         (phony_dim_2) float64 96B ManifestArray<shape=(12,), dtype=f...
Attributes:
    title:        Sample temperature dataset
    Conventions:  CF-1.8

Development

The Python package is built with maturin from the python/ directory. uv manages the Python environment and dependencies.

cd python
uv sync

Develop mode (debug, fast compile)

Compiles the Rust extension in debug mode and installs it into the virtualenv. Fast iteration for development — no optimizations, includes debug symbols.

uv run maturin develop

Release mode (optimized)

Compiles with --release (LTO + single codegen unit, as configured in Cargo.toml). Use this to benchmark or test production-like performance.

uv run maturin develop --release

Running tests

uv run pytest

Building a wheel

uv run maturin build --release

The wheel is written to target/wheels/.

License

Apache-2.0