Spatial
Here, you’ll learn how to manage spatial datasets.
Spatial omics data integrates molecular profiling (e.g., transcriptomics, proteomics) with spatial information, preserving the spatial organization of cells and tissues. It enables high-resolution mapping of molecular activity within biological contexts, crucial for understanding cellular interactions and microenvironments.
Many spatial technologies exist, including multiplexed imaging, spatial transcriptomics, spatial proteomics, whole-slide imaging, spatial metabolomics, and 3D tissue reconstruction. Data from all of them can be stored in the SpatialData framework. For more details, see the original publication:
Marconato, L., Palla, G., Yamauchi, K.A. et al. SpatialData: an open and universal data framework for spatial omics. Nat Methods 22, 58–62 (2025). https://doi.org/10.1038/s41592-024-02212-x
Note
A collection of curated spatial datasets in SpatialData format is available on the scverse/spatialdata-db instance.
spatial data vs SpatialData terminology
When we mention spatial data, we refer to data from spatial assays, such as spatial transcriptomics or proteomics, that includes spatial coordinates to represent the organization of molecular features in tissue. When we refer to SpatialData, we mean spatial omics data stored in the scverse SpatialData framework.
# pip install 'lamindb[jupyter,bionty]' spatialdata spatialdata-plot
!lamin init --storage ./test-spatial --modules bionty
import lamindb as ln
import bionty as bt
import spatialdata as sd
import warnings
warnings.filterwarnings("ignore")
spatial_guide_datasets = ln.Project(name="spatial guide datasets").save()
ln.track(project=spatial_guide_datasets)
Creating artifacts
You can use the from_spatialdata() method to create an Artifact object from a SpatialData object.
example_blobs_sdata = ln.core.datasets.spatialdata_blobs()
example_blobs_sdata
blobs_af = ln.Artifact.from_spatialdata(
    example_blobs_sdata, key="example_blobs.zarr"
).save()
blobs_af
To retrieve the object from the database, you can, e.g., query by key.
blobs_af = ln.Artifact.get(key="example_blobs.zarr")  # returns the Artifact record, not the SpatialData object
local_zarr_path = blobs_af.cache()  # returns a local path to the cached .zarr store
example_blobs_sdata = (
    blobs_af.load()  # calls sd.read_zarr() on a locally cached .zarr store
)
To view the data lineage of the artifact:
blobs_af.view_lineage()
Curating artifacts
For the remainder of the guide, we will work with two 10x Xenium datasets and one 10x Visium H&E image dataset that were previously ingested in raw form.
Metadata is stored in two places in the SpatialData object:
Dataset-level metadata is stored in sdata.attrs["sample"].
Measurement-specific metadata is stored in the associated tables in sdata.tables.
Define a schema
We define a lamindb.Schema to curate both sample and table metadata.
Curating different spatial technologies
Reading data from different spatial technologies into SpatialData objects can yield objects with very different structure and metadata. Therefore, it can be useful to define technology-specific schemas by reusing schema components.
# define features
ln.Feature(name="organism", dtype=bt.Organism).save()
ln.Feature(name="assay", dtype=bt.ExperimentalFactor).save()
ln.Feature(name="disease", dtype=bt.Disease).save()
ln.Feature(name="tissue", dtype=bt.Tissue).save()
ln.Feature(name="celltype_major", dtype=bt.CellType, nullable=True).save()
# define simple schemas
flexible_metadata_schema = ln.Schema(
    name="Flexible metadata", itype=ln.Feature, coerce_dtype=True
).save()
ensembl_gene_ids = ln.Schema(
    name="Spatial var level (Ensembl gene id)", itype=bt.Gene.ensembl_gene_id
).save()
# define composite schema
spatial_schema = ln.Schema(
    name="Spatialdata schema (flexible)",
    otype="SpatialData",
    slots={
        "attrs:sample": flexible_metadata_schema,
        "tables:table:obs": flexible_metadata_schema,
        "tables:table:var.T": ensembl_gene_ids,
    },
).save()
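The slot keys in the composite schema follow a colon-separated pattern that appears to mirror where each component lives in the SpatialData object. The sketch below is purely illustrative: `describe_slot` is a hypothetical helper that is not part of lamindb, and the attribute paths it produces reflect our reading of the keys used above, not lamindb's internal slot resolution.

```python
def describe_slot(slot_key: str, obj_name: str = "sdata") -> str:
    """Translate a slot key into the attribute path it appears to address.

    Hypothetical helper for illustration only; lamindb resolves slots internally.
    """
    container, *rest = slot_key.split(":")
    if container == "attrs":
        # e.g. "attrs:sample" is read as sdata.attrs["sample"]
        if len(rest) != 1:
            raise ValueError(f"Expected 'attrs:<key>', got {slot_key!r}")
        return f'{obj_name}.attrs["{rest[0]}"]'
    if container == "tables":
        # e.g. "tables:table:obs" is read as sdata.tables["table"].obs
        if len(rest) != 2:
            raise ValueError(f"Expected 'tables:<table>:<accessor>', got {slot_key!r}")
        table_key, accessor = rest
        return f'{obj_name}.tables["{table_key}"].{accessor}'
    raise ValueError(f"Unrecognized slot key: {slot_key!r}")


# The three slot keys used in the composite schema above
slot_keys = ["attrs:sample", "tables:table:obs", "tables:table:var.T"]
slot_paths = {key: describe_slot(key) for key in slot_keys}
for key, path in slot_paths.items():
    print(f"{key:<20} -> {path}")
```

Under this reading, a technology-specific schema could point a slot at a different table key or swap in a different component schema while still reusing shared pieces such as flexible_metadata_schema.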
Curate a Xenium dataset
# load first of two cropped Xenium datasets
xenium_aligned_1_sdata = (
    ln.Artifact.using("laminlabs/lamindata")
    .get(key="xenium_aligned_1_guide_min.zarr")
    .load()
)
xenium_aligned_1_sdata
xenium_curator = ln.curators.SpatialDataCurator(xenium_aligned_1_sdata, spatial_schema)
# validate against the schema; print the error instead of raising
try:
    xenium_curator.validate()
except ln.errors.ValidationError as error:
    print(error)
# harmonize the dataset-specific cell type labels
xenium_aligned_1_sdata.tables["table"].obs["celltype_major"] = (
    xenium_aligned_1_sdata.tables["table"]
    .obs["celltype_major"]
    .replace(
        {
            "CAFs": "cancer associated fibroblast",
            "Endothelial": "endothelial cell",
            "Myeloid": "myeloid cell",
            "PVL": "perivascular cell",
            "T-cells": "T cell",
            "B-cells": "B cell",
            "Normal Epithelial": "epithelial cell",
            "Plasmablasts": "plasmablast",
            "Cancer Epithelial": "neoplastic epithelial cell",
        }
    )
)
# validate again after relabeling
try:
    xenium_curator.validate()
except ln.errors.ValidationError as error:
    print(error)
xenium_curator.slots["tables:table:obs"].cat.add_new_from("celltype_major")
xenium_1_curated_af = xenium_curator.save_artifact(key="xenium1.zarr")
xenium_1_curated_af.describe()
Curate additional Xenium datasets
We can reuse the same schema for a second Xenium dataset by passing it to from_spatialdata():
xenium_aligned_2_sdata = (
    ln.Artifact.using("laminlabs/lamindata")
    .get(key="xenium_aligned_2_guide_min.zarr")
    .load()
)
xenium_aligned_2_sdata.tables["table"].obs["celltype_major"] = (
    xenium_aligned_2_sdata.tables["table"]
    .obs["celltype_major"]
    .replace(
        {
            "CAFs": "cancer associated fibroblast",
            "Endothelial": "endothelial cell",
            "Myeloid": "myeloid cell",
            "PVL": "perivascular cell",
            "T-cells": "T cell",
            "B-cells": "B cell",
            "Normal Epithelial": "epithelial cell",
            "Plasmablasts": "plasmablast",
            "Cancer Epithelial": "neoplastic epithelial cell",
        }
    )
)
xenium_2_curated_af = ln.Artifact.from_spatialdata(
    xenium_aligned_2_sdata, key="xenium2.zarr", schema=spatial_schema
).save()
xenium_2_curated_af.describe()
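Because the same label mapping is applied to both Xenium datasets, it could be defined once and reused. The following is a minimal sketch using only the standard library; `CELLTYPE_MAP` and `harmonize_labels` are hypothetical names introduced here, and the `raw_labels` list is made-up example input. In the guide itself, the equivalent step is the pandas `.replace()` call on the `celltype_major` column. The helper additionally reports labels that the mapping does not cover, which may be useful to check before running validation.

```python
# Mapping copied from the relabeling step in this guide
CELLTYPE_MAP = {
    "CAFs": "cancer associated fibroblast",
    "Endothelial": "endothelial cell",
    "Myeloid": "myeloid cell",
    "PVL": "perivascular cell",
    "T-cells": "T cell",
    "B-cells": "B cell",
    "Normal Epithelial": "epithelial cell",
    "Plasmablasts": "plasmablast",
    "Cancer Epithelial": "neoplastic epithelial cell",
}


def harmonize_labels(labels, mapping):
    """Apply the mapping and report labels it does not cover.

    Returns a tuple: (harmonized labels, sorted list of uncovered labels).
    Labels that already equal a target name are left as-is and not reported.
    """
    known_targets = set(mapping.values())
    harmonized = [mapping.get(label, label) for label in labels]
    unmapped = sorted(
        {
            label
            for label in labels
            if label not in mapping and label not in known_targets
        }
    )
    return harmonized, unmapped


# Made-up input: "Stroma" is a hypothetical label absent from the mapping
raw_labels = ["CAFs", "T-cells", "B-cells", "T cell", "Stroma"]
harmonized, unmapped = harmonize_labels(raw_labels, CELLTYPE_MAP)
print(harmonized)
print(unmapped)
```

With a pandas column, the same dictionary could be passed to `.replace(CELLTYPE_MAP)` for each dataset, which avoids maintaining two copies of the mapping.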
Curate Visium datasets
Analogously, we can curate Visium datasets, here by reusing the same schema:
visium_aligned_sdata = (
    ln.Artifact.using("laminlabs/lamindata")
    .get(key="visium_aligned_guide_min.zarr")
    .load()
)
visium_aligned_sdata
visium_curated_af = ln.Artifact.from_spatialdata(
    visium_aligned_sdata, key="visium.zarr", schema=spatial_schema
).save()
visium_curated_af.describe()
Overview of the curated datasets
visium_curated_af.view_lineage()
ln.Artifact.df(features=True, include=["hash", "size"])
→ queried for all categorical features with dtype ULabel or Record and non-categorical features: (0) []
id | uid | key | size | hash
---|---|---|---|---
7 | dR7fFij5tl8ajV4u0000 | visium.zarr | 5809805 | FasYDXY1qepH8fK5_Pp0gw
5 | QO3EU6UpuVg0kvxS0000 | xenium2.zarr | 40822700 | ZQ98-PdE0MWmJ5Rjvl1ygA
3 | gJlQTT3vhLenwgPc0000 | xenium1.zarr | 35115549 | YXxhS5L_8LXdNXQfOSHTuQ
1 | QVmhbqLwqpnEweZI0000 | example_blobs.zarr | 12121751 | lh7PfKS4VT1miuyr7gw6hw
4 | KFhRNPqcdoxBCNZt0001 | xenium_aligned_2_guide_min.zarr | 40822308 | oH569Lh4koYRB1I6AatnGQ
2 | kVMuYil81BHTwQ9G0001 | xenium_aligned_1_guide_min.zarr | 35115305 | 8f1qC6IkpSvFw2H8TdhplQ
6 | bjH534dxVi1drmLZ0001 | visium_aligned_guide_min.zarr | 5809684 | a8rVkf_kjp9To9KI06i03g
ln.finish()