Ethnicity

lamindb provides access to the following public Ethnicity ontologies through bionty:

Human Ancestry Ontology

Here we show how to access and search Ethnicity ontologies to standardize new data.

import bionty as bt
import pandas as pd

PublicOntology objects

Let us create a public ontology accessor with the .public() method, which chooses a default public ontology source from Source. The result is a PublicOntology object, which you can think of as a public registry:

ethnicitys = bt.Ethnicity.public(organism="human")
ethnicitys

→ connected lamindb: testuser1/test-public-ontologies

PublicOntology
Entity: Ethnicity
Organism: human
Source: hancestro, 3.0
#terms: 342

As with registries, you can export the ontology as a DataFrame:

df = ethnicitys.df()
df.head()

	name	definition	synonyms	parents
ontology_id
HANCESTRO:0002	region	Any Geographic Area Greater Than An Individual...	geographical area	[]
HANCESTRO:0003	country	A Collective Generic Term That Refers Here To ...	None	[]
HANCESTRO:0004	ancestry category	Population Category Defined Using Ancestry Inf...	ancestral group	[]
HANCESTRO:0005	European	Includes Individuals Who Either Self-Report Or...	Caucasian|white	[HANCESTRO:0004]
HANCESTRO:0006	South Asian	Includes Individuals Who Either Self-Report Or...	None	[HANCESTRO:0008]
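
The exported table stores synonyms as a single pipe-separated string and parents as a list of ontology IDs. As a minimal sketch of how these two columns could be consumed, the snippet below uses plain Python on three rows copied from the output above. The helpers `split_synonyms` and `ancestors` are hypothetical names introduced for illustration and are not part of bionty.

```python
# Rows copied from the df.head() output: ontology_id -> (name, synonyms, parents)
rows = {
    "HANCESTRO:0004": ("ancestry category", "ancestral group", []),
    "HANCESTRO:0005": ("European", "Caucasian|white", ["HANCESTRO:0004"]),
    "HANCESTRO:0006": ("South Asian", None, ["HANCESTRO:0008"]),
}


def split_synonyms(synonyms):
    """Hypothetical helper: turn a pipe-separated synonym string into a list."""
    return synonyms.split("|") if synonyms else []


def ancestors(ontology_id, table):
    """Hypothetical helper: collect all parents reachable from a term.

    Parents missing from `table` are reported but not expanded further.
    """
    seen = []
    stack = list(table.get(ontology_id, (None, None, []))[2])
    while stack:
        parent = stack.pop()
        if parent in seen:
            continue
        seen.append(parent)
        stack.extend(table.get(parent, (None, None, []))[2])
    return seen


print(split_synonyms(rows["HANCESTRO:0005"][1]))
print(ancestors("HANCESTRO:0005", rows))
```

Because only the first five rows are shown on this page, a parent such as HANCESTRO:0008 is returned as an ID without being expanded.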

Unlike registries, you can also export it as a Pronto object via public.ontology.

Look up terms

As with registries, terms can be looked up with auto-complete:

lookup = ethnicitys.lookup()

The . accessor provides normalized terms (lowercase, containing only alphanumeric characters and underscores):

lookup.american

Ethnicity(ontology_id='HANCESTRO:0463', name='American', definition=None, synonyms=None, parents=array(['HANCESTRO:0566'], dtype=object))

To look up the exact original strings, convert the lookup object to a dict and use the [] accessor:

lookup_dict = lookup.dict()
lookup_dict["American"]

Ethnicity(ontology_id='HANCESTRO:0463', name='American', definition=None, synonyms=None, parents=array(['HANCESTRO:0566'], dtype=object))

By default, the name field is used to generate lookup keys. You can specify another field to look up:

lookup = ethnicitys.lookup(ethnicitys.ontology_id)

lookup.hancestro_0463

Ethnicity(ontology_id='HANCESTRO:0463', name='American', definition=None, synonyms=None, parents=array(['HANCESTRO:0566'], dtype=object))
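
To make the key normalization concrete, here is a minimal sketch that approximates the rule described above (lowercase, with anything other than letters and digits turned into underscores). The function `normalize_key` is a hypothetical helper written for this illustration; the actual implementation in lamindb may handle edge cases such as duplicate keys or leading digits differently.

```python
import re


def normalize_key(term: str) -> str:
    """Hypothetical approximation of how lookup keys are derived from a field value."""
    # Collapse each run of non-alphanumeric characters into one underscore
    key = re.sub(r"[^0-9a-zA-Z]+", "_", term.strip())
    # Drop leading/trailing underscores and lowercase the result
    return key.strip("_").lower()


print(normalize_key("American"))
print(normalize_key("HANCESTRO:0463"))
```

Under this assumption, "American" maps to the key used in `lookup.american` and "HANCESTRO:0463" maps to the key used in `lookup.hancestro_0463`.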

Search terms

Search behaves in the same way as it does for registries:

ethnicitys.search("American").head(3)

	name	definition	synonyms	parents
ontology_id
HANCESTRO:0463	American	None	None	[HANCESTRO:0566]
HANCESTRO:0013	Native American	Includes Indigenous Individuals Of North, Cent...	American Indian	[HANCESTRO:0004]
HANCESTRO:0016	African American or Afro-Caribbean	Includes Individuals Who Either Self-Report Or...	None	[HANCESTRO:0004]

By default, search also covers synonyms and all other fields containing strings:

ethnicitys.search("Caucasian").head(3)

	name	definition	synonyms	parents
ontology_id
HANCESTRO:0005	European	Includes Individuals Who Either Self-Report Or...	Caucasian|white	[HANCESTRO:0004]

To restrict the search to a specific field, pass the field argument (by default, all fields containing strings are searched):

ethnicitys.search(
    "General characterisation of the Ancestry of a population",
    field=ethnicitys.definition,
).head()

	name	definition	synonyms	parents
ontology_id
HANCESTRO:0304	ancestry status	General Characterisation Of The Ancestry Of A ...	None	[]
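
The difference between the default search and a field-restricted search can be illustrated with a deliberately naive sketch. It does a case-insensitive substring match over a few records copied from the outputs above. This is not the ranking algorithm that lamindb uses, and `naive_search` is a hypothetical name; the sketch only shows why "Caucasian" finds the term named "European" when synonyms are included.

```python
# Records copied from the search outputs on this page (definitions omitted)
records = [
    {"ontology_id": "HANCESTRO:0005", "name": "European", "synonyms": "Caucasian|white"},
    {"ontology_id": "HANCESTRO:0463", "name": "American", "synonyms": None},
    {"ontology_id": "HANCESTRO:0013", "name": "Native American", "synonyms": "American Indian"},
]


def naive_search(query, records, field=None):
    """Hypothetical helper: return IDs whose selected string fields contain the query."""
    # With no field given, look at every string field we know about
    fields = [field] if field else ["name", "synonyms"]
    needle = query.lower()
    hits = []
    for rec in records:
        values = (rec.get(f) for f in fields)
        if any(isinstance(v, str) and needle in v.lower() for v in values):
            hits.append(rec["ontology_id"])
    return hits


print(naive_search("Caucasian", records))                # matches via synonyms
print(naive_search("Caucasian", records, field="name"))  # no match on name alone
```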

Standardize Ethnicity identifiers

Let us generate a DataFrame that stores a number of Ethnicity identifiers, some of which are corrupted:

df_orig = pd.DataFrame(
    index=[
        "Mende",
        "European",
        "South Asian",
        "Arab",
        "This ethnicity does not exist",
    ]
)
df_orig


Mende
European
South Asian
Arab
This ethnicity does not exist

We can check which of our values validate against the ontology reference:

validated = ethnicitys.validate(df_orig.index, ethnicitys.name)
df_orig.index[~validated]

! 1 unique term (20.00%) is not validated: 'This ethnicity does not exist'

Index(['This ethnicity does not exist'], dtype='object')
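
Conceptually, validation produces one boolean per input value, and negating that mask selects the values that need attention. The sketch below reproduces this logic in plain Python, assuming a reference set that contains only the four names confirmed as valid by the output above; the real reference is the full name column of the ontology.

```python
values = [
    "Mende",
    "European",
    "South Asian",
    "Arab",
    "This ethnicity does not exist",
]

# Assumption: a stand-in for the ontology's name field, limited to the
# four names that the validate() output above shows as validated
reference_names = {"Mende", "European", "South Asian", "Arab"}

# One boolean per input value, in input order
validated = [value in reference_names for value in values]

# Equivalent of indexing with the negated mask
not_validated = [value for value, ok in zip(values, validated) if not ok]

# Share of unique terms that failed validation
pct = 100 * len(set(not_validated)) / len(set(values))
print(f"{len(set(not_validated))} unique term ({pct:.2f}%) is not validated: {not_validated}")
```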

Ontology source versions

For any given entity, we can choose from a number of versions:

bt.Source.filter(entity="bionty.Ethnicity").df()

	uid	entity	organism	name	in_db	currently_used	description	url	md5	source_website	space_id	dataframe_artifact_id	version	run_id	created_at	created_by_id	_aux	branch_id
id
32	MJRqduf9	bionty.Ethnicity	human	hancestro	False	True	Human Ancestry Ontology	http://purl.obolibrary.org/obo/hancestro/relea...	None	https://github.com/EBISPOT/hancestro	1	None	3.0	None	2025-07-14 06:41:44.843000+00:00	1	None	1

# only lists the sources that are currently used
bt.Source.filter(entity="bionty.Ethnicity", currently_used=True).df()

	uid	entity	organism	name	in_db	currently_used	description	url	md5	source_website	space_id	dataframe_artifact_id	version	run_id	created_at	created_by_id	_aux	branch_id
id
32	MJRqduf9	bionty.Ethnicity	human	hancestro	False	True	Human Ancestry Ontology	http://purl.obolibrary.org/obo/hancestro/relea...	None	https://github.com/EBISPOT/hancestro	1	None	3.0	None	2025-07-14 06:41:44.843000+00:00	1	None	1

When instantiating a Bionty object, we can choose a source or version:

source = bt.Source.filter(
    name="hancestro", organism="human"
).first()
ethnicitys = bt.Ethnicity.public(source=source)
ethnicitys

PublicOntology
Entity: Ethnicity
Organism: human
Source: hancestro, 3.0
#terms: 342
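
The filter-then-first pattern used above can be pictured as narrowing a table of source records by field equality and taking the first match, if any. The following is a minimal stand-alone sketch of that idea. `SourceRow`, `filter_sources`, and `first` are hypothetical stand-ins rather than the lamindb API, and the three rows are copied from the Source tables shown on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRow:
    """Hypothetical, reduced stand-in for one row of the Source registry."""
    uid: str
    entity: str
    organism: str
    name: str
    version: str
    currently_used: bool


SOURCES = [
    SourceRow("IGIkseWQ", "bionty.Disease", "all", "mondo", "2025-06-03", True),
    SourceRow("4kswnHVF", "bionty.Disease", "human", "doid", "2024-05-29", True),
    SourceRow("MJRqduf9", "bionty.Ethnicity", "human", "hancestro", "3.0", True),
]


def filter_sources(rows, **conditions):
    """Keep rows whose attributes equal every given keyword condition."""
    return [r for r in rows if all(getattr(r, k) == v for k, v in conditions.items())]


def first(rows):
    """Return the first row, or None when nothing matched."""
    return rows[0] if rows else None


source = first(filter_sources(SOURCES, name="hancestro", organism="human"))
print(source.entity, source.version)
```

Returning None for an empty result makes it worth checking the filter result before passing it on as a source.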

The currently used ontologies can be displayed using:

bt.Source.filter(currently_used=True).df()

	uid	entity	organism	name	in_db	currently_used	description	url	md5	source_website	space_id	dataframe_artifact_id	version	run_id	created_at	created_by_id	_aux	branch_id
id
1	33TUF039	bionty.Organism	vertebrates	ensembl	False	True	Ensembl	https://ftp.ensembl.org/pub/release-112/specie...	None	https://www.ensembl.org	1	None	release-112	None	2025-07-14 06:41:44.843000+00:00	1	None	1
2	6bbVUTCS	bionty.Organism	bacteria	ensembl	False	True	Ensembl	https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacte...	None	https://www.ensembl.org	1	None	release-57	None	2025-07-14 06:41:44.843000+00:00	1	None	1
3	6s9nV6xh	bionty.Organism	fungi	ensembl	False	True	Ensembl	https://ftp.ensemblgenomes.ebi.ac.uk/pub/fungi...	None	https://www.ensembl.org	1	None	release-57	None	2025-07-14 06:41:44.843000+00:00	1	None	1
4	2PmTrc8x	bionty.Organism	metazoa	ensembl	False	True	Ensembl	https://ftp.ensemblgenomes.ebi.ac.uk/pub/metaz...	None	https://www.ensembl.org	1	None	release-57	None	2025-07-14 06:41:44.843000+00:00	1	None	1
5	7GPHh16S	bionty.Organism	plants	ensembl	False	True	Ensembl	https://ftp.ensemblgenomes.ebi.ac.uk/pub/plant...	None	https://www.ensembl.org	1	None	release-57	None	2025-07-14 06:41:44.843000+00:00	1	None	1
6	4tsksCMX	bionty.Organism	all	ncbitaxon	False	True	NCBItaxon Ontology	http://purl.obolibrary.org/obo/ncbitaxon/2023-...	None	https://github.com/obophenotype/ncbitaxon	1	None	2023-06-20	None	2025-07-14 06:41:44.843000+00:00	1	None	1
7	4UGNz3fr	bionty.Gene	human	ensembl	False	True	Ensembl	s3://bionty-assets/df_human__ensembl__release-...	None	https://www.ensembl.org	1	None	release-112	None	2025-07-14 06:41:44.843000+00:00	1	None	1
8	4r4fvV0S	bionty.Gene	mouse	ensembl	False	True	Ensembl	s3://bionty-assets/df_mouse__ensembl__release-...	None	https://www.ensembl.org	1	None	release-112	None	2025-07-14 06:41:44.843000+00:00	1	None	1
9	4RPA3Re0	bionty.Gene	saccharomyces cerevisiae	ensembl	False	True	Ensembl	s3://bionty-assets/df_saccharomyces cerevisiae...	None	https://www.ensembl.org	1	None	release-112	None	2025-07-14 06:41:44.843000+00:00	1	None	1
10	3EYyGRYN	bionty.Protein	human	uniprot	False	True	Uniprot	s3://bionty-assets/df_human__uniprot__2024-03_...	None	https://www.uniprot.org	1	None	2024-03	None	2025-07-14 06:41:44.843000+00:00	1	None	1
11	01RWXN2V	bionty.Protein	mouse	uniprot	False	True	Uniprot	s3://bionty-assets/df_mouse__uniprot__2024-03_...	None	https://www.uniprot.org	1	None	2024-03	None	2025-07-14 06:41:44.843000+00:00	1	None	1
12	3kDh8qAX	bionty.CellMarker	human	cellmarker	False	True	CellMarker	s3://bionty-assets/human_cellmarker_2.0_CellMa...	None	http://bio-bigdata.hrbmu.edu.cn/CellMarker	1	None	2.0	None	2025-07-14 06:41:44.843000+00:00	1	None	1
13	7bV5uJo3	bionty.CellMarker	mouse	cellmarker	False	True	CellMarker	s3://bionty-assets/mouse_cellmarker_2.0_CellMa...	None	http://bio-bigdata.hrbmu.edu.cn/CellMarker	1	None	2.0	None	2025-07-14 06:41:44.843000+00:00	1	None	1
14	6LyRtvz8	bionty.CellLine	all	clo	False	True	Cell Line Ontology	s3://bionty-assets/df_all__clo__2022-03-21__Ce...	None	https://bioportal.bioontology.org/ontologies/CLO	1	None	2022-03-21	None	2025-07-14 06:41:44.843000+00:00	1	None	1
16	3Uw2Va7a	bionty.CellType	all	cl	False	True	Cell Ontology	http://purl.obolibrary.org/obo/cl/releases/202...	None	https://obophenotype.github.io/cell-ontology	1	None	2024-08-16	None	2025-07-14 06:41:44.843000+00:00	1	None	1
17	MUtAGdL4	bionty.Tissue	all	uberon	False	True	Uberon multi-species anatomy ontology	http://purl.obolibrary.org/obo/uberon/releases...	None	http://obophenotype.github.io/uberon	1	None	2024-08-07	None	2025-07-14 06:41:44.843000+00:00	1	None	1
18	IGIkseWQ	bionty.Disease	all	mondo	False	True	Mondo Disease Ontology	http://purl.obolibrary.org/obo/mondo/releases/...	None	https://mondo.monarchinitiative.org	1	None	2025-06-03	None	2025-07-14 06:41:44.843000+00:00	1	None	1
19	4kswnHVF	bionty.Disease	human	doid	False	True	Human Disease Ontology	http://purl.obolibrary.org/obo/doid/releases/2...	None	https://disease-ontology.org	1	None	2024-05-29	None	2025-07-14 06:41:44.843000+00:00	1	None	1
21	2a1HvjdB	bionty.ExperimentalFactor	all	efo	False	True	The Experimental Factor Ontology	http://www.ebi.ac.uk/efo/releases/v3.70.0/efo.owl	None	https://bioportal.bioontology.org/ontologies/EFO	1	None	3.70.0	None	2025-07-14 06:41:44.843000+00:00	1	None	1
22	6S4qkDx1	bionty.Phenotype	all	pato	False	True	Phenotype And Trait Ontology	http://purl.obolibrary.org/obo/pato/releases/2...	None	https://github.com/pato-ontology/pato	1	None	2024-03-28	None	2025-07-14 06:41:44.843000+00:00	1	None	1
23	48fBFLmn	bionty.Phenotype	human	hp	False	True	Human Phenotype Ontology	https://github.com/obophenotype/human-phenotyp...	None	https://hpo.jax.org	1	None	2024-04-26	None	2025-07-14 06:41:44.843000+00:00	1	None	1
25	7Ent3V2y	bionty.Pathway	all	go	False	True	Gene Ontology	http://purl.obolibrary.org/obo/go/releases/202...	None	http://geneontology.org	1	None	2024-06-17	None	2025-07-14 06:41:44.843000+00:00	1	None	1
27	3rm9aOzL	BFXPipeline	all	lamin	False	True	Bioinformatics Pipeline	s3://bionty-assets/df_all__lamin__1.0.0__BFXpi...	None	https://lamin.ai	1	None	1.0.0	None	2025-07-14 06:41:44.843000+00:00	1	None	1
28	ugaIoIlj	Drug	all	dron	False	True	Drug Ontology	http://purl.obolibrary.org/obo/dron/releases/2...	None	https://bioportal.bioontology.org/ontologies/DRON	1	None	2024-08-05	None	2025-07-14 06:41:44.843000+00:00	1	None	1
30	1GbFkOdz	bionty.DevelopmentalStage	human	hsapdv	False	True	Human Developmental Stages	https://github.com/obophenotype/developmental-...	None	https://github.com/obophenotype/developmental-...	1	None	2024-05-28	None	2025-07-14 06:41:44.843000+00:00	1	None	1
31	10va5JSt	bionty.DevelopmentalStage	mouse	mmusdv	False	True	Mouse Developmental Stages	https://github.com/obophenotype/developmental-...	None	https://github.com/obophenotype/developmental-...	1	None	2024-05-28	None	2025-07-14 06:41:44.843000+00:00	1	None	1
32	MJRqduf9	bionty.Ethnicity	human	hancestro	False	True	Human Ancestry Ontology	http://purl.obolibrary.org/obo/hancestro/relea...	None	https://github.com/EBISPOT/hancestro	1	None	3.0	None	2025-07-14 06:41:44.843000+00:00	1	None	1
33	5JnVODh4	BioSample	all	ncbi	False	True	NCBI BioSample attributes	s3://bionty-assets/df_all__ncbi__2023-09__BioS...	None	https://www.ncbi.nlm.nih.gov/biosample/docs/at...	1	None	2023-09	None	2025-07-14 06:41:44.843000+00:00	1	None	1