Feature Article
Science Forum Wikidata as a knowledge graph for the life sciences
[Figure 1 image: class-level diagram of biomedical entity types in Wikidata. Entity types shown, with item counts: taxon (2,600,217), gene (1,176,028), protein (961,210), chemical compound (163,252), anatomical structure (120,184), protein family (27,431), disease (17,080), medication (3,869), biological pathway (2,994), pharmaceutical product (2,731), sequence variant (1,502), pharmacologic action (1,332), symptom (1,089), therapeutic use (803), mechanism of action (182), active site (132), binding site (77).]
Figure 1. A simplified class-level diagram of the Wikidata knowledge graph for biomedical entities. Each box represents one type of biomedical entity. The header displays the name of that entity type (e.g., pharmaceutical product) and the number of Wikidata items for that entity type. The lower portion of each box displays a partial listing of attributes about each entity type and the number of Wikidata items for each attribute. Edges between boxes represent the number of Wikidata statements corresponding to each combination of subject type, predicate, and object type. For example, there are 1,505 statements with ‘pharmaceutical product’ as the subject type, ‘therapeutic area’ as the predicate, and ‘disease’ as the object type. For clarity, edges for reciprocal relationships (e.g., ‘has part’ and ‘part of’) are combined into a single edge, and scientific articles (which are widely cited in statement references) have been omitted. All counts of Wikidata items are current as of September 2019. The most common data sources cited as references are available in Figure 1—source data 1. Data are generated using the code in https://github.com/SuLab/genewikiworld (archived at Mayers et al., 2020). A more complete version of this graph diagram can be found at https://commons.wikimedia.org/wiki/File:Biomedical_Knowledge_Graph_in_Wikidata.svg.

The online version of this article includes the following source data and figure supplement(s) for figure 1:
Source data 1. Most frequent data sources cited as references for the biomedical subset of the Wikidata knowledge graph shown in Figure 1.
Figure supplement 1. Trends in Wikidata edits.
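As a rough illustration of what the edge counts in Figure 1 represent, the sketch below tallies statements by subject type, predicate, and object type. It is not the code in the genewikiworld repository: the function name `count_class_edges`, the item names, and the toy statements are all hypothetical and invented for this example, and each item is assumed to have exactly one type.

```python
from collections import Counter


def count_class_edges(statements, item_types):
    """Tally statements by (subject type, predicate, object type).

    statements: iterable of (subject, predicate, object) triples.
    item_types: dict mapping an item to its single entity type
                (an assumption; real items can have several types).
    """
    counts = Counter()
    for subject, predicate, obj in statements:
        s_type = item_types.get(subject)
        o_type = item_types.get(obj)
        if s_type is None or o_type is None:
            continue  # skip statements touching items outside the typed subset
        counts[(s_type, predicate, o_type)] += 1
    return counts


# Toy data, invented for illustration only (not real Wikidata content).
item_types = {
    "product_A": "pharmaceutical product",
    "product_B": "pharmaceutical product",
    "disease_X": "disease",
    "disease_Y": "disease",
    "compound_C": "chemical compound",
}
statements = [
    ("product_A", "therapeutic area", "disease_X"),
    ("product_B", "therapeutic area", "disease_X"),
    ("product_B", "therapeutic area", "disease_Y"),
    ("product_A", "has active ingredient", "compound_C"),
    ("product_A", "therapeutic area", "untyped_item"),  # ignored: no type
]

edge_counts = count_class_edges(statements, item_types)
for (s_type, predicate, o_type), n in sorted(edge_counts.items()):
    print(f"{s_type} --{predicate}--> {o_type}: {n}")
```

Each key of `edge_counts` corresponds to one edge in a diagram of this kind, and its value to the number printed on that edge. Merging reciprocal predicates such as ‘has part’ and ‘part of’ into one edge, as the figure does, would need an extra normalization step that this sketch leaves out.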
Waagmeester et al. eLife 2020;9:e52614. DOI: https://doi.org/10.7554/eLife.52614

focused on those with a clear clinical or therapeutic relevance.

Chemical compounds including drugs: Wikidata has items for over 150 thousand chemical compounds, including over 3500 items which are specifically designated as medications. Compound attributes are drawn from a diverse set of databases, including PubChem (Wang et al., 2009), RxNorm (Nelson et al., 2011), the IUPHAR Guide to Pharmacology (Harding et al., 2018; Pawson et al., 2014; Southan et al., 2016), NDF-RT (National Drug File – Reference Terminology), and LIPID MAPS (Sud et al., 2007). These items typically contain statements describing chemical structure and key physicochemical properties, and links to