Cheminformatics to improve Wikidata on chemical compounds

Egon Willighagen

Playlists: 'wikidatacon2019'

Chemistry has long been an important domain-specific corner of the Wikipedia and Wikidata communities. The two are not tightly linked, though information from Wikidata increasingly shows up in Wikipedia ChemBoxes. We have been using Wikidata content in our research into human metabolism and metabolic diseases, which requires the information about metabolites in Wikidata to be accurate. We have been using cheminformatics to support our manual work of adding missing information and compounds and curating existing knowledge.
In this presentation we show how the Chemistry Development Kit (Q2383032), Bioclipse (Q1769726), and QuickStatements (Q20084080) have been used over the past two years for these purposes. We will demonstrate this infrastructure of Open Source tools and how it uses Simplified Molecular Input Line Entry Specification (SMILES, Q466769) and International Chemical Identifier (InChI, Q203250) information to: link out to external databases (e.g. the EPA CompTox Chemistry Dashboard (Q26998510), MassBank (Q24088019), and LIPID MAPS (Q20968889)); add physicochemical properties; add missing InChIs and chemical formulas using the SMILES; add new compounds based on a SMILES; and detect incorrect or inconsistent information in Wikidata items on chemical compounds.
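To make the "add missing identifiers" and "detect inconsistent information" steps concrete, here is a minimal Python sketch. It is not the Bioclipse/CDK workflow from the talk: it assumes the InChI and InChIKey have already been computed from the SMILES by a cheminformatics toolkit, and only shows how such values might be compared with what an item already holds and turned into tab-separated QuickStatements commands. The function names, the dictionary layout, and the use of the Wikidata sandbox item Q4115189 as a target are illustrative assumptions; P234 (InChI) and P235 (InChIKey) are the Wikidata properties for these identifiers.

```python
import re

# Wikidata properties for the identifiers discussed above.
P_INCHI = "P234"
P_INCHIKEY = "P235"

# A standard InChIKey has the shape XXXXXXXXXXXXXX-YYYYYYYYYY-Z (uppercase letters).
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_wellformed_inchikey(key):
    """Syntactic check only; it does not verify the key against a structure."""
    return INCHIKEY_PATTERN.fullmatch(key) is not None


def plan_edits(qid, existing, computed):
    """Compare values already on an item with freshly computed ones.

    existing / computed: dicts mapping a property ID to a string value
    (hypothetical layout, chosen for this sketch).
    Returns (commands, mismatches): QuickStatements lines for missing
    values, and (property, existing, computed) tuples for conflicts that
    need manual curation rather than an automatic edit.
    """
    commands, mismatches = [], []
    for prop, value in computed.items():
        if prop not in existing:
            # QuickStatements V1 syntax: item, property, quoted string value.
            commands.append(f'{qid}\t{prop}\t"{value}"')
        elif existing[prop] != value:
            mismatches.append((prop, existing[prop], value))
    return commands, mismatches


# Example values for ethanol (SMILES: CCO), assumed to come from a toolkit.
computed = {
    P_INCHI: "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    P_INCHIKEY: "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
}
# Pretend the item already has the InChIKey but lacks the InChI.
existing = {P_INCHIKEY: "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"}

commands, mismatches = plan_edits("Q4115189", existing, computed)
for line in commands:
    print(line)
```

Conflicting values are reported rather than overwritten, which mirrors the abstract's framing of cheminformatics as support for manual curation rather than a replacement for it.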