ChemSpider Reactions: Delivering a free community resource of chemical syntheses
Valery Tkachenko, Colin Batchelor, Daniel Lowe, Ken Karapetyan, David Sharpe and Antony Williams

ACS New Orleans April 2013

Overview
• Motivation
• The RSC and chemical reaction data
• New sources of chemical reaction data
• ChemSpider Reactions: bringing it all together
• Experiments with reaction classification
• The National Chemical Database Service

Who needs another reaction database?
• Those who cannot afford to license access
• Those who would like to access data that is not abstracted
• Those who might like to contribute data to a database
• Anybody wanting to integrate their own systems and pull data out

RSC and chemical reaction data 1

Graphical abstracting journals:
• Methods in Organic Synthesis (monthly, 1990 to present)
• Catalysts and Catalysed Reactions (monthly, 2005 to present)
These constitute a backfile of over 50,000 novel reactions.

RSC and chemical reaction data 2

RSC and chemical reaction data 3

New sources of reaction data

Daniel Lowe’s PhD thesis (Cantab, 2012) was on extracting reactions from US patent data. We can apply this technology to the RSC Journal archive.

ChemSpider Reactions: bringing it all together
http://csr.dev.rsc-us.org/

WORK IN PROGRESS

Reaction classification 1

Project Prospect has text-mined RSC journal articles for named reactions and molecular processes, annotated according to Creative Commons-licensed ontologies. See http://rxno.googlecode.com/

Reaction classification 2

[Pie chart: Classification of Daniel Lowe’s US patent data. Classes shown: Heteroatom alkylation and arylation; Acylation and related processes; C-C bond formation; Heterocycle formation; Protections; Deprotections; Reductions; Oxidations; Functional group interconversion; Functional group addition; Resolution.]

Reaction InChI
To do for reactions what InChI has done for structures
• Think online searching
• Deduplication and linking
http://www-rinchi.ch.cam.ac.uk/help.html

Reaction InChI
Early work: RInChIs layered on to a few hundred thousand reactions
• Not generated for a few tens of thousands of reactions
• Reaction deduplication results differ depending on the algorithm: GGA software versus RInChI
• Under investigation
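The deduplication idea can be illustrated with a minimal sketch. This is not the RInChI algorithm and not the GGA software: it simply assumes each reaction record already carries standard InChI strings for its reactants and products, and groups records by an order-independent key built from them. The `Reaction` type, the `source` labels and both function names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple

# Standard InChIs used purely as example data.
BUTADIENE = "InChI=1S/C4H6/c1-3-4-2/h3-4H,1-2H2"
ETHENE = "InChI=1S/C2H4/c1-2/h1-2H2"
CYCLOHEXENE = "InChI=1S/C6H10/c1-2-4-6-5-3-1/h1-2H,3-6H2"


class Reaction(NamedTuple):
    source: str                 # hypothetical provenance label
    reactants: tuple[str, ...]  # InChI strings, in any order
    products: tuple[str, ...]   # InChI strings, in any order


def reaction_key(rxn: Reaction) -> tuple[tuple[str, ...], tuple[str, ...]]:
    """Order-independent key: sorted unique reactants, sorted unique products."""
    return (tuple(sorted(set(rxn.reactants))), tuple(sorted(set(rxn.products))))


def deduplicate(reactions: Iterable[Reaction]) -> dict[tuple, list[Reaction]]:
    """Group records that describe the same transformation under one key."""
    groups: dict[tuple, list[Reaction]] = {}
    for rxn in reactions:
        groups.setdefault(reaction_key(rxn), []).append(rxn)
    return groups


if __name__ == "__main__":
    records = [
        Reaction("patent", (BUTADIENE, ETHENE), (CYCLOHEXENE,)),
        Reaction("journal", (ETHENE, BUTADIENE), (CYCLOHEXENE,)),
    ]
    for key, members in deduplicate(records).items():
        print(len(members), "record(s) from", [m.source for m in members])
```

Because the key keeps reactants and products separate, a reaction and its reverse are not merged. How agents, solvents and stoichiometry should enter the key is exactly the kind of choice on which two deduplication algorithms can disagree.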

Other sources
• ChemSpider SyntheticPages
• Electronic Lab Notebooks
• University repositories
Please send theses

What will ChemSpider Reactions serve?
• Chemical Database Service
• Linking back to original publications/supplementary data
• Underpinning other tools, e.g. retrosynthetic analysis (depends on data quality and mapping)

Chemical Database Service
• National Chemical Database Service for UK academics
• Integrates commercial databases and services
• Chemicals, analytical data, prediction algorithms
• Development of data repository

ARChem from SimBioSys 1

Synthesis planning tool which performs rule- and precedent-based retrosynthetic analysis back to commercially available starting materials.

ARChem from SimBioSys 2

ARChem from SimBioSys 3

But what about data quality?
• Data validation and curation required
• Encouraging participation with Rewards and RECOGNITION

Manual curation
• Integrated commenting, curating and validation platform across ALL eScience and Publishing platforms
• All integrated to a central RSC profile and feeding the alt-metrics tools

The other kind of RDF (made-up example)
Chemical reactions are unusually well-suited to representation. (Donald Davidson’s event semantics)

_:r1 a obo:RXNO_0000004 ; # Diels–Alder
    obo:has_participant_ceasing_to_exist _:m1 ; # a diene
    obo:has_participant_ceasing_to_exist _:m2 ; # an olefin
    obo:has_participant_starting_to_exist _:m3 . # a substituted cyclohexene
_:m1 a .
_:m2 a .
_:m3 a .
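The event-style description above can be sketched in a few lines of Python. This is an illustration only: the class and function names are hypothetical, the predicate names are copied from the made-up example on the slide, and no RDF library is used, so the output is plain text that merely looks like Turtle triples.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Predicate names taken from the slide's made-up example.
CEASING = "obo:has_participant_ceasing_to_exist"
STARTING = "obo:has_participant_starting_to_exist"


@dataclass
class ReactionEvent:
    """A reaction treated as an event with participants (hypothetical model)."""
    node: str                  # blank-node label, e.g. "_:r1"
    reaction_class: str        # e.g. "obo:RXNO_0000004"
    consumed: List[str] = field(default_factory=list)  # species that cease to exist
    formed: List[str] = field(default_factory=list)    # species that start to exist

    def triples(self) -> Iterator[Tuple[str, str, str]]:
        """Yield (subject, predicate, object) triples for this event."""
        yield (self.node, "a", self.reaction_class)
        for molecule in self.consumed:
            yield (self.node, CEASING, molecule)
        for molecule in self.formed:
            yield (self.node, STARTING, molecule)


def to_turtle(event: ReactionEvent) -> str:
    """Serialise the event as one triple statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in event.triples())


if __name__ == "__main__":
    diels_alder = ReactionEvent(
        node="_:r1",
        reaction_class="obo:RXNO_0000004",
        consumed=["_:m1", "_:m2"],
        formed=["_:m3"],
    )
    print(to_turtle(diels_alder))
```

Modelling the reaction as its own node, rather than as a direct link between molecules, is what lets extra facts (conditions, source publication, classification) attach to the event itself.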

Questions? E-mail: [email protected], [email protected]