Cambridge MPhil Lecture Series (semi-closed)

Abstract

Bioscience is fortunate in that the community has created a very large frictionless semantic knowledge commons for the data it creates and uses.

Most other subjects have highly heterogeneous data without semantics, and this holds back the creation of knowledge. There is a pressing need to make knowledge about climate available to mitigate the effects of gaseous emissions. The most important resource is the UN's IPCC reports, published about every five years. AR6, with 10,000 pages, was released in 2021-2022. #semanticClimate is a group of young Indian science students who are developing tools and community protocols to make IPCC AR6 semantic.

Our first step is to convert PDF to structured HTML (a messy business) and then to use a variety of text-mining tools to create vocabularies. These are turned into a distributed ontology based on equivalences with Wikidata items. Wikidata has 100 million items and maps onto most important metadata bases (e.g. genes, species, chemicals) and onto other infrastructure such as countries, states, protocols, organizations and research establishments. This effectively creates a knowledge graph for the reports, mapped onto the public Linked Open Data cloud.
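
The pipeline described above (HTML text, then vocabulary, then Wikidata lookup) can be illustrated with a minimal sketch. This is not the #semanticClimate code: the function names, the sample HTML and the repeated-two-word-phrase heuristic are all assumptions made for illustration, and real text mining would use far richer methods. The only external detail relied on is the public Wikidata `wbsearchentities` API action; the sketch builds the search URL but does not call it.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlencode


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Strip tags and return the text of the fragment as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def candidate_terms(text, min_count=2):
    """Return two-word phrases seen at least min_count times.

    Assumption: repeated bigrams stand in for a real term-extraction tool.
    """
    words = re.findall(r"[a-z]+", text.lower())
    bigrams = Counter(" ".join(pair) for pair in zip(words, words[1:]))
    return sorted(term for term, n in bigrams.items() if n >= min_count)


def wikidata_search_url(term):
    """Build (but do not send) a Wikidata entity-search request for a term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": "en",
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)


# Invented sample text standing in for HTML converted from a report PDF.
SAMPLE_HTML = """
<p>Greenhouse gas emissions drive warming.</p>
<p>Cutting greenhouse gas emissions limits sea level rise.</p>
<p>Sea level rise threatens coasts.</p>
"""

terms = candidate_terms(html_to_text(SAMPLE_HTML))
urls = {term: wikidata_search_url(term) for term in terms}

for term, url in urls.items():
    print(term, url)
```

Each candidate term would then be matched, by a person or a script, against the search results to pick the equivalent Wikidata item; that curation step is where the vocabulary becomes an ontology linked to the wider Linked Open Data cloud.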

The system can be used for any set of documents, such as a corpus for a literature report. All tools and data are open and participants can use the systems locally or in Google Colab.

Ref: https://www.eventbrite.co.uk/e/the-climate-knowledge-hunt-hackathon-tickets-414825362827 (run on 2022-09-24). Dr Gitanjali was a Cambridge-India Lecturer for five years.

This talk is part of the Computational and Systems Biology Seminar Series 2022-23.
