We work by creating, distributing, and using software. We are involved in making climate knowledge semantic; by "semantic" we mean that the documents (or other information) have a formal structure in which specific terms are linked to definitions, particularly in Wikimedia projects such as Wikidata and Wikipedia. Such documents fit Tim Berners-Lee's five-star approach: they are open, they are semantic, and they can be embedded in the world's linked open data graph, of which Wikidata, with about 100 million nodes, is the epitome.
Our work at semanticClimate is mainly around a tool set for content retrieval, creating what we call a corpus (or mini-corpus) of IPCC reports, and then some amplification of that.
Here are three programs:

- pygetpapers, which retrieves the scientific literature automatically;
- docanalysis, which analyzes it;
- and a third tool, a library of utilities.

The first two were written by two of our earliest interns: pygetpapers by Ayush Garg while he was still at school, and docanalysis by Shweata Hegde while she was an undergraduate.
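Under the hood, literature retrieval of this kind queries the public Europe PMC REST API. Here is a minimal sketch of building such a query; the endpoint is the public Europe PMC search service, but the specific parameters are our illustrative assumptions, not pygetpapers internals:

```python
from urllib.parse import urlencode

# Public Europe PMC search endpoint; the parameter choices below are
# an illustrative assumption, not pygetpapers' actual request code.
BASE = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

def build_query_url(query: str, page_size: int = 25) -> str:
    """Return a search URL for open-access papers matching `query`."""
    params = {
        "query": f'{query} AND OPEN_ACCESS:"y"',
        "format": "json",
        "pageSize": page_size,
    }
    return f"{BASE}?{urlencode(params)}"

print(build_query_url("IPCC climate mitigation", page_size=10))
```

Fetching that URL returns JSON hit records, from which full texts can be downloaded to build the mini-corpus.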
The ultimate goal is to take the IPCC material and automatically convert it into a standard normalized semantic form.
We use HTML as the final product and then label it: we build dictionaries (term bases) from this material and label as many terms as we can.
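A toy sketch of what dictionary-based labelling looks like. The two-entry dictionary and its Wikidata IDs are invented placeholders for illustration; the real semanticClimate dictionaries are far richer:

```python
import re

# Toy dictionary mapping a term to a Wikidata item ID.
# The QIDs here are illustrative placeholders, not vetted entries.
DICTIONARY = {
    "greenhouse gas": "Q167336",
    "carbon dioxide": "Q1997",
}

def label_html(text: str, dictionary: dict) -> str:
    """Wrap each known term in a link to its Wikidata entry."""
    for term, qid in dictionary.items():
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        # Use a function replacement so the original capitalisation
        # of the matched term is preserved in the link text.
        text = pattern.sub(
            lambda m, q=qid: f'<a href="https://www.wikidata.org/wiki/{q}">{m.group(0)}</a>',
            text,
        )
    return text

print(label_html("Carbon dioxide is a greenhouse gas.", DICTIONARY))
```

Each labelled term becomes a node that can later be joined into the linked open data graph.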
We build knowledge graphs out of that and start to find patterns in them using machine-learning methods.