Introduction to semanticClimate tools

The #semanticClimate tools provide an approach to managing climate data efficiently. This page gives an overview of these tools and their practical applications in semantifying climate reports and data.

Requirements for Installing Tools with pip:

amilib

Release Category: Beta

amilib has tools for finding, cleaning, converting, searching, and republishing legacy documents (PDF, PNG, etc.).

It is a Python library designed for document processing and dictionary creation.

With amilib we can create dictionaries from an existing set of words. The library simplifies data extraction and manipulation, offering a user-friendly interface for processing data formats such as HTML and XML. Complex operations such as term marking and dictionary building can be performed with minimal coding effort.
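To make "dictionary building" and "term marking" concrete, here is a minimal conceptual sketch that uses only the Python standard library. It does not use amilib's API. The function names build_dictionary and mark_terms, the term_N identifiers, and the span markup are hypothetical, chosen only for illustration.

```python
import html
import re


def build_dictionary(words):
    """Map each distinct term (lower-cased, trimmed) to a simple entry id."""
    terms = sorted({w.strip().lower() for w in words if w.strip()})
    return {term: f"term_{i}" for i, term in enumerate(terms, start=1)}


def mark_terms(text, dictionary):
    """Return HTML in which every dictionary term found in `text` is wrapped in a <span>."""
    if not dictionary:
        return html.escape(text)
    # Try longer terms first so multi-word terms win over shorter ones inside them.
    alternatives = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches, then emit the marked term.
        pieces.append(html.escape(text[last:match.start()]))
        term_id = dictionary.get(match.group(0).lower(), "")
        pieces.append(
            f'<span class="term" data-id="{term_id}">'
            f"{html.escape(match.group(0))}</span>"
        )
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)


climate_terms = build_dictionary(["Greenhouse gas", "climate", "Climate "])
print(mark_terms("Climate policy & greenhouse gas limits", climate_terms))
```

The sketch deduplicates the word list case-insensitively, then marks each occurrence in running text. A real dictionary would normally carry more per entry, such as definitions or links.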

To check that the installation succeeded, run the command: amilib --help. A help message should appear.
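The same check can be scripted. This is a minimal sketch, assuming each tool installs a command-line entry point on PATH and that its --help option exits with status 0. The helper name cli_is_installed is hypothetical.

```python
import shutil
import subprocess


def cli_is_installed(command, timeout=30.0):
    """Return True if `command` is on PATH and `command --help` exits cleanly."""
    executable = shutil.which(command)
    if executable is None:
        return False
    try:
        result = subprocess.run(
            [executable, "--help"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Assumption: a working installation prints help and exits with status 0.
    return result.returncode == 0


# Commands named on this page as supporting --help.
for tool in ("amilib", "pygetpapers", "docanalysis"):
    status = "installed" if cli_is_installed(tool) else "not found"
    print(f"{tool}: {status}")
```

A "not found" result usually means the package is not installed in the active Python environment, or that the environment's scripts directory is not on PATH.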

amiclimate

Release Category: Alpha

amiclimate is NLP and semantic software, with supporting material, for managing climate knowledge.

It is Python code for accessing and transforming key climate documents. It is a refactoring of the (bloated) pyamihtml repository and has functionality for downloading and parsing:

  1. IPCC reports
  2. IPCC glossary
  3. UNFCCC reports (COP, etc.)

This repository will NOT contain the complete IPCC or UNFCCC corpus, but will contain small exemplars.

pyamiimage

Release Category: Alpha

pyamiimage is a set of tools to extract semantic information from scientific diagrams.

The output of pyamiimage is an image annotated with substrates, products, and enzymes.

pygetpapers

Release Category: Production

pygetpapers is a tool to assist text miners. It makes requests to open access scientific text repositories, analyses the hits, and systematically downloads the articles without further interaction.

It was developed by Ayush Garg under the guidance of the OpenVirus community, Peter Murray-Rust, and Rik Smith-Unna, funded by ContentMine.

It comes with the packages pygetpapers and download tools, which provide various functions to download, process, and save research papers and their metadata.

We use pygetpapers for querying current and past scholarly literature in bulk.
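A bulk query can be scripted by assembling a pygetpapers command line. This is a sketch, assuming the -q (query), -o (output directory), -k (maximum number of hits), and -x (download full-text XML) options behave as described in the pygetpapers documentation; confirm them with pygetpapers --help for the installed version. The query string, the output directory name, and the helper names are hypothetical.

```python
import shlex
import shutil
import subprocess


def build_pygetpapers_command(query, output_dir, limit=10):
    """Assemble the argument list for one pygetpapers query (nothing is run here)."""
    return [
        "pygetpapers",
        "-q", query,        # search query sent to the repository
        "-o", output_dir,   # directory that will receive the downloaded papers
        "-k", str(limit),   # upper bound on the number of hits to download
        "-x",               # request full-text XML
    ]


def run_query(query, output_dir, limit=10):
    """Run the query if pygetpapers is installed; this contacts the network."""
    if shutil.which("pygetpapers") is None:
        raise RuntimeError("pygetpapers is not installed or not on PATH")
    command = build_pygetpapers_command(query, output_dir, limit)
    return subprocess.run(command, check=True)


# Print the command only; call run_query(...) to actually download.
command = build_pygetpapers_command("climate change adaptation", "climate_papers", limit=5)
print(shlex.join(command))
```

Keeping command construction separate from execution makes it easy to inspect or log the exact query before any download starts.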

To check that the installation succeeded, run the command: pygetpapers --help. A help message should appear.

docanalysis

Release Category: Production

docanalysis is a command-line tool that processes document collections (CProjects) and performs text analysis.

It can:

  1. Divide documents into sections
  2. Perform text mining and natural language processing (NLP)
  3. Generate dictionaries of terms

It uses custom code along with Python tools like NLTK, and it can use spaCy or scispaCy for extracting and annotating entities. The tool creates summary data and word lists as output.
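The idea of per-section word lists can be illustrated with a short standard-library sketch. It is not docanalysis code and does not use NLTK, spaCy, or scispaCy. The stopword set, the example sections, and the function name word_list are assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stopword set; real tools ship much larger lists.
STOPWORDS = {"a", "an", "and", "as", "for", "of", "the", "to", "we"}


def word_list(text, top_n=5):
    """Return the `top_n` most frequent non-stopword tokens with their counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(top_n)


# Hypothetical document already divided into named sections.
sections = {
    "introduction": "Climate change affects rainfall. Rainfall patterns shift as the climate warms.",
    "methods": "We downloaded reports and searched the reports for climate terms.",
}

for name, text in sections.items():
    print(name, word_list(text, top_n=3))
```

Counting terms per section rather than per document shows where in a paper a term is concentrated, which is the kind of summary that sectioning makes possible.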

To check that the installation succeeded, run the command: docanalysis --help. A help message should appear.

