Release Category: Production
Developed By: Shweata N. Hegde and Peter Murray-Rust
docanalysis is a command-line tool that processes document collections (CProjects) and performs text analysis.
It combines custom code with Python libraries such as NLTK, and it can optionally use spaCy or scispaCy to extract and annotate entities. As output, the tool produces summary data and word lists.
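To make the idea of a word list concrete, here is a minimal sketch of how a frequency-ranked word list could be built from plain text. It is a standard-library stand-in, not docanalysis's own code and not NLTK; the function name word_list and its parameters top_n and min_length are hypothetical.

```python
import re
from collections import Counter


def word_list(text, top_n=10, min_length=3):
    """Return the top_n most frequent words in text as (word, count) pairs.

    Hypothetical helper for illustration only; docanalysis may build
    its word lists differently.
    """
    # Lowercase the text and keep runs of ASCII letters as words.
    words = re.findall(r"[a-z]+", text.lower())
    # Drop very short words, which are mostly function words.
    counts = Counter(w for w in words if len(w) >= min_length)
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = "Malaria parasites infect cells. Malaria spreads."
    print(word_list(sample, top_n=3))
```

A real pipeline would usually also remove stop words and normalise word forms, which is where a library such as NLTK or spaCy comes in.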
Primary functionality:
- Reads CProject files (fulltext.xml, eupmc_result.json).
- Splits fulltext.xml (JATS) into sections/ directory trees.
Primary inputs:
- fulltext.xml (and optionally eupmc_result.json).
Primary outputs:
- sections/ per CTree (sectioned XML).
Main file types for transfer: .xml (fulltext, sectioned), .json (eupmc_result), .csv, .html, .json (output), AMI .xml dictionaries.
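The inputs and outputs listed above can be checked on disk with a short script. The sketch below assumes that each CTree is a direct subdirectory of the CProject directory and contains fulltext.xml; that layout is an assumption made for illustration, and summarize_cproject is a hypothetical helper, not part of docanalysis.

```python
from pathlib import Path


def summarize_cproject(cproject_dir):
    """Return one record per CTree found directly under cproject_dir.

    Assumption: a CTree is a subdirectory that holds fulltext.xml.
    """
    records = []
    for ctree in sorted(Path(cproject_dir).iterdir()):
        if not ctree.is_dir():
            continue  # skip loose files in the CProject directory
        if not (ctree / "fulltext.xml").is_file():
            continue  # not a CTree under the layout assumed here
        records.append({
            "ctree": ctree.name,
            # Optional metadata file named in the inputs above.
            "has_metadata": (ctree / "eupmc_result.json").is_file(),
            # True once a sections/ directory exists for this CTree.
            "is_sectioned": (ctree / "sections").is_dir(),
        })
    return records


if __name__ == "__main__":
    for record in summarize_cproject("."):
        print(record)
```

Running it before and after sectioning shows which CTrees have gained a sections/ directory.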
Run pip install docanalysis to install docanalysis.
Check that the installation succeeded with the command docanalysis --help. A help message should appear.
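The same check can be done from Python, for example in a setup script. This is a minimal sketch using only the standard library; it assumes the installed distribution and the console command are both named docanalysis, as the pip and --help commands suggest, and installation_status is a hypothetical helper.

```python
import shutil
from importlib import metadata


def installation_status(package="docanalysis", command="docanalysis"):
    """Report the installed version of package and where command is on PATH.

    Each value is None when the package or command cannot be found.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return {"version": version, "command_path": shutil.which(command)}


if __name__ == "__main__":
    print(installation_status())
```

If version is set but command_path is None, the package is installed but its scripts directory is probably not on PATH.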
Code Repository - github
README file of docanalysis: docanalysis/README.md