Release Category: Production
Developed By: Ayush Garg
pygetpapers is a tool to download papers and metadata from open-access repositories. It makes requests to open access scientific text repositories, analyses the hits, and systematically downloads the articles without further interaction.
It has been developed by Ayush Garg under the guidance of the OpenVirus community, Peter Murray-Rust, and Rik Smith-Unna funded by ContentMine.
It comes with the packages pygetpapers and download tools which provide various functions to download, process and save research papers and their metadata.
We use pygetpapers for querying current and past scholarly literature in bulk.
Primary functionality:
~/pygetpapers/ (or custom path) with per-article folders.
Primary inputs:
--xml, --pdf, --makecsv, --datatables).
Primary outputs:
{output_root}/{repo}_{timestamp}/ with {paper_id}/ subdirs.
fulltext.xml, fulltext.pdf, fulltext.html, fulltext.pdf.html (when requested); eupmc_result.json / *_result.json per article; eupmc_results.json (or repo-specific) at project level; *_papers_data.json for DataTables; datatables.html; CSV when --makecsv.
Main file types for transfer: .xml, .pdf, .html, .json, .csv.
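The output layout described above can be inspected with a few lines of Python. This is a minimal sketch, not part of pygetpapers: the function name summarize_project is hypothetical, and it assumes only the file names listed above (fulltext.* files and a *_result.json file inside each per-paper subdirectory).

```python
from pathlib import Path

# Full-text file names as listed in the outputs description above.
FULLTEXT_NAMES = ("fulltext.xml", "fulltext.pdf", "fulltext.html", "fulltext.pdf.html")


def summarize_project(project_dir):
    """Map each per-paper subdirectory name to the files found in it.

    Hypothetical helper for illustration; pygetpapers does not ship it.
    """
    summary = {}
    for paper_dir in sorted(Path(project_dir).iterdir()):
        if not paper_dir.is_dir():
            # Skip project-level files such as eupmc_results.json.
            continue
        fulltexts = [name for name in FULLTEXT_NAMES if (paper_dir / name).is_file()]
        # Per-article metadata, e.g. eupmc_result.json.
        metadata = sorted(p.name for p in paper_dir.glob("*_result.json"))
        summary[paper_dir.name] = {"fulltexts": fulltexts, "metadata": metadata}
    return summary
```

Calling summarize_project on a project directory returns one entry per paper folder, which makes it easy to spot papers whose full text was not downloaded.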
Run pip install pygetpapers to install pygetpapers.
Check that the installation succeeded with the command pygetpapers --help. You should see a help message.
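For scripted use, the installation check and a download command can be assembled from Python. This is a hedged sketch: the helper names is_installed and build_command are hypothetical, and the --query, --output and --limit flags are assumptions that do not appear in the text above (only --xml, --pdf and --makecsv do), so confirm them against the output of pygetpapers --help before relying on them.

```python
import shutil


def is_installed(executable="pygetpapers"):
    """Return True if the executable can be found on PATH."""
    return shutil.which(executable) is not None


def build_command(query, output_dir, limit=10, xml=True, pdf=False, makecsv=False):
    """Assemble an argument list for the pygetpapers command line.

    --query, --output and --limit are assumed flag names (see lead-in).
    """
    cmd = [
        "pygetpapers",
        "--query", query,
        "--output", str(output_dir),
        "--limit", str(limit),
    ]
    if xml:
        cmd.append("--xml")
    if pdf:
        cmd.append("--pdf")
    if makecsv:
        cmd.append("--makecsv")
    return cmd
```

If is_installed() returns True, the list from build_command can be passed to subprocess.run(cmd, check=True). Building the command as a list rather than a single string avoids shell quoting problems with multi-word queries.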
Code Repository - GitHub
README file of pygetpapers : pygetpapers/README.md