txt2phrases | Automated Text Processing and Keyphrases Extraction

Release Category: Alpha

txt2phrases is a Python library and command-line tool for processing and analyzing textual data. It converts HTML and PDF documents into plain text, extracts keyphrases using AI-based models, and uses TF-IDF to classify those keyphrases as specific or general.
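To make the TF-IDF classification step concrete, here is a minimal standard-library sketch. It is not the txt2phrases implementation: the function names, the threshold rule, and the sample phrases are all assumptions made for illustration. It treats a phrase that appears in every document as "general" (its inverse document frequency is zero) and a phrase concentrated in few documents as "specific".

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {phrase: tf-idf} dict per document.

    `docs` is a list of phrase lists, one list per document.
    Hypothetical helper for illustration; not part of txt2phrases.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each phrase occurs.
    df = Counter()
    for phrases in docs:
        df.update(set(phrases))

    scores = []
    for phrases in docs:
        tf = Counter(phrases)
        total = len(phrases)
        scores.append({
            phrase: (count / total) * math.log(n_docs / df[phrase])
            for phrase, count in tf.items()
        })
    return scores


def classify(doc_scores, threshold=0.0):
    """Split phrases into (specific, general) by an assumed score threshold."""
    specific = sorted(p for p, s in doc_scores.items() if s > threshold)
    general = sorted(p for p, s in doc_scores.items() if s <= threshold)
    return specific, general


# Made-up phrase lists standing in for extracted keyphrases.
docs = [
    ["climate change", "carbon cycle", "climate change"],
    ["climate change", "sea level rise"],
    ["climate change", "ocean acidification"],
]
scores = tfidf_scores(docs)
specific, general = classify(scores[0])
print("specific:", specific)
print("general:", general)
```

In this toy corpus "climate change" occurs in all three documents, so it scores zero and lands in the general group, while "carbon cycle" occurs only in the first document and is classed as specific. A real pipeline would likely choose the threshold from the score distribution.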

Role: Pipeline from documents (PDF/HTML) to plain text and then to keyphrases; can consume pygetpapers output.
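As a rough illustration of the first stage of that pipeline, the sketch below turns an HTML string into plain text using only Python's standard library. It does not use or reproduce the txt2phrases API: `TextExtractor` and `html_to_text` are hypothetical names, and PDF conversion is left out because it needs a third-party library.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Hypothetical helper: return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"
)
plain = html_to_text(sample)
print(plain)
```

The resulting string could then be written to a .txt file and passed to a keyphrase extraction step.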

Primary functionality:

Primary inputs:

Primary outputs:

Main file types for transfer: .pdf, .html, .txt, .csv.

Installation

