article-extracting

📚 A collection of useful Natural Language Processing tools: text language detection, sentence segmentation, and main-content extraction from HTML documents

nlp language-detection nltk readability text-processing fasttext nlp-parsing sentence-tokenizer article-extracting language-recognition article-extractor

Updated Mar 7, 2023
Python

Sathish-Vasudev / Article-Scraper

This program scrapes article content from the web, given either a single URL or a text file containing a set of URLs. It uses the newspaper3k and python-docx libraries. The output is a neatly formatted Word document ('.docx') containing the contents of the article.

python3 python-docx article-extracting article-extractor literature-mining newspaper3k article-scraper

Updated Aug 5, 2020
Python

johnbumgarner / newshound

This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.

data-science text-mining data-mining news news-aggregator python3 datascience web-scraping data-extraction webscraping news-crawler article-extracting article-extractor newspaper-crawler python-newspaper

Updated Mar 14, 2023

kl / the-daily-stallman

Read the news like Stallman would. No JavaScript required.

stallman rms article-extracting richard-stallman article-extractor

Updated May 8, 2023
HTML

absingh31 / MercuryAPI_Client

Python wrapper for the Mercury API that returns JSON and HTML output using your API key. With it, anyone can denoise an online article and view it without ads, external links, or other extraneous content.

html api json json-serialization api-client python3 python-wrapper api-wrapper mercury article-extracting html-output mercury-api mercury-parser mercury-client mercuryapi-client

Updated Jan 9, 2018
HTML

EmailThis / readability

Readability is an Elixir library for extracting and curating articles.

elixir readability article-extracting

Updated Feb 18, 2017
Elixir

KashmereLabs / permalink_web_archiver

Allows any article on the web to be parsed into a readable format and archived on the permanent web.

storage dapp blockchain summarization article-extracting arweave-permaweb

Updated Dec 10, 2022
JavaScript

0x01h / yozdil-article-scraper-generator

Scrape Yılmaz Özdil's articles and build a Markov model to generate newspaper articles in his style. Turkish text dataset creator for data science and NLP projects.

markov-model scraper markov-chain markov article-extracting article-extractor yilmaz-ozdil

Updated Jan 23, 2019
Python

absingh31 / Article_Smart

A Python project (with NLP integration) that denoises any news article by stripping out images and advertisements, leaving a basic, hassle-free article. It provides a 'smart view' for web views on mobile devices, showing the heading, keywords, and text. Powered by newspaper3k.