Site-specific article extraction rules to aid content extractors, feed readers, and 'read later' applications.
Updated Jun 9, 2024
SmartReader is a library to extract the main content of a web page, based on a port of the Readability library by Mozilla.
Parse a Markdown article, download its images, and replace the image URLs with local paths.
Extract an article or news item from a URL or HTML, parse the title and content, and output it in Markdown format.
Extracts article content from a web page.
📚 A collection of useful Natural Language Processing tools: detecting the language of a text, splitting text into sentences, and extracting the main content from an HTML document.
This program scrapes article content from the web, taking as input either a single URL or a text file containing a set of URLs. It uses the newspaper3k and python-docx libraries and outputs a neatly formatted Word document ('.docx') with the contents of the article.
This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.
Read the news like Stallman would. No JavaScript required.
A Python wrapper for the Mercury API that returns JSON and HTML output using your API key. It lets anyone denoise an online article and view it without ads, external links, or extraneous content.
Readability is an Elixir library for extracting and curating articles.
Allows any article on the web to be parsed into a readable format and archived into the permanent web
Scrape Yılmaz Özdil articles and create a Markov model to generate newspaper articles in the style of Yılmaz Özdil. A Turkish text dataset creator for data science and NLP projects.
A Python project (with NLP integration) that denoises any news article by stripping out images and advertisements, leaving a basic, hassle-free article. It provides a 'smart view' for web views on mobile devices, with heading, keywords, and text. Powered by newspaper3k.
An attempt to summarize a news article using an object-oriented Python approach.
A web page content extractor
Extracts the article content from web pages. Runs as a standalone REST service.
Python Newspaper API
A web app that returns Wikipedia data based on a given search query.
A New Way to Visualize the Markets (Created in 24 hours @ CalHacks)
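One of the entries above describes parsing a Markdown article, downloading its images, and replacing the image URLs with local paths. The rewriting step might look roughly like the sketch below. It is not taken from any listed project: the function name `rewrite_image_urls`, the `local_dir` parameter, and the regular expression are all assumptions made for illustration. It handles only inline `![alt](url)` images and gives two remote images with the same file name the same local path.

```python
import posixpath
import re
from urllib.parse import urlparse

# Matches inline Markdown images: ![alt](url) or ![alt](url "title").
# Reference-style images and URLs containing spaces are not handled.
IMAGE_PATTERN = re.compile(r'!\[([^\]]*)\]\((\S+?)(\s+"[^"]*")?\)')


def rewrite_image_urls(markdown_text, local_dir="images"):
    """Return (rewritten_text, downloads), where downloads maps each
    remote image URL to the local path that replaced it.

    Hypothetical helper: it only rewrites the text and does not
    fetch anything itself.
    """
    downloads = {}

    def replace(match):
        alt, url, title = match.group(1), match.group(2), match.group(3) or ""
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            return match.group(0)  # leave local or relative paths untouched
        filename = posixpath.basename(parsed.path) or "image"
        local_path = posixpath.join(local_dir, filename)
        downloads[url] = local_path
        return f"![{alt}]({local_path}{title})"

    return IMAGE_PATTERN.sub(replace, markdown_text), downloads
```

The returned mapping can then drive the download step, for example by calling `urllib.request.urlretrieve(url, local_path)` for each pair after creating `local_dir`.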