Newspaper delivers Instapaper style article extraction

Newspaper lets anyone do article extraction like Instapaper and Pocket.

Newspaper is a Python 2 library for extracting & curating articles from the web. It wants to change the way people handle article extraction with a new, more precise layer of abstraction.

Besides "read later" services, there's a growing number of APIs, like diffbot and …, that provide article extraction as a service. Those services are great, but it's nice that Newspaper is open source and hackable.

For instance, when I first checked out Newspaper, it only had plain-text article extraction. Sometimes, though, I want the original markup of the article with some sanitization. It helps to have the paragraphs, links, and headers accurately represent the article. So I forked the project and made some changes, and the maintainer, codelucas, was responsive and worked with me to get my changes merged in.
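
To make "original markup with some sanitization" concrete, here is a minimal, hypothetical sketch in Python 3 using only the standard library's html.parser. It is not Newspaper's implementation: the names sanitize, ArticleSanitizer, ALLOWED_TAGS and the choice of which tags and attributes to keep are assumptions for illustration. It keeps paragraphs, links, and headers, unwraps every other tag while keeping its text, drops script and style content entirely, and only keeps http, https, or relative link targets. It is not a hardened security sanitizer.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical allow-list: tags kept in the output; all other tags are unwrapped.
ALLOWED_TAGS = {"p", "a", "h1", "h2", "h3", "h4", "h5", "h6"}
# Attributes kept per tag; everything else (class, style, onclick, ...) is dropped.
ALLOWED_ATTRS = {"a": {"href"}}
# Tags whose whole content is discarded, not just the tag itself.
DROP_CONTENT = {"script", "style"}
# Link schemes we accept ("" means a relative URL).
SAFE_SCHEMES = {"", "http", "https"}


def _is_safe_href(value):
    """Accept only relative, http, or https link targets."""
    try:
        return urlsplit(value.strip()).scheme in SAFE_SCHEMES
    except ValueError:  # e.g. a malformed IPv6 host
        return False


class ArticleSanitizer(HTMLParser):
    """Collects sanitized markup into self.out while parsing."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = []
        for name, value in attrs:
            if name not in allowed or value is None:
                continue
            if name == "href" and not _is_safe_href(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped (convert_charrefs=True), so re-escape it.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html):
    """Return article markup reduced to paragraphs, links, and headers."""
    parser = ArticleSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, sanitize('<div class="x"><h2>Title</h2><p>Read <a href="https://example.com" onclick="evil()">this</a>.</p><script>alert(1)</script></div>') keeps the header, paragraph, and link (minus the onclick handler) and drops the div and the script. An allow-list like this is easier to reason about than a block-list, since any tag nobody thought of is stripped by default.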

If you want a place to start working on article extraction, Newspaper looks like a good bet.
