Infinity: Removing boilerplate from webpages using python

Monday, June 10, 2019

Removing boilerplate from webpages using python

I am an avid Firefox user, and more often than not, when I visit a webpage containing an article, I end up clicking the "Reader Mode" button in the address bar so that I can remove all the useless noise and just focus on the main content the page has to offer.

A news article like the one shown below

looks like this after entering Reader Mode.

As you can see, all the ads, boilerplate, etc. are gone, and now I can focus on the actual content without straining my eyes to find the stuff that I visited the webpage for.

It turns out that one can easily write a Python program (Python being my language of choice for such quick experiments) to do the same thing.

We will use the Python readability library to do this in our program.

Here is the program. It seems to work for me.
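The original listing did not survive on this page, so what follows is only a minimal sketch of the idea, not the exact program. It assumes the library meant is the `readability-lxml` package (installed with `pip install readability-lxml`), whose `Document` class exposes `title()` and `summary()`. The helper names `fetch_html`, `html_to_text` and `extract_article` are made up for this illustration.

```python
import sys
import urllib.request
from html.parser import HTMLParser

# Tags that should start a new line in the plain-text output.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, breaking lines at block tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Hypothetical helper: strip tags and collapse whitespace, one line per block."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def fetch_html(url):
    """Hypothetical helper: download a page and decode it to a string."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_article(html):
    """Return (title, plain text) of the main content of a page."""
    # Assumption: the third-party readability-lxml package is installed.
    # Imported here so the helpers above work without it.
    from readability import Document

    doc = Document(html)
    # summary() returns the cleaned-up article body as HTML.
    return doc.title(), html_to_text(doc.summary())


if __name__ == "__main__" and len(sys.argv) == 2:
    title, text = extract_article(fetch_html(sys.argv[1]))
    print(title)
    print()
    print(text)
```

Saved as, say, `reader.py`, this would be run as `python reader.py <article-url>`. The tag stripping is deliberately crude; if you want to keep links or images, you could use the HTML returned by `summary()` directly.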
