Adding support for your own RSS feeds to web2lrf

Basic Example

To add support for your own RSS feeds, you will need to create a small Python script. For example, suppose you want to convert the feed http://wonderfulwebsite.or/rss/feed1.xml to an ebook. Create the file wonderfulwebsite.py with the following contents:

from libprs500.ebooks.lrf.web.profiles import DefaultProfile

class WonderfulWebsite(DefaultProfile):

    title = 'My Wonderful Website Feed'
    max_recursions = 2

    def get_feeds(self):
        return [ ('Feed 1', 'http://wonderfulwebsite.or/rss/feed1.xml') ]

Now run web2lrf with the profile:

web2lrf --verbose --user-profile wonderfulwebsite.py

Customization

Most websites have a lot of cruft that makes a direct conversion into an ebook problematic. While the above example should work for a very simple feed and website, you will need to customize the process for real-world feeds. Fortunately, the user profile framework is very flexible. See the detailed HOWTO on creating your own User Profiles at the bottom of the page.

Print version

The first thing you will probably want to do is use the print version of the articles. You can do this by adding a method to WonderfulWebsite, as shown below:

class WonderfulWebsite(DefaultProfile):

    ...
  
    def print_version(self, url):
        return url + '/print_version'

The method print_version will be called with the URL for every article and should return the modified URL that points to the print version of that article.
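
To see what such a method does in isolation, here is a minimal sketch that uses a stand-in class instead of DefaultProfile, so it runs without libprs500 installed. The class name PrintVersionDemo and the article URLs are hypothetical, and the '/print_version' suffix is only the convention from the example above; a real site will likely use a different scheme. The trailing-slash handling is an added assumption, not something web2lrf requires.

```python
class PrintVersionDemo(object):
    """Hypothetical stand-in for a profile; not part of libprs500."""

    def print_version(self, url):
        # Strip a trailing slash so the result never contains '//print_version'
        return url.rstrip('/') + '/print_version'


demo = PrintVersionDemo()

# Hypothetical article URLs, as they might appear in the feed
article_urls = [
    'http://wonderfulwebsite.or/articles/1',
    'http://wonderfulwebsite.or/articles/2/',
]

print_urls = [demo.print_version(u) for u in article_urls]
for u in print_urls:
    print(u)
```

Both URLs map to the same form regardless of whether the feed includes a trailing slash.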

Preprocessing Article HTML

You can preprocess the downloaded HTML to remove banners, ads, and unwanted HTML and graphics, to change styles, and so on. This is done by adding a member to WonderfulWebsite, as shown:

import re

class WonderfulWebsite(DefaultProfile):
    ...
    preprocess_regexps = [
        # Each entry is a (compiled pattern, replacement function) pair
        (re.compile(r'<div class="banner">.*?</div>', re.IGNORECASE | re.DOTALL),
         lambda match: ''),
    ]

This example removes banners (<div> elements of class "banner") from the HTML before converting.
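
It can help to try a rule on a small HTML snippet before running a full conversion. The sketch below assumes that each (pattern, function) pair is applied in order with the semantics of pattern.sub(function, html); the helper apply_rules and the sample HTML are hypothetical and are not part of libprs500.

```python
import re

# Same (pattern, replacement function) pair as in the profile above
preprocess_regexps = [
    (re.compile(r'<div class="banner">.*?</div>', re.IGNORECASE | re.DOTALL),
     lambda match: ''),
]


def apply_rules(html, rules):
    """Hypothetical helper: apply each rule in order, as re.sub would."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


# Sample input: upper-case tags and a banner that spans several lines
sample = '<body><DIV class="banner">\nBuy now!\n</DIV><p>Article text</p></body>'

cleaned = apply_rules(sample, preprocess_regexps)
print(cleaned)
```

Note that the non-greedy .*? stops at the first </div>, so a banner that contains nested <div> elements would only be partly removed by this pattern.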

More Information

The above examples are just the tip of the iceberg when it comes to the capabilities of web2lrf. A more detailed HOWTO on writing user profiles is available here. The best way to learn how to write user profiles is to look at the built-in profiles that web2lrf already has, for example:

  • [source:trunk/src/libprs500/ebooks/lrf/web/profiles/newsweek.py newsweek]
  • [source:trunk/src/libprs500/ebooks/lrf/web/profiles/bbc.py bbc]
  • [source:trunk/src/libprs500/ebooks/lrf/web/profiles/nytimes.py nytimes]

Finally, you should look at the definition of [source:trunk/src/libprs500/ebooks/lrf/web/profiles/__init__.py DefaultProfile].

User provided profiles
