web2lrf

web2lrf [options] website_profile

web2lrf downloads a site from the web and converts it into an LRF file for use with the SONY Reader. website_profile is one of ['atlantic', 'ap', 'barrons', 'bbc', 'chr_mon', 'cnn', 'economist', 'faznet', 'jpost', 'jutarnji', 'nasa', 'newsweek', 'newyorker', 'newyorkreview', 'nytimes', 'upi', 'usatoday', 'portfolio', 'reuters', 'spiegelde', 'wsj', 'wash_post', 'zeitde']. If you specify a website_profile of default, or do not specify one at all, you must specify the --url option.

Whenever you pass arguments to web2lrf that have spaces in them, enclose the arguments in quotation marks.
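
As a small illustration of the quoting rule, the Python sketch below builds a web2lrf command line and lets shlex.join add the quotation marks. The title value and the choice of the nytimes profile are only examples, and the quoting produced is that of a POSIX shell; other shells may expect double quotes instead.

```python
import shlex

# Arguments as separate items; the title value contains a space.
argv = ["web2lrf", "--title", "Morning News", "nytimes"]

# shlex.join quotes any item that a POSIX shell would otherwise split.
command_line = shlex.join(argv)
print(command_line)
```

Keeping the arguments in a list and quoting them only when a single string is needed avoids hand-written quoting mistakes.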

[options]

--version

show program’s version number and exit

--help, -h

show this help message and exit

--output, -o

Output file name. Default is derived from input filename

--ignore-tables

Render HTML tables as blocks of text instead of actual tables. This is necessary if the HTML contains very large or complex tables.

--minimize-memory-usage

Minimize memory usage at the cost of longer processing times. Use this option if you are on a memory-constrained machine.

--encoding

Specify the character encoding of the source file. If the output LRF file contains strange characters, try changing this option. A common encoding for files from Windows computers is cp-1252. Another common choice is utf-8. The default is to try to guess the encoding.
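
The "strange characters" mentioned above typically come from decoding bytes with the wrong encoding. The following Python sketch shows the effect on a single word; it illustrates the general problem and is not a description of how web2lrf decodes its input. Note that Python spells the Windows codec cp1252, which is an assumption local to this example.

```python
# A word containing a right single quotation mark, stored as UTF-8 bytes.
raw = "it\u2019s".encode("utf-8")

# Decoding with the wrong encoding yields the familiar garbage characters.
wrong = raw.decode("cp1252")

# Decoding with the encoding the bytes were written in restores the text.
right = raw.decode("utf-8")

print(wrong)
print(right)
```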

--url, -u

The URL to download. You only need to specify this if you are not specifying a website_profile.

--user-profile

Path to a Python file containing a user-created profile. For help visit http://calibre.kovidgoyal.net/wiki/UserProfiles

--username

Specify the username to be used while downloading. Only used if the profile supports it.

--password

Specify the password to be used while downloading. Only used if the profile supports it.

--timeout

Timeout in seconds to wait for a response from the server. Default: 10 s

--max-recursions, -r

Maximum number of levels to recurse, i.e. the depth of links to follow. Default: 10

--max-files, -n

The maximum number of files to download. This only applies to files from <a href> tags. Default is 10

--delay

Minimum interval in seconds between consecutive fetches. Default is 10 s

--dont-download-stylesheets

Do not download CSS stylesheets.

--match-regexp

Only links that match this regular expression will be followed. This option can be specified multiple times, in which case as long as a link matches any one regexp, it will be followed. By default all links are followed.

--filter-regexp

Any link that matches this regular expression will be ignored. This option can be specified multiple times, in which case as long as any regexp matches a link, it will be ignored. By default, no links are ignored. If both --filter-regexp and --match-regexp are specified, then --filter-regexp is applied first.
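
The interaction between the two options can be sketched in Python as follows. This is a minimal model of the rules described above, not the tool's actual code: the function name should_follow is hypothetical, and the use of re.search (a match anywhere in the link) is an assumption.

```python
import re

def should_follow(link, match_regexps, filter_regexps):
    """Return True if the link passes the filter and match rules."""
    # Filters are applied first: any hit means the link is ignored.
    if any(re.search(p, link) for p in filter_regexps):
        return False
    # With no match patterns given, every remaining link is followed.
    if not match_regexps:
        return True
    # Otherwise a single matching pattern is enough.
    return any(re.search(p, link) for p in match_regexps)

print(should_follow("http://example.com/news/a.html", [r"/news/"], [r"/ads/"]))
print(should_follow("http://example.com/news/ads/x.html", [r"/news/"], [r"/ads/"]))
```

In this model a link that matches both kinds of pattern is ignored, because the filter is checked before the match list.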

--keep-downloaded-files

Do not delete the downloaded files after creating the LRF

METADATA OPTIONS

--title, -t

Set the title. Default: filename.

--author, -a

Set the author(s). Multiple authors should be set as a comma separated list. Default: Unknown

--comment

Set the comment.

--category

Set the category

--title-sort

Sort key for the title

--author-sort

Sort key for the author

--publisher

Publisher

--cover

Path to file containing image to be used as cover

--use-metadata-cover

If there is a cover graphic detected in the source file, use that instead of the specified cover.

LOOK AND FEEL

--base-font-size

Specify the base font size in pts. All fonts are rescaled accordingly. This option obsoletes the --font-delta option and takes precedence over it. To use --font-delta, set this to 0. Default: 10.0pt

--enable-autorotation

Enable autorotation of images that are wider than the screen width.

--wordspace

Set the space between words in pts. Default is 2.5

--blank-after-para

Separate paragraphs by blank lines.

--header

Add a header to all the pages with title and author.

--headerformat

Set the format of the header. %a is replaced by the author and %t by the title. Default is %t by %a
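
The placeholder substitution can be pictured with a short Python sketch. The function render_header is hypothetical and only mirrors the rule stated above (%a becomes the author, %t the title); how web2lrf treats other % sequences is not covered here.

```python
import re

def render_header(fmt, title, author):
    """Expand %t and %a in a header format string in a single pass."""
    values = {"t": title, "a": author}
    # A single pass means text inserted for %t is never rescanned for %a.
    return re.sub(r"%([ta])", lambda m: values[m.group(1)], fmt)

header = render_header("%t by %a", "Daily Digest", "Unknown")
print(header)
```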

--override-css

Override the CSS. Can be either a path to a CSS stylesheet or a string. If it is a string it is interpreted as CSS.

--use-spine

Use the <spine> element from the OPF file to determine the order in which the HTML files are appended to the LRF. The .opf file must be in the same directory as the base HTML file.

--minimum-indent

Minimum paragraph indent (the indent of the first line of a paragraph) in pts. Default: 0

--font-delta

Increase the font size by 2 * FONT_DELTA pts and the line spacing by FONT_DELTA pts. FONT_DELTA can be a fraction. If FONT_DELTA is negative, the font size is decreased.
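
The arithmetic can be worked through with a small Python sketch. The starting values (a 10 pt font and 12 pt line spacing) are illustrative assumptions, and apply_font_delta is a hypothetical helper, not part of web2lrf.

```python
def apply_font_delta(font_size_pt, line_spacing_pt, font_delta):
    """Return the adjusted (font size, line spacing) in pts."""
    # Font size changes by twice the delta, line spacing by the delta itself.
    return font_size_pt + 2 * font_delta, line_spacing_pt + font_delta

larger = apply_font_delta(10.0, 12.0, 1.5)    # fractional, positive delta
smaller = apply_font_delta(10.0, 12.0, -0.5)  # negative delta shrinks the text
print(larger, smaller)
```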

--ignore-colors

Render all content as black on white instead of the colors specified by the HTML or CSS.

PAGE OPTIONS

--profile, -p

Profile of the target device for which this LRF is being generated. The profile determines things like the resolution and screen size of the target device. Default: prs500. Supported profiles: prs500

--left-margin

Left margin of page. Default is 20 px.

--right-margin

Right margin of page. Default is 20 px.

--top-margin

Top margin of page. Default is 10 px.

--bottom-margin

Bottom margin of page. Default is 0 px.

--render-tables-as-images

Render tables in the HTML as images (useful if the document has large or complex tables)

--text-size-multiplier-for-rendered-tables

Multiply the size of text in rendered tables by this factor. Default is 1.0

CHAPTER OPTIONS

--disable-chapter-detection

Prevent the automatic detection of chapters.

--chapter-regex

The regular expression used to detect chapter titles. It is searched for in heading tags (h1-h6). Defaults to chapter|book|appendix
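
To get a feel for what the default pattern accepts, the Python sketch below tests a few heading texts against it. Case-insensitive matching and a search anywhere in the heading are assumptions made for this example; is_chapter_title is a hypothetical helper.

```python
import re

# Default pattern from the option description; IGNORECASE is an assumption.
CHAPTER_REGEX = re.compile(r"chapter|book|appendix", re.IGNORECASE)

def is_chapter_title(heading_text):
    """Return True if the heading text contains a chapter keyword."""
    return CHAPTER_REGEX.search(heading_text) is not None

for text in ["Chapter 1: Beginnings", "Appendix A", "Introduction", "My Notebook"]:
    print(text, is_chapter_title(text))
```

Because the pattern is unanchored, a heading such as "My Notebook" also matches through the substring "book"; a stricter custom regular expression avoids this.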

--chapter-attr

Detect a chapter beginning at an element having the specified attribute. The format for this option is tagname regexp,attribute name,attribute value regexp. For example, to match all heading tags that have the attribute class="chapter" you would use "h\d,class,chapter". You can set the attribute to "none" to match only on tag names. So for example, to match all h2 tags, you would use "h2,none,". Default is $,,$
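
The three-part format can be modelled with a short Python sketch. Both functions are hypothetical, the heading pattern is assumed to be written h\d, and the choice of re.match for tag names and re.search for attribute values is an assumption rather than documented behavior.

```python
import re

def parse_attr_spec(spec):
    """Split 'tagname regexp,attribute name,attribute value regexp'."""
    tag_re, attr_name, value_re = spec.split(",", 2)
    return re.compile(tag_re), attr_name, re.compile(value_re)

def element_matches(spec, tag, attrs):
    """Return True if an element (tag name plus attribute dict) fits the spec."""
    tag_re, attr_name, value_re = parse_attr_spec(spec)
    if not tag_re.match(tag):
        return False
    # The special attribute name "none" means: match on the tag name only.
    if attr_name == "none":
        return True
    value = attrs.get(attr_name)
    return value is not None and value_re.search(value) is not None

print(element_matches(r"h\d,class,chapter", "h2", {"class": "chapter"}))
print(element_matches("h2,none,", "h2", {}))
print(element_matches("$,,$", "h1", {"class": "chapter"}))
```

Under these assumptions the default $,,$ never matches, since $ cannot match at the start of a non-empty tag name, which is consistent with the option being off by default.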

--page-break-before-tag

If html2lrf does not find any page breaks in the HTML file and cannot detect chapter headings, it will automatically insert page breaks before the tags whose names match this regular expression. Defaults to h[12]. You can disable it by setting the regexp to "$". The purpose of this option is to try to ensure that there are no really long pages, as these degrade the page turn performance of the LRF. Thus this option is ignored if the current page has only a few elements.

--force-page-break-before-tag

Force a page break before tags whose names match this regular expression.

--force-page-break-before-attr

Force a page break before an element having the specified attribute. The format for this option is tagname regexp,attribute name,attribute value regexp. For example, to match all heading tags that have the attribute class="chapter" you would use "h\d,class,chapter". Default is $,,$

--add-chapters-to-toc

Add detected chapters to the table of contents.

PREPROCESSING OPTIONS

--baen

Preprocess Baen HTML files to improve generated LRF.

--pdftohtml

You must add this option if processing files generated by pdftohtml, otherwise conversion will fail.

--book-designer

Use this option on html0 files from Book Designer.

FONT FAMILIES

Specify TrueType font families for serif, sans-serif and monospace fonts. These fonts will be embedded in the LRF file. Note that custom fonts lead to slower page turns. For example: --serif-family "Times New Roman"

--serif-family

The serif family of fonts to embed

--sans-family

The sans-serif family of fonts to embed

--mono-family

The monospace family of fonts to embed

DEBUG OPTIONS

--verbose

Be verbose while processing

--lrs

Convert to LRS