HTML -> EPUB Converter
This wiki page is for discussing the design of the HTML -> EPUB converter. The main design goals are listed below.
Design Goals
- Recursive: Must follow all local links in the HTML file to a user-controllable depth.
- Rationalize font sizes: The user should be able to rescale font sizes by specifying a new base font size.
- Auto detect chapters: Detection should be arbitrarily powerful (based on XPath). Actions based on detection should include inserting page breaks (or ruled lines) and automatically adding the detected chapters to the TOC.
- Should be capable of serving as the backend for epub2epub
- Should convert pixel-based lengths to points (pt) using a user-specified DPI
- Should have convenience methods for setting margins and line spacing rather than forcing the user to use --override-css
- Should be written in as EPUB-independent a way as possible, so that most of the code can be reused for html2mobi or other formats. Basically, the HTML traversal code should be output-format neutral.
- Should have an option to split up an HTML file into individual chapter files (duplicating the nesting structure) to handle the limitations of DE.
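As a rough illustration of the pixel-to-point and font-rescaling goals above, here is a minimal Python sketch. The function names (px_to_pt, convert_px_lengths, rescale_font_size) are hypothetical and not part of the converter. The linear rescaling rule is an assumption, since the goals do not say how existing sizes should map onto the new base size.

```python
import re

POINTS_PER_INCH = 72.0  # one point is 1/72 of an inch


def px_to_pt(px, dpi):
    """Convert a pixel length to points at the given (user-specified) DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return px * POINTS_PER_INCH / dpi


# Matches lengths such as "16px" or "1.5px" inside a CSS string.
_PX_LENGTH = re.compile(r"(\d+(?:\.\d+)?)px\b")


def convert_px_lengths(css, dpi):
    """Rewrite every 'Npx' length in a CSS string as the equivalent 'Npt'."""
    def repl(match):
        return "%gpt" % px_to_pt(float(match.group(1)), dpi)
    return _PX_LENGTH.sub(repl, css)


def rescale_font_size(size_pt, old_base_pt, new_base_pt):
    """Scale a font size linearly so that old_base_pt maps to new_base_pt.

    Assumption: a simple proportional rule; the real converter may differ.
    """
    if old_base_pt <= 0:
        raise ValueError("old_base_pt must be positive")
    return size_pt * new_base_pt / old_base_pt
```

For example, convert_px_lengths("margin: 16px", 96) would yield "margin: 12pt", because 16 pixels at 96 DPI is one sixth of an inch. Keeping these helpers free of any EPUB-specific logic fits the goal of output-format-neutral traversal code.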
Discussion
Feel free to add your suggestions to this area. If I like them, I'll move them up to Design Goals.
- Accept a list of files and package them with a TOC, regardless of any links between them. (This is implied by the requirement of being able to serve as the backend for epub2epub, since an EPUB is basically a list of files with a TOC.)
- Interesting use-case: http://www.huxley.net/bnw/index.html
The page contains the first chapter, with links to pages with the other chapters. There is no pattern to the chapter URLs. I can either let the tool crawl the page and figure things out, or download the chapters myself and order them for the tool. (By default it will add all links on the first HTML file it crawls to the TOC, so this case should be handled pretty seamlessly. You can also manually control what goes into the TOC by creating an OPF file.)
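The default behaviour described here (every local link on the first HTML file becomes a TOC entry) could be sketched roughly as follows, using only the Python standard library. The names LinkCollector, is_local and toc_from_first_page are hypothetical, and treating "local" as "no scheme and no host" is an assumption rather than the converter's actual rule.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urlsplit


def is_local(href):
    """Treat a link as local if it has a path but no scheme or host (assumption)."""
    parts = urlsplit(href)
    return not parts.scheme and not parts.netloc and bool(parts.path)


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for local <a> links in document order."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and is_local(href):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.entries.append((self._href, text))
            self._href = None


def toc_from_first_page(html):
    """Return TOC entries as (target file, title), one per distinct target."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    seen = set()
    toc = []
    for href, text in collector.entries:
        target = urldefrag(href)[0]  # drop any "#fragment"
        if target not in seen:
            seen.add(target)
            toc.append((target, text or target))
    return toc
```

Because entries are kept in document order, a page like the one above, whose chapter URLs follow no pattern, would still produce a TOC in reading order as long as the links appear in that order on the first page.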
