Thursday, October 25, 2007

html5lib 0.10 release

A Ruby/Python HTML parser/tokenizer that follows the WHATWG HTML5 specification for maximum compatibility with major desktop web browsers.

0.10 Release Features

  • Parses valid and invalid HTML documents to a tree
  • Support for minidom, ElementTree (including cElementTree and lxml.etree), BeautifulSoup and custom simpletree output formats
  • DOM to SAX converter
  • Reports parse errors
  • Character encoding detection
  • XML mode for working with ill-formed XML, e.g. feeds
  • Filtering and serializing of trees
  • HTML+CSS sanitizer
  • Many unit tests


Known Issues (0.10)

  • Python 2.3 users will see several encoding-related tests fail unless they install the cjkcodecs module

