html5lib 0.10 Release Features
- Parses valid and invalid HTML documents to a tree
- Support for minidom, ElementTree (including cElementTree and lxml.etree), BeautifulSoup and custom simpletree output formats
- DOM to SAX converter
- Reports parse errors
- Character encoding detection
- XML mode for working with ill-formed XML, e.g. feeds
- Filtering and serializing of trees
- HTML+CSS sanitizer
- Many unit tests
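The DOM to SAX converter listed above takes a tree that has already been built and replays it as a stream of SAX events. Here is a minimal sketch of that idea using only Python's standard library (`xml.dom.minidom` and `xml.sax`). It is not html5lib's own implementation. The `dom_to_sax` function and the `Recorder` handler are hypothetical names chosen for illustration.

```python
from xml.dom import minidom, Node
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


def dom_to_sax(node, handler):
    """Replay a minidom node (and its descendants) as SAX events on handler."""
    if node.nodeType == Node.DOCUMENT_NODE:
        handler.startDocument()
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endDocument()
    elif node.nodeType == Node.ELEMENT_NODE:
        # SAX expects an Attributes object, not a DOM NamedNodeMap.
        attrs = AttributesImpl(dict(node.attributes.items()))
        handler.startElement(node.tagName, attrs)
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endElement(node.tagName)
    elif node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        handler.characters(node.data)
    elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        handler.processingInstruction(node.target, node.data)
    # Comments are dropped: ContentHandler has no comment callback.


class Recorder(ContentHandler):
    """Hypothetical handler that records the events it receives."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


doc = minidom.parseString('<p class="x">hi <b>there</b></p>')
rec = Recorder()
dom_to_sax(doc, rec)
```

You can pass any `ContentHandler` in place of `Recorder`. For example, `xml.sax.saxutils.XMLGenerator` would write the tree back out as markup.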
Known Issues (0.10)
- Python 2.3 users will fail several encoding-related tests unless they install the cjkcodecs module
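Before running the test suite, you can check which CJK codecs your Python installation actually provides. The sketch below does this with `codecs.lookup`, which raises `LookupError` for unknown encodings. The codec list is an assumption: it is a handful of common CJK encodings, not necessarily the exact set that html5lib's tests use. The `missing_codecs` helper is a hypothetical name. CJKCodecs was merged into the standard library in Python 2.4, so this check mainly matters on 2.3.

```python
import codecs

# Assumed list of common CJK codecs; not taken from html5lib's test suite.
CJK_CODECS = ["big5", "gb2312", "gbk", "euc_jp", "shift_jis", "euc_kr", "iso2022_jp"]


def missing_codecs(names):
    """Return the codec names that this Python cannot look up."""
    missing = []
    for name in names:
        try:
            codecs.lookup(name)
        except LookupError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_codecs(CJK_CODECS)
    if missing:
        print("missing codecs: " + ", ".join(missing))
    else:
        print("all CJK codecs available")
```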