Delphi DOM HTML parser and converter

initialization

Today more and more email messages are formatted as HTML. For me, email is a plain-text medium (with attachments), and I don't use a WebBrowser or Mozilla object as the message viewer in my email client. So I wrote a quick and dirty HTML parser instead. It seems good enough for parsing email messages, but some work is needed to turn it into a more general-purpose HTML parser.

To-Do list:

interface

TDocument, TNode, TElement, TAttr, etc. implement DOM Level 2 Core.

THtmlParser produces a TDocument from an HTML string:

  function parseString(const HtmlStr: TDomString): TDocument;

THtmlParser uses THtmlReader, an event-driven, SAX-like interface.
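To illustrate what "event-driven, SAX-like" means here, the following is a minimal, self-contained sketch of the pattern. It is not the actual THtmlReader: TMiniReader, TTagCollector, and the OnStartTag event are hypothetical names invented for this example, and the scanner only recognises start-tag names. The point is that the reader walks the input once and fires a callback per tag instead of building a tree; a parser can subscribe to such events and build DOM nodes from them.

```pascal
program SaxSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  // Hypothetical event signature: fired once for every start tag found.
  TTagEvent = procedure(const TagName: string) of object;

  // Hypothetical minimal reader; NOT the real THtmlReader.
  TMiniReader = class
  private
    FOnStartTag: TTagEvent;
  public
    procedure Read(const Html: string);
    property OnStartTag: TTagEvent read FOnStartTag write FOnStartTag;
  end;

  // Hypothetical consumer that records the tag names it is notified about.
  TTagCollector = class
  public
    Tags: string;
    procedure HandleStartTag(const TagName: string);
  end;

procedure TMiniReader.Read(const Html: string);
var
  I, J: Integer;
begin
  I := 1;
  while I <= Length(Html) do
  begin
    // A start tag is '<' immediately followed by a letter.
    if (Html[I] = '<') and (I < Length(Html)) and
       (Html[I + 1] in ['a'..'z', 'A'..'Z']) then
    begin
      // Scan to the end of the tag name.
      J := I + 1;
      while (J <= Length(Html)) and
            (Html[J] in ['a'..'z', 'A'..'Z', '0'..'9']) do
        Inc(J);
      // Notify the subscriber, if any, with the lower-cased tag name.
      if Assigned(FOnStartTag) then
        FOnStartTag(LowerCase(Copy(Html, I + 1, J - I - 1)));
      I := J;
    end
    else
      Inc(I);
  end;
end;

procedure TTagCollector.HandleStartTag(const TagName: string);
begin
  Tags := Tags + TagName + ' ';
end;

var
  Reader: TMiniReader;
  Collector: TTagCollector;
begin
  Reader := TMiniReader.Create;
  Collector := TTagCollector.Create;
  try
    Reader.OnStartTag := Collector.HandleStartTag;
    Reader.Read('<html><body><p>Hi <b>there</b></p></body></html>');
    // Prints the start-tag names in document order.
    WriteLn(Collector.Tags);
  finally
    Collector.Free;
    Reader.Free;
  end;
end.
```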

To convert a DOM tree to plain text or HTML, use TTextFormatter or THtmlFormatter, respectively.
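Putting the pieces together, a typical HTML-to-text conversion might look like the sketch below. Only THtmlParser, parseString, TDocument, TDomString, and TTextFormatter come from the description above. The unit names (DomCore, HtmlParser, Formatter), the parameterless constructors, the formatter method getText, and the assumption that the caller owns the returned document are guesses made for illustration, so check them against the actual sources before relying on them.

```pascal
program HtmlToText;

{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  DomCore,     // assumed unit name: TDocument, TDomString
  HtmlParser,  // assumed unit name: THtmlParser
  Formatter;   // assumed unit name: TTextFormatter, THtmlFormatter

function HtmlToPlainText(const Html: TDomString): TDomString;
var
  Parser: THtmlParser;
  Doc: TDocument;
  Fmt: TTextFormatter;
begin
  Parser := THtmlParser.Create;
  try
    // Build the DOM tree from the HTML source.
    Doc := Parser.parseString(Html);
    try
      Fmt := TTextFormatter.Create;
      try
        // getText is an assumed method name for "render this document".
        Result := Fmt.getText(Doc);
      finally
        Fmt.Free;
      end;
    finally
      // Assumption: the caller owns the document returned by parseString.
      Doc.Free;
    end;
  finally
    Parser.Free;
  end;
end;

begin
  WriteLn(HtmlToPlainText('<p>Hello, <b>world</b>!</p>'));
end.
```

Swapping TTextFormatter for THtmlFormatter should, under the same assumptions, serialize the tree back to HTML instead of plain text.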

implementation

The parser is implemented as several modules:

finalization

You can download the project modules as a zip archive. This is a small project, so only the latest (always stable ;-) version is available.

