Unit tidyobj


Description

Delphi/Kylix/FreePascal wrapper for TidyLib

The tidyobj unit provides a high-level, object-oriented wrapper around HTML-Tidy's TidyLib library.

The central component of this unit is the TTidy object, but the unit also imports the complete TidyLib API bindings, and provides additional features for traversing and querying a TidyDoc document tree.

Note to Lazarus users:
You should use the laztidy unit for Lazarus projects. It offers identical functionality to tidyobj, but provides some special $DEFINEs and initialization needed for the Lazarus component.

To avoid duplication of effort, the core TidyLib bindings are not documented here. Consult the TidyLib API documentation on the HTML-Tidy website for this information.

Overview

Classes, Interfaces, Objects and Records

Name Description
Class TTidy The main component of the TidyPas wrapper.
Class tNodeInfo Dynamic structure used to pass information to a ForEachNode callback.
Class tTidyBus Class to facilitate minor editing operations on a TidyDoc.

Functions and Procedures

procedure ForEachAttr(const info:tNodeInfo;callback:tAttrCallback;data:pointer);
procedure ForEachNode(doc:pTidyDoc;start:pTidyNode;callback:tNodeCallback;data:pointer);
procedure ForEachTag(doc:pTidyDoc;start:pTidyNode;tag:TidyTagID;cb:tNodeCallback;data:pointer);
procedure ForEachTagInTags(doc:pTidyDoc;start:pTidyNode;tags:tTidyTagSet;cb:tNodeCallback;data:pointer);
function TagNameToTagID(name:pChar):TidyTagID;
function AttrNameToAttrID(name:pChar):TidyAttrID;
function FindAncestor(anode:pTidyNode;tag:TidyTagID;attr:TidyAttrID):pTidyNode;
procedure tidyBufLineEnd(buf:pTidyBuffer;eol:TidyLineEnding);
procedure EntsToChars(src:pChar); overload;
procedure EntsToChars(var src:ansistring); overload;
function RowColToOffset(src:pChar;row,col:Cardinal;LineBreak:pChar):pChar;
function FindTag(haystack:pChar;needle:pChar):pChar;
procedure ForEachTagStr(src:pChar;cb:tForEachTagStrCallback;user_data:pointer);
function FormatTidyReportMessage(level:TidyReportLevel;line:uint;col:uint;msg:ctmbstr):ansistring;

Types

tTidyTagSet=set of TidyTagID;
tEndTagRule = (...);
tAttrCallback = procedure(const info:tNodeInfo;name:pChar;value:pChar;user_data:pointer);
tNodeCallback = procedure(const info:tNodeInfo;user_data:pointer);
tTagNameStr=array[0..17] of char;
tTidyBusInsertMode = (...);
tTidyBusCopyMode = (...);
TidyEncodingID = (...);
tTidyInputCompleteEvent = procedure(sender:TObject;buf:Pchar;len:Cardinal)of object;
tTidyInputCompleteCallback = procedure(sender:TObject;buf:pTidyBuffer);
tTidyReportEvent = procedure(sender:TObject;level:TidyReportLevel;line, col:Cardinal;msg:String)of object;
tForEachTagStrCallback = procedure(tag:pChar;var proceed:boolean;user_data:pointer);

Constants

ElementsWithoutEndTags: set of TidyTagID = [...];
ElementsWithOptionalEndTags: set of TidyTagID = [...];
TypesWithoutEndTags: set of TidyNodeType = [...];
BlockLevelTags: set of TidyTagID = [...];
TAG_BUF_SIZE=256;
TagNames:array[TidyTag_A..TidyTag_XMP] of tTagNameStr=();
AttrNames:array[TidyAttr_ABBR..TidyAttr_URN] of tTagNameStr=();
TIDY_NULL_FILE='/dev/null';

Description

Functions and Procedures

procedure ForEachAttr(const info:tNodeInfo;callback:tAttrCallback;data:pointer);

ForEachAttr() will call your tAttrCallback once for each attribute of the node specified in the given tNodeInfo.

The data parameter is an arbitrary pointer that will be passed to the callback as the user_data argument.
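As a rough sketch of how the two callback types fit together, the fragment below lists the attributes of every node a traversal visits. It uses only the signatures documented on this page; PrintAttr and DumpAttrs are hypothetical names, and the nil check on value is a defensive assumption, since this page does not say whether value can be nil.

```pascal
uses tidyobj;

// Hypothetical attribute callback: prints one attribute as name="value".
procedure PrintAttr(const info: tNodeInfo; name: pChar; value: pChar; user_data: pointer);
begin
  if value <> nil then
    WriteLn(name, '="', value, '"')
  else
    WriteLn(name);  // assumption: value may be nil for a bare attribute
end;

// Hypothetical node callback: forwards the node it receives to ForEachAttr().
procedure DumpAttrs(const info: tNodeInfo; user_data: pointer);
begin
  ForEachAttr(info, @PrintAttr, nil);  // no user data needed here
end;
```

DumpAttrs can then be passed as the callback to ForEachNode() or ForEachTag().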

procedure ForEachNode(doc:pTidyDoc;start:pTidyNode;callback:tNodeCallback;data:pointer);

ForEachNode() iterates over the start node and its following siblings.

Your tNodeCallback is called once for each node at the same level as start, with the fields of the tNodeInfo object set to reflect the current TidyNode.

To recurse into the child nodes of a node, you must call ForEachNode() again from within your callback, e.g.

  procedure MyNodeCallback(const info:tNodeInfo; user_data:pointer);
  begin
    ForEachNode(info.doc, info.child, @MyNodeCallback, user_data);
  end; 
 

Setting info.Proceed to False from within your callback will cause ForEachNode to cancel further iteration.
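A minimal sketch of cancelling iteration, assuming nothing beyond the signatures on this page: the callback receives a pointer to an Integer "budget" through user_data and clears info.Proceed once the budget is used up. VisitAtMost and VisitFirstThree are hypothetical names.

```pascal
uses tidyobj;

// Hypothetical callback: user_data points to an Integer holding the
// number of nodes that may still be visited.
procedure VisitAtMost(const info: tNodeInfo; user_data: pointer);
begin
  Dec(PInteger(user_data)^);
  if PInteger(user_data)^ <= 0 then
    info.Proceed := False;  // ask ForEachNode() to stop after this node
end;

// Visits at most three nodes: start and up to two of its siblings.
procedure VisitFirstThree(doc: pTidyDoc; start: pTidyNode);
var
  budget: Integer;
begin
  budget := 3;
  ForEachNode(doc, start, @VisitAtMost, @budget);
end;
```

This sketch does not recurse into children, because this page does not state whether clearing Proceed in a nested ForEachNode() call also stops the outer one.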

procedure ForEachTag(doc:pTidyDoc;start:pTidyNode;tag:TidyTagID;cb:tNodeCallback;data:pointer);

ForEachTag() will traverse ALL elements of the document (including children), starting at the specified pTidyNode.

Your callback will be called once for each node that matches the specified TAG.

If the TAG argument is set to TidyTag_UNKNOWN, all tags will match.
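For illustration, the following sketch counts the A elements below a given node by passing a counter through the data pointer. CountNode and CountLinks are hypothetical names; only ForEachTag(), tNodeCallback and TidyTag_A come from this page.

```pascal
uses tidyobj;

// Hypothetical callback: user_data points to an Integer counter.
procedure CountNode(const info: tNodeInfo; user_data: pointer);
begin
  Inc(PInteger(user_data)^);
end;

// Returns how many times ForEachTag() reports an A element,
// starting the traversal at the given node.
function CountLinks(doc: pTidyDoc; start: pTidyNode): Integer;
var
  n: Integer;
begin
  n := 0;
  ForEachTag(doc, start, TidyTag_A, @CountNode, @n);
  CountLinks := n;
end;
```

Passing TidyTag_UNKNOWN instead of TidyTag_A would count every tag the traversal reaches.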

procedure ForEachTagInTags(doc:pTidyDoc;start:pTidyNode;tags:tTidyTagSet;cb:tNodeCallback;data:pointer);

ForEachTagInTags() is basically the same as the ForEachTag() procedure, except that the callback is called for any node whose tag is a member of the tags argument.
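The same counting idea carries over to a set of tags. This sketch passes the BlockLevelTags constant from this unit as the tags argument; CountHit and CountBlocks are hypothetical names.

```pascal
uses tidyobj;

// Hypothetical callback: user_data points to an Integer counter.
procedure CountHit(const info: tNodeInfo; user_data: pointer);
begin
  Inc(PInteger(user_data)^);
end;

// Counts the nodes whose tag is a member of BlockLevelTags.
function CountBlocks(doc: pTidyDoc; start: pTidyNode): Integer;
var
  n: Integer;
begin
  n := 0;
  ForEachTagInTags(doc, start, BlockLevelTags, @CountHit, @n);
  CountBlocks := n;
end;
```

Any other set of TidyTagID values, such as [TidyTag_A, TidyTag_PRE], can be passed in the same way.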

function TagNameToTagID(name:pChar):TidyTagID;

Converts a tag name to a TidyTagID. Returns TidyTag_UNKNOWN if the string is not a default HTML-4 tag name. Note that the match is NOT case-sensitive.
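A short usage sketch, based only on the behaviour described above. The program name and the string 'nosuchtag' are made up for the example.

```pascal
program tagiddemo;

uses tidyobj;

begin
  // The match is case-insensitive, so both spellings should compare equal.
  if TagNameToTagID('IMG') = TagNameToTagID('img') then
    WriteLn('IMG and img map to the same TidyTagID');

  // A name that is not an HTML-4 tag should yield TidyTag_UNKNOWN.
  if TagNameToTagID('nosuchtag') = TidyTag_UNKNOWN then
    WriteLn('nosuchtag is not a known tag');
end.
```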

function AttrNameToAttrID(name:pChar):TidyAttrID;

Converts an attribute name to a TidyAttrID. Returns TidyAttr_UNKNOWN if the string is not a default HTML-4 attribute name. Note that the match is NOT case-sensitive.

function FindAncestor(anode:pTidyNode;tag:TidyTagID;attr:TidyAttrID):pTidyNode;

FindAncestor() searches the parental hierarchy of the specified node for an element of the type specified in the TAG argument that carries the attribute specified by the ATTR argument. To ignore either the TAG or the ATTR criterion, pass TidyTag_UNKNOWN or TidyAttr_UNKNOWN, respectively. Returns NIL if it can't find a higher-level container that matches the criteria.

This might be useful, for example, if you want to know whether an image is "hot" (that is, enclosed in a link):

      if tidyNodeIsIMG(some_node)
      and ( FindAncestor(some_node, TidyTag_A, TidyAttr_HREF) <> nil )
      then ... 

Or perhaps if you want to know if a node contains preformatted text:

      if tidyNodeHasText(some_doc, some_node)
      and ( FindAncestor(some_node, TidyTag_PRE, TidyAttr_UNKNOWN) <> nil )
      then ... 

Or maybe you want to know if an element inherits any STYLE attributes:

      if ( FindAncestor(some_node, TidyTag_UNKNOWN, TidyAttr_STYLE) <> nil )
      then ... 

 

procedure tidyBufLineEnd(buf:pTidyBuffer;eol:TidyLineEnding);

Appends the specified EOL to the buffer. Specify one of: TidyLF for Unix, TidyCRLF for MS-DOS/Windows, or TidyCR for Macintosh.

procedure EntsToChars(src:pChar); overload;

EntsToChars() tries to convert most HTML entities to their respective ASCII characters.

This procedure modifies the input string in place. It does not re-allocate anything.

procedure EntsToChars(var src:ansistring); overload;

Same as the EntsToChars(src:pChar) procedure, overloaded to use AnsiString instead of pChar.
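A minimal usage sketch of the AnsiString overload. The program name and the sample text are made up, and no particular output is claimed, since this page only says that "most" entities are converted.

```pascal
program entsdemo;

uses tidyobj;

var
  s: ansistring;
begin
  s := 'Fish &amp; chips &lt;today&gt;';
  EntsToChars(s);  // s is a var parameter and is updated by the call
  WriteLn(s);
end.
```

Because the pChar overload edits its argument in place, it should only be given a writable buffer, never a string literal or constant.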

function RowColToOffset(src:pChar;row,col:Cardinal;LineBreak:pChar):pChar;

RowColToOffset() returns a pointer into SRC at the location specified by ROW and COL. The LineBreak argument specifies the type of line breaks to expect in SRC, and should be one of #10, #13#10, or #13 (for Unix, DOS, or Mac, respectively).

ROW and COL are both one-based, that is, the first character in SRC is (1,1).

Returns NIL if the calculated offset of ROW and COL is out of range (past the end of SRC).

Note that this function is implemented by TidyPas, not by the TidyLib library.
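The sketch below looks up a position in a two-line buffer that uses DOS line breaks. The program name and sample text are made up. Position (2,6) is the "t" of "two" under the one-based convention described above, so the returned pointer is expected to point there.

```pascal
program rowcoldemo;

uses tidyobj;

const
  // Two lines, each terminated by CR+LF.
  Src: pChar = 'line one'#13#10'line two'#13#10;
var
  p: pChar;
begin
  p := RowColToOffset(Src, 2, 6, #13#10);
  if p <> nil then
    WriteLn('Text from (2,6): ', p)
  else
    WriteLn('(2,6) is past the end of the buffer');

  // A row beyond the end of the text should give NIL.
  if RowColToOffset(Src, 99, 1, #13#10) = nil then
    WriteLn('(99,1) is out of range');
end.
```

This is handy for turning the line and column numbers of a Tidy report message into a pointer into the original source text.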

function FindTag(haystack:pChar;needle:pChar):pChar;

FindTag() scans the haystack for the next tag that matches needle (case-insensitive match). The needle must start with < or </ followed by a tag name. The tag name should start with an alpha char, and be null-terminated with no trailing whitespace. Returns a pointer into the haystack where the match is found, or NIL otherwise.

Note that this function is implemented by TidyPas, not by the TidyLib library.
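A sketch of scanning a buffer for every opening A tag. The program name and the sample markup are made up. Restarting the search at hit + 1 assumes only that the returned pointer lies at or after the start of the matched tag.

```pascal
program findtagdemo;

uses tidyobj;

const
  Html: pChar = '<p>See <A HREF="a.html">one</A> and <a href="b.html">two</a></p>';
var
  hit: pChar;
  n: Integer;
begin
  n := 0;
  hit := FindTag(Html, '<a');       // needle: "<" plus tag name, no trailing space
  while hit <> nil do
  begin
    Inc(n);
    hit := FindTag(hit + 1, '<a');  // continue just past the previous match
  end;
  WriteLn('Opening A tags found: ', n);
end.
```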

procedure ForEachTagStr(src:pChar;cb:tForEachTagStrCallback;user_data:pointer);

ForEachTagStr() will call your tForEachTagStrCallback once for each markup tag it finds in src.

Note that this procedure is implemented by TidyPas, not by the TidyLib library.
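As a small sketch, the program below counts the markup tags in a string by passing a counter through user_data. The program name, CountTagStr and the sample markup are made up. The callback leaves proceed untouched, because this page does not spell out its exact semantics.

```pascal
program tagstrdemo;

uses tidyobj;

// Hypothetical callback: user_data points to an Integer counter.
procedure CountTagStr(tag: pChar; var proceed: boolean; user_data: pointer);
begin
  Inc(PInteger(user_data)^);
end;

const
  Html: pChar = '<html><body><p>Hello</p></body></html>';
var
  n: Integer;
begin
  n := 0;
  ForEachTagStr(Html, @CountTagStr, @n);
  WriteLn('Callback invocations: ', n);
end.
```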

function FormatTidyReportMessage(level:TidyReportLevel;line:uint;col:uint;msg:ctmbstr):ansistring;

Convenience function to format the information passed by the TidyReport filter callback (or the TTidy OnReport event) into a single string.
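One way this might be used is inside a handler that matches the tTidyReportEvent type. The sketch below is an assumption-laden illustration: tReportLogger and HandleReport are hypothetical names, the cast of msg to pChar assumes that ctmbstr is a pChar-compatible type, and the layout of the returned string is not specified on this page.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses tidyobj;

type
  // Hypothetical helper class whose method matches tTidyReportEvent.
  tReportLogger = class
    procedure HandleReport(sender: TObject; level: TidyReportLevel;
      line, col: Cardinal; msg: String);
  end;

procedure tReportLogger.HandleReport(sender: TObject; level: TidyReportLevel;
  line, col: Cardinal; msg: String);
begin
  // Collapse level, position and text into one line and print it.
  WriteLn(FormatTidyReportMessage(level, line, col, pChar(msg)));
end;
```

An instance's HandleReport method could then be assigned to the OnReport event of a TTidy object.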

Types

tTidyTagSet=set of TidyTagID;

Set type used to describe an arbitrary group of TidyTagID's

tEndTagRule = (...);

Enumerated type to describe the HTML-4 rules for closing a certain element.

Values
  • etRequired: The element is required to have a closing tag.
  • etOptional: The element may optionally have a closing tag.
  • etForbidden: The element can never have a closing tag.
  • etUndefined: Internal semaphore, denotes that the value has not been initialized.

tAttrCallback = procedure(const info:tNodeInfo;name:pChar;value:pChar;user_data:pointer);

Procedural type used as a callback by the ForEachAttr() procedure.

tNodeCallback = procedure(const info:tNodeInfo;user_data:pointer);

Procedural type used as a callback by the ForEachNode(), ForEachTag() and ForEachTagInTags() procedures.

tTagNameStr=array[0..17] of char;

Internal type, used to define constants for element and attribute names.

tTidyBusInsertMode = (...);

Enumerated type to control where the "Cargo" text of a TTidyBus will be inserted.

Values
  • imInsertBefore: Cargo is inserted before TargetNode
  • imInsertAfter: Cargo is inserted after TargetNode
  • imBeforeFirstChild: Cargo is inserted before first child of TargetNode
  • imAfterLastChild: Cargo is inserted after last child of TargetNode
  • imReplaceNode: TargetNode is replaced by Cargo
  • imReplaceContent: Content (children) of TargetNode replaced by Cargo

tTidyBusCopyMode = (...);

Enumerated type to control how the SourceNode of a TTidyBus will be updated.

Values
  • cmMove: SourceNode is deleted and Cargo is moved to TargetNode.
  • cmCopy: SourceNode is preserved and Cargo is copied to TargetNode.
  • cmEdit: SourceNode is replaced by Cargo. ( TargetNode and InsertMode are ignored. )
  • cmDelete: SourceNode is deleted. ( Cargo, TargetNode, and InsertMode are ignored. )

TidyEncodingID = (...);

Enumerated type used to control the input and output encoding of a TTidy document.

Values
  • TidyRaw:
  • TidyASCII:
  • TidyLatin1:
  • TidyUTF8:
  • TidyISO2022:
  • TidyMacRoman:
  • TidyWin1252:
  • TidyUTF16le:
  • TidyUTF16be:
  • TidyUTF16:
  • TidyBig5:
  • TidyShiftJIS:

tTidyInputCompleteEvent = procedure(sender:TObject;buf:Pchar;len:Cardinal)of object;

Procedural type used by TTidy's OnInputComplete event.

tTidyInputCompleteCallback = procedure(sender:TObject;buf:pTidyBuffer);

Procedural type used by TTidy's InputComplete callback.

tTidyReportEvent = procedure(sender:TObject;level:TidyReportLevel;line, col:Cardinal;msg:String)of object;

Procedural type used by TTidy's OnReport event.

tForEachTagStrCallback = procedure(tag:pChar;var proceed:boolean;user_data:pointer);

Procedural type used as a callback by the ForEachTagStr procedure.

Constants

ElementsWithoutEndTags: set of TidyTagID = [...];

Elements that can never have a closing </tag>

ElementsWithOptionalEndTags: set of TidyTagID = [...];

Elements that can optionally have a closing </tag>
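To show how these set constants relate to the tEndTagRule type, here is a sketch of a lookup built from ordinary set-membership tests. EndTagRuleFor is a hypothetical helper, not part of the unit, and mapping unknown names to etUndefined is a choice made for this example.

```pascal
uses tidyobj;

// Hypothetical helper: classifies a tag name using the two set constants.
function EndTagRuleFor(name: pChar): tEndTagRule;
var
  id: TidyTagID;
begin
  id := TagNameToTagID(name);
  if id = TidyTag_UNKNOWN then
    EndTagRuleFor := etUndefined   // not an HTML-4 tag name
  else if id in ElementsWithoutEndTags then
    EndTagRuleFor := etForbidden   // can never have a closing tag
  else if id in ElementsWithOptionalEndTags then
    EndTagRuleFor := etOptional    // closing tag may be omitted
  else
    EndTagRuleFor := etRequired;   // every other known element
end;
```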

TypesWithoutEndTags: set of TidyNodeType = [...];

TidyNodeType's that can never have a closing </tag>

BlockLevelTags: set of TidyTagID = [...];

TidyTagID's that define "block level" tags. Generally, these are the tags that would cause a web browser to insert an implicit new line before and/or after the element.

TAG_BUF_SIZE=256;

Used by some of the routines to declare static buffers for parsing tag names and attribute names.

If you should ever need to modify the TagNames[] or AttrNames[] constants, be sure you don't add a string that is longer than TAG_BUF_SIZE-1 unless you increase the value of TAG_BUF_SIZE accordingly.

I doubt that anyone will have a need for names longer than 255 chars, but just in case, you have been warned!

TagNames:array[TidyTag_A..TidyTag_XMP] of tTagNameStr=();

Names of standard HTML-4 elements

AttrNames:array[TidyAttr_ABBR..TidyAttr_URN] of tTagNameStr=();

Names of standard HTML-4 attributes

TIDY_NULL_FILE='/dev/null';

Describes the system's "trash can" file, e.g. /dev/null on Linux, or 'NUL' on MS-Windows.

Author

Created

December 9, 2005

Last Modified

December 9, 2005


Generated by PasDoc 0.10.0 on 2005-12-23 21:07:55
