Class LagartoDOMBuilderTagVisitor
java.lang.Object
jodd.lagarto.dom.LagartoDOMBuilderTagVisitor
- All Implemented Interfaces:
TagVisitor
Lagarto tag visitor that builds a DOM tree.
It does not (yet) build the tree fully according to the HTML specs;
however, it works well enough for any sane HTML out there.
In the default mode, the tree builder does not change
the order of the elements, so the returned tree reflects
the input. So if the input contains crazy stuff, the tree will
be weird, too :)
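The idea of a visitor that appends nodes in the order events arrive can be sketched with plain Java. This is only an illustration under stated assumptions: `SketchNode`, `SketchTreeBuilder`, `openTag` and `closeTag` are hypothetical names and are not part of Lagarto; only `start`, `text` and `end` echo the callback names listed on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified node type; not the Lagarto Node class.
class SketchNode {
    final String name;            // tag name, or "#text" for text nodes
    final String value;           // text content, null for elements
    final SketchNode parent;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name, String value, SketchNode parent) {
        this.name = name;
        this.value = value;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);   // children keep arrival order
        }
    }

    // Serializes the subtree so the resulting order is easy to inspect.
    String dump() {
        if (value != null) {
            return value;
        }
        StringBuilder sb = new StringBuilder("<" + name + ">");
        for (SketchNode child : children) {
            sb.append(child.dump());
        }
        return sb.append("</").append(name).append(">").toString();
    }
}

// Hypothetical visitor: receives parse events and appends nodes as they arrive.
class SketchTreeBuilder {
    SketchNode root;
    SketchNode current;

    void start() {
        root = new SketchNode("#document", null, null);
        current = root;
    }

    void openTag(String name) {
        current = new SketchNode(name, null, current);
    }

    void closeTag(String name) {
        // Naive: only closes when the name matches the current open element.
        if (current != root && current.name.equals(name)) {
            current = current.parent;
        }
    }

    void text(String text) {
        new SketchNode("#text", text, current);
    }

    void end() {
        current = root;   // anything still open is simply treated as closed
    }
}
```

Feeding the events for `<div>a<b>c</b></div>` into such a builder yields a tree whose child order matches the input exactly, which is the property described above.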
-
Field Summary
Fields
Modifier and Type, Field: Description
protected final LagartoDOMBuilder domBuilder
protected boolean enabled: While enabled, nodes will be added to the DOM tree.
protected HtmlCCommentExpressionMatcher htmlCCommentExpressionMatcher
protected HtmlVoidRules htmlVoidRules
protected final HtmlImplicitClosingRules implRules
private static final org.slf4j.Logger log
protected Node parentNode
protected Document rootNode
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method: Description
void cdata(CharSequence cdata): Invoked on CDATA sequence.
void comment(CharSequence comment): Invoked on comment.
void condComment(CharSequence expression, boolean isStartingTag, boolean isHidden, boolean isHiddenEndTag): Invoked on IE conditional comment.
protected Element createElementNode(Tag tag): Creates new element with correct configuration.
void doctype: Invoked on DOCTYPE directive.
void end(): Finishes the tree building.
void error: Actually collects and logs the error messages.
protected boolean errorEnabled(): Returns true if error logging or collecting is enabled.
protected Node findMatchingParentOpenTag(String tagName): Finds matching parent open tag or null if not found.
protected void fixUnclosedTagsUpToMatchingParent(Tag tag, Node matchingParent): Fixes all unclosed tags up to matching parent.
getDocument: Returns root document node of parsed DOM tree.
protected void removeLastChildNodeIfEmptyText(Node parentNode, boolean closedTag): Removes last child node if it contains just empty text.
void script(Tag tag, CharSequence body): Invoked on script tag.
void start(): Starts with DOM building.
void tag: Visits tags.
void text(CharSequence text): Invoked on text i.e. anything other than a tag.
void xml(CharSequence version, CharSequence encoding, CharSequence standalone): Invoked on xml declaration.
-
Field Details
-
log
private static final org.slf4j.Logger log -
domBuilder
-
implRules
-
htmlVoidRules
-
rootNode
-
parentNode
-
enabled
protected boolean enabled
While enabled, nodes will be added to the DOM tree. Useful for skipping some tags.
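How a flag like this can gate node creation is sketched below. `GatedCollector` is a hypothetical stand-in and not the actual implementation; it only assumes that each callback checks the flag before adding anything.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collector; stands in for a visitor that adds nodes to a tree.
class GatedCollector {
    boolean enabled = true;                        // mirrors the idea of the enabled field
    final List<String> added = new ArrayList<>();  // what would have been added to the tree

    void text(String text) {
        if (!enabled) {
            return;                                // event is ignored while disabled
        }
        added.add(text);
    }
}
```

Switching `enabled` to `false` before a run of events and back to `true` afterwards leaves those events out of the result without any change to the parser itself.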
htmlCCommentExpressionMatcher
-
-
Constructor Details
-
LagartoDOMBuilderTagVisitor
-
-
Method Details
-
getDocument
-
start
public void start()
Starts DOM building. Creates the root Document node.
Specified by: start in interface TagVisitor
-
end
public void end()
Finishes the tree building. Closes unclosed tags.
Specified by: end in interface TagVisitor
-
createElementNode
-
tag
-
removeLastChildNodeIfEmptyText
Removes the last child node if it contains just empty text.
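A minimal sketch of this kind of cleanup, assuming that "empty text" means a text node containing only whitespace. `TrimSketch` and `Child` are hypothetical, and the `closedTag` parameter is deliberately left out because its role is not described here.

```java
import java.util.ArrayList;
import java.util.List;

class TrimSketch {
    // Hypothetical child representation: text nodes carry content, elements carry a tag name.
    static final class Child {
        final boolean isText;
        final String content;

        Child(boolean isText, String content) {
            this.isText = isText;
            this.content = content;
        }
    }

    // Removes the last child only if it is a text node made of whitespace.
    // Returns true when a node was removed.
    static boolean removeLastChildIfEmptyText(List<Child> children) {
        if (children.isEmpty()) {
            return false;
        }
        Child last = children.get(children.size() - 1);
        if (!last.isText) {
            return false;
        }
        if (!last.content.trim().isEmpty()) {
            return false;
        }
        children.remove(children.size() - 1);
        return true;
    }
}
```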
findMatchingParentOpenTag
-
fixUnclosedTagsUpToMatchingParent
Fixes all unclosed tags up to the matching parent. Missing end tags are added just before the parent tag is closed, making the whole inner content the body of the unclosed tag.
Tags that can be closed implicitly are checked and closed.
There is an optional check for detecting orphan tags inside tables or lists. If set, tags can be closed beyond the border of the table or the list, and this is reported as an orphan tag.
This is just a generic solution, the closest to the rules.
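The general approach (find the matching open tag, then close everything still open inside it) can be sketched with a stack of open tag names. This is a simplification under stated assumptions: `UnclosedTagSketch` and its members are hypothetical, names are compared case-sensitively, and the implicit-closing rules and the orphan check for tables and lists are not modelled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class UnclosedTagSketch {
    // Open elements, innermost first.
    final Deque<String> open = new ArrayDeque<>();
    // Tags that had to be closed without an explicit end tag, in closing order.
    final List<String> autoClosed = new ArrayList<>();

    void openTag(String name) {
        open.push(name);
    }

    // Searches the open elements for one with the given name.
    boolean hasMatchingOpenTag(String name) {
        return open.contains(name);
    }

    // Handles an end tag: closes every still-open element nested inside the
    // matching one, then the match itself. An end tag without a matching
    // open element is ignored.
    void closeTag(String name) {
        if (!hasMatchingOpenTag(name)) {
            return;
        }
        while (!open.peek().equals(name)) {
            autoClosed.add(open.pop());
        }
        open.pop();
    }
}
```

For the input `<div><p><b>text</div>`, the end tag for `div` closes `b` and then `p` before closing `div`, so their content ends up inside the unclosed tags rather than being lost.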
-
script
Description copied from interface: TagVisitor
Invoked on script tag.
Specified by: script in interface TagVisitor
-
comment
Description copied from interface: TagVisitor
Invoked on comment.
Specified by: comment in interface TagVisitor
-
text
Description copied from interface: TagVisitor
Invoked on text, i.e. anything other than a tag.
Specified by: text in interface TagVisitor
-
cdata
Description copied from interface: TagVisitor
Invoked on CDATA sequence.
Specified by: cdata in interface TagVisitor
-
xml
Description copied from interface: TagVisitor
Invoked on xml declaration.
Specified by: xml in interface TagVisitor
-
doctype
Description copied from interface: TagVisitor
Invoked on DOCTYPE directive.
Specified by: doctype in interface TagVisitor
-
condComment
public void condComment(CharSequence expression, boolean isStartingTag, boolean isHidden, boolean isHiddenEndTag)
Description copied from interface: TagVisitor
Invoked on IE conditional comment. By default, the parser does not process conditional comments, so you need to turn them on. Once conditional comments are enabled, this event will be fired.
The following conditional comments are recognized:
<!--[if IE 6]>one<![endif]-->
<!--[if IE 6]><!-->two<!---<![endif]-->
<!--[if IE 6]>three<!--xx<![endif]-->
<![if IE 6]>four<![endif]>
Specified by: condComment in interface TagVisitor
-
errorEnabled
protected boolean errorEnabled()
Returns true if error logging or collecting is enabled.
error
Actually collects and logs the error messages.
Specified by: error in interface TagVisitor
Parameters: message - parsing error message
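The relationship between the check and the handler can be sketched as two independent switches. Everything here is an assumption for illustration: `ErrorSketch`, `collectErrors`, `logErrors` and the `String` parameter type are hypothetical, and `java.util.logging` stands in for whatever logger is actually used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

class ErrorSketch {
    private static final Logger LOG = Logger.getLogger(ErrorSketch.class.getName());

    boolean collectErrors;                          // hypothetical switch: keep messages
    boolean logErrors;                              // hypothetical switch: write to the log
    final List<String> collected = new ArrayList<>();

    // True when at least one of the two mechanisms is switched on.
    boolean errorEnabled() {
        return collectErrors || logErrors;
    }

    // Stores and/or logs the message, depending on which switches are on.
    void error(String message) {
        if (collectErrors) {
            collected.add(message);
        }
        if (logErrors) {
            LOG.warning(message);
        }
    }
}
```

A caller can test `errorEnabled()` first and skip building an expensive message when neither mechanism is active.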
-