Class LagartoDOMBuilderTagVisitor

java.lang.Object
jodd.lagarto.dom.LagartoDOMBuilderTagVisitor
All Implemented Interfaces:
TagVisitor

public class LagartoDOMBuilderTagVisitor extends Object implements TagVisitor
Lagarto tag visitor that builds a DOM tree. It does not (yet) build the tree fully according to the HTML specification; however, it works well enough for any sane HTML out there. In the default mode, the tree builder does not change the order of the elements, so the returned tree reflects the input. If the input contains crazy stuff, the tree will be weird, too :)
  • Field Details

    • log

      private static final org.slf4j.Logger log
    • domBuilder

      protected final LagartoDOMBuilder domBuilder
    • implRules

      protected final HtmlImplicitClosingRules implRules
    • htmlVoidRules

      protected HtmlVoidRules htmlVoidRules
    • rootNode

      protected Document rootNode
    • parentNode

      protected Node parentNode
    • enabled

      protected boolean enabled
      While enabled, nodes will be added to the DOM tree. Useful for skipping some tags.
    • htmlCCommentExpressionMatcher

      protected HtmlCCommentExpressionMatcher htmlCCommentExpressionMatcher
  • Constructor Details

    • LagartoDOMBuilderTagVisitor

      public LagartoDOMBuilderTagVisitor(LagartoDOMBuilder domBuilder)
  • Method Details

    • getDocument

      public Document getDocument()
      Returns the root document node of the parsed DOM tree.
    • start

      public void start()
      Starts DOM building. Creates the root Document node.
      Specified by:
      start in interface TagVisitor
    • end

      public void end()
      Finishes the tree building. Closes unclosed tags.
      Specified by:
      end in interface TagVisitor
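
      The start/tag/end life cycle described above can be pictured with a small stand-alone sketch. It is not Jodd's implementation: SketchNode, openTag, closeTag and demo are hypothetical names introduced only for illustration, and the real visitor works with Tag, Node and Document objects. The sketch only shows the general idea of a root node plus a "current parent" pointer, and of end() treating anything still open as closed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a DOM-building visitor; not Jodd's code. */
final class TreeBuilderSketch {

    /** Minimal node: a name, a parent link and ordered children. */
    static final class SketchNode {
        final String name;
        final SketchNode parent;
        final List<SketchNode> children = new ArrayList<>();

        SketchNode(String name, SketchNode parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this); // children keep input order
            }
        }
    }

    private SketchNode rootNode;
    private SketchNode parentNode;

    /** Creates the root node and makes it the current parent. */
    void start() {
        rootNode = new SketchNode("#document", null);
        parentNode = rootNode;
    }

    /** An open tag becomes a child of the current parent, then the new parent. */
    void openTag(String name) {
        parentNode = new SketchNode(name, parentNode);
    }

    /** A close tag moves one level up; only an exact match is handled here. */
    void closeTag(String name) {
        if (parentNode != rootNode && parentNode.name.equals(name)) {
            parentNode = parentNode.parent;
        }
    }

    /** Text is appended to the current parent. */
    void text(String value) {
        new SketchNode("#text:" + value, parentNode);
    }

    /** Tags that are still open are treated as closed by returning to the root. */
    void end() {
        parentNode = rootNode;
    }

    SketchNode getDocument() {
        return rootNode;
    }

    /** Renders the tree as name[child,child,...] for inspection. */
    static String render(SketchNode node) {
        StringBuilder sb = new StringBuilder(node.name);
        if (!node.children.isEmpty()) {
            sb.append('[');
            for (int i = 0; i < node.children.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(render(node.children.get(i)));
            }
            sb.append(']');
        }
        return sb.toString();
    }

    /** Feeds the events for the unclosed input <div><p>hi and renders the result. */
    static String demo() {
        TreeBuilderSketch visitor = new TreeBuilderSketch();
        visitor.start();
        visitor.openTag("div");
        visitor.openTag("p");
        visitor.text("hi");
        visitor.end(); // neither p nor div was closed explicitly
        return render(visitor.getDocument());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

      In this sketch demo() returns #document[div[p[#text:hi]]]: the nesting follows the order of the input events, and the two unclosed tags still end up as complete subtrees.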
    • createElementNode

      protected Element createElementNode(Tag tag)
      Creates a new element with the correct configuration.
    • tag

      public void tag(Tag tag)
      Visits tags.
      Specified by:
      tag in interface TagVisitor
    • removeLastChildNodeIfEmptyText

      protected void removeLastChildNodeIfEmptyText(Node parentNode, boolean closedTag)
      Removes the last child node if it contains just empty text.
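
      A rough sketch of the idea, under stated assumptions: the Child type and the removeLastChildIfEmptyText helper below are hypothetical, "empty text" is assumed to mean whitespace-only text, and the closedTag parameter of the real method is not modelled at all.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of dropping a trailing blank text node; not Jodd's code. */
final class EmptyTextSketch {

    /** Minimal child: either a text node (isText == true) or an element. */
    static final class Child {
        final boolean isText;
        final String value;

        Child(boolean isText, String value) {
            this.isText = isText;
            this.value = value;
        }
    }

    /**
     * Removes the last child if it is a text node holding only whitespace.
     * Returns true when a child was removed.
     */
    static boolean removeLastChildIfEmptyText(List<Child> children) {
        if (children.isEmpty()) {
            return false;
        }
        int lastIndex = children.size() - 1;
        Child last = children.get(lastIndex);
        if (last.isText && last.value.trim().isEmpty()) {
            children.remove(lastIndex);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Child> children = new ArrayList<>();
        children.add(new Child(false, "li"));
        children.add(new Child(true, "\n  "));
        System.out.println(removeLastChildIfEmptyText(children)); // trailing blank text
        System.out.println(children.size());
    }
}
```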
    • findMatchingParentOpenTag

      protected Node findMatchingParentOpenTag(String tagName)
      Finds the matching parent open tag, or returns null if none is found.
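
      One plausible way to picture this lookup is a walk up the chain of open parents. The OpenTag type below is hypothetical, and exact, case-sensitive name comparison is an assumption of the sketch; the real method operates on Node objects and may compare names differently.

```java
/** Hypothetical illustration of searching the open-parent chain; not Jodd's code. */
final class MatchingParentSketch {

    /** Minimal open tag: a name and a link to the enclosing open tag. */
    static final class OpenTag {
        final String name;
        final OpenTag parent;

        OpenTag(String name, OpenTag parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /**
     * Walks from the current parent toward the root and returns the first
     * open tag with the given name, or null when no such tag is open.
     */
    static OpenTag findMatchingParentOpenTag(OpenTag current, String tagName) {
        for (OpenTag node = current; node != null; node = node.parent) {
            if (node.name.equals(tagName)) {
                return node;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        OpenTag div = new OpenTag("div", null);
        OpenTag ul = new OpenTag("ul", div);
        OpenTag li = new OpenTag("li", ul);
        System.out.println(findMatchingParentOpenTag(li, "div") == div);
        System.out.println(findMatchingParentOpenTag(li, "table") == null);
    }
}
```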
    • fixUnclosedTagsUpToMatchingParent

      protected void fixUnclosedTagsUpToMatchingParent(Tag tag, Node matchingParent)
      Fixes all unclosed tags up to the matching parent. Missing end tags are added just before the parent tag is closed, so that the whole inner content becomes the body of the unclosed tag.

      Tags that can be closed implicitly are checked and closed.

      There is an optional check for detecting orphan tags inside tables or lists. If set, tags can be closed beyond the border of the table or list, and this is reported as an orphan tag.

      This is just a generic solution, the closest to the rules.
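
      The core loop can be sketched as follows, as an illustration only. OpenTag, the errors list and the message text are hypothetical; the sketch neither applies implicit-closing rules nor performs the orphan-tag check, and it assumes the matching parent really is the current parent or one of its ancestors.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of closing unclosed tags up to a matching parent; not Jodd's code. */
final class UnclosedTagsSketch {

    /** Minimal open tag: a name and a link to the enclosing open tag. */
    static final class OpenTag {
        final String name;
        final OpenTag parent;

        OpenTag(String name, OpenTag parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /** Messages recorded for every tag that had to be closed on the way up. */
    final List<String> errors = new ArrayList<>();

    /** The innermost tag that is currently open. */
    OpenTag parentNode;

    /**
     * Closes every open tag between the current parent and matchingParent,
     * then closes matchingParent itself by stepping to its parent.
     * Precondition: matchingParent is parentNode or one of its ancestors.
     */
    void fixUnclosedTagsUpToMatchingParent(OpenTag matchingParent) {
        while (parentNode != matchingParent) {
            errors.add("Unclosed tag closed: <" + parentNode.name + ">");
            parentNode = parentNode.parent;
        }
        parentNode = matchingParent.parent;
    }

    public static void main(String[] args) {
        // Open tags for the input <div><b><i>text, followed by </div>.
        OpenTag root = new OpenTag("#document", null);
        OpenTag div = new OpenTag("div", root);
        OpenTag b = new OpenTag("b", div);
        OpenTag i = new OpenTag("i", b);

        UnclosedTagsSketch sketch = new UnclosedTagsSketch();
        sketch.parentNode = i;
        sketch.fixUnclosedTagsUpToMatchingParent(div);

        System.out.println(sketch.errors);
        System.out.println(sketch.parentNode.name);
    }
}
```

      Because the unclosed tags are only closed when their ancestor closes, everything that followed them in the input stays inside them, which matches the "whole inner content as its tag body" behaviour described above.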

    • script

      public void script(Tag tag, CharSequence body)
      Description copied from interface: TagVisitor
      Invoked on script tag.
      Specified by:
      script in interface TagVisitor
    • comment

      public void comment(CharSequence comment)
      Description copied from interface: TagVisitor
      Invoked on comment.
      Specified by:
      comment in interface TagVisitor
    • text

      public void text(CharSequence text)
      Description copied from interface: TagVisitor
      Invoked on text i.e. anything other than a tag.
      Specified by:
      text in interface TagVisitor
    • cdata

      public void cdata(CharSequence cdata)
      Description copied from interface: TagVisitor
      Invoked on CDATA sequence.
      Specified by:
      cdata in interface TagVisitor
    • xml

      public void xml(CharSequence version, CharSequence encoding, CharSequence standalone)
      Description copied from interface: TagVisitor
      Invoked on xml declaration.
      Specified by:
      xml in interface TagVisitor
    • doctype

      public void doctype(Doctype doctype)
      Description copied from interface: TagVisitor
      Invoked on DOCTYPE directive.
      Specified by:
      doctype in interface TagVisitor
    • condComment

      public void condComment(CharSequence expression, boolean isStartingTag, boolean isHidden, boolean isHiddenEndTag)
      Description copied from interface: TagVisitor
      Invoked on an IE conditional comment. By default, the parser does not process conditional comments, so you need to turn them on. Once conditional comments are enabled, this event will be fired.

      The following conditional comments are recognized:

        <!--[if IE 6]>one<![endif]-->
        <!--[if IE 6]><!-->two<!---<![endif]-->
        <!--[if IE 6]>three<!--xx<![endif]-->
        <![if IE 6]>four<![endif]>

      Specified by:
      condComment in interface TagVisitor
    • errorEnabled

      protected boolean errorEnabled()
      Returns true if error logging or collecting is enabled.
    • error

      public void error(String message)
      Collects and logs the error messages.
      Specified by:
      error in interface TagVisitor
      Parameters:
      message - parsing error message
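
      The relationship between errorEnabled() and error(String) might look roughly like the sketch below. The two boolean flags and getErrors() are hypothetical names for illustration, and java.util.logging stands in for the SLF4J logger that the class actually declares.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical illustration of collecting and logging parse errors; not Jodd's code. */
final class ErrorCollectorSketch {

    private static final Logger LOG =
            Logger.getLogger(ErrorCollectorSketch.class.getName());

    private final boolean collectErrors; // assumed configuration flag
    private final boolean logErrors;     // assumed configuration flag
    private final List<String> errors = new ArrayList<>();

    ErrorCollectorSketch(boolean collectErrors, boolean logErrors) {
        this.collectErrors = collectErrors;
        this.logErrors = logErrors;
    }

    /** True when at least one of collecting or logging is switched on. */
    boolean errorEnabled() {
        return collectErrors || logErrors;
    }

    /** Stores the message and/or writes it to the log, depending on the flags. */
    void error(String message) {
        if (collectErrors) {
            errors.add(message);
        }
        if (logErrors) {
            LOG.warning(message);
        }
    }

    List<String> getErrors() {
        return errors;
    }

    public static void main(String[] args) {
        ErrorCollectorSketch sketch = new ErrorCollectorSketch(true, false);
        if (sketch.errorEnabled()) {
            sketch.error("Unclosed tag closed: <b>");
        }
        System.out.println(sketch.getErrors().size());
    }
}
```

      Checking errorEnabled() before building a message lets a caller skip string concatenation when nobody will see the result.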