Package org.htmlunit.cyberneko
Class HTMLScanner.ContentScanner
java.lang.Object
org.htmlunit.cyberneko.HTMLScanner.ContentScanner
- All Implemented Interfaces:
HTMLScanner.Scanner
- Enclosing class:
HTMLScanner
The primary HTML document scanner.
- Author:
- Andy Clark
-
Constructor Summary
Constructors
ContentScanner()
Method Summary
Modifier and Type | Method | Description
protected String | nextContent(int len) | Reads the next characters WITHOUT impacting the buffer content up to current offset.
boolean | scan(boolean complete) | Scan.
protected boolean | scanAttribute(XMLAttributesImpl attributes, boolean[] empty) | Scans a real attribute.
protected void | scanAttributeQuotedValue(int currentQuote, org.htmlunit.cyberneko.HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue, boolean normalizeAttributes) |
protected void | scanAttributeUnquotedValue(org.htmlunit.cyberneko.HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue) |
protected void | scanCDATA |
protected boolean | scanCDataContent(XMLString xmlString) |
protected void | scanCharacters |
protected void | scanComment |
protected boolean | scanCommentContent(XMLString buffer) |
protected void | scanEndElement |
protected void | scanPI() |
protected String | scanStartElement(boolean[] empty) | Scans a start element.
-
Constructor Details
-
ContentScanner
public ContentScanner()
-
-
Method Details
-
scan
Scan.
- Specified by:
scan in interface HTMLScanner.Scanner
- Parameters:
complete - True if the scanner should not return until scanning is complete.
- Returns:
- True if additional scanning is required.
- Throws:
IOException - Thrown if an I/O error occurs.
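The return-value contract above (true means more scanning is required; complete controls whether the call returns early) can be illustrated with a driver loop. This is a minimal sketch only: StepScanner, CountingScanner, and drive are hypothetical names invented for illustration and are not part of org.htmlunit.cyberneko.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class ScanLoopSketch {

    /** Hypothetical interface mirroring the documented scan(boolean) contract. */
    public interface StepScanner {
        boolean scan(boolean complete) throws IOException;
    }

    /** Toy scanner that works through a fixed number of units; NOT the real ContentScanner. */
    public static final class CountingScanner implements StepScanner {
        private int remaining;

        public CountingScanner(int units) {
            this.remaining = units;
        }

        @Override
        public boolean scan(boolean complete) {
            // Process one unit, or keep going until done when 'complete' is true.
            do {
                if (remaining > 0) {
                    remaining--;
                }
            } while (complete && remaining > 0);
            // True signals that additional scanning is required.
            return remaining > 0;
        }
    }

    /** Calls scan(false) repeatedly until it reports no more work; returns the call count. */
    public static int drive(StepScanner scanner) {
        int calls = 0;
        try {
            boolean more;
            do {
                more = scanner.scan(false);
                calls++;
            } while (more);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("incremental calls: " + drive(new CountingScanner(3)));
        System.out.println("more after complete scan: " + new CountingScanner(3).scan(true));
    }
}
```

With complete set to false the caller keeps control between steps; with complete set to true a single call is expected to run to the end and return false.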
-
nextContent
Reads the next characters WITHOUT impacting the buffer content up to current offset.
- Parameters:
len - the number of characters to read
- Returns:
- the read string (length may be smaller if EOF is encountered)
- Throws:
IOException - if an I/O error occurs
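The lookahead behaviour described for nextContent (read ahead without moving the current offset, returning a shorter string at EOF) can be sketched with a small in-memory buffer. PeekableBuffer and its methods are hypothetical names for illustration; the real scanner presumably also refills its buffer from the underlying stream, which this sketch leaves out.

```java
public final class PeekableBuffer {
    private final char[] chars;
    private int offset; // current read position

    public PeekableBuffer(String content) {
        this.chars = content.toCharArray();
    }

    /** Returns up to len characters starting at the current offset, without consuming them. */
    public String peek(int len) {
        int available = Math.min(Math.max(len, 0), chars.length - offset);
        return new String(chars, offset, available);
    }

    /** Consumes and returns one character, or -1 at the end of the content. */
    public int read() {
        return offset < chars.length ? chars[offset++] : -1;
    }

    public int offset() {
        return offset;
    }

    public static void main(String[] args) {
        PeekableBuffer buffer = new PeekableBuffer("<!DOCTYPE html>");
        System.out.println(buffer.peek(9));    // lookahead only
        System.out.println(buffer.offset());   // unchanged by peek
    }
}
```

A lookahead like this lets a scanner decide which construct follows (for example a doctype or a comment) before committing to consume any characters.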
-
scanCharacters
- Throws:
IOException
-
scanCDATA
- Throws:
IOException
-
scanComment
- Throws:
IOException
-
scanCommentContent
- Throws:
IOException
-
scanCDataContent
- Throws:
IOException
-
scanPI
- Throws:
IOException
-
scanStartElement
Scans a start element.
- Parameters:
empty - Used as a second return value to indicate whether the start element tag is empty (e.g. "/>").
- Returns:
- the element name (ename)
- Throws:
IOException - if an I/O error occurs
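The empty parameter is a one-element array used as a second return value, a common Java idiom for out-parameters. The following is a rough sketch of that idiom only: parseStartTag is a hypothetical, heavily simplified helper that works on a complete tag string and does not reflect how the real scanner reads its input.

```java
public class OutParamSketch {

    /**
     * Hypothetical simplified start-tag parser (not the cyberneko implementation).
     * Returns the element name, or null if no name is found; empty[0] is set to
     * true when the tag is self-closing ("/>").
     */
    public static String parseStartTag(String tag, boolean[] empty) {
        String body = tag.trim();
        if (!body.startsWith("<") || !body.endsWith(">")) {
            return null;
        }
        body = body.substring(1, body.length() - 1).trim();
        // Second return value, written through the caller's array.
        empty[0] = body.endsWith("/");
        if (empty[0]) {
            body = body.substring(0, body.length() - 1).trim();
        }
        int end = 0;
        while (end < body.length() && !Character.isWhitespace(body.charAt(end))) {
            end++;
        }
        return end == 0 ? null : body.substring(0, end);
    }

    public static void main(String[] args) {
        boolean[] empty = { false };
        String name = parseStartTag("<br/>", empty);
        System.out.println(name + " empty=" + empty[0]);
    }
}
```

The caller allocates the array once and can reuse it across calls, which avoids creating a result object for every tag.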
-
scanAttribute
Scans a real attribute.
- Parameters:
attributes - The list of attributes.
empty - Used as a second return value to indicate whether the start element tag is empty (e.g. "/>").
- Returns:
- success
- Throws:
IOException - if an I/O error occurs
-
scanAttributeUnquotedValue
protected void scanAttributeUnquotedValue(org.htmlunit.cyberneko.HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue) throws IOException
- Throws:
IOException
-
scanAttributeQuotedValue
protected void scanAttributeQuotedValue(int currentQuote, org.htmlunit.cyberneko.HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue, boolean normalizeAttributes) throws IOException
- Throws:
IOException
-
scanEndElement
- Throws:
IOException
-