Class StreamScanner
java.lang.Object
com.fasterxml.aalto.in.XmlScanner
com.fasterxml.aalto.in.ByteBasedScanner
com.fasterxml.aalto.in.StreamScanner
- All Implemented Interfaces:
XmlConsts, NamespaceContext, XMLStreamConstants
- Direct Known Subclasses:
Utf8Scanner
Base class for various byte stream based scanners (generally one
for each type of encoding supported).
-
Field Summary
Fields
- protected final XmlCharTypes _charTypes: This is a simple container object that is used to access the decoding tables for characters.
- protected InputStream _in: Underlying InputStream to use for reading content.
- protected byte[] _inputBuffer
- protected int[] _quadBuffer: This buffer is used for name parsing.
- protected final ByteBasedPNameTable _symbols: For now, symbol table contains prefixed names.
Fields inherited from class ByteBasedScanner
_inputEnd, _inputPtr, _tmpChar, BYTE_a, BYTE_A, BYTE_AMP, BYTE_APOS, BYTE_C, BYTE_CR, BYTE_D, BYTE_EQ, BYTE_EXCL, BYTE_g, BYTE_GT, BYTE_HASH, BYTE_HYPHEN, BYTE_l, BYTE_LBRACKET, BYTE_LF, BYTE_LT, BYTE_m, BYTE_NULL, BYTE_o, BYTE_p, BYTE_P, BYTE_q, BYTE_QMARK, BYTE_QUOT, BYTE_RBRACKET, BYTE_s, BYTE_S, BYTE_SEMICOLON, BYTE_SLASH, BYTE_SPACE, BYTE_t, BYTE_T, BYTE_TAB, BYTE_u, BYTE_x
Fields inherited from class XmlScanner
_attrCollector, _attrCount, _cfgCoalescing, _cfgLazyParsing, _config, _currElem, _currNsCount, _currRow, _currToken, _defaultNs, _depth, _entityPending, _isEmptyTag, _lastNsContext, _lastNsDecl, _nameBuffer, _nsBindingCache, _nsBindingCount, _nsBindings, _nsBindMisses, _pastBytesOrChars, _publicId, _rowStartOffset, _startColumn, _startRawOffset, _startRow, _systemId, _textBuilder, _tokenIncomplete, _tokenName, _xml11, CDATA_STR, INT_0, INT_9, INT_a, INT_A, INT_AMP, INT_APOS, INT_COLON, INT_CR, INT_EQ, INT_EXCL, INT_f, INT_F, INT_GT, INT_HYPHEN, INT_LBRACKET, INT_LF, INT_LT, INT_NULL, INT_QMARK, INT_QUOTE, INT_RBRACKET, INT_SLASH, INT_SPACE, INT_TAB, INT_z, MAX_UNICODE_CHAR, TOKEN_EOI
Fields inherited from interface XmlConsts
CHAR_CR, CHAR_LF, CHAR_NULL, CHAR_SPACE, STAX_DEFAULT_OUTPUT_ENCODING, STAX_DEFAULT_OUTPUT_VERSION, XML_DECL_KW_ENCODING, XML_DECL_KW_STANDALONE, XML_DECL_KW_VERSION, XML_SA_NO, XML_SA_YES, XML_V_10, XML_V_10_STR, XML_V_11, XML_V_11_STR, XML_V_UNKNOWN
Fields inherited from interface XMLStreamConstants
ATTRIBUTE, CDATA, CHARACTERS, COMMENT, DTD, END_DOCUMENT, END_ELEMENT, ENTITY_DECLARATION, ENTITY_REFERENCE, NAMESPACE, NOTATION_DECLARATION, PROCESSING_INSTRUCTION, SPACE, START_DOCUMENT, START_ELEMENT
-
Constructor Summary
Constructors
- StreamScanner(ReaderConfig cfg, InputStream in, byte[] buffer, int ptr, int last)
-
Method Summary
Methods
- protected void _closeSource()
- protected int _nextEntity(): Helper method used to isolate things that need to be (re)set in cases where
- protected void _releaseBuffers()
- protected final PName addPName(int hash, int[] quads, int qlen, int lastQuadBytes)
- protected final int checkInTreeIndentation(int c): Note: consecutive white space is only considered indentation if the following token seems like a tag (start/end).
- protected final int checkPrologIndentation(int c)
- protected final int handleCharEntity()
- protected final int handleEndElement(): Note that this method is currently also shareable for all ASCII-based encodings, and at least between UTF-8 and ISO-Latin1.
- protected abstract int handleEntityInText(boolean inAttr)
- protected abstract int handleStartElement(byte b): Parsing of start element requires parsing of the element name (and attribute names), and is thus encoding-specific.
- protected final boolean loadAndRetain(int nrOfChars)
- protected final boolean loadMore()
- protected final byte loadOne()
- protected final byte loadOne(int type)
- protected final byte nextByte()
- protected final byte nextByte(int tt)
- final int nextFromProlog(boolean isProlog)
- final int nextFromTree()
- protected final PName parsePName(byte b): This method can (for now?) be shared between all ASCII-based encodings, since it only does coarse validity checking; the real checks are done in a different method.
- protected final PName parsePNameLong(int q, int[] quads)
- protected PName parsePNameMedium(int i2, int q1)
- protected final PName parsePNameSlow(byte b)
- protected abstract String parsePublicId(byte quoteChar)
- protected abstract String parseSystemId(byte quoteChar)
- protected byte skipInternalWs(boolean reqd, String msg)
Methods inherited from class ByteBasedScanner
addUTFPName, decodeCharForError, getCurrentColumnNr, getCurrentLocation, getEndingByteOffset, getEndingCharOffset, getStartingByteOffset, getStartingCharOffset, markLF, markLF, reportInvalidInitial, reportInvalidOther, setStartLocation
Methods inherited from class XmlScanner
bindName, bindNs, checkImmutableBinding, close, decodeAttrBinaryValue, decodeAttrValue, decodeAttrValues, decodeElements, findAttrIndex, findOrCreateBinding, finishCData, finishCharacters, finishComment, finishDTD, finishPI, finishSpace, finishToken, fireSaxCharacterEvents, fireSaxCommentEvent, fireSaxEndElement, fireSaxPIEvent, fireSaxSpaceEvents, fireSaxStartElement, getAttrCollector, getAttrCount, getAttrLocalName, getAttrNsURI, getAttrPrefix, getAttrPrefixedName, getAttrQName, getAttrType, getAttrValue, getAttrValue, getConfig, getCurrentLineNr, getDepth, getDTDPublicId, getDTDSystemId, getEndLocation, getInputPublicId, getInputSystemId, getName, getNamespacePrefix, getNamespaceURI, getNamespaceURI, getNamespaceURI, getNonTransientNamespaceContext, getNsCount, getPrefix, getPrefixes, getQName, getStartLocation, getText, getText, getTextCharacters, getTextCharacters, getTextLength, handleInvalidXmlChar, hasEmptyStack, isAttrSpecified, isEmptyTag, isTextWhitespace, loadMoreGuaranteed, loadMoreGuaranteed, reportDoubleHyphenInComments, reportDuplicateNsDecl, reportEntityOverflow, reportEofInName, reportIllegalCDataEnd, reportIllegalNsDecl, reportIllegalNsDecl, reportInputProblem, reportInvalidNameChar, reportInvalidNsIndex, reportInvalidXmlChar, reportMissingPISpace, reportMultipleColonsInName, reportPrologProblem, reportPrologUnexpChar, reportPrologUnexpElement, reportTreeUnexpChar, reportUnboundPrefix, reportUnexpandedEntityInAttr, reportUnexpectedEndTag, resetForDecoding, skipCData, skipCharacters, skipCoalescedText, skipComment, skipPI, skipSpace, skipToken, throwInvalidSpace, throwNullChar, throwUnexpectedChar, verifyXmlChar
-
Field Details
-
_in
Underlying InputStream to use for reading content. -
_inputBuffer
protected byte[] _inputBuffer -
_charTypes
This is a simple container object that is used to access the decoding tables for characters. Indirection is needed since we actually support multiple UTF-8 compatible encodings, not just UTF-8 itself.
-
_symbols
For now, the symbol table contains prefixed names. In the future, they may be split into prefixes and local names.
-
_quadBuffer
protected int[] _quadBuffer
This buffer is used for name parsing. Will be expanded if/as needed; 32 ints can hold names 128 ASCII chars long.
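The sizing note follows from storing four bytes in each int: 32 ints times 4 bytes gives 128 single-byte characters. The following is a minimal sketch of such packing, assuming bytes are shifted in from the low end of each int. The class `QuadPacking` and the method `packAscii` are hypothetical names used for illustration and are not part of Aalto.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Aalto's actual implementation.
final class QuadPacking {
    /** Packs the ASCII bytes of a name into ints, four bytes per int. */
    static int[] packAscii(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.US_ASCII);
        int[] quads = new int[(bytes.length + 3) / 4];
        for (int i = 0; i < bytes.length; i++) {
            // Shift the bytes already stored in this quad and append the next one.
            quads[i / 4] = (quads[i / 4] << 8) | (bytes[i] & 0xFF);
        }
        return quads;
    }

    public static void main(String[] args) {
        int[] quads = packAscii("item");
        System.out.println(quads.length + " quad(s), first = 0x" + Integer.toHexString(quads[0]));
    }
}
```

In this sketch a final, partially filled quad holds only its low-order bytes, which is one reason a caller would need to track how many bytes the last quad contains.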
-
-
Constructor Details
-
StreamScanner
StreamScanner(ReaderConfig cfg, InputStream in, byte[] buffer, int ptr, int last)
-
-
Method Details
-
_releaseBuffers
protected void _releaseBuffers()
- Overrides:
_releaseBuffers in class XmlScanner
-
_closeSource
- Specified by:
_closeSource in class ByteBasedScanner
- Throws:
IOException
-
handleEntityInText
- Throws:
XMLStreamException
-
parsePublicId
- Throws:
XMLStreamException
-
parseSystemId
- Throws:
XMLStreamException
-
nextFromProlog
- Specified by:
nextFromProlog in class XmlScanner
- Throws:
XMLStreamException
-
nextFromTree
- Specified by:
nextFromTree in class XmlScanner
- Throws:
XMLStreamException
-
_nextEntity
protected int _nextEntity()
Helper method used to isolate things that need to be (re)set in cases where
-
handleCharEntity
- Returns:
- Code point for the entity that expands to a valid XML content character.
- Throws:
XMLStreamException
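As an illustration of the return contract, the sketch below decodes the body of a numeric character reference (the text between `&` and `;`) into a code point and rejects values outside the XML 1.0 `Char` production. It is a simplified stand-in: it works on a `String` rather than the scanner's byte buffer and throws `IllegalArgumentException` rather than `XMLStreamException`. The class `CharEntitySketch` and its methods are hypothetical.

```java
// Hypothetical sketch; the real method reads bytes from the scanner's input buffer.
final class CharEntitySketch {
    static final int MAX_UNICODE_CHAR = 0x10FFFF;

    /** Decodes a body such as "#x41" or "#65" into a code point. */
    static int decodeCharEntity(String body) {
        if (body.length() < 2 || body.charAt(0) != '#') {
            throw new IllegalArgumentException("Not a numeric character reference: " + body);
        }
        boolean hex = body.charAt(1) == 'x';
        int radix = hex ? 16 : 10;
        int start = hex ? 2 : 1;
        if (start == body.length()) {
            throw new IllegalArgumentException("No digits in: " + body);
        }
        int value = 0;
        for (int i = start; i < body.length(); i++) {
            int digit = digitValue(body.charAt(i), radix);
            if (digit < 0) {
                throw new IllegalArgumentException("Invalid digit in: " + body);
            }
            value = value * radix + digit;
            if (value > MAX_UNICODE_CHAR) {
                throw new IllegalArgumentException("Value beyond Unicode range: " + body);
            }
        }
        if (!isXml10Char(value)) {
            throw new IllegalArgumentException("Not a valid XML 1.0 character: " + body);
        }
        return value;
    }

    /** Accepts only ASCII digits and hex letters; returns -1 for anything else. */
    private static int digitValue(char c, int radix) {
        int d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return -1;
        return d < radix ? d : -1;
    }

    /** XML 1.0 Char production: tab, LF, CR, and the allowed Unicode ranges. */
    static boolean isXml10Char(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }
}
```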
-
handleStartElement
Parsing of start element requires parsing of the element name (and attribute names), and is thus encoding-specific.
- Throws:
XMLStreamException
-
handleEndElement
Note that this method is currently also shareable for all ASCII-based encodings, and at least between UTF-8 and ISO-Latin1. The reason is that since we already know the exact bytes that need to be matched, there's no danger of getting invalid encodings or such. So, for now, let's leave this method here in the base class.
- Throws:
XMLStreamException
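The reasoning about matching known bytes can be sketched as a plain byte-for-byte comparison: the expected name was already decoded when the start tag was read, so the end tag only needs to be compared against it without decoding. The class `EndTagMatchSketch` and the method `matchEndTag` are hypothetical, and returning -1 on mismatch is an assumption made for this example; the real method reports errors through `XMLStreamException`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of byte-wise end tag matching; not Aalto's implementation.
final class EndTagMatchSketch {
    /**
     * Returns the index just past '>' if the input at ptr holds the expected
     * name, optional white space, and '>'; returns -1 otherwise.
     */
    static int matchEndTag(byte[] input, int ptr, byte[] expectedName) {
        if (ptr + expectedName.length > input.length) {
            return -1;
        }
        // Compare raw bytes; no character decoding is needed.
        for (int i = 0; i < expectedName.length; i++) {
            if (input[ptr + i] != expectedName[i]) {
                return -1;
            }
        }
        ptr += expectedName.length;
        while (ptr < input.length && isWhitespace(input[ptr])) {
            ptr++;
        }
        return (ptr < input.length && input[ptr] == '>') ? ptr + 1 : -1;
    }

    private static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\t' || b == '\r' || b == '\n';
    }

    public static void main(String[] args) {
        byte[] doc = "</item >rest".getBytes(StandardCharsets.UTF_8);
        byte[] name = "item".getBytes(StandardCharsets.UTF_8);
        System.out.println(matchEndTag(doc, 2, name));
    }
}
```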
-
parsePName
This method can (for now?) be shared between all ASCII-based encodings, since it only does coarse validity checking; the real checks are done in a different method.
Some notes about the assumptions the implementation makes:
- Well-formed XML content cannot end with a name: as such, end-of-input is an error and we can throw an exception
- Throws:
XMLStreamException
-
parsePNameMedium
- Throws:
XMLStreamException
-
parsePNameLong
- Throws:
XMLStreamException
-
parsePNameSlow
- Throws:
XMLStreamException
-
addPName
protected final PName addPName(int hash, int[] quads, int qlen, int lastQuadBytes) throws XMLStreamException
- Throws:
XMLStreamException
-
skipInternalWs
- Returns:
- First byte following skipped white space
- Throws:
XMLStreamException
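A minimal sketch of the contract described above: skip any run of white space and return the first byte that follows. The meaning given to `reqd` (at least one white space character must be present) and to `msg` (text appended to the error message) is an assumption for this example, as are the class name `WsCursor` and the use of `IllegalStateException` in place of `XMLStreamException`.

```java
// Hypothetical sketch; parameter semantics are assumed, not taken from Aalto's source.
final class WsCursor {
    private final byte[] buf;
    int ptr;

    WsCursor(byte[] buf) {
        this.buf = buf;
    }

    /** Skips white space and returns the first byte after it, advancing past that byte. */
    byte skipInternalWs(boolean reqd, String msg) {
        int start = ptr;
        while (ptr < buf.length && isWhitespace(buf[ptr])) {
            ptr++;
        }
        if (ptr >= buf.length) {
            throw new IllegalStateException("Unexpected end of input " + msg);
        }
        if (reqd && ptr == start) {
            throw new IllegalStateException("Expected white space " + msg);
        }
        return buf[ptr++];
    }

    private static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\t' || b == '\r' || b == '\n';
    }
}
```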
-
checkInTreeIndentation
Note: consecutive white space is only considered indentation if the following token seems like a tag (start/end). This is so that if a CDATA section follows, it can be coalesced in coalescing mode. Although we could check if coalescing mode is enabled, this should seldom have significant effect either way, so it removes one possible source of problems in coalescing mode.
- Returns:
- -1, if indentation was handled; offset in the output buffer, if not
- Throws:
XMLStreamException
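The rule in the note can be sketched as a lookahead test: a linefeed followed by a run of one repeated space or tab character counts as indentation only when a start or end tag follows, so `<![CDATA[`, comments, and processing instructions do not qualify. This is a simplified illustration with a boolean result, not the -1/offset convention of the real method; the class `IndentationSketch`, the method `looksLikeIndentation`, and the restriction to a single repeated character are assumptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the "indentation only before a tag" rule.
final class IndentationSketch {
    /** True if input at ptr is LF, a run of spaces or tabs, then a start or end tag. */
    static boolean looksLikeIndentation(byte[] in, int ptr) {
        if (ptr >= in.length || in[ptr] != '\n') {
            return false;
        }
        int i = ptr + 1;
        if (i < in.length && (in[i] == ' ' || in[i] == '\t')) {
            byte fill = in[i];
            // Consume the run of the same indentation character.
            while (i < in.length && in[i] == fill) {
                i++;
            }
        }
        // A tag starts with '<' but not with "<!" (CDATA, comment) or "<?" (PI).
        return i + 1 < in.length
                && in[i] == '<'
                && in[i + 1] != '!'
                && in[i + 1] != '?';
    }

    public static void main(String[] args) {
        byte[] doc = "\n    <a>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeIndentation(doc, 0));
    }
}
```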
-
checkPrologIndentation
- Returns:
- -1, if indentation was handled; offset in the output buffer, if not
- Throws:
XMLStreamException
-
loadMore
- Specified by:
loadMore in class XmlScanner
- Throws:
XMLStreamException
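For orientation, a buffer refill over an `InputStream` typically looks like the sketch below: read a fresh block into the byte buffer, reset the read pointer, and report whether any data arrived. The field names mirror `_in`, `_inputBuffer`, `_inputPtr`, and `_inputEnd` from this page, but the class `BufferRefillSketch` and its logic are assumptions, not Aalto's code; in particular, it wraps `IOException` in `UncheckedIOException` where the real method throws `XMLStreamException`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a byte buffer refill loop; not Aalto's implementation.
final class BufferRefillSketch {
    private final InputStream in;
    private final byte[] inputBuffer;
    private int inputPtr;
    private int inputEnd;

    BufferRefillSketch(InputStream in, int bufferSize) {
        this.in = in;
        this.inputBuffer = new byte[bufferSize];
    }

    /** Refills the buffer; returns false once the stream is exhausted. */
    boolean loadMore() {
        try {
            int count = in.read(inputBuffer, 0, inputBuffer.length);
            inputPtr = 0;
            inputEnd = Math.max(count, 0);
            return count > 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the next byte as an unsigned value, or -1 at end of input. */
    int nextByte() {
        if (inputPtr >= inputEnd && !loadMore()) {
            return -1;
        }
        return inputBuffer[inputPtr++] & 0xFF;
    }

    public static void main(String[] args) {
        byte[] data = "<root/>".getBytes(StandardCharsets.UTF_8);
        BufferRefillSketch s = new BufferRefillSketch(new ByteArrayInputStream(data), 3);
        StringBuilder sb = new StringBuilder();
        for (int b = s.nextByte(); b >= 0; b = s.nextByte()) {
            sb.append((char) b);
        }
        System.out.println(sb);
    }
}
```

A deliberately small buffer, as in `main`, forces several refills and makes the pointer reset visible.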
-
nextByte
- Throws:
XMLStreamException
-
nextByte
- Throws:
XMLStreamException
-
loadOne
- Throws:
XMLStreamException
-
loadOne
- Throws:
XMLStreamException
-
loadAndRetain
- Throws:
XMLStreamException
-