Class Parser
- java.lang.Object
  - eu.maveniverse.domtrip.Parser
public class Parser extends java.lang.Object
A lossless XML parser that preserves all formatting information, including whitespace, comments, attribute quote styles, and entity encoding.
The Parser class converts XML text into DomTrip's internal node tree representation. Unlike traditional XML parsers, which normalize content and discard formatting information, this parser preserves every aspect of the original formatting to enable exact round-trip processing.
Parsing Features:
- Whitespace Preservation - Maintains all whitespace exactly as written
- Automatic Whitespace Normalization - Never creates Text nodes that contain only whitespace; that whitespace is stored in element properties instead
- Attribute Formatting - Preserves quote styles, order, and spacing
- Comment Preservation - Keeps all XML comments in their original positions
- Entity Preservation - Maintains entity references in their original form
- Processing Instructions - Preserves PIs including XML declarations
- CDATA Sections - Maintains CDATA boundaries and content
Parsing Process:
The parser uses a stack-based approach to build the XML tree:
- Tokenizes the input XML character by character
- Identifies XML constructs (elements, comments, text, etc.)
- Preserves original formatting information for each construct
- Automatically normalizes whitespace-only content to element properties
- Builds a complete node tree with parent-child relationships
- Maintains modification flags for selective formatting preservation
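The steps above can be illustrated with a minimal sketch that uses only the JDK. It is not DomTrip's implementation: the class LosslessTokenSketch and its methods are hypothetical, and the sketch assumes that attribute values never contain a '>' character. It splits the input into raw tokens without changing any character, then walks the tokens with a stack of open element names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; not part of the DomTrip API.
final class LosslessTokenSketch {

    /** Splits XML into raw markup and text tokens without altering any character. */
    static List<String> tokenize(String xml) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < xml.length()) {
            int end;
            if (xml.startsWith("<!--", pos)) {
                end = closeOf(xml, pos, "-->");          // comment
            } else if (xml.startsWith("<![CDATA[", pos)) {
                end = closeOf(xml, pos, "]]>");          // CDATA section
            } else if (xml.startsWith("<?", pos)) {
                end = closeOf(xml, pos, "?>");           // PI or XML declaration
            } else if (xml.charAt(pos) == '<') {
                end = closeOf(xml, pos, ">");            // tag (assumes no '>' in attribute values)
            } else {
                int next = xml.indexOf('<', pos);        // text runs up to the next markup
                end = next < 0 ? xml.length() : next;
            }
            tokens.add(xml.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    private static int closeOf(String xml, int from, String terminator) {
        int idx = xml.indexOf(terminator, from);
        if (idx < 0) {
            throw new IllegalArgumentException("Unterminated construct at position " + from);
        }
        return idx + terminator.length();
    }

    /** Tracks open elements on a stack and returns the deepest nesting level seen. */
    static int maxDepth(List<String> tokens) {
        Deque<String> open = new ArrayDeque<>();
        int max = 0;
        for (String t : tokens) {
            if (!t.startsWith("<") || t.startsWith("<!") || t.startsWith("<?")) {
                continue; // text, comment, CDATA, DOCTYPE, or PI: no effect on nesting
            }
            if (t.startsWith("</")) {
                String name = t.substring(2, t.length() - 1).trim();
                if (open.isEmpty() || !open.peek().equals(name)) {
                    throw new IllegalArgumentException("Mismatched closing tag: " + t);
                }
                open.pop();
            } else if (!t.endsWith("/>")) {
                open.push(nameOf(t));
                max = Math.max(max, open.size());
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("Unclosed element: " + open.peek());
        }
        return max;
    }

    private static String nameOf(String startTag) {
        int i = 1;
        while (i < startTag.length()
                && !Character.isWhitespace(startTag.charAt(i))
                && startTag.charAt(i) != '>'
                && startTag.charAt(i) != '/') {
            i++;
        }
        return startTag.substring(1, i);
    }

    public static void main(String[] args) {
        String xml = "<root>\n  <!-- note -->\n  <child  attr='v'/>\n</root>\n";
        List<String> tokens = tokenize(xml);
        // Concatenating the tokens reproduces the input exactly.
        System.out.println(String.join("", tokens).equals(xml));
        System.out.println(maxDepth(tokens));
    }
}
```

Because every token is a verbatim substring of the input, joining the tokens reproduces the source character for character, which is the property a lossless parser has to maintain.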
Whitespace Normalization:
The parser automatically normalizes whitespace during parsing to ensure a clean tree structure:
- No Whitespace-Only Text Nodes - Whitespace between elements is captured in element properties
- Mixed Content Preservation - Text nodes with actual content preserve their whitespace
- Lossless Round-Trip - All whitespace is preserved for perfect XML reconstruction
- Element Properties - Whitespace stored in precedingWhitespace, innerPrecedingWhitespace, etc.
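The idea of folding whitespace-only text into a property of the following node can be sketched as below. This is a flat simplification under stated assumptions: the Node class is hypothetical, only a single precedingWhitespace field is modelled, and DomTrip's real tree and its other whitespace properties are not reproduced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the DomTrip API.
final class WhitespaceFoldingSketch {

    /** A node that carries the whitespace that preceded it in the source. */
    static final class Node {
        final String precedingWhitespace;
        final String raw;

        Node(String precedingWhitespace, String raw) {
            this.precedingWhitespace = precedingWhitespace;
            this.raw = raw;
        }
    }

    /** Attaches whitespace-only tokens to the next node instead of creating nodes for them. */
    static List<Node> fold(List<String> tokens) {
        List<Node> nodes = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (String token : tokens) {
            if (token.trim().isEmpty()) {
                pending.append(token);           // whitespace only: remember it, create no node
            } else {
                nodes.add(new Node(pending.toString(), token));
                pending.setLength(0);
            }
        }
        if (pending.length() > 0) {
            nodes.add(new Node(pending.toString(), "")); // trailing whitespace kept on an empty tail node
        }
        return nodes;
    }

    /** Writes each node's stored whitespace followed by its raw text. */
    static String serialize(List<Node> nodes) {
        StringBuilder out = new StringBuilder();
        for (Node node : nodes) {
            out.append(node.precedingWhitespace).append(node.raw);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("<root>", "\n  ", "<a>", " x ", "</a>", "\n", "</root>");
        List<Node> nodes = fold(tokens);
        System.out.println(nodes.size());
        System.out.println(serialize(nodes).equals(String.join("", tokens)));
    }
}
```

Text that contains real content, such as " x " above, keeps its surrounding whitespace inside the node, while whitespace between elements never becomes a node of its own.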
Error Handling:
The parser provides detailed error information for malformed XML:
- Precise error positions within the source text
- Descriptive error messages for common XML problems
- Context information to help locate and fix issues
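When an error is reported as a character position, it is often convenient to show it as a line and column. The helper below is a hypothetical sketch that uses only the JDK; it assumes the position is a zero-based character offset into the source string, which this page does not state explicitly.

```java
// Hypothetical illustration only; not part of the DomTrip API.
final class ErrorPositionSketch {

    /**
     * Converts a zero-based character offset into a one-based {line, column} pair.
     * Lines are assumed to be separated by '\n'.
     */
    static int[] lineAndColumn(String source, int offset) {
        if (offset < 0 || offset > source.length()) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        int line = 1;
        int column = 1;
        for (int i = 0; i < offset; i++) {
            if (source.charAt(i) == '\n') {
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        return new int[] {line, column};
    }

    public static void main(String[] args) {
        String xml = "<a>\n  <b>\n</a>";
        int[] where = lineAndColumn(xml, 6); // offset of the '<' that opens <b>
        System.out.println("line " + where[0] + ", column " + where[1]);
    }
}
```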
Usage:
    Parser parser = new Parser();
    try {
        // Parse from String
        Document document = parser.parse(xmlString);

        // Parse from InputStream with encoding detection
        Document document2 = parser.parse(inputStream);

        // Parse from InputStream with fallback encoding
        // (in real code, pass a fresh stream; the one above has already been read)
        Document document3 = parser.parse(inputStream, "UTF-8");

        // Use the parsed document
    } catch (DomTripException e) {
        // Handle parsing errors
        System.err.println("Parse error at position " + e.position() + ": " + e.getMessage());
    }
- See Also:
Document, Element, DomTripException, Serializer
Constructor Summary
- Parser(): Creates a new Parser instance with default settings.
Method Summary
- Document parse(java.io.InputStream inputStream): Parses XML from an InputStream with automatic encoding detection.
- Document parse(java.io.InputStream inputStream, java.lang.String defaultEncoding): Parses XML from an InputStream with encoding detection and fallback.
- Document parse(java.io.InputStream inputStream, java.nio.charset.Charset defaultCharset): Parses XML from an InputStream with encoding detection and fallback.
- Document parse(java.lang.String xml): Parses an XML string into a lossless XML document tree.
Constructor Detail
Parser
public Parser()
Creates a new Parser instance with default settings.
No initialization is needed here because the parser state (xml, position, length) is initialized at the start of each parse(String) call.
Method Detail
parse
public Document parse(java.io.InputStream inputStream) throws DomTripException
Parses XML from an InputStream with automatic encoding detection.
This method detects the character encoding by:
- Checking for a Byte Order Mark (BOM)
- Reading the XML declaration to extract the encoding attribute
- Falling back to UTF-8 if no encoding is specified
The resulting Document will have its encoding property set to the detected or declared encoding.
- Parameters:
inputStream - the InputStream containing XML data
- Returns:
a Document containing the parsed XML with preserved formatting
- Throws:
DomTripException - if the XML is malformed, cannot be parsed, or an I/O error occurs
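The detection order described for this method (BOM, then XML declaration, then a fallback) can be sketched with the JDK alone. This is not DomTrip's code: EncodingDetectionSketch and detect are hypothetical names, only three common BOMs are recognized, and the declaration is read under the assumption that it is written in an ASCII-compatible encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the DomTrip API.
final class EncodingDetectionSketch {

    // Matches encoding="..." or encoding='...' inside a leading XML declaration.
    private static final Pattern DECLARED = Pattern.compile(
            "^<\\?xml[^>]*\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Returns the charset suggested by the first bytes of a document, or the fallback. */
    static Charset detect(byte[] head, Charset fallback) {
        // 1. Byte Order Mark
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        // 2. encoding attribute of the XML declaration (ASCII-compatible input assumed)
        String prefix = new String(head, 0, Math.min(head.length, 256), StandardCharsets.ISO_8859_1);
        Matcher m = DECLARED.matcher(prefix);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        // 3. Nothing detected: use the caller's fallback
        return fallback;
    }

    public static void main(String[] args) {
        byte[] declared = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(declared, StandardCharsets.UTF_8));
        System.out.println(detect("<a/>".getBytes(StandardCharsets.US_ASCII), StandardCharsets.UTF_8));
    }
}
```

Passing StandardCharsets.UTF_8 as the fallback mirrors the single-argument overload; the two-argument overloads correspond to supplying a different fallback.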
-
parse
public Document parse(java.io.InputStream inputStream, java.lang.String defaultEncoding) throws DomTripException
Parses XML from an InputStream with encoding detection and fallback.
This method attempts to detect the character encoding by:
- Checking for a Byte Order Mark (BOM)
- Reading the XML declaration to extract the encoding attribute
- Using the provided default encoding if detection fails
The resulting Document will have its encoding property set to the detected, declared, or default encoding.
- Parameters:
inputStream - the InputStream containing XML data
defaultEncoding - the encoding name to use if detection fails
- Returns:
a Document containing the parsed XML with preserved formatting
- Throws:
DomTripException - if the XML is malformed, cannot be parsed, or an I/O error occurs
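Callers who hold an encoding name from configuration may want to validate it before passing it as defaultEncoding. The helper below is a hypothetical sketch using only java.nio.charset; how DomTrip itself treats an unknown or malformed encoding name is not described on this page, so the choice to fall back to UTF-8 here is an assumption of the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical illustration only; not part of the DomTrip API.
final class DefaultEncodingSketch {

    /** Resolves an encoding name to a Charset, using UTF-8 when the name is unusable. */
    static Charset resolve(String name) {
        if (name == null || name.isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Assumption of this sketch: an unusable name silently becomes UTF-8.
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("ISO-8859-1"));
        System.out.println(resolve("not a charset!"));
    }
}
```

The resolved Charset can then be handed to the parse(InputStream, Charset) overload, which avoids passing an unchecked string.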
-
parse
public Document parse(java.io.InputStream inputStream, java.nio.charset.Charset defaultCharset) throws DomTripException
Parses XML from an InputStream with encoding detection and fallback.
This method attempts to detect the character encoding by:
- Checking for a Byte Order Mark (BOM)
- Reading the XML declaration to extract the encoding attribute
- Using the provided default charset if detection fails
The resulting Document will have its encoding property set to the detected, declared, or default encoding.
- Parameters:
inputStream - the InputStream containing XML data
defaultCharset - the charset to use if detection fails
- Returns:
a Document containing the parsed XML with preserved formatting
- Throws:
DomTripException - if the XML is malformed, cannot be parsed, or an I/O error occurs
-
parse
public Document parse(java.lang.String xml) throws DomTripException
Parses an XML string into a lossless XML document tree.
This method performs complete XML parsing while preserving all formatting information, including whitespace, comments, attribute styles, and entity encoding. The resulting Document can be used for lossless round-trip editing.
- Parameters:
xml - the XML string to parse
- Returns:
a Document containing the parsed XML with preserved formatting
- Throws:
DomTripException - if the XML is malformed or cannot be parsed