Class StreamScanner

  • All Implemented Interfaces:
    XmlConsts, javax.xml.namespace.NamespaceContext, javax.xml.stream.XMLStreamConstants
    Direct Known Subclasses:
    Utf8Scanner

    public abstract class StreamScanner
    extends ByteBasedScanner
    Base class for various byte stream based scanners (generally one for each type of encoding supported).
    • Field Detail

      • _in

        protected java.io.InputStream _in
        Underlying InputStream to use for reading content.
      • _inputBuffer

        protected byte[] _inputBuffer
      • _charTypes

        protected final XmlCharTypes _charTypes
        A simple container object used to access the character decoding tables. The indirection is needed because multiple UTF-8 compatible encodings are supported, not just UTF-8 itself.
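        As a rough illustration of why a table container helps, the sketch below builds one 256-entry type table per encoding family and lets the scanner look up a byte's type without knowing which encoding is in use. Everything here (CharTypesSketch, the CT_ constants, utf8Like, latin1Like, typeOf) is hypothetical and is not the XmlCharTypes API.

```java
final class CharTypesSketch {
    static final int CT_OK = 0;        // ordinary single-byte character
    static final int CT_LT = 1;        // '<', starts markup
    static final int CT_AMP = 2;       // '&', starts an entity reference
    static final int CT_NON_ASCII = 3; // byte >= 0x80 that needs encoding-specific handling

    // One entry per possible byte value; entries default to CT_OK (0).
    private final int[] textTypes = new int[256];

    private CharTypesSketch(boolean highBytesNeedDecoding) {
        textTypes['<'] = CT_LT;
        textTypes['&'] = CT_AMP;
        if (highBytesNeedDecoding) {
            for (int i = 0x80; i < 0x100; i++) {
                textTypes[i] = CT_NON_ASCII;
            }
        }
    }

    /** Table for an encoding where high bytes are part of multi-byte sequences. */
    static CharTypesSketch utf8Like() {
        return new CharTypesSketch(true);
    }

    /** Table for a single-byte encoding where high bytes are plain characters. */
    static CharTypesSketch latin1Like() {
        return new CharTypesSketch(false);
    }

    /** Looks up the type code for a raw input byte. */
    int typeOf(byte b) {
        return textTypes[b & 0xFF];
    }
}
```

        With this layout, shared scanning code only calls typeOf and the encoding-specific behaviour lives in the table that was handed to the constructor.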
      • _symbols

        protected final ByteBasedPNameTable _symbols
        For now, the symbol table contains prefixed names. In the future these may be split into prefixes and local names.
      • _quadBuffer

        protected int[] _quadBuffer
        Buffer used for name parsing. It is expanded as needed; 32 ints can hold names up to 128 ASCII characters long.
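        The 32-int / 128-character figure for _quadBuffer follows from storing 4 name bytes per int. A minimal sketch of that packing is shown below; the class and method names are hypothetical, and the most-significant-byte-first order is an assumption rather than a documented property of StreamScanner.

```java
final class QuadPackingSketch {
    /**
     * Packs name bytes into ints ("quads"), 4 bytes per int.
     * Bytes are placed most significant first (assumed order);
     * a partial last quad keeps its bytes in the low-order positions.
     */
    static int[] pack(byte[] name) {
        int[] quads = new int[(name.length + 3) / 4];
        for (int i = 0; i < name.length; i++) {
            // Shift what is already in this quad and append the next byte.
            quads[i / 4] = (quads[i / 4] << 8) | (name[i] & 0xFF);
        }
        return quads;
    }
}
```

        Under these assumptions the ASCII name "root" packs into the single int 0x726F6F74, and any 128-byte name needs exactly 32 ints.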
    • Constructor Detail

      • StreamScanner

        public StreamScanner​(ReaderConfig cfg,
                             java.io.InputStream in,
                             byte[] buffer,
                             int ptr,
                             int last)
    • Method Detail

      • _closeSource

        protected void _closeSource()
                             throws java.io.IOException
        Specified by:
        _closeSource in class ByteBasedScanner
        Throws:
        java.io.IOException
      • handleEntityInText

        protected abstract int handleEntityInText​(boolean inAttr)
                                           throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • parsePublicId

        protected abstract java.lang.String parsePublicId​(byte quoteChar)
                                                   throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • parseSystemId

        protected abstract java.lang.String parseSystemId​(byte quoteChar)
                                                   throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • nextFromProlog

        public final int nextFromProlog​(boolean isProlog)
                                 throws javax.xml.stream.XMLStreamException
        Specified by:
        nextFromProlog in class XmlScanner
        Throws:
        javax.xml.stream.XMLStreamException
      • nextFromTree

        public final int nextFromTree()
                               throws javax.xml.stream.XMLStreamException
        Specified by:
        nextFromTree in class XmlScanner
        Throws:
        javax.xml.stream.XMLStreamException
      • _nextEntity

        protected int _nextEntity()
        Helper method used to isolate things that need to be (re)set in cases where
      • handlePrologDeclStart

        private final int handlePrologDeclStart​(boolean isProlog)
                                         throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • handleDtdStart

        private final int handleDtdStart()
                                  throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • handleCommentOrCdataStart

        private final int handleCommentOrCdataStart()
                                             throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • handlePIStart

        private final int handlePIStart()
                                 throws javax.xml.stream.XMLStreamException
        Method called after leading '
        Throws:
        javax.xml.stream.XMLStreamException
      • handleCharEntity

        protected final int handleCharEntity()
                                      throws javax.xml.stream.XMLStreamException
        Returns:
        Code point for the entity that expands to a valid XML content character.
        Throws:
        javax.xml.stream.XMLStreamException
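        To make the return contract of handleCharEntity concrete, the sketch below decodes the body of a numeric character reference (the text between "&#" and ";") and checks the result against the Char production of XML 1.0. It works on a String rather than on the scanner's byte buffer, and the names CharEntitySketch, decode and isXmlChar are hypothetical.

```java
final class CharEntitySketch {
    /** Decodes the text between "&#" and ";", for example "65" or "x41". */
    static int decode(String body) {
        boolean hex = body.startsWith("x");
        String digits = hex ? body.substring(1) : body;
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("No digits in character reference");
        }
        int radix = hex ? 16 : 10;
        int value = 0;
        for (int i = 0; i < digits.length(); i++) {
            int d = digitValue(digits.charAt(i), hex);
            if (d < 0) {
                throw new IllegalArgumentException("Invalid digit: " + digits.charAt(i));
            }
            value = value * radix + d;
            // Checking on every step also keeps the int from overflowing.
            if (value > 0x10FFFF) {
                throw new IllegalArgumentException("Value beyond Unicode range");
            }
        }
        if (!isXmlChar(value)) {
            throw new IllegalArgumentException("Not a valid XML character: " + value);
        }
        return value;
    }

    private static int digitValue(char ch, boolean hex) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (hex && ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (hex && ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }

    /** Char production from XML 1.0. */
    static boolean isXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }
}
```

        Both decode("65") and decode("x41") yield the code point 65, while decode("0") is rejected because U+0000 is not a valid XML character.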
      • handleStartElement

        protected abstract int handleStartElement​(byte b)
                                           throws javax.xml.stream.XMLStreamException
        Parsing of start element requires parsing of the element name (and attribute names), and is thus encoding-specific.
        Throws:
        javax.xml.stream.XMLStreamException
      • handleEndElement

        protected final int handleEndElement()
                                      throws javax.xml.stream.XMLStreamException
        Note that this method is currently shareable across all ASCII-based encodings, or at least between UTF-8 and ISO-Latin1. Because the exact bytes that need to be matched are already known, there is no danger of encountering an invalid encoding. For now, the method therefore stays in the base class.
        Throws:
        javax.xml.stream.XMLStreamException
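        The reasoning given for handleEndElement can be sketched as a plain byte comparison: the name bytes recorded for the open element are compared against the input buffer, so no character decoding is involved. The class, method and parameters below are hypothetical and ignore buffer reloading.

```java
final class EndTagMatchSketch {
    /**
     * Compares the expected element-name bytes with the input starting at ptr.
     * Returns the offset just past the name on a match, or -1 otherwise.
     */
    static int matchName(byte[] buf, int ptr, int end, byte[] expected) {
        if (end - ptr < expected.length) {
            return -1; // a real scanner would try to load more input here
        }
        for (int i = 0; i < expected.length; i++) {
            if (buf[ptr + i] != expected[i]) {
                return -1; // mismatched end tag
            }
        }
        return ptr + expected.length;
    }
}
```

        Because only raw bytes are compared, the same routine works unchanged whether the name was encoded as UTF-8 or as ISO-Latin1, provided both tags come from the same stream.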
      • handleEndElementSlow

        private final int handleEndElementSlow​(int size)
                                        throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • parsePName

        protected final PName parsePName​(byte b)
                                  throws javax.xml.stream.XMLStreamException
        This method can (for now?) be shared between all ASCII-based encodings, since it only does coarse validity checking; the real checks are done in a different method.

        Some notes about the assumptions the implementation makes:

        • Well-formed XML content cannot end with a name; as such, end-of-input is an error and an exception can be thrown
        Throws:
        javax.xml.stream.XMLStreamException
      • parsePNameMedium

        protected PName parsePNameMedium​(int i2,
                                         int q1)
                                  throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • parsePNameLong

        protected final PName parsePNameLong​(int q,
                                             int[] quads)
                                      throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • parsePNameSlow

        protected final PName parsePNameSlow​(byte b)
                                      throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • findPName

        private final PName findPName​(int onlyQuad,
                                      int lastByteCount)
                               throws javax.xml.stream.XMLStreamException
        Method called to process a sequence of bytes that is likely to be a PName. At this point we encountered an end marker, and may either hit a formerly seen well-formed PName; an as-of-yet unseen well-formed PName; or a non-well-formed sequence (containing one or more non-name chars without any valid end markers).
        Parameters:
        onlyQuad - Word with 1 to 4 bytes that make up PName
        lastByteCount - Number of actual bytes contained in onlyQuad; 0 to 3.
        Throws:
        javax.xml.stream.XMLStreamException
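        The three outcomes described for findPName (a name seen before, a new well-formed name, or a malformed sequence) can be sketched for the single-quad case as follows. This is a simplified illustration, not the ByteBasedPNameTable implementation: it handles ASCII names only, treats byteCount as 1 to 4, assumes most-significant-byte-first packing, and all names in it are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class QuadSymbolTableSketch {
    // Valid names never contain a zero byte, so the quad alone identifies the name.
    private final Map<Integer, String> names = new HashMap<>();

    /** quad holds byteCount (1 to 4) ASCII bytes, most significant first. */
    String find(int quad, int byteCount) {
        String seen = names.get(quad);
        if (seen != null) {
            return seen; // formerly seen, already validated name
        }
        StringBuilder sb = new StringBuilder(byteCount);
        for (int shift = (byteCount - 1) * 8; shift >= 0; shift -= 8) {
            char c = (char) ((quad >>> shift) & 0xFF);
            if (!isAsciiNameChar(c, sb.length() == 0)) {
                throw new IllegalArgumentException("Invalid name character: " + (int) c);
            }
            sb.append(c);
        }
        String name = sb.toString();
        names.put(quad, name); // as-of-yet unseen but well-formed name
        return name;
    }

    /** ASCII subset of the XML 1.0 NameStartChar and NameChar productions. */
    static boolean isAsciiNameChar(char c, boolean first) {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == ':') {
            return true;
        }
        return !first && ((c >= '0' && c <= '9') || c == '-' || c == '.');
    }
}
```

        The first lookup of a name pays for validation and string construction; later lookups of the same quad return the stored instance directly.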
      • findPName

        private final PName findPName​(int firstQuad,
                                      int secondQuad,
                                      int lastByteCount)
                               throws javax.xml.stream.XMLStreamException
        Method called to process a sequence of bytes that is likely to be a PName. At this point we encountered an end marker, and may either hit a formerly seen well-formed PName; an as-of-yet unseen well-formed PName; or a non-well-formed sequence (containing one or more non-name chars without any valid end markers).
        Parameters:
        firstQuad - First 1 to 4 bytes of the PName
        secondQuad - Word with last 1 to 4 bytes of the PName
        lastByteCount - Number of bytes contained in secondQuad; 0 to 3.
        Throws:
        javax.xml.stream.XMLStreamException
      • findPName

        private final PName findPName​(int lastQuad,
                                      int[] quads,
                                      int qlen,
                                      int lastByteCount)
                               throws javax.xml.stream.XMLStreamException
        Method called to process a sequence of bytes that is likely to be a PName. At this point we encountered an end marker, and may either hit a formerly seen well-formed PName; an as-of-yet unseen well-formed PName; or a non-well-formed sequence (containing one or more non-name chars without any valid end markers).
        Parameters:
        lastQuad - Word with last 0 to 3 bytes of the PName; not included in the quad array
        quads - Array that contains all the quads, except for the last one, for names with more than 8 bytes (i.e. more than 2 quads)
        qlen - Number of quads in the array, except if less than 2 (in which case only firstQuad and lastQuad are used)
        lastByteCount - Number of bytes contained in lastQuad; 0 to 3.
        Throws:
        javax.xml.stream.XMLStreamException
      • findPName

        private final PName findPName​(int lastQuad,
                                      int lastByteCount,
                                      int firstQuad,
                                      int qlen,
                                      int[] quads)
                               throws javax.xml.stream.XMLStreamException
        Method called to process a sequence of bytes that is likely to be a PName. At this point we encountered an end marker, and may either hit a formerly seen well-formed PName; an as-of-yet unseen well-formed PName; or a non-well-formed sequence (containing one or more non-name chars without any valid end markers).
        Parameters:
        lastQuad - Word with last 0 to 3 bytes of the PName; not included in the quad array
        lastByteCount - Number of bytes contained in lastQuad; 0 to 3.
        firstQuad - First 1 to 4 bytes of the PName (4 if length at least 4 bytes; less only if not).
        qlen - Number of quads in the array, except if less than 2 (in which case only firstQuad and lastQuad are used)
        quads - Array that contains all the quads, except for the last one, for names with more than 8 bytes (i.e. more than 2 quads)
        Throws:
        javax.xml.stream.XMLStreamException
      • addPName

        protected final PName addPName​(int hash,
                                       int[] quads,
                                       int qlen,
                                       int lastQuadBytes)
                                throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • skipInternalWs

        protected byte skipInternalWs​(boolean reqd,
                                      java.lang.String msg)
                               throws javax.xml.stream.XMLStreamException
        Returns:
        First byte following skipped white space
        Throws:
        javax.xml.stream.XMLStreamException
      • matchAsciiKeyword

        private final void matchAsciiKeyword​(java.lang.String keyw)
                                      throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • checkInTreeIndentation

        protected final int checkInTreeIndentation​(int c)
                                            throws javax.xml.stream.XMLStreamException

        Note: consecutive white space is considered indentation only if the following token looks like a tag (start or end). This is so that if a CDATA section follows, it can be coalesced in coalescing mode. Although we could check whether coalescing mode is enabled, doing so should seldom have a significant effect either way, so skipping the check removes one possible source of problems in coalescing mode.

        Returns:
        -1, if indentation was handled; offset in the output buffer, if not
        Throws:
        javax.xml.stream.XMLStreamException
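        The note on checkInTreeIndentation describes a look-ahead heuristic. One way such a check could look is sketched below: after a linefeed, skip spaces and tabs, and treat the run as indentation only when it is followed by what looks like a start or end tag. The exact conditions used by the scanner are not documented here, so the class, the method and the precise test are assumptions.

```java
final class IndentationSketch {
    /**
     * buf[ptr] is the first byte after a linefeed. Returns true if the
     * white space starting there is followed by something that looks like a tag.
     */
    static boolean looksLikeIndentation(byte[] buf, int ptr) {
        while (ptr < buf.length && (buf[ptr] == ' ' || buf[ptr] == '\t')) {
            ptr++;
        }
        if (ptr + 1 >= buf.length || buf[ptr] != '<') {
            return false; // followed by text (or too little input to decide)
        }
        byte next = buf[ptr + 1];
        // "<!" starts a comment or CDATA section, "<?" a processing instruction;
        // neither is a start or end tag, so the white space stays ordinary text.
        return next != '!' && next != '?';
    }
}
```

        With this rule, white space before "<a>" or "</a>" counts as indentation, while white space before "<![CDATA[" does not and so remains available for coalescing with the CDATA content.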
      • checkPrologIndentation

        protected final int checkPrologIndentation​(int c)
                                            throws javax.xml.stream.XMLStreamException
        Returns:
        -1, if indentation was handled; offset in the output buffer, if not
        Throws:
        javax.xml.stream.XMLStreamException
      • loadMore

        protected final boolean loadMore()
                                  throws javax.xml.stream.XMLStreamException
        Specified by:
        loadMore in class XmlScanner
        Throws:
        javax.xml.stream.XMLStreamException
      • nextByte

        protected final byte nextByte​(int tt)
                               throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • nextByte

        protected final byte nextByte()
                               throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • loadOne

        protected final byte loadOne()
                              throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • loadOne

        protected final byte loadOne​(int type)
                              throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • loadAndRetain

        protected final boolean loadAndRetain​(int nrOfChars)
                                       throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException