com.fasterxml.aalto.in
Class Utf8Scanner

java.lang.Object
  extended by com.fasterxml.aalto.in.XmlScanner
      extended by com.fasterxml.aalto.in.ByteBasedScanner
          extended by com.fasterxml.aalto.in.StreamScanner
              extended by com.fasterxml.aalto.in.Utf8Scanner
All Implemented Interfaces:
XmlConsts, NamespaceContext, XMLStreamConstants

public final class Utf8Scanner
extends StreamScanner

Scanner for tokenizing XML content from a byte stream encoded using UTF-8, or something suitably close to it for decoding purposes (including ISO-Latin1 and US-ASCII).
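
For orientation, the sketch below feeds UTF-8 encoded bytes to a streaming XML reader through the standard javax.xml.stream API. It is an illustration only: the class name Utf8StaxSketch and the helper firstElement are hypothetical, the factory returned by XMLInputFactory.newInstance() depends on what is on the classpath, and this page does not state how a Utf8Scanner instance gets selected behind such a factory.

```java
import java.io.ByteArrayInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class Utf8StaxSketch {
    /**
     * Hypothetical helper: returns "localName=text" for the first element
     * found in a UTF-8 encoded document, or null if there is no element.
     */
    static String firstElement(byte[] utf8Xml) {
        try {
            // Which implementation backs the factory depends on the classpath.
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new ByteArrayInputStream(utf8Xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        String name = reader.getLocalName();
                        // getElementText() reads up to the matching end tag.
                        return name + "=" + reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped in an unchecked exception only to keep the sketch compact.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }
}
```

The input here carries no XML declaration and no byte-order mark, so the reader is expected to fall back to UTF-8 when decoding the bytes.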


Field Summary
 
Fields inherited from class com.fasterxml.aalto.in.StreamScanner
_in, _inputBuffer, _inputEnd, _inputPtr
 
Fields inherited from class com.fasterxml.aalto.in.ByteBasedScanner
_charTypes, _pastBytes, _quadBuffer, _rowStartOffset, _symbols, _tmpChar, BYTE_a, BYTE_A, BYTE_AMP, BYTE_APOS, BYTE_C, BYTE_CR, BYTE_D, BYTE_EQ, BYTE_EXCL, BYTE_g, BYTE_GT, BYTE_HASH, BYTE_HYPHEN, BYTE_l, BYTE_LBRACKET, BYTE_LF, BYTE_LT, BYTE_m, BYTE_NULL, BYTE_o, BYTE_p, BYTE_P, BYTE_q, BYTE_QMARK, BYTE_QUOT, BYTE_RBRACKET, BYTE_s, BYTE_S, BYTE_SEMICOLON, BYTE_SLASH, BYTE_SPACE, BYTE_t, BYTE_T, BYTE_TAB, BYTE_u, BYTE_x
 
Fields inherited from class com.fasterxml.aalto.in.XmlScanner
_attrCollector, _attrCount, _cfgCoalescing, _cfgLazyParsing, _config, _currElem, _currNsCount, _currRow, _currToken, _defaultNs, _depth, _entityPending, _isEmptyTag, _lastNsContext, _lastNsDecl, _nameBuffer, _nsBindingCache, _nsBindingCount, _nsBindings, _nsBindMisses, _publicId, _systemId, _textBuilder, _tokenIncomplete, _tokenName, _xml11, CDATA_STR, INT_0, INT_9, INT_a, INT_A, INT_AMP, INT_APOS, INT_COLON, INT_CR, INT_EQ, INT_EXCL, INT_f, INT_F, INT_GT, INT_HYPHEN, INT_LBRACKET, INT_LF, INT_LT, INT_NULL, INT_QMARK, INT_QUOTE, INT_RBRACKET, INT_SLASH, INT_SPACE, INT_TAB, INT_z, MAX_UNICODE_CHAR, TOKEN_EOI
 
Fields inherited from interface com.fasterxml.aalto.util.XmlConsts
CHAR_CR, CHAR_LF, CHAR_NULL, CHAR_SPACE, STAX_DEFAULT_OUTPUT_ENCODING, STAX_DEFAULT_OUTPUT_VERSION, XML_DECL_KW_ENCODING, XML_DECL_KW_STANDALONE, XML_DECL_KW_VERSION, XML_SA_NO, XML_SA_YES, XML_V_10, XML_V_10_STR, XML_V_11, XML_V_11_STR, XML_V_UNKNOWN
 
Fields inherited from interface javax.xml.stream.XMLStreamConstants
ATTRIBUTE, CDATA, CHARACTERS, COMMENT, DTD, END_DOCUMENT, END_ELEMENT, ENTITY_DECLARATION, ENTITY_REFERENCE, NAMESPACE, NOTATION_DECLARATION, PROCESSING_INSTRUCTION, SPACE, START_DOCUMENT, START_ELEMENT
 
Constructor Summary
Utf8Scanner(ReaderConfig cfg, InputStream in, byte[] buffer, int ptr, int last)
           
 
Method Summary
protected  PName addPName(int hash, int[] quads, int qlen, int lastQuadBytes)
           
 int decodeCharForError(byte b)
          Method called to decode a full UTF-8 character, given its first byte.
protected  void finishCData()
           
protected  void finishCharacters()
           
protected  void finishCoalescedCData()
           
protected  void finishCoalescedCharacters()
           
protected  void finishCoalescedText()
          Method that gets called after a primary text segment (of type CHARACTERS or CDATA, not applicable to SPACE) has been read into the text buffer.
protected  void finishComment()
           
protected  void finishDTD(boolean copyContents)
          When this method gets called we know that we have an internal subset, and that the opening '[' has already been read.
protected  void finishPI()
           
protected  void finishSpace()
          Note: this method is only called in cases where it is known that only space chars are legal.
protected  int handleEntityInText(boolean inAttr)
          Method called when an ampersand is encountered in a text segment.
protected  int handleStartElement(byte b)
          Parsing of start element requires parsing of the element name (and attribute names), and is thus encoding-specific.
protected  String parsePublicId(byte quoteChar)
          Parsing of public ids is a bit more complicated than that of system ids, since white space is to be coalesced.
protected  String parseSystemId(byte quoteChar)
           
protected  void reportInvalidInitial(int mask)
           
protected  void reportInvalidOther(int mask)
           
protected  void reportInvalidOther(int mask, int ptr)
           
protected  void skipCData()
           
protected  boolean skipCharacters()
           
protected  boolean skipCoalescedText()
          Method that gets called after a primary text segment (of type CHARACTERS or CDATA, not applicable to SPACE) has been skipped.
protected  void skipComment()
           
protected  void skipPI()
           
protected  void skipSpace()
           
 
Methods inherited from class com.fasterxml.aalto.in.StreamScanner
_closeSource, _releaseBuffers, checkInTreeIndentation, checkPrologIndentation, handleCharEntity, handleEndElement, loadAndRetain, loadMore, loadOne, loadOne, nextByte, nextByte, nextFromProlog, nextFromTree, parsePName, parsePNameLong, parsePNameMedium, parsePNameSlow, skipInternalWs
 
Methods inherited from class com.fasterxml.aalto.in.ByteBasedScanner
addUtfPName, getCurrentColumnNr, getCurrentLineNr, getCurrentLocation, markLF, markLF
 
Methods inherited from class com.fasterxml.aalto.in.XmlScanner
bindName, bindNs, checkImmutableBinding, close, decodeAttrBinaryValue, decodeAttrValue, decodeAttrValues, decodeElements, findAttrIndex, findOrCreateBinding, finishToken, fireSaxCharacterEvents, fireSaxCommentEvent, fireSaxEndElement, fireSaxPIEvent, fireSaxSpaceEvents, fireSaxStartElement, getAttrCollector, getAttrCount, getAttrLocalName, getAttrNsURI, getAttrPrefix, getAttrPrefixedName, getAttrQName, getAttrType, getAttrValue, getAttrValue, getConfig, getDepth, getDTDPublicId, getDTDSystemId, getEndLocation, getInputPublicId, getInputSystemId, getName, getNamespacePrefix, getNamespaceURI, getNamespaceURI, getNamespaceURI, getNonTransientNamespaceContext, getNsCount, getPrefix, getPrefixes, getQName, getStartLocation, getText, getText, getTextCharacters, getTextCharacters, getTextLength, hasEmptyStack, isAttrSpecified, isEmptyTag, isTextWhitespace, loadMoreGuaranteed, loadMoreGuaranteed, reportDoubleHyphenInComments, reportDuplicateNsDecl, reportEntityOverflow, reportEofInName, reportIllegalCDataEnd, reportIllegalNsDecl, reportIllegalNsDecl, reportInputProblem, reportInvalidNameChar, reportInvalidNsIndex, reportInvalidXmlChar, reportMissingPISpace, reportMultipleColonsInName, reportPrologProblem, reportPrologUnexpChar, reportTreeUnexpChar, reportUnboundPrefix, reportUnexpandedEntityInAttr, reportUnexpectedEndTag, resetForDecoding, skipToken, throwInvalidSpace, throwInvalidXmlChar, throwNullChar, throwUnexpectedChar, verifyXmlChar
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Utf8Scanner

public Utf8Scanner(ReaderConfig cfg,
                   InputStream in,
                   byte[] buffer,
                   int ptr,
                   int last)
Method Detail

handleStartElement

protected int handleStartElement(byte b)
                          throws XMLStreamException
Description copied from class: StreamScanner
Parsing of start element requires parsing of the element name (and attribute names), and is thus encoding-specific.

Specified by:
handleStartElement in class StreamScanner
Throws:
XMLStreamException

handleEntityInText

protected final int handleEntityInText(boolean inAttr)
                                throws XMLStreamException
Method called when an ampersand is encountered in a text segment. The method needs to determine whether it is a predefined or character entity (in which case it will be expanded into a single char or surrogate pair), or a general entity (in which case it will most likely be returned as an ENTITY_REFERENCE event).

Specified by:
handleEntityInText in class StreamScanner
Parameters:
inAttr - True, if reference is from attribute value; false if from normal text content
Returns:
0 if a general parsed entity was encountered; integer value of a (valid) XML content character otherwise
Throws:
XMLStreamException
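
The return-value contract described above can be illustrated with a small stand-alone sketch. It is not the scanner's implementation: the class EntitySketch and the method resolve are hypothetical, the input is assumed to be the text between '&' and ';' already extracted as a String, and the validity checks a real scanner performs on the resulting character are omitted.

```java
class EntitySketch {
    /**
     * Hypothetical helper mirroring the documented contract:
     * returns the code point for predefined and character entities,
     * or 0 for anything else (treated here as a general entity).
     */
    static int resolve(String ref) {
        // The five entities predefined by XML.
        switch (ref) {
            case "amp":  return '&';
            case "lt":   return '<';
            case "gt":   return '>';
            case "apos": return '\'';
            case "quot": return '"';
            default:     break;
        }
        // Character references: hexadecimal (&#x...;) or decimal (&#...;).
        if (ref.startsWith("#x")) {
            return Integer.parseInt(ref.substring(2), 16);
        }
        if (ref.startsWith("#")) {
            return Integer.parseInt(ref.substring(1), 10);
        }
        // Anything else would be left for the caller to report as a general entity.
        return 0;
    }
}
```

A code point above 0xFFFF, such as the one produced by "#x1F600", needs two Java chars, which is why the description mentions expansion into a surrogate pair; Character.toChars(int) performs that split.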

addPName

protected final PName addPName(int hash,
                               int[] quads,
                               int qlen,
                               int lastQuadBytes)
                        throws XMLStreamException
Specified by:
addPName in class ByteBasedScanner
Throws:
XMLStreamException

parsePublicId

protected String parsePublicId(byte quoteChar)
                        throws XMLStreamException
Parsing of public ids is a bit more complicated than that of system ids, since white space is to be coalesced.

Specified by:
parsePublicId in class StreamScanner
Throws:
XMLStreamException
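
As an illustration of what coalescing white space in a public id means, the sketch below follows the normalization rule of the XML 1.0 specification (runs of white space become a single space, leading and trailing white space is dropped). The class PublicIdSketch and the method normalize are hypothetical, the sketch works on a String rather than on the input buffer, and it is an assumption that the scanner produces exactly this result.

```java
class PublicIdSketch {
    /**
     * Hypothetical helper: collapses each run of white space into one
     * space and removes leading and trailing white space.
     */
    static String normalize(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        boolean pendingSpace = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            // White space characters permitted in a public id literal.
            if (c == ' ' || c == '\r' || c == '\n') {
                // Remember a separator only once real content has been seen,
                // which drops leading white space.
                pendingSpace = sb.length() > 0;
            } else {
                if (pendingSpace) {
                    sb.append(' ');
                    pendingSpace = false;
                }
                sb.append(c);
            }
        }
        // A pending space at the end is never flushed, which trims the tail.
        return sb.toString();
    }
}
```

A system id needs no such step because its literal is taken as-is, which is the difference the description points at.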

parseSystemId

protected String parseSystemId(byte quoteChar)
                        throws XMLStreamException
Specified by:
parseSystemId in class StreamScanner
Throws:
XMLStreamException

skipCharacters

protected final boolean skipCharacters()
                                throws XMLStreamException
Specified by:
skipCharacters in class XmlScanner
Returns:
True, if an unexpanded entity was encountered (and is now pending)
Throws:
XMLStreamException

skipComment

protected final void skipComment()
                          throws XMLStreamException
Specified by:
skipComment in class XmlScanner
Throws:
XMLStreamException

skipCData

protected final void skipCData()
                        throws XMLStreamException
Specified by:
skipCData in class XmlScanner
Throws:
XMLStreamException

skipPI

protected final void skipPI()
                     throws XMLStreamException
Specified by:
skipPI in class XmlScanner
Throws:
XMLStreamException

skipSpace

protected final void skipSpace()
                        throws XMLStreamException
Specified by:
skipSpace in class XmlScanner
Throws:
XMLStreamException

finishCData

protected final void finishCData()
                          throws XMLStreamException
Specified by:
finishCData in class XmlScanner
Throws:
XMLStreamException

finishCharacters

protected final void finishCharacters()
                               throws XMLStreamException
Specified by:
finishCharacters in class XmlScanner
Throws:
XMLStreamException

finishComment

protected final void finishComment()
                            throws XMLStreamException
Specified by:
finishComment in class XmlScanner
Throws:
XMLStreamException

finishDTD

protected final void finishDTD(boolean copyContents)
                        throws XMLStreamException
When this method gets called we know that we have an internal subset, and that the opening '[' has already been read.

Specified by:
finishDTD in class XmlScanner
Throws:
XMLStreamException

finishPI

protected final void finishPI()
                       throws XMLStreamException
Specified by:
finishPI in class XmlScanner
Throws:
XMLStreamException

finishSpace

protected final void finishSpace()
                          throws XMLStreamException
Note: this method is only called in cases where it is known that only space chars are legal. Thus, encountering a non-space is an error (a well-formedness or validity constraint violation). However, an end-of-input is ok.

Specified by:
finishSpace in class XmlScanner
Throws:
XMLStreamException

finishCoalescedText

protected final void finishCoalescedText()
                                  throws XMLStreamException
Method that gets called after a primary text segment (of type CHARACTERS or CDATA, not applicable to SPACE) has been read into the text buffer. The method has to see if the following event would be textual as well, and if so, read it (and any other following textual segments).

Throws:
XMLStreamException
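
The coalescing idea can be modeled on an already tokenized list of segments. This is a simplified sketch, not the scanner's code: the names CoalesceSketch, Kind, Segment and coalesceFrom are hypothetical, and the real method works on the byte buffer and the text builder rather than on a list.

```java
import java.util.List;

class CoalesceSketch {
    /** Hypothetical event kinds; only the first two count as textual. */
    enum Kind { CHARACTERS, CDATA, COMMENT, START_ELEMENT }

    /** Hypothetical pre-tokenized segment. */
    static final class Segment {
        final Kind kind;
        final String text;

        Segment(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    /**
     * Starting from a primary text segment at index first, appends every
     * directly following CHARACTERS or CDATA segment and stops at the
     * first non-textual one.
     */
    static String coalesceFrom(List<Segment> segments, int first) {
        StringBuilder text = new StringBuilder(segments.get(first).text);
        for (int i = first + 1; i < segments.size(); i++) {
            Segment next = segments.get(i);
            if (next.kind != Kind.CHARACTERS && next.kind != Kind.CDATA) {
                break; // following event is not textual: coalescing ends here
            }
            text.append(next.text);
        }
        return text.toString();
    }
}
```

In this model a comment between two text runs ends the coalesced segment, so text after the comment would be reported as a separate event.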

finishCoalescedCharacters

protected final void finishCoalescedCharacters()
                                        throws XMLStreamException
Throws:
XMLStreamException

finishCoalescedCData

protected final void finishCoalescedCData()
                                   throws XMLStreamException
Throws:
XMLStreamException

skipCoalescedText

protected final boolean skipCoalescedText()
                                   throws XMLStreamException
Method that gets called after a primary text segment (of type CHARACTERS or CDATA, not applicable to SPACE) has been skipped. Method has to see if the following event would be textual as well, and if so, skip it (and any other following textual segments).

Specified by:
skipCoalescedText in class XmlScanner
Returns:
True if we encountered an unexpandable entity
Throws:
XMLStreamException

decodeCharForError

public int decodeCharForError(byte b)
                       throws XMLStreamException
Method called to decode a full UTF-8 character, given its first byte. Note: does not do any validity checks, since this is only to be used for informational purposes (often when an error has already been encountered).

Specified by:
decodeCharForError in class ByteBasedScanner
Throws:
XMLStreamException
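
A lenient decode of this kind might look like the sketch below. It is an assumption-laden illustration rather than the method's actual body: the class Utf8DecodeSketch and the method decode are hypothetical, the remaining bytes are read from an array instead of the scanner's input buffer, and, in line with the note above, continuation bytes are not validated.

```java
class Utf8DecodeSketch {
    /**
     * Hypothetical helper: decodes the UTF-8 character whose first byte
     * is at buf[ptr], without checking that continuation bytes are valid.
     */
    static int decode(byte[] buf, int ptr) {
        int c = buf[ptr] & 0xFF;
        if (c < 0x80) {
            return c; // single-byte (ASCII) character
        }
        int needed;
        if ((c & 0xE0) == 0xC0) {        // 110xxxxx: two-byte sequence
            c &= 0x1F;
            needed = 1;
        } else if ((c & 0xF0) == 0xE0) { // 1110xxxx: three-byte sequence
            c &= 0x0F;
            needed = 2;
        } else if ((c & 0xF8) == 0xF0) { // 11110xxx: four-byte sequence
            c &= 0x07;
            needed = 3;
        } else {
            return c; // not a valid lead byte: hand it back unchanged
        }
        // Fold in the low six bits of each continuation byte that is available.
        for (int i = 1; i <= needed && ptr + i < buf.length; i++) {
            c = (c << 6) | (buf[ptr + i] & 0x3F);
        }
        return c;
    }
}
```

Skipping validation is acceptable here only because the result feeds an error message; the same shortcut would be unsafe on the normal decoding path.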

reportInvalidInitial

protected void reportInvalidInitial(int mask)
                             throws XMLStreamException
Overrides:
reportInvalidInitial in class ByteBasedScanner
Throws:
XMLStreamException

reportInvalidOther

protected void reportInvalidOther(int mask)
                           throws XMLStreamException
Overrides:
reportInvalidOther in class ByteBasedScanner
Throws:
XMLStreamException

reportInvalidOther

protected void reportInvalidOther(int mask,
                                  int ptr)
                           throws XMLStreamException
Throws:
XMLStreamException


Copyright © 2012 Fasterxml.com. All Rights Reserved.