com.fasterxml.aalto.async
Class AsyncByteScanner

java.lang.Object
  extended by com.fasterxml.aalto.in.XmlScanner
      extended by com.fasterxml.aalto.in.ByteBasedScanner
          extended by com.fasterxml.aalto.async.AsyncByteScanner
All Implemented Interfaces:
XmlConsts, NamespaceContext, XMLStreamConstants
Direct Known Subclasses:
AsyncUtfScanner

public abstract class AsyncByteScanner
extends ByteBasedScanner

This is the base class for asynchronous (non-blocking) XML scanners. Because the asynchronous approach is inherently complex, a character-based variant does not make much sense, so only byte-based input is supported.


Field Summary
protected  int _currQuad
          Bytes parsed for the current, incomplete, quad
protected  int _currQuadBytes
          Number of bytes pending/buffered, stored in _currQuad
protected  boolean _elemAllNsBound
           
protected  boolean _elemAttrCount
           
protected  PName _elemAttrName
           
protected  int _elemAttrPtr
          Pointer for the next character of currently being parsed value within attribute value buffer
protected  byte _elemAttrQuote
           
protected  int _elemNsPtr
          Pointer for the next character of currently being parsed namespace URI for the current namespace declaration
protected  boolean _endOfInput
          Flag that is set when the calling application indicates that there will be no more input to parse.
protected  int _entityValue
          Entity value accumulated so far
protected  byte[] _inputBuffer
          This buffer is actually provided by caller
protected  int _nextEvent
          Due to asynchronous nature of parsing, we may know what event we are trying to parse, even if it's not yet complete.
protected  int _origBufferLen
          In addition to current buffer pointer, and end pointer, we will also need to know number of bytes originally contained.
protected  int _pendingInput
          There are some multi-byte combinations that must be handled as a unit: CR+LF linefeeds, multi-byte UTF-8 characters, and multi-character end markers for comments and PIs.
protected  int _quadCount
          Number of complete quads parsed for current name (quads themselves are stored in ByteBasedScanner._quadBuffer).
protected  int _state
          In addition to the event type, there is need for additional state information
protected  int _surroundingEvent
          For token/state combinations that are 'shared' between events (or embedded in them), this is where the surrounding event state is retained.
 
Fields inherited from class com.fasterxml.aalto.in.ByteBasedScanner
_charTypes, _inputEnd, _inputPtr, _pastBytes, _quadBuffer, _rowStartOffset, _symbols, _tmpChar, BYTE_a, BYTE_A, BYTE_AMP, BYTE_APOS, BYTE_C, BYTE_CR, BYTE_D, BYTE_EQ, BYTE_EXCL, BYTE_g, BYTE_GT, BYTE_HASH, BYTE_HYPHEN, BYTE_l, BYTE_LBRACKET, BYTE_LF, BYTE_LT, BYTE_m, BYTE_NULL, BYTE_o, BYTE_p, BYTE_P, BYTE_q, BYTE_QMARK, BYTE_QUOT, BYTE_RBRACKET, BYTE_s, BYTE_S, BYTE_SEMICOLON, BYTE_SLASH, BYTE_SPACE, BYTE_t, BYTE_T, BYTE_TAB, BYTE_u, BYTE_x
 
Fields inherited from class com.fasterxml.aalto.in.XmlScanner
_attrCollector, _attrCount, _cfgCoalescing, _cfgLazyParsing, _config, _currElem, _currNsCount, _currRow, _currToken, _defaultNs, _depth, _entityPending, _isEmptyTag, _lastNsContext, _lastNsDecl, _nameBuffer, _nsBindingCache, _nsBindingCount, _nsBindings, _nsBindMisses, _publicId, _systemId, _textBuilder, _tokenIncomplete, _tokenName, _xml11, CDATA_STR, INT_0, INT_9, INT_a, INT_A, INT_AMP, INT_APOS, INT_COLON, INT_CR, INT_EQ, INT_EXCL, INT_f, INT_F, INT_GT, INT_HYPHEN, INT_LBRACKET, INT_LF, INT_LT, INT_NULL, INT_QMARK, INT_QUOTE, INT_RBRACKET, INT_SLASH, INT_SPACE, INT_TAB, INT_z, MAX_UNICODE_CHAR, TOKEN_EOI
 
Fields inherited from interface com.fasterxml.aalto.util.XmlConsts
CHAR_CR, CHAR_LF, CHAR_NULL, CHAR_SPACE, STAX_DEFAULT_OUTPUT_ENCODING, STAX_DEFAULT_OUTPUT_VERSION, XML_DECL_KW_ENCODING, XML_DECL_KW_STANDALONE, XML_DECL_KW_VERSION, XML_SA_NO, XML_SA_YES, XML_V_10, XML_V_10_STR, XML_V_11, XML_V_11_STR, XML_V_UNKNOWN
 
Fields inherited from interface javax.xml.stream.XMLStreamConstants
ATTRIBUTE, CDATA, CHARACTERS, COMMENT, DTD, END_DOCUMENT, END_ELEMENT, ENTITY_DECLARATION, ENTITY_REFERENCE, NAMESPACE, NOTATION_DECLARATION, PROCESSING_INSTRUCTION, SPACE, START_DOCUMENT, START_ELEMENT
 
Constructor Summary
AsyncByteScanner(ReaderConfig cfg)
           
 
Method Summary
protected  void _closeSource()
          Since the async scanner has no access to whatever passes content, there is no input source in same sense as with blocking scanner; and there is nothing to close.
protected abstract  PName addPName(int hash, int[] quads, int qlen, int lastQuadBytes)
           
protected  int decodeCharForError(byte b)
          Method called when a byte is encountered that cannot be part of a valid character in the current context.
protected  boolean decodeDecEntity()
           
protected  int decodeGeneralEntity(PName entityName)
          Method that verifies that given named entity is followed by a semi-colon (meaning next byte must be available for reading); and if so, whether it is one of pre-defined general entities.
protected  boolean decodeHexEntity()
           
 void endOfInput()
           
 void feedInput(byte[] buf, int start, int len)
           
protected  void finishCData()
           
protected abstract  void finishCharacters()
           
protected abstract  int finishCharactersCoalescing()
           
protected  void finishComment()
           
protected  void finishDTD(boolean copyContents)
           
protected  void finishPI()
           
protected  void finishSpace()
           
protected abstract  boolean handleAttrValue()
           
protected abstract  boolean handleDTDInternalSubset(boolean init)
           
protected  int handleEntityStartingToken()
          Method called when a new token (within tree) starts with an entity.
protected  int handleNamedEntityStartingToken()
          Method called when we see an entity that is starting a new token, and part of its name has been decoded (but not all)
protected abstract  boolean handleNsDecl()
           
protected  int handleNumericEntityStartingToken()
          Method called to handle cases where we find something other than a character entity (or one of the 5 pre-defined general entities that act like character entities)
protected  boolean handlePartialCR()
          Method called when there is a pending \r (from past buffer), and we need to see
protected  int handleStartElement()
           
protected  int handleStartElementStart(byte b)
          Method called when '<' and (what appears to be) a name start character have been seen.
protected  boolean loadMore()
           
 boolean needMoreInput()
           
 int nextFromProlog(boolean isProlog)
           
 int nextFromTree()
           
protected abstract  int parseCDataContents()
           
protected abstract  int parseCommentContents()
           
protected  PName parseEntityName()
           
protected  PName parseNewEntityName(byte b)
           
protected  PName parseNewName(byte b)
           
protected abstract  int parsePIData()
           
protected  PName parsePName()
          This method can (for now?) be shared between all ASCII-based encodings, since it only does coarse validity checking -- real checks are done in a different method.
protected  void skipCData()
           
protected abstract  boolean skipCharacters()
           
protected  void skipComment()
           
protected  void skipPI()
           
protected  void skipSpace()
           
protected abstract  int startCharacters(byte b)
          Method called to initialize state for CHARACTERS event, after just a single byte has been seen.
protected abstract  int startCharactersPending()
          This method gets called, if the first character of a CHARACTERS event could not be fully read (multi-byte, split over buffer boundary).
protected  int throwInternal()
           
 String toString()
           
protected  void verifyAndAppendEntityCharacter(int charFromEntity)
          Method called to verify validity of given character (from entity) and append it to the text buffer
 
Methods inherited from class com.fasterxml.aalto.in.ByteBasedScanner
_releaseBuffers, addUtfPName, getCurrentColumnNr, getCurrentLineNr, getCurrentLocation, markLF, markLF, reportInvalidInitial, reportInvalidOther
 
Methods inherited from class com.fasterxml.aalto.in.XmlScanner
bindName, bindNs, checkImmutableBinding, close, decodeAttrBinaryValue, decodeAttrValue, decodeAttrValues, decodeElements, findAttrIndex, findOrCreateBinding, finishToken, fireSaxCharacterEvents, fireSaxCommentEvent, fireSaxEndElement, fireSaxPIEvent, fireSaxSpaceEvents, fireSaxStartElement, getAttrCollector, getAttrCount, getAttrLocalName, getAttrNsURI, getAttrPrefix, getAttrPrefixedName, getAttrQName, getAttrType, getAttrValue, getAttrValue, getConfig, getDepth, getDTDPublicId, getDTDSystemId, getEndLocation, getInputPublicId, getInputSystemId, getName, getNamespacePrefix, getNamespaceURI, getNamespaceURI, getNamespaceURI, getNonTransientNamespaceContext, getNsCount, getPrefix, getPrefixes, getQName, getStartLocation, getText, getText, getTextCharacters, getTextCharacters, getTextLength, hasEmptyStack, isAttrSpecified, isEmptyTag, isTextWhitespace, loadMoreGuaranteed, loadMoreGuaranteed, reportDoubleHyphenInComments, reportDuplicateNsDecl, reportEntityOverflow, reportEofInName, reportIllegalCDataEnd, reportIllegalNsDecl, reportIllegalNsDecl, reportInputProblem, reportInvalidNameChar, reportInvalidNsIndex, reportInvalidXmlChar, reportMissingPISpace, reportMultipleColonsInName, reportPrologProblem, reportPrologUnexpChar, reportTreeUnexpChar, reportUnboundPrefix, reportUnexpandedEntityInAttr, reportUnexpectedEndTag, resetForDecoding, skipCoalescedText, skipToken, throwInvalidSpace, throwInvalidXmlChar, throwNullChar, throwUnexpectedChar, verifyXmlChar
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

_inputBuffer

protected byte[] _inputBuffer
This buffer is actually provided by caller


_origBufferLen

protected int _origBufferLen
In addition to current buffer pointer, and end pointer, we will also need to know number of bytes originally contained. This is needed to correctly update location information when the block has been completed.


_nextEvent

protected int _nextEvent
Due to asynchronous nature of parsing, we may know what event we are trying to parse, even if it's not yet complete. Type of that event is stored here.


_state

protected int _state
In addition to the event type, there is need for additional state information


_surroundingEvent

protected int _surroundingEvent
For token/state combinations that are 'shared' between events (or embedded in them), this is where the surrounding event state is retained.


_pendingInput

protected int _pendingInput
There are some multi-byte combinations that must be handled as a unit: CR+LF linefeeds, multi-byte UTF-8 characters, and multi-character end markers for comments and PIs. Since they can be split across input buffer boundaries, first byte(s) may need to be temporarily stored.

If so, this int will store byte(s), in little-endian format (that is, first pending byte is at 0x000000FF, second [if any] at 0x0000FF00, and third at 0x00FF0000). This can be (and is) used to figure out actual number of bytes pending, for multi-byte (UTF-8) character decoding.

Note: a value of 0 is assumed to mean that there is no pending data. Thus, if a 0 byte ever needs to be stored as pending, it has to be masked.
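
The packing scheme described above can be sketched as follows. This is a minimal illustration, not the scanner's actual code: the class PendingBytes and its methods are hypothetical names. It relies on the assumption stated in the note, namely that a stored byte is never 0, which holds for CR, for UTF-8 lead and continuation bytes, and for the end-marker characters.

```java
/** Hypothetical helper: packs up to three pending bytes into one int, first byte lowest. */
final class PendingBytes {
    private PendingBytes() {}

    /** Stores one more byte (as unsigned 0..255) in the next free 8-bit slot. */
    static int append(int pending, byte b) {
        int count = count(pending);
        if (count >= 3) {
            throw new IllegalStateException("this sketch holds at most three pending bytes");
        }
        return pending | ((b & 0xFF) << (8 * count));
    }

    /** Number of pending bytes, inferred from the highest non-zero slot. */
    static int count(int pending) {
        if (pending == 0) return 0;                  // 0 means "no data"
        if ((pending & 0x00FF0000) != 0) return 3;
        if ((pending & 0x0000FF00) != 0) return 2;
        return 1;
    }

    /** Returns the pending byte at the given index (0 = first pending byte). */
    static int byteAt(int pending, int index) {
        return (pending >> (8 * index)) & 0xFF;
    }
}
```

Because the count is derived from which slots are non-zero, no separate counter is needed, which is why a literal 0 byte cannot be stored without masking.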


_endOfInput

protected boolean _endOfInput
Flag that is set when the calling application indicates that there will be no more input to parse.


_quadCount

protected int _quadCount
Number of complete quads parsed for current name (quads themselves are stored in ByteBasedScanner._quadBuffer).


_currQuad

protected int _currQuad
Bytes parsed for the current, incomplete, quad


_currQuadBytes

protected int _currQuadBytes
Number of bytes pending/buffered, stored in _currQuad
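
How the three quad-related fields might cooperate while a name is being read can be sketched as below. The class QuadAccumulator is hypothetical, and the byte order within a quad (earlier bytes in higher positions) is an assumption made for illustration; the source does not state it.

```java
import java.util.Arrays;

/** Hypothetical sketch: accumulates name bytes four at a time into int "quads". */
final class QuadAccumulator {
    private int[] quadBuffer = new int[4]; // complete quads (stand-in for _quadBuffer)
    private int quadCount;                 // number of complete quads
    private int currQuad;                  // bytes of the current, incomplete quad
    private int currQuadBytes;             // how many bytes currQuad holds (0..3)

    /** Adds one name byte; assumed order: earlier bytes end up in higher positions. */
    void addByte(byte b) {
        currQuad = (currQuad << 8) | (b & 0xFF);
        if (++currQuadBytes == 4) {
            if (quadCount == quadBuffer.length) {
                quadBuffer = Arrays.copyOf(quadBuffer, quadBuffer.length * 2);
            }
            quadBuffer[quadCount++] = currQuad; // quad is complete: store and reset
            currQuad = 0;
            currQuadBytes = 0;
        }
    }

    int quadCount() { return quadCount; }
    int currQuad() { return currQuad; }
    int currQuadBytes() { return currQuadBytes; }
    int quadAt(int index) { return quadBuffer[index]; }
}
```

Keeping the partial quad and its byte count as fields, rather than as local variables, is what would let name parsing resume after the input buffer runs out in the middle of a name.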


_entityValue

protected int _entityValue
Entity value accumulated so far


_elemAllNsBound

protected boolean _elemAllNsBound

_elemAttrCount

protected boolean _elemAttrCount

_elemAttrQuote

protected byte _elemAttrQuote

_elemAttrName

protected PName _elemAttrName

_elemAttrPtr

protected int _elemAttrPtr
Pointer for the next character of currently being parsed value within attribute value buffer


_elemNsPtr

protected int _elemNsPtr
Pointer for the next character of currently being parsed namespace URI for the current namespace declaration

Constructor Detail

AsyncByteScanner

public AsyncByteScanner(ReaderConfig cfg)
Method Detail

toString

public String toString()
Overrides:
toString in class Object

parseCommentContents

protected abstract int parseCommentContents()
                                     throws XMLStreamException
Throws:
XMLStreamException

parseCDataContents

protected abstract int parseCDataContents()
                                   throws XMLStreamException
Throws:
XMLStreamException

parsePIData

protected abstract int parsePIData()
                            throws XMLStreamException
Throws:
XMLStreamException

startCharactersPending

protected abstract int startCharactersPending()
                                       throws XMLStreamException
This method gets called, if the first character of a CHARACTERS event could not be fully read (multi-byte, split over buffer boundary). If so, there is some pending data to be handled.

Throws:
XMLStreamException

finishCharactersCoalescing

protected abstract int finishCharactersCoalescing()
                                           throws XMLStreamException
Throws:
XMLStreamException

needMoreInput

public final boolean needMoreInput()

feedInput

public void feedInput(byte[] buf,
                      int start,
                      int len)
               throws XMLStreamException
Throws:
XMLStreamException

endOfInput

public void endOfInput()
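
The three public input methods above (needMoreInput, feedInput, endOfInput) suggest a feeding loop driven by the caller. The sketch below shows one plausible shape of that loop against a toy stand-in. ByteFeeder, CountingFeeder and FeedLoop are hypothetical names, the toy merely counts '<' bytes instead of parsing XML, and the rule that new input is fed only once needMoreInput() returns true is an assumption, not documented behavior.

```java
/** Hypothetical stand-in with the same three method shapes as the scanner. */
interface ByteFeeder {
    boolean needMoreInput();
    void feedInput(byte[] buf, int start, int len);
    void endOfInput();
}

/** Toy consumer: counts '<' bytes, one caller-provided buffer at a time. */
final class CountingFeeder implements ByteFeeder {
    private byte[] buf = new byte[0]; // provided by the caller, never copied
    private int ptr;
    private int end;
    private boolean ended;
    int tagStarts;

    public boolean needMoreInput() { return ptr >= end && !ended; }

    public void feedInput(byte[] b, int start, int len) {
        if (ptr < end) throw new IllegalStateException("previous input not yet consumed");
        if (ended) throw new IllegalStateException("input already ended");
        buf = b;
        ptr = start;
        end = start + len;
    }

    public void endOfInput() { ended = true; }

    /** Consumes whatever is buffered and returns, without blocking, when it runs out. */
    void consumeAvailable() {
        while (ptr < end) {
            if (buf[ptr++] == '<') tagStarts++;
        }
    }
}

final class FeedLoop {
    /** Feeds doc in chunks of at most chunkSize bytes, then signals end of input. */
    static int countTagStarts(byte[] doc, int chunkSize) {
        CountingFeeder feeder = new CountingFeeder();
        int offset = 0;
        while (offset < doc.length) {
            if (feeder.needMoreInput()) {
                int len = Math.min(chunkSize, doc.length - offset);
                feeder.feedInput(doc, offset, len);
                offset += len;
            }
            feeder.consumeAvailable();
        }
        feeder.endOfInput();
        return feeder.tagStarts;
    }
}
```

For example, FeedLoop.countTagStarts applied to the ASCII bytes of "<a><b/></a>" with a chunk size of 3 feeds four chunks and counts three '<' bytes.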

_closeSource

protected void _closeSource()
                     throws IOException
Since the async scanner has no access to whatever passes content, there is no input source in same sense as with blocking scanner; and there is nothing to close. But we can at least mark input as having ended.

Specified by:
_closeSource in class ByteBasedScanner
Throws:
IOException

nextFromProlog

public final int nextFromProlog(boolean isProlog)
                         throws XMLStreamException
Specified by:
nextFromProlog in class XmlScanner
Throws:
XMLStreamException

nextFromTree

public int nextFromTree()
                 throws XMLStreamException
Specified by:
nextFromTree in class XmlScanner
Throws:
XMLStreamException

handleDTDInternalSubset

protected abstract boolean handleDTDInternalSubset(boolean init)
                                            throws XMLStreamException
Parameters:
init - Whether this is the first call (and state needs to be initialized) or not
Returns:
True if parsing was completed; false if not.
Throws:
XMLStreamException

startCharacters

protected abstract int startCharacters(byte b)
                                throws XMLStreamException
Method called to initialize state for CHARACTERS event, after just a single byte has been seen. What needs to be done next depends on whether coalescing mode is set or not: if it is not set, just a single character needs to be decoded, after which current event will be incomplete, but defined as CHARACTERS. In coalescing mode, the whole content must be read before current event can be defined. The reason for difference is that when XMLStreamReader.next() returns, no blocking can occur when calling other methods.

Returns:
Event type detected: either CHARACTERS, if at least one full character was decoded (and can be returned), or EVENT_INCOMPLETE if not (part of a multi-byte character was split across an input buffer boundary)
Throws:
XMLStreamException
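
Whether the first character of a text segment is fully available can be decided from its first byte alone, at least for UTF-8. The following is a rough sketch of that check under the assumption of UTF-8 input; Utf8Split is a hypothetical helper, and the lead-byte test is deliberately coarse (it does not reject overlong encodings).

```java
/** Hypothetical helper: detects a UTF-8 character split across a buffer boundary. */
final class Utf8Split {
    private Utf8Split() {}

    /** Total length of the UTF-8 sequence that starts with this byte, or -1 if it is not a lead byte. */
    static int sequenceLength(byte lead) {
        int c = lead & 0xFF;
        if (c < 0x80) return 1;            // ASCII
        if ((c & 0xE0) == 0xC0) return 2;  // 110xxxxx
        if ((c & 0xF0) == 0xE0) return 3;  // 1110xxxx
        if ((c & 0xF8) == 0xF0) return 4;  // 11110xxx
        return -1;                         // continuation byte or invalid
    }

    /** True if the character starting at ptr lies entirely within buf[ptr, end). */
    static boolean isComplete(byte[] buf, int ptr, int end) {
        int len = sequenceLength(buf[ptr]);
        if (len < 0) {
            throw new IllegalArgumentException("not a UTF-8 lead byte at offset " + ptr);
        }
        return ptr + len <= end;
    }
}
```

When isComplete returns false, a scanner in this style would stash the available bytes as pending input and report an incomplete event instead of blocking.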

handleEntityStartingToken

protected int handleEntityStartingToken()
                                 throws XMLStreamException
Method called when a new token (within tree) starts with an entity.

Returns:
Type of event to return
Throws:
XMLStreamException

handleNamedEntityStartingToken

protected int handleNamedEntityStartingToken()
                                      throws XMLStreamException
Method called when we see an entity that is starting a new token, and part of its name has been decoded (but not all)

Throws:
XMLStreamException

handleNumericEntityStartingToken

protected int handleNumericEntityStartingToken()
                                        throws XMLStreamException
Method called to handle cases where we find something other than a character entity (or one of the 5 pre-defined general entities that act like character entities)

Throws:
XMLStreamException

decodeHexEntity

protected final boolean decodeHexEntity()
                                 throws XMLStreamException
Returns:
True if entity was decoded (and value assigned to _entityValue); false otherwise
Throws:
XMLStreamException

decodeDecEntity

protected final boolean decodeDecEntity()
                                 throws XMLStreamException
Returns:
True if entity was decoded (and value assigned to _entityValue); false otherwise
Throws:
XMLStreamException
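
The true/false contract of the two numeric decoders can be illustrated with a resumable decimal decoder. This is a sketch only: DecEntitySketch is a hypothetical class, its error handling uses plain runtime exceptions rather than XMLStreamException, and its entityValue field merely plays the role described for _entityValue.

```java
/** Hypothetical sketch: decodes the digits of a decimal character reference across buffers. */
final class DecEntitySketch {
    private static final int MAX_UNICODE_CHAR = 0x10FFFF;

    private int entityValue; // value accumulated so far, kept between calls

    /**
     * Consumes digits from buf[start, end).
     * Returns true once the terminating ';' is seen, false if input ran out first.
     */
    boolean decode(byte[] buf, int start, int end) {
        for (int i = start; i < end; i++) {
            int c = buf[i] & 0xFF;
            if (c == ';') {
                return true;
            }
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("not a decimal digit: " + (char) c);
            }
            entityValue = entityValue * 10 + (c - '0');
            if (entityValue > MAX_UNICODE_CHAR) {
                throw new IllegalArgumentException("character reference out of range");
            }
        }
        return false; // caller should feed more input and call again
    }

    int value() { return entityValue; }
}
```

Since the accumulated value lives in a field, a reference such as the digits 8230 followed by ';' can arrive split over two buffers and still decode to 8230.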

decodeGeneralEntity

protected final int decodeGeneralEntity(PName entityName)
                                 throws XMLStreamException
Method that verifies that given named entity is followed by a semi-colon (meaning next byte must be available for reading); and if so, whether it is one of pre-defined general entities.

Returns:
Character of the expanded pre-defined general entity (if name matches one); zero if not.
Throws:
XMLStreamException
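
The lookup half of this contract (expanded character if the name is pre-defined, zero if not) can be sketched as below. PredefinedEntities is a hypothetical helper that takes a plain String rather than a PName, and it omits the semicolon check described above. The five names are the ones pre-defined by XML 1.0.

```java
/** Hypothetical helper: maps the XML pre-defined general entity names to their characters. */
final class PredefinedEntities {
    private PredefinedEntities() {}

    /** Returns the expanded character for a pre-defined entity name, or 0 if the name is not pre-defined. */
    static int expand(String name) {
        switch (name) {
            case "amp":  return '&';
            case "lt":   return '<';
            case "gt":   return '>';
            case "apos": return '\'';
            case "quot": return '"';
            default:     return 0;
        }
    }
}
```

Returning 0 as the "no match" marker works because the NUL character is never a legal expansion in XML content.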

handleStartElementStart

protected int handleStartElementStart(byte b)
                               throws XMLStreamException
Method called when '<' and (what appears to be) a name start character have been seen.

Throws:
XMLStreamException

handleStartElement

protected int handleStartElement()
                          throws XMLStreamException
Throws:
XMLStreamException

handleAttrValue

protected abstract boolean handleAttrValue()
                                    throws XMLStreamException
Throws:
XMLStreamException

handleNsDecl

protected abstract boolean handleNsDecl()
                                 throws XMLStreamException
Throws:
XMLStreamException

finishCharacters

protected abstract void finishCharacters()
                                  throws XMLStreamException
Specified by:
finishCharacters in class XmlScanner
Throws:
XMLStreamException

finishCData

protected void finishCData()
                    throws XMLStreamException
Specified by:
finishCData in class XmlScanner
Throws:
XMLStreamException

finishComment

protected void finishComment()
                      throws XMLStreamException
Specified by:
finishComment in class XmlScanner
Throws:
XMLStreamException

finishDTD

protected void finishDTD(boolean copyContents)
                  throws XMLStreamException
Specified by:
finishDTD in class XmlScanner
Throws:
XMLStreamException

finishPI

protected void finishPI()
                 throws XMLStreamException
Specified by:
finishPI in class XmlScanner
Throws:
XMLStreamException

finishSpace

protected void finishSpace()
                    throws XMLStreamException
Specified by:
finishSpace in class XmlScanner
Throws:
XMLStreamException

skipCharacters

protected abstract boolean skipCharacters()
                                   throws XMLStreamException
Specified by:
skipCharacters in class XmlScanner
Returns:
True if the whole characters segment was successfully skipped; false if not
Throws:
XMLStreamException

skipCData

protected void skipCData()
                  throws XMLStreamException
Specified by:
skipCData in class XmlScanner
Throws:
XMLStreamException

skipComment

protected void skipComment()
                    throws XMLStreamException
Specified by:
skipComment in class XmlScanner
Throws:
XMLStreamException

skipPI

protected void skipPI()
               throws XMLStreamException
Specified by:
skipPI in class XmlScanner
Throws:
XMLStreamException

skipSpace

protected void skipSpace()
                  throws XMLStreamException
Specified by:
skipSpace in class XmlScanner
Throws:
XMLStreamException

loadMore

protected boolean loadMore()
                    throws XMLStreamException
Specified by:
loadMore in class XmlScanner
Throws:
XMLStreamException

parseNewName

protected PName parseNewName(byte b)
                      throws XMLStreamException
Throws:
XMLStreamException

parsePName

protected PName parsePName()
                    throws XMLStreamException
This method can (for now?) be shared between all ASCII-based encodings, since it only does coarse validity checking -- real checks are done in a different method.

Some notes about the assumptions the implementation makes:

Throws:
XMLStreamException

parseNewEntityName

protected final PName parseNewEntityName(byte b)
                                  throws XMLStreamException
Throws:
XMLStreamException

parseEntityName

protected final PName parseEntityName()
                               throws XMLStreamException
Throws:
XMLStreamException

addPName

protected abstract PName addPName(int hash,
                                  int[] quads,
                                  int qlen,
                                  int lastQuadBytes)
                           throws XMLStreamException
Specified by:
addPName in class ByteBasedScanner
Throws:
XMLStreamException

verifyAndAppendEntityCharacter

protected void verifyAndAppendEntityCharacter(int charFromEntity)
                                       throws XMLStreamException
Method called to verify validity of given character (from entity) and append it to the text buffer

Throws:
XMLStreamException

handlePartialCR

protected final boolean handlePartialCR()
Method called when there is a pending \r (from past buffer), and we need to see

Returns:
True if the linefeed was successfully processed (had enough input data to do that); or false if there is no data available to check this
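
A carriage return that ends one buffer cannot be normalized until the first byte of the next buffer is known. The sketch below illustrates that situation with standard XML linefeed normalization (CR+LF and lone CR both become LF). PartialCrSketch is a hypothetical class, it treats bytes as single-byte characters for simplicity, and it does not cover a CR that is still pending when input ends.

```java
/** Hypothetical sketch: normalizes CR and CR+LF to LF across buffer boundaries. */
final class PartialCrSketch {
    final StringBuilder out = new StringBuilder();
    boolean pendingCr; // true if the previous buffer ended with '\r'

    void feed(byte[] buf, int start, int end) {
        int i = start;
        if (pendingCr) {
            if (i >= end) {
                return;               // still no data to decide with
            }
            if (buf[i] == '\n') {
                i++;                  // CR+LF pair split across buffers
            }
            out.append('\n');
            pendingCr = false;
        }
        while (i < end) {
            byte b = buf[i++];
            if (b != '\r') {
                out.append((char) (b & 0xFF));
                continue;
            }
            if (i >= end) {
                pendingCr = true;     // cannot tell yet whether an LF follows
                return;
            }
            if (buf[i] == '\n') {
                i++;
            }
            out.append('\n');
        }
    }
}
```

Feeding "a\r" and then "\nb" yields "a\nb", the same result as feeding "a\r\nb" in one piece.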

decodeCharForError

protected int decodeCharForError(byte b)
                          throws XMLStreamException
Description copied from class: ByteBasedScanner
Method called when a byte is encountered that cannot be part of a valid character in the current context. Should return the actual decoded character for error reporting purposes.

Specified by:
decodeCharForError in class ByteBasedScanner
Throws:
XMLStreamException

throwInternal

protected int throwInternal()


Copyright © 2012 Fasterxml.com. All Rights Reserved.