public final class BytesToNameCanonicalizer extends Object
A caching symbol table implementation used for canonicalizing JSON field names (as Names which are constructed directly from a byte-based input source).
Complications arise from trying to do efficient reuse and merging of
symbol tables, to be able to make use of usually shared vocabulary
of subsequent parsing runs.

Modifier and Type | Field and Description |
---|---|
protected int | _collCount - Total number of Names in collision buckets (included in _count along with primary entries) |
protected int | _collEnd - Index of the first unused collision bucket entry (== size of the used portion of collision list): less than or equal to 0xFF (255), since max number of entries is 255 (8-bit, minus 0 used as 'empty' marker) |
protected com.fasterxml.jackson.core.sym.BytesToNameCanonicalizer.Bucket[] | _collList - Array of heads of collision bucket chains; sized dynamically |
protected int | _count - Total number of Names in the symbol table; only used for child tables. |
protected boolean | _failOnDoS - Flag that indicates whether we should throw an exception if enough hash collisions are detected (true); or just work around them (false). |
protected int[] | _hash - Array of 2^N size, which contains a combination of 24 bits of hash (0 to indicate 'empty' slot), and 8-bit collision bucket index (0 to indicate empty collision bucket chain; otherwise subtract one from index) |
protected int | _hashMask - Mask used to truncate 32-bit hash value to current hash array size; essentially, hash array size - 1 (since hash array sizes are 2^N). |
protected boolean | _intern - Whether canonical symbol Strings are to be intern()ed before being added to the table or not. |
protected int | _longestCollisionList - We need to keep track of the longest collision list; this is needed both to indicate problems with attacks and to allow flushing for other cases. |
protected Name[] | _mainNames - Array that contains Name instances matching entries in _hash. |
protected BitSet | _overflows - Lazily constructed structure that is used to keep track of collision buckets that have overflowed once: this is used to detect likely attempts at denial-of-service attacks that use hash collisions. |
protected BytesToNameCanonicalizer | _parent - Reference to the root symbol table, for child tables, so that they can merge table information back as necessary. |
protected AtomicReference<com.fasterxml.jackson.core.sym.BytesToNameCanonicalizer.TableInfo> | _tableInfo - Member that is only used by the root table instance: root passes immutable state into child instances, and children may return new state if they add entries to the table. |
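
The slot layout described for _hash and _hashMask can be illustrated with a small standalone sketch. It is not taken from the library source: the class HashSlotSketch and its method names are hypothetical, and placing the 24 hash bits in the upper part of the int with the bucket index in the low 8 bits is an assumption made for illustration.

```java
// Hypothetical helper, not part of jackson-core.
final class HashSlotSketch {
    private HashSlotSketch() {}

    /** Truncates a 32-bit hash to an index in a table of 2^N slots (mask == size - 1). */
    static int slotIndex(int hash, int tableSize) {
        if (tableSize <= 0 || Integer.bitCount(tableSize) != 1) {
            throw new IllegalArgumentException("table size must be a positive power of two");
        }
        return hash & (tableSize - 1);
    }

    /**
     * Packs the low 24 bits of the hash into the upper 24 bits of the slot (assumed layout)
     * and the collision bucket marker into the low 8 bits.
     * A marker of 0 means "no collision chain"; otherwise it is (bucket index + 1).
     */
    static int encodeSlot(int hash, int bucketIndexPlusOne) {
        if (bucketIndexPlusOne < 0 || bucketIndexPlusOne > 0xFF) {
            throw new IllegalArgumentException("bucket marker must fit in 8 bits");
        }
        return (hash << 8) | bucketIndexPlusOne;
    }

    /** Returns the 24 hash bits stored in a slot. */
    static int hashBits(int slot) {
        return slot >>> 8;
    }

    /** Returns the collision bucket index, or -1 if the slot has no collision chain. */
    static int bucketIndex(int slot) {
        return (slot & 0xFF) - 1;
    }

    /** Checks whether the 24 stored hash bits agree with the given full hash. */
    static boolean matches(int slot, int hash) {
        return hashBits(slot) == (hash & 0xFFFFFF);
    }
}
```

With an 8-bit marker and 0 reserved for 'empty', at most 255 collision buckets can be addressed, which is consistent with the limit stated for _collEnd.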
Modifier and Type | Method and Description |
---|---|
Name | addName(String name, int[] q, int qlen) |
Name | addName(String name, int q1, int q2) |
int | bucketCount() |
int | calcHash(int q1) |
int | calcHash(int[] q, int qlen) |
int | calcHash(int q1, int q2) |
protected static int[] | calcQuads(byte[] wordBytes) |
int | collisionCount() - Method mostly needed by unit tests; calculates number of entries that are in collision list. |
static BytesToNameCanonicalizer | createRoot() - Factory method to call to create a symbol table instance with a randomized seed value. |
protected static BytesToNameCanonicalizer | createRoot(int seed) - Factory method that should only be called from unit tests, where seed value should remain the same. |
Name | findName(int q1) - Finds and returns name matching the specified symbol, if such name already exists in the table. |
Name | findName(int[] q, int qlen) - Finds and returns name matching the specified symbol, if such name already exists in the table; or if not, creates name object, adds to the table, and returns it. |
Name | findName(int q1, int q2) - Finds and returns name matching the specified symbol, if such name already exists in the table. |
static Name | getEmptyName() |
int | hashSeed() |
BytesToNameCanonicalizer | makeChild(boolean canonicalize, boolean intern) - Deprecated. |
BytesToNameCanonicalizer | makeChild(int flags) - Factory method used to create actual symbol table instance to use for parsing. |
int | maxCollisionLength() - Method mostly needed by unit tests; calculates length of the longest collision chain. |
boolean | maybeDirty() - Method called to quickly check whether a child symbol table may have gotten additional entries. |
void | release() - Method called by the using code to indicate it is done with this instance. |
protected void | reportTooManyCollisions(int maxLen) |
int | size() |
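
The root/child pattern suggested by createRoot(), makeChild(), maybeDirty() and release() can be sketched in isolation. The class below is a simplified model, not the library's implementation: SharedSymbolsSketch, canonicalize() and the "keep the larger table" merge rule are all assumptions for illustration, and plain Strings stand in for Name instances.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of a root table that shares immutable state with child tables.
final class SharedSymbolsSketch {
    // Non-null only on the root: latest immutable snapshot shared with children.
    private final AtomicReference<Map<String, String>> rootState;
    // Null for the root; children keep it so they can merge back on release().
    private final SharedSymbolsSketch parent;
    private Map<String, String> symbols;
    private boolean dirty;

    private SharedSymbolsSketch(SharedSymbolsSketch parent, Map<String, String> snapshot) {
        this.parent = parent;
        this.rootState = (parent == null)
                ? new AtomicReference<Map<String, String>>(snapshot) : null;
        this.symbols = snapshot;
    }

    static SharedSymbolsSketch createRoot() {
        return new SharedSymbolsSketch(null, Collections.<String, String>emptyMap());
    }

    SharedSymbolsSketch makeChild() {
        if (parent != null) {
            throw new IllegalStateException("children are created from the root only");
        }
        return new SharedSymbolsSketch(this, rootState.get());
    }

    /** Returns the canonical instance for a name, adding it on first sight. */
    String canonicalize(String name) {
        if (parent == null) {
            throw new IllegalStateException("use a child table for lookups");
        }
        String existing = symbols.get(name);
        if (existing != null) {
            return existing;
        }
        if (!dirty) { // copy-on-write: never mutate the shared snapshot
            symbols = new HashMap<String, String>(symbols);
            dirty = true;
        }
        symbols.put(name, name);
        return name;
    }

    boolean maybeDirty() {
        return dirty;
    }

    int size() {
        return (parent == null) ? rootState.get().size() : symbols.size();
    }

    /** Offers this child's additions back to the root (assumed rule: larger table wins). */
    void release() {
        if (parent != null && dirty) {
            Map<String, String> mine = Collections.unmodifiableMap(symbols);
            Map<String, String> current = parent.rootState.get();
            if (mine.size() > current.size()) {
                parent.rootState.compareAndSet(current, mine);
            }
            dirty = false;
        }
    }
}
```

In this model a child that only reads never copies the table, and a failed compareAndSet simply drops the child's additions, which costs reuse but not correctness.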
protected final BytesToNameCanonicalizer _parent
protected final AtomicReference<com.fasterxml.jackson.core.sym.BytesToNameCanonicalizer.TableInfo> _tableInfo
protected boolean _intern
NOTE: non-final to allow disabling intern()ing in case of excessive collisions.
protected final boolean _failOnDoS
protected int _count
protected int _longestCollisionList
protected int _hashMask
protected int[] _hash
protected Name[] _mainNames
Array that contains Name instances matching entries in _hash. Contains nulls for unused entries.
protected com.fasterxml.jackson.core.sym.BytesToNameCanonicalizer.Bucket[] _collList
protected int _collCount
Total number of Names in collision buckets (included in _count along with primary entries)
protected int _collEnd
protected BitSet _overflows
public static BytesToNameCanonicalizer createRoot()
protected static BytesToNameCanonicalizer createRoot(int seed)
public BytesToNameCanonicalizer makeChild(int flags)
@Deprecated public BytesToNameCanonicalizer makeChild(boolean canonicalize, boolean intern)
public void release()
public int size()
public int bucketCount()
public boolean maybeDirty()
public int hashSeed()
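public int collisionCount()
Method mostly needed by unit tests; calculates number of entries that are in collision list. Value can be at most (size() - 1), but should usually be much lower, ideally 0.
public int maxCollisionLength()
Method mostly needed by unit tests; calculates length of the longest collision chain. May be as high as size() - 1 in the pathological case.
public static Name getEmptyName()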
public Name findName(int q1)
Note: separate methods to optimize common case of short element/attribute names (4 or fewer ASCII characters)
q1 - int32 containing first 4 bytes of the name; if the whole name is less than 4 bytes, padded with zero bytes in front (zero MSBs, i.e. right aligned)
public Name findName(int q1, int q2)
Note: separate methods to optimize common case of relatively short element/attribute names (8 or fewer ASCII characters)
q1 - int32 containing first 4 bytes of the name.
q2 - int32 containing bytes 5 through 8 of the name; if less than 8 bytes, padded with up to 3 zero bytes in front (zero MSBs, i.e. right aligned)
public Name findName(int[] q, int qlen)
Note: this is the general purpose method that can be called for names of any length. However, if the name is less than 9 bytes long, it is preferable to call the version optimized for short names.
q - Array of int32s, each of which contains 4 bytes of encoded name
qlen - Number of int32s, starting from index 0, in the q parameter
public int calcHash(int q1)
public int calcHash(int q1, int q2)
public int calcHash(int[] q, int qlen)
protected static int[] calcQuads(byte[] wordBytes)
protected void reportTooManyCollisions(int maxLen)
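
The quad arguments taken by the findName and calcHash overloads (q1, q2, or q with qlen) pack name bytes four at a time, with a short final quad right aligned. A minimal sketch of such packing follows; QuadPackingSketch and packQuads are hypothetical names, and the code is only an illustration of the parameter descriptions above, which may differ in detail from what calcQuads(byte[]) does.

```java
// Hypothetical helper, not part of jackson-core.
final class QuadPackingSketch {
    private QuadPackingSketch() {}

    /**
     * Packs bytes into int32 "quads", 4 bytes per int, first byte in the most
     * significant position. A trailing quad with fewer than 4 bytes ends up
     * right aligned, that is, padded with zero bytes in front.
     */
    static int[] packQuads(byte[] nameBytes) {
        int[] quads = new int[(nameBytes.length + 3) / 4];
        for (int i = 0; i < nameBytes.length; i++) {
            int q = i / 4;
            // Shift previously added bytes up and append the next one as unsigned.
            quads[q] = (quads[q] << 8) | (nameBytes[i] & 0xFF);
        }
        return quads;
    }
}
```

Under this scheme the 2-byte name "id" yields the single quad 0x6964, and the 5-byte name "title" yields 0x7469746C followed by 0x65, so a name of up to 4 bytes fits the findName(int q1) overload and a name of up to 8 bytes fits findName(int q1, int q2).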
Copyright © 2014 FasterXML. All Rights Reserved.