java.lang.Object
net.sf.mmm.util.io.base.EncodingUtilImpl.UtfDetectionProcessor
protected static class EncodingUtilImpl.UtfDetectionProcessor
This inner class performs the actual UTF detection. It processes
the bytes of the underlying InputStream via a lookahead buffer.
It takes into account a ByteOrderMark, UTF-8 multi-byte sequences, UTF-16
surrogates, and the zero-bytes that ASCII characters produce as overhead
in UTF-16 and UTF-32, among other things.
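As a rough illustration of the byte-order-mark part of this detection, the following is a minimal sketch and not the library's implementation. The class BomSniffer and its method detect are hypothetical names introduced here. Only the BOM byte values themselves are standard.

```java
/** Hypothetical helper (not part of EncodingUtilImpl): maps a leading byte-order mark to an encoding name. */
final class BomSniffer {

  private BomSniffer() {
  }

  /** Returns the encoding implied by a BOM at the start of the data, or null if no BOM is present. */
  static String detect(byte[] data) {
    // UTF-32 is tested before UTF-16 because the UTF-32LE BOM (FF FE 00 00)
    // begins with the UTF-16LE BOM (FF FE).
    if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) {
      return "UTF-32BE";
    }
    if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) {
      return "UTF-32LE";
    }
    if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
      return "UTF-8";
    }
    if (startsWith(data, 0xFE, 0xFF)) {
      return "UTF-16BE";
    }
    if (startsWith(data, 0xFF, 0xFE)) {
      return "UTF-16LE";
    }
    return null;
  }

  /** Compares the first bytes of data (as unsigned values) against the given prefix. */
  private static boolean startsWith(byte[] data, int... prefix) {
    if (data.length < prefix.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if ((data[i] & 0xFF) != prefix[i]) {
        return false;
      }
    }
    return true;
  }
}
```

A missing BOM proves nothing about the encoding, which is why the class also inspects multi-byte sequences, surrogates and zero-bytes.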
| Field Summary | |
|---|---|
| private ByteOrderMark | bom - The ByteOrderMark or null if NOT present (or detection NOT started). |
| private long | bytePosition - The byte-position in the stream relative to the head. |
| private RankMap<String> | encodingRankMap - The RankMap for encoding detection. |
| private long | firstNonAsciiPosition - The bytePosition where the first non-ASCII byte was detected. |
| private boolean | maybeAscii - false if the data can NOT be ASCII, true otherwise. |
| private boolean | maybeUtf16 - false if the data can NOT be UTF-16, true otherwise. |
| private boolean | maybeUtf8 - false if the data can NOT be UTF-8, true otherwise. |
| private String | nonUtfEncoding - The encoding to use if encoding is neither UTF nor ASCII. |
| private EncodingUtilImpl.Surrogate[] | surrogates - The last EncodingUtilImpl.Surrogates for each of the positions modulo 4. |
| private int | utf8ContinuationByteCount - The expected number of UTF-8 continuation bytes to come or 0 if no UTF-8 multi-byte-sequence is currently processed. |
| private int[] | zeroByteCounts - The number of bytes that have been 0 for each of the positions modulo 4. |
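To make the role of these fields more concrete, here is a simplified sketch of how such state might be updated byte by byte. It is an illustration under assumptions, not the actual UtfDetectionProcessor code. The class name Utf8AndZeroByteTracker is hypothetical, as is the initial value -1 for firstNonAsciiPosition. The sketch also ignores surrogates, the rank map, and overlong UTF-8 forms.

```java
/** Hypothetical sketch (not the real UtfDetectionProcessor): tracks UTF-8 plausibility and zero-bytes. */
final class Utf8AndZeroByteTracker {

  boolean maybeAscii = true;
  boolean maybeUtf8 = true;
  long bytePosition = 0;
  /** Assumption of this sketch: -1 means that no non-ASCII byte has been seen yet. */
  long firstNonAsciiPosition = -1;
  int utf8ContinuationByteCount = 0;
  final int[] zeroByteCounts = new int[4];

  void process(byte[] buffer, int offset, int length) {
    for (int i = offset; i < offset + length; i++) {
      int b = buffer[i] & 0xFF; // treat the byte as unsigned
      if (b == 0) {
        // Zero-bytes at regular positions modulo 4 hint at UTF-16 or UTF-32 encoded ASCII.
        zeroByteCounts[(int) (bytePosition % 4)]++;
      }
      if (b >= 0x80 && maybeAscii) {
        maybeAscii = false;
        firstNonAsciiPosition = bytePosition;
      }
      if (maybeUtf8) {
        if (utf8ContinuationByteCount > 0) {
          if ((b & 0xC0) == 0x80) {
            utf8ContinuationByteCount--; // valid continuation byte 10xxxxxx
          } else {
            maybeUtf8 = false; // a multi-byte sequence was cut short
          }
        } else if ((b & 0xE0) == 0xC0) {
          utf8ContinuationByteCount = 1; // lead byte 110xxxxx
        } else if ((b & 0xF0) == 0xE0) {
          utf8ContinuationByteCount = 2; // lead byte 1110xxxx
        } else if ((b & 0xF8) == 0xF0) {
          utf8ContinuationByteCount = 3; // lead byte 11110xxx
        } else if (b >= 0x80) {
          maybeUtf8 = false; // stray continuation byte or invalid lead byte
        }
      }
      bytePosition++;
    }
  }
}
```

Because the state lives in fields, a buffer can be split at any point, even in the middle of a multi-byte sequence, and detection continues correctly with the next call.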
| Constructor Summary | |
|---|---|
| EncodingUtilImpl.UtfDetectionProcessor(String nonUtfEncoding) - The constructor. | |
| Method Summary | |
|---|---|
| String | getEncoding() - This method gets the detected encoding from the currently processed data. |
| String | getLowByteEncoding() - This method gets the encoding without taking high-bytes (non-ASCII) into account. |
| int | process(byte[] buffer, int offset, int length) - This method is called to process the number of length bytes from the given buffer starting from the given offset. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
private RankMap<String> encodingRankMap
The RankMap for encoding detection.
private ByteOrderMark bom
The ByteOrderMark or null if NOT present (or
detection NOT started).
private final String nonUtfEncoding
The encoding to use if encoding is neither UTF nor ASCII.
private boolean maybeAscii
false if the data can NOT be ASCII, true
otherwise.
private boolean maybeUtf8
false if the data can NOT be UTF-8, true
otherwise.
private boolean maybeUtf16
false if the data can NOT be UTF-16, true
otherwise.
private long bytePosition
The byte-position in the stream relative to the head.
private long firstNonAsciiPosition
The bytePosition where the first non-ASCII byte was detected.
private int[] zeroByteCounts
The number of bytes that have been 0 for each of the
positions modulo 4.
private EncodingUtilImpl.Surrogate[] surrogates
The last EncodingUtilImpl.Surrogates for each of the positions modulo 4.
private int utf8ContinuationByteCount
The expected number of UTF-8 continuation bytes to come or 0
if no UTF-8 multi-byte-sequence is currently processed.
| Constructor Detail |
|---|
public EncodingUtilImpl.UtfDetectionProcessor(String nonUtfEncoding)
The constructor.
Parameters:
nonUtfEncoding - is the encoding to use if encoding is neither UTF
nor ASCII.
| Method Detail |
|---|
public int process(byte[] buffer,
int offset,
int length)
This method is called to process the number of length bytes
from the given buffer starting from the given
offset. It is NOT permitted to modify the given
buffer unless this is explicitly specified by the calling
object (typically an implementation of ByteProcessable).
Specified by:
process in interface ByteProcessor
Parameters:
buffer - contains the bytes to process.
offset - is the index where to start in the buffer.
length - is the number of bytes to process.
Returns:
the number of bytes that have been processed, typically
length. However you can also return a
value less than length and greater or equal to zero, in order to
stop processing at a specific position.
public String getLowByteEncoding()
This method gets the encoding without taking high-bytes (non-ASCII) into account.
Returns:
the encoding or null if it looks like ASCII
so far.
public String getEncoding()
This method gets the detected encoding from the currently processed data.
Returns:
the detected encoding or null if the encoding has
NOT yet been detected and it looks like ASCII so far.
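The return-value contract of process can be illustrated with a small sketch. All types here are hypothetical stand-ins and not the net.sf.mmm API: SketchByteProcessor only mirrors the process(byte[], int, int) signature described above, AsciiPrefixProcessor is an example implementation that stops early, and ProcessDriver shows how a caller might honor a return value smaller than length.

```java
/** Hypothetical stand-in that mirrors the process(byte[], int, int) signature described above. */
interface SketchByteProcessor {
  int process(byte[] buffer, int offset, int length);
}

/** Example processor: accepts ASCII bytes and stops at the first byte with the high bit set. */
final class AsciiPrefixProcessor implements SketchByteProcessor {

  @Override
  public int process(byte[] buffer, int offset, int length) {
    for (int i = 0; i < length; i++) {
      if (buffer[offset + i] < 0) { // negative byte value means the high bit is set
        return i; // fewer than length bytes: ask the caller to stop here
      }
    }
    return length; // everything was processed; the buffer itself was never modified
  }
}

/** Example caller: feeds data in chunks and stops once the processor returns less than offered. */
final class ProcessDriver {

  private ProcessDriver() {
  }

  /** Returns the total number of bytes the processor accepted. */
  static long feed(SketchByteProcessor processor, byte[] data, int chunkSize) {
    long total = 0;
    int offset = 0;
    while (offset < data.length) {
      int length = Math.min(chunkSize, data.length - offset);
      int processed = processor.process(data, offset, length);
      total += processed;
      if (processed < length) {
        break; // the processor requested a stop at this position
      }
      offset += processed;
    }
    return total;
  }
}
```

In this sketch the caller treats any return value below length as a stop request, which is one plausible reading of the contract stated above.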