intarsys runtime library

de.intarsys.tools.reader
Class ReaderTools

java.lang.Object
  extended by de.intarsys.tools.reader.ReaderTools

public class ReaderTools
extends Object

Tool class for common Reader-related tasks.


Constructor Summary
ReaderTools()
           
 
Method Summary
static InputStreamReader createReaderScanBom(InputStream is)
          Try to detect the Unicode transformation format (UTF encoding) from the byte order mark (BOM).
static InputStreamReader createReaderScanMeta(InputStream is)
          Try to detect the input stream encoding from the meta tags "$$$" embedded in the stream.
static TaggedReader createTaggedReader(InputStream is, String defaultCharsetName, int size)
          Create a TaggedReader and automatically detect the encoding from different heuristics.
static Map.Entry<String,String> readEntry(Reader reader, char delimiter)
          Read a Map.Entry object from reader.
static Map<String,String> readMetaData(Reader reader)
          Try to detect meta data embedded in the input.
static String readMetaEncoding(Reader reader)
          Try to detect encoding specific meta data embedded in the input.
static String readToken(Reader reader, char delimiter)
          Read a string token from reader.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ReaderTools

public ReaderTools()
Method Detail

createReaderScanBom

public static InputStreamReader createReaderScanBom(InputStream is)
                                             throws IOException
Try to detect the Unicode transformation format (UTF encoding) from the byte order mark (BOM). If no BOM is detected, null is returned and the input stream is reset.

The stream passed as is must support the mark operation. For the BOM byte sequences, see http://unicode.org/faq/utf_bom.html

 Bytes                          Encoding Form 
 00 00 FE FF                    UTF-32, big-endian 
 FF FE 00 00                    UTF-32, little-endian 
 FE FF                          UTF-16, big-endian 
 FF FE                          UTF-16, little-endian 
 EF BB BF                       UTF-8
 

Parameters:
is - the input stream to scan; must support mark/reset
Returns:
An InputStreamReader with the detected encoding, or null if no BOM is found
Throws:
IOException
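The BOM check can be illustrated with a minimal, self-contained sketch that simply applies the table above. The class BomSniffer and its method detectBom are hypothetical names, not part of the intarsys library, and the code is not taken from ReaderTools. It returns a charset name rather than a reader, and it tests the 4-byte patterns first because the UTF-32 little-endian BOM begins with the UTF-16 little-endian one.

```java
import java.io.IOException;
import java.io.InputStream;

final class BomSniffer {
    /**
     * Hypothetical helper: returns the charset name implied by a leading BOM, or null.
     * Afterwards the stream is positioned just after the BOM, or reset to its start
     * when no BOM was found. The stream must support mark/reset.
     */
    static String detectBom(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(4);
        int[] b = new int[4];
        int n = 0;
        // Peek at up to four bytes; stop early at end of stream.
        while (n < 4) {
            int c = in.read();
            if (c < 0) {
                break;
            }
            b[n++] = c;
        }
        in.reset();
        String charset = null;
        int bomLength = 0;
        // 4-byte patterns first: FF FE 00 00 also begins with the UTF-16LE BOM FF FE.
        if (n >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF) {
            charset = "UTF-32BE";
            bomLength = 4;
        } else if (n >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00) {
            charset = "UTF-32LE";
            bomLength = 4;
        } else if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) {
            charset = "UTF-8";
            bomLength = 3;
        } else if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF) {
            charset = "UTF-16BE";
            bomLength = 2;
        } else if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE) {
            charset = "UTF-16LE";
            bomLength = 2;
        }
        // Consume the BOM so a decoder does not deliver it as U+FEFF.
        for (int i = 0; i < bomLength; i++) {
            in.read();
        }
        return charset;
    }
}
```

A caller holding a stream without mark support could wrap it in a java.io.BufferedInputStream first, then pass the stream and the returned name to new InputStreamReader(in, charsetName). Note that FF FE 00 00 is inherently ambiguous (it could also be a UTF-16LE BOM followed by U+0000); following the table order, this sketch prefers UTF-32LE.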

createReaderScanMeta

public static InputStreamReader createReaderScanMeta(InputStream is)
                                              throws IOException
Try to detect the input stream encoding from the meta tags "$$$" embedded in the stream. If no encoding is detected, the input is reset and null is returned.

The stream passed as is must support the mark operation.

Parameters:
is - the input stream to scan; must support mark/reset
Returns:
An InputStreamReader with the detected encoding, or null if no encoding is detected
Throws:
IOException
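Because the encoding is unknown while the meta tag is being read, a byte-level scan has to decode the first line provisionally. The sketch below decodes it as ISO-8859-1 (one byte per character), which assumes the tag is ASCII text in an ASCII-compatible encoding; UTF-16 and UTF-32 input would not match and are expected to be caught by the BOM check instead. MetaSniffer, readerFromMeta, encodingFromMetaLine and the "$$$encoding=<name>" line form are assumptions for illustration. This version always resets the stream to its start, so the tag line remains part of the returned reader's content; the library's positioning may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

final class MetaSniffer {
    // Assumed read-ahead limit, borrowed from the documented 1024-character meta line limit.
    static final int MAX_LINE = 1024;

    /** Hypothetical sketch: a reader in the charset named by a leading meta tag, or null. */
    static InputStreamReader readerFromMeta(InputStream in) throws IOException {
        in.mark(MAX_LINE);
        // Decode byte by byte as ISO-8859-1 so an ASCII tag survives regardless of the real encoding.
        StringBuilder line = new StringBuilder();
        int c;
        while (line.length() < MAX_LINE && (c = in.read()) != -1 && c != '\n' && c != '\r') {
            line.append((char) c);
        }
        in.reset();
        String name = encodingFromMetaLine(line.toString());
        return name == null ? null : new InputStreamReader(in, name);
    }

    /** Returns the charset name if the line is "$$$encoding=<name>" with a supported name. */
    static String encodingFromMetaLine(String line) {
        if (!line.startsWith("$$$")) {
            return null;
        }
        int eq = line.indexOf('=');
        if (eq < 0 || !line.substring(3, eq).trim().equals("encoding")) {
            return null;
        }
        String name = line.substring(eq + 1).trim();
        try {
            return Charset.isSupported(name) ? name : null;
        } catch (IllegalCharsetNameException e) {
            return null;
        }
    }
}
```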

createTaggedReader

public static TaggedReader createTaggedReader(InputStream is,
                                              String defaultCharsetName,
                                              int size)
                                       throws IOException
Create a TaggedReader and automatically detect the encoding using several heuristics. First, the BOM markers are checked; then embedded meta information (lines starting with '$$$') is scanned.

If no encoding can be detected, defaultCharsetName is used or, failing that, the platform default encoding.

Parameters:
is - the input stream to read
defaultCharsetName - the charset name to use if no encoding can be detected
size -
Returns:
A TaggedReader with the correct encoding
Throws:
IOException
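The detection order described above (BOM, then meta data, then defaultCharsetName, then the platform encoding), together with the mark requirement of the two scanning methods, can be sketched as follows. EncodingFallback, chooseCharset and ensureMarkSupported are hypothetical names; the sketch does not model the size parameter or the TaggedReader type, whose details are not documented here.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;

final class EncodingFallback {
    /**
     * Hypothetical sketch: picks the first available result in the documented order.
     * Each detector result is null when that heuristic found nothing.
     */
    static String chooseCharset(String fromBom, String fromMeta, String defaultCharsetName) {
        if (fromBom != null) {
            return fromBom;
        }
        if (fromMeta != null) {
            return fromMeta;
        }
        if (defaultCharsetName != null) {
            return defaultCharsetName;
        }
        // Last resort: the platform (JVM default) encoding.
        return Charset.defaultCharset().name();
    }

    /** Both scanners need mark/reset; BufferedInputStream always provides it. */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }
}
```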

readEntry

public static Map.Entry<String,String> readEntry(Reader reader,
                                                 char delimiter)
                                          throws IOException
Read a Map.Entry object from reader. The end of the entry is marked by delimiter.

The syntax for an entry is

 ws* key ws* '=' value [delimiter | EOF]
 value = string | quoted_string
 quoted_string = '"' [ char | escape ]* '"'
 

Parameters:
reader - the reader to read from
delimiter - the character that terminates the entry
Returns:
A single map entry read from the reader.
Throws:
IOException
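A hand-written parser for this grammar might look like the sketch below. EntryParser is a hypothetical class, not the library's code, and several details the grammar leaves open are assumptions: an escape is a backslash followed by any character, text between a closing quote and the delimiter is discarded, a reader already at EOF yields null, and a missing '=' raises an IOException. Following the grammar literally, whitespace is trimmed around the key but not between '=' and the value.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.AbstractMap;
import java.util.Map;

final class EntryParser {
    /** Hypothetical sketch of: ws* key ws* '=' value [delimiter | EOF]. */
    static Map.Entry<String, String> readEntry(Reader reader, char delimiter) throws IOException {
        int c = reader.read();
        if (c == -1) {
            return null;
        }
        StringBuilder key = new StringBuilder();
        // Collect everything up to '=' and trim the surrounding whitespace (ws* key ws*).
        while (c != '=') {
            if (c == -1 || c == delimiter) {
                throw new IOException("missing '=' in entry");
            }
            key.append((char) c);
            c = reader.read();
        }
        return new AbstractMap.SimpleImmutableEntry<>(key.toString().trim(), readValue(reader, delimiter));
    }

    /** value = string | quoted_string, ended by delimiter (consumed) or EOF. */
    static String readValue(Reader reader, char delimiter) throws IOException {
        StringBuilder value = new StringBuilder();
        int c = reader.read();
        if (c == '"') {
            c = reader.read();
            while (c != -1 && c != '"') {
                if (c == '\\') { // assumed escape: backslash followed by any character
                    c = reader.read();
                    if (c == -1) {
                        break;
                    }
                }
                value.append((char) c);
                c = reader.read();
            }
            // Discard anything between the closing quote and the delimiter.
            while (c != -1 && c != delimiter) {
                c = reader.read();
            }
        } else {
            while (c != -1 && c != delimiter) {
                value.append((char) c);
                c = reader.read();
            }
        }
        return value.toString();
    }
}
```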

readMetaData

public static Map<String,String> readMetaData(Reader reader)
                                       throws IOException
Try to detect meta data embedded in the input.

Meta data lines start with '$$$' at the very beginning of a line and extend to the end of that line. Lines are scanned until the first line that does not contain meta data. Each meta data line holds an entry in the syntax accepted by readEntry.

The maximum length of a meta data line is 1024 characters.

After execution, reader is positioned after the last meta tag. The reader instance must support the "mark/reset" sequence.

Parameters:
reader - the reader to scan; must support mark/reset
Returns:
All meta data in the reader as a Map
Throws:
IOException
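The scanning loop can be sketched with one mark/reset per line: mark, read a line, keep it if it starts with '$$$', otherwise reset and stop. MetaDataScanner is a hypothetical name, and for brevity this sketch splits each entry at the first '=' instead of applying the full readEntry grammar, so quoted values and escapes are not handled. The documented 1024-character limit is used to size the mark read-ahead.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

final class MetaDataScanner {
    static final String PREFIX = "$$$";
    static final int MAX_LINE = 1024;

    /** Hypothetical sketch: collects leading "$$$key=value" lines in order of appearance. */
    static Map<String, String> readMetaData(Reader reader) throws IOException {
        if (!reader.markSupported()) {
            throw new IllegalArgumentException("reader must support mark/reset");
        }
        Map<String, String> meta = new LinkedHashMap<>();
        while (true) {
            // readLine reads at most MAX_LINE + 1 characters, so the mark stays valid.
            reader.mark(MAX_LINE + 1);
            String line = readLine(reader);
            if (line == null || !line.startsWith(PREFIX)) {
                // First non-meta line (or EOF): rewind so the caller still sees it.
                reader.reset();
                return meta;
            }
            int eq = line.indexOf('=');
            if (eq >= 0) {
                meta.put(line.substring(PREFIX.length(), eq).trim(), line.substring(eq + 1));
            }
        }
    }

    /** Reads one line without its terminator; lines longer than MAX_LINE are truncated. */
    static String readLine(Reader reader) throws IOException {
        int c = reader.read();
        if (c == -1) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        while (c != -1 && c != '\n' && sb.length() < MAX_LINE) {
            sb.append((char) c);
            c = reader.read();
        }
        int len = sb.length();
        if (len > 0 && sb.charAt(len - 1) == '\r') {
            sb.setLength(len - 1); // tolerate CRLF line endings
        }
        return sb.toString();
    }
}
```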

readMetaEncoding

public static String readMetaEncoding(Reader reader)
                               throws IOException
Try to detect encoding specific meta data embedded in the input. The first meta data line is read and the value is returned if the meta information key is "encoding".

After execution, reader is positioned either at the start or just after the "encoding" meta tag. The reader instance must support the "mark/reset" sequence.

For more information on meta data, see readMetaData.

Parameters:
reader - the reader to scan; must support mark/reset
Returns:
The encoding declared in the meta data of reader
Throws:
IOException
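The positioning contract can be illustrated with a small Reader-level sketch: the first line is consumed only if it is an "encoding" meta tag; otherwise the reader is reset to where it started. MetaEncodingReader is a hypothetical class, and the value is taken as the trimmed text after the first '=', without quoted-string handling.

```java
import java.io.IOException;
import java.io.Reader;

final class MetaEncodingReader {
    static final int MAX_LINE = 1024;

    /**
     * Hypothetical sketch: returns the value of a leading "$$$encoding=..." line and leaves
     * the reader after that line; otherwise resets the reader and returns null.
     */
    static String readMetaEncoding(Reader reader) throws IOException {
        reader.mark(MAX_LINE);
        StringBuilder sb = new StringBuilder();
        int c;
        // Read at most MAX_LINE characters, including the terminating '\n'.
        while (sb.length() < MAX_LINE && (c = reader.read()) != -1 && c != '\n') {
            sb.append((char) c);
        }
        String line = sb.toString();
        if (line.endsWith("\r")) {
            line = line.substring(0, line.length() - 1);
        }
        if (line.startsWith("$$$")) {
            int eq = line.indexOf('=');
            if (eq >= 0 && line.substring(3, eq).trim().equals("encoding")) {
                return line.substring(eq + 1).trim(); // reader now sits after the tag line
            }
        }
        reader.reset();
        return null;
    }
}
```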

readToken

public static String readToken(Reader reader,
                               char delimiter)
                        throws IOException
Read a string token from reader. The end of the token is marked by delimiter. The syntax for a token is
 value [delimiter | EOF]
 value = string | quoted_string
 quoted_string = '"' [ char | escape ]* '"'
 

Parameters:
reader - the reader to read from
delimiter - the character that terminates the token
Returns:
A single token.
Throws:
IOException
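The token grammar is the value part of the readEntry grammar without a key. The sketch below (TokenReader is a hypothetical name, not the library's code) assumes backslash escapes inside quotes, consumes the delimiter, and returns null once the reader is exhausted; a quoted token may therefore contain the delimiter itself.

```java
import java.io.IOException;
import java.io.Reader;

final class TokenReader {
    /** Hypothetical sketch of: value [delimiter | EOF], value = string | quoted_string. */
    static String readToken(Reader reader, char delimiter) throws IOException {
        int c = reader.read();
        if (c == -1) {
            return null; // nothing left to read
        }
        StringBuilder token = new StringBuilder();
        if (c == '"') {
            c = reader.read();
            while (c != -1 && c != '"') {
                if (c == '\\') { // assumed escape: backslash followed by any character
                    c = reader.read();
                    if (c == -1) {
                        break;
                    }
                }
                token.append((char) c);
                c = reader.read();
            }
            // Discard anything between the closing quote and the delimiter.
            while (c != -1 && c != delimiter) {
                c = reader.read();
            }
        } else {
            while (c != -1 && c != delimiter) {
                token.append((char) c);
                c = reader.read();
            }
        }
        return token.toString();
    }
}
```

For example, reading alpha;"b;c";d with ';' as delimiter yields alpha, b;c and d, followed by null.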


Copyright © 2012 intarsys consulting GmbH. All Rights Reserved.