public final class TokenLexer extends Object
This lexer differs from java.util.StringTokenizer in that
it identifies the token type along with the token and converts
the token string into the type's corresponding Java instance.
There are nine (9) pre-defined token types and two special
types: ERROR and EOF. ERROR is returned
when a recoverable error occurs. EOF is returned
when the end of the input is reached and no more tokens will be
returned.
The pre-defined token types are:
CHARACTER: A single character between single quotes (').
COMMENT: Either a // or a /* */ comment. Supports nested comments.
FLOAT: A decimal number.
INTEGER: An integer number.
NAME: An alphanumeric identifier.
OPERATOR: A punctuation-only identifier.
SOURCE: Raw, unanalyzed input.
STRING: Zero or more characters between double quotes ("").
When a NAME token is found, the user keywords map is checked to see
whether it contains the token as a keyword. If so, the associated
token type is returned instead of NAME. When an
OPERATOR token is found,
both the user operators and delimiters maps are checked.
The user-defined token maps should meet the following criteria:
All token type values must be >= NEXT_TOKEN.
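The keyword lookup and the value criterion described above can be sketched without the library. This is a minimal illustration, not the eBus implementation: the class `UserTokenMaps`, its two methods, and the sample keywords are hypothetical, and the constants are local copies that assume NAME is 6 and NEXT_TOKEN is 11, as this page states.

```java
import java.util.HashMap;
import java.util.Map;

final class UserTokenMaps {
    // Local mirrors of the documented values (assumed; not imported from eBus).
    static final int NAME = 6;
    static final int NEXT_TOKEN = 11;

    /** Throws IllegalArgumentException if any mapped token type is < NEXT_TOKEN. */
    static void validate(Map<?, Integer> userMap) {
        for (Map.Entry<?, Integer> entry : userMap.entrySet()) {
            if (entry.getValue() < NEXT_TOKEN) {
                throw new IllegalArgumentException(
                    entry.getKey() + " maps to " + entry.getValue()
                    + ", which is < " + NEXT_TOKEN);
            }
        }
    }

    /** Returns the keyword's token type if the token is a keyword, else NAME. */
    static int resolveName(String token, Map<String, Integer> keywords) {
        return keywords.getOrDefault(token, NAME);
    }

    public static void main(String[] args) {
        // Hypothetical keyword set; user values start at NEXT_TOKEN.
        Map<String, Integer> keywords = new HashMap<>();
        keywords.put("if", NEXT_TOKEN);
        keywords.put("else", NEXT_TOKEN + 1);

        validate(keywords);
        System.out.println(resolveName("if", keywords));    // keyword type
        System.out.println(resolveName("count", keywords)); // falls back to NAME
    }
}
```

Numbering user-defined types upward from NEXT_TOKEN keeps them clear of the pre-defined values, which is the condition the constructor is documented to enforce.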
An example of using TokenLexer is:
import java.io.Reader;
import net.sf.eBus.text.TokenLexer;
import net.sf.eBus.text.Token;
...
TokenLexer lexer = new TokenLexer(Keywords, Operators, Delimiters);
Token token;
Reader input = ...;
// Set the input to be tokenized.
lexer.input(input);
// Continue retrieving until no more tokens.
while ((token = lexer.nextToken()).type() != TokenLexer.EOF)
{
    // Process the next token based on token type.
}
// Finish up the tokenization.
Users may not want the lexer to analyze input between two
well-defined delimiters. This data is collected and returned
as a SOURCE token when the
terminating delimiter is reached. Raw mode requires both an
opening and a closing delimiter to be specified. This allows the
lexer to track the appearance of nested delimiters within the
input and return only when the top-level terminating delimiter
is found.
Raw lexical mode is used when input contains sub-text to be handled by a different lexer.
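The nested-delimiter tracking described above amounts to a depth counter. The sketch below shows one way this could work and is not the library's code: `RawCollector` and `collect` are hypothetical names, and the sketch assumes the caller has already consumed the opening delimiter and that reaching the end of input before the closing delimiter is an error. The NO_OPEN_CHAR constant is a local copy of the documented U+0000 value.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class RawCollector {
    // Local mirror of the documented NO_OPEN_CHAR value (assumed; not imported).
    static final char NO_OPEN_CHAR = '\u0000';

    /**
     * Collects raw text up to, but not including, the top-level closing
     * delimiter. Assumes the opening delimiter was already consumed.
     */
    static String collect(Reader in, char openChar, char closeChar)
            throws IOException {
        StringBuilder raw = new StringBuilder();
        int depth = 1; // one open delimiter is already pending
        int c;
        while ((c = in.read()) >= 0) {
            char ch = (char) c;
            if (ch == closeChar) {
                if (--depth == 0) {
                    return raw.toString(); // top-level close reached
                }
            } else if (openChar != NO_OPEN_CHAR && ch == openChar) {
                ++depth; // nested open delimiter
            }
            raw.append(ch);
        }
        throw new IOException("input ended before the closing delimiter");
    }

    public static void main(String[] args) throws IOException {
        // Nested braces are kept; collection stops at the matching close.
        System.out.println(collect(new StringReader("a {b} c} tail"), '{', '}'));
        // With no open character, the first close character ends collection.
        System.out.println(
            collect(new StringReader("a {b} c} tail"), NO_OPEN_CHAR, '}'));
    }
}
```

The first call prints `a {b} c` and the second prints `a {b`, which shows why an opening delimiter is needed when the embedded text can itself contain the closing character.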
| Modifier and Type | Class and Description |
|---|---|
| static class | TokenLexer.LexMode: The lexer will either analyze the tokens, identifying the type, or collect raw input until a terminating delimiter is found. |
| Modifier and Type | Field and Description |
|---|---|
| static int | CHARACTER: A single-quoted character token (2). |
| static int | COMMENT: Either a // or a /* */ comment (3). |
| static int | EOF: The end of the input is reached (1). |
| static int | ERROR: An error occurred when seeking the next token (0). |
| static int | FLOAT: A floating point number (4). |
| static int | INTEGER: An integer number (5). |
| static int | NAME: An alphanumeric identifier (6). |
| static int | NEXT_TOKEN: User-defined tokens must be >= 11. |
| static char | NO_OPEN_CHAR: When the raw mode open character is set to U+0000, this means there is no open character, only a close character. |
| static int | OPERATOR: Token consists solely of punctuation characters (7). |
| static int | SOURCE: Raw, unanalyzed input (8). |
| static int | STRING: A double-quoted string (9). |
| static int | TOKEN_COUNT: There are eleven (11) predefined token types. |
| Constructor and Description |
|---|
| TokenLexer(Map<String,Integer> keywords, Map<String,Integer> operators, Map<Character,Integer> delimiters): Creates a message layout lexer using the specified keywords, operators and delimiters. |
| Modifier and Type | Method and Description |
|---|---|
| void | cookedMode(): Switch back to cooked tokenization. |
| void | input(Reader reader): Extract tokens from this input stream. |
| int | lineNumber(): Returns the current line number being tokenized. |
| TokenLexer.LexMode | mode(): Returns the current lexer mode. |
| Token | nextToken(): Returns the next token found in the input stream. |
| int | offset(): Returns the current offset into the input. |
| void | rawMode(char openChar, char closeChar): Switch to raw tokenization. |
public static final char NO_OPEN_CHAR
public static final int ERROR
public static final int EOF
public static final int CHARACTER
The token value is a java.lang.Character instance.
public static final int COMMENT
Either a // or a /* */ comment (3). Nested comments are supported.
public static final int FLOAT
The token value is a java.lang.Double instance.
public static final int INTEGER
The token value is a java.lang.Long instance.
public static final int NAME
public static final int OPERATOR
Punctuation characters are:
! " # $ % & ' ( ) *
+ , - . / : ; < = >
? @ [ \ ] ^ _ ` { }
| ~
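To make the punctuation list concrete, the sketch below tests whether a string consists solely of the 32 characters listed above. The class `PunctuationCheck` and the method `isOperatorText` are hypothetical helpers, not part of TokenLexer. How the real lexer treats quote characters that also begin CHARACTER and STRING tokens is not covered here.

```java
final class PunctuationCheck {
    // The 32 punctuation characters from the OPERATOR description.
    static final String PUNCTUATION = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{}|~";

    /** Returns true if text is non-empty and every character is punctuation. */
    static boolean isOperatorText(String text) {
        if (text.isEmpty()) {
            return false;
        }
        for (int i = 0; i < text.length(); i++) {
            if (PUNCTUATION.indexOf(text.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isOperatorText("<="));  // punctuation only
        System.out.println(isOperatorText("a+"));  // contains a letter
    }
}
```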
public static final int SOURCE
See also: TokenLexer.LexMode.RAW
public static final int STRING
public static final int TOKEN_COUNT
public static final int NEXT_TOKEN
public TokenLexer(Map<String,Integer> keywords, Map<String,Integer> operators, Map<Character,Integer> delimiters) throws IllegalArgumentException
Parameters:
keywords - Keyword to integer identifier mapping.
operators - Operator to integer identifier mapping.
delimiters - Delimiter to integer identifier mapping.
Throws:
IllegalArgumentException - if any of the user maps contains a value < NEXT_TOKEN.
public int lineNumber()
public int offset()
public TokenLexer.LexMode mode()
public void input(Reader reader)
Parameters:
reader - Tokenize this input.
public void rawMode(char openChar, char closeChar)
Parameters:
openChar - The open clause delimiter.
closeChar - The close clause delimiter.
See also: cookedMode()
public void cookedMode()
See also: rawMode(char, char)
Copyright © 2019. All rights reserved.