java.lang.Object
java.util.AbstractCollection<K>
it.unimi.dsi.fastutil.objects.AbstractObjectCollection<K>
it.unimi.dsi.fastutil.objects.AbstractObjectList<MutableString>
it.unimi.dsi.util.FrontCodedStringList
public class FrontCodedStringList
Compact storage of strings using front-coding compression.
This class stores a list of strings using front-coding compression. Of course, the compression is effective only if the list is sorted, but you can also use instances of this class simply as a handy way to manage a large number of strings. It implements an immutable ObjectList whose get(int) method, called with argument i, returns the i-th string as a MutableString. The returned mutable string may be freely modified.
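To make the front-coding idea concrete, here is a minimal, self-contained sketch in plain Java. It is not the dsiutils or fastutil implementation: the class name FrontCodingSketch and its layout (one shared-prefix length and one suffix per string, with every ratio-th string stored in full as a block head) are illustrative assumptions, and it keeps ordinary String objects instead of packed byte or char arrays.

```java
import java.util.List;

/** Hypothetical, simplified front-coded list (illustration only, not the library code). */
public class FrontCodingSketch {
    private final int ratio;            // every ratio-th string is stored in full
    private final int[] prefixLength;   // length of the prefix shared with the previous string
    private final String[] suffix;      // characters that follow the shared prefix

    public FrontCodingSketch(List<String> words, int ratio) {
        if (ratio < 1) throw new IllegalArgumentException("ratio must be positive");
        this.ratio = ratio;
        int n = words.size();
        prefixLength = new int[n];
        suffix = new String[n];
        for (int i = 0; i < n; i++) {
            String w = words.get(i);
            int p = 0;
            if (i % ratio != 0) { // not a block head: share a prefix with the previous string
                String prev = words.get(i - 1);
                int max = Math.min(prev.length(), w.length());
                while (p < max && prev.charAt(p) == w.charAt(p)) p++;
            }
            prefixLength[i] = p;
            suffix[i] = w.substring(p);
        }
    }

    public int size() {
        return suffix.length;
    }

    /** Rebuilds string index by starting at its block head and replaying each (prefix, suffix) pair. */
    public String get(int index) {
        int start = index - index % ratio;               // block head, stored in full
        StringBuilder s = new StringBuilder(suffix[start]);
        for (int i = start + 1; i <= index; i++) {
            s.setLength(prefixLength[i]);                // keep only the shared prefix
            s.append(suffix[i]);                         // then append the stored suffix
        }
        return s.toString();
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "applet", "apply", "banana", "band");
        FrontCodingSketch list = new FrontCodingSketch(words, 2);
        for (int i = 0; i < list.size(); i++) System.out.println(list.get(i));
    }
}
```

Sorted input maximizes the shared prefixes, which is why compression is effective only on sorted lists; unsorted input still works, just with little saving.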
As a convenience, this class provides a main method that reads a sequence of newline-separated words from standard input and writes the corresponding serialized front-coded string list.
To store the strings, the class uses either a ByteArrayFrontCodedList holding UTF-8 encoded bytes or a CharArrayFrontCodedList, depending on the value of the utf8 parameter at creation time. In the first case, if the strings are mostly ASCII the resulting array will be much smaller, but access times will increase manifold, as each string must be decoded from UTF-8 before being returned.
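The size side of this trade-off can be seen with the standard library alone. The sketch below does not use ByteArrayFrontCodedList or CharArrayFrontCodedList; it simply compares, for single strings, the UTF-8 byte count with the two bytes per char that a char array needs, ignoring front coding and array overhead. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Utf8VersusChars {
    /** Bytes needed to store s as UTF-8 (ignoring front coding and array overhead). */
    static int utf8Size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Bytes needed to store s as a Java char array: two bytes per UTF-16 code unit. */
    static int charArraySize(String s) {
        return s.length() * Character.BYTES;
    }

    public static void main(String[] args) {
        String[] samples = { "front-coding", "d\u00e9j\u00e0 vu" };
        for (String s : samples) {
            System.out.println(s + ": " + utf8Size(s) + " bytes as UTF-8, "
                + charArraySize(s) + " bytes as chars");
        }
        // Getting a String back from UTF-8 needs a decoding pass: this is the extra access cost.
        byte[] stored = samples[1].getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(stored, StandardCharsets.UTF_8).equals(samples[1]));
    }
}
```

For the ASCII sample the UTF-8 form takes 12 bytes against 24 as chars; the accented sample takes 9 bytes against 14, so the saving shrinks as the text moves away from ASCII.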
| Nested Class Summary |
|---|
| Inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList: AbstractObjectList.ObjectSubList<K> |
| Field Summary | |
|---|---|
| protected ByteArrayFrontCodedList | byteFrontCodedList: The underlying ByteArrayFrontCodedList, or null. |
| protected CharArrayFrontCodedList | charFrontCodedList: The underlying CharArrayFrontCodedList, or null. |
| static long | serialVersionUID |
| protected boolean | utf8: Whether this front-coded list is UTF-8 encoded. |
| Constructor Summary | |
|---|---|
| FrontCodedStringList(Collection<? extends CharSequence> c, int ratio, boolean utf8) | Creates a new front-coded string list containing the character sequences contained in the given collection. |
| FrontCodedStringList(Iterator<? extends CharSequence> words, int ratio, boolean utf8) | Creates a new front-coded string list containing the character sequences returned by the given iterator. |
| Method Summary | |
|---|---|
| protected static char[] | byte2Char(byte[] a, char[] s) |
| protected static int | countUTF8Chars(byte[] a) |
| MutableString | get(int index): Returns the element at the specified position in this front-coded list as a mutable string. |
| void | get(int index, MutableString s): Stores the element at the specified position in this front-coded list in the given mutable string. |
| ObjectListIterator<MutableString> | listIterator(int k) |
| static void | main(String[] arg) |
| int | ratio(): Returns the ratio of the underlying front-coded list. |
| int | size() |
| boolean | utf8(): Returns whether this front-coded string list is storing its strings as UTF-8 encoded bytes. |
| Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList |
|---|
| add, add, addAll, addAll, addElements, addElements, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, getElements, hashCode, indexOf, iterator, lastIndexOf, listIterator, objectListIterator, objectListIterator, objectSubList, peek, pop, push, remove, removeElements, set, size, subList, top, toString |

| Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectCollection |
|---|
| containsAll, isEmpty, objectIterator, removeAll, retainAll, toArray, toArray |

| Methods inherited from class java.util.AbstractCollection |
|---|
| clear, remove |

| Methods inherited from class java.lang.Object |
|---|
| clone, finalize, getClass, notify, notifyAll, wait, wait, wait |

| Methods inherited from interface java.util.List |
|---|
| clear, containsAll, isEmpty, remove, removeAll, retainAll, toArray, toArray |

| Methods inherited from interface it.unimi.dsi.fastutil.objects.ObjectCollection |
|---|
| objectIterator, toArray |

| Methods inherited from interface it.unimi.dsi.fastutil.Stack |
|---|
| isEmpty |
| Field Detail |
|---|
public static final long serialVersionUID
protected final ByteArrayFrontCodedList byteFrontCodedList
The underlying ByteArrayFrontCodedList, or null.
protected final CharArrayFrontCodedList charFrontCodedList
The underlying CharArrayFrontCodedList, or null.
protected final boolean utf8
Whether this front-coded list is UTF-8 encoded.
| Constructor Detail |
|---|
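public FrontCodedStringList(Iterator<? extends CharSequence> words,
int ratio,
boolean utf8)
Creates a new front-coded string list containing the character sequences returned by the given iterator.
Parameters:
words - an iterator returning character sequences.
ratio - the desired ratio.
utf8 - if true, the strings will be stored as UTF-8 byte arrays.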
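The ratio parameter trades space for access time. The sketch below assumes a simplified model in which every ratio-th string is stored in full and each other string keeps only the suffix following the prefix it shares with its predecessor; it counts the characters stored and the entries that must be replayed to rebuild one string. The model and the names RatioTradeOff, storedChars and decodeSteps are illustrative assumptions, not the library's internals.

```java
import java.util.List;

public class RatioTradeOff {
    /** Characters stored when every ratio-th string is kept in full and the others as suffixes. */
    static int storedChars(List<String> words, int ratio) {
        int total = 0;
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            int p = 0;
            if (i % ratio != 0) { // not a block head: skip the prefix shared with the previous string
                String prev = words.get(i - 1);
                int max = Math.min(prev.length(), w.length());
                while (p < max && prev.charAt(p) == w.charAt(p)) p++;
            }
            total += w.length() - p;
        }
        return total;
    }

    /** Entries replayed to rebuild string index: its block head plus every entry up to it. */
    static int decodeSteps(int index, int ratio) {
        return index % ratio + 1;
    }

    public static void main(String[] args) {
        List<String> words = List.of("car", "card", "care", "cared", "cares", "carry");
        for (int ratio : new int[] { 1, 2, 3, 6 }) {
            System.out.println("ratio " + ratio + ": " + storedChars(words, ratio)
                + " chars stored, worst case " + ratio + " entries replayed per get");
        }
    }
}
```

On this sample the stored size drops from 26 characters (ratio 1) to 9 (ratio 6), while the worst-case number of entries replayed per access grows from 1 to 6.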
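public FrontCodedStringList(Collection<? extends CharSequence> c,
int ratio,
boolean utf8)
Creates a new front-coded string list containing the character sequences contained in the given collection.
Parameters:
c - a collection containing character sequences.
ratio - the desired ratio.
utf8 - if true, the strings will be stored as UTF-8 byte arrays.
| Method Detail |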
|---|
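public boolean utf8()
Returns whether this front-coded string list is storing its strings as UTF-8 encoded bytes.
public int ratio()
Returns the ratio of the underlying front-coded list.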
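public MutableString get(int index)
Returns the element at the specified position in this front-coded list as a mutable string.
Specified by: get in interface List<MutableString>
Parameters:
index - an index in the list.
Returns:
a MutableString that will contain the string at the specified position. The string may be freely modified.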
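The method summary pairs this allocating get(int) with a get(int, MutableString) variant that fills a caller-supplied mutable string. The sketch below illustrates why such a pair is useful when scanning a large list: one buffer is reused for every element instead of allocating a new one per call. It is a hypothetical stand-in that uses java.lang.StringBuilder in place of MutableString and a plain List<String> in place of front-coded storage; the class name ReusableBufferSketch is invented for illustration.

```java
import java.util.List;

public class ReusableBufferSketch {
    private final List<String> strings; // stands in for the front-coded storage

    public ReusableBufferSketch(List<String> strings) {
        this.strings = strings;
    }

    /** Allocating variant: returns a fresh, freely modifiable buffer on each call. */
    public StringBuilder get(int index) {
        StringBuilder s = new StringBuilder();
        get(index, s);
        return s;
    }

    /** Filling variant: overwrites the caller's buffer with the string at index. */
    public void get(int index, StringBuilder s) {
        s.setLength(0);               // discard whatever the buffer held before
        s.append(strings.get(index));
    }

    public static void main(String[] args) {
        ReusableBufferSketch list = new ReusableBufferSketch(List.of("alpha", "beta", "gamma"));
        StringBuilder buffer = new StringBuilder();
        int totalLength = 0;
        for (int i = 0; i < 3; i++) {
            list.get(i, buffer);      // one buffer serves the whole scan
            totalLength += buffer.length();
        }
        System.out.println(totalLength);
    }
}
```

The allocating variant is simply the filling variant applied to a new buffer, so both return the same content; only the allocation pattern differs.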
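public void get(int index,
MutableString s)
Stores the element at the specified position in this front-coded list in the given mutable string.
Parameters:
index - an index in the list.
s - a mutable string that will contain the string at the specified position.
protected static int countUTF8Chars(byte[] a)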
protected static char[] byte2Char(byte[] a,
char[] s)
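The two protected helpers countUTF8Chars and byte2Char are listed without descriptions. Judging only from their names and signatures, they appear to count the characters encoded in a UTF-8 byte array and to decode such an array into a char array. The sketch below is a guess at that behavior, not the dsiutils code: the names countUtf8Chars and byteToChar, the assumption of well-formed input, and the policy of allocating a new array when s is too small are all assumptions.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Helpers {
    /** Counts the UTF-16 chars a well-formed UTF-8 byte array decodes to (assumed semantics). */
    static int countUtf8Chars(byte[] a) {
        int count = 0;
        for (byte b : a) {
            int u = b & 0xFF;
            if ((u & 0xC0) == 0x80) continue; // continuation byte: belongs to an earlier character
            count += (u >= 0xF0) ? 2 : 1;     // a 4-byte sequence becomes a surrogate pair
        }
        return count;
    }

    /** Decodes well-formed UTF-8 from a into s, allocating a new array if s is too small (assumed). */
    static char[] byteToChar(byte[] a, char[] s) {
        int n = countUtf8Chars(a);
        if (s == null || s.length < n) s = new char[n];
        int j = 0;
        for (int i = 0; i < a.length; ) {
            int u = a[i] & 0xFF;
            int cp;
            if (u < 0x80) { // 1 byte: ASCII
                cp = u; i += 1;
            } else if (u < 0xE0) { // 2 bytes
                cp = (u & 0x1F) << 6 | (a[i + 1] & 0x3F); i += 2;
            } else if (u < 0xF0) { // 3 bytes
                cp = (u & 0x0F) << 12 | (a[i + 1] & 0x3F) << 6 | (a[i + 2] & 0x3F); i += 3;
            } else { // 4 bytes: outside the Basic Multilingual Plane
                cp = (u & 0x07) << 18 | (a[i + 1] & 0x3F) << 12 | (a[i + 2] & 0x3F) << 6 | (a[i + 3] & 0x3F); i += 4;
            }
            j += Character.toChars(cp, s, j); // writes one or two chars, returns how many
        }
        return s;
    }

    public static void main(String[] args) {
        byte[] utf8 = "d\u00e9j\u00e0 vu".getBytes(StandardCharsets.UTF_8);
        int n = countUtf8Chars(utf8);
        char[] chars = byteToChar(utf8, new char[4]); // too small, so a new array is allocated
        System.out.println(n + " chars: " + new String(chars, 0, n));
    }
}
```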
public ObjectListIterator<MutableString> listIterator(int k)
Specified by: listIterator in interface ObjectList<MutableString>; listIterator in interface List<MutableString>
Overrides: listIterator in class AbstractObjectList<MutableString>
public int size()
Specified by: size in interface Collection<MutableString>; size in interface List<MutableString>; size in class AbstractCollection<MutableString>
public static void main(String[] arg)
throws IOException,
JSAPException,
NoSuchMethodException
Throws:
IOException
JSAPException
NoSuchMethodException