All Classes
| Class | Description |
| --- | --- |
| AbstractActivationFunction | Implements the boilerplate code for applying functions to container classes such as vectors and matrices, by applying the function to every element. |
| AbstractClassifier | Abstract base class for classifiers. |
| AbstractKNearestNeighbours | K-nearest-neighbour classification algorithm that is seeded with a "database" of known examples and predicts a class by majority vote among the k nearest neighbours. |
| AbstractMiniBatchCostFunction | Mini-batch cost function. |
| AbstractMinimizer | Abstract minimizer class that adds functionality shared by many minimizers. |
| AbstractPredictor | |
| AbstractTreeNode | |
| ActivationFunction | Squashing function interface to provide multiple activation functions, e.g. |
| ActivationFunctionSelector | Singleton helper to get the activation functions as singletons. |
| AgglomerativeClustering | "Bottom-up" (agglomerative) clustering using average single linkage. |
| AgglomerativeClustering.ClusterNode | Tree structure that holds information about linkages and distances. |
| ArrayIterator<E> | Generic array iterator. |
| ArrayJoiner | A joiner utility for primitive arrays, which Guava's Joiner can't handle. |
| ArrayUtils | Array utilities for functionality that isn't included in Arrays. |
| ArticleContentExtrator | Extractor for news articles. |
| ArticleContentExtrator.ContentFetchResult | Article content fetch result. |
| AsyncBufferedOutputStream | BufferedOutputStream that asynchronously flushes to disk, so callers don't have to wait for the flush to complete. |
| BasicFeatureExtractor | Basic feature extraction for sequence learning; takes into account the current word and the previous label, as well as the joint feature of both. |
| BigramTokenizer | Advanced tokenizer that lowercases, adds start and end tags, deduplicates tokens and builds bigrams. |
| BitSetWritable | |
| BlockPartitioner | Partitions contiguous ranges from 0 to numberOfRows into sizeOfCluster buckets. |
| Boundaries | |
| Boundaries.Range | |
| ByteBufferInputStream | |
| CanopyClustering | Sequential canopy clusterer. |
| Classifier | Classifier interface for predicting categorical variables. |
| ClassifierFactory<A extends Classifier> | Factory interface for building new classifiers, mainly used in cross-validation to generate new classifiers when needed. |
| Cluster | |
| CollectionInputProvider<T> | Input provider that reads from generic collections. |
| ConditionalLikelihoodCostFunction | Conditional likelihood cost function, used in a maximum entropy Markov model to optimize the weights. |
| ConsoleResultWriter<T extends FetchResult> | Simple result writer that outputs to the console. |
| CosineDistance | |
| CostFunction | Cost function interface to implement when using an optimizer such as conjugate gradient. |
| CostGradientTuple | More readable replacement for the Tuple<> previously used in CostFunction. |
| CountMinSketch<T> | |
| Crawler<T extends FetchResult> | Basic crawler interface; all implementations should implicitly provide a constructor with the same arguments as setup and delegate the call to it. |
| CrossEntropyLoss | |
| CsvDatasetReader | Binary dataset reader for CSV files. |
| Dataset | Simplistic dataset class that carries information about the data. |
| DBSCAN | Sequential version of DBSCAN, used to evaluate whether this algorithm suits arbitrary parallelization paradigms that can crunch graphs. |
| DBSCANClustering | Plain sequential DBSCAN clustering. |
| DecisionTree | A decision tree that can be used for classification with numerical or categorical features. |
| DenseMatrixFolder | |
| DiskList<E extends org.apache.hadoop.io.Writable> | A file-backed list for adding elements and reading them back sequentially. |
| DiskList.IORuntimeException | |
| DistanceMeasurer | |
| DistanceResult<TYPE> | Immutable generic distance result that contains a document object and its distance to some (possibly artificial) query document. |
| DocumentSimilarity | Simple distance-measure wrapper for measuring string similarity while debugging. |
| ElliotActivationFunction | Implementation of the Elliot activation function. |
| EuclidianDistance | |
| EvaluationListener<A extends Classifier> | The evaluation listener is mainly used to track overfitting of a classifier during training. |
| EvaluationSplit | Split data class that contains the division into train/test vectors. |
| Evaluator | Binary/multi-class classification evaluator utility that handles train/test splitting and evaluation with various metrics. |
| Evaluator.EvaluationResult | |
| Extractor<T extends FetchResult> | Simple extraction logic interface for a site and a result. |
| FeatureOutcomePair | |
| FeatureType | Denotes a feature type, either numerical or nominal. |
| FetchResult | Fetch result class containing the origin URL and its outlinks for further crawling. |
| FetchResultPersister<T extends FetchResult> | Asynchronous persister thread that takes a ResultWriter and handles asynchronous writing to disk, or to any other sink implemented by the ResultWriter. |
| FetchThread<T extends FetchResult> | Callable fetcher that, for a given list of URLs and a given Extractor, extracts the content from those URLs. |
| Fmincg | Minimizes a continuous, differentiable multivariate function. |
| GradientDescent | Gradient descent implementation with features such as momentum, divergence detection, delta breaks, and adaptive learning rates via bold driver and scheduled annealing. |
| GradientDescent.GradientDescentBuilder | |
| HaversineDistance | Haversine distance implementation that reads lat/lng in degrees at array/vector indices 0 and 1 and returns the distance in meters between the two vectors. |
| HingeLoss | Hinge loss for linear SVMs. |
| HMM | Hidden Markov model implementation for multiple observations, covering all three types of problems an HMM aims to solve (decoding, likelihood estimation, unsupervised/supervised learning). |
| HtmlExtrator | Extractor for raw HTML. |
| HtmlExtrator.HtmlFetchResult | Article content fetch result. |
| HuffmanTree<VALUE> | |
| ImageReader | BufferedImage reader that exports the raw bytes as feature vectors with different encodings. |
| InputProvider<T> | A provider that supplies generic input from a generic source as an iterator that can be read over and over again. |
| IntegerFunnel | |
| InvertedIndex<DOCUMENT_TYPE,KEY_TYPE> | Inverted index, mainly developed for sparse vectors to speed up dimension lookups for fast distance measurement and search-space reduction. |
| InvertedIndex.DocumentDistanceMeasurer<DOCUMENT_TYPE,KEY_TYPE> | Measurer that measures the distance between two documents. |
| InvertedIndex.DocumentMapper<DOCUMENT_TYPE,KEY_TYPE> | Mapper that maps a document to its keys. |
| IrisReader | Dataset vectorizer for the iris dataset. |
| Iterables | Utilities for iterables, e.g. |
| IterationCompletionListener | Callback that should be triggered when an iteration has finished. |
| IterativeSimilarityAggregation | Iterative similarity aggregation for named entity recognition and set expansion, based on the paper "SEISA: Set Expansion by Iterative Similarity Aggregation". |
| JaccardDistance | |
| KMeansClustering | Sequential version of k-means clustering. |
| KNearestNeighbours | K-nearest-neighbour classification algorithm that is seeded with a "database" of known examples and predicts a class by majority vote among the k nearest neighbours. |
| LeafNode | |
| LinearActivationFunction | Linear activation function. |
| ListUtils | Utility class for operations on generic lists. |
| LogActivationFunction | Log activation function, guarded against NaN and infinity edge cases. |
| LogisticRegression | |
| LogisticRegressionCostFunction | |
| LogLoss | Logistic error function implementation. |
| LongFunnel | |
| LossFunction | Calculates the error, for example in the last layer of a neural net. |
| LRUCache<K,V> | Standard LRU cache based on LinkedHashMap. |
| LUVColorSpace | Represents the LUV color space. |
| ManhattanDistance | |
| MarkovChain | Markov chain that can "learn" the state transition probabilities from a given input and returns the probability of a given sequence of states. |
| MathUtils | Math utilities featuring normalizations and other helpers. |
| MathUtils.PredictionOutcomePair | |
| MatrixWritable | Writable class for dense and sparse matrices. |
| MaxEntMarkovModel | Maximum entropy Markov model for named entity recognition (classifying labels in sequence learning). |
| MeanAbsoluteLoss | Mean absolute error (MAE) for regression problems. |
| MeanShiftClustering | Sequential mean shift clustering using a Gaussian kernel and Euclidean distance measurement. |
| Merger<M extends org.apache.hadoop.io.WritableComparable> | Sorted segment merger on disk. |
| MinHash | Linear MinHash algorithm to find near duplicates faster or to speed up nearest neighbour searches. |
| MinHash.HashType | |
| Minimizer | Minimizer interface for various function minimizers. |
| MLPWeightMapper | |
| MNISTReader | Reader for the MNIST CSV data from Kaggle: www.kaggle.com/c/digit-recognizer/ |
| MultilayerPerceptron | Multilayer perceptron implementation that works on the CPU and on the GPU via JCuda. |
| MultilayerPerceptron.MultilayerPerceptronBuilder | Configuration for training a neural net through the Classifier. |
| MultilayerPerceptronCostFunction | Neural network cost function for a multilayer perceptron. |
| MultilayerPerceptronCostFunction.NetworkConfiguration | |
| MultinomialNaiveBayes | Multinomial naive Bayes classifier. |
| MultithreadedCrawler<T extends FetchResult> | Fast multithreaded crawler that starts a fixed thread pool of 32 threads, each fed 10 URLs at a time. |
| MushroomReader | Dataset vectorizer for the mushroom dataset. |
| NegatedCostFunction | Negated cost function to implement maximization problems. |
| NominalNode | |
| NumericalNode | |
| OnePassExclusiveClustering | A one-pass exclusive clustering algorithm. |
| OutlinkExtractor | Outlink extractor; parses a page only for its outlinks. |
| OWLQN | Java translation of the C++ code of "Orthant-Wise Limited-memory Quasi-Newton Optimizer for L1-regularized Objectives" (see http://research.microsoft.com/). |
| Pair<S,T> | Pair implementation; unlike Tuple, this implements hashCode and equals over both parts of the pair. |
| ParticleSwarmOptimization | Particle swarm optimization algorithm to minimize cost functions. |
| Partitioner | Used to partition a list/matrix-like structure across a number of cores/buckets. |
| Permutations<T extends java.lang.Comparable<? super T>> | |
| Predictor | |
| RandomForest | A decision tree forest, using bagging. |
| RBM | Class for training and stacking restricted Boltzmann machines (RBMs). |
| RBM.RBMBuilder | |
| RBMCostFunction | Restricted Boltzmann machine implementation using contrastive divergence 1 (CD1). |
| ReferencedContext<REF_TYPE,CONTEXT_TYPE> | Reference and its context. |
| ReluActivationFunction | Rectified linear unit (ReLU) implementation. |
| ResultWriter<T extends FetchResult> | Result writing interface. |
| ResultWriterAdapter<T extends FetchResult> | |
| SequenceFeatureExtractor<K> | Interface for feature extraction in sequence learning. |
| SequenceFileResultWriter<T extends FetchResult> | Writes the result into the SequenceFile "files/crawl/result.seq". |
| SequentialCrawler<T extends FetchResult> | Sequential crawler, mainly for debugging or development. |
| SigmoidActivationFunction | Implementation of the sigmoid function. |
| SimilarityMeasurer | Similarity measurer wrapper. |
| SingleLinkedList<T> | Singly linked list with less memory overhead than the doubly linked list in java.util. |
| SoftMaxActivationFunction | Softmax activation that only works on vectors, because it needs to sum over all elements and divide by that sum. |
| SoftplusReluActivationFunction | |
| SortedFile<M extends org.apache.hadoop.io.WritableComparable> | A file that serializes WritableComparables to a buffer; once the buffer hits a threshold, it is sorted in memory. |
| SparseFeatureExtractorHelper<K> | Convenient helper for creating vectors from text features for sequence learning. |
| SparseKNearestNeighbours | K-nearest-neighbour classification algorithm that is seeded with a "database" of known examples and predicts a class by majority vote among the k nearest neighbours. |
| SparseVectorDocumentMapper | Mapper that maps sparse vectors to the set of their indices so they can be used in the InvertedIndex for fast lookup. |
| Split | Split class from Mahout, with better naming. |
| SquaredLoss | |
| StackMap<K,V> | A stack that also provides random-access lookup of values. |
| StandardTokenizer | Basic tokenizer that splits by certain attributes, with normalization. |
| Statistics | Small statistics utility to describe data by its min/max/mean/median/deviation. |
| StepActivationFunction | Classic perceptron-like step function. |
| StepLoss | |
| StringPool | Simple map-based string pool that is considered faster than String.intern(), but uses a bit more memory. |
| TanhActivationFunction | Implementation of the tanh activation based on FastMath. |
| TestSetIterationCallback<T extends Classifier> | This callback is used to evaluate the performance on a held-out test set. |
| TextLineInputProvider | Line reader for plain-text data organized in lines. |
| Tokenizer | Standard tokenizer interface. |
| TokenizerUtils | Text utility, mainly for tokenizing tasks. |
| TrainingSplit | |
| TrainingType | Selects whether to train on the CPU or on the GPU via CUDA. |
| TreeCompiler | Compilation unit for the object tree structure of the DecisionTree. |
| TwentyNewsgroupReader | Reads the "20news-bydate" dataset into a vector space model, along with prediction targets based on the category. |
| UnrollableDoubleVector | Unrollable proxy double vector class that wraps multiple vectors into one that can later be unrolled. |
| UntrainableClassifier | |
| VectorDocumentDistanceMeasurer<T> | Document distance measurer on vectors (basically a proxy to the real DistanceMeasurer). |
| VectorFunnel | A funnel that funnels a DoubleVector into bytes by taking the non-zero items from a vector for sparse instances. |
| VectorizerUtils | Vectorizing utility for basic tf-idf and word-count vectorizing of tokens/strings. |
| VectorWritable | Updated VectorWritable class that supports all the vector combinations available in my math library. This class is not compatible with the one in the clustering package, which uses a completely different byte layout in binary files. |
| ViterbiUtils | Viterbi utilities for forward-backward passes and the well-known Viterbi decoding algorithm for hidden Markov models. |
| Voter<A extends Classifier> | Implementation of vote ensembling. |
| Voter.CombiningType | |
| Voter.SelectionType | |
| WeightMapper<A extends Classifier> | |
| WeightMatrix | Weight matrix wrapper to encapsulate the random initialization. |
| ZeroDistance | |