public class SectorAnnotator extends Annotator
| Modifier and Type | Class and Description |
|---|---|
| static class | SectorAnnotator.Builder - Builder pattern for creating new SECTOR Annotators. |
| static class | SectorAnnotator.SegmentationMethod |
| Modifier and Type | Field and Description |
|---|---|
| protected static org.slf4j.Logger | log |
Inherited fields: components, it, provenance, tagger

| Modifier | Constructor and Description |
|---|---|
| public | SectorAnnotator() - used for JSON deserialization |
| protected | SectorAnnotator(AnnotatorComponent comp) |
| public | SectorAnnotator(Tagger root) |
| Modifier and Type | Method and Description |
|---|---|
| void | annotate(Collection<Document> docs) - Annotate given Documents using SECTOR, i.e. attach SectorAnnotator vectors to sentences. |
| void | annotate(Collection<Document> docs, SectorAnnotator.SegmentationMethod segmentation) - Annotate given Documents using SECTOR, i.e. attach SectorEncoder vectors to sentences. |
| protected static void | attachVectorsToAnnotations(Document doc, LookupCacheEncoder targetEncoder) - Add vectors and class labels for all existing GOLD and PRED annotations. |
| protected static org.nd4j.linalg.api.ndarray.INDArray | deltaMatrix(org.nd4j.linalg.api.ndarray.INDArray data) - Returns a matrix [Tx1] that contains cosine distances between time steps. |
| protected static org.nd4j.linalg.api.ndarray.INDArray | detectEdges(org.nd4j.linalg.api.ndarray.INDArray dev) - Returns a matrix [Tx1] that contains edges in deviation. |
| protected static org.nd4j.linalg.api.ndarray.INDArray | detectEdges(org.nd4j.linalg.api.ndarray.INDArray dev, int count) - Returns a matrix [Tx1] that contains edges with given count in deviation. |
| protected void | detectSections(Collection<Document> docs, SectorAnnotator.SegmentationMethod segmentation) |
| protected static org.nd4j.linalg.api.ndarray.INDArray | deviation(org.nd4j.linalg.api.ndarray.INDArray target) - Returns a matrix [Tx1] that contains cosine distances between t-1 and t. |
| protected static org.nd4j.linalg.api.ndarray.INDArray | deviation(org.nd4j.linalg.api.ndarray.INDArray fw, org.nd4j.linalg.api.ndarray.INDArray bw) - Returns a matrix [Tx1] that contains cosine distances between forward and backward layer. |
| double | evaluateModel(Dataset test) - Evaluate SECTOR model using a given Dataset. |
| double | evaluateModel(Dataset test, boolean evalSentenceClassification, boolean evalSegmentation, boolean evalSegmentClassification) - Evaluate SECTOR model using a given Dataset. |
| protected static org.nd4j.linalg.api.ndarray.INDArray | gaussianSmooth(org.nd4j.linalg.api.ndarray.INDArray target) |
| protected static org.nd4j.linalg.api.ndarray.INDArray | gaussianSmooth(org.nd4j.linalg.api.ndarray.INDArray target, double sd) |
| protected static org.nd4j.linalg.api.ndarray.INDArray | getEmbeddingMatrix(Document doc) |
| protected static org.nd4j.linalg.api.ndarray.INDArray | getLayerMatrix(Document doc, Class layerClass) |
| protected static org.nd4j.linalg.api.ndarray.INDArray | getLayerMatrix(Document doc, String layerClass) |
| SectorTagger | getTagger() |
| LookupCacheEncoder | getTargetEncoder() |
| protected static org.nd4j.linalg.api.ndarray.INDArray | pca(org.nd4j.linalg.api.ndarray.INDArray m, int dimensions) |
| void | segment(Collection<Document> docs, SectorAnnotator.SegmentationMethod segmentation, boolean mergeSections) - Attach SectionAnnotations to each Document with a given segmentation strategy. |
| void | trainModel(Dataset train) - Train a SECTOR model with configured number of epochs. |
| void | trainModel(Dataset train, int numEpochs) - Train a SECTOR model with given fixed number of epochs. |
| void | trainModelEarlyStopping(Dataset train, Dataset validation, int minEpochs, int minEpochsNoImprovement, int maxEpochs) - Train a SECTOR model with early stopping based on MAP score. |
Inherited methods: addComponent, annotate, annotate, annotate, createDataset, createDocument, getComponent, getProvenance, isModelAvailable, isModelAvailableInChildren, readModel, writeComponents, writeHTML, writeModel, writeModel, writeTestLog, writeTrainLog

public SectorAnnotator()
public SectorAnnotator(Tagger root)
protected SectorAnnotator(AnnotatorComponent comp)
public SectorTagger getTagger()
public LookupCacheEncoder getTargetEncoder()
public void annotate(Collection<Document> docs)
public void annotate(Collection<Document> docs, SectorAnnotator.SegmentationMethod segmentation)
public void segment(Collection<Document> docs, SectorAnnotator.SegmentationMethod segmentation, boolean mergeSections)
protected void detectSections(Collection<Document> docs, SectorAnnotator.SegmentationMethod segmentation)
public double evaluateModel(Dataset test)
public double evaluateModel(Dataset test, boolean evalSentenceClassification, boolean evalSegmentation, boolean evalSegmentClassification)
Parameters:
evalSentenceClassification - enable/disable the evaluation of sentence-level classification (P/R scores)
evalSegmentation - enable/disable the evaluation of text segmentation (Pk/WD scores)
evalSegmentClassification - enable/disable the evaluation of segment-level classification (P/R scores)
public void trainModel(Dataset train)
trainModel in class Annotator
public void trainModel(Dataset train, int numEpochs)
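The evalSegmentation flag of evaluateModel refers to Pk/WD scores. As a rough illustration of what a Pk score measures, the sketch below computes it over plain arrays of per-sentence segment ids. The class name `PkSketch`, the array representation, and the choice of window size `k` are assumptions made for this example; the page does not show how SectorAnnotator computes the score internally.

```java
final class PkSketch {

    /**
     * Pk error for a window of size k: the fraction of sentence pairs (i, i + k)
     * where reference and hypothesis disagree on whether both sentences belong
     * to the same segment. Ids must be unique per segment, not per class label.
     */
    static double pk(int[] refSegmentIds, int[] hypSegmentIds, int k) {
        if (refSegmentIds.length != hypSegmentIds.length) {
            throw new IllegalArgumentException("reference and hypothesis differ in length");
        }
        int windows = refSegmentIds.length - k;
        if (k <= 0 || windows <= 0) {
            throw new IllegalArgumentException("k must be in [1, length - 1]");
        }
        int errors = 0;
        for (int i = 0; i < windows; i++) {
            boolean sameInRef = refSegmentIds[i] == refSegmentIds[i + k];
            boolean sameInHyp = hypSegmentIds[i] == hypSegmentIds[i + k];
            if (sameInRef != sameInHyp) {
                errors++;
            }
        }
        return (double) errors / windows;
    }

    public static void main(String[] args) {
        int[] reference = {0, 0, 0, 1, 1, 1};   // one boundary after the third sentence
        int[] hypothesis = {0, 0, 0, 0, 0, 0};  // boundary missed entirely
        System.out.println(pk(reference, hypothesis, 2)); // prints 0.5
    }
}
```

Lower values are better; a hypothesis identical to the reference scores 0.0. A common convention is to set `k` to half the mean reference segment length.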
public void trainModelEarlyStopping(Dataset train, Dataset validation, int minEpochs, int minEpochsNoImprovement, int maxEpochs)
Parameters:
train - training Dataset with GOLD Annotations
validation - validation Dataset with GOLD Annotations
minEpochs - training will not be stopped before this number of epochs (absolute value)
minEpochsNoImprovement - training will be stopped after this number of epochs without a MAP improvement (relative value)
maxEpochs - training will be stopped after this number of epochs (absolute value)
protected static void attachVectorsToAnnotations(Document doc, LookupCacheEncoder targetEncoder)
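The interplay of minEpochs, minEpochsNoImprovement and maxEpochs in trainModelEarlyStopping can be sketched as a small stopping rule over a precomputed list of validation scores. This is a minimal sketch under assumptions: the class `EarlyStoppingSketch`, the method `epochsUntilStop`, and the score array are hypothetical, and the actual training loop may resolve ties or edge cases differently.

```java
final class EarlyStoppingSketch {

    /**
     * Returns the number of epochs that would be run, given one validation
     * MAP score per epoch (index 0 holds the score after epoch 1).
     */
    static int epochsUntilStop(double[] mapPerEpoch, int minEpochs,
                               int minEpochsNoImprovement, int maxEpochs) {
        double best = Double.NEGATIVE_INFINITY;
        int bestEpoch = 0;
        // Never run past maxEpochs (absolute limit) or past the available scores.
        int limit = Math.min(maxEpochs, mapPerEpoch.length);
        for (int epoch = 1; epoch <= limit; epoch++) {
            double score = mapPerEpoch[epoch - 1];
            if (score > best) {
                best = score;
                bestEpoch = epoch;
            }
            // Absolute lower bound: do not stop before minEpochs.
            boolean pastMinimum = epoch >= minEpochs;
            // Relative bound: epochs elapsed since the last improvement.
            boolean stalled = epoch - bestEpoch >= minEpochsNoImprovement;
            if (pastMinimum && stalled) {
                return epoch;
            }
        }
        return limit;
    }

    public static void main(String[] args) {
        double[] scores = {0.10, 0.20, 0.30, 0.29, 0.28, 0.27, 0.26};
        System.out.println(epochsUntilStop(scores, 2, 2, 10)); // prints 5
        System.out.println(epochsUntilStop(scores, 6, 2, 10)); // prints 6
        System.out.println(epochsUntilStop(scores, 2, 2, 4));  // prints 4
    }
}
```

In the first call the best score is reached in epoch 3, so two epochs without improvement end training after epoch 5. The second call shows minEpochs delaying the stop, and the third shows maxEpochs cutting training short.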
protected static org.nd4j.linalg.api.ndarray.INDArray getLayerMatrix(Document doc, String layerClass)
protected static org.nd4j.linalg.api.ndarray.INDArray getLayerMatrix(Document doc, Class layerClass)
protected static org.nd4j.linalg.api.ndarray.INDArray getEmbeddingMatrix(Document doc)
protected static org.nd4j.linalg.api.ndarray.INDArray pca(org.nd4j.linalg.api.ndarray.INDArray m, int dimensions)
protected static org.nd4j.linalg.api.ndarray.INDArray gaussianSmooth(org.nd4j.linalg.api.ndarray.INDArray target)
protected static org.nd4j.linalg.api.ndarray.INDArray gaussianSmooth(org.nd4j.linalg.api.ndarray.INDArray target, double sd)
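gaussianSmooth carries no description on this page. As an illustration of what Gaussian smoothing of a per-sentence signal with standard deviation `sd` usually looks like, here is a minimal sketch on a plain `double[]` instead of an INDArray. The kernel radius of three standard deviations and the renormalization at the borders are assumptions of this example, not documented behavior of the method.

```java
final class GaussianSmoothSketch {

    /** Smooths a 1-D signal with a Gaussian kernel of standard deviation sd. */
    static double[] gaussianSmooth(double[] signal, double sd) {
        if (sd <= 0.0) {
            throw new IllegalArgumentException("sd must be positive");
        }
        // Assumption: truncate the kernel at three standard deviations.
        int radius = (int) Math.ceil(3.0 * sd);
        double[] out = new double[signal.length];
        for (int t = 0; t < signal.length; t++) {
            double weighted = 0.0;
            double weightSum = 0.0;
            for (int k = -radius; k <= radius; k++) {
                int idx = t + k;
                if (idx < 0 || idx >= signal.length) {
                    continue; // skip positions outside the signal
                }
                double w = Math.exp(-(k * k) / (2.0 * sd * sd));
                weighted += w * signal[idx];
                weightSum += w;
            }
            // Renormalize by the weights actually used, so borders are not damped.
            out[t] = weighted / weightSum;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] spike = {0.0, 0.0, 1.0, 0.0, 0.0};
        System.out.println(java.util.Arrays.toString(gaussianSmooth(spike, 1.0)));
    }
}
```

With this border handling a constant signal stays constant, while an isolated spike is spread symmetrically over its neighbors.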
protected static org.nd4j.linalg.api.ndarray.INDArray deviation(org.nd4j.linalg.api.ndarray.INDArray fw, org.nd4j.linalg.api.ndarray.INDArray bw)
protected static org.nd4j.linalg.api.ndarray.INDArray deviation(org.nd4j.linalg.api.ndarray.INDArray target)
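The two deviation overloads are summarized as returning a [Tx1] matrix of cosine distances, either between time steps t-1 and t or between the forward and backward layer. The sketch below reproduces both ideas on plain `double[][]` arrays with one row per time step. The class name, the array layout, the value 0.0 for the first time step, and the handling of zero vectors are assumptions of this example rather than documented behavior.

```java
final class CosineDeviationSketch {

    /** Cosine distance 1 - cos(a, b); returns 0.0 if either vector is all zeros (assumption). */
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return 1.0 - dot / Math.sqrt(normA * normB);
    }

    /** Entry t holds the distance between rows t-1 and t; entry 0 is 0.0 (assumption). */
    static double[] deviation(double[][] rows) {
        double[] dev = new double[rows.length];
        for (int t = 1; t < rows.length; t++) {
            dev[t] = cosineDistance(rows[t - 1], rows[t]);
        }
        return dev;
    }

    /** Entry t holds the distance between the forward and backward vector at step t. */
    static double[] deviation(double[][] fw, double[][] bw) {
        if (fw.length != bw.length) {
            throw new IllegalArgumentException("fw and bw must have the same number of rows");
        }
        double[] dev = new double[fw.length];
        for (int t = 0; t < fw.length; t++) {
            dev[t] = cosineDistance(fw[t], bw[t]);
        }
        return dev;
    }

    public static void main(String[] args) {
        double[][] rows = { {1, 0}, {1, 0}, {0, 1}, {0, 1} };
        System.out.println(java.util.Arrays.toString(deviation(rows))); // prints [0.0, 0.0, 1.0, 0.0]
    }
}
```

The single peak at index 2 marks the step where the vectors switch direction, which is the kind of signal a segmentation step can look for.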
protected static org.nd4j.linalg.api.ndarray.INDArray detectEdges(org.nd4j.linalg.api.ndarray.INDArray dev)
protected static org.nd4j.linalg.api.ndarray.INDArray detectEdges(org.nd4j.linalg.api.ndarray.INDArray dev, int count)
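The page describes detectEdges only as returning a [Tx1] matrix of edges in the deviation, optionally limited to a given count. One plausible reading, shown here purely as an illustration, is peak picking: mark local maxima of the deviation curve and keep the `count` highest ones. The class `PeakEdgeSketch`, the strict local-maximum test, and the 0/1 output encoding are assumptions; the actual method may define edges differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PeakEdgeSketch {

    /** Indices t with 0 < t < T-1 where dev[t] is strictly greater than both neighbors. */
    static List<Integer> localMaxima(double[] dev) {
        List<Integer> peaks = new ArrayList<>();
        for (int t = 1; t < dev.length - 1; t++) {
            if (dev[t] > dev[t - 1] && dev[t] > dev[t + 1]) {
                peaks.add(t);
            }
        }
        return peaks;
    }

    /** Returns a 0/1 vector of length T that marks the `count` highest local maxima. */
    static double[] detectEdges(double[] dev, int count) {
        List<Integer> peaks = localMaxima(dev);
        // Sort peak positions by deviation value, highest first.
        peaks.sort((a, b) -> Double.compare(dev[b], dev[a]));
        double[] edges = new double[dev.length];
        for (int i = 0; i < Math.min(count, peaks.size()); i++) {
            edges[peaks.get(i)] = 1.0;
        }
        return edges;
    }

    public static void main(String[] args) {
        double[] dev = {0.0, 0.2, 0.1, 0.9, 0.3, 0.5, 0.4};
        System.out.println(Arrays.toString(detectEdges(dev, 2)));
        // prints [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0]
    }
}
```

Of the three local maxima at indices 1, 3 and 5, only the two highest survive when `count` is 2.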
protected static org.nd4j.linalg.api.ndarray.INDArray deltaMatrix(org.nd4j.linalg.api.ndarray.INDArray data)
Copyright © 2019. All rights reserved.