Class InvertedIndex<DOCUMENT_TYPE, KEY_TYPE>

  • Type Parameters:
    DOCUMENT_TYPE - the type of document one wants to retrieve.
    KEY_TYPE - the type of key that is extracted from documents and is searchable (needs hashCode and equals implementations).

    public final class InvertedIndex<DOCUMENT_TYPE, KEY_TYPE>
    extends java.lang.Object
    Inverted index, mainly developed for sparse vectors to speed up dimension lookups for fast distance measurement and search space reduction. It can also be used as a full-text index to find relevant documents by their textual representation.
    Author:
    thomas.jungblut
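The idea described above can be illustrated with a minimal, self-contained sketch. It is not the implementation of this class: `TinyInvertedIndex`, its `keyExtractor` function, and the `candidates` method are hypothetical names chosen for illustration. The sketch only shows how mapping keys to document positions narrows a search to documents that share at least one key with the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical, simplified stand-in; not the InvertedIndex class documented here. */
final class TinyInvertedIndex<D, K> {
    // Assumption: keys are produced by a user-supplied function.
    private final Function<D, Set<K>> keyExtractor;
    // Key -> positions of the documents that contain that key.
    private final Map<K, List<Integer>> postings = new HashMap<>();
    private final List<D> documents = new ArrayList<>();

    TinyInvertedIndex(Function<D, Set<K>> keyExtractor) {
        this.keyExtractor = keyExtractor;
    }

    /** Records, for every extracted key, which documents contain it. */
    void build(List<D> items) {
        for (D item : items) {
            int position = documents.size();
            documents.add(item);
            for (K key : keyExtractor.apply(item)) {
                postings.computeIfAbsent(key, k -> new ArrayList<>()).add(position);
            }
        }
    }

    /** Returns the documents sharing at least one key with the query, in build order. */
    List<D> candidates(D query) {
        Set<Integer> positions = new TreeSet<>();
        for (K key : keyExtractor.apply(query)) {
            positions.addAll(postings.getOrDefault(key, List.of()));
        }
        List<D> result = new ArrayList<>();
        for (int position : positions) {
            result.add(documents.get(position));
        }
        return result;
    }

    public static void main(String[] args) {
        // Full-text flavour: keys are the lower-cased words of a string.
        Function<String, Set<String>> words =
            text -> new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
        TinyInvertedIndex<String, String> index = new TinyInvertedIndex<>(words);
        index.build(List.of("red apple", "green pear", "red car"));
        System.out.println(index.candidates("Red bike")); // prints [red apple, red car]
    }
}
```

Only the candidates would then need a distance computation, which is where the search space reduction comes from. Keys are stored in a `HashMap`, which is why the key type needs consistent `hashCode` and `equals` implementations.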
    • Method Detail

      • build

        public void build(java.util.List<DOCUMENT_TYPE> items)
        Builds this inverted index.
        Parameters:
        items - the items that need to be indexed.
      • query

        public java.util.List<DistanceResult<DOCUMENT_TYPE>> query(DOCUMENT_TYPE document)
        Queries this inverted index. The result is not bounded, so all items are returned.
        Parameters:
        document - the document to query with.
        Returns:
        a list of results sorted in descending order, so the best matching item resides at the first index.
      • query

        public java.util.List<DistanceResult<DOCUMENT_TYPE>> query(DOCUMENT_TYPE document,
                                                                   double minDistance)
        Queries this inverted index. The result is not bounded, so all items that have at least minDistance are returned.
        Parameters:
        document - the document to query with.
        minDistance - the distance threshold the items should satisfy (compared with lower than or equal, <=).
        Returns:
        a list of results sorted in descending order, so the best matching item resides at the first index.
      • query

        public java.util.List<DistanceResult<DOCUMENT_TYPE>> query(DOCUMENT_TYPE document,
                                                                   int maxResults,
                                                                   double minDistance)
        Queries this inverted index.
        Parameters:
        document - the document to query with.
        maxResults - the maximum number of results to obtain.
        minDistance - the distance threshold the items should satisfy (compared with lower than or equal, <=).
        Returns:
        a list of results sorted in descending order, so the best matching item resides at the first index.
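How maxResults and minDistance might interact can be sketched as follows. This is an illustration under stated assumptions, not the code behind query: `RankingSketch`, `Scored`, and `rank` are hypothetical names, a smaller distance is assumed to mean a better match, and the threshold is assumed to keep items whose distance is lower than or equal to minDistance. The documentation above does not spell out either assumption in detail.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical helper; the real query method may be implemented differently. */
final class RankingSketch {

    /** Hypothetical result holder; the real DistanceResult may look different. */
    static final class Scored<D> {
        final D document;
        final double distance;

        Scored(D document, double distance) {
            this.document = document;
            this.distance = distance;
        }
    }

    /**
     * Measures every candidate against the query, keeps those within the
     * threshold (assumed: distance <= minDistance), puts the best match first
     * (assumed: smallest distance), and truncates to maxResults.
     */
    static <D> List<Scored<D>> rank(D query, List<D> candidates,
                                    ToDoubleBiFunction<D, D> distance,
                                    int maxResults, double minDistance) {
        List<Scored<D>> kept = new ArrayList<>();
        for (D candidate : candidates) {
            double d = distance.applyAsDouble(query, candidate);
            if (d <= minDistance) {
                kept.add(new Scored<>(candidate, d));
            }
        }
        kept.sort(Comparator.comparingDouble(s -> s.distance));
        return new ArrayList<>(kept.subList(0, Math.min(maxResults, kept.size())));
    }

    public static void main(String[] args) {
        List<Double> candidates = List.of(1.0, 4.75, 9.0, 5.5, 5.125);
        List<Scored<Double>> top = RankingSketch.<Double>rank(
            5.0, candidates, (a, b) -> Math.abs(a - b), 2, 1.0);
        for (Scored<Double> s : top) {
            System.out.println(s.document + " at distance " + s.distance);
        }
    }
}
```

In this sketch, the two-argument and one-argument overloads would correspond to calling `rank` with `Integer.MAX_VALUE` for maxResults and, for the unthresholded case, `Double.POSITIVE_INFINITY` for minDistance.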
      • createVectorIndex

        public static InvertedIndex<de.jungblut.math.DoubleVector, java.lang.Integer> createVectorIndex(DistanceMeasurer measurer)
        Creates an inverted index for vectors (usually sparse vectors) that maps each dimension to the vectors that are non-zero in that dimension.
        Parameters:
        measurer - the distance measurer used to compare two vectors.
        Returns:
        a brand new inverted index.
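The dimension-to-vector mapping described for createVectorIndex can be pictured with a small sketch. It does not use DoubleVector or DistanceMeasurer: plain `double[]` arrays stand in for vectors, and `DimensionIndexSketch` and `indexNonZeroDimensions` are hypothetical names. It only shows which vectors end up listed under which dimension.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration; plain arrays stand in for the library's vectors. */
final class DimensionIndexSketch {

    /** Maps each dimension to the positions of the vectors that are non-zero there. */
    static Map<Integer, List<Integer>> indexNonZeroDimensions(List<double[]> vectors) {
        Map<Integer, List<Integer>> byDimension = new TreeMap<>();
        for (int v = 0; v < vectors.size(); v++) {
            double[] vector = vectors.get(v);
            for (int dim = 0; dim < vector.length; dim++) {
                if (vector[dim] != 0.0) {
                    byDimension.computeIfAbsent(dim, k -> new ArrayList<>()).add(v);
                }
            }
        }
        return byDimension;
    }

    public static void main(String[] args) {
        List<double[]> vectors = List.of(
            new double[] {1.0, 0.0, 0.0},
            new double[] {0.0, 2.0, 0.0},
            new double[] {3.0, 0.0, 4.0});
        // prints {0=[0, 2], 1=[1], 2=[2]}
        System.out.println(indexNonZeroDimensions(vectors));
    }
}
```

A query vector that is non-zero only in dimension 1 would, in this sketch, need to be compared against vector 1 alone, since no other vector has a value in that dimension.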