Class MemFileEmbeddingStore<Embedded>

java.lang.Object
dev.langchain4j.community.store.embedding.memfile.MemFileEmbeddingStore<Embedded>
Type Parameters:
Embedded - The type of the embedded object associated with an embedding. Commonly TextSegment.
All Implemented Interfaces:
dev.langchain4j.store.embedding.EmbeddingStore<Embedded>

public class MemFileEmbeddingStore<Embedded> extends Object implements dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
An EmbeddingStore implementation that stores embeddings in memory while persisting associated embedded content (e.g., TextSegment) as files in a local directory.

Design highlights:

  • Embeddings are kept fully in memory for fast similarity search.
  • Embedded content is serialized and stored as separate files on disk, reducing the memory footprint when content is large.
  • Optional LRU cache keeps frequently accessed embedded content in memory.
  • Supports adding, removing, and searching embeddings with optional metadata filtering.
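The in-memory search in the first bullet can be illustrated with a brute-force scan. The JDK-only sketch below scores every stored vector against the query and returns the best matches, with an optional metadata filter. It assumes cosine similarity as the score and models the filter as a plain Predicate; VectorEntry, Match and BruteForceSearch are hypothetical names, and the store's actual ranking and Filter handling may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory entry: an ID, a vector and some metadata.
record VectorEntry(String id, float[] vector, Map<String, String> metadata) {}

// Hypothetical search result: an entry ID and its cosine similarity to the query.
record Match(String id, double score) {}

final class BruteForceSearch {

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Scores every entry that passes the (optional) metadata filter,
    // sorts by descending score and keeps at most maxResults matches.
    static List<Match> search(List<VectorEntry> entries, float[] query,
                              Predicate<Map<String, String>> filter, int maxResults) {
        List<Match> matches = new ArrayList<>();
        for (VectorEntry e : entries) {
            if (filter == null || filter.test(e.metadata())) {
                matches.add(new Match(e.id(), cosine(e.vector(), query)));
            }
        }
        matches.sort(Comparator.comparingDouble(Match::score).reversed());
        return matches.subList(0, Math.min(maxResults, matches.size()));
    }
}
```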

Persistence of embedded content:

  • When an embedding is added with associated content, the content is saved to the configured chunkStorageDirectory.
  • Content is reloaded from disk on demand and optionally cached for reuse.
  • Chunk files are named after the embedding ID.
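A minimal sketch of that layout, assuming one UTF-8 text file per embedding ID. The real store serializes the embedded object (for example a TextSegment) in a format not described here, and the .txt suffix is an assumption; ChunkFiles is a hypothetical helper.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical chunk file layout: one UTF-8 text file per embedding ID.
final class ChunkFiles {

    private final Path directory;

    ChunkFiles(Path directory) {
        this.directory = directory;
    }

    // Resolves the file for an ID; the ".txt" suffix is an assumption of this sketch.
    Path fileFor(String id) {
        return directory.resolve(id + ".txt");
    }

    // Writes the content when the embedding is added.
    void save(String id, String content) {
        try {
            Files.createDirectories(directory);
            Files.writeString(fileFor(id), content, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the content back only when a caller actually needs it.
    String load(String id) {
        try {
            return Files.readString(fileFor(id), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```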

Thread safety:

  • The store uses concurrent collections and is safe for concurrent reads/writes.
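As an illustration of the general technique, the sketch below lets several threads add entries to a shared ConcurrentHashMap without any external locking. Which concurrent collections MemFileEmbeddingStore uses internally is not documented here, so ConcurrentWrites is purely hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ConcurrentWrites {

    // Each of `threads` writers adds `perThread` distinct keys to one shared map;
    // returns the final size. ConcurrentHashMap needs no external locking for this.
    static int writeConcurrently(int threads, int perThread) {
        Map<String, float[]> entries = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int writer = t;
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    entries.put(writer + "-" + i, new float[] {writer, i});
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return entries.size();
    }
}
```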
  • Constructor Details

    • MemFileEmbeddingStore

      public MemFileEmbeddingStore()
      Creates a new MemFileEmbeddingStore with default settings. Uses a temporary directory for chunk storage and no caching.
    • MemFileEmbeddingStore

      public MemFileEmbeddingStore(Path chunkStorageDirectory)
      Creates a new MemFileEmbeddingStore with the specified chunk storage directory.
      Parameters:
      chunkStorageDirectory - Directory where embedded content will be stored as files
    • MemFileEmbeddingStore

      public MemFileEmbeddingStore(Path chunkStorageDirectory, int cacheSize)
      Creates a new MemFileEmbeddingStore with the specified chunk storage directory and cache size.
      Parameters:
      chunkStorageDirectory - Directory where embedded content will be stored as files
      cacheSize - Size of LRU cache for recently loaded chunks (0 = no caching)
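      An LRU cache with a size bound, where 0 disables caching, can be sketched on top of LinkedHashMap in access order. This is a generic JDK technique, not necessarily the store's implementation; LruChunkCache is a hypothetical name. LinkedHashMap is not thread-safe, so a shared cache would need external synchronization, for example via Collections.synchronizedMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache for loaded chunk content, built on LinkedHashMap.
final class LruChunkCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxSize;

    LruChunkCache(int maxSize) {
        // accessOrder = true: get() moves an entry to the most-recently-used end.
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    // Called after a new entry is inserted; evicts the least recently used entry
    // once the cache is over capacity. With maxSize == 0 every inserted entry is
    // evicted immediately, which amounts to no caching.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }
}
```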
    • MemFileEmbeddingStore

      public MemFileEmbeddingStore(Collection<MemFileEmbeddingStore.Entry<Embedded>> entries, Path chunkStorageDirectory, int cacheSize)
  • Method Details

    • add

      public String add(dev.langchain4j.data.embedding.Embedding embedding)
      Specified by:
      add in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
    • add

      public void add(String id, dev.langchain4j.data.embedding.Embedding embedding)
      Specified by:
      add in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
    • add

      public String add(dev.langchain4j.data.embedding.Embedding embedding, Embedded embedded)
      Specified by:
      add in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
    • add

      public void add(String id, dev.langchain4j.data.embedding.Embedding embedding, Embedded embedded)
    • addAll

      public List<String> addAll(List<dev.langchain4j.data.embedding.Embedding> embeddings)
      Specified by:
      addAll in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
    • addAll

      public void addAll(List<String> ids, List<dev.langchain4j.data.embedding.Embedding> embeddings, List<Embedded> embedded)
      Specified by:
      addAll in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
    • removeAll

      public void removeAll(Collection<String> ids)
      Specified by:
      removeAll in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
    • removeAll

      public void removeAll(dev.langchain4j.store.embedding.filter.Filter filter)
      Specified by:
      removeAll in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
    • removeAll

      public void removeAll()
      Specified by:
      removeAll in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
    • search

      public dev.langchain4j.store.embedding.EmbeddingSearchResult<Embedded> search(dev.langchain4j.store.embedding.EmbeddingSearchRequest embeddingSearchRequest)
      Specified by:
      search in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
    • withChunkStorageDirectory

      public MemFileEmbeddingStore<Embedded> withChunkStorageDirectory(Path chunkStorageDirectory)
      Returns a new MemFileEmbeddingStore that uses the specified chunk storage base directory.
      Parameters:
      chunkStorageDirectory - New directory for storing chunk files
      Returns:
      A new MemFileEmbeddingStore instance with the specified directory
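      Because the method returns a new instance, it follows the immutable "with" pattern: the original store keeps its configuration. The sketch below shows the pattern on a hypothetical StoreSettings record; whether the real method also copies existing entries or chunk files is not stated here, so the sketch covers configuration only.

```java
import java.nio.file.Path;

// Hypothetical immutable settings: "with" methods return a modified copy
// and never change the receiver.
record StoreSettings(Path chunkStorageDirectory, int cacheSize) {

    StoreSettings withChunkStorageDirectory(Path newDirectory) {
        return new StoreSettings(newDirectory, cacheSize);
    }
}
```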
    • memFileStoreData

    • serialize

      public String serialize(StoreSerializationStrategy<Embedded> storeSerializationStrategy)
      Serializes the entire embedding store to a string representation using the specified serialization strategy.

      This method captures the complete state of the embedding store, including:

      • All embeddings and their associated IDs
      • References to embedded content files (chunk file paths)
      • Chunk storage directory configuration
      • Cache size configuration

      Note: The actual embedded content (e.g., TextSegment objects) stored in chunk files is not included in the serialized string. Only references to these files are serialized. To fully restore the store, both the serialized string and the original chunk files from the storage directory are required.
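      To make the split concrete, the JDK-only sketch below captures the configuration plus each entry's ID, vector and chunk file reference, and leaves the chunk content out. The line-based format and the StoreSnapshot/SnapshotEntry names are hypothetical; the format actually produced by a StoreSerializationStrategy such as JsonStoreSerializationStrategy is not described here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical snapshot entry: ID, vector and a reference to the chunk file.
// The content stored inside the chunk file is deliberately not included.
record SnapshotEntry(String id, float[] vector, String chunkFile) {}

record StoreSnapshot(Path chunkStorageDirectory, int cacheSize, List<SnapshotEntry> entries) {

    // Hypothetical line format; assumes IDs and file names contain no '|' or newline
    // and that every vector is non-empty.
    String serialize() {
        StringBuilder sb = new StringBuilder();
        sb.append("chunkStorageDirectory=").append(chunkStorageDirectory).append('\n');
        sb.append("cacheSize=").append(cacheSize).append('\n');
        for (SnapshotEntry e : entries) {
            StringJoiner vector = new StringJoiner(",");
            for (float v : e.vector()) {
                vector.add(Float.toString(v)); // round-trips exactly via Float.parseFloat
            }
            sb.append("entry=").append(e.id()).append('|').append(e.chunkFile())
              .append('|').append(vector).append('\n');
        }
        return sb.toString();
    }

    static StoreSnapshot parse(String data) {
        String[] lines = data.split("\n");
        Path dir = Paths.get(lines[0].substring("chunkStorageDirectory=".length()));
        int cacheSize = Integer.parseInt(lines[1].substring("cacheSize=".length()));
        List<SnapshotEntry> entries = new ArrayList<>();
        for (int i = 2; i < lines.length; i++) {
            String[] parts = lines[i].substring("entry=".length()).split("\\|");
            String[] numbers = parts[2].split(",");
            float[] vector = new float[numbers.length];
            for (int j = 0; j < numbers.length; j++) {
                vector[j] = Float.parseFloat(numbers[j]);
            }
            entries.add(new SnapshotEntry(parts[0], vector, parts[1]));
        }
        return new StoreSnapshot(dir, cacheSize, entries);
    }
}
```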

      Thread Safety: This method is thread-safe and can be called concurrently with other operations on the store.

      Example usage:

      
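       MemFileEmbeddingStore<TextSegment> store = new MemFileEmbeddingStore<>();
       JsonStoreSerializationStrategy<TextSegment> strategy = new JsonStoreSerializationStrategy<>();
      
       // Add an embedding together with the text segment it was computed from
       Embedding embedding = Embedding.from(new float[] {0.1f, 0.2f, 0.3f});
       TextSegment textSegment = TextSegment.from("Example text");
       store.add(embedding, textSegment);
      
       // Serialize the store metadata (chunk contents stay in their files)
       String serializedStore = store.serialize(strategy);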
       
      Parameters:
      storeSerializationStrategy - the strategy to use for serialization; must not be null
      Returns:
      a string representation of the embedding store that can be used for persistence or transfer
      Throws:
      RuntimeException - if serialization fails due to I/O errors or strategy-specific issues
    • serializeToFile

      public void serializeToFile(StoreSerializationStrategy<Embedded> storeSerializationStrategy, Path filePath)
      Serializes the entire embedding store to a file using the specified serialization strategy and file path.

      This method is equivalent to calling serialize(StoreSerializationStrategy) and then writing the result to the specified file. The file will be created if it doesn't exist, or overwritten if it does exist.
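      The create-or-overwrite behavior matches the defaults of the JDK's Files.writeString, which opens the file with CREATE, TRUNCATE_EXISTING and WRITE when no options are given. A minimal sketch with a hypothetical SnapshotWriter helper; wrapping the IOException in UncheckedIOException (a RuntimeException) is an assumption that mirrors the documented RuntimeException, not the store's own code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class SnapshotWriter {

    // With no OpenOptions, Files.writeString uses CREATE, TRUNCATE_EXISTING and WRITE:
    // a missing file is created and an existing one is overwritten from the start.
    static void write(Path file, String serialized) {
        try {
            Files.writeString(file, serialized, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```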

      File Structure: The serialized data contains metadata about the embedding store but not the actual embedded content. The chunk files containing the embedded content remain in the original chunk storage directory and must be preserved separately.

      Backup Strategy: For a complete backup, you should:

      1. Serialize the store metadata using this method
      2. Back up the entire chunk storage directory
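      Step 2 can use any file-level backup tool. One JDK-only option is a recursive copy with Files.walk, sketched below with a hypothetical ChunkDirectoryBackup helper. Running it while no chunks are being written keeps the copy consistent with the serialized metadata.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ChunkDirectoryBackup {

    // Recursively copies every directory and file under source into target.
    // Files.walk visits a directory before its contents, so parents exist before
    // files are copied into them; existing files in target are replaced.
    static void copyTree(Path source, Path target) {
        try (Stream<Path> walk = Files.walk(source)) {
            List<Path> paths = walk.collect(Collectors.toList());
            for (Path p : paths) {
                Path destination = target.resolve(source.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(destination);
                } else {
                    Files.copy(p, destination, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```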

      Example usage:

      
       MemFileEmbeddingStore<TextSegment> store = new MemFileEmbeddingStore<>();
       JsonStoreSerializationStrategy<TextSegment> strategy = new JsonStoreSerializationStrategy<>();
       Path backupFile = Paths.get("/backup/store-metadata.json");
      
       // Serialize store metadata to file
       store.serializeToFile(strategy, backupFile);
       
      Parameters:
      storeSerializationStrategy - the strategy to use for serialization; must not be null
      filePath - the path where the serialized data should be written; must not be null
      Throws:
      RuntimeException - if the file cannot be created, written to, or if serialization fails
    • serializeToFile

      public void serializeToFile(StoreSerializationStrategy<Embedded> storeSerializationStrategy, String filePath)
      Serializes the entire embedding store to a file using the specified serialization strategy and file path string.

      This is a convenience method that converts the string path to a Path object and delegates to serializeToFile(StoreSerializationStrategy, Path).

      Path Resolution: The file path can be absolute or relative. Relative paths are resolved against the current working directory. Whether missing parent directories are created depends on the serialization strategy implementation.
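      The resolution rule can be checked with the JDK alone: on the default file system, Path.toAbsolutePath resolves a relative path against the default directory, which is normally the current working directory. Since parent-directory creation depends on the strategy, a caller can create the parents up front instead. TargetPaths is a hypothetical helper for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class TargetPaths {

    // A relative string path is resolved against the working directory;
    // an absolute one is returned unchanged.
    static Path resolve(String filePath) {
        return Paths.get(filePath).toAbsolutePath();
    }

    // Creates missing parent directories explicitly instead of relying on the
    // serialization strategy to do it, then returns the absolute target path.
    static Path prepare(String filePath) {
        Path target = resolve(filePath);
        try {
            Files.createDirectories(target.getParent());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }
}
```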

      Example usage:

      
       MemFileEmbeddingStore<TextSegment> store = new MemFileEmbeddingStore<>();
       JsonStoreSerializationStrategy<TextSegment> strategy = new JsonStoreSerializationStrategy<>();
      
       // Serialize to relative path
       store.serializeToFile(strategy, "backup/store.json");
      
       // Serialize to absolute path
       store.serializeToFile(strategy, "/home/user/backups/store.json");
       
      Parameters:
      storeSerializationStrategy - the strategy to use for serialization; must not be null
      filePath - the file path as a string where the serialized data should be written; must not be null or blank
      Throws:
      RuntimeException - if the file cannot be created, written to, or if serialization fails
      IllegalArgumentException - if the file path is invalid
    • deserialize

      public MemFileEmbeddingStore<Embedded> deserialize(StoreSerializationStrategy<Embedded> storeSerializationStrategy, String data)
      Deserializes an embedding store from its string representation using the specified serialization strategy.

      This method creates a new MemFileEmbeddingStore instance from serialized data. The deserialized store will have the same configuration (chunk storage directory, cache size) and embedding entries as the original store that was serialized.

      Important: This method only restores the store metadata and embedding vectors. The actual embedded content (e.g., TextSegment objects) will be loaded on-demand from the chunk files in the original chunk storage directory. Therefore, the chunk storage directory and its files must be accessible and unchanged for the deserialized store to function properly.

      Restoration Process:

      1. Parse the serialized string to extract store metadata
      2. Create a new store instance with the original configuration
      3. Restore all embedding entries with their IDs and chunk file references
      4. Initialize the cache (if configured) but don't preload chunk content
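      Step 4 means a restored entry holds only a reference to its chunk file until the content is requested. The sketch below models that with a function standing in for the file read and an optional cache; RestoredChunks is hypothetical, and its unbounded HashMap is a simplification of a size-bounded cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical view of a freshly restored store: only ID -> chunk file references
// are in memory and no chunk has been read yet. Not thread-safe; kept minimal.
final class RestoredChunks {

    private final Map<String, String> chunkFileById;
    private final Function<String, String> readChunkFile; // stands in for reading from disk
    private final Map<String, String> cache;              // null means "no caching"
    private int diskReads = 0;

    RestoredChunks(Map<String, String> chunkFileById,
                   Function<String, String> readChunkFile, boolean useCache) {
        this.chunkFileById = chunkFileById;
        this.readChunkFile = readChunkFile;
        // Unbounded for brevity; a real cache would be size-bounded (e.g. LRU).
        this.cache = useCache ? new HashMap<>() : null;
    }

    int diskReads() {
        return diskReads;
    }

    // Content is read on first request and, if caching is enabled, kept for reuse.
    String content(String id) {
        if (cache != null && cache.containsKey(id)) {
            return cache.get(id);
        }
        String content = readChunkFile.apply(chunkFileById.get(id));
        diskReads++;
        if (cache != null) {
            cache.put(id, content);
        }
        return content;
    }
}
```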

      Example usage:

      
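       JsonStoreSerializationStrategy<TextSegment> strategy = new JsonStoreSerializationStrategy<>();
       String serializedData = "..."; // Previously serialized store data
      
       // deserialize is an instance method, so it is called on an existing store
       MemFileEmbeddingStore<TextSegment> store = new MemFileEmbeddingStore<>();
       MemFileEmbeddingStore<TextSegment> restoredStore = store.deserialize(strategy, serializedData);
      
       // The restored store can be used normally; queryEmbedding is the embedded user query
       EmbeddingSearchRequest searchRequest = EmbeddingSearchRequest.builder()
               .queryEmbedding(queryEmbedding)
               .maxResults(5)
               .build();
       List<EmbeddingMatch<TextSegment>> matches = restoredStore.search(searchRequest).matches();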
       
      Parameters:
      storeSerializationStrategy - the strategy to use for deserialization; must not be null
      data - the serialized string representation of the embedding store; must not be null or blank
      Returns:
      a new MemFileEmbeddingStore instance restored from the serialized data
      Throws:
      RuntimeException - if deserialization fails due to invalid data format, I/O errors, or strategy-specific issues
      IllegalArgumentException - if the serialized data is malformed or incompatible
    • deserializeFromFile

      public MemFileEmbeddingStore<Embedded> deserializeFromFile(StoreSerializationStrategy<Embedded> storeSerializationStrategy, Path filePath)
      Deserializes an embedding store from a file using the specified serialization strategy and file path.

      This method reads the serialized data from the specified file and creates a new MemFileEmbeddingStore instance. It is equivalent to reading the file content and calling deserialize(StoreSerializationStrategy, String).
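      The read half of that equivalence, including the documented RuntimeException for a missing file, can be sketched with Files.readString. SnapshotReader is a hypothetical helper; the exact exception type and message the store uses are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class SnapshotReader {

    // Reads the whole snapshot file; a missing or unreadable file surfaces as
    // UncheckedIOException, which is a RuntimeException.
    static String read(Path file) {
        try {
            return Files.readString(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read " + file, e);
        }
    }
}
```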

      File Requirements: The file must contain data that was previously created by serializeToFile(StoreSerializationStrategy, Path) using a compatible serialization strategy. The file must be readable and contain valid serialized store data.

      Directory Structure: After deserialization, the restored store will reference the same chunk storage directory as specified in the serialized data. Ensure that:

      • The chunk storage directory exists and is accessible
      • All referenced chunk files are present and unchanged
      • The application has read permissions for the chunk directory and files
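      Those conditions can be checked before the restored store is used. The JDK-only sketch below, with a hypothetical ChunkDirectoryCheck helper, assumes the caller knows the chunk file names the snapshot references; it checks presence and readability only, since detecting changed content would need checksums.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class ChunkDirectoryCheck {

    // Returns a list of problems; an empty list means the directory exists and
    // every referenced chunk file is present and readable.
    static List<String> problems(Path chunkDirectory, Collection<String> chunkFiles) {
        List<String> problems = new ArrayList<>();
        if (!Files.isDirectory(chunkDirectory)) {
            problems.add("Missing chunk directory: " + chunkDirectory);
            return problems;
        }
        for (String name : chunkFiles) {
            Path file = chunkDirectory.resolve(name);
            if (!Files.isRegularFile(file)) {
                problems.add("Missing chunk file: " + file);
            } else if (!Files.isReadable(file)) {
                problems.add("Unreadable chunk file: " + file);
            }
        }
        return problems;
    }
}
```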

      Example usage:

      
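       JsonStoreSerializationStrategy<TextSegment> strategy = new JsonStoreSerializationStrategy<>();
       Path backupFile = Paths.get("/backup/store-metadata.json");
      
       // Restore the store from file; deserializeFromFile is an instance method
       MemFileEmbeddingStore<TextSegment> store = new MemFileEmbeddingStore<>();
       MemFileEmbeddingStore<TextSegment> restoredStore = store.deserializeFromFile(strategy, backupFile);
      
       // Smoke-test the restored store with a search (searchRequest built as usual).
       // The match count is capped by the request's maxResults, so it is not the number of restored embeddings.
       EmbeddingSearchResult<TextSegment> results = restoredStore.search(searchRequest);
       System.out.println("Search returned " + results.matches().size() + " matches");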
       
      Parameters:
      storeSerializationStrategy - the strategy to use for deserialization; must not be null
      filePath - the path to the file containing serialized store data; must not be null
      Returns:
      a new MemFileEmbeddingStore instance restored from the file data
      Throws:
      RuntimeException - if the file cannot be read, doesn't exist, or if deserialization fails
      IllegalArgumentException - if the file contains malformed or incompatible data
    • deserializeFromFile

      public MemFileEmbeddingStore<Embedded> deserializeFromFile(StoreSerializationStrategy<Embedded> storeSerializationStrategy, String filePath)
      Deserializes an embedding store from a file using the specified serialization strategy and file path string.

      This is a convenience method that converts the string path to a Path object and delegates to deserializeFromFile(StoreSerializationStrategy, Path).

      Path Resolution: The file path can be absolute or relative. Relative paths are resolved against the current working directory. The file must exist and be readable.
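      The string form must also be a usable path. A minimal sketch of the null/blank check, with a hypothetical StringPaths helper; Paths.get itself throws InvalidPathException, a subclass of IllegalArgumentException, for strings the file system rejects, such as one containing a NUL character.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class StringPaths {

    // Rejects null or blank input, then lets Paths.get validate the syntax.
    // Paths.get throws InvalidPathException (an IllegalArgumentException) for
    // strings the file system cannot represent.
    static Path toPath(String filePath) {
        if (filePath == null || filePath.isBlank()) {
            throw new IllegalArgumentException("filePath must not be null or blank");
        }
        return Paths.get(filePath);
    }
}
```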

      Use Cases: This method is particularly useful when:

      • Loading stores from configuration files that specify string paths
      • Integrating with systems that work with string-based file paths
      • Building command-line tools that accept file paths as arguments

      Example usage:

      
       JsonStoreSerializationStrategy<TextSegment> strategy = new JsonStoreSerializationStrategy<>();
       MemFileEmbeddingStore<TextSegment> store = new MemFileEmbeddingStore<>(); // receiver for the instance method
      
       // Load from relative path
       MemFileEmbeddingStore<TextSegment> store1 = store.deserializeFromFile(strategy, "backup/store.json");
      
       // Load from absolute path
       MemFileEmbeddingStore<TextSegment> store2 = store.deserializeFromFile(strategy, "/home/user/backups/store.json");
      
       // Load from user home directory, letting Paths.get insert the platform's separator
       String userHome = System.getProperty("user.home");
       MemFileEmbeddingStore<TextSegment> store3 = store.deserializeFromFile(strategy, Paths.get(userHome, "store.json").toString());
       
      Parameters:
      storeSerializationStrategy - the strategy to use for deserialization; must not be null
      filePath - the file path as a string containing serialized store data; must not be null or blank
      Returns:
      a new MemFileEmbeddingStore instance restored from the file data
      Throws:
      RuntimeException - if the file cannot be read, doesn't exist, or if deserialization fails
      IllegalArgumentException - if the file path is invalid or contains malformed data