Class MemFileEmbeddingStore<Embedded>
- Type Parameters:
Embedded - The type of the embedded object associated with an embedding. Commonly TextSegment.
- All Implemented Interfaces:
dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
EmbeddingStore implementation that stores embeddings in memory while persisting associated embedded content (e.g., TextSegment) as files in a local directory.
Design highlights:
- Embeddings are kept fully in memory for fast similarity search.
- Embedded content is serialized and stored as separate files on disk, reducing memory footprint for large content.
- Optional LRU cache keeps frequently accessed embedded content in memory.
- Supports adding, removing, and searching embeddings with optional metadata filtering.
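The first design point, keeping all vectors in memory and scanning them at query time, can be sketched roughly as follows. This is a minimal illustration and not the store's actual code: the class name InMemoryVectorIndexSketch, its Match type, and the choice of cosine similarity as the scoring function are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an in-memory vector index (not the library's implementation).
class InMemoryVectorIndexSketch {

    // One search hit: the embedding ID and its similarity score.
    static final class Match {
        final String id;
        final double score;

        Match(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    private final Map<String, float[]> vectors = new ConcurrentHashMap<>();

    void add(String id, float[] vector) {
        vectors.put(id, vector.clone());
    }

    // Cosine similarity; assumed here as the scoring function.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Brute-force scan over all vectors, best matches first.
    List<Match> search(float[] query, int maxResults) {
        List<Match> matches = new ArrayList<>();
        for (Map.Entry<String, float[]> entry : vectors.entrySet()) {
            matches.add(new Match(entry.getKey(), cosine(query, entry.getValue())));
        }
        matches.sort(Comparator.comparingDouble((Match m) -> m.score).reversed());
        return new ArrayList<>(matches.subList(0, Math.min(maxResults, matches.size())));
    }

    public static void main(String[] args) {
        InMemoryVectorIndexSketch index = new InMemoryVectorIndexSketch();
        index.add("a", new float[] {1f, 0f});
        index.add("b", new float[] {0f, 1f});
        for (Match match : index.search(new float[] {1f, 0f}, 2)) {
            System.out.println(match.id + " " + match.score);
        }
    }
}
```

Because only the vectors are scanned, the (potentially large) embedded content never has to be in memory during a search.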
Persistence of embedded content:
- When an embedding is added with associated content, the content is saved to the configured chunkStorageDirectory.
- Content is reloaded from disk on demand and optionally cached for reuse.
- Chunk files are named after the embedding ID.
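The persistence scheme described above (one file per embedding ID, read back on demand) might look roughly like the following. This is a sketch under stated assumptions: the class ChunkFileStoreSketch, the ".txt" suffix, and plain UTF-8 text as the serialized form are hypothetical and chosen only for illustration; the real store's file format is not documented here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch only; the naming scheme and file format are assumptions.
class ChunkFileStoreSketch {
    private final Path directory;

    ChunkFileStoreSketch(Path directory) {
        try {
            this.directory = Files.createDirectories(directory);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mirrors the documented default of using a temporary directory.
    static ChunkFileStoreSketch inTempDirectory() {
        try {
            return new ChunkFileStoreSketch(Files.createTempDirectory("chunks"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical naming scheme: "<embedding id>.txt".
    Path fileFor(String id) {
        return directory.resolve(id + ".txt");
    }

    void save(String id, String content) {
        try {
            Files.write(fileFor(id), content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Content is read from disk only when it is actually requested.
    String load(String id) {
        try {
            return new String(Files.readAllBytes(fileFor(id)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean delete(String id) {
        try {
            return Files.deleteIfExists(fileFor(id));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ChunkFileStoreSketch chunks = ChunkFileStoreSketch.inTempDirectory();
        chunks.save("id-1", "hello chunk");
        System.out.println(chunks.fileFor("id-1") + " -> " + chunks.load("id-1"));
    }
}
```

A scheme like this assumes IDs are safe to use as file names; IDs containing path separators would need escaping first.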
Thread safety:
- The store uses concurrent collections and is safe for concurrent reads/writes.
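As a rough illustration of why concurrent collections make parallel writes safe, the sketch below has several threads adding entries to a ConcurrentHashMap at once. The class ConcurrentAddSketch and its method are hypothetical; which concurrent collections the store actually uses is not stated here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demonstration; not taken from the store's source.
class ConcurrentAddSketch {

    // Adds threads * perThread distinct entries in parallel and returns the final size.
    static int addConcurrently(int threads, int perThread) {
        Map<String, float[]> entries = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int worker = t;
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    entries.put(worker + "-" + i, new float[] {worker, i});
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return entries.size();
    }

    public static void main(String[] args) {
        System.out.println(addConcurrently(4, 250));
    }
}
```

With a plain HashMap the same code could lose entries or corrupt the map, which is the failure mode concurrent collections avoid.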
-
Nested Class Summary
Nested Classes
Modifier and Type  Class  Description
static class
static class
-
Constructor Summary
Constructors
Constructor  Description
MemFileEmbeddingStore() - Creates a new MemFileEmbeddingStore with default settings.
MemFileEmbeddingStore(Path chunkStorageDirectory) - Creates a new MemFileEmbeddingStore with specified chunk storage directory.
MemFileEmbeddingStore(Path chunkStorageDirectory, int cacheSize) - Creates a new MemFileEmbeddingStore with specified chunk storage directory and cache size.
MemFileEmbeddingStore(Collection<MemFileEmbeddingStore.Entry<Embedded>> entries, Path chunkStorageDirectory, int cacheSize)
-
Method Summary
Modifier and Type  Method  Description
add(dev.langchain4j.data.embedding.Embedding embedding)
void
void
void addAll(List<String> ids, List<dev.langchain4j.data.embedding.Embedding> embeddings, List<Embedded> embedded)
deserialize(StoreSerializationStrategy<Embedded> storeSerializationStrategy, String data) - Deserializes an embedding store from its string representation using the specified serialization strategy.
deserializeFromFile(StoreSerializationStrategy<Embedded> storeSerializationStrategy, String filePath) - Deserializes an embedding store from a file using the specified serialization strategy and file path string.
deserializeFromFile(StoreSerializationStrategy<Embedded> storeSerializationStrategy, Path filePath) - Deserializes an embedding store from a file using the specified serialization strategy and file path.
void
void removeAll(dev.langchain4j.store.embedding.filter.Filter filter)
void removeAll(Collection<String> ids)
dev.langchain4j.store.embedding.EmbeddingSearchResult<Embedded> search(dev.langchain4j.store.embedding.EmbeddingSearchRequest embeddingSearchRequest)
serialize(StoreSerializationStrategy<Embedded> storeSerializationStrategy) - Serializes the entire embedding store to a string representation using the specified serialization strategy.
void serializeToFile(StoreSerializationStrategy<Embedded> storeSerializationStrategy, String filePath) - Serializes the entire embedding store to a file using the specified serialization strategy and file path string.
void serializeToFile(StoreSerializationStrategy<Embedded> storeSerializationStrategy, Path filePath) - Serializes the entire embedding store to a file using the specified serialization strategy and file path.
withChunkStorageDirectory(Path chunkStorageDirectory) - Configures the chunk storage base directory.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface dev.langchain4j.store.embedding.EmbeddingStore
addAll, generateIds, remove
-
Constructor Details
-
MemFileEmbeddingStore
public MemFileEmbeddingStore()
Creates a new MemFileEmbeddingStore with default settings. Uses a temporary directory for chunk storage and no caching.
-
MemFileEmbeddingStore
public MemFileEmbeddingStore(Path chunkStorageDirectory)
Creates a new MemFileEmbeddingStore with specified chunk storage directory.
- Parameters:
chunkStorageDirectory - Directory where embedded content will be stored as files
-
MemFileEmbeddingStore
public MemFileEmbeddingStore(Path chunkStorageDirectory, int cacheSize)
Creates a new MemFileEmbeddingStore with specified chunk storage directory and cache size.
- Parameters:
chunkStorageDirectory - Directory where embedded content will be stored as files
cacheSize - Size of LRU cache for recently loaded chunks (0 = no caching)
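To make the cacheSize semantics concrete, here is a minimal LRU cache sketch built on java.util.LinkedHashMap in access order, where a capacity of 0 disables caching entirely. The class LruCacheSketch is hypothetical; the store's real cache implementation may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache; shown only to illustrate the cacheSize parameter.
class LruCacheSketch<K, V> {
    private final int capacity;
    private final Map<K, V> map;

    LruCacheSketch(int capacity) {
        if (capacity < 0) {
            throw new IllegalArgumentException("capacity must not be negative");
        }
        this.capacity = capacity;
        // accessOrder = true moves an entry to the end whenever it is read or written.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > LruCacheSketch.this.capacity;
            }
        };
    }

    synchronized void put(K key, V value) {
        if (capacity > 0) { // capacity 0 means "no caching"
            map.put(key, value);
        }
    }

    synchronized V get(K key) {
        return map.get(key);
    }

    synchronized int size() {
        return map.size();
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2);
        cache.put("a", "chunk a");
        cache.put("b", "chunk b");
        cache.get("a");            // "a" becomes the most recently used entry
        cache.put("c", "chunk c"); // evicts "b", the least recently used entry
        System.out.println(cache.get("b") == null);
    }
}
```

On a cache miss the caller would fall back to reading the chunk file from disk and then put the result into the cache.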
-
MemFileEmbeddingStore
public MemFileEmbeddingStore(Collection<MemFileEmbeddingStore.Entry<Embedded>> entries, Path chunkStorageDirectory, int cacheSize)
-
-
Method Details
-
add
- Specified by:
add in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
-
add
- Specified by:
add in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
-
add
- Specified by:
add in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
-
add
-
addAll
- Specified by:
addAll in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
-
addAll
public void addAll(List<String> ids, List<dev.langchain4j.data.embedding.Embedding> embeddings, List<Embedded> embedded)
- Specified by:
addAll in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
-
removeAll
- Specified by:
removeAll in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
-
removeAll
public void removeAll(dev.langchain4j.store.embedding.filter.Filter filter)
- Specified by:
removeAll in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
-
removeAll
public void removeAll()
- Specified by:
removeAll in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
-
search
public dev.langchain4j.store.embedding.EmbeddingSearchResult<Embedded> search(dev.langchain4j.store.embedding.EmbeddingSearchRequest embeddingSearchRequest)
- Specified by:
search in interface dev.langchain4j.store.embedding.EmbeddingStore<Embedded>
-
withChunkStorageDirectory
Configures the chunk storage base directory.
- Parameters:
chunkStorageDirectory - New directory for storing chunk files
- Returns:
- A new MemFileEmbeddingStore instance with the specified directory
-
memFileStoreData
-
serialize
Serializes the entire embedding store to a string representation using the specified serialization strategy.
This method captures the complete state of the embedding store, including:
- All embeddings and their associated IDs
- References to embedded content files (chunk file paths)
- Chunk storage directory configuration
- Cache size configuration
Note: The actual embedded content (e.g., TextSegment objects) stored in chunk files is not included in the serialized string. Only references to these files are serialized. To fully restore the store, both the serialized string and the original chunk files from the storage directory are required.
Thread Safety: This method is thread-safe and can be called concurrently with other operations on the store.
Example usage:
    MemFileEmbeddingStore<TextSegment> store = new MemFileEmbeddingStore<>();
    JsonStoreSerializationStrategy<TextSegment> strategy = new JsonStoreSerializationStrategy<>();
    // Add some embeddings
    store.add(embedding, textSegment);
    // Serialize to string
    String serializedStore = store.serialize(strategy);
- Parameters:
storeSerializationStrategy - the strategy to use for serialization; must not be null
- Returns:
- a string representation of the embedding store that can be used for persistence or transfer
- Throws:
RuntimeException - if serialization fails due to I/O errors or strategy-specific issues
-
serializeToFile
public void serializeToFile(StoreSerializationStrategy<Embedded> storeSerializationStrategy, Path filePath)
Serializes the entire embedding store to a file using the specified serialization strategy and file path.
This method is equivalent to calling serialize(StoreSerializationStrategy) and then writing the result to the specified file. The file will be created if it doesn't exist, or overwritten if it does exist.
File Structure: The serialized data contains metadata about the embedding store but not the actual embedded content. The chunk files containing the embedded content remain in the original chunk storage directory and must be preserved separately.
Backup Strategy: For a complete backup, you should:
- Serialize the store metadata using this method
- Back up the entire chunk storage directory
Example usage:
    MemFileEmbeddingStore<TextSegment> store = new MemFileEmbeddingStore<>();
    JsonStoreSerializationStrategy<TextSegment> strategy = new JsonStoreSerializationStrategy<>();
    Path backupFile = Paths.get("/backup/store-metadata.json");
    // Serialize store metadata to file
    store.serializeToFile(strategy, backupFile);
- Parameters:
storeSerializationStrategy - the strategy to use for serialization; must not be null
filePath - the path where the serialized data should be written; must not be null
- Throws:
RuntimeException - if the file cannot be created, written to, or if serialization fails
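The second step of the backup strategy, copying the chunk storage directory, is outside what serializeToFile does. A minimal sketch using only java.nio.file is shown below; the class ChunkDirectoryBackupSketch and its method are hypothetical helpers, and the sketch assumes the source path is a directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper for backing up a chunk directory; not part of the store's API.
class ChunkDirectoryBackupSketch {

    // Recursively copies every file under source into target, overwriting existing copies.
    static void copyDirectory(Path source, Path target) {
        try (Stream<Path> paths = Files.walk(source)) {
            paths.forEach(path -> {
                Path destination = target.resolve(source.relativize(path).toString());
                try {
                    if (Files.isDirectory(path)) {
                        Files.createDirectories(destination);
                    } else {
                        Path parent = destination.getParent();
                        if (parent != null) {
                            Files.createDirectories(parent);
                        }
                        Files.copy(path, destination, StandardCopyOption.REPLACE_EXISTING);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempDirectory("chunks");
        Path target = Files.createTempDirectory("backup").resolve("chunks");
        Files.write(source.resolve("id-1.txt"), "hello".getBytes());
        copyDirectory(source, target);
        System.out.println(Files.exists(target.resolve("id-1.txt")));
    }
}
```

Taking the metadata file and the directory copy at a moment when no writes are in flight keeps the two halves of the backup consistent with each other.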
-
serializeToFile
public void serializeToFile(StoreSerializationStrategy<Embedded> storeSerializationStrategy, String filePath)
Serializes the entire embedding store to a file using the specified serialization strategy and file path string.
This is a convenience method that converts the string path to a Path object and delegates to serializeToFile(StoreSerializationStrategy, Path).
Path Resolution: The file path can be absolute or relative. Relative paths are resolved against the current working directory. The parent directories will be created if they don't exist (depending on the serialization strategy implementation).
Example usage:
    MemFileEmbeddingStore<TextSegment> store = new MemFileEmbeddingStore<>();
    JsonStoreSerializationStrategy<TextSegment> strategy = new JsonStoreSerializationStrategy<>();
    // Serialize to relative path
    store.serializeToFile(strategy, "backup/store.json");
    // Serialize to absolute path
    store.serializeToFile(strategy, "/home/user/backups/store.json");
- Parameters:
storeSerializationStrategy - the strategy to use for serialization; must not be null
filePath - the file path as a string where the serialized data should be written; must not be null or blank
- Throws:
RuntimeException - if the file cannot be created, written to, or if serialization fails
IllegalArgumentException - if the file path is invalid
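Since parent-directory creation is described as depending on the serialization strategy, a caller who wants predictable behavior could resolve the path and create the parent directories before calling serializeToFile. The sketch below shows one way to do that with the standard library; PathResolutionSketch and its methods are hypothetical helpers, not part of this class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper illustrating string-path handling; not part of the store's API.
class PathResolutionSketch {

    // Rejects null or blank input and resolves relative paths against the working directory.
    static Path toAbsolutePath(String filePath) {
        if (filePath == null || filePath.trim().isEmpty()) {
            throw new IllegalArgumentException("filePath must not be null or blank");
        }
        return Paths.get(filePath).toAbsolutePath().normalize();
    }

    // Creates any missing parent directories of the given file path.
    static void createParentDirectories(Path file) {
        Path parent = file.toAbsolutePath().getParent();
        if (parent == null) {
            return; // a filesystem root has no parent to create
        }
        try {
            Files.createDirectories(parent);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toAbsolutePath("backup/store.json"));
    }
}
```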
-
deserialize
public MemFileEmbeddingStore<Embedded> deserialize(StoreSerializationStrategy<Embedded> storeSerializationStrategy, String data)
Deserializes an embedding store from its string representation using the specified serialization strategy.
This method creates a new MemFileEmbeddingStore instance from serialized data. The deserialized store will have the same configuration (chunk storage directory, cache size) and embedding entries as the original store that was serialized.
Important: This method only restores the store metadata and embedding vectors. The actual embedded content (e.g., TextSegment objects) will be loaded on-demand from the chunk files in the original chunk storage directory. Therefore, the chunk storage directory and its files must be accessible and unchanged for the deserialized store to function properly.
Restoration Process:
- Parse the serialized string to extract store metadata
- Create a new store instance with the original configuration
- Restore all embedding entries with their IDs and chunk file references
- Initialize the cache (if configured) but don't preload chunk content
Example usage:
    JsonStoreSerializationStrategy<TextSegment> strategy = new JsonStoreSerializationStrategy<>();
    String serializedData = "..."; // Previously serialized store data
    // deserialize is an instance method, so an existing store is needed to call it
    MemFileEmbeddingStore<TextSegment> store = new MemFileEmbeddingStore<>();
    // Restore the store from serialized data
    MemFileEmbeddingStore<TextSegment> restoredStore = store.deserialize(strategy, serializedData);
    // The restored store can be used normally; search returns an EmbeddingSearchResult
    List<EmbeddingMatch<TextSegment>> results = restoredStore.search(searchRequest).matches();
- Parameters:
storeSerializationStrategy - the strategy to use for deserialization; must not be null
data - the serialized string representation of the embedding store; must not be null or blank
- Returns:
- a new MemFileEmbeddingStore instance restored from the serialized data
- Throws:
RuntimeException - if deserialization fails due to invalid data format, I/O errors, or strategy-specific issues
IllegalArgumentException - if the serialized data is malformed or incompatible
-
deserializeFromFile
public MemFileEmbeddingStore<Embedded> deserializeFromFile(StoreSerializationStrategy<Embedded> storeSerializationStrategy, Path filePath)
Deserializes an embedding store from a file using the specified serialization strategy and file path.
This method reads the serialized data from the specified file and creates a new MemFileEmbeddingStore instance. It is equivalent to reading the file content and calling deserialize(StoreSerializationStrategy, String).
File Requirements: The file must contain data that was previously created by serializeToFile(StoreSerializationStrategy, Path) using a compatible serialization strategy. The file must be readable and contain valid serialized store data.
Directory Structure: After deserialization, the restored store will reference the same chunk storage directory as specified in the serialized data. Ensure that:
- The chunk storage directory exists and is accessible
- All referenced chunk files are present and unchanged
- The application has read permissions for the chunk directory and files
Example usage:
    JsonStoreSerializationStrategy<TextSegment> strategy = new JsonStoreSerializationStrategy<>();
    Path backupFile = Paths.get("/backup/store-metadata.json");
    // deserializeFromFile is an instance method, so an existing store is needed to call it
    MemFileEmbeddingStore<TextSegment> store = new MemFileEmbeddingStore<>();
    // Restore the store from file
    MemFileEmbeddingStore<TextSegment> restoredStore = store.deserializeFromFile(strategy, backupFile);
    // Verify the restoration was successful
    EmbeddingSearchResult<TextSegment> results = restoredStore.search(searchRequest);
    System.out.println("Search on restored store returned " + results.matches().size() + " matches");
- Parameters:
storeSerializationStrategy - the strategy to use for deserialization; must not be null
filePath - the path to the file containing serialized store data; must not be null
- Returns:
- a new MemFileEmbeddingStore instance restored from the file data
- Throws:
RuntimeException - if the file cannot be read, doesn't exist, or if deserialization fails
IllegalArgumentException - if the file contains malformed or incompatible data
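The directory requirements listed for this method can be checked up front, before deserializing, with a small pre-flight helper such as the one sketched below. ChunkDirectoryCheckSketch is hypothetical, and it assumes the caller already knows the expected chunk file names (for example from its own records); how the store names chunk files internally beyond "after the embedding ID" is not specified here.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical pre-flight check; not part of the store's API.
class ChunkDirectoryCheckSketch {

    // Returns a human-readable list of problems; an empty list means all checks passed.
    static List<String> findProblems(Path chunkDirectory, Collection<String> chunkFileNames) {
        List<String> problems = new ArrayList<>();
        if (!Files.isDirectory(chunkDirectory)) {
            problems.add("chunk directory does not exist: " + chunkDirectory);
            return problems;
        }
        if (!Files.isReadable(chunkDirectory)) {
            problems.add("chunk directory is not readable: " + chunkDirectory);
        }
        for (String name : chunkFileNames) {
            Path file = chunkDirectory.resolve(name);
            if (!Files.isRegularFile(file)) {
                problems.add("missing chunk file: " + name);
            } else if (!Files.isReadable(file)) {
                problems.add("unreadable chunk file: " + name);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Path directory = Path.of(System.getProperty("java.io.tmpdir"));
        System.out.println(findProblems(directory, Arrays.asList("id-1.txt")));
    }
}
```

A check like this cannot detect that a chunk file was modified after serialization; that would need checksums recorded at backup time.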
-
deserializeFromFile
public MemFileEmbeddingStore<Embedded> deserializeFromFile(StoreSerializationStrategy<Embedded> storeSerializationStrategy, String filePath)
Deserializes an embedding store from a file using the specified serialization strategy and file path string.
This is a convenience method that converts the string path to a Path object and delegates to deserializeFromFile(StoreSerializationStrategy, Path).
Path Resolution: The file path can be absolute or relative. Relative paths are resolved against the current working directory. The file must exist and be readable.
Use Cases: This method is particularly useful when:
- Loading stores from configuration files that specify string paths
- Integrating with systems that work with string-based file paths
- Building command-line tools that accept file paths as arguments
Example usage:
    JsonStoreSerializationStrategy<TextSegment> strategy = new JsonStoreSerializationStrategy<>();
    // deserializeFromFile is an instance method, so an existing store is needed to call it
    MemFileEmbeddingStore<TextSegment> store = new MemFileEmbeddingStore<>();
    // Load from relative path
    MemFileEmbeddingStore<TextSegment> store1 = store.deserializeFromFile(strategy, "backup/store.json");
    // Load from absolute path
    MemFileEmbeddingStore<TextSegment> store2 = store.deserializeFromFile(strategy, "/home/user/backups/store.json");
    // Load from user home directory
    String userHome = System.getProperty("user.home");
    MemFileEmbeddingStore<TextSegment> store3 = store.deserializeFromFile(strategy, userHome + "/store.json");
- Parameters:
storeSerializationStrategy - the strategy to use for deserialization; must not be null
filePath - the file path as a string containing serialized store data; must not be null or blank
- Returns:
- a new MemFileEmbeddingStore instance restored from the file data
- Throws:
RuntimeException - if the file cannot be read, doesn't exist, or if deserialization fails
IllegalArgumentException - if the file path is invalid or contains malformed data
-