Class GitHubDocumentLoader
java.lang.Object
dev.langchain4j.data.document.loader.github.GitHubDocumentLoader
-
Nested Class Summary
Nested Classes -
Constructor Summary
Constructors
- GitHubDocumentLoader(String gitHubToken, String gitHubTokenOrganization)
- GitHubDocumentLoader(String apiUrl, String gitHubToken, String gitHubTokenOrganization)
- GitHubDocumentLoader(org.kohsuke.github.GitHub gitHub)
Method Summary
Modifier and Type, Method, Description
- static GitHubDocumentLoader.Builder builder()
- dev.langchain4j.data.document.Document loadDocument(String owner, String repo, String ref, String path, dev.langchain4j.data.document.DocumentParser parser)
  Loads a document from a specific file in a GitHub repository using the provided reference (commit ID, branch name, or tag).
- List<dev.langchain4j.data.document.Document> loadDocuments(String owner, String repo, String branch, dev.langchain4j.data.document.DocumentParser parser)
- List<dev.langchain4j.data.document.Document> loadDocuments(String owner, String repo, String branch, String path, dev.langchain4j.data.document.DocumentParser parser)
  Loads and parses multiple documents from a directory in a GitHub repository at a specific branch.
-
Constructor Details
-
GitHubDocumentLoader
-
GitHubDocumentLoader
-
GitHubDocumentLoader
public GitHubDocumentLoader() -
GitHubDocumentLoader
public GitHubDocumentLoader(org.kohsuke.github.GitHub gitHub)
-
-
Method Details
-
loadDocument
public dev.langchain4j.data.document.Document loadDocument(String owner, String repo, String ref, String path, dev.langchain4j.data.document.DocumentParser parser)
Loads a document from a specific file in a GitHub repository using the provided reference (commit ID, branch name, or tag).
This method retrieves the contents of a file from a GitHub repository at a specific version (ref), parses it using the provided DocumentParser, and returns the resulting Document object.
Parameters:
- owner - The GitHub username or organization name that owns the repository. Must not be blank.
- repo - The name of the GitHub repository. Must not be blank.
- ref - The Git reference, which can be one of the following:
  - A branch name (e.g., main, develop)
  - A tag name (e.g., v1.0.0)
  - A commit SHA (e.g., a3c6e1b...)
  If null or blank, GitHub will use the repository’s default branch (usually main or master).
- path - The relative file path within the repository to the content to be loaded (e.g., docs/README.md).
- parser - An implementation of DocumentParser used to parse the retrieved file content into a Document object.
Returns:
A Document parsed from the contents of the file at the specified location and ref in the GitHub repository.
Throws:
- IllegalArgumentException if the owner or repo is blank or null.
- RuntimeException if the GitHub API call fails or the content cannot be retrieved (wraps IOException).
Usage Example:
Document doc = loader.loadDocument("langchain4j", "langchain4j", "main", "pom.xml", new TextDocumentParser());
Parameters:
- owner - the GitHub repository owner (user or organization)
- repo - the name of the GitHub repository
- ref - the commit SHA, branch name, or tag name. If null, the repository’s default branch is used
- path - the relative path to the file in the repository
- parser - the parser used to convert the GitHub content into a Document
Returns:
- the parsed Document object representing the content of the file
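The argument contract described for loadDocument (a blank owner or repo is rejected, and a null or blank ref selects the default branch) can be illustrated with a small stand-alone sketch. It does not call langchain4j or the GitHub API. The class LoadDocumentContractSketch and its helpers requireNotBlank and describeRef are hypothetical names introduced only for illustration, and in the real loader the default-branch fallback is handled by GitHub, not by client code.

```java
// Hypothetical stand-alone sketch; not part of langchain4j.
final class LoadDocumentContractSketch {

    // Mirrors the documented rule: owner and repo must not be null or blank.
    static String requireNotBlank(String value, String name) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException(name + " must not be blank");
        }
        return value;
    }

    // Describes which version of the file a given ref selects, per the documentation.
    static String describeRef(String ref) {
        if (ref == null || ref.trim().isEmpty()) {
            return "repository default branch";
        }
        return "ref '" + ref + "'";
    }

    public static void main(String[] args) {
        requireNotBlank("langchain4j", "owner");
        System.out.println(describeRef("v1.0.0")); // a tag name
        System.out.println(describeRef(null));     // falls back to the default branch
        try {
            requireNotBlank("  ", "repo");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Passing an explicit tag or commit SHA as ref is the usual way to make a load reproducible, since a branch name can point at different content over time.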
-
loadDocuments
public List<dev.langchain4j.data.document.Document> loadDocuments(String owner, String repo, String branch, String path, dev.langchain4j.data.document.DocumentParser parser)
Loads and parses multiple documents from a directory in a GitHub repository at a specific branch.
This method recursively scans the specified directory in a GitHub repository at a given branch, retrieves all files contained within (including nested directories), parses each file using the provided DocumentParser, and returns a list of Document objects.
Parameters:
- owner - The GitHub username or organization name that owns the repository. Must not be blank.
- repo - The name of the GitHub repository. Must not be blank.
- branch - The name of the Git branch from which to read the directory contents (e.g., main, develop).
- path - The relative path to the directory within the repository to scan (e.g., docs/ or src/resources/).
- parser - An implementation of DocumentParser used to convert file contents into Document objects.
Returns:
A list of Document objects parsed from the files found in the specified directory and its subdirectories.
Throws:
- IllegalArgumentException if owner or repo is blank or null.
- RuntimeException if an IOException occurs while accessing the GitHub repository content.
Usage Example:
List<Document> docs = loader.loadDocuments("langchain4j", "langchain4j", "main", "docs/", new MarkdownParser());
Parameters:
- owner - the GitHub repository owner (user or organization)
- repo - the name of the GitHub repository
- branch - the name of the Git branch to fetch the directory contents from
- path - the relative path to the directory in the repository
- parser - the parser used to convert each file into a Document
Returns:
- a list of parsed Document objects from the specified directory
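The recursive scan described for loadDocuments (every file in the directory, including files in nested directories, ends up in one flat list) can be pictured with the following stand-alone sketch. It models a repository tree in memory instead of calling GitHub. RecursiveScanSketch, Node, and collectFiles are hypothetical names used only for illustration and are not part of langchain4j, and the traversal order of the real loader is not specified by this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of a repository tree; not part of langchain4j.
final class RecursiveScanSketch {

    // A node is either a file (no children) or a directory (zero or more children).
    static final class Node {
        final String path;
        final boolean directory;
        final List<Node> children;

        private Node(String path, boolean directory, List<Node> children) {
            this.path = path;
            this.directory = directory;
            this.children = children;
        }

        static Node file(String path) {
            return new Node(path, false, new ArrayList<Node>());
        }

        static Node dir(String path, Node... children) {
            return new Node(path, true, Arrays.asList(children));
        }
    }

    // Depth-first walk that collects the path of every file under the given node.
    static List<String> collectFiles(Node node) {
        List<String> files = new ArrayList<>();
        collect(node, files);
        return files;
    }

    private static void collect(Node node, List<String> out) {
        if (!node.directory) {
            out.add(node.path); // a file: this is where parsing would happen
            return;
        }
        for (Node child : node.children) {
            collect(child, out); // a directory: descend into it
        }
    }

    public static void main(String[] args) {
        Node docs = Node.dir("docs",
                Node.file("docs/intro.md"),
                Node.dir("docs/tutorials", Node.file("docs/tutorials/rag.md")));
        // prints [docs/intro.md, docs/tutorials/rag.md]
        System.out.println(collectFiles(docs));
    }
}
```

Because nested directories are included, pointing path at a large directory can yield many documents, so a narrower path is often preferable when only a subset is needed.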
-
loadDocuments
-
builder
-