public abstract class AbstractFileBasedConnector extends Object implements Connector
| Modifier and Type | Class and Description |
|---|---|
protected static interface |
AbstractFileBasedConnector.RecordReader
A reader for
Records. |
protected static interface |
AbstractFileBasedConnector.RecordWriter
A writer for
Records. |
| Constructor and Description |
|---|
AbstractFileBasedConnector() |
| Modifier and Type | Method and Description |
|---|---|
protected reactor.core.publisher.Flux<Record> |
applyPerFileLimits(reactor.core.publisher.Flux<Record> records)
Applies per-file limits to a stream of records coming from
readSingleFile(java.net.URL,
URI). |
void |
close() |
void |
configure(com.typesafe.config.Config settings,
boolean read,
boolean retainRecordSources) |
protected abstract String |
getConnectorName()
Returns the connector name, e.g.
|
protected URL |
getOrCreateDestinationURL()
Returns the URL that the connector should write to.
|
void |
init() |
protected boolean |
isDataSizeSamplingAvailable()
Checks whether it is safe to perform data size sampling on this connector's data source.
|
protected List<URL> |
loadURLs(com.typesafe.config.Config settings)
Validates and computes the list of URLs to be read or written.
|
protected abstract AbstractFileBasedConnector.RecordReader |
newSingleFileReader(URL url,
URI resource)
Returns a new
AbstractFileBasedConnector.RecordReader instance; cannot be null. |
protected abstract AbstractFileBasedConnector.RecordWriter |
newSingleFileWriter()
Returns a new
AbstractFileBasedConnector.RecordWriter instance; cannot be null. |
protected void |
processURLsForRead()
Inspects the list or URLs as loaded by
loadURLs(Config) and determines exactly what
files and folders need to be read. |
protected void |
processURLsForWrite()
Inspects the list or URLs as loaded by
loadURLs(Config) and determines if the
connector should write to a single file, or to a directory of files. |
org.reactivestreams.Publisher<Resource> |
read() |
int |
readConcurrency() |
protected reactor.core.publisher.Flux<Record> |
readSingleFile(URL url,
URI resource)
Reads a single text file accessible through the given URL.
|
protected reactor.core.publisher.Flux<URL> |
scanRootDirectory(Path root)
Scans a directory for readable files and returns the files found as a stream.
|
Function<org.reactivestreams.Publisher<Record>,org.reactivestreams.Publisher<Record>> |
write() |
int |
writeConcurrency() |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitgetRecordMetadata, supportsprotected static final String URL
protected static final String URLFILE
protected static final String COMPRESSION
protected static final String ENCODING
protected static final String FILE_NAME_PATTERN
protected static final String SKIP_RECORDS
protected static final String MAX_RECORDS
protected static final String MAX_CONCURRENT_FILES
protected static final String RECURSIVE
protected static final String FILE_NAME_FORMAT
protected boolean read
protected boolean retainRecordSources
protected Charset encoding
protected String compression
protected String fileNameFormat
protected boolean recursive
protected String pattern
protected long skipRecords
protected long maxRecords
protected int resourceCount
protected int maxConcurrentFiles
protected Deque<AbstractFileBasedConnector.RecordWriter> writers
protected AbstractFileBasedConnector.RecordWriter singleWriter
protected List<AbstractFileBasedConnector.RecordWriter> writersToClose
protected AtomicInteger fileCounter
protected AtomicInteger nextWriterIndex
public int readConcurrency()
readConcurrency in interface Connectorpublic int writeConcurrency()
writeConcurrency in interface Connectorpublic void configure(@NonNull
com.typesafe.config.Config settings,
boolean read,
boolean retainRecordSources)
public void init()
throws URISyntaxException,
IOException
init in interface ConnectorURISyntaxExceptionIOException@NonNull public org.reactivestreams.Publisher<Resource> read()
@NonNull public Function<org.reactivestreams.Publisher<Record>,org.reactivestreams.Publisher<Record>> write()
public void close()
close in interface Connectorclose in interface AutoCloseable@NonNull protected abstract String getConnectorName()
connector.foo.@NonNull protected reactor.core.publisher.Flux<Record> readSingleFile(@NonNull URL url, @NonNull URI resource)
Implementors should not care about maxRecords and skipRecords, these will be
applied later on, see applyPerFileLimits(Flux).
url - The URL to read; must not be null; must be accessible and readable (but not
necessarily hosted on the local filesystem).resource - The resource URI.Records; never null but may be empty.@NonNull protected abstract AbstractFileBasedConnector.RecordReader newSingleFileReader(@NonNull URL url, URI resource) throws IOException
AbstractFileBasedConnector.RecordReader instance; cannot be null. Only used when reading. Each
invocation of this method is expected to return a newly-allocated instance. The reader is
expected to be initialized already, and ready to emit its first record. It is possible to throw
IOException if the reader cannot be initialized.IOException@NonNull protected abstract AbstractFileBasedConnector.RecordWriter newSingleFileWriter()
AbstractFileBasedConnector.RecordWriter instance; cannot be null. Only used when writing. Each
invocation of this method is expected to return a newly-allocated instance.@NonNull protected List<URL> loadURLs(@NonNull com.typesafe.config.Config settings)
URLFILE and URL configuration options.
Expected to be called during the configuration phase.
protected void processURLsForRead()
throws URISyntaxException,
IOException
loadURLs(Config) and determines exactly what
files and folders need to be read.
Should be called at the beginning of the initialization process, but only when reading, never when writing.
This method expects that loadURLs(Config) has been previously called.
URISyntaxExceptionIOExceptionprotected void processURLsForWrite()
throws URISyntaxException,
IOException
loadURLs(Config) and determines if the
connector should write to a single file, or to a directory of files.
Should be called at the beginning of the initialization process, but only when writing, never when reading.
This method expects that loadURLs(Config) has been previously called, and also
expects exactly one URL to be present, which can be either a directory or a file.
URISyntaxExceptionIOException@NonNull protected reactor.core.publisher.Flux<URL> scanRootDirectory(@NonNull Path root)
@NonNull protected reactor.core.publisher.Flux<Record> applyPerFileLimits(@NonNull reactor.core.publisher.Flux<Record> records)
readSingleFile(java.net.URL,
URI).@NonNull protected URL getOrCreateDestinationURL()
This can be either a single file or a directory of files. If the former, each invocation of this method will return the same URL; if the latter, each invocation of this method will generate a new URL inside the directory, with a unique file name.
protected boolean isDataSizeSamplingAvailable()
This implementation simply checks that none of the urls to read is reading from standard input, since standard input is not rewindable. Any other URL protocol is considered safe.
Copyright © 2017–2023 DataStax. All rights reserved.