Class HBaseIO


  • @Experimental(SOURCE_SINK)
    public class HBaseIO
    extends java.lang.Object
    A bounded source and sink for HBase.

    For more information, see the online documentation at HBase.

    Reading from HBase

    The HBase source reads a set of rows from a single table and returns them as a PCollection<Result>.

    To configure an HBase source, you must supply a table id and a Configuration to identify the HBase instance. By default, HBaseIO.Read will read all rows in the table. The row range to be read can optionally be restricted with a Scan object using HBaseIO.Read.withScan(org.apache.hadoop.hbase.client.Scan) or with a key range using HBaseIO.Read.withKeyRange(org.apache.beam.sdk.io.range.ByteKeyRange), and rows can be filtered using HBaseIO.Read.withFilter(org.apache.hadoop.hbase.filter.Filter), for example:

    
     // Scan the entire table.
     p.apply("read",
         HBaseIO.read()
             .withConfiguration(configuration)
             .withTableId("table"));
    
     // Restrict the data read using an HBase Scan.
     Scan scan = ...;
     p.apply("read",
         HBaseIO.read()
             .withConfiguration(configuration)
             .withTableId("table")
             .withScan(scan));
    
     // Scan a prefix of the table.
     ByteKeyRange keyRange = ...;
     p.apply("read",
         HBaseIO.read()
             .withConfiguration(configuration)
             .withTableId("table")
             .withKeyRange(keyRange));
    
     // Scan a subset of rows that match the specified row filter.
     Filter filter = ...;
     p.apply("filtered read",
         HBaseIO.read()
             .withConfiguration(configuration)
             .withTableId("table")
             .withFilter(filter));
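
    The prefix example above leaves the construction of the key range elided. As a minimal sketch of one common way to derive the bounds for a prefix scan, assuming row keys sort as unsigned bytes in lexicographic order: the start key is the prefix itself, and the exclusive end key is the prefix with its last non-0xFF byte incremented. The class PrefixBounds and its methods are hypothetical helpers written for illustration; they are not part of HBaseIO.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of HBaseIO): derives [start, end) row-key bounds for a prefix. */
final class PrefixBounds {
  private PrefixBounds() {}

  /** The inclusive start key of a prefix scan is the prefix itself. */
  static byte[] startKey(byte[] prefix) {
    return prefix.clone();
  }

  /**
   * Returns the smallest key that sorts after every key beginning with the prefix.
   * An empty array is returned when no such key exists, meaning "no upper bound".
   */
  static byte[] endKey(byte[] prefix) {
    // Trailing 0xFF bytes cannot be incremented without overflow, so drop them.
    int len = prefix.length;
    while (len > 0 && prefix[len - 1] == (byte) 0xFF) {
      len--;
    }
    if (len == 0) {
      return new byte[0];
    }
    byte[] end = Arrays.copyOf(prefix, len);
    end[len - 1]++; // unsigned increment of the last remaining byte
    return end;
  }
}
```

    The two arrays could then be wrapped with ByteKey.copyFrom(...) and passed to ByteKeyRange.of(start, end) before calling withKeyRange; an empty end key is treated there as an unbounded upper end.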
     

    readAll() executes multiple Scans against multiple tables. The queries are supplied as an initial PCollection of HBaseIO.Read objects, which enables advanced compositional patterns such as reading from a source and then creating new HBase scans based on that data.

    Note: HBaseIO.ReadAll only works with runners that support Splittable DoFn.

    
     PCollection<Read> queries = ...;
     queries.apply("readAll", HBaseIO.readAll().withConfiguration(configuration));
     

    Writing to HBase

    The HBase sink executes a set of row mutations on a single table. It takes as input a PCollection<Mutation>, where each Mutation represents an idempotent transformation on a row.
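
    To illustrate why idempotence matters here, the following is a toy sketch under stated assumptions: a runner may retry work, so the same operation can be applied to a row more than once. The table is modeled as a plain in-memory map, and the class IdempotenceDemo with its methods is hypothetical; none of it is HBase or HBaseIO API.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory model of a table (row key -> value); hypothetical, not HBase API. */
final class IdempotenceDemo {
  private IdempotenceDemo() {}

  /** Idempotent: overwriting a row with the same value any number of times gives the same row. */
  static void set(Map<String, Long> table, String row, long value) {
    table.put(row, value);
  }

  /** Not idempotent: every replay changes the row again. */
  static void add(Map<String, Long> table, String row, long delta) {
    table.merge(row, delta, Long::sum);
  }

  /** Applies the same operation twice to a row that starts at 10, as a retry might. */
  static long applyTwice(boolean idempotent) {
    Map<String, Long> table = new HashMap<>();
    table.put("row1", 10L);
    for (int attempt = 0; attempt < 2; attempt++) {
      if (idempotent) {
        set(table, "row1", 15L);
      } else {
        add(table, "row1", 5L);
      }
    }
    return table.get("row1");
  }
}
```

    In this model the overwrite ends at the intended value regardless of how many times it is replayed, while the additive update drifts with each replay.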

    To configure an HBase sink, you must supply a table id and a Configuration to identify the HBase instance, for example:

    
     Configuration configuration = ...;
     PCollection<Mutation> data = ...;
    
     data.apply("write",
         HBaseIO.write()
             .withConfiguration(configuration)
             .withTableId("table"));
     

    Experimental

    The design of the HBaseIO API is currently modeled on that of BigtableIO. It may evolve or differ in some aspects, but the intent is that users can easily migrate from one to the other.