Class SequentialCrawler<T extends FetchResult>

  • All Implemented Interfaces:
    Crawler<T>

    public final class SequentialCrawler<T extends FetchResult>
    extends java.lang.Object
    implements Crawler<T>
    Sequential crawler, mainly for debugging or development.
    Author:
    thomas.jungblut
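    The class description says only that this crawler works sequentially. The following minimal sketch shows what a sequential, fetch-bounded crawl loop can look like. It is not this library's implementation. The names SequentialCrawlSketch, crawl and linksOf are hypothetical, and linksOf stands in for fetching a page and extracting its outlinks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of a sequential crawl loop; not SequentialCrawler's real code.
final class SequentialCrawlSketch {

  // Visits URLs one at a time in breadth-first order until the frontier is
  // empty or maxFetches URLs have been fetched. linksOf is a hypothetical
  // stand-in for "fetch the page and extract its outlinks".
  static List<String> crawl(int maxFetches, Function<String, List<String>> linksOf, String... seeds) {
    Deque<String> frontier = new ArrayDeque<>();
    Set<String> seen = new HashSet<>();
    for (String seed : seeds) {
      if (seen.add(seed)) {
        frontier.add(seed);
      }
    }
    List<String> fetched = new ArrayList<>();
    while (!frontier.isEmpty() && fetched.size() < maxFetches) {
      String url = frontier.poll();
      fetched.add(url); // the page would be fetched here, one at a time
      for (String link : linksOf.apply(url)) {
        if (seen.add(link)) { // enqueue each URL at most once
          frontier.add(link);
        }
      }
    }
    return fetched;
  }

  public static void main(String[] args) {
    // A tiny in-memory "web": page -> outlinks.
    Map<String, List<String>> web = Map.of(
        "a", List.of("b", "c"),
        "b", List.of("a", "d"),
        "c", List.of(),
        "d", List.of("e"));
    Function<String, List<String>> links = u -> web.getOrDefault(u, List.of());
    System.out.println(crawl(3, links, "a")); // prints [a, b, c]
  }
}
```

    A FIFO frontier yields breadth-first order, and the seen set keeps the crawler from fetching the same URL twice. Because only one URL is in flight at a time, the order of fetches is deterministic, which is what makes a sequential crawler convenient for debugging.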
    • Method Summary

      Modifier and Type Method Description
      void process(java.lang.String... seedUrl)
      Starts the crawler from the given seed URLs.
      void setup(int fetches, Extractor<T> extractor, ResultWriter<T> writer)
      Sets up this crawler.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • SequentialCrawler

        public SequentialCrawler(int fetches,
                                 Extractor<T> extractor,
                                 ResultWriter<T> writer)
                          throws java.io.IOException
        Throws:
        java.io.IOException
    • Method Detail

      • setup

        public final void setup(int fetches,
                                Extractor<T> extractor,
                                ResultWriter<T> writer)
                         throws java.io.IOException
        Description copied from interface: Crawler
        Sets up this crawler.
        Specified by:
        setup in interface Crawler<T extends FetchResult>
        Parameters:
        fetches - the maximum number of fetches to perform.
        extractor - the Extractor used to extract a FetchResult.
        writer - the ResultWriter that writes each result to a sink.
        Throws:
        java.io.IOException
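        This page lists setup's parameters but not the methods of FetchResult, Extractor or ResultWriter. The following sketch therefore uses hypothetical stand-ins (PageResult, PageExtractor, PageWriter, InMemoryWriter) with assumed method names. It also assumes that fetches caps how many results are extracted and that setup throws IOException because opening the sink can fail. Neither assumption is confirmed here.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of setup(fetches, extractor, writer); not the real class.
final class SetupSketch<T extends PageResult> {
  private int fetches;
  private PageExtractor<T> extractor;
  private PageWriter<T> writer;

  // Validates the fetch budget, stores the collaborators, then opens the sink.
  // Assumption: opening the sink is what can throw IOException.
  void setup(int fetches, PageExtractor<T> extractor, PageWriter<T> writer) throws IOException {
    if (fetches <= 0) {
      throw new IllegalArgumentException("fetches must be positive: " + fetches);
    }
    this.fetches = fetches;
    this.extractor = extractor;
    this.writer = writer;
    writer.open();
  }

  // Extracts and writes one result per URL, stopping once the budget is spent.
  int run(String... urls) throws IOException {
    int done = 0;
    for (String url : urls) {
      if (done >= fetches) {
        break;
      }
      writer.write(extractor.extract(url));
      done++;
    }
    return done;
  }

  public static void main(String[] args) throws IOException {
    SetupSketch<PageResult> crawler = new SetupSketch<>();
    InMemoryWriter<PageResult> sink = new InMemoryWriter<>();
    crawler.setup(2, url -> new PageResult(url), sink);
    // prints "2 results, 2 written": the third URL exceeds the budget
    System.out.println(crawler.run("u1", "u2", "u3") + " results, " + sink.written.size() + " written");
  }
}

// Hypothetical stand-in for FetchResult.
class PageResult {
  final String url;

  PageResult(String url) {
    this.url = url;
  }
}

// Hypothetical stand-in for Extractor<T>; the method name is assumed.
interface PageExtractor<T extends PageResult> {
  T extract(String url);
}

// Hypothetical stand-in for ResultWriter<T>; the method names are assumed.
interface PageWriter<T extends PageResult> {
  void open() throws IOException;

  void write(T result) throws IOException;
}

// Collects results in memory instead of writing them to a real sink.
final class InMemoryWriter<T extends PageResult> implements PageWriter<T> {
  final List<T> written = new ArrayList<>();
  boolean opened;

  @Override
  public void open() {
    opened = true;
  }

  @Override
  public void write(T result) {
    written.add(result);
  }
}
```

        An in-memory writer such as this one is handy during development because tests can inspect exactly what the crawler produced without touching disk.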
      • process

        public final void process(java.lang.String... seedUrl)
                           throws java.lang.InterruptedException,
                                  java.util.concurrent.ExecutionException
        Description copied from interface: Crawler
        Starts the crawler from the given seed URLs. The crawling logic itself is left to each Crawler implementation.
        Specified by:
        process in interface Crawler<T extends FetchResult>
        Throws:
        java.lang.InterruptedException
        java.util.concurrent.ExecutionException
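        process declares InterruptedException and ExecutionException, the same checked exceptions thrown by java.util.concurrent.Future.get. This page does not say how SequentialCrawler uses them. The following sketch is one plausible shape, not the library's code. A hypothetical ProcessSketch.process runs each fetch on a single-thread executor and waits for it before submitting the next. A hypothetical caller, runSafely, shows the usual way to handle both exceptions. fakeFetch is likewise made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in whose process(...) declares the same checked
// exceptions as SequentialCrawler.process; not the library's implementation.
final class ProcessSketch {

  // Runs one fetch at a time on a single worker thread, waiting for each
  // result before submitting the next, so the work stays sequential.
  static List<String> process(String... seedUrl) throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      List<String> results = new ArrayList<>();
      for (String url : seedUrl) {
        Future<String> future = pool.submit(() -> fakeFetch(url));
        results.add(future.get()); // may throw InterruptedException or ExecutionException
      }
      return results;
    } finally {
      pool.shutdown(); // let the worker thread exit
    }
  }

  // Hypothetical fetch: rejects URLs that do not start with "http".
  static String fakeFetch(String url) {
    if (!url.startsWith("http")) {
      throw new IllegalArgumentException("bad url: " + url);
    }
    return "fetched " + url;
  }

  // Caller side: restore the interrupt flag and unwrap the task's real failure.
  static String runSafely(String... seeds) {
    try {
      return String.join(", ", process(seeds));
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    } catch (ExecutionException e) {
      return "failed: " + e.getCause().getMessage();
    }
  }

  public static void main(String[] args) {
    // prints "fetched http://a.example, fetched http://b.example"
    System.out.println(runSafely("http://a.example", "http://b.example"));
    // prints "failed: bad url: not-a-url"
    System.out.println(runSafely("http://a.example", "not-a-url"));
  }
}
```

        Re-setting the interrupt flag lets code further up the stack notice the interruption. An ExecutionException wraps whatever the task threw, so getCause() is where the useful failure information is found.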