Interface Crawler<T extends FetchResult>

  • Type Parameters:
    T - the result type, which must be FetchResult or a subtype of it.
    All Known Implementing Classes:
    MultithreadedCrawler, SequentialCrawler

    public interface Crawler<T extends FetchResult>
    Basic crawler interface. Every implementation should provide a constructor that takes the same arguments as setup and delegates the call to it.
    Author:
    thomas.jungblut
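    The constructor convention described above could look roughly like the following sketch. FetchResult, Extractor and ResultWriter are declared here as empty stand-ins so the example compiles on its own, and DelegatingCrawler and its maxFetches accessor are hypothetical names, not part of the library.

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;

// Hypothetical stand-ins so the sketch is self-contained; the real types
// belong to the library and their members are not shown in this section.
interface FetchResult {}
interface Extractor<T extends FetchResult> {}
interface ResultWriter<T extends FetchResult> {}

// Mirrors the two methods documented for Crawler.
interface Crawler<T extends FetchResult> {
    void setup(int fetches, Extractor<T> extractor, ResultWriter<T> writer) throws IOException;
    void process(String... seedUrl) throws InterruptedException, ExecutionException;
}

// Hypothetical implementation, not one of the library's crawlers.
final class DelegatingCrawler<T extends FetchResult> implements Crawler<T> {
    private int fetches;
    private Extractor<T> extractor;
    private ResultWriter<T> writer;

    // Same arguments as setup; the constructor only forwards to it.
    DelegatingCrawler(int fetches, Extractor<T> extractor, ResultWriter<T> writer) {
        setup(fetches, extractor, writer);
    }

    @Override
    public void setup(int fetches, Extractor<T> extractor, ResultWriter<T> writer) {
        this.fetches = fetches;
        this.extractor = extractor;
        this.writer = writer;
    }

    @Override
    public void process(String... seedUrl) {
        // Crawling logic is omitted; this sketch only shows the constructor convention.
    }

    // Hypothetical accessor, added only so the configured limit can be inspected.
    int maxFetches() {
        return fetches;
    }
}
```

    With this shape, constructing the crawler and calling setup on an existing instance leave it in the same state, so callers can use either path.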
    • Method Detail

      • setup

        void setup(int fetches,
                   Extractor<T> extractor,
                   ResultWriter<T> writer)
            throws java.io.IOException
        Sets up this crawler.
        Parameters:
        fetches - the maximum number of fetches to perform.
        extractor - the Extractor used to extract a FetchResult.
        writer - the ResultWriter to write the result to a sink.
        Throws:
        java.io.IOException
      • process

        void process(java.lang.String... seedUrl)
              throws java.lang.InterruptedException,
                     java.util.concurrent.ExecutionException
        Starts the crawler from the given seed URLs. The actual crawling logic is left to the implementing class.
        Throws:
        java.lang.InterruptedException
        java.util.concurrent.ExecutionException
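        As a rough illustration of how setup and process fit together, the sketch below crawls a tiny in-memory link graph breadth-first and stops at the fetch limit. Everything beyond the two Crawler method signatures is an assumption made for the example: the members outlinks, extract and write on the stand-in types, and the classes Page, SketchCrawler and CrawlerDemo, are hypothetical and do not describe the library's real API or its crawl strategy.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ExecutionException;

// Hypothetical stand-ins: every member below is assumed for illustration.
interface FetchResult { List<String> outlinks(); }
interface Extractor<T extends FetchResult> { T extract(String url); }
interface ResultWriter<T extends FetchResult> { void write(T result); }

// Mirrors the two methods documented for Crawler.
interface Crawler<T extends FetchResult> {
    void setup(int fetches, Extractor<T> extractor, ResultWriter<T> writer) throws IOException;
    void process(String... seedUrl) throws InterruptedException, ExecutionException;
}

// Hypothetical result type: a URL plus the links found on that page.
final class Page implements FetchResult {
    final String url;
    private final List<String> links;
    Page(String url, List<String> links) { this.url = url; this.links = links; }
    @Override public List<String> outlinks() { return links; }
}

// Hypothetical single-threaded crawler: breadth-first from the seeds, capped at `fetches`.
final class SketchCrawler<T extends FetchResult> implements Crawler<T> {
    private int fetches;
    private Extractor<T> extractor;
    private ResultWriter<T> writer;

    @Override
    public void setup(int fetches, Extractor<T> extractor, ResultWriter<T> writer) {
        this.fetches = fetches;
        this.extractor = extractor;
        this.writer = writer;
    }

    @Override
    public void process(String... seedUrl) {
        Queue<String> frontier = new ArrayDeque<>(Arrays.asList(seedUrl));
        Set<String> seen = new HashSet<>(frontier);
        int done = 0;
        while (!frontier.isEmpty() && done < fetches) {
            T result = extractor.extract(frontier.poll());
            done++;
            writer.write(result);
            for (String link : result.outlinks()) {
                if (seen.add(link)) {      // enqueue each URL at most once
                    frontier.add(link);
                }
            }
        }
    }
}

final class CrawlerDemo {
    // Crawls a fixed in-memory graph and returns the URLs in the order they were written.
    static List<String> crawl(int fetches, String... seeds) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Arrays.asList("b", "c"));
        graph.put("b", Arrays.asList("c", "d"));
        List<String> written = new ArrayList<>();
        SketchCrawler<Page> crawler = new SketchCrawler<>();
        crawler.setup(fetches,
                url -> new Page(url, graph.getOrDefault(url, Collections.emptyList())),
                page -> written.add(page.url));
        crawler.process(seeds);
        return written;
    }

    public static void main(String[] args) {
        System.out.println(crawl(3, "a")); // prints [a, b, c]
    }
}
```

        The fetch limit is checked before each fetch, so a limit of 3 stops the crawl before the fourth reachable URL is visited, even though it is already queued.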