Class PDSProductCrawler

  • All Implemented Interfaces:
    gov.nasa.jpl.oodt.cas.commons.spring.SpringSetIdInjectionType, gov.nasa.jpl.oodt.cas.filemgr.metadata.CoreMetKeys
    Direct Known Subclasses:
    CollectionCrawler, PDS3ProductCrawler

    public class PDSProductCrawler
    extends gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
    Extends the CAS Crawler to crawl a directory or PDS inventory file and register the products it finds with the PDS Registry Service.
    Author:
    mcayanan
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected boolean inPersistanceMode
      Flag for crawler persistence.
      protected Map<File,​Long> touchedFiles
      A map of files that were touched during crawler persistence.
      • Fields inherited from class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler

        DIR_FILTER, FILE_FILTER, LOG
      • Fields inherited from class gov.nasa.jpl.oodt.cas.crawl.config.ProductCrawlerBean

        MIME_TYPES_HIERARCHY
      • Fields inherited from interface gov.nasa.jpl.oodt.cas.filemgr.metadata.CoreMetKeys

        FILE_LOCATION, FILENAME, MIME_TYPE, PRODUCT_ID, PRODUCT_NAME, PRODUCT_RECEVIED_TIME, PRODUCT_STRUCTURE, PRODUCT_TYPE
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      void addAction​(gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction action)
      Adds a crawler action.
      void addActions​(List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> actions)
      Adds a list of crawler actions.
      protected void addKnownMetadata​(File product, gov.nasa.jpl.oodt.cas.metadata.Metadata productMetadata)
      Method not implemented at the moment.
      void crawl​(File dir)
      Crawls the given directory.
      List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> getActions()
      Gets a list of crawler actions defined for the crawler.
      protected gov.nasa.jpl.oodt.cas.metadata.Metadata getMetadataForProduct​(File product)
      Extracts metadata from the given product.
      Pds4MetExtractorConfig getMetExtractorConfig()
      Gets the MetExtractor configuration object.
      protected boolean passesPreconditions​(File product)
      Determines whether the supplied file passes the necessary pre-conditions for the file to be registered.
      void setCounter​(SearchDocState searchDocState)  
      void setDirectoryFilter​(DirectoryFilter filter)
      Sets the directory filter for the crawler.
      void setFileFilter​(FileFilter filter)
      Sets the file filter for the crawler.
      void setInPersistanceMode​(boolean value)  
      void setMetExtractorConfig​(Pds4MetExtractorConfig config)  
      void setSearchUrl​(String url)
      Sets the Search Service URL location.
      • Methods inherited from class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler

        clearIngestStatus, crawl, getIngestStatus, handleFile, setActionRepo
      • Methods inherited from class gov.nasa.jpl.oodt.cas.crawl.config.ProductCrawlerBean

        addRequiredMetadata, getActionIds, getApplicationContext, getDaemonPort, getDaemonWait, getFilemgrUrl, getGlobalMetadata, getId, getIngester, getProductPath, getRequiredMetadata, isCrawlForDirs, isNoRecur, isSkipIngest, setActionIds, setApplicationContext, setCrawlForDirs, setDaemonPort, setDaemonWait, setFilemgrUrl, setGlobalMetadata, setId, setIngester, setNoRecur, setProductPath, setRequiredMetadata, setSkipIngest
    • Field Detail

      • inPersistanceMode

        protected boolean inPersistanceMode
        Flag for crawler persistence.
      • touchedFiles

        protected Map<File,​Long> touchedFiles
        A map of files that were touched during crawler persistence.
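        The page does not say what the Long value in touchedFiles holds. The following is a minimal sketch, assuming it records the last-modified time observed for each file so that unchanged files can be skipped on a later pass. The class TouchedFilesSketch and its shouldProcess method are hypothetical and are not part of PDSProductCrawler.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class TouchedFilesSketch {
    // Hypothetical stand-in for the documented touchedFiles field.
    // ASSUMPTION: the Long value is the last-modified time seen for the file.
    private final Map<File, Long> touchedFiles = new HashMap<>();

    /**
     * Returns true if the file has not been seen before or its timestamp
     * differs from the recorded one, and records the new timestamp.
     * Returns false if the file is unchanged since it was last recorded.
     */
    public boolean shouldProcess(File file, long lastModified) {
        Long seen = touchedFiles.get(file);
        if (seen != null && seen == lastModified) {
            return false;
        }
        touchedFiles.put(file, lastModified);
        return true;
    }

    public static void main(String[] args) {
        TouchedFilesSketch sketch = new TouchedFilesSketch();
        File product = new File("product.xml");
        System.out.println(sketch.shouldProcess(product, 1000L)); // first sighting
        System.out.println(sketch.shouldProcess(product, 1000L)); // unchanged timestamp
        System.out.println(sketch.shouldProcess(product, 2000L)); // modified timestamp
    }
}
```

        Timestamps are passed in explicitly here so the sketch does not depend on files existing on disk; a real crawler would more likely read them from File.lastModified().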
    • Constructor Detail

      • PDSProductCrawler

        public PDSProductCrawler()
        Default constructor.
      • PDSProductCrawler

        public PDSProductCrawler​(Pds4MetExtractorConfig extractorConfig)
        Constructor.
        Parameters:
        extractorConfig - A configuration class that tells the crawler what data product types to look for and what metadata to extract.
    • Method Detail

      • getMetExtractorConfig

        public Pds4MetExtractorConfig getMetExtractorConfig()
        Gets the MetExtractor configuration object.
        Returns:
        The Pds4MetExtractorConfig object.
      • setInPersistanceMode

        public void setInPersistanceMode​(boolean value)
      • setFileFilter

        public void setFileFilter​(FileFilter filter)
        Sets the file filter for the crawler.
        Parameters:
        filter - A File Filter defined in the Harvest policy config.
      • setDirectoryFilter

        public void setDirectoryFilter​(DirectoryFilter filter)
        Sets the directory filter for the crawler.
        Parameters:
        filter - A Directory Filter defined in the Harvest policy config.
      • addKnownMetadata

        protected void addKnownMetadata​(File product,
                                        gov.nasa.jpl.oodt.cas.metadata.Metadata productMetadata)
        Method not implemented at the moment.
        Overrides:
        addKnownMetadata in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
        Parameters:
        product - The product file.
        productMetadata - The metadata associated with the product.
      • crawl

        public void crawl​(File dir)
        Crawls the given directory.
        Overrides:
        crawl in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
        Parameters:
        dir - The directory to crawl.
      • addAction

        public void addAction​(gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction action)
        Adds a crawler action.
        Parameters:
        action - A crawler action.
      • addActions

        public void addActions​(List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> actions)
        Adds a list of crawler actions.
        Parameters:
        actions - A list of crawler actions.
      • getActions

        public List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> getActions()
        Gets a list of crawler actions defined for the crawler.
        Returns:
        A list of crawler actions that will be performed during crawling.
      • getMetadataForProduct

        protected gov.nasa.jpl.oodt.cas.metadata.Metadata getMetadataForProduct​(File product)
        Extracts metadata from the given product.
        Specified by:
        getMetadataForProduct in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
        Parameters:
        product - A PDS file.
        Returns:
        A Metadata object, which holds metadata from the product.
      • passesPreconditions

        protected boolean passesPreconditions​(File product)
        Determines whether the supplied file passes the necessary pre-conditions for the file to be registered.
        Specified by:
        passesPreconditions in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
        Parameters:
        product - A file.
        Returns:
        true if the file passes the pre-conditions, false otherwise.
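        The page does not list which pre-conditions are checked. As a rough illustration only, the sketch below assumes a product passes when it is non-null and is accepted by the configured file filter. It uses java.io.FileFilter as a stand-in for the FileFilter type named on this page, and the class PreconditionSketch is hypothetical.

```java
import java.io.File;
import java.io.FileFilter;

public class PreconditionSketch {
    /**
     * Hypothetical pre-condition check.
     * ASSUMPTION: a product passes when it is non-null and the filter accepts it.
     * The real passesPreconditions may apply different or additional checks.
     */
    static boolean passes(File product, FileFilter filter) {
        return product != null && filter.accept(product);
    }

    public static void main(String[] args) {
        // Illustrative filter: accept only files whose names end in ".xml".
        FileFilter xmlOnly = f -> f.getName().endsWith(".xml");

        System.out.println(passes(new File("label.xml"), xmlOnly));
        System.out.println(passes(new File("image.img"), xmlOnly));
        System.out.println(passes(null, xmlOnly));
    }
}
```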
      • setSearchUrl

        public void setSearchUrl​(String url)
                          throws MalformedURLException
        Sets the Search Service URL location.
        Parameters:
        url - The URL of the Search Service.
        Throws:
        MalformedURLException - If the given url is malformed.
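        A setter with this signature typically validates its argument by parsing it into a java.net.URL. The sketch below shows that pattern under that assumption; whether PDSProductCrawler does exactly this is not stated here. The class SearchUrlSketch, its getter, the isValid helper, and the example address are all hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class SearchUrlSketch {
    private URL searchUrl;

    // ASSUMPTION: the string is parsed with java.net.URL, which throws
    // MalformedURLException when no valid protocol can be determined.
    public void setSearchUrl(String url) throws MalformedURLException {
        this.searchUrl = new URL(url);
    }

    public URL getSearchUrl() {
        return searchUrl;
    }

    /** Convenience helper: true if the string can be set without an exception. */
    public static boolean isValid(String url) {
        try {
            new SearchUrlSketch().setSearchUrl(url);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        SearchUrlSketch sketch = new SearchUrlSketch();
        try {
            sketch.setSearchUrl("http://localhost:8080/search-service"); // example address only
            System.out.println("accepted: " + sketch.getSearchUrl());
            sketch.setSearchUrl("not a url"); // no protocol, so this throws
        } catch (MalformedURLException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

        Callers of a setter like this need to either catch MalformedURLException or declare it, since it is a checked exception.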
      • setCounter

        public void setCounter​(SearchDocState searchDocState)