Class NCBIGeneConceptCreator

  • All Implemented Interfaces:
    ConceptCreator, de.julielab.jssf.commons.spi.ExtensionPoint, de.julielab.jssf.commons.spi.ParameterExposing

    public class NCBIGeneConceptCreator
    extends Object
    implements ConceptCreator
    • Constructor Detail

      • NCBIGeneConceptCreator

        public NCBIGeneConceptCreator()
    • Method Detail

      • getGeneCoordinates

        public static de.julielab.neo4j.plugins.datarepresentation.ConceptCoordinates getGeneCoordinates(String originalId)
      • convertGeneInfoToTerms

        protected Stream<de.julielab.neo4j.plugins.datarepresentation.ImportConcept> convertGeneInfoToTerms(File geneInfo,
                                                                                                            Set<String> organismSet,
                                                                                                            File geneDescriptions)
                                                                                                     throws IOException
        Throws:
        IOException
      • exposeParameters

        public void exposeParameters(String basePath,
                                     org.apache.commons.configuration2.HierarchicalConfiguration<org.apache.commons.configuration2.tree.ImmutableNode> template)
        Specified by:
        exposeParameters in interface de.julielab.jssf.commons.spi.ParameterExposing
      • createConcepts

        public Stream<de.julielab.neo4j.plugins.datarepresentation.ImportConcepts> createConcepts(org.apache.commons.configuration2.HierarchicalConfiguration<org.apache.commons.configuration2.tree.ImmutableNode> importConfig)
                                                                                           throws ConceptCreationException,
                                                                                                  FacetCreationException
        • geneInfo: The original gene_info file downloaded from NCBI. It should reside on our servers at /data/data_resources/biology/entrez/gene/gene_info (or a similar location; the path may change over time).
        • organisms: A list of NCBI Taxonomy IDs specifying the organisms whose genes should be included. As of August 2014, the complete gene database contains around 16M entries, most of which are not a focus of research. The list given here should be the same list used for GeNo resource generation (organisms.taxid), so that the terms in the term database match the genes actually mapped in the documents.
        • ncbiTaxNames: The names.dmp file included in the original NCBI Taxonomy download. It should reside on our servers at /data/data_resources/biology/ncbi_tax/names.dmp (or a similar location; the path may change over time).
        • geneSummary: Unfortunately, this file cannot be downloaded directly. However, it should already exist, since it is part of GeNo resource generation. Either ask whoever is responsible for GeNo, or build the semantic context index yourself with the script included in the jules-gene-mapper-ae project. Note that downloading the summaries takes a few hours, so the download is restricted to the genes that are included in GeNo.
        • homologene
        Specified by:
        createConcepts in interface ConceptCreator
        Throws:
        FacetCreationException
        IOException
        ConceptCreationException
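
        The effect of the organisms parameter can be illustrated with a minimal sketch. It is not the implementation of this class. It only assumes the layout of the NCBI gene_info file (tab-separated, tax_id in the first column, GeneID in the second, header line starting with '#'). The class name GeneInfoFilterSketch and the method geneIdsForOrganisms are hypothetical, and the sample rows are illustrative and trimmed to three columns.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GeneInfoFilterSketch {

    /**
     * Hypothetical helper, not part of NCBIGeneConceptCreator.
     * Keeps only the GeneIDs of rows whose tax_id (first column)
     * is contained in the given set of NCBI Taxonomy IDs.
     */
    public static List<String> geneIdsForOrganisms(Stream<String> geneInfoLines,
                                                   Set<String> organismTaxIds) {
        return geneInfoLines
                // the gene_info header line starts with '#'
                .filter(line -> !line.startsWith("#"))
                // columns are tab-separated; -1 keeps trailing empty columns
                .map(line -> line.split("\t", -1))
                // ignore malformed rows that lack a GeneID column
                .filter(cols -> cols.length >= 2)
                // restrict to the requested organisms
                .filter(cols -> organismTaxIds.contains(cols[0]))
                // second column is the GeneID
                .map(cols -> cols[1])
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative rows in gene_info layout, trimmed to three columns.
        Stream<String> lines = Stream.of(
                "#tax_id\tGeneID\tSymbol",
                "9606\t1\tA1BG",
                "10090\t11287\tPzp",
                "7227\t30970\tCG3038");
        // Only human (9606) and mouse (10090) genes are kept.
        System.out.println(geneIdsForOrganisms(lines, Set.of("9606", "10090")));
    }
}
```

        In practice the lines would come from the file named by geneInfo, for example via java.nio.file.Files.lines, and the set of IDs from the organisms list, so that only genes of the organisms covered by GeNo end up in the term database.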
      • getName

        public String getName()
        Specified by:
        getName in interface de.julielab.jssf.commons.spi.ExtensionPoint