Package de.julielab.concepts.db.creators
Class NCBIGeneConceptCreator
- java.lang.Object
-
- de.julielab.concepts.db.creators.NCBIGeneConceptCreator
-
- All Implemented Interfaces:
ConceptCreator, de.julielab.jssf.commons.spi.ExtensionPoint, de.julielab.jssf.commons.spi.ParameterExposing
public class NCBIGeneConceptCreator extends Object implements ConceptCreator
-
-
Field Summary
Fields
Modifier and Type  Field  Description
static String  BASEPATH
static String  GENE_GROUP
static String  GENE_GROUP_PREFIX  "gene_group" is the name of the file specifying the ortholog relationships between genes.
static String  GENE_INFO
static String  GENEDESCRIPTIONS
static String  HOMOLOGENE_PREFIX
static String  NCBI_GENE_SOURCE
static String  ORGANISMLIST
static String  ORGANISMNAMES
static String  SEMEDICO_RESOURCE_MANAGEMENT_SOURCE
static String  TOP_HOMOLOGY_PREFIX
static String  TOP_ORTHOLOGY_PREFIX
-
Constructor Summary
Constructors
Constructor  Description
NCBIGeneConceptCreator()
-
Method Summary
All Methods  Static Methods  Instance Methods  Concrete Methods
Modifier and Type  Method  Description
protected Stream<de.julielab.neo4j.plugins.datarepresentation.ImportConcept>  convertGeneInfoToTerms(File geneInfo, Set<String> organismSet, File geneDescriptions)
Stream<de.julielab.neo4j.plugins.datarepresentation.ImportConcepts>  createConcepts(org.apache.commons.configuration2.HierarchicalConfiguration<org.apache.commons.configuration2.tree.ImmutableNode> importConfig)  geneInfo: Original gene_info file downloaded from the NCBI.
void  exposeParameters(String basePath, org.apache.commons.configuration2.HierarchicalConfiguration<org.apache.commons.configuration2.tree.ImmutableNode> template)
static de.julielab.neo4j.plugins.datarepresentation.ConceptCoordinates  getGeneCoordinates(String originalId)
String  getName()
-
-
-
Field Detail
-
SEMEDICO_RESOURCE_MANAGEMENT_SOURCE
public static final String SEMEDICO_RESOURCE_MANAGEMENT_SOURCE
- See Also:
- Constant Field Values
-
NCBI_GENE_SOURCE
public static final String NCBI_GENE_SOURCE
- See Also:
- Constant Field Values
-
BASEPATH
public static final String BASEPATH
- See Also:
- Constant Field Values
-
GENE_INFO
public static final String GENE_INFO
- See Also:
- Constant Field Values
-
GENEDESCRIPTIONS
public static final String GENEDESCRIPTIONS
- See Also:
- Constant Field Values
-
ORGANISMLIST
public static final String ORGANISMLIST
- See Also:
- Constant Field Values
-
ORGANISMNAMES
public static final String ORGANISMNAMES
- See Also:
- Constant Field Values
-
GENE_GROUP
public static final String GENE_GROUP
- See Also:
- Constant Field Values
-
HOMOLOGENE_PREFIX
public static final String HOMOLOGENE_PREFIX
- See Also:
- Constant Field Values
-
GENE_GROUP_PREFIX
public static final String GENE_GROUP_PREFIX
"gene_group" is the name of the file specifying the ortholog relationships between genes. In NCBI Gene, a specific ortholog group can also be found by searching for "ortholog_gene_2475[group]", where the number is the ID of the gene that represents the group. Most of the time this is the human gene.
- See Also:
- Constant Field Values
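The search pattern mentioned in the GENE_GROUP_PREFIX description can be sketched as a small helper. This is only an illustration: the class name, the method name and the prefix value "ortholog_gene_" are assumptions derived from the example query above. The actual value of GENE_GROUP_PREFIX is not stated on this page.

```java
// Hypothetical helper, not part of NCBIGeneConceptCreator.
final class OrthologGroupQuery {

    // Assumed prefix, copied from the example "ortholog_gene_2475[group]".
    // The real constant value may differ.
    static final String ASSUMED_PREFIX = "ortholog_gene_";

    private OrthologGroupQuery() {
    }

    /**
     * Builds the NCBI Gene search term for the ortholog group that is
     * represented by the gene with the given ID.
     */
    static String forRepresentativeGene(String geneId) {
        return ASSUMED_PREFIX + geneId + "[group]";
    }
}
```

Calling OrthologGroupQuery.forRepresentativeGene("2475") yields the search term quoted in the description.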
-
TOP_ORTHOLOGY_PREFIX
public static final String TOP_ORTHOLOGY_PREFIX
- See Also:
- Constant Field Values
-
TOP_HOMOLOGY_PREFIX
public static final String TOP_HOMOLOGY_PREFIX
- See Also:
- Constant Field Values
-
-
Method Detail
-
getGeneCoordinates
public static de.julielab.neo4j.plugins.datarepresentation.ConceptCoordinates getGeneCoordinates(String originalId)
-
convertGeneInfoToTerms
protected Stream<de.julielab.neo4j.plugins.datarepresentation.ImportConcept> convertGeneInfoToTerms(File geneInfo, Set<String> organismSet, File geneDescriptions) throws IOException
- Throws:
IOException
-
exposeParameters
public void exposeParameters(String basePath, org.apache.commons.configuration2.HierarchicalConfiguration<org.apache.commons.configuration2.tree.ImmutableNode> template)
- Specified by:
exposeParameters in interface de.julielab.jssf.commons.spi.ParameterExposing
-
createConcepts
public Stream<de.julielab.neo4j.plugins.datarepresentation.ImportConcepts> createConcepts(org.apache.commons.configuration2.HierarchicalConfiguration<org.apache.commons.configuration2.tree.ImmutableNode> importConfig) throws ConceptCreationException, FacetCreationException
- geneInfo: Original gene_info file downloaded from the NCBI. Should reside on our servers at /data/data_resources/biology/entrez/gene/gene_info (or similar; the path could change over time).
- organisms: A list of NCBI Taxonomy IDs specifying the organisms for which genes should be included. As of August 2014, the whole gene database contains around 16M entries, most of which are not in the focus of research. The list given here should be the same list used for GeNo resource generation (organisms.taxid), so that the terms in the term database match the genes actually mapped in the documents.
- ncbiTaxNames: The names.dmp file included in the original NCBI Taxonomy download. Should reside on our servers at /data/data_resources/biology/ncbi_tax/names.dmp (or similar; the path could change over time).
- geneSummary: Unfortunately, this file cannot be downloaded directly. However, it should already exist somewhere, since it is part of GeNo resource generation. You can either ask someone who is responsible for GeNo or build the semantic context index yourself with the script included in the jules-gene-mapper-ae project. Note that the summary download takes a few hours and is therefore filtered to only download summaries for the genes that are included in GeNo.
- homologene
- Specified by:
createConcepts in interface ConceptCreator
- Throws:
FacetCreationException
IOException
ConceptCreationException
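The organism restriction described for the organisms parameter can be illustrated with a minimal sketch. It assumes the standard tab-separated gene_info layout, in which the first column holds the NCBI Taxonomy ID and header lines start with "#". The class and method names are hypothetical and do not reflect how NCBIGeneConceptCreator is actually implemented.

```java
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical illustration, not the code used by NCBIGeneConceptCreator.
final class GeneInfoOrganismFilter {

    private GeneInfoOrganismFilter() {
    }

    /**
     * Keeps only those gene_info rows whose first column (the taxonomy ID)
     * is contained in organismSet. Empty lines and header lines that start
     * with '#' are dropped.
     */
    static Stream<String> filterByOrganism(Stream<String> lines, Set<String> organismSet) {
        return lines
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .filter(line -> {
                    int tab = line.indexOf('\t');
                    // A row without a tab is treated as consisting of the taxonomy ID only.
                    String taxId = tab < 0 ? line : line.substring(0, tab);
                    return organismSet.contains(taxId);
                });
    }
}
```

With a real file, the input stream could come from java.nio.file.Files.lines(path), and the organism set from the lines of the organisms.taxid list. Filtering lazily on a stream avoids holding the full gene database in memory.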
-
getName
public String getName()
- Specified by:
getName in interface de.julielab.jssf.commons.spi.ExtensionPoint