Package de.unirostock.sems.xmlutils.ds
Class DocumentNode
- java.lang.Object
-
- de.unirostock.sems.xmlutils.ds.TreeNode
-
- de.unirostock.sems.xmlutils.ds.DocumentNode
-
public class DocumentNode extends TreeNode
The class DocumentNode represents a node in an XML tree.
- Author:
- Martin Scharm
-
Constructor Summary
Constructors
- DocumentNode(org.jdom2.Element element, DocumentNode parent, TreeDocument doc, Weighter w, int numChild, int level): Instantiates a new document node.
-
Method Summary
- void addChild(DocumentNode toAdd): Adds a child to this node.
- String dump(String prefix): Dumps this node.
- boolean evaluate(ConnectionManager conMgmr): Evaluates the modifications of this node.
- DocumentNode extract(): Extracts this subtree.
- org.jdom2.Attribute getAttribute(String attr): Gets an attribute.
- double getAttributeDistance(DocumentNode cmp): Calculates the distance of attributes.
- double getAttributeDistance(DocumentNode cmp, boolean allowDifferentIds, boolean careAboutNames, boolean stricterNames): Calculates the distance of attributes.
- Set<String> getAttributes(): Gets the set of attributes.
- String getAttributeValue(String attr): Gets the value of an attribute.
- String getAttributeValue(String attr, String nsContains): Gets the value of an attribute with matching name space.
- List<TreeNode> getChildren(): Gets the children.
- HashMap<String,List<TreeNode>> getChildrenTagMap(): Gets the children tag map.
- List<TreeNode> getChildrenWithTag(String tag): Gets the children sharing a certain tag.
- String getId(): Gets the value of the id attribute.
- org.jdom2.Namespace getNameSpace(): Gets the name space associated with this node.
- String getNameSpacePrefix(): Gets the name space prefix.
- String getNameSpaceUri(): Gets the name space uri.
- void getNodeStats(HashMap<String,Integer> map): Gets the node statistics of the subtree rooted in this node: tagname => number of nodes having this tag name.
- int getNoOfChild(TreeNode kid): Gets the child number of a child.
- int getNumChildren(): Gets the number of children in this node.
- int getNumLeaves(): Gets the number of leaves in the subtree rooted by this node.
- String getOwnHash(): Gets the calculated hash of this single element (ignoring subtree).
- int getSizeSubtree(): Gets the size of this subtree (number of nodes under the current node, current node excluded).
- org.jdom2.Element getSubDoc(org.jdom2.Element parent): Attaches the subtree rooted in this node to the node parent.
- String getSubTreeHash(): Gets the calculated hash of the subtree rooted in this node.
- String getTagName(): Gets the tag name.
- double getWeight(): Gets the weight of this node.
- Weighter getWeighter(): Gets the weighter used to compute the weight of this document.
- boolean isBelow(DocumentNode parent): Checks if this node is a child of some other node (multilevel).
- protected void reSetupStructureDown(TreeDocument doc, int numChild): Re-setup the document structure downwards.
- protected void reSetupStructureUp(): Re-setup the document structure upwards.
- void rmAttribute(String attr): Removes an attribute.
- void rmChild(DocumentNode toRemove): Removes a child.
- void setAttribute(String attr, String value): Overrides an attribute.
- void setAttribute(org.jdom2.Attribute attr): Overrides an attribute.
- static void setIdAttr(String id): Sets the id attribute.
- String toString()
-
Methods inherited from class de.unirostock.sems.xmlutils.ds.TreeNode
addModification, contentDiffers, getDocument, getLevel, getModification, getParent, getType, getXPath, hasModification, isRoot, networkDiffers, resetModifications, rmModification, setModification
-
Field Detail
-
ID_ATTR
public static String ID_ATTR
The name of the id attribute.
-
-
Constructor Detail
-
DocumentNode
public DocumentNode(org.jdom2.Element element, DocumentNode parent, TreeDocument doc, Weighter w, int numChild, int level)
Instantiates a new document node.
- Parameters:
element - the corresponding element
parent - the parent node
doc - the corresponding document
w - the weighter
numChild - the number among its siblings
level - the level in the tree
-
-
Method Detail
-
extract
public DocumentNode extract()
Extracts this subtree. Creates a copy of the subtree rooted in this node and returns a DocumentNode that has no parent, e.g. to transfer it to another document.
- Returns:
- the copy of this DocumentNode
-
getSubTreeHash
public String getSubTreeHash()
Gets the calculated hash of the subtree rooted in this node.
- Specified by:
getSubTreeHash in class TreeNode
- Returns:
- the hash
-
getOwnHash
public String getOwnHash()
Gets the calculated hash of this single element (ignoring subtree).
- Specified by:
getOwnHash in class TreeNode
- Returns:
- the hash
-
getSizeSubtree
public int getSizeSubtree()
Gets the size of this subtree (number of nodes under the current node, current node excluded).
- Returns:
- the size of the subtree
-
getNumLeaves
public int getNumLeaves()
Gets the number of leaves in the subtree rooted by this node. If this node is a leaf, it returns 1.
- Returns:
- the number of leaves
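The two counts described for getSizeSubtree and getNumLeaves can be illustrated with a minimal sketch. CountingNode is a hypothetical stand-in written for this example, not the library's DocumentNode, and it only models element nodes; how the library counts text nodes is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tree node; NOT the library's DocumentNode.
class CountingNode {
    final String tag;
    final List<CountingNode> children = new ArrayList<>();

    CountingNode(String tag) {
        this.tag = tag;
    }

    // Appends a child and returns this node so calls can be chained.
    CountingNode add(CountingNode child) {
        children.add(child);
        return this;
    }

    // Number of nodes below this node; the node itself is excluded.
    int sizeSubtree() {
        int size = 0;
        for (CountingNode child : children) {
            size += 1 + child.sizeSubtree();
        }
        return size;
    }

    // Number of leaves in the subtree; a leaf reports 1 for itself.
    int numLeaves() {
        if (children.isEmpty()) {
            return 1;
        }
        int leaves = 0;
        for (CountingNode child : children) {
            leaves += child.numLeaves();
        }
        return leaves;
    }

    public static void main(String[] args) {
        CountingNode root = new CountingNode("model")
            .add(new CountingNode("species"))
            .add(new CountingNode("reaction").add(new CountingNode("kineticLaw")));
        System.out.println("size of subtree: " + root.sizeSubtree());
        System.out.println("number of leaves: " + root.numLeaves());
    }
}
```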
-
setIdAttr
public static final void setIdAttr(String id)
Sets the id attribute. (You may want to use something like the metaid instead of the id as identifier.)
- Parameters:
id - the new id attribute
-
getTagName
public String getTagName()
Description copied from class: TreeNode
Gets the tag name. For document nodes it's the actual tag name; for text nodes you'll receive TreeNode.TEXT_TAG.
- Specified by:
getTagName in class TreeNode
- Returns:
- the tag name
-
getId
public String getId()
Gets the value of the id attribute.
- Returns:
- the id
-
addChild
public void addChild(DocumentNode toAdd)
Adds a child to this node.
- Parameters:
toAdd - the new child
-
rmChild
public void rmChild(DocumentNode toRemove) throws XmlDocumentConsistencyException
Removes a child.
- Parameters:
toRemove - the child to remove
- Throws:
XmlDocumentConsistencyException - thrown if there is no such child
-
getAttributeValue
public String getAttributeValue(String attr)
Gets the value of an attribute. Don't use it to get the id, use getId() instead!
- Parameters:
attr - the name of the attribute
- Returns:
- the value of the attribute
-
getAttribute
public org.jdom2.Attribute getAttribute(String attr)
Gets an attribute.
- Parameters:
attr - the name of the attribute
- Returns:
- the attribute
-
getAttributeValue
public String getAttributeValue(String attr, String nsContains)
Gets the value of an attribute with matching name space. Don't use it to get the id, use getId() instead!
- Parameters:
attr - the name of the attribute
nsContains - the name space must contain nsContains
- Returns:
- the value of the attribute
-
setAttribute
public void setAttribute(org.jdom2.Attribute attr)
Overrides an attribute.
- Parameters:
attr - the attribute
-
setAttribute
public void setAttribute(String attr, String value)
Overrides an attribute.
- Parameters:
attr - the name of the attribute
value - the new value
-
rmAttribute
public void rmAttribute(String attr)
Removes an attribute.
- Parameters:
attr - the attribute's name
-
isBelow
public boolean isBelow(DocumentNode parent)
Checks if this node is a child of some other node (multilevel). Both nodes have to be from the same origin document and the XPath of the current node has to start with the parent's XPath.
- Parameters:
parent - the parent in question
- Returns:
- true, if this is a child of parent
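The two conditions stated for isBelow (same origin document, XPath prefix) can be sketched as follows. PathNode and its fields are hypothetical names chosen for this illustration; the sketch applies only the rule as described and does not claim to match how the library handles edge cases such as comparing a node with itself.

```java
// Hypothetical stand-in; field names are illustrative, not the library's.
class PathNode {
    final Object document; // the document this node originates from
    final String xpath;    // absolute XPath of this node

    PathNode(Object document, String xpath) {
        this.document = document;
        this.xpath = xpath;
    }

    // True if both nodes come from the same document instance and
    // this node's XPath starts with the candidate parent's XPath.
    boolean isBelow(PathNode parent) {
        return document == parent.document && xpath.startsWith(parent.xpath);
    }

    public static void main(String[] args) {
        Object doc = new Object();
        PathNode model = new PathNode(doc, "/sbml[1]/model[1]");
        PathNode species = new PathNode(doc, "/sbml[1]/model[1]/listOfSpecies[1]/species[2]");
        System.out.println(species.isBelow(model));
        System.out.println(model.isBelow(species));
    }
}
```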
-
getNumChildren
public int getNumChildren()
Gets the number of children in this node.
- Returns:
- the number of children
-
getChildrenWithTag
public List<TreeNode> getChildrenWithTag(String tag)
Gets the children sharing a certain tag.
- Parameters:
tag - the tag
- Returns:
- the children having tag as tag name or an empty list if there are no such children
-
getChildrenTagMap
public HashMap<String,List<TreeNode>> getChildrenTagMap()
Gets the children tag map.
- Returns:
- the children tag map
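A tag map of this kind groups children by tag name, and a lookup for a tag without children yields an empty list, as described for getChildrenWithTag. The following is a minimal sketch under those assumptions; TagIndexSketch and TaggedChild are hypothetical names, and the library's internal bookkeeping may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;

// Hypothetical child type carrying only a tag name and a label.
class TaggedChild {
    final String tag;
    final String label;

    TaggedChild(String tag, String label) {
        this.tag = tag;
        this.label = label;
    }
}

// Hypothetical index from tag name to the children having that tag.
class TagIndexSketch {
    private final HashMap<String, List<TaggedChild>> byTag = new HashMap<>();

    // Registers a child under its tag, creating the list on first use.
    void addChild(TaggedChild child) {
        byTag.computeIfAbsent(child.tag, k -> new ArrayList<>()).add(child);
    }

    // Children with the given tag, or an empty list if there are none.
    List<TaggedChild> childrenWithTag(String tag) {
        return byTag.getOrDefault(tag, Collections.emptyList());
    }

    public static void main(String[] args) {
        TagIndexSketch index = new TagIndexSketch();
        index.addChild(new TaggedChild("species", "s1"));
        index.addChild(new TaggedChild("species", "s2"));
        index.addChild(new TaggedChild("reaction", "r1"));
        System.out.println(index.childrenWithTag("species").size());
        System.out.println(index.childrenWithTag("compartment").isEmpty());
    }
}
```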
-
getNoOfChild
public int getNoOfChild(TreeNode kid)
Gets the child number of a child. Returns 1 if it's the first child and getNumChildren() for the last child. If there is no such child, it returns -1.
- Parameters:
kid - the kid
- Returns:
- the number of the child
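The numbering rule above is 1-based, with -1 as the "not found" value. A minimal sketch of that convention over a plain list follows; ChildNumberSketch and noOfChild are hypothetical names, and the lookup here relies on List.indexOf (equality via equals), which is an assumption rather than the library's documented behavior.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating 1-based child numbering.
class ChildNumberSketch {

    // Returns the 1-based position of kid among children, or -1 if absent.
    static <T> int noOfChild(List<T> children, T kid) {
        int index = children.indexOf(kid);
        return index < 0 ? -1 : index + 1;
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("first", "second", "third");
        System.out.println(noOfChild(children, "first"));
        System.out.println(noOfChild(children, "third"));
        System.out.println(noOfChild(children, "missing"));
    }
}
```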
-
getAttributeDistance
public double getAttributeDistance(DocumentNode cmp)
Calculates the distance of attributes. Basically calls getAttributeDistance(DocumentNode, boolean, boolean, boolean), allowing different ids and caring about names, but without stricter names. Returns a double in [0,1]. If all attributes match the distance will be 0, if none of the attributes match the distance will be 1.
- Parameters:
cmp - the node to compare
- Returns:
- the attribute distance in [0,1]
-
getAttributeDistance
public double getAttributeDistance(DocumentNode cmp, boolean allowDifferentIds, boolean careAboutNames, boolean stricterNames)
Calculates the distance of attributes. Returns a double in [0,1].
- If all attributes match the distance will be 0, if none of the attributes match the distance will be 1.
- If allowDifferentIds is set to false, the distance will always be 1 if the two nodes do not share the same attribute.
- If careAboutNames is set to true, we will treat the name attributes differently. That means the difference between Glucose5Phosphate and Glucose6Phosphate (might just be a typo) will be rated less and the difference between Glucose3Phosphate and MAPKK2 will be rated much higher. We are using the `Michaelis–Menten kinetics` (:-)) to calculate the difference between names: vmax=6; km=min(length(name1),length(name2))/4; [S]=levenshteinDistance(name1,name2)
- If careAboutNames is set to true AND stricterNames is set to true, a difference in the name is treated very strictly. So if you're sure that your names are very similar, you should go for that option: vmax=12; km=min(length(name1),length(name2))/6; [S]=levenshteinDistance(name1,name2)
- Parameters:
cmp - the node to compare
allowDifferentIds - are different ids allowed?
careAboutNames - should we care about names?
stricterNames - should we handle names very strictly?
- Returns:
- the attribute distance in [0,1]
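To make the name comparison above concrete, the following sketch evaluates the standard Michaelis–Menten form v = vmax * [S] / (km + [S]) with the listed parameters. NameDifferenceSketch and its methods are hypothetical; the floating-point division for km and the guard for identical names are assumptions, and how the library folds the resulting value into the overall distance in [0,1] is not described here.

```java
// Hypothetical sketch of the name-difference rating; not the library's code.
class NameDifferenceSketch {

    // Classic dynamic-programming Levenshtein distance using two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Michaelis-Menten rate with [S] = Levenshtein distance and
    // km = min(length) / kmDivisor (floating-point division is an assumption).
    static double rate(String name1, String name2, double vmax, double kmDivisor) {
        double s = levenshtein(name1, name2);
        if (s == 0) {
            return 0.0; // identical names: no difference, also avoids 0/0
        }
        double km = Math.min(name1.length(), name2.length()) / kmDivisor;
        return vmax * s / (km + s);
    }

    public static void main(String[] args) {
        // Default setting: vmax=6, km divisor 4; stricter setting: vmax=12, km divisor 6.
        System.out.println(rate("Glucose5Phosphate", "Glucose6Phosphate", 6, 4));
        System.out.println(rate("Glucose3Phosphate", "MAPKK2", 6, 4));
        System.out.println(rate("Glucose5Phosphate", "Glucose6Phosphate", 12, 6));
    }
}
```

With the default parameters, the single-character difference between Glucose5Phosphate and Glucose6Phosphate gives 6 * 1 / (4.25 + 1), roughly 1.14, while the stricter parameters raise the rate for the same pair.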
-
getWeighter
public Weighter getWeighter()
Gets the weighter used to compute the weight of this document.
- Returns:
- the weighter
-
getNameSpaceUri
public String getNameSpaceUri()
Gets the name space uri.
- Returns:
- the name space uri
-
getNameSpacePrefix
public String getNameSpacePrefix()
Gets the name space prefix.
- Returns:
- the name space prefix
-
getSubDoc
public org.jdom2.Element getSubDoc(org.jdom2.Element parent)
Description copied from class: TreeNode
Attaches the subtree rooted in this node to the node parent. Recursively attaches its children. Will fail for parent == null && this.getType() == TreeNode.TEXT_NODE, which means a text node cannot become root.
-
getNodeStats
public void getNodeStats(HashMap<String,Integer> map)
Description copied from class: TreeNode
Gets the node statistics of the subtree rooted in this node: tagname => number of nodes having this tag name.
- Specified by:
getNodeStats in class TreeNode
- Parameters:
map - the map to write our statistics to
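The statistics described above amount to a recursive tag count written into a caller-supplied map. A minimal sketch follows; StatsNode is a hypothetical stand-in, and counting the root node itself is an assumption based on the phrase "subtree rooted in this node".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Hypothetical stand-in for a tree node; NOT the library's TreeNode.
class StatsNode {
    final String tag;
    final List<StatsNode> children = new ArrayList<>();

    StatsNode(String tag) {
        this.tag = tag;
    }

    // Appends a child and returns this node so calls can be chained.
    StatsNode add(StatsNode child) {
        children.add(child);
        return this;
    }

    // Adds one count for this node's tag, then recurses into the children.
    void nodeStats(HashMap<String, Integer> map) {
        map.merge(tag, 1, Integer::sum);
        for (StatsNode child : children) {
            child.nodeStats(map);
        }
    }

    public static void main(String[] args) {
        StatsNode root = new StatsNode("listOfSpecies")
            .add(new StatsNode("species"))
            .add(new StatsNode("species").add(new StatsNode("annotation")));
        HashMap<String, Integer> stats = new HashMap<>();
        root.nodeStats(stats);
        System.out.println(stats);
    }
}
```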
-
evaluate
public boolean evaluate(ConnectionManager conMgmr)
Description copied from class: TreeNode
Evaluates the modifications of this node. Just useful for tree comparisons.
-
getWeight
public double getWeight()
Description copied from class: TreeNode
Gets the weight of this node.
-
reSetupStructureDown
protected void reSetupStructureDown(TreeDocument doc, int numChild)
Description copied from class: TreeNode
Re-setup the document structure downwards (e.g. recompute XPaths etc.).
- Specified by:
reSetupStructureDown in class TreeNode
- Parameters:
doc - the document this node corresponds to
numChild - the child number of this node
-
reSetupStructureUp
protected void reSetupStructureUp()
Description copied from class: TreeNode
Re-setup the document structure upwards (e.g. recompute hashes etc.).
- Specified by:
reSetupStructureUp in class TreeNode
-
dump
public String dump(String prefix)
Description copied from class: TreeNode
Dumps this node. Just for debugging purposes.
-
getNameSpace
public org.jdom2.Namespace getNameSpace()
Gets the name space associated with this node.
- Returns:
- the name space
-
-