Class TreeDocument


  • public class TreeDocument
    extends Object
    The class TreeDocument represents hierarchically structured content.
    Author:
    Martin Scharm
    • Constructor Detail

      • TreeDocument

        public TreeDocument​(org.jdom2.Document d,
                            URI baseUri)
                     throws XmlDocumentParseException
        Instantiates a new tree document.
        Parameters:
        d - the document
        baseUri - the base URI (needed to resolve relative imports)
        Throws:
        XmlDocumentParseException - if the XML document cannot be parsed
      • TreeDocument

        public TreeDocument​(org.jdom2.Document d,
                            Weighter w,
                            URI baseUri)
                     throws XmlDocumentParseException
        Instantiates a new tree document.
        Parameters:
        d - the document
        w - the weighter to weight the nodes and subtrees
        baseUri - the base URI (needed to resolve relative imports)
        Throws:
        XmlDocumentParseException - if the XML document cannot be parsed
      • TreeDocument

        public TreeDocument​(org.jdom2.Document d,
                            URI baseUri,
                            boolean ordered)
                     throws XmlDocumentParseException
        Instantiates a new tree document.
        Parameters:
        d - the document
        baseUri - the base URI (needed to resolve relative imports)
        ordered - the ordered flag; if true, this tree is considered to be ordered
        Throws:
        XmlDocumentParseException - if the XML document cannot be parsed
      • TreeDocument

        public TreeDocument​(org.jdom2.Document d,
                            Weighter w,
                            URI baseUri,
                            boolean ordered)
                     throws XmlDocumentParseException
        Instantiates a new tree document.
        Parameters:
        d - the document
        w - the weighter to weight the nodes and subtrees
        baseUri - the base URI (needed to resolve relative imports)
        ordered - the ordered flag; if true, this tree is considered to be ordered
        Throws:
        XmlDocumentParseException - if the XML document cannot be parsed
      • TreeDocument

        public TreeDocument​(TreeDocument td)
        Instantiates a new tree document as a copy of another tree document.
        Parameters:
        td - the tree document to copy
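
        The baseUri parameter of the constructors above is described as being needed to resolve relative imports. As a rough illustration of what that means, the sketch below resolves a relative reference against a base URI with java.net.URI. The class name BaseUriDemo, the helper resolveImport, and the example URLs are hypothetical; whether TreeDocument resolves imports in exactly this way internally is an assumption.

```java
import java.net.URI;

class BaseUriDemo {
    // Hypothetical helper (not part of TreeDocument): resolves a relative
    // import reference against a base URI using standard URI resolution.
    static URI resolveImport(URI baseUri, String relativeRef) {
        return baseUri.resolve(relativeRef);
    }

    public static void main(String[] args) {
        // Example base URI of the document being parsed (made up for illustration)
        URI base = URI.create("http://example.org/models/main.xml");
        // prints http://example.org/models/parts/unit.xml
        System.out.println(resolveImport(base, "parts/unit.xml"));
    }
}
```

        Without a base URI, a relative reference such as parts/unit.xml has no location to be resolved against, which is presumably why the parsing constructors ask for one.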
    • Method Detail

      • resortSubtrees

        @Deprecated
        public void resortSubtrees()
        Deprecated.
        Subtrees are now kept in a sorted set, so resorting is no longer necessary. This method no longer does anything.
        Resort subtrees.
      • integrate

        public void integrate​(TreeNode node,
                              boolean recursively)
        Integrates a node into this tree. This will update the hash, ID, and tag mappers etc.
        Parameters:
        node - the node to integrate
        recursively - recursively integrate the node's children
      • separate

        public void separate​(TreeNode node,
                             boolean recursively)
        Extracts a node from this tree. This will delete its hash, ID, XPath etc. from the corresponding mappers.
        Parameters:
        node - the node to separate
        recursively - recursively separate the node's children
      • getBaseUri

        public URI getBaseUri()
        Gets the base URI.
        Returns:
        the base URI
      • uniqueIds

        public boolean uniqueIds()
        Checks whether all occurring IDs are unique.
        Returns:
        true, if all IDs are unique
      • resetAllModifications

        public void resetAllModifications()
        Resets all modifications.
      • getRoot

        public DocumentNode getRoot()
        Gets the root node.
        Returns:
        the root node
      • getNumNodes

        public int getNumNodes()
        Gets the number of nodes in this document.
        Returns:
        the number of nodes
      • getTreeWeight

        public double getTreeWeight()
        Gets the tree weight (equal to the weight of the root node).
        Returns:
        the tree weight
      • getTextNodes

        public List<TextNode> getTextNodes()
        Gets all text nodes.
        Returns:
        the text nodes
      • getNodesByTag

        public List<DocumentNode> getNodesByTag​(String tag)
        Gets the nodes sharing a certain tag name. May return null if there is no such tag.
        Parameters:
        tag - the tag name to search for
        Returns:
        the nodes sharing this tag name
      • getSubtreesBySize

        public TreeNode[] getSubtreesBySize()
        Gets the subtrees ordered by size, biggest first.
        Returns:
        the subtrees by size
      • getNodesByHash

        public List<TreeNode> getNodesByHash​(String hash)
        Gets the nodes by hash. May return null if there is no such hash.
        Parameters:
        hash - the hash value to search for
        Returns:
        the nodes having this hash value
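
        Because getNodesByTag and getNodesByHash may return null rather than an empty list, callers need a null check before iterating. One possible way to handle this is a small wrapper, sketched below. The class NullSafeLists and the helper orEmpty are hypothetical and not part of this API.

```java
import java.util.Collections;
import java.util.List;

class NullSafeLists {
    // Hypothetical helper: maps a possibly-null lookup result to an empty list,
    // so the caller can iterate without an explicit null check.
    static <T> List<T> orEmpty(List<T> maybeNull) {
        return maybeNull == null ? Collections.<T>emptyList() : maybeNull;
    }

    public static void main(String[] args) {
        // Stands in for a lookup such as doc.getNodesByTag(...) that found nothing
        List<String> missing = null;
        for (String node : orEmpty(missing)) {
            System.out.println(node); // never reached for a null result
        }
        System.out.println(orEmpty(missing).size()); // prints 0
    }
}
```

        With a real TreeDocument named doc, the call would look like orEmpty(doc.getNodesByTag(tag)), where doc and tag are placeholders.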
      • getNodeById

        public DocumentNode getNodeById​(String id)
        Gets the node by ID. May return null if there is no such ID or if the IDs in this document aren't unique.
        Parameters:
        id - the ID to search for
        Returns:
        the node having this id value
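
        Since getNodeById may return null both for an unknown ID and for a document whose IDs are not unique, it can help to check uniqueIds() first and to wrap the result in an Optional. The following is a minimal sketch of that pattern. The class GuardedIdLookup and the helper findById are hypothetical; the lookup is passed in as a function so that the example runs without the library on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

class GuardedIdLookup {
    // Hypothetical helper: only performs the lookup if IDs are unique and
    // wraps a possibly-null result in an Optional.
    static <N> Optional<N> findById(boolean idsUnique, Function<String, N> byId, String id) {
        if (!idsUnique) {
            return Optional.empty(); // ID lookups are not reliable in this case
        }
        return Optional.ofNullable(byId.apply(id));
    }

    public static void main(String[] args) {
        // A plain map stands in for the document's ID index (illustration only)
        Map<String, String> index = new HashMap<>();
        index.put("s1", "node with id s1");
        Function<String, String> lookup = index::get;

        System.out.println(findById(true, lookup, "s1").isPresent());  // prints true
        System.out.println(findById(true, lookup, "s2").isPresent());  // prints false
        System.out.println(findById(false, lookup, "s1").isPresent()); // prints false
    }
}
```

        With a real TreeDocument named doc, the call would be findById(doc.uniqueIds(), doc::getNodeById, id), where doc and id are placeholders.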
      • getNodeByPath

        public TreeNode getNodeByPath​(String path)
        Gets the node by XPath expression. Currently, only XPath expressions computed by this library are supported. A common use case is docB.getNodeByPath(nodeFromA.getXPath()), which searches for a node at the same path in another document.
        Parameters:
        path - the XPath expression
        Returns:
        the node at this path
      • getOccurringXPaths

        public Set<String> getOccurringXPaths()
        Get all known XPaths.
        Returns:
        the occurring XPaths
      • getOccurringIds

        public Set<String> getOccurringIds()
        Get all known identifiers.
        Returns:
        the occurring identifiers
      • getOccurringTags

        public Set<String> getOccurringTags()
        Get all known tag names.
        Returns:
        the occurring tags
      • getOccurringHashes

        public Set<String> getOccurringHashes()
        Get all known hashes.
        Returns:
        the occurring hashes
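
        The sets returned by the getOccurring... methods above lend themselves to comparing two documents, for example to find XPaths or tags that occur in one document but not in the other. The sketch below shows one way to compute such a difference. The class SetComparison and the helper onlyInFirst are hypothetical, and the sample paths are made up. The input is copied before elements are removed, on the assumption that the returned sets may be backed by the document's internal mappers and should not be modified.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class SetComparison {
    // Hypothetical helper: returns the elements of first that are not in second,
    // without modifying either input set.
    static Set<String> onlyInFirst(Set<String> first, Set<String> second) {
        Set<String> result = new TreeSet<>(first); // defensive, sorted copy
        result.removeAll(second);
        return result;
    }

    public static void main(String[] args) {
        // Stand-ins for docA.getOccurringXPaths() and docB.getOccurringXPaths()
        Set<String> pathsA = new HashSet<>(Arrays.asList("/model[1]", "/model[1]/species[1]"));
        Set<String> pathsB = new HashSet<>(Arrays.asList("/model[1]"));
        for (String path : onlyInFirst(pathsA, pathsB)) {
            System.out.println(path); // prints /model[1]/species[1]
        }
    }
}
```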
      • getNodeStats

        public HashMap<String,​Integer> getNodeStats()
        Gets the node statistics as a map `tag name` => `number of nodes sharing this tag`.
        Returns:
        the node stats
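
        A map of tag names to counts, like the one returned by getNodeStats, is unordered. To report the most frequent tags first, the entries can be sorted by value, as in the following sketch. The class NodeStatsReport, the helper mostFrequentFirst, and the sample counts are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NodeStatsReport {
    // Hypothetical helper: returns the entries sorted by count (descending),
    // with ties broken by tag name (ascending) for a stable report.
    static List<Map.Entry<String, Integer>> mostFrequentFirst(Map<String, Integer> stats) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(stats.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries;
    }

    public static void main(String[] args) {
        // Stand-in for doc.getNodeStats(); the counts are made up
        Map<String, Integer> stats = new HashMap<>();
        stats.put("species", 3);
        stats.put("reaction", 3);
        stats.put("model", 1);
        for (Map.Entry<String, Integer> e : mostFrequentFirst(stats)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```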
      • dump

        public String dump()
        Dump mechanism for debugging purposes.
        Returns:
        a string describing this object for debugging