org.apache.oodt.cas.pushpull.filerestrictions
Class FileRestrictions

java.lang.Object
  extended by org.apache.oodt.cas.pushpull.filerestrictions.FileRestrictions

public class FileRestrictions
extends Object

This class allows the creation of restrictions for files and directories created below an actual directory which is passed into the constructor. These restrictions are loaded by passing a FileInputStream which contains an XML file into the #loadRestrictions(InputStream) method and can be tested against by using the #isAllowed(VirtualFile) method.

 The XML file schema is:
        <root>
	   <variables>
	      <variable name="variable-name">
	         <type>INT-or-STRING</type>
	         <value>variable-value</value>
	         <precision>
	            <locations>number-of-fill-locations</locations>
	            <fill>fill-value</fill>
	            <side>front-or-back</side>
	         </precision>
	      </variable>
	      ...
	      ...
	   </variables>

	   <methods>
	      <method name="method-name">
	         <args>
	            <arg name="argument-name">
	               <type>INT-or-STRING</type>
	            </arg>
	            ...
	            ...
	         </args>
	         <action>method-behavior</action>
	      </method>
	      ...
	      ...
	   </methods>
   
	   <dirstruct name="root-directory-name">
	      <nodirs/>
	      <nofiles/>
	      <file name="file-name"/>
	      <dir name="directory-name">
	         <nodirs/>
	         <nofiles/>
	         <file name="file-name"/>
	         <dir name="directory-name">
	            ...
	            ...
	         </dir>
	      </dir>
	      ...
	      ...
	   </dirstruct>
	</root>
 
<variables> and <methods> can be created in this XML file so that they can be used in the <dirstruct> portion of the XML file. They can be used inside the <dir> and <file> elements within the <dirstruct> element to allow for varying directory and file names beyond the capability of regular expressions (which are also allowed).

VARIABLES (OPTIONAL):

Let's start with the <variables> portion of the XML file. Any number of <variable> elements can be specified inside the <variables> tag. Each <variable> element must have a 'name' parameter, which is the name of the <variable>. Every <variable> is global within the XML file in which it is declared (it is not usable in another XML file unless redeclared there), so each name can be applied to only one <variable>. Names are case-sensitive. A <variable> element has three possible sub-elements: <type> and <value> are required, and <precision> is optional. <type> must be either INT or STRING, in all UPPERCASE (floating point numbers are not yet supported). It specifies the type of the value given in <value>, which lets you treat a numerical value as either an integer or a string. <precision> ensures that an integer or string takes up a certain number of characters, which is especially useful when dealing with dates. For instance, say you had the following in your XML file:
        <variable name="myVariable">
	   <type>INT</type>
	   <value>3</value>
	</variable>
 
When myVariable was finally returned it would look like 3; for dates, however, you would often like 03 returned. You can specify this by adding <precision> to the XML above:
        <variable name="myVariable">
	   <type>INT</type>
	   <value>3</value>
	   <precision>
	      <locations>2</locations>
	      <fill>0</fill>
	      <side>front</side>
	   </precision>
	</variable>
 
This ensures that the number is always printed with 2 digits: if the number takes up fewer than 2 digits, the fill value 0 is added to the front of the integer, which in this example gives us 03. Note: <value>03</value> would NOT accomplish the same!
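The padding behavior described above can be sketched in Java. This is only an illustration: the class PrecisionSketch and the method applyPrecision are hypothetical names, not part of the OODT API, and the sketch assumes that <side>back</side> appends the fill value instead of prepending it.

```java
// Hypothetical helper for illustration; not part of the OODT API.
class PrecisionSketch {

    /**
     * Pads value with fill until it occupies at least 'locations' characters.
     * frontSide == true mirrors <side>front</side>; false is assumed to mirror
     * <side>back</side>.
     */
    static String applyPrecision(String value, int locations, char fill, boolean frontSide) {
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < locations) {
            if (frontSide) {
                sb.insert(0, fill);   // add the fill value to the front
            } else {
                sb.append(fill);      // add the fill value to the back
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // INT value 3 with <locations>2</locations>, <fill>0</fill>, <side>front</side>
        System.out.println(applyPrecision(Integer.toString(3), 2, '0', true));  // prints 03
        // A value that already fills both locations is left unchanged
        System.out.println(applyPrecision(Integer.toString(23), 2, '0', true)); // prints 23
        // In plain Java, parsing "03" as an integer drops the leading zero
        System.out.println(Integer.parseInt("03"));                             // prints 3
    }
}
```

The last line shows one plausible reason why <value>03</value> does not help for an INT variable: once the text is parsed as an integer, the leading zero is gone.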

METHODS (OPTIONAL):

Next, let's look at the <methods> portion of the XML file. Each <method> element must have a 'name' parameter, which is the name of the <method>. Every <method> is global in the same way as every <variable>, and method names are likewise case-sensitive and must be unique. A <method> element may contain an <args> sub-element; it is optional and only needed if the method takes arguments. If an <args> element is given, it should contain at least one <arg> element, and it may contain as many <arg> elements as the method needs. A <method> element specifies what would be known in Java code as the method signature, so all that is declared for each argument is its name and type: each <arg> element must have a 'name' parameter, which is the name of the argument, and must contain a <type> sub-element, so that it is known how to treat the argument when the method is used within the <dirstruct> section of the XML file. The <method> element also requires an <action> sub-element, which contains the behavior of the <method>. Before going into detail about what can be placed within the <action> element, let's first cover some syntax requirements for the XML file.

SYNTAX REQUIREMENTS:

When a <variable> is used it must be preceded by $ and enclosed in {} (e.g. ${myVariable}).
When a <method> is used it must be preceded by % and end with () (e.g. %myMethod(), or %myMethod(12,9) if arguments are given).
When a <method> argument (<arg> element) is used it must be preceded by $ (e.g. $myArg).
When a literal integer is used it must be preceded by # (e.g. #234).
When a literal string is used it must be enclosed in " (e.g. "my age is 56 -- no not really").

NOTE: When passing arguments into methods the string and integer literal rules do not need to be followed because you have already defined what each argument type should be and they will be evaluated as such.

NOTE*: Also note that at present a <variable> cannot be passed as an argument to the methods. Just use the <variable> where needed inside the <action> element. This feature should hopefully be added in a later release.
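To make the ${...} reference syntax concrete, here is a minimal Java sketch of how a reference such as ${myVariable} could be expanded into its value. The class VariableSubstitutionSketch, the substitute helper, and the regular expression are assumptions made for illustration; they are not taken from the OODT source, and the real parser may work differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of ${name} expansion; not the OODT implementation.
class VariableSubstitutionSketch {

    // Matches ${name} and captures the name between the braces.
    private static final Pattern VAR_REF = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces every ${name} in text with the value declared for that name. */
    static String substitute(String text, Map<String, String> variables) {
        Matcher m = VAR_REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = variables.get(m.group(1)); // names are case-sensitive
            if (value == null) {
                throw new IllegalArgumentException("Undeclared variable: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("myVariable", "03");
        System.out.println(substitute("report_${myVariable}.txt", vars)); // prints report_03.txt
    }
}
```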

METHOD'S ACTION ELEMENT USAGE:

The <action> element will evaluate expressions that contain both integers and strings. It obeys the rules of mathematical precedence and will also handle parentheses. It also, like Java, still follows the order of precedence when strings are present. That is, if you have the expression:
   #2+#4+" years old, going on "+#2+#4
It would evaluate to:
   6 years old, going on 24
You may use any <variable> declared within the same XML file and may also use any argument (<arg> element) declared within that <method>. Also string and integer literals may be used. Currently the only operators supported are +,-,*,/ (which are respectively: addition, subtraction, multiplication, and division). Parentheses, (), and embedded parentheses, (()()), are also all allowed.
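Because the <action> element is described as following Java's precedence rules when strings are present, the example expression can be checked directly in Java. The sketch below only demonstrates Java's own left-to-right evaluation of +; the class name ActionPrecedenceSketch is hypothetical, and the grouped variant assumes that parentheses in an <action> behave as they do in Java.

```java
// Demonstrates Java's own evaluation order; it does not call any OODT code.
class ActionPrecedenceSketch {

    /** Java equivalent of #2+#4+" years old, going on "+#2+#4 */
    static String ungrouped() {
        // 2+4 is integer addition (6); once a string is involved, the remaining
        // + operators concatenate, so the trailing 2 and 4 are appended as text.
        return 2 + 4 + " years old, going on " + 2 + 4;
    }

    /** Same expression with the trailing integers grouped in parentheses. */
    static String grouped() {
        return 2 + 4 + " years old, going on " + (2 + 4);
    }

    public static void main(String[] args) {
        System.out.println(ungrouped()); // prints 6 years old, going on 24
        System.out.println(grouped());   // prints 6 years old, going on 6
    }
}
```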

DIRSTRUCT:

The final section is the main purpose of the XML file: it controls which directories the crawler is allowed to crawl and which files are allowed. The <dirstruct> element requires a 'name' parameter, which is the path to the root directory to be considered (that is, directories outside the given directory are unimportant and will not be crawled). Your root directory path should stop at the first directory in which you are interested in more than one of its sub-directories or want file(s) inside it. For example, let's say we want to crawl a remote site that has the following directory structure:
        -parent
           -child1
              -grandChild1
                 -greatGrandChild1
              -file1
                 -greatGrandChild2
              -grandChild2
              -file1
           -child2 
              -file1
           -child3
              -file1
              -file2
              -grandChild1
                 -file1
                 -file2
           -child4
 
Now, say we are only interested in directories and files below the two 'grandChild1' directories shown. This means that for our <dirstruct> 'name' parameter we would put name="/parent", because we need access to both the 'child1' and 'child3' subdirectories. To avoid crawling the 'child2' and 'child4' directories we have to specify <dir> elements. This gives us the following XML:
        <dirstruct name="/parent">
	   <dir name="child1"/>
	   <dir name="child3"/>
	</dirstruct>
 
This would restrict the directories allowed under 'parent' to directories named either 'child1' or 'child3'; all other directory names will be rejected. However, we have not yet specified any restrictions on files allowed beneath 'parent', so we have to add the <nofiles/> element:
        <dirstruct name="/parent">
           <nofiles/>
           <dir name="child1"/>
	   <dir name="child3"/>
	</dirstruct> 
 
Now the only things acceptable below 'parent' are 'child1' and 'child3'. We still have to add restrictions under 'child1' and 'child3'. Since under 'child1' we only want 'grandChild1', we have to add another <dir> element and also a <nofiles/> element:
        <dirstruct name="/parent">
	   <nofiles/>
	   <dir name="child1">
              <nofiles/>
	      <dir name="grandChild1"/>
           </dir>
	   <dir name="child3"/>
	</dirstruct>  
 
We have to do the same also for 'child3', giving us:
        <dirstruct name="/parent">
	   <nofiles/>
	   <dir name="child1">
	      <nofiles/>
	      <dir name="grandChild1"/>
	   </dir>
	   <dir name="child3">
              <nofiles/>
	      <dir name="grandChild1"/>
           </dir>
	</dirstruct>  
 
With this XML specified, the example directory structure above would be limited to:
        -parent
           -child1
              -grandChild1
                 -greatGrandChild1
              -file1
                 -greatGrandChild2
           -child3
              -grandChild1
                 -file1
                 -file2
 
Say we now decide that we only want files below the two 'grandChild1' directories -- that is, no directories. We would change our XML by adding the <nodirs/> element:
        <dirstruct name="/parent">
	   <nofiles/>
	   <dir name="child1">
	      <nofiles/>
	      <dir name="grandChild1">
                  <nodirs/>
              </dir>
	   </dir>
	   <dir name="child3">
	      <nofiles/>
	      <dir name="grandChild1">
                 <nodirs/>
              </dir>
	   </dir>
	</dirstruct>   
 
Which now restricts our directory structure to:
        -parent
           -child1
              -grandChild1
           -child3
              -grandChild1
                 -file1
                 -file2
 
Let's further specify now that we only want 'file1' in the '/parent/child3/grandChild1' directory. This would change the XML to:
        <dirstruct name="/parent">
	   <nofiles/>
	   <dir name="child1">
	      <nofiles/>
	      <dir name="grandChild1">
	          <nodirs/>
	      </dir>
	   </dir>
	   <dir name="child3">
	      <nofiles/>
	      <dir name="grandChild1">
	         <nodirs/>
                 <file name="file1"/>
              </dir>
	   </dir>
	</dirstruct>
 
Our new allowed directory structure would now be:
        -parent
           -child1
              -grandChild1
           -child3
              -grandChild1
                 -file1
 
NOTES: You would not want to use the <nofiles/> and <file> elements in the same directory (the same goes for the <nodirs/> and <dir> elements), because you would be specifying that you don't want any files in that directory and then contradicting yourself by naming a <file> that is okay to have. A <file> element states that no file other than the one specified is allowed. Having two or more <file> elements in the same directory is allowed, however: they follow the same rules as the <dir> elements in the example above, where only 'child1' and 'child3' were allowed. The two don't cancel each other out.
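The rules walked through above can be summarized in a small Java sketch. It is a simplified model under stated assumptions, not the OODT implementation: the class DirRuleSketch and its fields are hypothetical, names are compared literally (no regular expressions, variables, or methods), and a directory with no <file> or <dir> elements and no <nofiles/> or <nodirs/> element is assumed to allow everything.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the restrictions attached to one <dir> or <dirstruct> element.
class DirRuleSketch {
    boolean noFiles;                                   // true when <nofiles/> is present
    boolean noDirs;                                    // true when <nodirs/> is present
    final List<String> fileNames = new ArrayList<>();  // names from <file name="..."/>
    final List<String> dirNames = new ArrayList<>();   // names from <dir name="..."/>

    /** A file is allowed unless <nofiles/> is set or it is missing from a non-empty <file> list. */
    boolean allowsFile(String name) {
        if (noFiles) {
            return false;
        }
        return fileNames.isEmpty() || fileNames.contains(name);
    }

    /** A directory is allowed unless <nodirs/> is set or it is missing from a non-empty <dir> list. */
    boolean allowsDir(String name) {
        if (noDirs) {
            return false;
        }
        return dirNames.isEmpty() || dirNames.contains(name);
    }

    public static void main(String[] args) {
        // Mirrors <dirstruct name="/parent"><nofiles/><dir name="child1"/><dir name="child3"/>
        DirRuleSketch parent = new DirRuleSketch();
        parent.noFiles = true;
        parent.dirNames.add("child1");
        parent.dirNames.add("child3");
        System.out.println(parent.allowsDir("child1"));   // prints true
        System.out.println(parent.allowsDir("child2"));   // prints false
        System.out.println(parent.allowsFile("file1"));   // prints false

        // Mirrors <dir name="grandChild1"><nodirs/><file name="file1"/></dir>
        DirRuleSketch grandChild1 = new DirRuleSketch();
        grandChild1.noDirs = true;
        grandChild1.fileNames.add("file1");
        System.out.println(grandChild1.allowsFile("file1")); // prints true
        System.out.println(grandChild1.allowsFile("file2")); // prints false
    }
}
```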

ADVANCED USAGES OF DIRSTRUCT:

Regular expressions are allowed in the 'name' parameter of both <dir> and <file> elements. Also any <method> or <variable> element declared can be used within the 'name' parameter of both <dir> and <file> elements. There are also several predefined variables that can be used.

REGULAR EXPRESSIONS:

The regular expressions are parsed by the Pattern class (see its documentation for the rules on specifying regular expressions). Here is an example use of a regular expression:
        <dirstruct name="/.../temp/test">
	   <nofiles/>
	   <dir name="\d{4}-\d{2}-\d{2}">
	      <nodirs/>
	   </dir>
	</dirstruct>
 
This would allow under /.../temp/test only directories whose names are dates of the format YYYY-MM-DD, and would allow only files (no sub-directories) inside those directories.
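The pattern from this example can be tried out with java.util.regex.Pattern. The sketch below assumes that a directory name must match the expression in full (Matcher.matches rather than Matcher.find); the source does not state which is used, and the class name DateDirNameSketch is hypothetical.

```java
import java.util.regex.Pattern;

// Tries the example expression against candidate directory names.
class DateDirNameSketch {

    // Same expression as in the <dir name="..."> attribute above.
    private static final Pattern DATE_DIR = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    /** Returns true when the whole name has the shape YYYY-MM-DD (digits only). */
    static boolean isDateDir(String name) {
        return DATE_DIR.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDateDir("2007-09-07")); // prints true
        System.out.println(isDateDir("2007-9-7"));   // prints false
        System.out.println(isDateDir("backup"));     // prints false
        // The expression checks only the digit layout, not calendar validity.
        System.out.println(isDateDir("9999-99-99")); // prints true
    }
}
```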

PREDEFINED DATE VARIABLES:

There are several predefined date variables that can be used as the <value> of a <variable>. These variables are:
        [DATE.DAY]      - day of today's date
        [DATE.MONTH]    - month of today's date
        [DATE.YEAR]     - year of today's date
        [DATE-N.DAY]    - the day of the date N days ago
        [DATE-N.MONTH]  - the month of the date N days ago
        [DATE-N.YEAR]   - the year of the date N days ago
        [DATE+N.DAY]    - the day of the date N days from now
        [DATE+N.MONTH]  - the month of the date N days from now
        [DATE+N.YEAR]   - the year of the date N days from now
 
        -sorry, no DayOfYear implemented yet -- hopefully in a later release
 
Usage:
        <root>
 	   <variables>
 	      <variable name="todaysDay">
 	         <type>INT</type>
 	         <value>[DATE.DAY]</value>
	         <precision>
	            <locations>2</locations>
	            <fill>0</fill>
	            <side>front</side>
	         </precision>
	      </variable>
	   </variables>
	   <dirstruct name="/path/to/parent/dir">
	      <nofiles/>
	      <dir name="MyFiles">
	         <nodirs/>
	         <file name="MyPaper_${todaysDay}"/>
	      </dir>
	   </dirstruct>
	</root>
  
This would allow in /path/to/parent/dir/MyFiles only a file whose name starts with MyPaper_ and ends with the current day of the month. For example, if today's date was 03/23/2005, then the file name allowed would be MyPaper_23.
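The predefined date variables listed above amount to simple date arithmetic. The following Java sketch shows one way to compute the same quantities with java.time.LocalDate; it is an assumption-laden illustration (the class DateVariableSketch and its methods are hypothetical, and the OODT implementation may use different date classes).

```java
import java.time.LocalDate;

// Hypothetical equivalents of [DATE.DAY], [DATE-N.MONTH], [DATE+N.YEAR], and so on.
class DateVariableSketch {

    // offsetDays = 0 mirrors [DATE...], -N mirrors [DATE-N...], +N mirrors [DATE+N...]
    static int day(LocalDate today, int offsetDays) {
        return today.plusDays(offsetDays).getDayOfMonth();
    }

    static int month(LocalDate today, int offsetDays) {
        return today.plusDays(offsetDays).getMonthValue();
    }

    static int year(LocalDate today, int offsetDays) {
        return today.plusDays(offsetDays).getYear();
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2005, 3, 23);  // the date used in the example above
        System.out.println(day(today, 0));     // [DATE.DAY]      prints 23
        System.out.println(day(today, -30));   // [DATE-30.DAY]   prints 21
        System.out.println(month(today, -30)); // [DATE-30.MONTH] prints 2
        System.out.println(day(today, 10));    // [DATE+10.DAY]   prints 2
        System.out.println(month(today, 10));  // [DATE+10.MONTH] prints 4
        System.out.println(year(today, 0));    // [DATE.YEAR]     prints 2005
    }
}
```

Note that the offset is applied to the whole date before a field is extracted, so [DATE-30.DAY] on 03/23/2005 is 21 (February 21), not a negative number.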

METHOD AND VARIABLE USAGE IN DIRSTRUCT:

Here is an example of using <variables> and <methods>:
        <root>
	   <variables>
	      <variable name="DAY">
	         <type>INT</type>
	         <value>[DATE.DAY]</value>
	         <precision>
	            <locations>2</locations>
	            <fill>0</fill>
	            <side>front</side>
	         </precision>
	      </variable>
	      <variable name="MONTH">
	         <type>INT</type>
	         <value>[DATE.MONTH]</value>
	         <precision>
	            <locations>2</locations>
	            <fill>0</fill>
	            <side>front</side>
	         </precision>
	      </variable>
	      <variable name="YEAR">
	         <type>INT</type>
	         <value>[DATE.YEAR]</value>
	      </variable>
	   </variables>
   
	   <methods>
	      <method name="ADD">
	         <args>
	            <arg name="1">
	               <type>INT</type>
	            </arg>
	         </args>
	         <action>"THE_YEAR_PLUS_"+$1+": "+(${YEAR}+$1)</action>
	      </method>
	      <method name="HOW_OLD_AM_I">
	         <action>${YEAR}-#1984</action>
	      </method>
	      <method name="DATE">
	         <action>${YEAR}+"-"+${MONTH}+"-"+${DAY}</action>
	      </method>
	   </methods>

	   <dirstruct name="/path/to/parent/dir">
	      <nofiles/>
	      <dir name="AGE_%HOW_OLD_AM_I()"/>
	      <dir name="%DATE()">
	         <nodirs/>
	         <file name="%ADD(5)"/>
	      </dir>
	   </dirstruct>
	</root>
 
This would accept only the directories under /path/to/parent/dir which had the name (given today is 9/7/2007) 'AGE_23' or '2007-09-07'. It would allow any file or directory under 'AGE_23', but would only allow a file with the name 'THE_YEAR_PLUS_5: 2012' in the directory '2007-09-07'.
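The names quoted above can be reproduced by hand. The Java sketch below recomputes them for 9/7/2007 as a cross-check of the arithmetic; the class WorkedExampleSketch and its methods are hypothetical stand-ins for the three <method> elements and do not use any OODT code.

```java
import java.time.LocalDate;

// Recomputes the names produced by the HOW_OLD_AM_I, DATE, and ADD methods in the example.
class WorkedExampleSketch {

    /** ${YEAR}-#1984 */
    static String howOldAmI(LocalDate today) {
        return Integer.toString(today.getYear() - 1984);
    }

    /** ${YEAR}+"-"+${MONTH}+"-"+${DAY}, with MONTH and DAY padded to 2 digits */
    static String date(LocalDate today) {
        return String.format("%d-%02d-%02d",
                today.getYear(), today.getMonthValue(), today.getDayOfMonth());
    }

    /** "THE_YEAR_PLUS_"+$1+": "+(${YEAR}+$1) */
    static String add(LocalDate today, int n) {
        return "THE_YEAR_PLUS_" + n + ": " + (today.getYear() + n);
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2007, 9, 7);
        System.out.println("AGE_" + howOldAmI(today)); // prints AGE_23
        System.out.println(date(today));               // prints 2007-09-07
        System.out.println(add(today, 5));             // prints THE_YEAR_PLUS_5: 2012
    }
}
```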

Author:
bfoster

Method Summary
static boolean isAllowed(ProtocolPath path, VirtualFile root)
           
static boolean isAllowed(VirtualFile file, VirtualFile root)
           
static LinkedList<String> toStringList(VirtualFile root)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

isAllowed

public static boolean isAllowed(ProtocolPath path,
                                VirtualFile root)
Parameters:
path -
Returns:
The initial cd directory which needs to be changed to (in order to take care of possible auto-mounted directories)

isAllowed

public static boolean isAllowed(VirtualFile file,
                                VirtualFile root)

toStringList

public static LinkedList<String> toStringList(VirtualFile root)


Copyright © 1999-2011 Apache Incubator. All Rights Reserved.