< Custom Rules | Request Filters >

Basic Rules


Specify here special rules to customize the Job Package download process. The available basic rules are the following (a small configuration sketch follows the list):

  1. Depth
    Defines the maximum recursion depth allowed during a Job Package download. With regard to page depth, a website structure can be viewed as in the diagram shown under "Max. recursion depth" below;

  2. Links Limit
    Defines the maximum number of links that may be followed during the Job Package download process. When this limit is reached, the download process stops;

  3. Time Limit
    Defines the maximum time (in milliseconds) that a Job Package download process may run. When this time limit is reached, the download process stops.
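
Taken together, the three basic rules can be pictured as a small limits object that the crawler consults before following another link. The class and method names in the sketch below are assumptions chosen for illustration, not the actual ItSucks configuration API.

    // Illustrative holder for the three basic rules (not the ItSucks API).
    public class BasicRules {
        private final int maxDepth;       // -1 means unlimited recursion depth
        private final int maxLinks;       // -1 means no limit on followed links
        private final long timeLimitMs;   // -1 means no time limit (milliseconds)

        public BasicRules(int maxDepth, int maxLinks, long timeLimitMs) {
            this.maxDepth = maxDepth;
            this.maxLinks = maxLinks;
            this.timeLimitMs = timeLimitMs;
        }

        /** True if a link at the given depth may still be followed. */
        public boolean depthAllowed(int depth) {
            return maxDepth < 0 || depth <= maxDepth;
        }

        /** True if another link may be followed after the given count. */
        public boolean linksAllowed(long followedLinks) {
            return maxLinks < 0 || followedLinks < maxLinks;
        }

        /** True if the job may keep running after the given elapsed time. */
        public boolean timeAllowed(long elapsedMs) {
            return timeLimitMs < 0 || elapsedMs < timeLimitMs;
        }
    }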

File Size Filter
These settings allow decisions to be made based on the file size of web resources, for example to avoid downloading large files; a sketch of such a size check follows the property list below.

The available file size filter properties are:

  1. File Size (from)
    Defines the lower bound of the file size interval considered by this filter;

  2. File Size (to)
    Defines the upper bound of the file size interval considered by this filter;

  3. Reply "Content-Length" not available action
    Defines the action to take when the reply does not contain a "Content-Length" header, so the file size cannot be checked against File Size (from) and File Size (to). At the moment there are two possible values.
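
As an illustration only, such a size check could look like the sketch below, which issues a HEAD request and inspects the reply's Content-Length header. The class name, the boolean flag used for the "Content-Length not available" case and the overall structure are assumptions, not the ItSucks implementation.

    import java.net.HttpURLConnection;
    import java.net.URL;

    // Sketch of a size-interval check based on the Content-Length reply header.
    public class FileSizeFilter {
        private final long fromBytes;
        private final long toBytes;
        private final boolean acceptWhenLengthUnknown; // "Content-Length not available" action (assumption)

        public FileSizeFilter(long fromBytes, long toBytes, boolean acceptWhenLengthUnknown) {
            this.fromBytes = fromBytes;
            this.toBytes = toBytes;
            this.acceptWhenLengthUnknown = acceptWhenLengthUnknown;
        }

        /** Returns true if the resource behind the URL passes the size filter. */
        public boolean accepts(URL url) throws Exception {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("HEAD");
            long length = conn.getContentLengthLong(); // -1 if the header is absent
            conn.disconnect();
            if (length < 0) {
                return acceptWhenLengthUnknown; // size unknown: apply the configured action
            }
            return length >= fromBytes && length <= toBytes;
        }
    }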

Limits

Max. recursion depth:

The recursion depth defines how deep ItSucks should crawl through linked web sites. Think about a site structure like this:

                                      _ site3.html
                                     /
                         _ site2.html
                        /
            _ site1.html
           /            \_ yellow.png
 index.html
           \_ background.png

If you set the recursion depth to 0, you will only get the index.html.
With a value of 1, you will get index.html, site1.html and background.png.
With a value of 2, you will get index.html, site1.html, background.png, site2.html and yellow.png.
With a value of 3, you will get index.html, site1.html, background.png, site2.html, yellow.png and site3.html.
When set to -1, the depth is unlimited.
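
The following toy program reproduces this example: it crawls the hard-coded link structure from the diagram with a breadth-first search and prints which files each maximum depth reaches. It is purely illustrative and not part of ItSucks.

    import java.util.*;

    // Toy breadth-first crawl over the site structure from the diagram above.
    public class DepthDemo {
        static final Map<String, List<String>> LINKS = Map.of(
            "index.html", List.of("site1.html", "background.png"),
            "site1.html", List.of("site2.html", "yellow.png"),
            "site2.html", List.of("site3.html"));

        static Set<String> crawl(String start, int maxDepth) {
            Set<String> visited = new LinkedHashSet<>();
            Deque<Map.Entry<String, Integer>> open = new ArrayDeque<>();
            open.add(Map.entry(start, 0));
            while (!open.isEmpty()) {
                Map.Entry<String, Integer> entry = open.poll();
                if (!visited.add(entry.getKey())) continue;                    // already downloaded
                if (maxDepth >= 0 && entry.getValue() >= maxDepth) continue;   // depth limit reached (-1 = unlimited)
                for (String link : LINKS.getOrDefault(entry.getKey(), List.of())) {
                    open.add(Map.entry(link, entry.getValue() + 1));
                }
            }
            return visited;
        }

        public static void main(String[] args) {
            for (int depth = 0; depth <= 3; depth++) {
                System.out.println("depth " + depth + ": " + crawl("index.html", depth));
            }
        }
    }

Running it prints, depth by depth, the same file sets listed above.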

Time limit

Defines a time limit. If the time limit is reached, no more links are added to the "open" list. After all links in the "open" list are finished, the download ends.

Max. links to follow

Defines a maximum number of links (URLs) to follow. When this limit is reached, no more links are added to the "open" list. After all links in the "open" list are finished, the download ends.
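
The time limit and the link limit act on the "open" list in the same way: once a limit is hit, newly discovered links are no longer queued, but everything already queued is still finished. The sketch below illustrates that behaviour; the class and method names are assumptions for illustration, not ItSucks internals.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Illustrative "open" list that stops accepting links once a limit is reached.
    public class OpenListLimits {
        private final Deque<String> open = new ArrayDeque<>();
        private final long maxLinks;      // -1 = no link limit
        private final long timeLimitMs;   // -1 = no time limit
        private final long startMs = System.currentTimeMillis();
        private long queuedLinks = 0;

        public OpenListLimits(long maxLinks, long timeLimitMs) {
            this.maxLinks = maxLinks;
            this.timeLimitMs = timeLimitMs;
        }

        /** Queue a newly discovered link unless a limit has already been reached. */
        public void offer(String url) {
            boolean linkLimitHit = maxLinks >= 0 && queuedLinks >= maxLinks;
            boolean timeLimitHit = timeLimitMs >= 0
                    && System.currentTimeMillis() - startMs >= timeLimitMs;
            if (linkLimitHit || timeLimitHit) {
                return; // limit reached: stop adding, but keep processing what is queued
            }
            open.add(url);
            queuedLinks++;
        }

        /** Next link to download, or null when the "open" list is drained. */
        public String next() {
            return open.poll();
        }
    }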

URL prefix Filter

Defines a prefix for the URL. When set, only URLs that begin with the prefix are accepted. This can be handy if only a specific directory should be downloaded. Only a plain string is allowed, no regular expressions.

Example: http://www.example.com/section1/
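
Conceptually this is a plain string comparison against the start of each URL, as in the following sketch (the class and method names are illustrative, not the ItSucks API):

    // Plain-string prefix check; no regular expressions involved.
    public class UrlPrefixFilter {
        public static boolean accepts(String url, String prefix) {
            return url.startsWith(prefix);
        }

        public static void main(String[] args) {
            String prefix = "http://www.example.com/section1/";
            System.out.println(accepts("http://www.example.com/section1/page.html", prefix)); // true
            System.out.println(accepts("http://www.example.com/section2/page.html", prefix)); // false
        }
    }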

Hostname Filter

A host filter can be set if ItSucks should follow only links whose hostname matches a regular expression. To do so, remove the ".*" entry from the "Allowed Hostname" box and add something like ".*google.de". In this case ItSucks will only retrieve files from hosts like "images.google.de", "google.de" or "www.google.de". Be careful not to remove all entries from the filter list; if the list is empty, no hostname is allowed.
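
The check can be pictured as matching the host part of each URL against the list of allowed patterns, as in this illustrative sketch (not the actual ItSucks code):

    import java.net.URI;
    import java.util.List;
    import java.util.regex.Pattern;

    // The host part of the URL must match at least one allowed pattern.
    // An empty pattern list rejects every hostname.
    public class HostnameFilter {
        public static boolean accepts(String url, List<String> allowedPatterns) throws Exception {
            String host = new URI(url).getHost();
            return allowedPatterns.stream().anyMatch(p -> Pattern.matches(p, host));
        }

        public static void main(String[] args) throws Exception {
            List<String> allowed = List.of(".*google.de");
            System.out.println(accepts("http://images.google.de/logo.png", allowed)); // true
            System.out.println(accepts("http://www.example.com/index.html", allowed)); // false
        }
    }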

"Save to Disk" Filter

This filter controls which file types are saved to disk. Only files matching one of the regular expressions are saved. For example, to accept only JPEG files, remove the ".*" entry from the list and add ".*jpg$". If all entries are removed from the filter list, no files will be saved to disk.
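
Conceptually the filter matches each URL against the configured expressions and writes the file only on a match, as in this illustrative sketch (not the actual ItSucks code):

    import java.util.List;
    import java.util.regex.Pattern;

    // A file is saved only if its URL matches at least one configured expression.
    // An empty pattern list means nothing is saved to disk.
    public class SaveToDiskFilter {
        public static boolean shouldSave(String url, List<String> patterns) {
            return patterns.stream().anyMatch(p -> Pattern.matches(p, url));
        }

        public static void main(String[] args) {
            List<String> jpegOnly = List.of(".*jpg$");
            System.out.println(shouldSave("http://www.example.com/photo.jpg", jpegOnly)); // true
            System.out.println(shouldSave("http://www.example.com/index.html", jpegOnly)); // false
        }
    }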

< Custom Rules | Request Filters >