Rules Files Format

This page describes the raw format of a Rules File.

The format is used in Dirk’s configuration file, mainly to set Organization Rules.

In most cases, you can use Dirk’s internal Rule Editor to configure rules. However, there are some cases where it may be necessary or easier to configure them manually.

This is intended to be a reference. For general information about rules and configuration, please visit here.

Layout of a Configuration File

Comments

The configuration file supports single-line comments, indicated by the ‘#’ character at the beginning of a line.

Example:

# This is a comment and will be ignored

Sections

The configuration file is divided into sections. Each section is introduced by a separator line in the following format:

<SectionName>

Each separator must begin at the start of a new line, and no text should follow it on the same line.
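The comment and separator rules above can be illustrated with a small Python sketch. This is not Dirk's parser: the function name split_sections is hypothetical, and two behaviours are assumptions that the format description does not confirm, namely that blank lines are skipped and that lines before the first separator are dropped.

```python
import re

# A separator is a whole line of the form <SectionName>; only trailing
# whitespace is tolerated after it (an assumption of this sketch).
SEPARATOR = re.compile(r"<(\w+)>\s*")


def split_sections(text):
    """Split rules-file text into a list of (section_name, lines) pairs."""
    sections = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines carry no meaning
        if line.startswith("#"):
            continue  # comment: '#' at the beginning of a line
        match = SEPARATOR.fullmatch(line)
        if match:
            # Start a new section; repeated names stay separate entries.
            current = (match.group(1), [])
            sections.append(current)
        elif current is not None:
            current[1].append(line.rstrip())
    return sections


sample = """\
# Ignore backup files
<ignore>
*.bak

<clean>
name=Clean all empty folders
delete_empty_directories=1
"""

for name, lines in split_sections(sample):
    print(name, lines)
```

A list of pairs is used rather than a dictionary so that repeated sections of the same type remain distinct.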

Currently, the following sections are recognized: <ignore>, <move>, <dedupe>, and <clean>. Each is described below.

Globs

Globs are a convenient way to specify groups of files in a folder that match a specific pattern, and they are used in various places throughout the configuration file.

The glob patterns should be mostly familiar to anyone who has used them in other applications.

In general, the following rules apply to Glob patterns used in the various rules:

  • If NO leading slash is given, then the pattern will be checked regardless of the path (so, it will work in all directories that are descendants of the current directory)

  • If a leading slash IS given, then the pattern will only apply to the current directory.

  • A single ‘*’ means all files in a directory

  • A double ‘**’ means all files in the current directory and all of its descendants

  • A ‘>’ character at the end of a pattern indicates that it is NOT case-sensitive

Example Globs:

Glob Pattern       Description
*                  match all entries in every directory that is processed
\*                 match only entries in the current directory
*.jpg              match all jpg files in every directory that is processed
\*.jpg             match only the jpg files in the current directory
*.jpg>             match all jpg files (or JPG, or Jpg, etc) in every directory that is processed
secret\**\*.doc    match all doc files located in any directory named ‘secret’ or any descendants of a directory named ‘secret’
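The effect of the trailing ‘>’ can be sketched in Python using the standard fnmatch module. This is an illustration only, not Dirk's matcher: the function name name_matches is hypothetical, it covers filename patterns such as *.jpg only (leading slashes and ‘**’ are not modelled), and fnmatch's own ‘?’ and ‘[...]’ wildcards may not correspond to anything in Dirk.

```python
from fnmatch import fnmatchcase


def name_matches(pattern, filename):
    """Match a filename against a pattern, honouring a trailing '>'.

    A trailing '>' marks the pattern as NOT case-sensitive; without it the
    comparison is case-sensitive.
    """
    if pattern.endswith(">"):
        # Drop the marker and compare both sides in lower case.
        return fnmatchcase(filename.lower(), pattern[:-1].lower())
    return fnmatchcase(filename, pattern)


print(name_matches("*.jpg", "photo.jpg"))
print(name_matches("*.jpg", "Photo.JPG"))
print(name_matches("*.jpg>", "Photo.JPG"))
```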

Ignore Section

Files that are ignored will be removed from consideration for ALL operations - they essentially become invisible to Dirk.

Examples:


<ignore>

# Ignore 'tmp' directories
tmp\

# Ignore 'bak' files
*.bak

Move Section

This section defines rules for moving local files contained within a Dirk-managed repository.

Note that there can be multiple <move> sections in a file, each with its own set of rules.

Examples with extended documentation:


<move>

# There may be multiple match variables in each section
# Take note of the globbing patterns used for match_directories:
#
#     /dump/pictures      = will include /dump/pictures itself.
#     /dump/pictures/*    = will include any immediate subdirectories of /dump/pictures (but will NOT include
#                           /dump/pictures itself !!)
#     /dump/pictures/**   = will recursively include any immediate OR more-deeply nested subdirectories of /dump/pictures
#                           (but will NOT include /dump/pictures itself !!)
#
# So if you want to match any files within /dump/pictures AND its entire hierarchy, you'd have to specify it as follows:
# match_directories=/dump/pictures;/dump/pictures/**
#

name=Move multimedia files
match_directories=/**

# Example for matching image files
match_files=*.gif>;*.jpeg>;*.jpg>;*.pcx>;*.png>;*.tif>;*.tiff>

# Example for matching video files
match_files=*.avi>;*.mkv>;*.mov>;*.mp4>;*.mpeg>;*.mpg>;*.qt>;*.vob>;*.webm>;*.wmv>

# Example for matching audio files
match_files=*.aa>;*.aac>;*.aax>;*.flac>;*.m4a>;*.m4b>;*.m4p>;*.mp3>;*.ogg>;*.oga>;*.wav>;*.wma>

# Set a minimum filesize. The default is 1 byte, which ignores 0-byte files. Set this to 0 to match ALL files
# regardless of their size, or to any larger value to ignore files smaller than that value
match_min_filesize=1


#
# Variables can be specified using the format: {VariableName}. See the "Path Variables" reference page for more
# information.
#
# The literals '{' or '}' can be specified as '{{' and '}}', respectively.
#
# If a file with the same name already exists in the destination path, an incrementing number is appended to the name
# until an unused name is found (e.g.: Blah-1.jpg, Blah-2.jpg, Blah-3.jpg, Blah-4.jpg, etc.)
#
# Once a file is moved, it is considered to be in its "canonical" location. This means that if copies of the file
# exist elsewhere, those duplicate copies may be deleted (depending on settings).
#
# NOTE: The destination_directory is, by default, an absolute path. So, if you specify /dir_name/photos, that refers to
#       the /dir_name directory from the root of the system. To make a path relative to the directory where the rule is
#       defined, either do not prepend the path with any slashes, OR prepend the path with a dot (eg: ./dir_name). To
#       make a path relative to the base directory of the repository, prepend the path with the {RepoBase} variable
#       (eg: {RepoBase}/dir_name)
#

# There may be only one of each destination variable in each section
destination_directory=./pictures/{CreateYear}/{CreateMonth}/{CreateDay}
destination_name={OriginalFilename}

# If delete_subsequent_duplicates is set, then subsequent identical files found after an initial file is moved will be
# automatically deleted; otherwise they will be kept
delete_subsequent_duplicates=0

# The Move Conflict Strategy is necessary to deal with cases where multiple files are matched to be moved to the same
# destination path. There are currently three strategies to deal with this case:
#   rename_on_commit = Allows files to be renamed when queued Move operations are being committed. This is the strategy
#                      that is most likely to succeed at moving files. However, this comes at the cost of being the
#                      least deterministic (ie, the paths that are in the Action Queue will not always match the final
#                      path of the moved file). This is the default.
#   rename_on_organize = This allows files to be renamed at Organization time if it is discovered that there would be
#                        a conflict otherwise. This is more deterministic than rename_on_commit, however there are some
#                        scenarios where the filesystem can change between when the file is added to the Action Queue
#                        and when it is committed to the filesystem - in which case the move will fail.
#   block = Blocks a move as soon as a conflict is detected at Organization time; if the Organize step discovers a
#           conflict, then the file will simply not be moved
move_conflict_strategy=rename_on_commit
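Two behaviours described in the comments of the example above, variable substitution with escaped braces and incremental renaming on a name clash, can be sketched in Python as follows. This is not Dirk's implementation. The function names expand and unique_name are hypothetical, the variable values are made up for illustration, and the "-N before the extension" naming is inferred from the Blah-1.jpg example only.

```python
import re
from pathlib import PurePosixPath


def expand(template, variables):
    """Replace {Name} with variables[Name]; '{{' and '}}' become '{' and '}'.

    An unknown variable name raises KeyError (a choice made for this sketch).
    """
    def substitute(match):
        token = match.group(0)
        if token == "{{":
            return "{"
        if token == "}}":
            return "}"
        return str(variables[match.group(1)])

    # Escaped braces are listed first so they win over the {Name} form.
    return re.sub(r"\{\{|\}\}|\{(\w+)\}", substitute, template)


def unique_name(filename, existing):
    """Append -1, -2, ... before the extension until the name is unused.

    `existing` is a set of names already present in the destination; a real
    tool would consult the filesystem instead.
    """
    if filename not in existing:
        return filename
    path = PurePosixPath(filename)
    counter = 1
    while f"{path.stem}-{counter}{path.suffix}" in existing:
        counter += 1
    return f"{path.stem}-{counter}{path.suffix}"


# Hypothetical values; the real format of these variables may differ.
values = {"CreateYear": "2021", "CreateMonth": "07", "CreateDay": "04"}
print(expand("./pictures/{CreateYear}/{CreateMonth}/{CreateDay}", values))
print(expand("{{literal}}", values))
print(unique_name("Blah.jpg", {"Blah.jpg", "Blah-1.jpg"}))
```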

Dedupe Section

This section defines rules for removing duplicate files contained within a Dirk-managed repository.

Note that there can be multiple <dedupe> sections in a file, each with its own set of rules.

For de-duplication, Dirk will attempt to delete all duplicate files that are NOT in a canonical location. The following is a list of ways a location can be considered canonical:

  • The file was moved to the location by a rule. When this happens, the destination path is automatically marked as the canonical location for that file.

  • Directories can be marked as canonical in a rule, in which case files within them are marked as canonical if duplicates are found AND there are no conflicting canonical locations.

Examples:


<dedupe>

name=Deduplicate all

# Define which directories should be searched for duplicates
match_directories=*
match_files=*

# Set a minimum filesize. The default is 1 byte, which ignores 0-byte files. Set this to 0 to match ALL files
# regardless of their size, or to any larger value to ignore files smaller than that value
match_min_filesize=1

# Define "canonical directories" - essentially locations that are to be marked as the "canonical" location for a file
# within the repository if multiple copies exist. This can be used as a hint to decide which sets of files to delete
# (files in their canonical locations are not deleted)

# Any directory named organized_folder2 or organized_folder3
canonical_directories=organized_folder2;organized_folder3

#
# The following setting controls what happens when there is a set of duplicate files and none of the files in the set
# are determined to be in the Canonical location.
#
# If this is enabled, then the first file found in a set of duplicate files will be kept and all others will be deleted.
# Note that this does NOT mark that first file as a Canonical location as it's possible that a "better" Canonical
# location might be derived at some point in the future.
# If this is NOT set (the default), then NONE of the files in the set of duplicates will be deleted
allow_delete_if_no_canonical=1
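The deletion decision for one set of identical files, as described in this section, can be sketched in Python. This is a simplified illustration and not Dirk's algorithm: the function name files_to_delete is hypothetical, and the handling of conflicting canonical locations is not modelled (here, every file in a canonical location is simply kept).

```python
def files_to_delete(duplicates, canonical, allow_delete_if_no_canonical=False):
    """Return the paths from one set of identical files that would be deleted.

    duplicates: paths of identical files, in the order they were found.
    canonical:  the subset of those paths that sit in a canonical location.
    """
    if any(path in canonical for path in duplicates):
        # Files in canonical locations are never deleted.
        return [path for path in duplicates if path not in canonical]
    if allow_delete_if_no_canonical:
        # Keep the first file found; it is NOT marked as canonical.
        return list(duplicates[1:])
    # Default: with no canonical location, nothing is deleted.
    return []


copies = ["inbox/a.jpg", "organized_folder2/a.jpg", "backup/a.jpg"]
print(files_to_delete(copies, {"organized_folder2/a.jpg"}))
print(files_to_delete(["inbox/b.jpg", "backup/b.jpg"], set()))
print(files_to_delete(["inbox/b.jpg", "backup/b.jpg"], set(), True))
```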

Clean Section

This section defines rules for cleaning files and folders contained within a Dirk-managed repository.

Currently, delete_empty_directories is the only supported clean rule.

Example:


<clean>

name=Clean all empty folders

# Define which directories should be searched for cleaning
match_directories=**

# If enabled, any matched directory will be deleted if it contains no other files or directories
delete_empty_directories=1
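The emptiness check behind delete_empty_directories can be sketched in Python with os.walk. This only lists candidates and deletes nothing, and it is not Dirk's implementation. The function name empty_directories is hypothetical, the root directory itself is excluded as an assumption, and a directory that contains only empty subdirectories is not reported because it still contains directories.

```python
import os


def empty_directories(root):
    """Return directories below root that contain no files or directories."""
    root = os.fspath(root)
    return sorted(
        dirpath
        for dirpath, dirnames, filenames in os.walk(root)
        if not dirnames and not filenames and dirpath != root
    )


if __name__ == "__main__":
    # List the empty directories below the current working directory.
    for path in empty_directories("."):
        print(path)
```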