NAME

d2dbackup.conf (5) -- configuration file for d2dbackup

DESCRIPTION

The d2dbackup configuration file contains various configuration settings for d2dbackup, the disk-to-disk backup system.

The file is Perl code: parameters are set with ordinary Perl assignments.

REQUIRED PARAMETERS

$BDISKSTEM = '/bkup'

The common portion of the full path to all the backup disks. This stem will be used to find all the backup disks using a unix file glob of "$BDISKSTEM*", and will form the basis for a regexp to extract the backup disk portion of paths.
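
As a rough illustration of how the stem might be used, the sketch below lists the disks with a glob and extracts the disk portion of a backup path. The exact regexp d2dbackup builds is not documented here, so $disk_re and the helper disk_of are assumptions made for this example.

```perl
use strict;
use warnings;

my $BDISKSTEM = '/bkup';

# Backup disks currently present, e.g. /bkup1 /bkup2 /bkup3.
my @disks = glob("$BDISKSTEM*");

# Hypothetical regexp derived from the stem: the stem plus everything
# up to the next slash is taken to be the backup disk.
my $disk_re = qr{^(\Q$BDISKSTEM\E[^/]*)};

# Return the backup disk portion of a path, or undef if there is none.
sub disk_of {
    my ($path) = @_;
    return $path =~ $disk_re ? $1 : undef;
}

print disk_of('/bkup2/etc/passwd;1'), "\n";   # prints /bkup2
```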

$ERRORMAIL

Email address to which error reports are sent.
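
A minimal configuration file therefore only needs the two required parameters. The address below is a placeholder, not a real default.

```perl
# Minimal d2dbackup.conf: only the required parameters.
$BDISKSTEM = '/bkup';                      # backup disks are /bkup1, /bkup2, ...
$ERRORMAIL = 'backup-admin@example.com';   # placeholder address
```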

OPTIONAL PARAMETERS

%preference_factor

A hash mapping disk names (mount points) to preference factors. The preference factor weights the probability of the disk being selected when stochastically choosing a disk for a backup version. The larger the preference factor, the greater the chance that the disk will be chosen. Typically you might want to weight newer disks higher than older disks that are closer to their expected MTBF. It is not necessary to set a higher preference factor for larger disks, as the probability of choosing a disk is already proportional to the amount (not percentage) of free space on the disk, measured in bytes.

The default preference factor for all disks is 1.
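
The following sketch shows one way the weighting could work. It assumes the selection weight is simply free bytes multiplied by the preference factor; the text above only says the factor "weights" the probability, so the multiplication, the helper name selection_probabilities, and the free-space figures are all assumptions for illustration.

```perl
use strict;
use warnings;

my %preference_factor = (
    '/bkup1' => 0.5,   # older disk, make it less likely to be chosen
    '/bkup3' => 2,     # newest disk, make it more likely to be chosen
);

# Made-up free space in bytes for three disks.
my %free_bytes = (
    '/bkup1' => 400e9,
    '/bkup2' => 200e9,
    '/bkup3' => 100e9,
);

# Assumed model: weight = free bytes * preference factor (default factor 1).
sub selection_probabilities {
    my ($free, $pref) = @_;
    my %weight = map { $_ => $free->{$_} * ($pref->{$_} // 1) } keys %$free;
    my $total = 0;
    $total += $_ for values %weight;
    return map { $_ => $weight{$_} / $total } keys %weight;
}

my %p = selection_probabilities(\%free_bytes, \%preference_factor);
printf "%s %.2f\n", $_, $p{$_} for sort keys %p;
```

With these particular numbers the factors exactly cancel the differences in free space, so under the assumed model each disk ends up with probability 1/3.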

$EXCLUDEDISKS = 1

When stochastically selecting which disk to write a new version to, exclude the disks which hold the last $EXCLUDEDISKS versions of the file. For example: Suppose that out of 4 disks, disks 1, 2, and 3 contain versions 1, 2, and 3 of the file, respectively. If $EXCLUDEDISKS is 1, then disk 3 will be excluded and the new version will be written to disk 1, 2, or 4. If $EXCLUDEDISKS is 2, then disks 2 and 3 will be excluded.

If you set $EXCLUDEDISKS to a number N the same as the number of backup disks you have, you will exclude all the disks, and once N versions have been written, the next backup will fail with a fatal error.

If you set $EXCLUDEDISKS to one less than the number N of backup disks you have, then only one disk is ever eligible: each new version V is forced onto the disk that holds version V - N, and you make it nearly impossible for the disks to be filled up stochastically equally. This is probably bad.

If you set $EXCLUDEDISKS to 0, you can in principle run d2dbackup with just a single backup disk.

Setting $EXCLUDEDISKS to 1 guarantees that the next to last and last versions will always be on different disks, and is probably sufficient.
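
The exclusion rule can be sketched as follows. The helper candidate_disks and its arguments are hypothetical; the example reproduces the 4-disk scenario described above.

```perl
use strict;
use warnings;

# $all_disks:     array ref of all backup disks
# $version_disks: array ref giving the disk of each existing version, oldest first
# $excludedisks:  value of $EXCLUDEDISKS
sub candidate_disks {
    my ($all_disks, $version_disks, $excludedisks) = @_;
    my $n = @$version_disks;
    my $k = $excludedisks < $n ? $excludedisks : $n;

    # Disks holding the last $k versions are not eligible.
    my @recent   = $k ? @{$version_disks}[ $n - $k .. $n - 1 ] : ();
    my %excluded = map { $_ => 1 } @recent;

    return grep { !$excluded{$_} } @$all_disks;
}

my @disks    = ('/bkup1', '/bkup2', '/bkup3', '/bkup4');
my @versions = ('/bkup1', '/bkup2', '/bkup3');   # versions 1, 2, 3

print join(' ', candidate_disks(\@disks, \@versions, 1)), "\n";  # /bkup1 /bkup2 /bkup4
print join(' ', candidate_disks(\@disks, \@versions, 2)), "\n";  # /bkup1 /bkup4
```

An empty result corresponds to the fatal case in which every disk has been excluded.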

$BLOCKSIZE = 1024

The size in bytes of a block as returned by the Perl module Filesys::Df. According to the documentation of that module, 1024 is the default. However, on some operating systems the unix df command uses 512-byte blocks. Just in case it turns out that Filesys::Df does not compensate appropriately on your OS, or some strange cross-NFS thing is happening, this parameter can be set to whatever units Filesys::Df is found to return. Please let us know if you find a situation where Filesys::Df does not do the right thing.

$MINCOMPRESS = 4096

Minimum size for compression. Files smaller than this in bytes will not be compressed.

$NOCOMPRESSPAT = '\.(gz|tgz|zip|hqx|sit|z|bz|bz2)\b';

A perl regexp. If a file name matches, it will not be compressed as it is backed up, just copied. The regexp will be matched case-insensitive (/i).
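
Taken together, $MINCOMPRESS and $NOCOMPRESSPAT amount to a test like the one sketched below. The helper should_compress is hypothetical and only illustrates the two rules as described; it is not taken from the d2dbackup source.

```perl
use strict;
use warnings;

my $MINCOMPRESS   = 4096;
my $NOCOMPRESSPAT = '\.(gz|tgz|zip|hqx|sit|z|bz|bz2)\b';

# Return 1 if a file would be compressed, 0 if it would just be copied.
sub should_compress {
    my ($name, $size) = @_;
    return 0 if $size < $MINCOMPRESS;          # too small to bother
    return 0 if $name =~ /$NOCOMPRESSPAT/i;    # already compressed
    return 1;
}

for my $case (['data.csv', 10_000], ['archive.TGZ', 10_000], ['notes.txt', 100]) {
    my ($name, $size) = @$case;
    printf "%-12s %6d bytes: %s\n", $name, $size,
        should_compress($name, $size) ? 'compress' : 'copy';
}
```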

$DISKCHECKPARAM = .001

$MAXDISKCHECKCOUNT = 200

df is re-run on a backup disk when either the proportion of accumulated writes to remaining free space grows larger than $DISKCHECKPARAM or the number of writes to the disk exceeds $MAXDISKCHECKCOUNT. This is just a small optimization to save having to re-run df every time we write a file.
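
A sketch of that decision, under the assumption that "remaining free space" means the free space reported by the most recent df. The helper need_df and its bookkeeping arguments are hypothetical.

```perl
use strict;
use warnings;

my $DISKCHECKPARAM    = .001;
my $MAXDISKCHECKCOUNT = 200;

# $bytes_written:   bytes written to the disk since the last df
# $free_at_last_df: free bytes reported by the last df (assumed meaning)
# $writes:          number of files written since the last df
sub need_df {
    my ($bytes_written, $free_at_last_df, $writes) = @_;
    return 1 if $writes > $MAXDISKCHECKCOUNT;
    return 1 if $free_at_last_df <= 0;          # avoid dividing by zero
    return $bytes_written / $free_at_last_df > $DISKCHECKPARAM ? 1 : 0;
}

print need_df(  500_000, 1e9,  10), "\n";   # ratio 0.0005: no df needed
print need_df(2_000_000, 1e9,  10), "\n";   # ratio 0.002: re-run df
print need_df(        0, 1e9, 201), "\n";   # too many writes: re-run df
```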

$MAXVERSIONS = 60;

The maximum number of backup versions for a single file. We recommend setting this number fairly high, so that you will have backup versions of small, frequently changing but very important files going back a long time, such as /etc/passwd. Remember that a new version is created only if the file actually changes.

$MAXSIZE

This is the critical parameter for tuning your system. MAXSIZE is the maximum size of all backup versions of a given file in bytes on an empty set of backup disks. Whenever a new version of a file is written, we delete old versions of the file until the total disk space the backup versions occupy is less than MAXSIZE times how full the backup system is right now.

What this boils down to is that MAXSIZE really determines how you allocate your backup space between small files with lots of versions and large files with a few versions. A large value of MAXSIZE will tend to favor the large files, a small value of MAXSIZE will tend to favor the small files. Unfortunately it is difficult to predict the effect of various settings. In any event, MAXSIZE will rarely come into play if your backup disks are sized adequately.

See also HOWTO.set_maxsize
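
The pruning rule can be sketched as below. This is an interpretation, not the actual implementation: it reads "how full the backup system is" as the fraction of backup space still free, since that is the reading under which MAXSIZE is the limit on an empty set of disks. It also assumes the newest version is never deleted. The helper prune_versions is hypothetical; consult HOWTO.set_maxsize for the authoritative description.

```perl
use strict;
use warnings;

# $sizes:         array ref of version sizes in bytes, oldest first
# $maxsize:       value of $MAXSIZE
# $free_fraction: fraction of total backup space still free, 0..1 (assumed)
sub prune_versions {
    my ($sizes, $maxsize, $free_fraction) = @_;
    my @kept  = @$sizes;
    my $limit = $maxsize * $free_fraction;

    my $total = 0;
    $total += $_ for @kept;

    # Delete the oldest versions until the total drops below the limit,
    # but always keep the newest version (assumption).
    while (@kept > 1 && $total >= $limit) {
        $total -= shift @kept;
    }
    return @kept;
}

# Four versions of 40, 30, 20 and 10 units; limit is 100 * 0.5 = 50.
print join(' ', prune_versions([40, 30, 20, 10], 100, 0.5)), "\n";   # prints 20 10
```

Under this reading, the same file keeps fewer versions as the backup disks fill up.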

$MAXDIRENTS = undef;

Max number of files in a directory.

Not yet used by d2dbackup (as of 201409); it is listed here because copytreeplus uses it.

$MAXFILESIZE = undef;

MAXFILESIZE is simply a limit on how large a file d2dbackup will try to back up. Files larger than this limit will be skipped. By default it is undefined, meaning there is no limit. If a file is skipped for this reason, an error message will be generated: the intent is that MAXFILESIZE should be set very high if at all. At NBER we use it to avoid backing up a temporary .tar file of another disk.

$MAXFILESIZEREGEXP = undef;

If this regular expression is set, then only apply MAXFILESIZE if the file path matches the regular expression. If it is not set, always apply MAXFILESIZE.
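
For example, to skip only very large tar files while leaving other large files alone, a configuration along these lines might be used. The values are illustrative; the unit (bytes) and the string form of the regexp are assumptions based on how $MAXSIZE and $NOCOMPRESSPAT are specified.

```perl
# Skip files over 50 GiB, but only if their path ends in .tar
$MAXFILESIZE       = 50 * 1024**3;   # assumed to be in bytes
$MAXFILESIZEREGEXP = '\.tar$';       # assumed to be given as a string
```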

$DIFFREMOVE = 0

If DIFFREMOVE is set, d2dbackup will remove duplicates from the set of backup versions of each file. Duplicates are detected by having the same size and modification time.

$TIMESTAMPREADFILE = "";

$TIMESTAMPWRITEFILE = "";

In normal operation, you do not need to worry about time stamp files, since d2dbackup checks every file in the source file system every time you do a backup. See HOWTO.improve_performance for hints on using time stamps to speed up a daily d2dbackup run.

The timestamp read file is the file whose time stamp (last modified by default, see below) is used as a cut-off, so that source files older than that are skipped.

The timestamp write file is the file which is touched before the backup begins. Obviously this requires d2dbackup to have write access to the source filesystem (or wherever the timestamp write file is). A typical setup would be to have both of these files be the same, so that today's backup covers everything changed since the beginning of yesterday's backup.

Both filenames are assumed to be relative to the source filesystem path, unless a full path (starting with /) is provided.

If the values are set to the empty string, then no timestamp file will be read or written, respectively.

The special parameter '=SOURCE=' will be replaced by a string representing the path to the source file system.
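
A typical incremental setup, as described above, might look like this. The file names are made up for the example, and the exact string substituted for '=SOURCE=' is not specified here.

```perl
# Same file for reading and writing, relative to the source filesystem.
$TIMESTAMPREADFILE  = '.d2dbackup_timestamp';
$TIMESTAMPWRITEFILE = '.d2dbackup_timestamp';

# Alternative: keep the timestamps outside the source filesystem, with
# one file per source (hypothetical location):
# $TIMESTAMPREADFILE  = '/var/d2dbackup/stamp.=SOURCE=';
# $TIMESTAMPWRITEFILE = '/var/d2dbackup/stamp.=SOURCE=';
```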

$WHICHTIME = "M";

When looking at the timestamp read file, $WHICHTIME specifies whether we should use the modification time (M) or the access time (A). Only capital A or capital M are allowed as values. See HOWTO.snapshots for a case where access time might be preferred.

$NOCHECKLASTV = 0;

Set this variable to 1 if you want to SKIP checking whether the modification time of the source is the same as the mod time of the last existing backup version before creating a new version. You may want to set this to 1 if you are running a daily incremental backup against a timestamp, and you know that only files later than the timestamp are being considered.

Setting this variable is dangerous in that you may unnecessarily copy additional backup versions of a file that are exact duplicates of an existing backup version. That can happen if you are not using read timestamps correctly. You might want to set it if you know what you are doing and you want your daily incremental backup to run absolutely as fast as possible. See HOWTO.improve_performance.

$HOST = ""

The value of this variable is inserted into the path to the backup files, between the disk mount point and the path to files. For example, suppose you are backing up /etc/passwd, and the backup disks are /bkup[1-3]. A backup version might normally look something like:

/bkup1/etc/passwd;1

If you set $HOST to "myhost", you would get:

/bkup1/myhost/etc/passwd;1

This might be handy if you are running d2dbackup on different systems, to an NFS-mounted set of backup disks.

$SLOW = 0

Ideally backups should be done when no one is using the file system being backed up so as not to inconvenience anyone. That is not always possible. Set $SLOW to force d2dbackup to pause for that many seconds between processing each directory.

@DAYS_BUCKETS = (7,30,365)

For statistics during garbage collection, these specify the boundaries between categories for the histogram of count of files by age. 7,30,365 lets you see information on what proportion of files are less than a week, less than a month, and less than a year old.
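
The sketch below shows how such boundaries could split file ages into histogram categories. It assumes disjoint categories (under 7 days, 7 to 29, 30 to 364, 365 and over); the statistics d2dbackup actually prints may be organized differently, and bucket_index is a hypothetical helper.

```perl
use strict;
use warnings;

my @DAYS_BUCKETS = (7, 30, 365);

# Index of the first boundary the age falls below, or one past the
# last boundary for files older than all of them.
sub bucket_index {
    my ($age_days) = @_;
    for my $i (0 .. $#DAYS_BUCKETS) {
        return $i if $age_days < $DAYS_BUCKETS[$i];
    }
    return scalar @DAYS_BUCKETS;
}

my @count = (0) x (@DAYS_BUCKETS + 1);
$count[ bucket_index($_) ]++ for (1, 3, 10, 45, 200, 400, 900);   # ages in days

print "@count\n";   # prints 2 1 2 2
```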

$LOGFILE = "/var/log/d2dbackup.log"

A file for logging error messages. Very few errors in d2dbackup are fatal. You may wish to check this file from time to time.

$NOLOCKFILE = 0

Normally, d2dbackup creates a lock file for each source filesystem when it starts to run. If d2dbackup is killed or interrupted in the middle of a run, that lock file is left behind, and the next attempt to back up that source filesystem will die with an error message unless the lock file is removed manually.

To disable this feature, set $NOLOCKFILE = 1.

mapsrc2backup( <source path> , <backup path> )

This function maps a portion of the full path to a source file to a different path on the backup target. This is handy either to compress an inordinately long path being backed up, or in the case of backing up from a snapshot directory, allowing different source paths to map to the same backup path in different circumstances.

For example, suppose you normally run nightly backups off the nightly snapshot. You might say:

mapsrc2backup('/disk/nber7/.snapshot/nightly.0' => '/nber7');

But when you want to do garbage collection which may take more than 24 hours to run, you need to use a longer-term snapshot:

mapsrc2backup('/disk/nber7/.snapshot/weekly.0' => '/nber7');

Both these paths map to the same backup path.

A good way to support this distinction is to put the second version above into a separate "d2dbackup.gc.conf" file, and use it with d2dbackup -f in order to use the different mapping.

@DIR_KEEP_RE

A list of Perl regexps that are applied to directories encountered in traversing the source tree. Each element of this list is a reference to a 2-element list. The first element is a regexp; the second element is either 1 or 0, indicating what to do with the directory if it matches. 1 means keep that directory and descend into it, 0 means prune it from the find.

Regexps are applied in the order listed. Once a match is found, that disposition takes place and the remainder of the list is ignored.
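
A sketch of such a list and of the first-match rule follows. The patterns are examples only. The helper keep_dir is hypothetical, and both matching against the full directory path and keeping a directory that matches no rule are assumptions, since neither is specified above.

```perl
use strict;
use warnings;

my @DIR_KEEP_RE = (
    [ '/\.snapshot$'      => 0 ],   # never descend into snapshot directories
    [ '^/home/[^/]+/tmp$' => 0 ],   # skip each user's tmp directory
    [ '^/home'            => 1 ],   # keep everything else under /home
);

# Apply the rules in order; the first matching rule decides.
sub keep_dir {
    my ($dir) = @_;
    for my $rule (@DIR_KEEP_RE) {
        my ($re, $keep) = @$rule;
        return $keep if $dir =~ /$re/;
    }
    return 1;   # assumption: directories matching no rule are kept
}

printf "%-18s %s\n", $_, keep_dir($_) ? 'keep' : 'prune'
    for ('/home/alice', '/home/alice/tmp', '/disk/.snapshot');
```

Order matters: because '^/home' would also match /home/alice/tmp, the more specific rule has to come first.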

@PRE_RE

A list of regexps to be applied to the full path, including the file name, of a file to be backed up. The data structure is the same as for @DIR_KEEP_RE. A 0 means ignore the file, 1 means back it up.
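
For instance, a list like the following would leave out object files, core dumps, and editor backup files. The patterns are illustrative, not defaults.

```perl
@PRE_RE = (
    [ '\.o$'   => 0 ],   # compiled object files
    [ '/core$' => 0 ],   # core dumps
    [ '~$'     => 0 ],   # editor backup files
);
```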

SEE ALSO

d2dbackup(1)

AUTHOR

Alex Aminoff, alex_aminoff@alum.mit.edu

COPYRIGHT

Copyright 2002-2004, shared by the National Bureau of Economic Research and Alexander Aminoff

Permission is granted to copy, modify, and use this software under the GNU General Public License, found in the file LICENSE.