This is the home page for d2dbackup, the National Bureau of Economic Research's home-grown disk-to-disk backup program, written in Perl and available under the GPL.
As the cost of disk storage has dropped relative to tape, old rules of thumb about backup have become out of date. Rotating storage has different characteristics from tape storage, chiefly the low cost of leaving drives online and the ease of random access. In this system we want to take advantage of these characteristics rather than merely reproduce the inconveniences of tape.
d2dbackup copies and compresses files from one or more "source" directories to two or more "target" disks. Only new or changed files are copied. Files are never copied over existing backups; instead, a semicolon and version number are appended to the file name in the backup store. The target disks are ordinary Unix filesystems, and each reproduces the directory structure of the source (including file ownership and permissions, but not directory ownership and permissions). Normal filesystem commands are used to locate and retrieve backups.
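The naming rule can be illustrated with a short sketch. This is Python rather than d2dbackup's Perl, and it is not taken from the program: the function name `next_backup_name` is hypothetical, and we assume that the first copy is stored under the plain file name, that later copies get `;1`, `;2`, and so on, and that any compression suffix is ignored.

```python
import re


def next_backup_name(filename, existing):
    """Return a name for a new backup of `filename` that does not collide
    with any name in `existing` (the names already in the target directory).

    Assumption for illustration only: an unsuffixed copy counts as
    version 0, and each new copy gets the highest version plus one.
    """
    suffixed = re.compile(re.escape(filename) + r";(\d+)")
    versions = []
    for name in existing:
        if name == filename:
            versions.append(0)
        else:
            match = suffixed.fullmatch(name)
            if match:
                versions.append(int(match.group(1)))
    if not versions:
        return filename  # nothing stored yet, so no suffix is needed
    return f"{filename};{max(versions) + 1}"
```

Because the new name is always one past the highest version already present, an existing backup is never overwritten, which is the property the paragraph above describes.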
There is a garbage collection system that makes intelligent choices about what file versions to remove. Small files are kept in many versions. Obsolete versions of large files may be deleted, especially versions that had a short lifetime before replacement on the source filesystem. The parameters that control this selection may adapt dynamically as the free space in the backup pool varies.
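One way to picture such a policy is the following Python sketch. It is not d2dbackup's actual algorithm: the `Version` record, the `removal_score` formula (size divided by lifetime) and the `bytes_needed` parameter are all assumptions made for illustration. It only shows how "large and short-lived goes first" can be turned into an ordering.

```python
from dataclasses import dataclass


@dataclass
class Version:
    path: str
    size_bytes: int
    lifetime_days: float  # how long this version was current on the source
    obsolete: bool        # True if a newer version of the file exists


def removal_score(version):
    # Hypothetical score: bigger and shorter-lived versions rank higher.
    return version.size_bytes / max(version.lifetime_days, 1.0)


def choose_removals(versions, bytes_needed):
    """Pick obsolete versions to delete, highest score first, until at
    least `bytes_needed` bytes would be freed. Current versions are
    never candidates."""
    candidates = sorted(
        (v for v in versions if v.obsolete),
        key=removal_score,
        reverse=True,
    )
    chosen, freed = [], 0
    for version in candidates:
        if freed >= bytes_needed:
            break
        chosen.append(version)
        freed += version.size_bytes
    return chosen
```

In this sketch the adaptive element is `bytes_needed`: a caller could raise it as free space in the pool shrinks, so that more versions are removed when space is tight and none when it is plentiful.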
It is not possible to reproduce the state of the filesystem at an arbitrary prior date, but reconstructing a failed source directory or drive as of the last backup is possible.
We use JBOD rather than a RAID array for the target drives. We found the inability to expand RAID volumes an insurmountable disadvantage in an era of rapidly declining drive prices. We also worry that a simultaneous failure of two drives in an array could destroy all the files on all the drives.
With our JBOD we don't worry much about individual backup drive failures. Any lost files are copied again from the source on the next backup run. Although some version history may be lost, d2dbackup doesn't store adjacent versions on the same disk, so the loss is limited. Alternatively, one could make each target a small RAID and have the best of both worlds.
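A placement rule with this property might look like the Python sketch below. The function `pick_target` and its "most free space" tie-breaker are assumptions for illustration; the only part drawn from the description above is that a new version avoids the disk holding the previous version of the same file.

```python
def pick_target(free_bytes, previous_target=None):
    """Choose a target disk for a new version of a file.

    free_bytes: dict mapping target name -> free bytes on that target.
    previous_target: the target holding the previous version, if any.

    The disk holding the previous version is excluded, so losing one
    disk cannot remove two adjacent versions. Among the remaining disks
    this sketch picks the one with the most free space (an assumption).
    """
    candidates = {
        target: free
        for target, free in free_bytes.items()
        if target != previous_target
    }
    if not candidates:
        raise ValueError("need a target other than the previous one")
    return max(candidates, key=candidates.get)
```

With only one target there is no way to separate adjacent versions, which is consistent with d2dbackup requiring two or more target disks.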
We do not use a database, probably for religious reasons, but we tell ourselves it is because of the difficulty of keeping it current and because it would stand in the way of users doing their own restores.
It isn't hard to list disadvantages of d2dbackup. It isn't off-site, can't handle a bare-metal restore, does not store accurate directory permissions, (typically) compresses after network transmission rather than before, and depends rather too heavily on a creaky NFS network filesystem. For the moment, we continue a traditional weekly "ufsdump" to tape as a protection against d2dbackup failures and as an off-site backup solution.
Nevertheless, we feel d2dbackup serves our interests better than anything we have seen, either free or proprietary. It takes full advantage of random access, requires no specialized client or restoration software, eliminates the excessive proliferation of copies of long-lived files, allows easy access using tools users already know, and allows us to keep multiple file versions over long periods of time.
Comments, complaints and suggestions should be sent to aminoff@nber.org. In fact, at this point we would like to hear from every actual user.
Daniel Feenberg
Alex Aminoff