2· exclude/include patterns
In my opinion <ignore_file> should be removed, because with <exclude> you
have full power and it is not beautiful to have these files around just to
override the general setup. Instead, a user conf file should reside in
some dir like ~/.cback, and the user can set up a cron job as well...
Heh. Well, if it's not beautiful to you, there's no need to use it. :)
Seriously, keep in mind that a per-user ~/.cback file does not really
provide equivalent functionality to a per-directory ignore indicator
file.
Per-directory ignore files apply system-wide. If any user creates an
ignore file in a particular directory, any Cedar Backup run by any user
will ignore that directory (assuming the backup is configured to pay
attention to an ignore indicator file at all).
Assuming that the per-user ~/.cback file behaved in the "standard" way,
it would only apply to Cedar Backup runs executed by that particular
user, which is not the same thing as you get with ignore indicator
files.
Remember, Cedar Backup is primarily intended to be run as root for large
parts of a system which might contain multiple users, rather than being
run by lots of individual users on a system. (See the distinction?)
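
To make the distinction concrete, here is a minimal sketch of how a per-directory ignore indicator might be honored during traversal. It is an illustration only, not Cedar Backup's actual code; the file name ".cbignore" and both function names are assumptions made up for the example.

```python
import os


def is_ignored(directory, ignore_file=".cbignore"):
    """Return True if the directory contains the ignore indicator file.

    ".cbignore" is a hypothetical name; the real name would come from
    whatever the backup configuration specifies.
    """
    return os.path.isfile(os.path.join(directory, ignore_file))


def collectable_dirs(root, ignore_file=".cbignore"):
    """List directories under root that are not marked as ignored."""
    result = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if is_ignored(dirpath, ignore_file):
            dirnames[:] = []  # do not descend into an ignored directory
            continue
        result.append(dirpath)
    return result
```

The check looks only at the filesystem, so the outcome is the same no matter which user runs the backup. A per-user ~/.cback file, by contrast, would only be read for runs started by that user.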
Instead, what I really miss from cback is the ability to include files
from an excluded dir. I'd like to make exceptions for subtrees/files within
excluded dirs. This is in my opinion the most needed feature.
Hmm. I can see why you might want that, but you can accomplish the same
thing today by specifying finer-grained backups and exclusions, so it's
kind of low on my priority list.
What would you expect configuration to look like? Some sort of
exclusion-within-an-exclusion? That makes me wonder whether it's really
worth making configuration any more complicated than it already is.
3· global collect
I don't get what global collect/collect_mode and collect/exclude are
for. Since you specify it on a per-dir basis, does it override the parent
settings? So if I specify a collect_mode of daily on a dir and a
collect/collect_mode of weekly globally, when does it run?
If you specify a global configuration parameter like collect mode, it
applies to all directories being collected, acting as a default. If you
specify a different value for collect mode on an individual directory,
then that value overrides the default for that directory only.
For your particular example, if you specify a global "daily" collect
mode, but a "weekly" collect mode on a single directory, then by default
directories will use the "daily" mode but that one directory will use
the "weekly" mode.
Does that make sense?
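
The default-plus-override rule described above can be sketched in a few lines. This is only an illustration of the lookup logic, not Cedar Backup's real configuration code; the function name and the /opt/archive path are assumptions for the example.

```python
def effective_collect_mode(global_mode, dir_mode=None):
    """Return the directory's own mode if it has one, else the global default."""
    return dir_mode if dir_mode is not None else global_mode


# Global default is "daily"; one directory overrides it with "weekly".
# None means "no per-directory value configured".
global_mode = "daily"
collect_dirs = {
    "/etc": None,
    "/home": None,
    "/opt/archive": "weekly",  # hypothetical directory with an override
}

modes = {
    path: effective_collect_mode(global_mode, dir_mode)
    for path, dir_mode in collect_dirs.items()
}
```

Here /etc and /home end up with "daily" and only /opt/archive ends up with "weekly".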
Personally, I never specify a default because Cedar Backup v1.0 didn't
allow it, and I never got around to updating my configuration files.
However, if I were starting from scratch, I could see myself specifying
a default mode of "incr" and overriding just a few directories like /etc
with "daily" or something.
4· index/search
There is no way to know which backup disc holds a certain file. It
would be really nice if the digest file could be easily searched and
connected to a backup set, in order to know where you can find a certain
file you have since deleted. I have yet to find a backup/restore solution
providing this :(
Ah. I think what you're looking for is a way to say, "which backup disc
should I look in to find this particular file?".
You're right, Cedar Backup doesn't give you a way to do this. In fact,
it doesn't even really know anything about this, because (for instance)
it has no way to know whether you switched discs at the beginning of the
week, or overwrote your current disc, or even if perhaps you put in
another bogus disc which Cedar Backup was able to attach a new ISO
session to. It's not something that I could add very easily. :(
5· slowness/debug info
The backup process seems really much too slow. For example, on one
excluded tree it took 11 minutes just to realize it did not have to process it:
2005-10-19T10:32:09 CEST --> [DEBUG ] Path [/home] is excluded based on
excludePaths.
2005-10-19T10:43:52 CEST --> [DEBUG ] Path [/var/tmp/backup] is
excluded based on excludePaths.
WHY???
Well, because it didn't get to /home until that point. :)
Internally, Cedar Backup builds a list of files to be backed up before
executing the backup. As Cedar Backup traverses the directory hierarchy
for a particular collect directory, it checks individual file and
directory names against various rules to decide whether they are
excluded or not. In this case, the log didn't list /home as being
excluded until it actually found a directory matching an exclusion rule.
(In other words, even if you have hundreds of exclusions, Cedar Backup
doesn't really care about them unless it actually finds something to
apply an exclusion to.)
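
As a rough sketch of that behavior (not the actual Cedar Backup implementation), the traversal below builds the file list up front and only logs an exclusion at the moment it reaches a matching path. The function name and the log callback are assumptions for illustration; the message text mirrors the log lines quoted above.

```python
import os


def build_file_list(collect_dir, exclude_paths, log=print):
    """Walk collect_dir and return the files to back up.

    An exclusion is only noticed (and logged) when the walk actually
    reaches a directory that matches one of exclude_paths.
    """
    exclude = {os.path.normpath(p) for p in exclude_paths}
    files = []
    for dirpath, dirnames, filenames in os.walk(collect_dir):
        kept = []
        for name in sorted(dirnames):
            full = os.path.join(dirpath, name)
            if os.path.normpath(full) in exclude:
                log("Path [%s] is excluded based on excludePaths." % full)
            else:
                kept.append(name)
        dirnames[:] = kept  # prune excluded directories from the walk
        for name in sorted(filenames):
            files.append(os.path.join(dirpath, name))
    return files
```

With a walk like this, the gap between two "is excluded" log lines is just the time spent listing everything that lies between the two excluded paths in traversal order.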
and 36 minutes to compute hashes and realize there weren't any changed files (it
was the first time)
2005-10-19T10:46:20 CEST --> [DEBUG ] Digest [/var/tmp/backup/-.sha]
does not exist on disk.
2005-10-19T11:22:16 CEST --> [DEBUG ] Removed 0 unchanged files based
on digest values.
There is not enough debug info, and the program runs for hours without
saying what it is trying to do.
Yeah, and then the last user who complained said that there was too much
in the debug log. :)
It's a struggle to find a balance in the right level of logging, without
providing so much information that it's useless. Cedar Backup v1.0
listed every file it backed up, but that seemed excessive. Cedar Backup
v2.0 just logs information about individual collect directories, because
that's how I expected users to configure it.
You see, I never really expected users to want to back up their entire
root directory, and it's not something I've really ever done. Because
of that, I've never watched the log under those circumstances. I guess
the log would probably feel more "responsive" if you listed specific
directories to back up rather than backing up the root directory and
listing specific directories to exclude. You'd get entries as Cedar
Backup moved from collect directory to collect directory.
I will give some thought to how I can increase the debug output without
flooding the log -- but as I said, it's a difficult balancing act.
There probably aren't too many middle grounds between what you see (log
entries for individual collect directories) and log messages for every
single backed up directory or perhaps every single backed up file.
Which would you prefer? Can you really imagine wanting either of those
options? I am open to adding back in more logging if you really think
it's useful.
6· There is no pre/post process command execution. This is important in
particular to hack some script to add features cback doesn't (yet) have
or integrate it in the flow of other programs.
Funny, I just had someone else ask for that just last week, in bug #27:
http://cedar-solutions.com/cgi-bin/bugzilla/show_bug.cgi?id=27
Can you give me some thoughts on how you would expect this to work?
What would the commands be -- just shell commands, perhaps required to
start with an absolute path? Would you ever need to list more than one
shell command in Cedar Backup configuration for a given action, or would
you expect to combine all of your actions into one single shell script
somewhere on the filesystem? Can you ever imagine wanting to provide
Python code (a function) rather than a shell command?
Would it be enough to specify a single "pre-action hook" and a single
"post-action hook" in configuration for that command, or would it be
better to have a separate configuration action mapping hooks to actions?
(The first might be easier to understand, but the second would allow you
to hook extensions without extensions having to know anything about it.)
Let me know what you think.
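
To give you something concrete to react to, here is one possible shape for the second option (a separate mapping from actions to hooks). Nothing like this exists in Cedar Backup today; the function name, the mapping layout, and the choice to abort on a failing hook are all assumptions.

```python
import subprocess


def run_with_hooks(action_name, action, hooks):
    """Run action() wrapped in optional pre/post shell commands.

    hooks maps an action name to a (pre, post) pair, where each entry is
    an argument list for subprocess.run or None. This layout is purely
    hypothetical.
    """
    pre, post = hooks.get(action_name, (None, None))
    if pre:
        subprocess.run(pre, check=True)  # a failing pre-hook aborts the action
    result = action()
    if post:
        subprocess.run(post, check=True)
    return result
```

Because the mapping is keyed by action name, an extension could be hooked the same way as a built-in action without knowing anything about it.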
7· Messaging
It would be nice to make the email message configurable and to be able
to have an email or other form of message (IM, for example) sent on success as well.
The thing is, Cedar Backup right now doesn't even know anything about
email or any other form of notification. It just assumes it's running
in a terminal and prints things to stdout. If something else (e.g.
cron) emails that output around, then so much the better. So, when you
suggest making the email message configurable, you're really suggesting
that I somehow make Cedar Backup's output to stdout configurable (or
cron's email format configurable), which is not something I'm likely to
do.
I guess I'm open to adding in some sort of hook (maybe something a
little like an extension, a Python function with a standard interface)
for notification, but I would have to give it some thought before
deciding for sure. One problem is how to appropriately handle error
conditions. In other words, it's easy to send success status messages,
but if the program crashes or fails hard, it may be difficult to send
failure messages. In the current model, cron handles all of the
difficult parts for me. :)
8· Space waste
It wastes too much space, because the collect phase is unneeded for me. You
can pass the things to include/exclude to tar without the need to
actually copy them in an intermediate collect stage. This would make the
process twice as fast.
What do you think?
Are you using a pool of one? The reason I ask is, while collected data
always ends up being a little extraneous on a master machine, it still
does seem to make sense when backing up a set of machines (because it
makes the process more forgiving). I can understand why you would want
to combine the two steps when using a pool-of-one, however.
Hope it's not too much and you don't feel depressed ;)
Nope, it doesn't make me feel depressed. I'm actually kind of
fascinated, because as people start to use Cedar Backup, almost none of
them seem to use it the way it was intended, in other words the way I
use it. :)
In particular, I'm surprised at how many people use it for the
pool-of-one case and I'm surprised that people are interested in backing
up their entire root directory gigabytes at a time. Cedar Backup isn't
yet really optimized for either of these cases. The pool-of-one backup
is too complicated, and Cedar Backup does seem to be excessively slow
when backing up really huge directories, because of an intentional
design decision to work in terms of lists of files (which makes
functionality like exclusions and ignore files very straightforward).
Anyway, I'm glad you wrote me. Please let me know what you think about
some of my comments above. Hopefully, we can get your cdrecord issue
straightened out, too.
KEN