
Re: feedback/todo on cback



Hello again Ken!

Thanks!
good breeze!

1· cdrecord issues

cdrecord -scanbus dev=ATA
That did the trick. I didn't know I had to specify dev=ATA; without it, cdrecord just showed me an error about the sg/SCSI emulation not being found.
I was in a bit of a hurry today and didn't have time to try harder, sorry.
I'm now able to record :)




Anyway, I would expect cback to create an ISO file even if it fails to actually record it. It would be nice if it also created an ISO for DVD-sized images, even if it can't record them directly.


If what you want is a different action (perhaps, "makeiso") that creates
an ISO image rather than writing a CD, this is something I could
accommodate.  We could build that action so it wouldn't have any
restrictions on image size, or perhaps only optionally enforced an image
size.

I will also eventually get around to adding support for writing DVD
media, but I just haven't had time yet.  (It shouldn't be too difficult;
what I need is time to figure out the syntax for whatever command I'll
be using, and then time to adequately test things.)

I'll help you with this for sure, buddy! I just started programming in Python (I do Java), and this would be the best simple exercise to start with. Indeed, one of the reasons I started looking at cback was that it is written in Python and is a simple, well-structured project to start working on, and I also need to solve my luxurious backup needs :) I want to 'try' to explain my dream of a backup solution and where cback fits in my vision.
I currently don't have a real backup system. I just replicate data onto another disc, and once in a while I make a CD out of it. This works well for simple needs but is not a professional solution. When I say professional, I really mean it:
From the user's perspective it should be a (simple) GUI/web interface showing the filesystem tree, where each directory/file has a different color based on its backup policy.
This tool would produce an XML file, possibly close to the current cback XML syntax, collecting the backup policy preferences for each directory/file.
After that, this XML file is transformed with an ad-hoc XSL into the specific dialect of a tool like unison (http://www.cis.upenn.edu/~bcpierce/unison/index.html) or rsync
(http://www.mikerubel.org/computers/rsync_snapshots/)
I currently use unison (a two-way sync), and the main advantage of this approach is the way incremental backups are made. It really saves disk space and processing time. It's a very fast and cheap way of creating snapshot-style backups of your data.
These tools deal better than cback (performance- and functionality-wise) with problems like checksumming/diffing and even merging when needed, and they are truly tested tools that have been used in production for years. They really shine when it comes to replication; of course they are not complete backup solutions, but you can take a snapshot of the repository they make.
So cback would again be useful to take the snapshot of these repositories and record it on media (if needed).
But what if I want to go back to the state of a particular file, or of the whole filesystem, as it was one month ago?
This is the job of Subversion, so it probably makes sense to import the rsync/unison-created repository into it, tag it, and maybe take a snapshot of it with a cback-like tool.
So this is (my) big picture: the tools are already there, they just need to be integrated.
What do you think? I wanted to share my dream.
Now I'm going to sleep. I've been awake for 23 hours and I'm surprised I can still formulate proper English sentences... I'm Italian after all
I'll reply to the other things tomorrow :)


regards
Eli


2· exclude/include patterns
In my opinion <ignore_file> should be removed, because <exclude> already gives you full power and it is not beautiful to have these files lying around just to override the general setup. Instead, a user configuration file should reside in some directory like ~/.cback, and users could set up cron jobs as well...


Heh.  Well, if it's not beautiful to you, there's no need to use it. :)

Seriously, keep in mind that a per-user ~/.cback file does not really
provide equivalent functionality to a per-directory ignore indicator
file.


Per-directory ignore files apply system-wide. If any user creates an
ignore file in a particular directory, any Cedar Backup run by any user
will ignore that directory (assuming the backup is configured to pay
attention to an ignore indicator file at all).
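
To make the per-directory behaviour concrete, here is a minimal sketch of how an ignore indicator file could prune a directory walk. This is not Cedar Backup's actual code; the function names and the file name ".cbignore" are made up for illustration (the real name is whatever <ignore_file> is configured to be).

```python
import os

def is_ignored(directory, ignore_file=".cbignore"):
    """Return True if the directory contains the ignore indicator file.

    ".cbignore" is a hypothetical name used only for this sketch.
    """
    return os.path.exists(os.path.join(directory, ignore_file))

def collectable_dirs(root, ignore_file=".cbignore"):
    """Yield directories under root, skipping any subtree that holds the indicator."""
    for current, subdirs, _files in os.walk(root):
        if is_ignored(current, ignore_file):
            subdirs[:] = []  # prune: do not descend into this subtree
            continue
        yield current
```

The indicator lives in the filesystem rather than in any one user's configuration, which is why it affects every run regardless of who starts it.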


Assuming that the per-user ~/.cback file behaved in the "standard" way,
it would only apply to Cedar Backup runs executed by that particular
user, which is not the same thing as you get with ignore indicator
files.

Remember, Cedar Backup is primarily intended to be run as root for large
parts of a system which might contain multiple users, rather than being
run by lots of individual users on a system.  (See the distinction?)


What I really miss in cback is the ability to include files from an excluded directory. I'd like to make exceptions for subtrees/files within excluded directories. This is, in my opinion, the most needed feature.


Hmm.  I can see why you might want that, but you can accomplish the same
thing today by specifying finer-grained backups and exclusions, so it's
kind of low on my priority list.

What would you expect configuration to look like?  Some sort of
exclusion-within-an-exclusion?  That makes me wonder whether it's really
worth making configuration any more complicated than it already is.


3· global collect
I don't get what the global collect/collect_mode and collect/exclude settings are for. Since you also specify them on a per-directory basis, do those override the global settings? So if I specify a collect_mode of daily on a directory and a collect/collect_mode of weekly globally, when does it run?


If you specify a global configuration parameter like collect mode, it
applies to all directories being collected, acting as a default. If you
specify a different value for collect mode on an individual directory,
then that value overrides the default for that directory only.
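
As a rough illustration of that default-plus-override rule (a sketch only, not the real configuration code; the function name and the sample directory table are hypothetical):

```python
def effective_collect_mode(dir_mode, global_mode):
    """Return the per-directory mode if one is set, else the global default."""
    return dir_mode if dir_mode is not None else global_mode

# Hypothetical configuration: None means "no mode given for this directory".
global_mode = "daily"
directories = {"/etc": None, "/home/eli": "weekly"}

modes = {path: effective_collect_mode(mode, global_mode)
         for path, mode in directories.items()}
```

Here /etc falls back to the global mode, while /home/eli keeps its own setting.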


For your particular example, if you specify a global "daily" collect
mode, but a "weekly" collect mode on a single directory, then by default
directories will use the "daily" mode but that one directory will use
the "weekly" mode.

Does that make sense?

Personally, I never specify a default because Cedar Backup v1.0 didn't
allow it, and I never got around to updating my configuration files.
However, if I were starting from scratch, I could see myself specifying
a default mode of "incr" and overriding just a few directories like /etc
with "daily" or something.


4· index/search
There is no way to know which backup disc holds a certain file. It would be really nice if the digest file could be easily searched and linked to a backup set, so you know where to find a file you have since deleted. I have yet to find a backup/restore solution that provides this :(


Ah. I think what you're looking for is a way to say, "which backup disc
should I look in to find this particular file?".


You're right, Cedar Backup doesn't give you a way to do this.  In fact,
it doesn't even really know anything about this, because (for instance)
it has no way to know whether you switched discs at the beginning of the
week, or overwrote your current disc, or even if perhaps you put in
another bogus disc which Cedar Backup was able to attach a new ISO
session to.  It's not something that I could add very easily. :(


5· slowness/debug info
The backup process seems much too slow. For example, on one excluded tree it took 11 minutes just to realize it did not have to process it:
2005-10-19T10:32:09 CEST --> [DEBUG ] Path [/home] is excluded based on excludePaths.
2005-10-19T10:43:52 CEST --> [DEBUG ] Path [/var/tmp/backup] is excluded based on excludePaths.


WHY???


Well, because it didn't get to /home until that point. :)

Internally, Cedar Backup builds a list of files to be backed up before
executing the backup.  As Cedar Backup traverses the directory hierarchy
for a particular collect directory, it checks individual file and
directory names against various rules to decide whether they are
excluded or not.  In this case, the log didn't list /home as being
excluded until it actually found a directory matching an exclusion rule.
(In other words, even if you have hundreds of exclusions, Cedar Backup
doesn't really care about them unless it actually finds something to
apply an exclusion to.)
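
If it helps, the traversal can be pictured roughly like this. It is a simplified sketch, not the actual implementation: the function name, its arguments and the exact-path matching are assumptions made for illustration, and only the log message wording is taken from your debug output.

```python
import os

def build_file_list(collect_dir, exclude_paths, log=print):
    """Walk collect_dir and return the files to back up.

    An exclusion is only logged at the moment the walk reaches a
    matching path, not when the configuration is read.
    """
    collect_dir = os.path.normpath(collect_dir)
    excluded = {os.path.normpath(p) for p in exclude_paths}
    files = []
    for current, subdirs, names in os.walk(collect_dir):
        kept = []
        for name in sorted(subdirs):
            path = os.path.join(current, name)
            if path in excluded:
                log("Path [%s] is excluded based on excludePaths." % path)
            else:
                kept.append(name)
        subdirs[:] = kept  # prune excluded directories from the walk
        for name in sorted(names):
            path = os.path.join(current, name)
            if path not in excluded:
                files.append(path)
    return files
```

With a walk like this, the timestamp on an "is excluded" line tells you when the traversal arrived at that path, not how long the exclusion check itself took.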


and 36 minutes to compute hashes and realize there weren't any changed files (it was the first run)
2005-10-19T10:46:20 CEST --> [DEBUG ] Digest [/var/tmp/backup/-.sha] does not exist on disk.
2005-10-19T11:22:16 CEST --> [DEBUG ] Removed 0 unchanged files based on digest values.


There is not enough debug info, and the program runs for hours without saying what it is trying to do.


Yeah, and then the last user who complained said that there was too much
in the debug log. :)


It's a struggle to find a balance in the right level of logging, without
providing so much information that it's useless.  Cedar Backup v1.0
listed every file it backed up, but that seemed excessive.  Cedar Backup
v2.0 just logs information about individual collect directories, because
that's how I expected users to configure it.

You see, I never really expected users to want to back up their entire
root directory, and it's not something I've really ever done.  Because
of that, I've never watched the log under those circumstances.  I guess
the log would probably feel more "responsive" if you listed specific
directories to back up rather than backing up the root directory and
listing specific directories to exclude.  You'd get entries as Cedar
Backup moved from collect directory to collect directory.

I will give some thought to how I can increase the debug output without
flooding the log -- but as I said, it's a difficult balancing act.
There probably aren't too many middle grounds between what you see (log
entries for individual collect directories) and log messages for every
single backed up directory or perhaps every single backed up file.
Which would you prefer?  Can you really imagine wanting either of those
options?  I am open to adding back in more logging if you really think
it's useful.


6· There is no pre/post process command execution. This is important in particular to hack some script to add features cback doesn't (yet) have or integrate it in the flow of other programs.


Funny, I just had someone else ask for that just last week, in bug #27:

   http://cedar-solutions.com/cgi-bin/bugzilla/show_bug.cgi?id=27

Can you give me some thoughts on how you would expect this to work?

What would the commands be -- just shell commands, perhaps required to
start with an absolute path?  Would you ever need to list more than one
shell command in Cedar Backup configuration for a given action, or would
you expect to combine all of your actions into one single shell script
somewhere on the filesystem?  Can you ever imagine wanting to provide
Python code (a function) rather than a shell command?

Would it be enough to specify a single "pre-action hook" and a single
"post-action hook" in configuration for that command, or would it be
better to have a separate configuration action mapping hooks to actions?
(The first might be easier to understand, but the second would allow you
to hook extensions without extensions having to know anything about it.)
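
To give you something concrete to react to, the second option might look roughly like this. It is purely a sketch of one possible design, not anything that exists in Cedar Backup today; the function name, the hook table and the script paths are all hypothetical.

```python
import subprocess

def run_action(name, action, hooks, runner=subprocess.check_call):
    """Run an action, wrapped by optional pre/post shell commands.

    hooks maps an action name to a (pre_command, post_command) tuple;
    either entry may be None. The action itself knows nothing about hooks.
    """
    pre, post = hooks.get(name, (None, None))
    if pre:
        runner(pre, shell=True)
    result = action()
    if post:
        runner(post, shell=True)
    return result

# Hypothetical mapping, as it might be read from configuration.
hooks = {"collect": ("/usr/local/bin/pre-collect.sh", None)}
```

Note that in this sketch the post hook does not run if the action raises an exception; whether it should is one of the things I'd want to decide.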

Let me know what you think.


7· Messaging
It would be nice to make the email message configurable, and to have an email or some other form of message (IM, for example) sent on success as well.


The thing is, Cedar Backup right now doesn't even know anything about
email or any other form of notification.  It just assumes it's running
in a terminal and prints things to stdout.  If something else (i.e.
cron) emails that output around, then so much the better.  So, when you
suggest making the email message configurable, you're really suggesting
that I somehow make Cedar Backup's output to stdout configurable (or
cron's email format configurable), which is not something I'm likely to
do.

I guess I'm open to adding in some sort of hook (maybe something a
little like an extension, a Python function with a standard interface)
for notification, but I would have to give it some thought before
deciding for sure.  One problem is how to appropriately handle error
conditions.  In other words, it's easy to send success status messages,
but if the program crashes or fails hard, it may be difficult to send
failure messages.  In the current model, cron handles all of the
difficult parts for me. :)


8· Space waste
It wastes too much space, because the collect phase is unneeded for me. You can pass the include/exclude lists to tar without actually copying everything into an intermediate collect stage. This would make the process twice as fast.
What do you think?


Are you using a pool of one?  The reason I ask is, while collected data
always ends up being a little extraneous on a master machine, it still
does seem to make sense when backing up a set of machines (because it
makes the process more forgiving).  I can understand why you would want
to combine the two steps when using a pool-of-one, however.


I hope this is not too much and doesn't make you feel depressed ;)


Nope, it doesn't make me feel depressed. I'm actually kind of
fascinated, because as people start to use Cedar Backup, almost none of
them seem to use it the way it was intended, in other words the way I
use it. :)


In particular, I'm surprised at how many people use it for the
pool-of-one case and I'm surprised that people are interested in backing
up their entire root directory gigabytes at a time.  Cedar Backup isn't
yet really optimized for either of these cases.  The pool-of-one backup
is too complicated, and Cedar Backup does seem to be excessively slow
when backing up really huge directories, because of an intentional
design decision to work in terms of lists of files (which makes
functionality like exclusions and ignore files very straightforward).

Anyway, I'm glad you wrote me.  Please let me know what you think about
some of my comments above.  Hopefully, we can get your cdrecord issue
straightened out, too.

KEN



