
Directory compare & backup - suggestions?

Posted: Sun Sep 08, 2024 11:30 pm
by greengeek

What are the best ways to keep directories and files backed up?
Currently I copy large amounts of data once a month or so to external SSDs or usb sticks.
But I never remember exactly when I took the last backup so I have ended up with lots of duplication as well as potential gaps in my overall archive..

Is there a utility that can compare my latest backup directory (and all subdirectories) with what is currently on my laptop HDD?

ie: a utility that scans every directory checking for newer files and automatically adding only the newer stuff to my backup directory.

I seem to recall looking at PUDD once or twice but went no further.

What technique/utility do people use for this please?

EDIT - I should mention that I am not referring to save file or save folder backups. Just talking about personal data that I accumulate in separate directories. My directory names are usually something like:
Technica
Miscellania
Familia
Generalia

(and each heading has many subdirectories within it..)

cheers!


Re: Directory compare & backup - suggestions?

Posted: Mon Sep 09, 2024 12:35 am
by geo_c
greengeek wrote: Sun Sep 08, 2024 11:30 pm

What are the best ways to keep directories and files backed up?

EDIT - I should mention that I am not referring to save file or save folder backups. Just talking about personal data that I accumulate in separate directories. My directory names are usually something like:
Technica
Miscellania
Familia
Generalia

(and each heading has many subdirectories within it..)

cheers!

I do this several times a day using the rsync command. But to make it more automated I have a collection of scripts that I run in both directions, from a HDD to an external drive, or from an external drive to HDD, or from an external drive to another external drive.

I do this several times a day because I use several computers at different locations, and I don't use any cloud servers to store data. So I work at one location, run an rsync script to update my files on an external drive, and then when I get to the new location, run an rsync script to update the files on that computer from the external drive.

A typical script looks like this, it's one command, but a lot of typing, so hence the script:

Code: Select all

#!/bin/bash

#dbox.sync.mir
rsync -r -t -v --progress --delete --modify-window=1 -l -H -s -X -p -o -g /mnt/home/dbox.sync.mir /mnt/sdb1

Most forum OS's have rsync installed, and if not it's usually available in the repository.

I didn't show you my whole script though. I add a bit at the bottom which, if you run the script from a certain directory depth, drops a log file on the top level of the drive to record when the directory was last synced to some other source. That part looks like this:

Code: Select all

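# Find the directory this script lives in (quoted so paths with spaces work)
MYDIR=$(cd "$(dirname "$0")" && pwd)
cd "$MYDIR" || exit 1
# Climb three levels to reach the root of the drive
cd ..
cd ..
cd ..
touch "dbox-$(date +"%m%d.%H%M").log"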

The cd .. commands drop the file three directories back, at the root of the drive, since I keep these scripts on every drive and run them from a directory at the same position on the target drive. I keep the above script in: /mnt/sdb1/sync-script/syncMNT/home-sdb1 So it drops a log file on the drive named /mnt/sdb1/dbox-0905.1952.log
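
For anyone who would rather pass the drive root explicitly than rely on directory depth, here is a minimal sketch of the same marker-file idea. The function name stamp_backup and the "last-sync-" prefix are made up for illustration and are not part of the script above; the year is added to the timestamp so the markers sort chronologically.

```shell
#!/bin/bash
# Sketch only: stamp_backup and the "last-sync-" prefix are hypothetical names.
# Usage: stamp_backup /path/to/drive/root
stamp_backup() {
    target_root=$1
    [ -d "$target_root" ] || { echo "no such directory: $target_root" >&2; return 1; }
    # Year-first timestamp, so a plain file listing shows the markers in date order.
    touch "$target_root/last-sync-$(date +%Y%m%d-%H%M).log"
}

# Example call (the mount point is illustrative):
#   stamp_backup /mnt/sdb1
```

Running ls on the drive root then shows when each sync happened, which addresses the "I never remember when I took the last backup" problem from the first post.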

At any rate, that's my method, but the short answer is rsync does what you're looking for, and you can run rsync --help for an explanation of the options.


Re: Directory compare & backup - suggestions?

Posted: Mon Sep 09, 2024 12:58 am
by greengeek
geo_c wrote: Mon Sep 09, 2024 12:35 am

A typical script looks like this, it's one command, but a lot of typing, so hence the script:

Code: Select all

#!/bin/bash

#dbox.sync.mir
rsync -r -t -v --progress --delete --modify-window=1 -l -H -s -X -p -o -g /mnt/home/dbox.sync.mir /mnt/sdb1

Thanks @geo_c - so if I had a directory "Familia" containing thousands of family images on my source drive - and a similar "Familia" directory on my target drive - could I run this sort of script and expect only new images to be copied across?

And I notice "--delete" in your script - does it delete certain images? (I am wanting to avoid this risk)

cheers!


Re: Directory compare & backup - suggestions?

Posted: Mon Sep 09, 2024 1:18 am
by geo_c
greengeek wrote: Mon Sep 09, 2024 12:58 am

Thanks @geo_c - so if I had a directory "Familia" containing thousands of family images on my source drive - and a similar "Familia" directory on my target drive - could I run this sort of script and expect only new images to be copied across?

And I notice "--delete" in your script - does it delete certain images? (I am wanting to avoid this risk)

cheers!

Yes, it will copy any files not found in the target directory, along with any whose size or modification time differs from the target copy, and the --delete option deletes files in the target directory not found in the source directory. This is how I keep from having duplicate files.

Of course you have to keep your logic straight when doing syncing of this nature.
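
One way to check the logic before anything is touched is a dry run. The sketch below wraps rsync's --dry-run and --itemize-changes options in a small function; the name preview_sync is hypothetical, and -a is used as shorthand for the archive options rather than the exact option set in the script above.

```shell
#!/bin/bash
# Sketch only: preview_sync is a hypothetical helper name.
# Usage: preview_sync SRC DEST
preview_sync() {
    src=$1
    dest=$2
    # --dry-run performs a trial run with no changes made.
    # --itemize-changes prints one line per planned change; planned deletions
    # show up as lines beginning with "*deleting".
    rsync -a --delete --dry-run --itemize-changes "$src" "$dest"
}

# Example call using the paths from the script above:
#   preview_sync /mnt/home/dbox.sync.mir /mnt/sdb1
```

If any "*deleting" line names a file you want to keep, the source and target are probably the wrong way round, or --delete is not what you want.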

Here's the rsync help, so you can look at the options I have in my script. You may want a different set of options based on your needs. My command preserves ownerships and permissions, deletes files on target not found on source, copies symlinks as symlinks, preserves hard links, preserves extended attributes, safely sends arguments (not sure why I have that), etc...

Code: Select all

Usage: rsync [OPTION]... SRC [SRC]... DEST
  or   rsync [OPTION]... SRC [SRC]... [USER@]HOST:DEST
  or   rsync [OPTION]... SRC [SRC]... [USER@]HOST::DEST
  or   rsync [OPTION]... SRC [SRC]... rsync://[USER@]HOST[:PORT]/DEST
  or   rsync [OPTION]... [USER@]HOST:SRC [DEST]
  or   rsync [OPTION]... [USER@]HOST::SRC [DEST]
  or   rsync [OPTION]... rsync://[USER@]HOST[:PORT]/SRC [DEST]
The ':' usages connect via remote shell, while '::' & 'rsync://' usages connect
to an rsync daemon, and require SRC or DEST to start with a module name.

Options
--verbose, -v            increase verbosity
--info=FLAGS             fine-grained informational verbosity
--debug=FLAGS            fine-grained debug verbosity
--stderr=e|a|c           change stderr output mode (default: errors)
--quiet, -q              suppress non-error messages
--no-motd                suppress daemon-mode MOTD
--checksum, -c           skip based on checksum, not mod-time & size
--archive, -a            archive mode is -rlptgoD (no -A,-X,-U,-N,-H)
--no-OPTION              turn off an implied OPTION (e.g. --no-D)
--recursive, -r          recurse into directories
--relative, -R           use relative path names
--no-implied-dirs        don't send implied dirs with --relative
--backup, -b             make backups (see --suffix & --backup-dir)
--backup-dir=DIR         make backups into hierarchy based in DIR
--suffix=SUFFIX          backup suffix (default ~ w/o --backup-dir)
--update, -u             skip files that are newer on the receiver
--inplace                update destination files in-place
--append                 append data onto shorter files
--append-verify          --append w/old data in file checksum
--dirs, -d               transfer directories without recursing
--old-dirs, --old-d      works like --dirs when talking to old rsync
--mkpath                 create destination's missing path components
--links, -l              copy symlinks as symlinks
--copy-links, -L         transform symlink into referent file/dir
--copy-unsafe-links      only "unsafe" symlinks are transformed
--safe-links             ignore symlinks that point outside the tree
--munge-links            munge symlinks to make them safe & unusable
--copy-dirlinks, -k      transform symlink to dir into referent dir
--keep-dirlinks, -K      treat symlinked dir on receiver as dir
--hard-links, -H         preserve hard links
--perms, -p              preserve permissions
--executability, -E      preserve executability
--chmod=CHMOD            affect file and/or directory permissions
--acls, -A               preserve ACLs (implies --perms)
--xattrs, -X             preserve extended attributes
--owner, -o              preserve owner (super-user only)
--group, -g              preserve group
--devices                preserve device files (super-user only)
--copy-devices           copy device contents as a regular file
--write-devices          write to devices as files (implies --inplace)
--specials               preserve special files
-D                       same as --devices --specials
--times, -t              preserve modification times
--atimes, -U             preserve access (use) times
--open-noatime           avoid changing the atime on opened files
--crtimes, -N            preserve create times (newness)
--omit-dir-times, -O     omit directories from --times
--omit-link-times, -J    omit symlinks from --times
--super                  receiver attempts super-user activities
--fake-super             store/recover privileged attrs using xattrs
--sparse, -S             turn sequences of nulls into sparse blocks
--preallocate            allocate dest files before writing them
--dry-run, -n            perform a trial run with no changes made
--whole-file, -W         copy files whole (w/o delta-xfer algorithm)
--checksum-choice=STR    choose the checksum algorithm (aka --cc)
--one-file-system, -x    don't cross filesystem boundaries
--block-size=SIZE, -B    force a fixed checksum block-size
--rsh=COMMAND, -e        specify the remote shell to use
--rsync-path=PROGRAM     specify the rsync to run on remote machine
--existing               skip creating new files on receiver
--ignore-existing        skip updating files that exist on receiver
--remove-source-files    sender removes synchronized files (non-dir)
--del                    an alias for --delete-during
--delete                 delete extraneous files from dest dirs
--delete-before          receiver deletes before xfer, not during
--delete-during          receiver deletes during the transfer
--delete-delay           find deletions during, delete after
--delete-after           receiver deletes after transfer, not during
--delete-excluded        also delete excluded files from dest dirs
--ignore-missing-args    ignore missing source args without error
--delete-missing-args    delete missing source args from destination
--ignore-errors          delete even if there are I/O errors
--force                  force deletion of dirs even if not empty
--max-delete=NUM         don't delete more than NUM files
--max-size=SIZE          don't transfer any file larger than SIZE
--min-size=SIZE          don't transfer any file smaller than SIZE
--max-alloc=SIZE         change a limit relating to memory alloc
--partial                keep partially transferred files
--partial-dir=DIR        put a partially transferred file into DIR
--delay-updates          put all updated files into place at end
--prune-empty-dirs, -m   prune empty directory chains from file-list
--numeric-ids            don't map uid/gid values by user/group name
--usermap=STRING         custom username mapping
--groupmap=STRING        custom groupname mapping
--chown=USER:GROUP       simple username/groupname mapping
--timeout=SECONDS        set I/O timeout in seconds
--contimeout=SECONDS     set daemon connection timeout in seconds
--ignore-times, -I       don't skip files that match size and time
--size-only              skip files that match in size
--modify-window=NUM, -@  set the accuracy for mod-time comparisons
--temp-dir=DIR, -T       create temporary files in directory DIR
--fuzzy, -y              find similar file for basis if no dest file
--compare-dest=DIR       also compare destination files relative to DIR
--copy-dest=DIR          ... and include copies of unchanged files
--link-dest=DIR          hardlink to files in DIR when unchanged
--compress, -z           compress file data during the transfer
--compress-choice=STR    choose the compression algorithm (aka --zc)
--compress-level=NUM     explicitly set compression level (aka --zl)
--skip-compress=LIST     skip compressing files with suffix in LIST
--cvs-exclude, -C        auto-ignore files in the same way CVS does
--filter=RULE, -f        add a file-filtering RULE
-F                       same as --filter='dir-merge /.rsync-filter'
                         repeated: --filter='- .rsync-filter'
--exclude=PATTERN        exclude files matching PATTERN
--exclude-from=FILE      read exclude patterns from FILE
--include=PATTERN        don't exclude files matching PATTERN
--include-from=FILE      read include patterns from FILE
--files-from=FILE        read list of source-file names from FILE
--from0, -0              all *-from/filter files are delimited by 0s
--old-args               disable the modern arg-protection idiom
--secluded-args, -s      use the protocol to safely send the args
--trust-sender           trust the remote sender's file list
--copy-as=USER[:GROUP]   specify user & optional group for the copy
--address=ADDRESS        bind address for outgoing socket to daemon
--port=PORT              specify double-colon alternate port number
--sockopts=OPTIONS       specify custom TCP options
--blocking-io            use blocking I/O for the remote shell
--outbuf=N|L|B           set out buffering to None, Line, or Block
--stats                  give some file-transfer stats
--8-bit-output, -8       leave high-bit chars unescaped in output
--human-readable, -h     output numbers in a human-readable format
--progress               show progress during transfer
-P                       same as --partial --progress
--itemize-changes, -i    output a change-summary for all updates
--remote-option=OPT, -M  send OPTION to the remote side only
--out-format=FORMAT      output updates using the specified FORMAT
--log-file=FILE          log what we're doing to the specified FILE
--log-file-format=FMT    log updates using the specified FMT
--password-file=FILE     read daemon-access password from FILE
--early-input=FILE       use FILE for daemon's early exec input
--list-only              list the files instead of copying them
--bwlimit=RATE           limit socket I/O bandwidth
--stop-after=MINS        Stop rsync after MINS minutes have elapsed
--stop-at=y-m-dTh:m      Stop rsync at the specified point in time
--fsync                  fsync every written file
--write-batch=FILE       write a batched update to FILE
--only-write-batch=FILE  like --write-batch but w/o updating dest
--read-batch=FILE        read a batched update from FILE
--protocol=NUM           force an older protocol version to be used
--iconv=CONVERT_SPEC     request charset conversion of filenames
--checksum-seed=NUM      set block/file checksum seed (advanced)
--ipv4, -4               prefer IPv4
--ipv6, -6               prefer IPv6
--version, -V            print the version + other info and exit
--help, -h (*)           show this help (* -h is help only on its own)

You might want to experiment by copying some files into a test directory and seeing how it operates on an empty target directory before running it on your valuable data.
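
A throwaway experiment along those lines could look like the sketch below. Everything in it (the function name delete_demo, the file names, the temporary directory) is invented for the test; it only shows how --delete changes the result.

```shell
#!/bin/bash
# Sketch only: builds scratch directories under a temp dir, never touches real data.
delete_demo() {
    box=$(mktemp -d) || return 1
    mkdir -p "$box/source/Familia" "$box/target/Familia"
    echo "new photo" > "$box/source/Familia/new.jpg"
    echo "old photo" > "$box/target/Familia/only-on-target.jpg"

    # Pass 1: no --delete, so the file that exists only on the target survives.
    rsync -r -t "$box/source/Familia" "$box/target"
    ls "$box/target/Familia"

    # Pass 2: with --delete, the target-only file is removed.
    rsync -r -t --delete "$box/source/Familia" "$box/target"
    ls "$box/target/Familia"

    rm -rf "$box"
}

# Run the experiment:
#   delete_demo
```

The first listing shows both files and the second shows only new.jpg, which is the behaviour to keep in mind before pointing --delete at an archive.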


Re: Directory compare & backup - suggestions?

Posted: Wed Sep 11, 2024 12:25 am
by fernan

I usually run "rsync" with specific parameters to update backup folders: options to copy only non-existent files, or "delete after" the copy is done, or copy just newer files, and so on.

But, to me, the most useful tool I've found is "syncthing" (no affiliation, https://syncthing.net/ ) to backup on different computers.

Once set up, it will sync 2, 3, or more computers, over the internet or on a LAN, with a lot of options: versioned backups, write only (drop) folders if you need them, backup of deleted files on the rest of the computers, options to sync just some folders on some computers and not on others, and so on. That way, I can work on 3 or 4 computers at different locations and I know all of them are in sync. The only requirement is that the computers must be turned on at the same time to allow the sync process, since no cloud or internet storage is involved. Just your private computers.


Re: Directory compare & backup - suggestions?

Posted: Wed Sep 11, 2024 1:57 am
by geo_c
fernan wrote: Wed Sep 11, 2024 12:25 am

But, to me, the most useful tool I've found is "syncthing" (no affiliation, https://syncthing.net/ ) to backup on different computers.

Before going the rsync route about 4 years ago, I was using syncthing extensively, and it seemed great, a viable alternative to something like dropbox. But for reasons I never quite figured out, probably to do with having multiple puppies accessing and running syncthing, I would get a lot of "conflicted copies" of files. I still find myself deleting isolated cases from certain directories years later as I run across them.

So like I say, probably has to do with how many OS's I was running with syncthing and should chalk it up to my own error. But with a little practice I found rsync a lot more manageable.


Re: Directory compare & backup - suggestions?

Posted: Wed Sep 11, 2024 2:55 am
by wizard

@greengeek

Have you looked at Grsync GUI for rsync, it's in BW64 and other Pups.

wizard


Re: Directory compare & backup - suggestions?

Posted: Wed Sep 11, 2024 5:25 am
by greengeek
wizard wrote: Wed Sep 11, 2024 2:55 am

Have you looked at Grsync GUI for rsync, it's in BW64 and other Pups.

wizard

Not yet, but I am just about to start putting each of these ideas into practice and see which method works best. I realize now that automation of backups is likely to bring a degree of risk. Will have to tread carefully.


Re: Directory compare & backup - suggestions?

Posted: Thu Sep 12, 2024 1:28 am
by geo_c
greengeek wrote: Wed Sep 11, 2024 5:25 am
wizard wrote: Wed Sep 11, 2024 2:55 am

Have you looked at Grsync GUI for rsync, it's in BW64 and other Pups.

wizard

Not yet, but I am just about to start putting each of these ideas into practice and see which method works best. I realize now that automation of backups is likely to bring a degree of risk. Will have to tread carefully.

Grsync is how I generated my first rsync commands. After choosing all the checkbox options it gives you the command it's going to run. All you have to do is copy and paste that into your own script once you're sure it does what you want.


Re: Directory compare & backup - suggestions?

Posted: Fri Sep 13, 2024 1:15 am
by fernan
geo_c wrote: Wed Sep 11, 2024 1:57 am
fernan wrote: Wed Sep 11, 2024 12:25 am

But, to me, the most useful tool I've found is "syncthing" (no affiliation, https://syncthing.net/ ) to backup on different computers.

Before going the rsync route about 4 years ago, I was using syncthing extensively, and it seemed great, a viable alternative to something like dropbox. But for reasons I never quite figured out, probably to do with having multiple puppies accessing and running syncthing, I would get a lot of "conflicted copies" of files. I still find myself deleting isolated cases from certain directories years later as I run across them.

So like I say, probably has to do with how many OS's I was running with syncthing and should chalk it up to my own error. But with a little practice I found rsync a lot more manageable.

Well, I forgot to mention (since it was not the main topic in this thread), I've added a couple of lines to the syncthing startup script, to search and delete all those "conflict" files inside my sync folders. Something like this:

Code: Select all

find /root/Sync/ -name "*sync-conflict*" -exec rm -rf {} \;

After this change, all the machines run smoothly, no more conflict files inside, and nothing gets lost.

I run 5 machines at the same time sharing the Sync default folder, and some extra folders shared between some of them, not all.
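
A more cautious variant of that find command, as a sketch: list the conflict copies first, and when deleting, match regular files only so a directory that happens to have "sync-conflict" in its name is never removed recursively. The function names list_conflicts and remove_conflicts are hypothetical.

```shell
#!/bin/bash
# Sketch only: list_conflicts and remove_conflicts are hypothetical helper names.

# Print conflict copies under DIR without touching them.
list_conflicts() {
    find "$1" -type f -name '*sync-conflict*' -print
}

# Delete conflict copies under DIR; -type f restricts this to regular files.
remove_conflicts() {
    find "$1" -type f -name '*sync-conflict*' -exec rm -f {} +
}

# Example calls, using the folder from the command above:
#   list_conflicts /root/Sync/
#   remove_conflicts /root/Sync/
```

Reviewing the list first matters because a conflict copy can sometimes be the version you actually wanted to keep.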


Re: Directory compare & backup - suggestions?

Posted: Sat Sep 14, 2024 10:58 am
by Jasper

@greengeek

As a suggestion you could try Restic

https://github.com/restic/restic

There are binaries available for most distributions. You will have to make the binary executable first!

It is pretty straightforward to use.

restic-example.jpg

To set up a backup/snapshot directory using the terminal

restic init --repo /tmp/backup (..... you can specify the backup location .... using /tmp as an example)

It will prompt you to set up a password

Then to backup a directory

restic --repo /tmp/backup backup ~/my-applications (..... using my-applications as an example)

To view the contents of your backup

restic ls latest -r /tmp/backup (..... will prompt you for your password)

Detailed examples are provided here:

https://restic.readthedocs.io/en/latest/index.html
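
The restic steps above can be strung together in a small script, sketched below. The function name restic_snapshot is hypothetical, and the sketch assumes the RESTIC_PASSWORD environment variable (or RESTIC_PASSWORD_FILE) is set so restic does not stop to prompt.

```shell
#!/bin/bash
# Sketch only: restic_snapshot is a hypothetical helper name.
# Assumes RESTIC_PASSWORD or RESTIC_PASSWORD_FILE is set in the environment.
# Usage: restic_snapshot REPO DIR
restic_snapshot() {
    repo=$1
    dir=$2
    # Initialise the repository only if it is not one already.
    restic --repo "$repo" cat config >/dev/null 2>&1 \
        || restic --repo "$repo" init || return 1
    # Each backup run adds a new snapshot; unchanged data is not stored twice.
    restic --repo "$repo" backup "$dir" || return 1
    # One line per snapshot, with its ID and date.
    restic --repo "$repo" snapshots
    # Verify the repository structure.
    restic --repo "$repo" check
}

# Example call, using the paths from the post above:
#   restic_snapshot /tmp/backup ~/my-applications
# To get files back out again:
#   restic --repo /tmp/backup restore latest --target /tmp/restore
```

Because every run is kept as a dated snapshot, the snapshots listing also answers the question of when the last backup was taken.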


Re: Directory compare & backup - suggestions?

Posted: Sun Sep 15, 2024 7:23 am
by greengeek
wizard wrote: Wed Sep 11, 2024 2:55 am

Have you looked at Grsync GUI for rsync, it's in BW64 and other Pups.

Just started looking at grsync. Seems as if it probably does have options that I need - although it makes me realise I will need to be very careful in structuring my directories and deciding which options to set up within grsync.
Just an initial complaint tho' - do I have to assume that the first field is "source" and the second is "destination"?
Seems a little casual to not have it explicitly labelled.
(I am hoping for utter clarity so I don't stuff things up... :twisted: :twisted: )

grsync_gui.jpg

Re: Directory compare & backup - suggestions?

Posted: Sun Sep 15, 2024 7:46 am
by greengeek

Well - my first fail with grsync is that it did not "merge" - it just "copied".

ie: I tried to copy recent "familia" files into old "familia" (on other disk) and it copied the whole source "familia" directory inside the destination "familia" directory.

ie: destination disk now has /familia/familia/*

Not quite what I wanted.

Will have to look closer at the grsync options.

** I guess what I want is a progressive, cumulative, archive. (Not a "sync" - just an ongoing "additive" archive) - if that makes sense....


Re: Directory compare & backup - suggestions?

Posted: Sun Sep 15, 2024 8:12 am
by greengeek
Jasper wrote: Sat Sep 14, 2024 10:58 am

As a suggestion you could try Restic

https://github.com/restic/restic

There are binaries available for most distributions. You will have to make the binary executable first!

Thanks for the tip @Jasper
Any tips on how to proceed from the git page please?
Sometimes I seem to find a link to suitable debs from git - but often not.
Is there a link here that I am missing?
(TIA!)

restic.png

Re: Directory compare & backup - suggestions?

Posted: Sun Sep 15, 2024 11:45 am
by Jasper

@greengeek

This is the direct link to the amd64 binary (approx 7 MB, version 0.17.1, 2024-09-05):
https://github.com/restic/restic/relea ... _amd64.bz2

All releases can be found here:

https://github.com/restic/restic/releases

To check whether the contents of the original and destination directories match, you can compare their sizes using the terminal

du -sh /path/to/directory (this will just give you the overall size of the directory)

du -h /path/to/directory (will give you the individual files and size)
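
Sizes only tell you that something differs, not what. As a rough sketch, diff can compare two directory trees file by file; the function name compare_trees is made up for this example.

```shell
#!/bin/bash
# Sketch only: compare_trees is a hypothetical helper name.
# Usage: compare_trees ORIGINAL BACKUP
compare_trees() {
    # -r recurses into subdirectories.
    # -q reports only which files differ or exist on one side, not the differences.
    diff -rq "$1" "$2"
}

# Example call (paths are illustrative):
#   compare_trees /mnt/home/Familia /mnt/sdb1/Familia
```

diff prints nothing and returns 0 when the two trees match. Note that it reads every file on both sides, so it can be slow on a large photo archive.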


Re: Directory compare & backup - suggestions?

Posted: Mon Sep 16, 2024 2:37 am
by fernan
greengeek wrote: Sun Sep 15, 2024 7:46 am

Well - my first fail with grsync is that it did not "merge" - it just "copied".

ie: I tried to copy recent "familia" files into old "familia" (on other disk) and it copied the whole source "familia" directory inside the destination "familia" directory.

ie: destination disk now has /familia/familia/*

Not quite what I wanted.

Will have to look closer at the grsync options.

** I guess what I want is a progressive, cumulative, archive. (Not a "sync" - just an ongoing "additive" archive) - if that makes sense....

That depends on whether you specify the folders with or without the last "/". With rsync it is the source that matters: /familia/ copies the contents of the folder, while /familia copies the folder itself into the destination. Try both and see for yourself.
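
The effect of the trailing slash is easy to see in a scratch directory. The sketch below is only a demonstration; the function name slash_demo and all paths inside the temporary directory are invented.

```shell
#!/bin/bash
# Sketch only: works entirely inside a temp dir created by mktemp.
slash_demo() {
    box=$(mktemp -d) || return 1
    mkdir -p "$box/src/familia" "$box/nested/familia" "$box/merged/familia"
    echo "photo" > "$box/src/familia/pic.jpg"

    # Source WITHOUT a trailing slash: the directory itself is copied into the
    # destination, which produces familia/familia.
    rsync -r -t "$box/src/familia" "$box/nested/familia"

    # Source WITH a trailing slash: only the contents are copied, so they merge
    # into the existing destination directory.
    rsync -r -t "$box/src/familia/" "$box/merged/familia"

    # Show where pic.jpg ended up in each case.
    (cd "$box" && find nested merged -type f | sort)
    rm -rf "$box"
}

# Run the demonstration:
#   slash_demo
```

The listing shows merged/familia/pic.jpg for the trailing-slash form and nested/familia/familia/pic.jpg for the form without it.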


Re: Directory compare & backup - suggestions?

Posted: Mon Sep 16, 2024 3:32 am
by geo_c
fernan wrote: Mon Sep 16, 2024 2:37 am

ie: destination disk now has /familia/familia/*
** I guess what I want is a progressive, cumulative, archive. (Not a "sync" - just an ongoing "additive" archive) - if that makes sense....

That depends on whether you specify the folders with or without the last "/". With rsync it is the source that matters: /familia/ copies the contents of the folder, while /familia copies the folder itself into the destination. Try both and see for yourself.

I find that specifying the source folder with no / following the folder name (for instance, /images/familia) and naming not the target folder itself but the directory where the target resides (as in /images) gives the result of syncing the source folder to /images/familia on the target.

So I have a directory on my hard drive: /mnt/home/abox which I would like to sync to a folder of the same name at the top level of sda1 (a usb drive)

I write the options and directories like this, and the result is the folder /mnt/home/abox synced to /mnt/sda1/abox:

Code: Select all

rsync -r -t -v --progress --delete -l -H -s -X -p -o -g /mnt/home/abox /mnt/sda1

with the options:

Code: Select all

-r                   recurse into directories
-t                   preserve modification times
-v                   increase verbosity
--delete             delete extraneous files from dest dirs
--progress           show progress during transfer
-l                   copy symlinks as symlinks
-H                   preserve hard links
-s                   use the protocol to safely send the args
-X                   preserve extended attributes
-p                   preserve permissions
-o                   preserve owner (super-user only)
-g                   preserve group

So if you don't want to sync, but want rsync to add files from the source to the target, I believe simply taking out the --delete option will do that. You can decide if you want to preserve the links, permissions, ownerships, and so forth that my command uses. If you don't have any symlinks or hard links in /familia, it should do no harm to leave the options.
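
As a sketch of that additive approach: the function below runs rsync without --delete, using -a as shorthand for the archive options (-rlptgoD in the help listing). The name archive_add and the example paths are hypothetical.

```shell
#!/bin/bash
# Sketch only: archive_add is a hypothetical helper name.
# Usage: archive_add SRC DEST_PARENT [extra rsync options]
archive_add() {
    src=$1
    dest_parent=$2
    shift 2
    # No --delete here, so nothing already in the archive is ever removed.
    # Extra options such as --dry-run are passed straight through to rsync.
    rsync -a -v "$@" "$src" "$dest_parent"
}

# Example calls (paths are illustrative): preview first, then copy for real.
#   archive_add /mnt/home/Familia /mnt/sdb1 --dry-run
#   archive_add /mnt/home/Familia /mnt/sdb1
```

Files that were edited on the source still overwrite their older copies in the archive. If the old versions matter, the --backup and --backup-dir options in the help listing are worth a look.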


Re: Directory compare & backup - suggestions?

Posted: Mon Sep 16, 2024 4:37 am
by Clarity

Hello @greengeek, here's the example using Grsync that @geo_c is demonstrating.
Hope this is helpful

Grsync1.jpg
Grsync2.jpg

Re: Directory compare & backup - suggestions?

Posted: Mon Sep 16, 2024 1:50 pm
by geo_c
Clarity wrote: Mon Sep 16, 2024 4:37 am

Hello @greengeek, here's the example using Grsync that @geo_c is demonstrating.
Hope this is helpful

It is helpful to use grsync to show the command line options. One option that is checked under Basic Options in the above image, and that I don't think my example reflects, is the "Ignore Existing" option.

It is described in the command usage as

Code: Select all

--ignore-existing              skip updating files that exist on receiver

That is fine as long as you didn't make any edits to files in your source directory that you would like to see reflected in your target directory, for instance maybe adding metadata tags, or editing the image without saving it under a different name. With the --ignore-existing option you wouldn't see those changes reflected on the target directory.
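
The difference can be tried out in a scratch directory. The sketch below compares --ignore-existing with --update, which the help listing describes as "skip files that are newer on the receiver". The function name edit_demo and the file names are invented for the demonstration.

```shell
#!/bin/bash
# Sketch only: works entirely inside a temp dir created by mktemp.
edit_demo() {
    box=$(mktemp -d) || return 1
    mkdir -p "$box/src" "$box/skip" "$box/update"
    echo "original" > "$box/skip/pic.txt"
    echo "original" > "$box/update/pic.txt"
    # Backdate the target copies so the edited source file counts as newer.
    touch -t 202001010000 "$box/skip/pic.txt" "$box/update/pic.txt"
    echo "edited with new tags" > "$box/src/pic.txt"

    # --ignore-existing: the file already exists on the target, so the edit is skipped.
    rsync -r -t --ignore-existing "$box/src/" "$box/skip"
    # --update: the source copy is newer than the target copy, so the edit is copied.
    rsync -r -t --update "$box/src/" "$box/update"

    cat "$box/skip/pic.txt" "$box/update/pic.txt"
    rm -rf "$box"
}

# Run the demonstration:
#   edit_demo
```

The first line printed is still "original" and the second is "edited with new tags", so --update may suit an archive where edited files should replace older copies.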