Transferring Data between Linux and Mac Systems

Introduction

The section "Using rsync and unison" has been taken as-is from a Top Tip of the Week of considerable antiquity by MaJoC (who is himself of considerable antiquity). If anything is unclear, please badger either or both of us (MaJoC and ChrisCauser) to reword it, and thereby help us improve this document.

Using rsync and unison

rsync and unison are file-transfer utilities, with one very useful feature in common: If a file already exists at the destination end, only the differences are actually copied. In particular, if the files are the same, nothing gets transferred. This is most obviously useful when updating a massive set of files over a slow network, but is also a great timesaver if the transfer is interrupted part of the way through (restart to resume), or if the source fileset may have changed while we were copying it across.

Rsync copies files in one direction:

rsync -avP -e ssh fromname@fromsystem:/fromdir/ todir/

(my favourite inter-system rsync incantation) recursively copies everything in the named remote source directory into the named local destination directory, using SSH for data transport. The trailing "/" on the source directory name matters: with it, rsync copies the contents of fromdir into todir; without it, rsync creates todir/fromdir and copies into that. See "man rsync" for the full rules.
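
The effect of the trailing "/" is easiest to see on scratch directories. The following is a minimal sketch, assuming rsync is installed locally; the directory names (src, with_slash, without_slash) are made up for the illustration.

```shell
#!/bin/sh
# Scratch area; every name below is hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/src/sub"
echo hello > "$demo/src/sub/a.txt"

if command -v rsync >/dev/null 2>&1; then
    # Trailing "/" on the source: copy the CONTENTS of src into the target.
    rsync -a "$demo/src/" "$demo/with_slash/"
    # No trailing "/": copy the directory src ITSELF into the target.
    rsync -a "$demo/src" "$demo/without_slash/"
    # Show the two resulting layouts side by side.
    (cd "$demo" && find with_slash without_slash -type f | sort)
else
    echo "rsync not found; nothing copied" >&2
fi
```

The first copy should end up as with_slash/sub/a.txt, the second as without_slash/src/sub/a.txt.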

rsync -rlptvPu /media/usb1/foo/ /data/system/user/bar/

copies from your USB drive to a data-mounted partition; exchange the directory names to reverse the process. Note the extra "-u" argument ("don't overwrite files that are newer at the destination"). Changing "-a" to "-rlpt" drops the "-g", "-o" and "-D" options that "-a" implies, which means you get to own the copies, even if your system and external USB drive don't agree on numeric IDs (those using USB drives formatted as HFS+ to transport data between Macs and Debians tend to suffer from this).

To test whether you have the syntax correct, you can always perform a "dry run" by adding the "n" option:

rsync -nrlptvPu /media/usb1/foo/ /data/system/user/bar/

This just lists the files that rsync would transfer, without actually changing anything. It is useful if you are unsure about the trailing "/" on directory names described above, and essential before trying any of the delete options, which remove files from the destination if you get them wrong.
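
As a quick check that a dry run really is harmless, the following sketch runs the same copy with and without "n" on scratch directories. It assumes rsync is installed; the directory and variable names are made up for the example.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/from" "$work/to"
echo data > "$work/from/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # Dry run: file.txt is listed, but nothing is written.
    rsync -nrlptv "$work/from/" "$work/to/"
    if [ -e "$work/to/file.txt" ]; then after_dry=present; else after_dry=absent; fi

    # Real run: the same options minus "n".
    rsync -rlptv "$work/from/" "$work/to/"
    if [ -e "$work/to/file.txt" ]; then after_real=present; else after_real=absent; fi

    echo "after dry run: $after_dry; after real run: $after_real"
fi
```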

For more details on rsync, see "man rsync" (rather long: skip forward to the "Usage" section), find your way round "info rsync", or look for documentation under http://rsync.samba.org/ .

Unison, which uses an rsync-like algorithm to transfer only the changed parts of files, synchronises files in two directions at once. The intention is to keep two or more copies of each file in separate places, and to update each from the other. If one copy of a file has been modified, the default action is to change the other to match; if both have been changed, neither is modified, and the problem is brought to the attention of a responsible adult (you). This is all done by keeping notes of the state of each file at the end of the last transaction. (Rsync by default decides what to copy from file sizes and datestamps, which can be misleading. Unison by default doesn't propagate datestamps; this can be even more misleading, and is definitely a mess to sort out afterwards.)
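
The decision rule described above can be sketched in a few lines. This only illustrates the logic and is not Unison's actual implementation; the function name "decide" and the use of plain strings as stand-ins for file fingerprints are assumptions made for the example.

```shell
#!/bin/sh
# decide LAST A B
#   LAST: fingerprint noted at the end of the previous synchronisation
#   A, B: current fingerprints of the two copies
decide() {
    last=$1
    a=$2
    b=$3
    if [ "$a" = "$b" ]; then
        echo "nothing to do"            # copies already agree
    elif [ "$a" = "$last" ]; then
        echo "copy B to A"              # only B has changed
    elif [ "$b" = "$last" ]; then
        echo "copy A to B"              # only A has changed
    else
        echo "conflict: ask the user"   # both changed, differently
    fi
}

decide v1 v1 v1
decide v1 v2 v1
decide v1 v1 v3
decide v1 v2 v3
```

The point of the stored note (LAST) is that it lets the tool tell "only one side changed" apart from "both sides changed", which a plain comparison of the two copies cannot do.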

Synchronising between machines, or between mounted filesystems, is as simple as

unison ssh://farname@farsystem//fardir /data/system/user/neardir
unison -times /media/usb1/foo /data/system/user/bar

(The latter propagates timestamps.) This is by default interactive. The text-based client is a mite user-hostile; I shall in due course be installing the GUI front end, unison-gtk, to Debian client systems. For more information on unison, see "man unison" (very short) or "unison -doc tutorial | less" (a bit long); or google for unison's home site, where you will also find native MS-Windows and Macintosh client binaries.
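
For a synchronisation you run regularly, the two roots and the options can be kept in a Unison profile instead of being retyped. The following is a minimal sketch: the profile name "usbsync" is made up, and the directory names are those of the second command above. The script writes to a scratch directory; the file would need to be copied into ~/.unison/ before "unison usbsync" could pick it up.

```shell
#!/bin/sh
profile_dir=$(mktemp -d)

# Write a hypothetical profile called usbsync.prf
cat > "$profile_dir/usbsync.prf" <<'EOF'
# The two copies to keep in step
root = /media/usb1/foo
root = /data/system/user/bar
# Propagate modification times (same as -times on the command line)
times = true
EOF

cat "$profile_dir/usbsync.prf"
```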

CAVEATS

  • Do not be tempted to use a * after the dir name, e.g.
    rsync -rlptvPuz username@scp-astro.physics.ox.ac.uk:~/  /Data/macsystem/username/oldhome
    gives CORRECT results, whereas
    rsync -rlptvPuz username@scp-astro.physics.ox.ac.uk:~/*  /Data/macsystem/username/oldhome/
    makes you think you've successfully made a backup, but it has quietly missed some files and folders: the shell's "*" does not match names beginning with ".", so the hidden "dot" files at the top level are never sent.

  • If you are transferring files to your new Mac, you may want to repeat the command after a complete transfer or sync; not because rsync doesn't work, but because Mac disks are currently formatted with a case-insensitive filesystem. If you had two files or folders named lib and Lib in the same directory on your Linux disk, they are forced to be the same on your Mac disk, and so they overwrite each other each time you rsync. The telltale sign of this problem is that rsync is never happy and insists there are still changes to be made just after a fresh sync. There may be a way round it: formatting your Mac disk as "Mac OS Extended (Case-sensitive, Journaled)"... tests under way.

  • If your external USB drive is formatted as FAT32 (alias VFAT, i.e. as-Microsoft), the person doing the mount gets to own the files. The Bad News is that certain characters (notably ':') tend to appear in names of files produced at certain European observatories, but are forbidden in FAT filenames. Your only reliable choice here would be to create a "tarball" (using tar and gzip), transport that on the USB drive, and untar it at the other end. This is agreed to be awkward: you can't then use the files in-place on the USB drive, nor easily check which files in the tarball are older or newer than your local copy. There's also the minor detail that the maximum size for a file on FAT32 is 4 GBytes, as I've recently (March 2008) had brought forcibly to my attention, so you'd need to create N separate tarballs, each under 4 GBytes in size. This also is agreed to be awkward. [Edit by RyanHoughton: you can use the Mac "Disk Utility" to format a USB disk as extended (journaled), which has no problems with e.g. "file:name" (Aug 2008).]

  • If you're using my standard trick of running the same rsync incantation twice (the second time to double-check), and the target filesystem is VFAT, you may get an avalanche of updates on the second run. Before assuming the worst, check a test file with (e.g.)
    diff sourcedir/testfile destdir/testfile
    ls -l sourcedir/testfile destdir/testfile

    If the diff shows no differences, and the datestamps look equal, then you (like me) have fallen foul of the minor detail that timestamps on VFAT have a two-second minimum quantum, the UNIX datestamp is precise to the second, and the source file's timestamp includes an odd number of seconds. rsync's "--modify-window=1" option exists for this case: it treats timestamps that differ by no more than one second as equal.

  • Those using external USB hard drives with Mac OS X systems have HFS+ available to them, but the numeric UIDs and GIDs will inevitably differ between OS X and our setup. The fix for "reowning" copied files mentioned above is specific to rsync, and only works when copying from the USB drive; those using unison, and those copying to HFS+, may have to investigate "man chown" instead for retrofixing ownerships. If there's a corresponding spell to cast over unison itself, or if the Linux HFS+ drivers do some esoteric user-identity mapping trick at mount time, please pass the information on to me for inclusion in future iterations of this Tip.

  • You will, of course, need a copy of whichever you use at both ends of the transaction. Our standard Debian desktop setup installs both, but they may well not be present by default on (e.g.) Mac OS X or your own personal Linux laptop, and definitely won't be there by default on (ghasp) MS-Windows. For unison, you'll also have to have a fully-functional ssh setup, which for MS-Windows means installing Cygwin.
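
Several of the caveats above can be checked for before any data is moved. The following is a minimal sketch rather than a tested tool: the function names are made up, and the sample names and scratch directory exist only to show the checks firing. It reports names that would collide on a case-insensitive disk, names containing characters that FAT forbids, and shows why a trailing "*" misses dot files.

```shell
#!/bin/sh
# All function names here are hypothetical helpers, not part of rsync or unison.

# Read path names on stdin; print (lower-cased) any that collide when case is ignored.
case_collisions() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Read path names on stdin; print those containing characters FAT does not allow.
fat_unsafe_names() {
    grep '[*?"<>|:]' || true
}

# Print how many entries "DIR/*" expands to (dot files are not matched).
glob_count() {
    set -- "$1"/*
    [ -e "$1" ] || set --    # an unmatched pattern stays literal: count it as zero
    echo "$#"
}

# Demonstration on made-up names.
printf '%s\n' ./lib ./Lib ./notes.txt | case_collisions
printf '%s\n' ./run_12:30:05.fits ./plain.fits | fat_unsafe_names

# Demonstration of the "*" caveat on a scratch directory.
home=$(mktemp -d)
touch "$home/.bashrc" "$home/thesis.tex"
echo "entries matched by *: $(glob_count "$home")"
echo "entries really there: $(ls -A "$home" | wc -l | tr -d ' ')"
```

On a real source directory, the first two checks could be fed with something like "(cd sourcedir && find .) | case_collisions".
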
Topic revision: r7 - 03 Sep 2008 - 09:26:16 - RyanHoughton
 