Transferring Data between Linux and Mac Systems
Introduction
The section "Using rsync and unison" has been taken as-is from a Top Tip of the Week of considerable antiquity by MaJoC (who himself is of considerable antiquity). If anything is unclear, please badger either or both of us (MaJoC and ChrisCauser) to reword it, and thereby help us improve this document.
Using rsync and unison
rsync and unison are file-transfer utilities with one very useful feature in common: if a file already exists at the destination end, only the differences are actually copied. In particular, if the files are the same, nothing gets transferred. This is most obviously useful when updating a massive set of files over a slow network, but it is also a great timesaver if the transfer is interrupted part of the way through (restart to resume), or if the source fileset may have changed while we were copying it across.
Rsync copies files in one direction:
rsync -avP -e ssh fromname@fromsystem:/fromdir/ todir/
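(my favourite inter-system rsync incantation) recursively copies everything in the named remote source directory into the named local destination directory, using SSH for data transport (modern rsync uses ssh by default, so "-e ssh" is just being explicit). "-a" (archive) copies recursively and preserves symlinks, permissions, times and, where allowed, ownership; "-v" lists each file; "-P" shows progress and keeps partially transferred files, so that a rerun can pick up where an interrupted copy left off. Beware the trailing "/" on the source directory name: with it, the contents of fromdir are copied into todir; without it, fromdir itself is copied, giving todir/fromdir. See "man rsync" for the details.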
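To see the trailing-"/" difference for yourself without risking real data, here is a small shell sketch. The function name show_trailing_slash and the directory names are made up for illustration. It copies a throwaway directory twice, once with and once without the slash, inside a scratch directory you supply:

```shell
# show_trailing_slash SCRATCH
# Hypothetical helper: builds SCRATCH/fromdir holding one file, copies it two
# ways with rsync, then lists the files each copy produced.
show_trailing_slash() {
    mkdir -p "$1/fromdir"
    echo hello > "$1/fromdir/file.txt"
    # Trailing "/" on the source: copy the *contents* of fromdir into todir1.
    rsync -a "$1/fromdir/" "$1/todir1"
    # No trailing "/": copy fromdir itself, so the file lands in todir2/fromdir/.
    rsync -a "$1/fromdir" "$1/todir2"
    # List what was created, relative to the scratch directory.
    (cd "$1" && find todir1 todir2 -type f)
}
```

Running show_trailing_slash "$(mktemp -d)" should list todir1/file.txt and todir2/fromdir/file.txt.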
rsync -rlptvPu /media/usb1/foo/ /data/system/user/bar/
copies from your USB drive to a partition mounted under /data; exchange the directory names to reverse the process. Note the extra "-u" (--update) argument, which skips files that are newer at the destination ("don't overwrite newer files"). "-a" is shorthand for "-rlptgoD"; changing it to "-rlpt" drops the owner, group and device options, which means you get to own the copies, even if your system and external USB drive don't agree on numeric IDs (those using USB drives formatted as HFS+ to transport data between Macs and Debians tend to suffer from this).
To test whether you have the syntax correct, you can always perform a 'dry run' by adding the 'n' (--dry-run) option:
rsync -nrlptvPu /media/usb1/foo/ /data/system/user/bar/
This lists all the files that rsync would transfer, without actually changing anything. It is useful if you have forgotten about the trailing-'/' issue described above, or before you try any of the --delete options, where a mistake can remove files.
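The dry-run habit can be wrapped up so that you always see the list before anything is copied. This is a minimal sketch. The function name careful_rsync is hypothetical, and it simply passes your own rsync arguments through unchanged. Include -v, or the dry run will list nothing:

```shell
# careful_rsync RSYNC-ARGS...
# Hypothetical wrapper: first runs rsync with -n (dry run) to list what would
# be transferred, then repeats the same command for real only if you answer "y".
careful_rsync() {
    rsync -n "$@" || return
    printf 'Go ahead with the real transfer? [y/N] '
    read -r answer || answer=
    if [ "$answer" = y ]; then
        rsync "$@"
    else
        echo "Nothing copied."
    fi
}
```

For example: careful_rsync -rlptvP /media/usb1/foo/ /data/system/user/bar/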
For more details on rsync, see "man rsync" (rather long: skip forward to the "Usage" section), find your way round "info rsync", or look for documentation under http://rsync.samba.org/ .
Unison, which uses an rsync-like protocol to transfer only the differences, synchronises files in two directions at once. The intention is to keep two or more copies of each file in separate places, and to update each from the other. If one copy of a file has been modified, the default action is to change the other to match; if both have been changed, neither is modified, and the problem is brought to the attention of a responsible adult (you). Unison does this by recording the state of each file at the end of the previous synchronisation, so that next time it can tell which copy has changed since. (Rsync by default decides whether a file needs updating from its size and modification time, which can be misleading. Unison by default doesn't propagate modification times at all; this can be even more misleading, and is definitely a mess to sort out afterwards.)
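The bookkeeping described above amounts to a three-way comparison for each file. The sketch below is only a rough model and not unison's actual code; the function name and arguments are invented. It takes a fingerprint of one file (say, a checksum) as recorded at the end of the last sync, and as it is now in each of the two copies:

```shell
# sync_action ARCHIVED LEFT RIGHT
# Toy model of two-way synchronisation: compares one file's fingerprint as
# recorded after the last sync (ARCHIVED) with its current fingerprint in the
# left and right copies, and prints what should happen to it.
sync_action() {
    archived=$1 left=$2 right=$3
    if [ "$left" = "$right" ]; then
        echo "nothing to do"            # both copies already agree
    elif [ "$left" = "$archived" ]; then
        echo "copy right to left"       # only the right copy has changed
    elif [ "$right" = "$archived" ]; then
        echo "copy left to right"       # only the left copy has changed
    else
        echo "conflict: ask the user"   # both changed, differently
    fi
}
```

For instance, sync_action v1 v1 v2 prints "copy right to left", while sync_action v1 v2 v3 reports a conflict.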
Synchronising between machines, or between mounted filesystems, is as simple as
unison ssh://farname@farsystem//fardir /data/system/user/neardir
unison -times /media/usb1/foo /data/system/user/bar
(The second also propagates modification times, thanks to -times.) In the first, the double "/" after the host name marks /fardir as an absolute path; with a single "/", the path is taken relative to your home directory on farsystem. By default unison is interactive, asking you to confirm each change it proposes. The text-based client is a mite user-hostile; I shall in due course be installing the GUI front end, unison-gtk, on Debian client systems. For more information on unison, see "man unison" (very short) or "unison -doc tutorial | less" (a bit long), or search the web for unison's home site, where you will also find native MS-Windows and Macintosh client binaries.
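If you run the same synchronisation often, unison can remember its arguments in a profile. This is a minimal sketch for the USB example above. It assumes unison's usual Linux profile directory (~/.unison, or wherever the UNISON environment variable points) and uses a made-up profile name, usbbar:

```shell
# Directory where unison looks for profiles (UNISON overrides the default).
profdir=${UNISON:-$HOME/.unison}
mkdir -p "$profdir"
# Write a hypothetical profile "usbbar" holding the two roots and the
# equivalent of -times. Lines starting with # are comments in a profile.
cat > "$profdir/usbbar.prf" <<'EOF'
# The two replicas to keep in step
root = /media/usb1/foo
root = /data/system/user/bar
# Propagate modification times (same as -times on the command line)
times = true
EOF
```

After that, "unison usbbar" should do the same job as the longer command line.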
CAVEATS
- If you are transferring files to your new Mac, it is worth repeating the command after a complete transfer or sync, as a check. If rsync still finds things to copy, that is not because rsync doesn't work. It is because Mac disks are by default formatted with a case-insensitive filesystem: if you had two files or folders named lib and Lib in the same directory on your Linux disk, they are forced to be the same on your Mac disk, and so they overwrite each other each time you rsync. The telltale sign of this problem is that rsync is never happy, and insists there are still changes to be made just after a fresh sync. There may be a way round this: formatting your Mac disk as 'Mac OS Extended (Case-sensitive, Journaled)'... tests under way.
- If your external USB drive is formatted as FAT32 (alias VFAT, i.e. as-Microsoft), the person doing the mount gets to own the files. The Bad News is that certain characters (notably ':') tend to appear in the names of files produced at certain European observatories, but are forbidden in FAT filenames. Your only reliable choice here would be to create a "tarball" (using tar and gzip), transport that on the USB drive, and untar it at the other end. This is agreed to be awkward: you can't then use the files in place on the USB drive, nor easily check which files in the tarball are older or newer than your local copy. There's also the minor detail that the maximum size for a file on FAT32 is just under 4 GB, as I've recently (March 2008) had brought forcibly to my attention, so you'd need to create N separate tarballs, each under 4 GB in size. This also is agreed to be awkward. [Edit by RyanHoughton: you can use the Mac "Disk Utility" to format a USB disk as Mac OS Extended (Journaled), which has no problems with names such as "file:name" (Aug 2008).]
- Those using external USB hard drives with Mac OS X systems have HFS+ available to them, but the numeric UIDs and GIDs will inevitably differ between OS X and our setup. The "-rlpt" fix for re-owning copied files mentioned above is specific to rsync, and only works when copying from the USB drive; those using unison, and those copying to HFS+, may instead have to fix ownerships afterwards with chown (see "man chown"). If there's a corresponding spell to cast over unison itself, or if the Linux HFS+ drivers do some esoteric user-identity mapping trick at mount time, please pass the information on to me for inclusion in future iterations of this Tip.
- You will, of course, need a copy of whichever tool you use at both ends of the transaction. Our standard Debian desktop setup installs both, but they may well not be present by default on (e.g.) Mac OS X or your own personal Linux laptop, and definitely won't be there by default on (gasp) MS-Windows. For unison between machines (as for rsync over ssh), you'll also need a fully functional ssh setup, which for MS-Windows means installing Cygwin.
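The lib/Lib clashes in the first caveat can be found before you copy to a case-insensitive Mac disk. The shell function below is one way to do it; the name find_case_clashes is made up, and uniq's -D option is a GNU extension, so run it on the Linux side. It lists every path that differs from another only in letter case:

```shell
# find_case_clashes DIR
# Hypothetical helper: print every path under DIR that clashes with another
# path when letter case is ignored (e.g. DIR/lib and DIR/Lib).
find_case_clashes() {
    # Sort ignoring case so case variants end up on adjacent lines, then keep
    # all lines that repeat when case is ignored (GNU uniq: -D all repeats).
    find "$1" | LC_ALL=C sort -f | LC_ALL=C uniq -Di
}
```

For example, find_case_clashes /fromdir prints nothing if the tree is safe to copy.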
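The "N separate tarballs" chore in the FAT32 caveat can be avoided by cutting one compressed tar stream into pieces with split, then gluing them back together with cat at the other end. This is a sketch only: the function names are hypothetical, and the default piece size of 3900 MB is chosen to stay comfortably below FAT32's limit. It has the same drawbacks as a single tarball, because the files can't be used in place on the drive:

```shell
# Piece size for split: 3900m = 3900 MiB, safely under FAT32's 4 GiB limit.
PIECE_SIZE=${PIECE_SIZE:-3900m}

# pack_for_fat32 SRC_DIR PREFIX
# Hypothetical helper: tar and gzip SRC_DIR into a single stream and cut it
# into pieces PREFIX.aa, PREFIX.ab, ... each at most PIECE_SIZE bytes.
pack_for_fat32() {
    tar -czf - "$1" | split -b "$PIECE_SIZE" - "$2."
}

# unpack_from_fat32 PREFIX
# Concatenate the pieces (the shell glob returns them in order) and untar
# the result into the current directory.
unpack_from_fat32() {
    cat "$1".* | tar -xzf -
}
```

For example, run pack_for_fat32 fromdir /media/usb1/fromdir.tar.gz on one machine, then unpack_from_fat32 /media/usb1/fromdir.tar.gz in the destination directory on the other.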
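The HFS+ caveat mentions fixing ownerships afterwards with chown. Below is a sketch of that approach. The function names are made up, and changing a file's owner normally needs root, hence sudo, so point it only at your own copied data:

```shell
# not_mine DIR
# List everything under DIR whose owner is not the current user, such as
# files that arrived carrying a foreign numeric UID.
not_mine() {
    find "$1" ! -user "$(id -u)" -print
}

# reown DIR
# Give everything under DIR to the current user and their primary group.
reown() {
    sudo chown -R "$(id -u):$(id -g)" "$1"
}
```

For example, not_mine /data/system/user/bar shows what needs fixing, and reown /data/system/user/bar fixes it.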
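The last caveat says you need the tools at both ends. The hypothetical helper below is a quick way to check the local end:

```shell
# missing_tools NAME...
# Hypothetical helper: print the name of each command not found on the PATH,
# and return non-zero if any are missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}
```

Run missing_tools rsync unison ssh locally. For the far end, ssh farname@farsystem 'command -v rsync unison' prints the location of whichever of the two it finds there.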