ftpsync.pl

I looked at Christoph Lechleitner's ftpsync.pl today. By the way, I am very surprised that there is no CPAN module that does what I want: mirror my website with ease and comfort. The closest I could find on CPAN is Net::FTP::Recursive, but it is still far from what I want: naturally, this module doesn't check whether a file is already present locally or compare time stamps.

Of course, I would like to use rsync, but like so many others I see no way to install it on my host. I looked at sitecopy, but it seems best suited to mirroring from local to remote. I want it the other way around: from remote to local. And I like Perl anyway. So I was glad about Mario's blog posting that pointed me to ftpsync.pl. First tests looked promising, then I ran into similar (or the same) problems Mario described. By the way, Mario suspects that ftpsync.pl cannot cope with the remote and local machines not having synchronized clocks. While this is true for many simple Perl mirror scripts, it is not true for ftpsync.pl. It does calculate the offset!

I synchronize a directory and then, without changing anything in the remote folder, I do it again. ftpsync.pl then wants to copy everything again. Why? I assume that this is because the script checks the file size to determine whether a file has changed. [A few lines below I will find out that this is actually not the problem.] Now, my local machine runs Windows while the remote one runs Linux. This apparently leads to the well-known carriage return problem (\n on *nix becomes \r\n under Windows) as described in this posting. I didn't verify it, but it seems plausible.
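To see why this would break a size check, here is a minimal sketch in Python (not Perl, and not taken from ftpsync.pl). The helper `to_crlf` is hypothetical: it only mimics what an ASCII-mode download to Windows does to Unix text, assuming every line ends in a plain \n on the server.

```python
def to_crlf(data: bytes) -> bytes:
    """Mimic an ASCII-mode download to Windows: every LF becomes CRLF."""
    # Normalize first so lines that already end in CRLF do not get a second CR.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


unix_text = b"line one\nline two\nline three\n"   # as stored on the Linux server
windows_text = to_crlf(unix_text)                 # as it arrives on Windows

# One extra byte per line: 29 bytes on the server, 32 locally, 3 newlines.
print(len(unix_text), len(windows_text), unix_text.count(b"\n"))  # 29 32 3
```

So any text file with at least one line break would look "changed" to a naive size comparison, every single time.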

ftpsync.pl has a switch to turn off the timestamp check, but not the file size check. Maybe it needs one? I guess not: you do want some assurance that the files are really identical. Another alternative seems to be to download everything in binary mode. I am not sure whether I can read the file in binary under Cygwin or Windows, and I am too lazy to google it right now.
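The kind of check discussed here (size always compared, timestamp optionally compared, with a correction for unsynchronized clocks) could look roughly like the following Python sketch. It is not ftpsync.pl's actual code; `FileInfo`, `needs_download`, `clock_offset` and `tolerance` are hypothetical names, and the offset is assumed to be known already (server clock minus local clock, in seconds).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    size: int     # size in bytes
    mtime: float  # modification time in seconds, measured by the owning machine's clock


def needs_download(remote: FileInfo,
                   local: Optional[FileInfo],
                   clock_offset: float = 0.0,
                   check_time: bool = True,
                   tolerance: float = 2.0) -> bool:
    """Decide whether a remote file must be fetched again.

    clock_offset is (server clock - local clock) in seconds; it is subtracted
    from the remote timestamp to express it in local time.
    """
    if local is None:
        return True                      # not mirrored yet
    if remote.size != local.size:
        return True                      # in this sketch the size check is always on
    if not check_time:
        return False                     # timestamps ignored and sizes match
    remote_in_local_time = remote.mtime - clock_offset
    # Only a remote file that is clearly newer than the local copy is fetched again.
    return remote_in_local_time > local.mtime + tolerance
```

For example, with a server clock one hour ahead (`clock_offset=3600.0`), a remote file stamped 3700 and a local copy stamped 200 of the same size is left alone; without the offset, the same file would look newer and be downloaded on every run.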

A very similar idea would be to check whether binary files should be transferred in binary mode. Two tests: I use a graphical FTP client to transfer a gzip file in binary mode, and the size is identical. I can even unpack it and use it. What about an MP3 file? I seem to be able to play it. This raises the question of why ftpsync.pl doesn't at least use binary mode for binary files. Let's call this binary transfer thing our first lead.
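Lead 1 could be sketched in Python with the standard library's `ftplib`, used here in place of Perl's Net::FTP. The extension list and the helpers `transfer_mode` and `download` are hypothetical choices for illustration: a real mirror would need a more careful rule for deciding what counts as binary.

```python
from ftplib import FTP
from pathlib import PurePosixPath

# Hypothetical list of extensions to treat as binary; extend as needed.
BINARY_EXTENSIONS = {".gz", ".tgz", ".zip", ".mp3", ".jpg", ".jpeg", ".png", ".gif", ".pdf"}


def transfer_mode(remote_name: str) -> str:
    """Return 'binary' for known binary file types, 'ascii' for everything else."""
    suffix = PurePosixPath(remote_name).suffix.lower()
    return "binary" if suffix in BINARY_EXTENSIONS else "ascii"


def download(ftp: FTP, remote_name: str, local_path: str) -> None:
    """Download one file, picking the FTP transfer type from its name."""
    if transfer_mode(remote_name) == "binary":
        # retrbinary switches to TYPE I: bytes are copied unchanged,
        # so local and remote sizes match.
        with open(local_path, "wb") as fh:
            ftp.retrbinary(f"RETR {remote_name}", fh.write)
    else:
        # retrlines switches to TYPE A and hands over each line without its
        # line ending; text mode writes the platform newline (CRLF on Windows),
        # so the local size can differ from the remote one.
        with open(local_path, "w", encoding=ftp.encoding) as fh:
            ftp.retrlines(f"RETR {remote_name}", lambda line: fh.write(line + "\n"))
```

Usage would be along the lines of `with FTP("ftp.example.com") as ftp: ftp.login("user", "password"); download(ftp, "song.mp3", "song.mp3")`, where host and credentials are placeholders.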

Another road (already alluded to) would be to correct the error after it has happened: instead of simply turning off the file size check, accept a size difference if it can be explained by the carriage return conversion. If one counts the \n's in the local file, the offset in bytes could be determined at decent speed, I assume. At least it would not be necessary to download the whole remote file.
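A minimal Python sketch of that second lead, assuming that every CRLF pair in the local copy was produced by the ASCII conversion (a file that already contained CRLF on the server would be miscounted). The function name `unix_size_of_windows_file` is hypothetical. Only the local file is read; its result would then be compared with the remote size reported by the server.

```python
import os
import tempfile


def unix_size_of_windows_file(path: str, chunk_size: int = 64 * 1024) -> int:
    """Estimate the size a file had on the Linux server before an ASCII-mode
    download turned every LF into CRLF: local size minus the number of CRLF pairs.

    Assumes every CRLF in the local copy was produced by the conversion.
    """
    crlf_pairs = 0
    prev_ended_with_cr = False
    with open(path, "rb") as fh:  # binary mode, so Python itself translates nothing
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            crlf_pairs += chunk.count(b"\r\n")
            # A pair split across two chunks is invisible to count(), so catch it here.
            if prev_ended_with_cr and chunk.startswith(b"\n"):
                crlf_pairs += 1
            prev_ended_with_cr = chunk.endswith(b"\r")
    return os.path.getsize(path) - crlf_pairs


# Demo: what a two-line text file looks like after an ASCII-mode download to Windows.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first line\r\nsecond line\r\n")
demo_size = unix_size_of_windows_file(tmp.name)
demo_size_tiny_chunks = unix_size_of_windows_file(tmp.name, chunk_size=2)
os.remove(tmp.name)
print(demo_size, demo_size_tiny_chunks)  # 23 23
```

Reading in chunks keeps memory use flat for large files; the tiny chunk size in the demo only checks that a CRLF pair straddling two chunks is still counted once.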

A slight digression: sitecopy requires downloading the remote file completely when using the MD5 identity check as an alternative to size and timestamp. And if the files are transferred in ASCII mode, the MD5 sums will differ anyway. So this doesn't help, but let's call the attempt to determine the "binary file size" from an "ASCII file" the second possible lead.
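A small Python illustration of why an MD5 check fails after an ASCII-mode transfer: the same text with different line endings hashes differently. The sample content is made up. Normalizing line endings locally before hashing makes the digests match again, but the remote digest still has to come from the full remote file.

```python
import hashlib

remote_bytes = b"<p>Hello</p>\n<p>World</p>\n"     # as stored on the Linux server
local_bytes = remote_bytes.replace(b"\n", b"\r\n")  # after an ASCII-mode download

remote_md5 = hashlib.md5(remote_bytes).hexdigest()
local_md5 = hashlib.md5(local_bytes).hexdigest()
# The digests differ even though the text is the same,
# so an MD5 identity check would flag the file as changed.
print(remote_md5 == local_md5)  # False

# Normalizing the local line endings before hashing makes them match again.
normalized_md5 = hashlib.md5(local_bytes.replace(b"\r\n", b"\n")).hexdigest()
print(normalized_md5 == remote_md5)  # True
```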

To begin with, I guess I should look at how other existing solutions, such as sitecopy and Unison, handle synchronization. Hmm. I just don't like sitecopy (because it seems too focused on synchronizing the other way round, from local to remote), and it doesn't appear to have a Windows offset correction (lead 2). Probably it doesn't need one, because it might deal better with binary files (lead 1). Anyway, lead 3 would be to look for alternative solutions. That is probably the thing to start with.

Unison has to be installed on both machines that need to be synced. This is similar to rsync. I don't know anything about the ports Unison requires. I might be able to install it, but I am unsure. Also, Wikipedia states that Unison has problems with special characters. That doesn't sound very good, but Wikipedia is not really specific here, so the problem might apply to Chinese characters but not in my case. Who knows. Also, Unison apparently does not use FTP, and it is not under active development at the moment, at least according to Wikipedia. Hm. It might work, but it is very unclear. Sounds too complicated for my case.

Wikipedia has a nice comparison of open source sync solutions: http://en.wikipedia.org/wiki/File_synchronization. This brings me to another tool: WinSCP. It has a GUI, but claims to be usable from the command line as well. I looked at it briefly, and it should actually work. This is a possible solution for lead 3.

Anyway, let's have a look at lead 1, too. What happens if I simply tell ftpsync.pl to use binary mode for everything? That should definitely solve the file size problem. Whether the result is useful is another matter. Actually, now that I try to set it to binary, I see in debug mode that it is already set to binary. I did give it some strange local path before. Maybe that confused ftpsync.pl. So, my mistake. Mea culpa! Too sleepy to figure out what exactly went wrong. I run it on a larger set of directories and have to wait. Seems to work now. So: ftpsync.pl rules.

This leaves me with only one question: why is there no Net::FTP::Sync module? It seems kind of inconvenient to call a Perl script from a Perl script... Well, not too bad. Just wondering.



This is really quite a lengthy procedure, so I keep thinking about the situation I need this solution for in the first place. Checking whether a transfer is necessary takes its time, and so does the transfer itself, so I should definitely try to avoid transferring too much. So, back to Drupal. I don't need to mirror Drupal core, at least not very often. It's probably good to have a matching version in the local copy. That basically leaves me with the sites directory. Since I use a multisite installation, a natural way would be to split up the various sites. I guess I need the following packages then:
sites/all
sites/default
sites/www.mauricemengel.de.esem
sites/www.mauricemengel.de.ilkar
sites/www.mauricemengel.de.musikethnologie
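Splitting the mirror into these packages could be sketched in Python as one (remote, local) pair per site directory, each of which would then be handed to ftpsync.pl (or another tool) as a separate run. The roots `/htdocs/drupal` and `mirror/drupal` and the helper `sync_jobs` are hypothetical, and the actual ftpsync.pl command line is deliberately left out.

```python
from pathlib import PurePosixPath

# Hypothetical layout: adjust both roots to your own setup.
REMOTE_ROOT = PurePosixPath("/htdocs/drupal")
LOCAL_ROOT = PurePosixPath("mirror/drupal")  # POSIX-style, e.g. under Cygwin

SITE_DIRS = [
    "sites/all",
    "sites/default",
    "sites/www.mauricemengel.de.esem",
    "sites/www.mauricemengel.de.ilkar",
    "sites/www.mauricemengel.de.musikethnologie",
]


def sync_jobs(site_dirs, remote_root=REMOTE_ROOT, local_root=LOCAL_ROOT):
    """Return one (remote_dir, local_dir) pair per site directory, so each
    site can be mirrored and re-checked on its own instead of the whole tree."""
    return [(str(remote_root / d), str(local_root / d)) for d in site_dirs]


for remote_dir, local_dir in sync_jobs(SITE_DIRS):
    print(f"{remote_dir} -> {local_dir}")
```

Keeping the list as data means a site that rarely changes can simply be left out of a given run.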


By the way, you may want to have a look at this excellent but age-old article on an early version of ftpsync.pl.