Rsyncing to New Heights in Linux Lore

osmansiddiq
Battalion Havaldaar Major
Posts: 275
Joined: Sun Oct 30, 2005 1:48 pm

Post by osmansiddiq »

Rsyncing to New Heights in Linux Lore
Date: Apr 21, 2006 By John Tränkenschuh.
Need to transfer large files? Open source guru John Tränkenschuh explains why rsync is the wonder tool of the Internet and why it's better than FTP.

After working with Cygwin, you may decide to run a dedicated Linux partition. Many distributions exist out there, and it’s tough to make a decision. One very interesting distribution is called White Box Linux, available at http://www.whiteboxlinux.org. It is descended from RedHat Linux products but has some unique features, like a new installation procedure.

There is a problem, though. The website provides four CD ISO images for your download. But that is harder than you might think. You really, really want to try out a new Linux distribution, but it simply won’t download well. That happened to me. The download starts well and then gets radically slower. White Box Linux has a site as speedy as any other; no, the problem is on my end. My satellite vendor really cranks down on the bandwidth once 100MB or so is downloaded. Then it offers a small ray of hope: Use a Download Manager.

Download Manager? Oh yes, there are many, and they are free. However, the last free Download Manager I got for my Windows system set off adware and spyware alarms. Also, those tools work with FTP. Everyone seems to hit sites with FTP or browser downloads. I want an edge—a way to connect that’s unique. I want something that will quickly determine the best way to resume transferring a file. After all, I’ll be downloading nearly 3GB of data in four large files.

And that’s when it hit me like a load of bricks dropped from an overpass: Cygwin allows me to use Open Source tools. I resolved to try downloading my CD images via rsync and my Cygwin environment. I will really put my Cygwin environment to good use. So what is rsync and why is it better than FTP?

Rsync is the wonder tool of the Internet for planned file transfer. It can send only the changes made to a file, saving you the time lost retransmitting existing file content. It can replicate entire directory trees recursively. It can send files with on-demand file compression. This is excellent if you must copy large files with little bandwidth. Want security? It works well within ssh tunnels, allowing you to securely automate with digital keys versus passwords.

Because of these and other benefits to the server administrator, many Open Source repositories offer anonymous rsync reads. This is good for you. You can be one of the few rsync users among all those FTP users. That gives you a less busy connection to the server.

This article offers information on rsync when using your Cygwin system. It frees you from restrictive file transfer managers that may come with hidden backdoors and track your moves. The article even shows off Linux and Cygwin power. To make this work, you need the ability to:

* Use the Cygwin command line well. Specifically, you must know how to navigate file systems using relative and absolute file paths.
* Plan where you want to place your retrieved files.
* Type long commands and use command recall reliably.
* Know a great tool when you see one.

Let’s begin with the basics. Where do you get good information on rsync? You might point your browser to http://rsync.samba.org. While there, download a few of the documents. Specifically, you should visit http://rsync.samba.org/documentation.html and http://rsync.samba.org/examples.html.

Now that you have some background on rsync, let’s begin a transfer from White Box Linux. This article helps you accomplish the following steps:

1. Ensure adequate storage space.
2. Create and then navigate to the storage directory.
3. Experiment with rsync and many of its common switches.

It is really that simple to get the goods you need—without using suspect download managers and FTP.
Got Storage?

I’ve known several people who ran into system problems as their free disk space got low. Windows ’swap’ storage space isn’t a separate partition, and when swap gets low, Windows has problems. I’ve seen print jobs with illustrations that take megabytes of temporary storage. In short, you need 3GB for the images plus enough free space left over for swap. If you have only 3.5GB of free space and keep using your machine during the download, you may face system lockups. Ensure plenty of free space, and maybe run Disk Defrag so that the new big files are stored efficiently. Maybe buy a cheap external disk drive that interfaces via something fast, like FireWire or USB 2. Once this is done, you’re ready to build your storage depot.
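A quick way to check free space from the Cygwin prompt might look like the sketch below. The 4GB threshold (3GB for the images plus 1GB of headroom) is my own assumption, not a figure from the article, so adjust it to taste.

```shell
# Assumed requirement: 3 GB for the ISO images plus 1 GB of headroom (in KB)
need_kb=$((4 * 1024 * 1024))

# df -P prints one line per file system; -k reports 1K blocks.
# Line 2, column 4 is the space available for the current directory.
avail_kb=$(df -Pk . | awk 'NR==2 {print $4}')

if [ "$avail_kb" -ge "$need_kb" ]; then
    echo "OK: ${avail_kb} KB free"
else
    echo "Low: only ${avail_kb} KB free, want ${need_kb} KB"
fi
```

Run it from the directory where you plan to store the images, because df reports on the file system that holds the current directory.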
Getting Ready to Receive the Files

You need to type out the server name, the filename to download, where the received files need to go, and the filename used for storing them. And don’t forget to type the entire path for everything! We need security access, too. So where will we store them? Most of us have full rights in the My Documents folder. That folder is nested three directories or more into your file system! Typing all that will be a tough job. So let’s begin by building and then navigating to your storage depot. If you execute rsync while in the storage depot, you can instruct rsync to put the files in the current directory and use the same filenames, saving you many keystrokes.

So let’s create a directory in the My Documents directory. You can navigate to it simply in Cygwin. In Cygwin, your drives branch from a common directory called /cygdrive. Also, Cygwin has Linux’s directory name completion. If you type the first few letters of a directory and press the Tab key, you can cycle through the directories that match those letters. This saves a lot of typing and also helps you type directory names that have spaces in them—correctly.

NOTE

The space character must be "escaped" with a special character, the backslash (\), or Cygwin will think you are indicating two directories. Confused? Linux lessons do that, so just work through the lessons for now.
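If you want to see escaping in action without touching your real folders, here is a small sketch. It works in a throwaway directory from mktemp, and the variable names are only for illustration.

```shell
# Work in a scratch directory so nothing real is touched
demo=$(mktemp -d)
mkdir "$demo/My Documents"

# Backslash-escaped space: the shell sees one directory name
cd "$demo"/My\ Documents
escaped=$PWD

# Quoting the whole path does the same job
cd "$demo/My Documents"
quoted=$PWD

[ "$escaped" = "$quoted" ] && echo "both forms reach: $quoted"
```

Without the backslash or the quotes, cd would receive two separate words and complain.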

So, to get to your account’s My Documents folder, type a command similar to this:

cd /cygdrive/c/Docu[TAB]ments\ and\ Settings/{your ID name}/My[TAB]\ Documents/

and press Enter.

The Tab key is your command-line friend. You are in the correct directory and can use Cygwin to create the new directory. This keeps permissions and ownerships consistent. Use a simple name and then navigate into your directory.

mkdir whitebox && cd ./whitebox

This is a fancy way to link two commands such that the second is executed only if the first command completes successfully. Type the pwd command. Are you where you want to be? If so, you’re ready to learn rsync itself.
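To convince yourself of how && behaves, try this sketch in a scratch directory (the mktemp location is an assumption so that your My Documents folder stays clean). The second mkdir fails because the directory already exists, so the command after && never runs.

```shell
demo=$(mktemp -d)
cd "$demo"

# First attempt: mkdir succeeds, so cd runs
mkdir whitebox && cd ./whitebox
pwd
cd ..

# Second attempt: mkdir fails (directory exists), so echo is skipped
mkdir whitebox 2>/dev/null && echo "created again"
status=$?
echo "exit status of the chain: $status"
```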
Learning && Trying Rsync

It’s time to get to the good stuff: those CD images for Whitebox! Let’s begin by reviewing the manpage for rsync. Let’s create a copy:

man rsync | col -b > rsyncmanpage.wri

This command takes the heavily formatted manpage information, strips obnoxious characters, and then sends the text to a new file. The wri extension tells Windows to open the file with WordPad/write.exe, an editor that understands UNIX line endings. In fact, because the Cygwin shell can execute Windows binaries, run this command:

write.exe rsyncmanpage.wri

There! Now you have a copy of the documentation available as you work through the complex commands. What? You didn’t get this far? Missing rsync or col? At this point, you need to run the Cygwin setup.exe and ensure you have the tools you need.

As you review your rsync manual, you see it supports both rcp and URL syntax for connections. Because very few people know rcp syntax, given it is an accursed tool of uncertain and troubled security lineage, not unlike Sauron’s ring, let’s focus on using rsync in a simpler way.

The whiteboxlinux.org website points out the images can be retrieved from a few rsync repositories. I’m going to use rsync://ftp.esat.net/mirrors/whiteboxlinux.org. Let’s start by listing files, much like an FTP site. We need to descend into directories, too. I will use the --list-only parameter to see the files and directories within the site name. I will end the URL with a directory marker / to indicate I want to see directory contents.

NOTE

Sometimes sites use a special file called a link to lead you to a directory. In these cases, you only see the link file and not the contents of the directory!

rsync --list-only rsync://ftp.esat.net/mirrors/whiteboxlinux.org/
drwxr-xr-x 38 2005/05/04 18:44:22 .
drwxr-xr-x 15 2003/12/15 03:05:06 3.0
drwxr-xr-x 15 2005/05/04 18:44:42 4
drwxr-xr-x 21 2005/05/05 19:58:45 contrib

You now see the directories for the versions. Let’s try version 4. That means we add a 4, the directory name, at the end of our command.

rsync --list-only rsync://ftp.esat.net/mirrors/whiteboxlinux.org/4/
drwxr-xr-x 15 2005/05/04 18:44:42 .
drwxr-xr-x 91 2005/05/04 18:45:27 en

Now it’s time to select the en or English version, so add en/ to your command.

rsync --list-only rsync://ftp.esat.net/mirrors/whiteboxlinux.org/4/en/
drwxr-xr-x 91 2005/05/04 18:45:27 .
drwxr-xr-x 59 2005/05/24 15:15:36 debuginfo
drwxr-xr-x 35 2005/05/24 15:15:02 extras
drwxr-xr-x 43 2005/05/05 19:59:11 iso
drwxr-xr-x 6 2005/05/04 18:45:01 obsolete-updates
drwxr-xr-x 30 2005/05/04 18:45:57 os
drwxr-xr-x 95 2005/05/24 17:03:38 updates

We want the ISO/CD images, so add that and proceed:

$ rsync --list-only rsync://ftp.esat.net/mirrors/whiteboxlinux.org/4/en/iso/
drwxr-xr-x 43 2005/05/05 19:59:11 .
drwxr-xr-x 4096 2005/05/05 08:33:52 i386
drwxr-xr-x 4096 2005/05/05 20:00:24 source
drwxr-xr-x 4096 2005/05/05 20:00:01 x86_64

Got a dual-core chip? Want to use gcc to compile from source? Confused by my questions and just want to run a simple copy of WhiteBox? Just navigate into the i386 directory:

$ rsync --list-only rsync://ftp.esat.net/mirrors/whiteboxlinux.org/4/en/iso/i386/
drwxr-xr-x 4096 2005/05/05 08:33:52 .
-rw-r--r-- 651507712 2005/05/04 19:56:25 manifestdestiny-binary-i386-1.iso
-rw-r--r-- 668731392 2005/05/04 19:57:23 manifestdestiny-binary-i386-2.iso
-rw-r--r-- 656910336 2005/05/04 19:58:20 manifestdestiny-binary-i386-3.iso
-rw-r--r-- 364388352 2005/05/04 19:58:51 manifestdestiny-binary-i386-4.iso
-rw-r--r-- 508 2005/05/04 19:58:51 manifestdestiny-binary-i386-md5sums

Ah! There they are, the ISO images we need. We need to transfer these to our repository. Now let’s run the command needed to transfer the md5sum file.
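As a quick aside before that transfer, the sizes in the listing can be totaled with shell arithmetic to see how much storage the four images really need. This is only a sketch; the byte counts are copied from the listing above and the variable names are mine.

```shell
base=rsync://ftp.esat.net/mirrors/whiteboxlinux.org/4/en/iso/i386
total=0
count=0

# Sizes in bytes for images 1 through 4, in listing order
for size in 651507712 668731392 656910336 364388352; do
    count=$((count + 1))
    total=$((total + size))
    echo "$base/manifestdestiny-binary-i386-$count.iso  $size bytes"
done

echo "total: $total bytes, about $((total / 1024 / 1024)) MB"
```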

NOTE

MD5Sums ensure what you downloaded hasn’t been changed or corrupted during downloading. The file lists the "signatures" for each ISO/CD image. You compare the signature in the file to the calculated MD5Sum of your downloaded image file. This is simple to do in Cygwin:

grep 1.iso manifestdestiny-binary-i386-md5sums && md5sum ./manifestdestiny-binary-i386-1.iso
39131c4e570ac8368b7819b3fe783274 manifestdestiny-binary-i386-1.iso
aedb34d89f388540d33a4bf4ff1dd072 *./manifestdestiny-binary-i386-1.iso

NOTE

The command lists the line of text for the first ISO image && then calculates the signature for an incomplete download I did earlier. As you can see, the two signatures don’t match.
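If comparing long signatures by eye gets tiring, md5sum can do the comparison itself with its -c option. The sketch below uses a tiny stand-in file with a made-up name (demo-1.iso) in a scratch directory, not the real images, so you can watch both a pass and a failure.

```shell
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for a downloaded image
printf 'pretend ISO contents\n' > demo-1.iso
md5sum demo-1.iso > demo-md5sums

# -c reads "signature filename" lines and re-checks each file
if md5sum -c demo-md5sums; then good=yes; else good=no; fi

# Simulate an incomplete download and check again
printf 'truncated' > demo-1.iso
if md5sum -c demo-md5sums 2>/dev/null; then bad=no; else bad=yes; fi

echo "intact file passed: $good, damaged file caught: $bad"
```

With the real md5sums file, expect complaints about any images you have not downloaded yet, since the file lists all four.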

This MD5sum file is a great test of neat command-line parameters we’ll set. Speaking of which, spend some time learning from the many great examples rsync provides in the manpage document.

As you review rsync’s pages of options, you’ll find that some are perfect for picking up a very large file, possibly in several tries. The -n option lists what actions rsync will take for a complicated set of options, without performing the transfer. This is great for testing that a lengthy transfer will work as planned. The --partial option is very nice. By default, rsync deletes a partially retrieved file. By adding this option, we preserve the existing data and let rsync add to it. I like the --progress option because it lets me know when the transfer is stalled due to some issue, like an ISP throttling my bandwidth. Wait a minute! A simple -P implements the two P’s of Progress and Partial! So, I should be able to try downloading the file with a simple -P!
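You can rehearse these options locally before pointing them at a real server. The following is a sketch that copies a small stand-in file between two scratch directories; the file name and directories are assumptions for the demonstration, and the whole thing is skipped if rsync is not installed.

```shell
if command -v rsync >/dev/null 2>&1; then
    src=$(mktemp -d)
    dst=$(mktemp -d)
    printf 'pretend ISO contents\n' > "$src/demo.iso"

    # -n is a dry run: rsync reports what it would do but copies nothing
    rsync -n -P "$src/demo.iso" "$dst/"
    if [ -e "$dst/demo.iso" ]; then dry_run_copied=yes; else dry_run_copied=no; fi

    # Without -n the copy really happens; -P means --partial --progress
    rsync -P "$src/demo.iso" "$dst/"
    if cmp -s "$src/demo.iso" "$dst/demo.iso"; then real_copied=yes; else real_copied=no; fi
else
    echo "rsync not found; run the Cygwin setup.exe first"
    dry_run_copied=no
    real_copied=skipped
fi

echo "dry run copied: $dry_run_copied, real run copied: $real_copied"
```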

NOTE

Note the case! This is Linux, and a capital letter is a requirement, not an option.

So let’s try a transfer. I type rsync, the -P option, the full URL, the path, the filename I want, and last, the destination directory, or ’.’ because I want it placed right where I am—My Storage Depot.

NOTE

If you’re concerned about the long command wrapping to the next line, Linux uses the \ character to indicate the rest of the command is one line down. Use it if needed.
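Here is a sketch of the \ continuation in use. The echo in front prints the assembled command instead of running it, so nothing is transferred; drop the echo when you are ready for the real thing. The url and file variables are just my shorthand for the pieces used in this article.

```shell
url=rsync://ftp.esat.net/mirrors/whiteboxlinux.org/4/en/iso/i386
file=manifestdestiny-binary-i386-md5sums

# Each trailing backslash tells the shell the command continues below
cmd=$(echo rsync -P \
    "$url/$file" \
    .)

echo "$cmd"
```

Make sure nothing, not even a space, follows the backslash on each line.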

Me? I just type away and ignore the wrapping. So first, let’s get the md5sums file:

$ rsync -P rsync://ftp.esat.net/mirrors/whiteboxlinux.org/4/en/iso/i386/manifestdestiny-binary-i386-md5sums .
manifestdestiny-binary-i386-md5sums
508 100% 496.09kB/s 0:00:00 (1, 100.0% of 1)

sent 168 bytes received 648 bytes 96.00 bytes/sec
total size is 508 speedup is 0.62

That was fast and easy. Look at my download speed of 496.09KB/sec. That’s due to compression my ISP provides and the fact it seems few are using rsync. The evil plot works! Now for CD 1:

$ rsync -P rsync://ftp.esat.net/mirrors/whiteboxlinux.org/4/en/iso/i386/manifestdestiny-binary-i386-1.iso .
manifestdestiny-binary-i386-1.iso
17692144 2% 3.79MB/s 0:02:43

So, look at the bandwidth we’re achieving now! That’s more than 3MB/sec! We haven’t even turned on compression for rsync itself!

Ok, I’ll be honest. I stopped the first transfer by pressing Ctrl-C, the "stop the application NOW!" keystroke. This illustrates that a resumed file transfer moves very fast over previously downloaded content.

So it won’t be long. My bandwidth restriction will kick in, and I’ll go from a speedy connection at 100KB/sec to a lousy 3 or 4KB/sec. I’ll use the Ctrl-C sequence to stop the transfer. I will resume the transfer later. This shows what a good Ctrl-C produces:

$ rsync -P rsync://ftp.esat.net/mirrors/whiteboxlinux.org/4/en/iso/i386/manifestdestiny-binary-i386-1.iso .
manifestdestiny-binary-i386-1.iso
rsync error: received SIGUSR1 or SIGINT (code 20) at /home/lapo/packaging/tmp/rsync-2.6.6/rsync.c(163)
rsync error: received SIGUSR1 or SIGINT (code 20) at /home/lapo/packaging/tmp/rsync-2.6.6/rsync.c(163)

This article could go on and on. Unfortunately, it cannot. For now, though, download those CDs, in stages if you must. You know the command and the techniques to do so quickly and easily. Well, as quickly as your ISP allows, that is. From there, burn the ISO images to a blank CD.
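If you would rather not babysit the staged downloads, a small retry helper is one possible approach. This is a sketch of my own, not something rsync provides: the function name retry, the RETRY_DELAY variable, and the 60-second default pause are all assumptions.

```shell
# usage: retry MAX_TRIES COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or MAX_TRIES attempts have failed.
retry() {
    max=$1
    shift
    n=1
    until "$@"; do
        if [ "$n" -ge "$max" ]; then
            echo "giving up after $n tries" >&2
            return 1
        fi
        n=$((n + 1))
        # Pause between attempts; RETRY_DELAY is a hypothetical knob
        sleep "${RETRY_DELAY:-60}"
    done
}
```

You might call it as retry 5 rsync -P followed by the image URL and the final dot. Because -P keeps the partial file, each new attempt picks up where the last one stopped.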

Hmmm, there was one last thing I wanted to add. Something about typing all those very long command line items. Those very long items that would bring odd errors if you so much as missed a dash. I bet you had a rough time retyping all that stuff. What am I forgetting? If you press the Up arrow key, Cygwin should recall your last command perfectly, keeping you from endless retyping and fumbling. Wow, and to think you went through all that typing when one simple keystroke would help you out...

http://www.informit.com/articles/printe ... p?p=462525
soni
Naik
Posts: 70
Joined: Sat Oct 04, 2003 1:44 pm
Location: Karachi
Contact:

Post by soni »

Every day we users of Gentoo/Portage do the rsyncing.
fawad
Site Admin
Posts: 918
Joined: Wed Aug 07, 2002 8:00 pm
Location: Addison, IL
Contact:

Post by fawad »

osmansiddiq, the article is great. However, could I ask you to just paste a snippet from the article and link to the original article? Copying the whole article verbatim off another website might not be in the terms of use for the other site, and is generally considered in bad taste.

Regards
LinuxFreaK
Site Admin
Posts: 5132
Joined: Fri May 02, 2003 10:24 am
Location: Karachi
Contact:

Post by LinuxFreaK »

Dear osmansiddiq,
Salam,

I have no idea why you do not use the Gup Shup forum or http://www.linuxpakistan.net/news/ ?

Can someone please move this to Gup Shup?

Best Regards.
Farrukh Ahmed
osmansiddiq
Battalion Havaldaar Major
Posts: 275
Joined: Sun Oct 30, 2005 1:48 pm

Post by osmansiddiq »

Sorry, LinuxFreaK. In future I'll post in Gup Shup.

Also, a big sorry to the admin, Fawad. In future I'll paste a snippet from the article and link to the original article, although I do try to put a link at the end.
I have posted quite a few other posts in the last couple of days without following the rules mentioned above, because I hadn't read these posts and replies. I couldn't: I am on cable net in a hostel and the service sucks. All the pages I visit are put in a cache and I'm not able to read any new posts. Sometimes I'm not even able to log in to linuxpakistan for a week on end. It's only when the cable provider does maintenance and clears the cache that I can read what's happening.
As an example, I go to BBC News and from the 14th till today could only view news from the 14th of April.
lambda
Major General
Posts: 3452
Joined: Tue May 27, 2003 7:04 pm
Location: Lahore
Contact:

Post by lambda »

so don't use the cable modem, get a dialup account instead.

and fix your punctuation.
osmansiddiq
Battalion Havaldaar Major
Posts: 275
Joined: Sun Oct 30, 2005 1:48 pm

Post by osmansiddiq »

Dear lambda,
I can't get dial-up as I have no telephone line; I live in a hostel.
I thought cable net was supposed to be superior to dial-up.
Does anyone know of a good cable net provider in Karachi, specifically for Dow University of Health Sciences, near Civil Hospital Karachi?
lambda
Major General
Posts: 3452
Joined: Tue May 27, 2003 7:04 pm
Location: Lahore
Contact:

Post by lambda »

osmansiddiq wrote: i thought cable net was supposed to be superior to dial up

you're using the wrong type of cable internet. try worldcall for the right type.