Using AWK to Download and Unpack Drupal Modules

When installing a new Drupal site (or when your list of available updates gets nice and long), you'll often have to download tons of modules, unpack them, and copy all of the resulting directories to your sites/all/modules directory. Personally, I'm not a fan of all the clicking, downloading, unzipping and most of all waiting!

Today I finally settled on a workflow that gets the job done, and it's called the UNIX command line. If your server doesn't use some flavor of UNIX or Linux, or if your web host doesn't allow you shell access, you may want to stop reading after the next paragraph.

Check to see if you can SSH in to your server by logging in via the Terminal:
(Windows kids, grab PuTTY)

[yourterminal]$ ssh user@host
user@host's password: 
Last login: Mon Jun 16 12:34:48 2008 from myserver
[user@host ~]$

If you can log in, great. You might get a message prompting you to add the server's RSA fingerprint to your list of known hosts. If you do, type "yes" and press return. Next, navigate to your modules directory. Your web site's root may be located elsewhere, but on my server it is under the user's home directory.

[user@host ~]$ cd public_html/sites/all/modules
[user@host modules]$

Next, you need a list of all the files you'd like to grab from the Drupal.org server. You can create one with vi on the command line, or skip the list and do the next step by hand. Create a file called downloads.txt with one URL per line. (Gotcha: if you create this file in a text editor on Windows or Macintosh, try adding a space or tab to the end of each line.) In this case we will be updating customerror and date:

http://ftp.drupal.org/files/projects/customerror-5.x-1.1.tar.gz
http://ftp.drupal.org/files/projects/date-5.x-2.0-rc.tar.gz
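If you'd rather build the list on the server itself, something like this sketch should work. It writes the two URLs above with a here-document, then strips carriage returns with tr. That second step rests on an assumption of mine: that the editor gotcha comes from Windows-style line endings, where a stray carriage return ends up glued to the URL that awk hands to wget.

```shell
# Write the list with a here-document (the URLs are the two from this post).
cat > downloads.txt <<'EOF'
http://ftp.drupal.org/files/projects/customerror-5.x-1.1.tar.gz
http://ftp.drupal.org/files/projects/date-5.x-2.0-rc.tar.gz
EOF

# Assumption: the editor gotcha is caused by carriage returns (\r) at the
# end of each line. Deleting them is harmless if the file is already clean.
tr -d '\r' < downloads.txt > downloads.tmp && mv downloads.tmp downloads.txt
```

Run it from inside your modules directory so downloads.txt lands next to the tarballs.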

Now we will use the cat command to print the URLs, and the scripting tool awk to turn each one into a wget command, which we pipe into bash to run:

[user@host modules]$ cat downloads.txt | awk '{print "wget " $1}' | bash
--11:23:43--  http://ftp.drupal.org/files/projects/customerror-5.x-1.1.tar.gz
           => `customerror-5.x-1.1.tar.gz'
Resolving ftp.drupal.org... 64.50.236.52, 64.50.238.52
Connecting to ftp.drupal.org|64.50.236.52|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 90,158 (88K) [application/x-gzip]

100%[====================================>] 90,158       367.30K/s             

11:23:44 (367.00 KB/s) - `customerror-5.x-1.1.tar.gz' saved [90158/90158]

--11:24:07--  http://ftp.drupal.org/files/projects/date-5.x-2.0-rc.tar.gz
           => `date-5.x-2.0-rc.tar.gz'
Resolving ftp.drupal.org... 64.50.236.52, 64.50.238.52
Connecting to ftp.drupal.org|64.50.236.52|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 154,800 (151K) [application/x-gzip]

100%[====================================>] 154,800      525.96K/s             

11:24:07 (524.28 KB/s) - `date-5.x-2.0-rc.tar.gz' saved [154800/154800]

[user@host modules]$ ls
customerror-5.x-1.1.tar.gz
date-5.x-2.0-rc.tar.gz
downloads.txt
[user@host modules]$ rm downloads.txt

You'll see bunches of such downloads happen if you've done everything right. An ls will reveal the new files you've downloaded. If everything went well, remove the downloads.txt file. If you are upgrading a few Drupal sites, you may want to keep and edit this file for your other Drupal installations.

Now that we have the tarballs of all the current modules, we'll need to unpack them and perform any upgrades. Most minor releases of modules don't require running the upgrade script, but check your module's readme files for any special instructions. Also, if you've got modules containing external APIs like SimpleTest or FeedAPI, you might need to make a backup of those downloaded files and remember to copy those files back in after this step.
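For the backup, a small helper along these lines might do. It's only a sketch: MODULES_DIR, BACKUP_DIR and the backup_module function are names I made up, and the default paths assume the layout on my server.

```shell
# Hypothetical paths: adjust both to match your own site.
MODULES_DIR="${MODULES_DIR:-$HOME/public_html/sites/all/modules}"
BACKUP_DIR="${BACKUP_DIR:-$HOME/module-backups}"

# Hypothetical helper: copy one module directory out of harm's way,
# preserving permissions and timestamps (-p) and recursing (-R).
backup_module() {
    mkdir -p "$BACKUP_DIR" && cp -Rp "$MODULES_DIR/$1" "$BACKUP_DIR/"
}
```

Calling backup_module simpletest before unpacking would leave a copy in the backup directory, from which you can copy the external files back in afterwards.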

Now we get to the real fun stuff. awk is a tool that works on one line of text at a time, performs some instructions, prints the results, then starts over again. It's extremely powerful for renaming files and performing batch operations. In this case we're only unpacking two files, but most of my Drupal sites have dozens of modules. We'll be using tar to unzip and unarchive these modules into the proper directories:

[user@host modules]$ ls -1 *.tar.gz | awk '{print "tar -xvzf " $1}' | bash

This command (if you use the -v option to tar) will print lots of stuff to the screen, and if you have any existing installations of modules, they will be overwritten, which is why I warned you to back up your externally distributed files.
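Since unpacking overwrites things, it can be worth looking at what awk generates before bash gets to run it. Leaving off the final pipe prints the commands instead of executing them. The sketch below feeds awk the two file names from this post by hand. The preview_unpack function is just a name I picked for illustration.

```shell
# Hypothetical helper: print the tar commands instead of running them.
preview_unpack() {
    awk '{print "tar -xvzf " $1}'
}

# awk reads one file name per line and prints one command per line.
printf '%s\n' customerror-5.x-1.1.tar.gz date-5.x-2.0-rc.tar.gz | preview_unpack
# prints:
#   tar -xvzf customerror-5.x-1.1.tar.gz
#   tar -xvzf date-5.x-2.0-rc.tar.gz
```

Once the list looks right, add | bash back on the end.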

Next, visit your status report page at admin/logs/status to see whether you need to run update.php or whether any other features are broken.

Like your mother said, don't forget to clean up. You've still got some junk files in your modules directory:

[user@host modules]$ rm *.tar.gz

As a final exercise for the reader, you could write a small shell script that combines all of these commands - download using the arguments from the file, unzip everything, and delete the zip files, all by running one command. I'll leave it up to you to take this any further if you like.
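If you want a head start, here is one way such a script might look. Treat it as a sketch under a few assumptions: update_modules and the RUNNER variable are names I invented, the URLs contain no spaces or shell metacharacters, and you trust the list file, because every line of it ends up being executed by bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: update_modules LISTFILE
# For every URL in LISTFILE: download it, unpack it, delete the tarball.
# RUNNER is an assumption of this sketch. It defaults to bash, but setting
# RUNNER=cat prints the generated commands instead of running them.
update_modules() {
    local list="$1"
    local runner="${RUNNER:-bash}"

    # Strip carriage returns, skip blank lines (NF is 0 for those),
    # and emit one chained command per URL.
    tr -d '\r' < "$list" | awk 'NF {
        n = split($1, parts, "/")    # the tarball name is the last path segment
        file = parts[n]
        print "wget " $1 " && tar -xzf " file " && rm " file
    }' | "$runner"
}
```

Running RUNNER=cat update_modules downloads.txt first gives you a dry run. Chaining with && means a tarball is only unpacked if its download succeeded, and only deleted if it unpacked cleanly.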

Behind the Scenes

Here's a rundown of the commands we used:

| (the pipe): takes the output of a command and feeds it into the input of the next command, like connecting two pipes together
cat: faithfully prints the contents of a text file
wget: downloads files from the internet over HTTP
ls -1: the -1 option prints all the contents of your directory in one column, suitable for awk or grep
tar -xvzf: x means extract, v means give us a verbose list of everything that's being done, z means run the archive through gzip (to decompress it, in this case), and f means I'll be telling you a file to act on
bash: reads the lines it receives on its input and runs each one as a shell command, which is how the text awk prints gets executed - the commands it runs often produce output of their own, which you can "pipe" on to other commands
awk: this is where the magic happens - programmers will recognize print, and $1 is the first whitespace-separated field of the current line (loosely like a numbered capture group in PCRE regular expressions) - the curly braces hold the operation to perform on each line
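To make the field idea concrete, here is a tiny sketch. By default awk splits each line on whitespace, and the -F option picks a different separator. The project_name function is a made-up name, and it assumes the project name itself contains no hyphens.

```shell
# Hypothetical helper: print the project name from a tarball file name.
# -F- tells awk to split each line on hyphens instead of whitespace,
# so $1 is everything before the first hyphen.
project_name() {
    awk -F- '{print $1}'
}

echo "date-5.x-2.0-rc.tar.gz" | project_name    # prints: date
```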

One last bit of awk magic; if we only wanted to unpack modules in Organic Groups we should be able to do some simple pattern-matching to filter through the results:

[user@host modules]$ ls -1 *.tar.gz | awk '/og/ {print "tar -xvzf " $1}' | bash
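One caveat: the /og/ pattern matches anywhere in the line, so a name that merely contains "og" would slip through as well. Anchoring the pattern with ^ is one way to tighten it. In this sketch the file names are made up for illustration, og_only is a hypothetical helper, and the final | bash is left off so the commands are only printed.

```shell
# Hypothetical helper: keep only file names that START with "og".
og_only() {
    awk '/^og/ {print "tar -xvzf " $1}'
}

# Made-up file names; only the first two begin with "og".
printf '%s\n' og-example.tar.gz og_forum-example.tar.gz blog_extras-example.tar.gz | og_only
# prints:
#   tar -xvzf og-example.tar.gz
#   tar -xvzf og_forum-example.tar.gz
```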

awk can do lots and lots more than this. Take a look at some resources online, or if you're a power user, check out the excellent sed & awk book by O'Reilly to dive deeper into this subject.

Comments

Wow

Great post. It's nice to see the use of some command line wizardry to improve a point and click workflow. Makes me wonder why we ever put a GUI on anything ;)

FYI, the Drush project

FYI, the Drush project provides similar functionality.

I just use cvs to check out

I just use cvs to check out all my code.
Also makes upgrades much simpler, as you only ever
need to get the changes in the files.

This is pretty cool, and I'm

This is pretty cool, and I'm a command line fanatic myself. So...I humbly suggest the drush module - http://drupal.org/project/drush

I think you'll find it simpler to use and more robust than the methods you've presented here.

To do it with even less work

To do it with even less work at the command line, check out drush.

CVS checkouts work faster

CVS checkouts work faster and easier, if you are already using SVN for deploying your website. With a single command you can check out a Drupal module, and updating is *much* easier than this method.

I'm sure it's still useful for many though :)

Nice idea! Once setup,

Nice idea!

Once set up, definitely fewer steps than the manual install/update process.

Have you also seen the 'Drush' module? http://drupal.org/project/drush

drush is a command line shell and Unix scripting interface for Drupal, a veritable Swiss Army knife designed to make life easier for those of us who spend most of our working hours hacking away at the command prompt.

Since finding this some months ago, I have totally changed how I manage all our Drupal-based sites (well - at least the ones that are 5.x +)!

For example, you can install multiple modules with a single line:
php drush.php pm install views cck date

Along with 'update_status' (http://drupal.org/project/update_status) & CVS deploy (http://drupal.org/project/cvs_deploy), drush makes up my trio of must have initial install modules, everything else can be added via the command line afterwards : )

If you're working in the

If you're working in the bash shell directly, there is no need for pipes and complicated awk black magic, just use a simple for loop:

for url in $(cat downloads.txt); do wget "$url"; done

and for the untarring:

for tarball in *.tar.gz ; do tar -xzvf "$tarball" ; done

IMO, drush module has made

IMO, drush module has made these scripts obsolete. check out the drush pm install command

drush is another approach:

drush is another approach: drush pm update customerror-5.x-1.1 date-5.x-2.0-rc

Here's some new AWK magic,

Here's some new AWK magic, from my blog:

Let's say you have a bunch of files named "image.jpg.jpg"... ugh!

rprice@server$ ls -1 | \
  awk -F\. '/.jpg.jpg/ { print "mv " $0, $1"." $2 }' \
  | bash

Now they are renamed to just "image.jpg", that's pretty simple

Next challenge: you have several images which are named "image(1).jpg"... what do you do?

rprice@server$ ls -1 *$1$* | \
  awk -F '$1$' '{ print "mv "$0 "(" $1 "" $2 "" $3 "" $4}' | \
  awk -F$ '{ print $1 "\\(" $2 ")" $3 "" $4 "" $5}' | \
  awk -F$ '{ print $1 "\\)" $2 " " $3 "" $4 "" $5}' \
  | bash

Now they are also renamed to just "image.jpg" - you are a winner!

I use some perl scripts

I use some perl scripts similar to this, and they work fine for testing on my test server at home. But deploying my updates to my shared hosting account is still manual.

I configure the site to get served from a directory called 'mysite' which is really just a link to the real file structure called 'mysiteA' with settings.php pointing to a db called mysiteA. I take the prime config offline, deploy to an alternate file structure and db called mysiteB, and change the link to point there. Next time I deploy to mysiteA and switch back.

Does drush+update_status+cvs_deploy work with shared hosting, or would I need to be the admin on my server?
Anybody have a good workflow they'd care to share for using the trio of drush+update_status+cvs_deploy?

Hi Ryan! Is there a

Hi Ryan!

Is there a particular reason for all the awk love? The following would seem more concise:

$ wget -i downloads.txt
$ for f in *.tar.gz; do tar -xzvf "$f"; done
$ find . -name 'og*.tar.gz' | xargs -n 1 tar -xzvf

Kind regards,
Gareth

Um, I like AWK - I know there

Um, I like AWK - I know there are probably some commands that will take care of similar problems, but there are several situations when you need AWK, so I tend to use it more often than not. Thanks for the alternative...