At work the other day, I ran into a somewhat odd problem. I had a Windows server with nearly 3,000 folders, each containing a PDF file and an empty ZIP file, looking a little something like this:
2ec99d82-4b79-4454-9f4a-3b52c1cc63cc
|-- 2ec99d82-4b79-4454-9f4a-3b52c1cc63cc.zip
|-- 2ec99d82-4b79-4454-9f4a-3b52c1cc63cc.pdf
Sidenote: In case you’re curious, that long, weird looking string of numbers and letters is called a GUID (globally unique identifier).
The problem was that I needed to compress the PDF file into a ZIP archive, and then move it up one directory, for an end result of this:
2ec99d82-4b79-4454-9f4a-3b52c1cc63cc
|-- 2ec99d82-4b79-4454-9f4a-3b52c1cc63cc.pdf
2ec99d82-4b79-4454-9f4a-3b52c1cc63cc.zip
In this case, the why isn’t important. I had a ton of these PDFs I had to compress, one way or another. In situations like this, I do some quick mental math to figure out roughly how long it would take to do the job manually and weigh that against the time and effort of creating an automated solution. I did about 20 of these by hand, and from start to finish (including deleting the empty ZIP file, waiting for folders to refresh, and cutting and pasting the ZIP file), it took about 25 seconds:
25 seconds * 2,900 = 72,500 seconds
72,500 seconds / (60 seconds / 1 minute) = 1,208 minutes
1,208 minutes / (60 minutes / 1 hour) = 20.14 hours
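For the curious, here's that same estimate as a quick Python sketch (the 25-second timing and 2,900-folder count are the measurements from above):

```python
# Back-of-the-envelope estimate for doing the whole job by hand.
seconds_per_folder = 25   # measured by hand-processing about 20 folders
folders = 2900

total_seconds = seconds_per_folder * folders  # 72,500 seconds
total_minutes = total_seconds / 60            # ~1,208 minutes
total_hours = total_minutes / 60              # ~20.14 hours

print(f"{total_hours:.2f} hours")  # prints "20.14 hours"
```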
Here, it was pretty obvious that an automated solution was in order; it would take far less time than processing these files manually.
Whenever you’re looking to automate something on a computer, command line (i.e., DOS prompt) utilities are your best friends. Linux and Unix are built on top of these types of specialized tools, and with a little effort, you can duplicate most of that power in Windows too.
My solution called for a ZIP utility that could be run from DOS – a job filled nicely by my favorite open source file compression tool, 7-Zip. Next, I needed a way to run 7-Zip on each and every PDF file buried within those subdirectories, while ignoring anything else in them. Some quick research brought me to a Microsoft program called Forfiles that did exactly what I was looking for – iterate through a folder or directory tree and do something for each file it found.
To make things a little easier on myself, I copied all of the folders and their files from the server to a Windows XP machine, where I could experiment and work on my solution without the fear of screwing up the original versions. These all went into a folder called unzipped, while the compressed versions would go into – you guessed it – zipped.
After some tinkering for a total of about an hour over the course of two days, I ended up with this working command line script:
C:\>c:\forfiles.exe /P c:\unzipped /M *.pdf /S /C "cmd /c c:\7za a -tzip c:\zipped\@fname.zip @path"
There’s a lot going on in that single line, so I’ll break it down for you piece by piece:
C:\> – Because I’m working with multiple programs and directories, I placed everything on the root of the hard drive to keep it simple and reduce confusion.
c:\forfiles.exe – This runs the Forfiles command line program.
/P c:\unzipped – Tells Forfiles the folder path where it should start working.
/M *.pdf – Specifies that it should only worry about files that end in a .PDF file extension.
/S – Indicates that it should also look for PDF files within subdirectories of c:\unzipped.
/C – Tells Forfiles to execute the quoted command that follows, once for each matching file.
cmd /c – Launches another command prompt specifically for use by the next program.
c:\7za – Kicks off the command line version of 7-Zip.
a – Tells 7-Zip to create an archive file.
-tzip – Indicates that the type of the archive file should be the ZIP format.
c:\zipped\@fname.zip – The ultimate location of the resulting ZIP file. Note that here, @fname is a variable from Forfiles that returns the name of the file it’s working on, minus the file extension.
@path – The full file path of the PDF file that should be zipped. This is also a Forfiles variable.
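For anyone not on Windows, here's a rough cross-platform sketch of the same pipeline in Python. The folder names unzipped and zipped mirror the setup above; this is an illustration of the idea, not the script I actually ran:

```python
# Rough equivalent of the Forfiles + 7-Zip one-liner: walk the
# "unzipped" tree, and for every PDF found, write a ZIP named after
# the file (minus its extension) into the "zipped" folder.
import os
import zipfile

def zip_pdfs(src_root, dest_dir):
    os.makedirs(dest_dir, exist_ok=True)
    for dirpath, _dirnames, filenames in os.walk(src_root):  # mirrors /S
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue  # mirrors /M *.pdf
            stem = os.path.splitext(name)[0]        # mirrors @fname
            pdf_path = os.path.join(dirpath, name)  # mirrors @path
            zip_path = os.path.join(dest_dir, stem + ".zip")
            with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
                zf.write(pdf_path, arcname=name)

zip_pdfs("unzipped", "zipped")
```

One difference worth noting: 7-Zip does the compression in a separate process per file, while this sketch does everything in a single Python process.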
So, there you have it. It took less than seven minutes for this little script to find and compress all 2,900 PDF files I had. Not bad, considering it would have taken me half of my work week.