Project Spelunker

This is a fan-run project to restore the “old” City of Heroes forums to a readable and searchable state. The old forums were saved from oblivion by the forward-thinking people at the Internet Archive shortly before the game was shut down. However, they saved them using their own version of the standard WARC format. I have written my own WARC processing program, and am now asking my fellow CoHers to help with this project: the archive consists of 12,710 WARC files, each containing a great many individual files, for a total of over 670 GB of data to process. Needless to say, doing this on my own could take months, if not years.

So, how can you help? By “donating” processing time, basically. I’ve created a simple front end for my WARC processor. All you have to do is run my program, type in a nickname/username (to give you credit on a final “Credits” page when the whole process is done), and then click the Process button.

The program will then repeat the following steps until the project is completed (or until you click the Cancel button, in which case it will stop automatically after the current loop iteration):

  1. An updated file list will be obtained from the server.
  2. The program will choose a random file number to process, then proceed to download that WARC file to your PC in your temp folder.
  3. It will “de-WARC” the file, and extract all of the individual file contents to a subdirectory in your temp folder.
  4. It will zip up all of those extracted files into a single zip file (along with a “credits” .crd file).
  5. It will upload the zip file to a completed outputs folder on the server.
  6. It will delete all of the local files that it downloaded/created.
  7. The loop will then start again at 1.
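
For anyone curious how a loop like this hangs together, here is a rough Python sketch of the steps above. It is not the actual program: every function name is made up for illustration, the download, de-WARC and upload steps are passed in as placeholder callables (the real server details aren't described here), and the credits file name "credits.crd" is just an assumption based on the “.crd” extension mentioned in step 4.

```python
import random
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_with_credits(extracted_dir: Path, zip_path: Path, nickname: str) -> None:
    """Step 4: zip every extracted file, plus a small credits file."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(extracted_dir.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(extracted_dir).as_posix())
        # "credits.crd" is an assumed name; only the .crd extension is known.
        zf.writestr("credits.crd", nickname + "\n")


def process_one(pending, nickname, download, dewarc, upload, rng=random):
    """Steps 2-6 for one randomly chosen file; returns the name processed.

    download(name, dest), dewarc(warc_path, out_dir) and upload(zip_path)
    are hypothetical callables supplied by the caller.
    """
    name = rng.choice(sorted(pending))            # step 2: pick a random file
    work = Path(tempfile.mkdtemp(prefix="spelunker_"))
    try:
        warc_path = work / name
        download(name, warc_path)                 # step 2: download to temp
        out_dir = work / "extracted"
        out_dir.mkdir()
        dewarc(warc_path, out_dir)                # step 3: extract contents
        zip_path = work / (name + ".zip")
        zip_with_credits(out_dir, zip_path, nickname)  # step 4
        upload(zip_path)                          # step 5
    finally:
        shutil.rmtree(work, ignore_errors=True)   # step 6: clean up locally
    return name


def run(fetch_file_list, nickname, download, dewarc, upload, cancelled):
    """Steps 1 and 7: refresh the list, process one file, repeat."""
    while not cancelled():                        # Cancel is honored between iterations
        pending = fetch_file_list()               # step 1
        if not pending:
            break                                 # nothing left: project done
        process_one(pending, nickname, download, dewarc, upload)
```

Checking the cancel flag only at the top of the loop is what gives the "stop after the current iteration" behavior, and doing the cleanup in a `finally` block means the temp folder is emptied even if a download or upload fails partway through.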

Here’s what the program’s interface looks like. Pretty simple, right?

Once this phase of the project is complete, I’ll unzip all of these files into a single directory on the server, then index them in one or more HTML reference files. Google should then be able to find and index all of those files, at which point they should be searchable! At least, in theory.
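
To give an idea of what an HTML reference file could look like, here is a minimal Python sketch that walks a directory and builds one page of links. The function name `build_index` and the page layout are assumptions for illustration only, not the final design; a real run over this many files would probably need to split the links across several pages.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_index(root: Path, title: str = "Index") -> str:
    """Return an HTML page linking to every file under root."""
    links = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            # URL-encode the href, HTML-escape the visible link text.
            links.append(
                f'<li><a href="{quote(rel)}">{html.escape(rel)}</a></li>'
            )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        "<body><ul>\n" + "\n".join(links) + "\n</ul></body></html>\n"
    )
```

The function only returns the page as a string; writing it out (for example next to the unzipped files) is left to the caller, so that the index doesn't end up listing itself.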

Below you can download the program and/or its source code, if you want to inspect it yourself to make sure I’m not doing anything nefarious: