velospace Downtime Explained and Analyzed
What Happened?
velospace went down hard about a month and a half ago. The site was hosted on a shared server with a few hundred other sites, and at some point velospace took down the server. I have a sneaking suspicion that the cause was a misconfiguration in the server’s email system - mail for a large number of improperly entered user registration email addresses was circulating in qmail, and eventually it got to be too much. velospace was sucking up 99%+ of the server resources and crashing MySQL / Apache continuously, so the plug was pulled on February 4, 2008.
Why Was the Site Unavailable for so Long?
The old host deactivated the site and I had to scramble to either: a) fix the problem in the short term and get the site back up on the existing host, or b) find a new host and hope that my hunch about the old host’s misconfiguration was correct. I did some research in the 48 hours following the crash and decided to try Media Temple as a new host. Media Temple has a shared / clustered hosting environment that seemed to fit my needs, and as a bonus they have 24/7 phone support.
I got on a clustered hosting plan about three days after the crash and started to transfer the velospace archive over from the old host. The transfer took a few hours - the photos and site files were about 4GB of data and the database was about 2GB of data. I ran wget on the new server to pull the files directly from the old server, which saved the time it would have taken to pull them down to my PC and push them back up to the new server.
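To put rough numbers on why the direct server-to-server pull saves time, here is a back-of-the-envelope sketch. Only the 4GB and 2GB sizes come from the post; the link speeds are made-up assumptions for illustration, not measurements from either host or from my home connection.

```python
def transfer_hours(size_gb, mbit_per_s):
    """Hours needed to move size_gb gigabytes at mbit_per_s megabits per second."""
    megabits = size_gb * 8 * 1000  # 1 GB = 8000 megabits (decimal units)
    return megabits / mbit_per_s / 3600


TOTAL_GB = 4 + 2  # photos and site files + database, as stated above

# ASSUMED speeds, purely hypothetical:
SERVER_TO_SERVER_MBIT = 10  # old host -> new host
HOME_DOWNLOAD_MBIT = 3      # old host -> home PC
HOME_UPLOAD_MBIT = 1        # home PC -> new host

direct = transfer_hours(TOTAL_GB, SERVER_TO_SERVER_MBIT)
via_pc = (transfer_hours(TOTAL_GB, HOME_DOWNLOAD_MBIT)
          + transfer_hours(TOTAL_GB, HOME_UPLOAD_MBIT))

print(f"direct pull: {direct:.1f} h, via home PC: {via_pc:.1f} h")
```

Whatever the real speeds are, going through a home PC means moving the data twice, and the second hop is limited by home upload speed, which is usually the slowest link in the chain.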
After transferring the files over I got to work on setting up the databases again, fixing a few absolute paths, making sure the new server had the watermarking software installed, and so on. I transferred the domain name over to the new host when I registered the account - this turned out to be a big mistake.
Rather than point the existing velospace.org DNS records to the new host, I transferred the domain name to Media Temple as part of the hosting account creation process. It turns out that it can take up to a week for a domain ownership transfer to take place. I ran into a lot of roadblocks: trying to get transfer auth codes from my old host, waiting for the new host to approve the transfer, waiting for DNS to update, and so on. A week went by with velospace running on a server that no one could get to because the domain ownership transfer went so damn slowly - talk about frustrating! The site was down for 10 days.
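If the DNS records are simply repointed instead, a small check like the sketch below could confirm when the name starts resolving to the new server. This is an illustration only, not something that was run during the migration; the function name points_to is hypothetical, and the IP address used in the example is a placeholder from the documentation range, not Media Temple's real address.

```python
import socket


def points_to(hostname, expected_ip, resolve=socket.gethostbyname_ex):
    """Return True if hostname currently resolves to expected_ip.

    resolve can be swapped out (for testing, say); by default it is the
    standard library's socket.gethostbyname_ex, which returns a tuple of
    (canonical name, alias list, IPv4 address list).
    """
    try:
        _name, _aliases, addresses = resolve(hostname)
    except OSError:
        # Lookup failures (socket.gaierror, socket.herror) are OSError subclasses.
        return False
    return expected_ip in addresses
```

Calling points_to("velospace.org", "203.0.113.10") every few minutes would show when the switch has taken effect from wherever the script runs. Other people may still see the old address for a while, since their resolvers cache the old record until it expires.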
What’s the Deal With the New Host?
I am going to reserve judgment on the quality of Media Temple until I have more experience with their service. The current plan I am on is billed on metered usage based on a blend of CPU and MySQL cycles. This billing model has forced me, in a good way, to optimize the site so that it is less resource intensive. When I go over my monthly allotment of resources I have to pay for what I use. So far I have been able to cut down on inefficient queries and streamline the site in a lot of ways. The end result is a faster website that should be able to scale in the long run.
Any Lessons Learned?
The best lesson I learned from all of this is to have a contingency plan - I had to scramble and figure out what I was going to do because the site went down with no warning. I should have had a plan ready before velospace got unplugged so I could get the site up and running within a day or two rather than a week or two.
If you have any questions or comments, drop me a line.
- Greg