Server Maintenance July 2006
This article is about the server maintenance I carried out on the 13th of July, 2006. I thought it would be interesting to write down my experiences as a system administrator during the last years maintaining the server which runs Forever For Now and several other websites and services.
I will describe the old situation, pre-July 2006, then write a bit about preparing for and moving over to the new situation, post-July 2006. But I will start by giving the reasons for doing this upgrade and maintenance in the first place, since it might not be obvious why anyone would upgrade a running system.
I hope you'll find this article useful. I tried my best to get my experiences written down.
If it ain't broke, don't fix it
I strongly believe in the phrase "If it ain't broke, don't fix it", so it takes a fair number of good reasons to convince me something is broken before I'll attempt to fix it. In the case of the FFN server I had some fairly good reasons to consider the installation broken beyond repair. Although the system kept on running fine for the outside world, maintaining and supporting it was troublesome.
Before going into details, let me give you a short overview of the server hardware and software running before the July 2006 maintenance.
In short, the old server hardware looked like this:
- Intel Pentium III (Katmai) at 450 MHz
- 192 MB of PC-100 SD-RAM (64 MB before the 9th of April, 2004)
- 2x 8.4 GB Seagate disks in RAID-1
- Realtek 8029 10 Mbit NIC
The operating system installed on it was Gentoo 1.4, which was released somewhere around December 2003. It had the usual software installed on it, ranging from Apache, MySQL and PHP to PostgreSQL, Postfix and Courier. It was pretty much a default installation with about a dozen daemons and programs installed.
First reason - disk space
The disk layout was pretty complex as I had written a custom raidtab to have five software RAID-1 volumes mounted at /, /var, /usr, swap and /home. For filesystem optimization this is a good thing, but for disk space usage it requires good planning, especially with a relatively small space of 8.4 gigabytes.
As the popularity of FFN and the other sites grew, the /var partition quickly started taking up more and more space for both website documents and database storage. The daily backup routine also started eating away more and more of the root and backup partitions each day, so that I actually needed to transfer the backups off the machine at least once a week, which is good for offsite backups anyway. But all in all, this led to the first reason to do something about the server.
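A small check like the following can warn before a partition fills up. This is only a sketch and not the script that ran on the FFN server: the function name partitions_over, the 80 percent threshold and the idea of running it from cron are assumptions made for illustration. It relies on shutil.disk_usage from the Python standard library.

```python
import shutil


def partitions_over(mount_points, threshold_percent, usage_fn=shutil.disk_usage):
    """Return (mount point, percent used) for each mount at or above the threshold.

    usage_fn is injectable so the logic can be tried without touching real disks;
    by default it is shutil.disk_usage, which reports total, used and free bytes.
    """
    full = []
    for mount in mount_points:
        usage = usage_fn(mount)
        if usage.total == 0:
            # Pseudo-filesystems can report a size of zero; skip them.
            continue
        percent = usage.used / usage.total * 100
        if percent >= threshold_percent:
            full.append((mount, round(percent, 1)))
    return full
```

Calling partitions_over(["/", "/var", "/usr", "/home"], 80) would list the mount points from the layout above that are at least 80 percent full, assuming each of those paths exists on the machine.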
Second reason - software updates
During the course of almost three years the Gentoo portage system, which is the main package management feature, started to show more and more cracks and it became apparent to me that my initial installation wasn't able to withstand all the major changes in the package management software.
Countless times I received errors when using emerge (the portage command line utility), and I think somewhere halfway through 2005 portage was broken beyond repair. I couldn't install or update software from then on. That is hardly ideal for a production server, and at that point I decided to be on the lookout for a long term supported Linux version to upgrade to.
During that time I became accustomed to Ubuntu Linux, which I was using on the desktop, and set out to learn more about its package management and configuration. I also considered CentOS as a serious competitor. CentOS is the community maintained version of Red Hat Enterprise Linux.
Third reason - clock drift
Over the course of 900 days, the actual uptime disregarding short outages, the system clock of the server had drifted two hours so it wasn't reporting the correct time anymore. Unable to install an NTP client the official way and not wanting to fiddle around compiling recent software against legacy libc and other system libraries I had no choice but to watch in agony as the only source of system time information became unreliable.
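To put that drift in perspective, the figures above can be turned into a rate. The sketch below only does the arithmetic on the two numbers from the text (two hours over 900 days); the function name drift_rate is made up for this illustration.

```python
SECONDS_PER_DAY = 86_400


def drift_rate(drift_seconds, elapsed_days):
    """Return (seconds of drift per day, drift in parts per million)."""
    per_day = drift_seconds / elapsed_days
    ppm = drift_seconds / (elapsed_days * SECONDS_PER_DAY) * 1_000_000
    return per_day, ppm


# Two hours of drift accumulated over 900 days of uptime.
per_day, ppm = drift_rate(drift_seconds=2 * 3600, elapsed_days=900)
print(f"{per_day:.1f} s/day, about {ppm:.0f} ppm")
# prints "8.0 s/day, about 93 ppm"
```

Eight seconds a day sounds harmless, but without any time synchronisation it simply keeps adding up.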
Fourth reason - a phpBB hack
Somewhere late in 2005 an automated hacking attempt had defaced one of the system's phpBB installations and placed a Turkish message on the forum pages. That was the point I realized again the importance of keeping software up to date. The phpBB installation wasn't managed by portage and since portage was broken anyway it wouldn't have helped a single bit.
But seeing a forum getting defaced made me realize that I had to do something quickly before major exploits became available for the Apache and OpenSSH versions which were installed on the server at that time.
Fifth reason - (software) performance
During the period of planning the server maintenance I decided to test software performance of the major system components like PostgreSQL. The speed improvements in the new 8.1 version compared to the old 7.3.4 I was using were amazing. Due to a lot of query planner optimizations and the introduction of new join algorithms the performance of queries was increased a lot.
All in all: time to fix
Taking all the aforementioned reasons into account, it was apparent I needed to correct the situation. With my recent involvement in the Ubuntu Linux community I decided to use the Ubuntu 6.06 Long Term Support version for my server installation. But I first took the time to look at the server hardware's current track record to see if anything could be improved.
I summed up the major events regarding the server and its infrastructure and I came up with the following list:
- 9th of April, 2004: planned downtime for memory upgrade from 64 to 192 Mb of RAM.
- 9th of December, 2004: unplanned downtime due to power outage caused by heavy snowfall.
- 27th of June, 2005: unplanned downtime due to power outage caused by summer heat.
- 26th of November, 2005: unplanned downtime due to power outage caused by snowfall.
That's pretty good with no kernel panics or hardware lockups. The hardware has been running smoothly without a hitch ever since starting the machine in the beginning of 2004. It even served well during a host network move when I moved from ZeelandNet Cable to Wanadoo ADSL. I decided the hardware would still be sufficient, but the hard disk space could use some expansion.
Planning it all
Before jumping in and upgrading the server I decided to do some test runs on the upgrade by using VMware to install Ubuntu 6.06 LTS in advance. During this process it became apparent that I needed to rewrite some old backup scripts and I had to change some software configurations. All in all this process took about two weeks of my spare time to get it all right.
Doing this was very useful and efficient because this way I made sure all my backups and restore files were in good shape before actually trying anything on the server iron itself.
During this time I also benefited greatly from the complete documentation I had written over three years on the server configuration and the software installations I had performed. I recommend that every sysadmin keep track of everything they do on a server, so they can always browse back and see what was changed when. This is very useful information to say the least.
Besides the software stuff, I also drafted a direction for the new hardware and disk layout. I decided that 30 gigabytes of storage space would be sufficient for the next years so a few days later I had two identical 30 gigabyte Maxtor disks delivered. I didn't have any complaints about processing speed or RAM usage so I decided to reuse the old hardware for the largest part possible.
The maintenance day itself
On the 13th of July itself I had everything ready - I restored my system backups in VMware and started up the virtual server. It was bridged via Glacier to provide seamless fallback from the normal server. At the moment I saw all requests being transferred to VMware I shut down the server and took it apart for disk installation.
I also installed a new Realtek 8139b based card into it, which can do both 10 and 100 Mbit transfers. This is useful for transferring files internally, like backups and large documents.
After hardware installation I performed some small benchmarks (disk performance had doubled) and started installing Ubuntu 6.06. I had to restart this process one time because of bad media, so always burn your installation CDs at the lowest speed possible.
After about four hours everything was set and I rerouted all incoming traffic back to the real server. The fact that you are reading this now shows that everything went well.
There are a few things I still need to get installed before I can say that the transition is complete. I need to find a way to get ViewCVS installed again and I need to find a replacement for Turck MMCache, a PHP caching solution I used before upgrading to Ubuntu 6.06. I looked into APC, which seems like a worthy replacement for it.
On the 22nd of October 2006 I reinstalled an online CVS browser in the form of ViewVC. After extensive testing I decided the custom installation was going to fit in nicely and would provide added value for the site.
Summed up advice
There are a lot of things I learned while maintaining the server over the last years, but the most important lesson seems to be that planning is the key to success. Investigate your options, and especially the long term implications of each option (operating system), before deciding what to use. Gentoo seemed like a good choice back in 2003, but its development and deployment model doesn't seem to have stood the test of time.
I'm confident that Ubuntu 6.06 will provide a solid base for the next three years of operation. I've been using it on the desktop for quite some time now and being a developer I know my way around the system.
About this article
This article was written on the 15th of July, 2006. I added a link to the new ViewVC browser and corrected some typos on the 23rd of November, 2006.