Thursday, May 22, 2008

Lessons from Extreme Data Center Makeover

Several months ago, I blogged about our OpenWorld hands-on lab setup project, which I referred to as “Extreme Makeover, Data Center Edition”, as we had to set up a complex environment with a large amount of complex enterprise-class software in a very short amount of time. It was both a fun and stressful project, and I learned a couple of lessons from the exercise. I planned to blog about those lessons, but kept postponing as other topics, from the Gartner Conference to the EM release to Collaborate, took precedence. Well, here is a belated follow-up to the original post.

Lesson #1 – No project is too small when it comes to applying good IT practices

“How hard could setting up a demo environment be?” - that was the initial thought that came to mind. However, it became very apparent very soon that this was a serious project with all the attributes of a real deployment. For example, after we came back from lunch on day 2 of setup, we found that one of the servers could no longer talk to the network. That was weird, as it had worked just before lunch. After checking the network cable connection to the machine and agonizing over all the network configuration parameters on the box, we discovered that the problem was actually caused by a change someone had made on the network switch during lunch. We wasted half the afternoon troubleshooting. A little bit of discipline in the form of configuration management would have prevented that problem.

Lesson #2 – Be very careful about making assumptions

When we specified the hardware for our demo environment, we put down the usual requirements for CPU, memory, disk space, etc... What we did not specify, and assumed that we would get, were DVD drives on the server machines. We got CD-ROM drives instead, and we lost at least a day to this simple omission.

Lesson #3 – Expect the unexpected

A 5.6 earthquake hit the San Francisco Bay Area one night while we were about to system-test our client-server connection. That disrupted our work for the night, as we didn't feel safe working in a mid-rise building not knowing whether there would be more shaking to come. I am not sure how to plan for something like this, but almost all projects run into “unforeseen” difficulties that are very hard to predict. In our case, there was very little we could have done other than working longer hours the next day to make up for the lost time. If we had had more time to work with at the beginning, we would have built extra buffer time into the schedule.
The examples above might seem trivial, but their cumulative effect added days of delay. We managed to pull the project through thanks to hard work by the whole team, and we will have to keep these lessons in mind when we set up for the next OpenWorld or other similar projects.