First of all, great news:
we are now running roughly 350 hosts on the Ubuntu Lucid (10.04 LTS) server flavour, on bare metal (HP rackmounts DL360/DL365/DL380/DL385 from G5 via G5p, G6 and G7, and HP BladeServers BL465c G5 and G7 with the Flex10 fabric) and on VMware machines.
This was not the case until last weekend.
In the past we were running Ubuntu Jaunty (9.04), and that had to change, because 9.04 reached end of life around the release of Ubuntu Maverick (10.10).
Well, normally it would be easy to follow the non-LTS releases with do-release-upgrade or apt-get update / apt-get dist-upgrade, but during our tests we found some really strange things.
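For the record, a chained upgrade like the one we tested (you cannot skip intermediate non-LTS releases) boils down to roughly this, run once per step:

    # Chained release upgrade 9.04 -> 9.10 -> 10.04, run once per step and reboot in between.
    sudo apt-get update && sudo apt-get dist-upgrade   # bring the current release fully up to date
    sudo do-release-upgrade                            # then move on to the next release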
We run many different services on Ubuntu, and some of them involve DRBD setups. These DRBD setups in particular gave us problems.
First, Heartbeat 1/2 no longer exists in 10.04 LTS, so we had to convert all our puppet recipes that deal with Heartbeat 1/2 to Pacemaker. This was one of the serious pain points.
Second, while we were test-upgrading from 9.04 via 9.10 to 10.04, we found out that during this upgrade all DRBD devices ended up horribly broken (we don't know why, but they were, and we had no time to investigate).
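To give an idea of what "checking the DRBD state" means here: with the standard drbd8-utils tooling, a quick sanity check looks roughly like this (the resource name r0 is just an example, not one of our actual resources):

    # Quick DRBD sanity check after a test upgrade; "r0" is only an example resource name.
    cat /proc/drbd        # module version plus per-device connection/role/disk overview
    drbdadm cstate r0     # connection state, should be "Connected"
    drbdadm dstate r0     # disk state, should be "UpToDate/UpToDate"
    drbdadm role r0       # local/peer role, e.g. "Primary/Secondary"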
Therefore, we decided that we had to completely redeploy our servers from scratch, during normal operations.
What does this mean:
- Set up the whole infrastructure, or update the existing infrastructure, to deploy Ubuntu 10.04
- Test-deploy VMware machines and bare metal test machines
- Test new hardware, especially the BL465c G7 blade servers from HP, because of the new Flex10 fabric NIC
- Test database setups with replication for our production services. Many things changed from MySQL 5.0 to 5.1. This was crucial for us, because some of our databases run under high load (IO-, CPU- and memory-wise)
- Test many Pacemaker setups and write puppet recipes for them (pacemaker + ipvs + ldirectord, pacemaker + drbd + mysql, pacemaker + apache2, pacemaker + bind, pacemaker + postfix, etc.); a sketch of one of these setups follows below this list
- Test FAI deployment of bare metal machines
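To give one concrete flavour of those Pacemaker recipes: a pacemaker + drbd + mysql setup, expressed as one-shot crm shell commands, looks roughly like the sketch below. All resource names, the DRBD resource r_mysql, the device, the mount point and the service IP are illustrative placeholders, not our production values.

    # Sketch of a pacemaker + drbd + mysql cluster configuration via the crm shell.
    # Every name, device, path and IP below is a placeholder.
    crm configure primitive p_drbd_mysql ocf:linbit:drbd \
        params drbd_resource=r_mysql op monitor interval=15s
    crm configure ms ms_drbd_mysql p_drbd_mysql \
        meta master-max=1 clone-max=2 notify=true
    crm configure primitive p_fs_mysql ocf:heartbeat:Filesystem \
        params device=/dev/drbd0 directory=/var/lib/mysql fstype=ext3
    crm configure primitive p_ip_mysql ocf:heartbeat:IPaddr2 \
        params ip=192.0.2.10 cidr_netmask=24
    crm configure primitive p_mysql lsb:mysql
    crm configure group g_mysql p_fs_mysql p_ip_mysql p_mysql
    crm configure colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master
    crm configure order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start

Roughly speaking, the puppet recipes mentioned above generate variations of this kind of configuration, one per service combination.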
The problem with all of that: we only had 8 months of time, and we could not interrupt daily operations.
Result: many days with far too many hours, and a lot of brainf*ck involved.
When we started this adventure, we were 4 team members, and everybody got a share of the work.
My special topic was to rewrite the FAIManager I had written in 2008/2009. The result was DC².
I want to spare you the technical details of this adventure, but it was hard work, especially when you get new, largely untested hardware and run into problems with network boot setups.
In the last 5 days before the big bang started, I had to replace klibc's ipconfig network setup in the live-initramfs overlay with udhcpc. This was a success, but it cost working time.
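For the curious, the mechanics of that replacement in a nutshell: udhcpc does not configure the interface itself, it calls a small script with the lease data exported as environment variables, so the initramfs only needs that script plus a call along the lines of udhcpc -i eth0 -n -q -s /path/to/script. A stripped-down sketch of such a script (an illustration, not the actual live-initramfs code) looks like this:

    #!/bin/sh
    # Stripped-down udhcpc event script (sketch only, not the real live-initramfs code).
    # udhcpc calls it with $1 set to deconfig, bound or renew, and exports the lease
    # data as $interface, $ip, $subnet, $router and $dns.
    case "$1" in
        deconfig)
            ifconfig "$interface" 0.0.0.0
            ;;
        bound|renew)
            ifconfig "$interface" "$ip" netmask "${subnet:-255.255.255.0}"
            [ -n "$router" ] && route add default gw "$router"
            if [ -n "$dns" ]; then
                for d in $dns; do echo "nameserver $d"; done > /etc/resolv.conf
            fi
            ;;
    esac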
Anyhow, last weekend was the big moment for us. We started on Saturday around 10am (UTC+1), and after 36 hours we were finished.
All of our services are redundant, so we deployed the second line of our machines from scratch. We tested the product on this second line, and when we were sure that everything worked, we switched from the old Ubuntu 9.04 first-line machines to the newly deployed Ubuntu 10.04 LTS line.
After the switch we re-checked the product services, to be really sure that everything worked as before.
After the final test, we started to redeploy the first line. By Sunday evening we were ready to bring the newly deployed machines back up as the redundant line.
The last action on that Sunday was to drink some beer and smoke a cigar to celebrate our success.
All in all, it was a success: everything worked as expected, and the downtime was no more than 30 minutes.
Coming to an end: this project wouldn't have worked out without the many people involved.
- All OPS team members involved. Without their energy to work day and night, this wouldn't have worked out so nicely.
- All people working on Ubuntu and Debian, and especially my dear friends from the FAI project.
- A special thanks to Stéphane Graber and the people from the LTSP project, who already had udhcpc in their initramfs setup, which is where I got the idea and parts of the implementation.
- The people from Puppet Labs for their great software; FAI + Puppet are great!
- The people from the Qooxdoo project; this is a really nifty JavaScript framework
- The people from the Django project; the backend application runs on it
- David Fischer for his great rpc4django project, a really cool implementation of XML-RPC and JSON-RPC
- The developers of Google's Chromium browser, Mozilla Firefox and Firebug
- Hewlett-Packard for the great hardware