
Right now, most of the domains are served by the public and internal DNS servers of our office IT.
So, to avoid fiddling around with zone files, I thought it would be a good idea to have at least two DNS slave servers which are fed by the master servers of our office IT.
These two slave DNS servers will serve all hosts and zones for our DCs (so there is more to it than just this setup), sitting behind an IPVS/LVS loadbalancer.
This is the Inkscape diagram I came up with:
The IPVS/LVS Loadbalancer itself is an active/passive pacemaker cluster.
So I sat down and created a proof of concept on a testing VMware ESX server.
I created four machines: two DNS servers and two IPVS nodes.
The software used on the two IPVS/LVS machines is:
- Ubuntu 10.04 (aka Lucid Lynx)
- pacemaker
- corosync
- ipvsadm
- ldirectord
The software used on the two DNS servers is:
- Ubuntu 10.04 (aka Lucid Lynx)
- Bind9
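If you ever want to replicate this by hand instead of via FAI/Puppet, the corresponding Lucid packages should simply be:
# on the loadbalancers
sudo apt-get install pacemaker corosync ipvsadm ldirectord
# on the DNS servers
sudo apt-get install bind9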
The network is set up like this:
- DNS-01: 192.168.10.10
- DNS-02: 192.168.10.11
- LB-DNS-01: 192.168.10.20
- LB-DNS-02: 192.168.10.21
- DNS-Server VIP: 192.168.10.100
Starting with the easy part of this setup:
Installation of the two DNS servers:
- Writing a Puppet recipe for the two DNS servers: setting up the production network, installing the DNS software, and deploying the bind9 configuration, especially the slave zone definitions (see the sketch after this list)
- Provisioning of the two DNS server machines in (DC)²
- Deployment via FAI; the Puppet run happens inside the FAI install run
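Just to give an idea of what the Puppet recipe drops onto the DNS servers, the slave zone part of the bind9 config looks roughly like this; the zone name is the internal test zone used further below, and the master address (203.0.113.53) is only a placeholder for the office IT master server:
// /etc/bind/named.conf.local (excerpt, deployed via Puppet)
zone "internal.zone.tld" {
        type slave;
        file "/var/cache/bind/db.internal.zone.tld";
        masters { 203.0.113.53; };    // placeholder address, not the real master
};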
So, from preparation through provisioning to the running production system, this took roughly 30 minutes.
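A quick sanity check is to query both slaves directly; testhost.internal.zone.tld is the same test record the ldirectord health checks use further below:
dig @192.168.10.10 testhost.internal.zone.tld
dig @192.168.10.11 testhost.internal.zone.tld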
Now for the IPVS/LVS Loadbalancers.
This wasn't trivial, mostly because of Pacemaker and its somewhat peculiar configuration.
So I decided to do the default deployment via FAI and Puppet, but left the special Pacemaker part out of the automation and did it by hand for this test setup.
So, here goes:
- Corosync configuration
I just used the default config and just adjusted the totem/interface section:
- LB-DNS-01:
interface {
        # The following values need to be set based on your environment
        ringnumber: 0
        bindnetaddr: 192.168.10.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
}
- LB-DNS-02:
interface {
        # The following values need to be set based on your environment
        ringnumber: 0
        bindnetaddr: 192.168.10.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
}
Now you edit /etc/default/corosync and set the environment variable "START" to "yes".
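A minimal sketch of that step, assuming you want to bring corosync up right away instead of waiting for a reboot:
sudo sed -i 's/^START=no/START=yes/' /etc/default/corosync
sudo service corosync start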
When everything is OK (just check /var/log/syslog), you have a clean Pacemaker cluster. If you are using certain Cisco or Juniper network devices, don't forget to enable multicast on the connected ports (as I have the two ESX machines under my control, I had to do that on the host access port of our Cisco switch).
Now for the pacemaker CIB setup.
I don't use any stonith resources; to make that clear before you ask: I don't trust the external/vmware stonith agent, and for the test setup I'm working on it isn't necessary anyway.
To still get a failover without stonith agents, you have to tell pacemaker to set
- stonith-enabled: false
- no-quorum-policy: ignore
Without those settings, you won't see a failover when a node disappears.
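If you prefer the crm shell over editing the CIB XML directly, the same two settings can also be applied like this on one of the nodes:
sudo crm configure property stonith-enabled=false
sudo crm configure property no-quorum-policy=ignore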
So, here is the relevant excerpt of my CIB:
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
<nvpair id="cib-bootstrap-options-stonith-disable" name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
</cluster_property_set>
</crm_config>
<resources>
<clone id="resDNSClone">
<meta_attributes id="resDNSClone_Meta">
<nvpair id="resDNSClone_meta_max" name="clone-max" value="2"/>
<nvpair id="resDNSClone_meta_node_max" name="clone_node_max" value="1"/>
</meta_attributes>
<primitive id="resDNSClone_ldir" class="ocf" provider="heartbeat" type="ldirectord">
<operations>
<op id="resDNSClone_ldir_op" name="monitor" interval="30s" timeout="10s"/>
</operations>
<meta_attributes id="resDNSClone_ldir_meta">
<nvpair id="resDNSClone_ldir_meta_threshold" name="migration-threshold" value="10"/>
<nvpair id="resDNSClone_ldir_meta_target" name="target-role" value="Started"/>
<nvpair id="resDNSClone_ldir_meta_fail" name="on-fail" value="standby"/>
<nvpair id="resDNSClone_ldir_meta_quorum" name="required" value="quorum"/>
</meta_attributes>
<instance_attributes id="resDNSClone_ldir_attribs">
<nvpair id="resDNSClone_ldir_config" name="configfile" value="/etc/ha.d/ldirectord.cf"/>
</instance_attributes>
</primitive>
</clone>
<clone id="resDNSPingd">
<primitive id="resDNSClone_Pingd" class="ocf" provider="pacemaker" type="pingd">
<instance_attributes id="resDNSClone_Pingd_attribs">
<nvpair id="resDNSClone_Pingd_attribs_hostlist" name="host_list" value="172.24.24.1 172.24.24.20 172.24.24.21"></nvpair>
<nvpair id="resDNSClone_Pingd_attribs_dampen" name="dampen" value="5s"></nvpair>
<nvpair id="resDNSClone_Pingd_attribs_multiplier" name="multiplier" value="100"></nvpair>
<nvpair id="resDNSClone_Pingd_attribs_interval" name="interval" value="2s"></nvpair></instance_attributes></primitive>
</clone>
<group id="resDNSLB">
<meta_attributes id="resDNSLB_meta">
<nvpair id="resDNSLB_order" name="ordered" value="false"/>
</meta_attributes>
<primitive id="resDNSIP" class="ocf" provider="heartbeat" type="IPaddr2">
<operations>
<op id="resDNSIP_failover" name="monitor" interval="10s"/>
</operations>
<meta_attributes id="resDNSIP_meta">
<nvpair id="resDNSIP_meta_fail" name="on-fail" value="standby"/>
</meta_attributes>
<instance_attributes id="resDNSIP_attribs">
<nvpair id="resDNSIP_ip" name="ip" value="172.24.24.100"/>
<nvpair id="resDNSIP_nic" name="nic" value="eth0"/>
<nvpair id="resDNSip_cidr" name="cidr-netmask" value="24"/>
</instance_attributes>
</primitive>
</group>
</resources>
<constraints>
<rsc_colocation id="resDNSIP_colo" rsc="resDNSLB" with-rsc="resDNSClone" score="INFINITY"/>
</constraints>
So, you save this CIB XML somehow as cluster.xml and then just run:
sudo cibadmin --replace -x cluster.xml
on one cluster node, and now your cluster should be running, right?
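To double-check that the new configuration was actually accepted, you can dump the live CIB again, for example via:
sudo crm configure show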
What's missing?
Ah yes, the ldirectord setup. I put this config file under /etc/ha.d/ldirectord.cf (as you can see above), because back in the day that was the location where you would find the ldirectord configuration.
ldirectord configuration (note that the lines belonging to a virtual= block have to be indented):
checktimeout=10
checkinterval=15
failurecount=3
negotiatetimeout=10
autoreload=yes
logfile="local0"
quiescent=yes
virtual=192.168.10.100:53
    real=192.168.10.10:53 gate
    real=192.168.10.11:53 gate
    protocol=tcp
    scheduler=rr
    request="testhost.internal.zone.tld"
    receive="172.24.24.10"
    service=dns
virtual=192.168.10.100:53
    real=192.168.10.10:53 gate
    real=192.168.10.11:53 gate
    protocol=udp
    scheduler=rr
    request="testhost.internal.zone.tld"
    receive="172.24.24.10"
    service=dns
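One thing the configs above don't show: with the gate forwarding method (LVS-DR), the two real DNS servers have to accept traffic for the VIP themselves without answering ARP requests for it. A rough sketch of what that means on the realservers, assuming eth0 and the VIP from this setup (adjust to your environment, e.g. make it persistent via sysctl.conf and the interfaces file):
# on DNS-01 and DNS-02: don't answer/announce ARP for addresses on lo
echo 1 | sudo tee /proc/sys/net/ipv4/conf/eth0/arp_ignore
echo 2 | sudo tee /proc/sys/net/ipv4/conf/eth0/arp_announce
# bind the VIP to the loopback so the realserver accepts packets for it
sudo ip addr add 192.168.10.100/32 dev lo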
Now, just reboot both boxes and, after the reboot, SSH back into them.
Start the pacemaker monitoring tool via "sudo crm_mon" and just play around.
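A few commands that are handy for playing around; the node name is just an example, use whatever uname -n reports on your loadbalancers:
# watch the LVS table and the state of the real servers
sudo ipvsadm -L -n
# query the DNS service through the VIP
dig @192.168.10.100 testhost.internal.zone.tld
# force a failover by putting the active node into standby and back online
sudo crm node standby lb-dns-01
sudo crm node online lb-dns-01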