
Stephan Hermann: Thoughts about a loadbalanced DNS setup

During the last few days, we had a discussion about how to provide a simple, but always available, DNS server setup.

Right now, most of our domains are served by the public and internal DNS servers of our office IT.
So, to avoid fiddling around with zone files, I thought it would be a good idea to have at least two DNS slave servers, which are fed by the master servers of our office IT.

These two slave DNS servers will serve all hosts and zones for our DCs (so there is more to it than just this setup), sitting behind an IPVS/LVS loadbalancer.

This is the Inkscape diagram I came up with:

The IPVS/LVS loadbalancer itself is an active/passive pacemaker cluster.
So I sat down and created a proof of concept on a test VMware ESX server.

I created four machines: two for DNS, two for IPVS.

The software used on the two IPVS/LVS machines is:

  1. Ubuntu 10.04 (aka Lucid Lynx)
  2. pacemaker
  3. corosync
  4. ipvsadm
  5. ldirectord
The software used on the two DNS servers is:

  1. Ubuntu 10.04 (aka Lucid Lynx)
  2. Bind9
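All of the above is available straight from the Lucid repositories; installing it boils down to something like:

      # on the two IPVS/LVS loadbalancer machines
      sudo apt-get install pacemaker corosync ipvsadm ldirectord

      # on the two DNS servers
      sudo apt-get install bind9
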
The network is set up like this:

  • DNS-01: 192.168.10.10
  • DNS-02: 192.168.10.11
  • LB-DNS-01: 192.168.10.20
  • LB-DNS-02: 192.168.10.21
  • DNS-Server VIP: 192.168.10.100


Starting with the easy part of this setup:

Installation of the two DNS servers:

  1. Write a Puppet recipe for the two DNS servers: set up the production network, install the DNS software, and deploy the bind9 configuration, especially the slave zones (a minimal sketch of such a zone definition follows below).
  2. Provision the two DNS server machines in (DC)²
  3. Deploy via FAI; Puppet runs inside the FAI install run
So, from preparation over provisioning to the running production system, this took roughly 30 minutes.
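
For reference, the interesting part of the bind9 deployment is just the slave zone definitions; a minimal sketch (the zone name, master address and file path here are placeholders, not our real values) looks like this:

      // /etc/bind/named.conf.local (sketch)
      zone "internal.zone.tld" {
              type slave;
              masters { 192.0.2.53; };                     // office IT master DNS server (placeholder address)
              file "/var/cache/bind/db.internal.zone.tld"; // where the transferred zone is stored
      };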

Now for the IPVS/LVS loadbalancers.

This wasn't trivial, mostly because of pacemaker and its somewhat strange configuration.
Therefore I decided to do the default deployment via FAI and Puppet, but left the special pacemaker part out of the test setup.

So here it goes:


  • Corosync configuration
    I just used the default config and only adjusted the totem/interface section; it is identical on both LB-DNS-01 and LB-DNS-02:

      interface {
              # The following values need to be set based on your environment
              ringnumber: 0
              bindnetaddr: 192.168.10.0
              mcastaddr: 226.94.1.1
              mcastport: 5405
      }
Now edit /etc/default/corosync and set the environment variable "START" to "yes".
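
That is, on both loadbalancer nodes:

      # /etc/default/corosync -- allow the init script to start corosync
      START=yes

      # then start it
      sudo service corosync start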

When everything is OK (just check /var/log/syslog), you have a clean pacemaker cluster. If you are using some special Cisco or Juniper network devices, don't forget to enable multicast on the connected ports (as I have the two ESX machines under control, I had to do that on the host access ports of our Cisco switch).
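
A quick way to double-check the cluster membership (the exact output depends on your hostnames) is:

      # follow the corosync/pacemaker messages during startup
      grep -i -e corosync -e pengine -e crmd /var/log/syslog | tail -n 20

      # one-shot cluster status; both nodes should be listed as online
      sudo crm_mon -1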

Now for the pacemaker CIB setup.
I don't use any stonith resources; to make that clear before you ask: I don't trust the external/vmware stonith agent, and for the test setup I'm working on it isn't necessary anyway.
To still get a failover without stonith agents, you have to tell pacemaker to set
  1. stonith-enabled: false
  2. no-quorum-policy: ignore
Without those settings, you won't see a failover when a node disappears.
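Both properties are also included in the CIB excerpt below; if you prefer to set them via the crm shell instead, that would be:

      sudo crm configure property stonith-enabled=false
      sudo crm configure property no-quorum-policy=ignore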

So, here is my excerpt of the CIB:
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
<nvpair id="cib-bootstrap-options-stonith-disable" name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
</cluster_property_set>
</crm_config>
<resources>
<clone id="resDNSClone">
<meta_attributes id="resDNSClone_Meta">
<nvpair id="resDNSClone_meta_max" name="clone-max" value="2"/>
<nvpair id="resDNSClone_meta_node_max" name="clone_node_max" value="1"/>
</meta_attributes>
<primitive id="resDNSClone_ldir" class="ocf" provider="heartbeat" type="ldirectord">
<operations>
<op id="resDNSClone_ldir_op" name="monitor" interval="30s" timeout="10s"/>
</operations>
<meta_attributes id="resDNSClone_ldir_meta">
<nvpair id="resDNSClone_ldir_meta_threshold" name="migration-threshold" value="10"/>
<nvpair id="resDNSClone_ldir_meta_target" name="target-role" value="Started"/>
<nvpair id="resDNSClone_ldir_meta_fail" name="on-fail" value="standby"/>
<nvpair id="resDNSClone_ldir_meta_quorum" name="required" value="quorum"/>
</meta_attributes>
<instance_attributes id="resDNSClone_ldir_attribs">
<nvpair id="resDNSClone_ldir_config" name="configfile" value="/etc/ha.d/ldirectord.cf"/>
</instance_attributes>
</primitive>
</clone>
<clone id="resDNSPingd">
<primitive id="resDNSClone_Pingd" class="ocf" provider="pacemaker" type="pingd">
<instance_attributes id="resDNSClone_Pingd_attribs">
<nvpair id="resDNSClone_Pingd_attribs_hostlist" name="host_list" value="172.24.24.1 172.24.24.20 172.24.24.21"></nvpair>
<nvpair id="resDNSClone_Pingd_attribs_dampen" name="dampen" value="5s"></nvpair>
<nvpair id="resDNSClone_Pingd_attribs_multiplier" name="multiplier" value="100"></nvpair>
<nvpair id="resDNSClone_Pingd_attribs_interval" name="interval" value="2s"></nvpair></instance_attributes></primitive>
</clone>
<group id="resDNSLB">
<meta_attributes id="resDNSLB_meta">
<nvpair id="resDNSLB_order" name="ordered" value="false"/>
</meta_attributes>
<primitive id="resDNSIP" class="ocf" provider="heartbeat" type="IPaddr2">
<operations>
<op id="resDNSIP_failover" name="monitor" interval="10s"/>
</operations>
<meta_attributes id="resDNSIP_meta">
<nvpair id="resDNSIP_meta_fail" name="on-fail" value="standby"/>
</meta_attributes>
<instance_attributes id="resDNSIP_attribs">
<nvpair id="resDNSIP_ip" name="ip" value="172.24.24.100"/>
<nvpair id="resDNSIP_nic" name="nic" value="eth0"/>
<nvpair id="resDNSip_cidr" name="cidr-netmask" value="24"/>
</instance_attributes>
</primitive>
</group>
</resources>
<constraints>
<rsc_colocation id="resDNSIP_colo" rsc="resDNSLB" with-rsc="resDNSClone" score="INFINITY"/>
</constraints>
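
One remark on the pingd clone: it only sets a node attribute, nothing in this CIB acts on it yet. If you want the loadbalancer group to move away from a node that loses connectivity, you additionally need a location constraint along these lines (a sketch, assuming pingd's default attribute name "pingd"):

<rsc_location id="locDNSLBConnectivity" rsc="resDNSLB">
<rule id="locDNSLBConnectivity_rule" score="-INFINITY" boolean-op="or">
<expression id="locDNSLBConnectivity_notdef" attribute="pingd" operation="not_defined"/>
<expression id="locDNSLBConnectivity_zero" attribute="pingd" operation="lte" value="0" type="number"/>
</rule>
</rsc_location>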


So, save this CIB XML as cluster.xml and then just do a:

sudo cibadmin --replace -x cluster.xml

on one cluster node, and now your cluster should be running, right?

What's missing?

Ah yes, the ldirectord setup. I put this config file under /etc/ha.d/ldirectord.cf (as you can see above), because in the past this was the location where the ldirectord configuration lived.

ldirectord configuration:

checktimeout=10
checkinterval=15
failurecount=3
negotiatetimeout=10
autoreload=yes
logfile="local0"
quiescent=yes


virtual=192.168.10.100:53
        real=192.168.10.10:53 gate
        real=192.168.10.11:53 gate
        protocol=tcp
        scheduler=rr
        request="testhost.internal.zone.tld"
        receive="172.24.24.10"
        service=dns
        
virtual=192.168.10.100:53
        real=192.168.10.10:53 gate
        real=192.168.10.11:53 gate
        protocol=udp
        scheduler=rr
        request="testhost.internal.zone.tld"
        receive="172.24.24.10"
        service=dns
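
Once ldirectord is running on the active node, the IPVS table should show the VIP on port 53 with both real servers behind it; a quick check is:

      # list the current IPVS virtual/real server table (run on the active loadbalancer)
      sudo ipvsadm -L -n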

Now just reboot both boxes and, after the reboot, ssh back into them.
Start the pacemaker monitoring tool via "sudo crm_mon" and just play around.
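
To actually test the failover, putting the active node into standby and watching crm_mon on the other one works nicely (the node name lb-dns-01 is just an example, use your real hostname):

      # move all resources off this node (example node name)
      sudo crm node standby lb-dns-01

      # and bring it back afterwards
      sudo crm node online lb-dns-01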

