High-Availability DHCP


The Dynamic Host Configuration Protocol (DHCP) is at the core of almost every enterprise network. It is a mission-critical service and the cornerstone of reliable, simplified workstation management. Yet this service is usually deployed as a single point of failure, with little thought given to high availability or disaster recovery. Longer lease durations are often used to provide some cushion if the server becomes unavailable, but they should not be the only protection against system failure. Even in modestly sized environments, the volume of lease requests and the risk associated with a DHCP failure call for a better solution.

Technology Overview

What IS DHCP?

When TCP/IP-based networks were first being developed and used, all network configuration was done manually. IP addresses, subnets and the like were all configured on the individual hosts and had to be manually tracked and documented. This worked for small businesses and controlled environments where a single network administrator could give the network constant attention or where very tight controls on documentation could be achieved. As these networks grew and larger enterprises adopted the technology, the management burden became unmanageable, and a centralized, enterprise-wide solution was needed.

The Dynamic Host Configuration Protocol is the answer to this problem. DHCP provides a service that allows hosts on an enterprise network to request and receive IP addresses, subnet information, and other important configuration information. The service also acts as a single enterprise database for this information, so a network administrator can access all configuration assignments from one place.

The Role of DHCP in the Enterprise

In a large network, this database and the configuration tools surrounding the service act as a single point of management for all of this data. This means that sweeping configuration changes can be controlled and carried out from a single location, further reducing the management burden on the network administration staff. Imagine trying to make a simple subnet change on 10,000 workstations by hand!

As an environment grows, the load on the DHCP servers grows with it, and adding branch office sites can create the need for additional DHCP servers to manage them. While this handles the load, it introduces additional single points of failure to the enterprise.

Finding DHCP Servers and Scopes

When a workstation needs to find a DHCP server, it sends broadcast packets onto the network advertising its need for one. Any DHCP servers that hear this message respond with an offer of configuration information. The workstation typically uses the server whose offer it receives first for this DHCP transaction; all later responses from other DHCP servers are ignored. This is important, as we’ll be exploiting this behavior to provide highly available DHCP service to the workstations.
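The "first response wins" behavior can be sketched in a few lines of Python. This is only an illustration: the `Offer` type, the server names, and the latency figures are hypothetical, and the model ignores the rest of the message exchange.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    server: str        # hypothetical server name
    address: str       # address being offered
    latency_ms: float  # assumed time for the offer to reach the client


def choose_offer(offers):
    """Return the offer that reaches the client first, or None if no server answered."""
    if not offers:
        return None
    return min(offers, key=lambda o: o.latency_ms)


offers = [
    Offer("dhcp-a", "10.1.1.20", latency_ms=4.0),
    Offer("dhcp-b", "10.1.1.150", latency_ms=1.5),
]
winner = choose_offer(offers)
print(winner.server)  # dhcp-b
```

Because the client never compares the offers themselves, whichever server answers fastest wins the transaction.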

For more information on the inner workings of the DHCP Protocol see:

IP Helper-Address

There are some limitations, though. Since workstations requesting configuration information do not yet have valid IP addresses, they have to rely on network broadcasts to find a DHCP server, request an address, and secure a lease. These broadcasts stay within the local Layer 2 segment and are not routed, at least not without some help from the network infrastructure. Most enterprise-class routers and switches can forward DHCP packets directly to a specified server in another network or subnet. This is called an IP helper, or just a helper address.

In the Cisco product line, this is configured on the interface (or VLAN interface) that faces the client subnet:

    ip helper-address <destination IP address>
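As a rough model of what the helper does, the sketch below turns one client broadcast into a unicast copy for each configured helper address. The dictionary layout, MAC address, and IP addresses are hypothetical. `giaddr` is the DHCP message field a relay fills in with its own address on the client subnet, which the server can use to pick the matching scope.

```python
def relay_discover(discover: dict, giaddr: str, helpers: list) -> list:
    """Turn one client broadcast into one unicast copy per helper address."""
    return [
        {**discover, "giaddr": giaddr, "dst": helper}
        for helper in helpers
    ]


discover = {
    "type": "DISCOVER",
    "client_mac": "00:11:22:33:44:55",  # hypothetical client
    "dst": "255.255.255.255",           # local broadcast
}
copies = relay_discover(discover, giaddr="10.1.1.1", helpers=["10.0.0.10", "10.0.0.11"])
for copy in copies:
    print(copy["dst"], copy["giaddr"])
```

Listing two helper addresses is what lets both DHCP servers see the same request.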

The NAK Poisoning Issue

When designing redundant or highly available DHCP configurations, there is one additional hurdle that we must overcome: the problem of Negative Acknowledgement (NAK) poisoning. The goal of installing additional DHCP servers is to provide uninterrupted service with a minimum of management overhead. It may seem intuitive to just configure a second DHCP server and give it a new scope so that workstations can receive configuration information from either server. Unfortunately, this results in occasional “quirkiness” where some workstations are not assigned addresses when they try to renew. This leads to frustrating, difficult-to-track network problems.

When 50% of the lease duration has elapsed, the host attempts to renew by contacting the server from which it originally received the lease. If that server is unavailable or unresponsive, the host tries again later. Once 87.5% of the lease duration has elapsed, the host broadcasts on the network again and will bind to any DHCP server that responds to the request (rebinding). If this server cannot renew the lease, the workstation can end up in a state where it does not have a valid address and cannot participate in network communications.
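To make the two thresholds concrete, here is a small Python sketch that computes when renewal and rebinding begin for a given lease. The function name and the eight-day lease are example choices, not values from any particular server.

```python
from datetime import timedelta


def lease_timers(lease: timedelta):
    """Return (renew_at, rebind_at) as offsets from the start of the lease."""
    renew_at = lease * 0.5      # 50%: renewal attempt with the original server
    rebind_at = lease * 0.875   # 87.5%: broadcast rebinding to any server
    return renew_at, rebind_at


renew_at, rebind_at = lease_timers(timedelta(days=8))
print(renew_at)   # 4 days, 0:00:00
print(rebind_at)  # 7 days, 0:00:00
```

With an eight-day lease, a failed primary server goes unnoticed by this client until day four, and a second server is not asked until day seven.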

Let’s look at what happens (translated into English for your enjoyment):

(Client) “Hey DHCP Server, I need to renew my address. 87.5% of my lease time is up.” (Rebinding)
(Server) “I can certainly do that for you. What address would you like to renew?”
(Client) “Well, I have been using 10.1.1.53/24. I’d like to keep using that one.”
(Server) “Hmm, I am authoritative for 10.1.2.0/24. Your address is not in my network. I can’t renew that one.” (This is the NAK)
(Client) “I guess I can’t get an address…”

At this point, the workstation is a bit lost as to what to do. It will continue to use its IP address until the lease is up and will try to renew again later. In some cases, the workstation will find the original DHCP server before the address lease expires. In others, the above exchange will repeat and the workstation will eventually stop communicating when it no longer has a valid address.
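The exchange above can be modeled with Python's standard `ipaddress` module. This is a deliberately simplified sketch: the function name is hypothetical, and a real server considers much more than scope membership.

```python
import ipaddress


def handle_rebind(scope: str, requested: str) -> str:
    """Answer a rebinding request the way a server that only knows `scope` would."""
    network = ipaddress.ip_network(scope)
    address = ipaddress.ip_address(requested)
    if address not in network:
        return "NAK"  # the address belongs to a scope this server knows nothing about
    return "ACK"


print(handle_rebind("10.1.2.0/24", "10.1.1.53"))  # NAK
print(handle_rebind("10.1.1.0/24", "10.1.1.53"))  # ACK
```

The second call shows why the fix is to make every server aware of every scope, even the ones it does not primarily serve.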

Avoiding the NAK

The best way to overcome this problem is to avoid it altogether. Your DHCP servers don’t have to be able to provide addresses for all scopes, but they should be aware of the scopes so that they can serve those clients. To do this, use reciprocal exclusions to divide each scope into pieces. Let’s look at a 50/50 split for the 10.1.1.0/24 subnet as an example:

[Figure: 50/50 reciprocal exclusion split of the 10.1.1.0/24 scope between two DHCP servers]
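The boundaries of such a split can be worked out with a short Python sketch. The function name and the choice of two contiguous ranges are assumptions for illustration; each server would serve one range and configure the other as its exclusion.

```python
import ipaddress


def split_scope(scope: str, first_share: float = 0.5):
    """Split the usable hosts of `scope` into two contiguous ranges.

    Returns ((first_low, first_high), (second_low, second_high)).
    """
    hosts = list(ipaddress.ip_network(scope).hosts())
    cut = int(len(hosts) * first_share)
    if not 0 < cut < len(hosts):
        raise ValueError("first_share must leave addresses on both sides")
    return (hosts[0], hosts[cut - 1]), (hosts[cut], hosts[-1])


server1_range, server2_range = split_scope("10.1.1.0/24")
print(f"{server1_range[0]} - {server1_range[1]}")  # 10.1.1.1 - 10.1.1.127
print(f"{server2_range[0]} - {server2_range[1]}")  # 10.1.1.128 - 10.1.1.254
```

Server 1 would exclude the second range and server 2 the first, so no address can ever be handed out by both.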

Now the communication looks like this:

(Client) “Hey DHCP Server, I need to renew my address. 87.5% of my lease time is up.”
(Server 2) “I can certainly do that for you. What address would you like to renew?”
(Client) “Well, I have been using 10.1.1.53/24. I’d like to keep using that one.”
(Server 2) “Hmm, I am authoritative for 10.1.1.0/24, but I have an exclusion on that address. How about 10.1.1.222?”
(Client) “Thanks! 10.1.1.222 it is.”
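A minimal Python model of this second conversation follows, assuming a server that knows the whole scope but serves only its own half. It mirrors the simplified dialogue above rather than the exact message sequence on the wire, and the function name, reply labels, and lease set are hypothetical.

```python
import ipaddress


def answer_rebind(scope, own_range, leased, requested):
    """Reply to a rebind from a server that knows `scope` but serves only `own_range`."""
    network = ipaddress.ip_network(scope)
    low, high = (ipaddress.ip_address(a) for a in own_range)
    wanted = ipaddress.ip_address(requested)
    if wanted not in network:
        return ("NAK", None)            # unknown scope: the poisoning case
    if low <= wanted <= high:
        return ("ACK", str(wanted))     # the address is ours to renew
    # The address is excluded here, so hand out the first free one we do serve.
    candidate = low
    while candidate <= high:
        if str(candidate) not in leased:
            return ("OFFER", str(candidate))
        candidate += 1
    return ("NAK", None)                # our share of the scope is exhausted


reply = answer_rebind(
    scope="10.1.1.0/24",
    own_range=("10.1.1.128", "10.1.1.254"),
    leased={"10.1.1.128"},
    requested="10.1.1.53",
)
print(reply)  # ('OFFER', '10.1.1.129')
```

The client loses its old address but stays on the network, which is the outcome the reciprocal exclusions are meant to guarantee.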

High-Availability Scenarios

Now we have all the pieces to pull together effective high-availability DHCP solutions to serve our enterprise. Just as there are a number of different network configurations, there are different configurations of DHCP that can be deployed to support them. It should also be noted that using MSCS clusters to support DHCP isn’t an ideal solution, as this tends to work inconsistently and is expensive.

Centralized/Redundant DHCP

Centralizing DHCP can provide a single point of management for all of your workstation configuration changes and will allow a tightly focused network administration staff to control the whole environment from one point. This kind of configuration works best when you have a single large site or when you are willing to accept the single points of failure associated with your WAN links. More often, though, it is done to provide on-segment redundancy for a single DHCP server.

On a network with only one subnet or router, you will be able to rely on the local network broadcasts to associate servers with workstations: whichever server happens to respond the fastest will become the DHCP server for that request. If you have multiple sites or subnets/VLANs, you’ll need IP helper addresses pointing at both of the servers. Depending on the network hardware, you may find that the order in which these are listed has an effect on the balance between the servers. (I have not found this to be the case on Cisco gear, though.) In this configuration, you will want to start with a 50/50 split in your exclusions. Over time, you may find that you have to adjust this split to compensate for differences in network speeds and hardware capacity.

Setup Steps:

  1. Plan and diagram out your scopes. You generally want to plan for at least a 50% growth margin for DHCP to accommodate network growth as well as a long-term outage of one of your DHCP servers.
  2. Configure all scopes on both servers.
  3. Configure 50/50 reciprocal exclusions on both servers.
  4. Configure any manual reservations on both servers.
  5. Test failover by disabling DHCP on each server in turn and forcing a renew on the client (ipconfig /release followed by ipconfig /renew).
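Before testing failover, it can help to sanity-check the exclusions from step 3. The Python sketch below is one possible check, assuming you can list the range each server actually serves for a scope. The function name and message wording are hypothetical.

```python
import ipaddress


def check_reciprocal(scope, range_a, range_b):
    """Return a list of problems found with two servers' served ranges for one scope."""
    hosts = set(ipaddress.ip_network(scope).hosts())

    def expand(addr_range):
        low, high = (int(ipaddress.ip_address(a)) for a in addr_range)
        return {ipaddress.ip_address(i) for i in range(low, high + 1)}

    a, b = expand(range_a), expand(range_b)
    problems = []
    if a & b:
        problems.append(f"{len(a & b)} address(es) served by both servers")
    if (a | b) - hosts:
        problems.append("range extends outside the scope")
    if hosts - (a | b):
        problems.append(f"{len(hosts - (a | b))} address(es) served by neither server")
    return problems


print(check_reciprocal("10.1.1.0/24", ("10.1.1.1", "10.1.1.127"), ("10.1.1.128", "10.1.1.254")))
# []
print(check_reciprocal("10.1.1.0/24", ("10.1.1.1", "10.1.1.130"), ("10.1.1.128", "10.1.1.254")))
# ['3 address(es) served by both servers']
```

An overlap is the dangerous finding, since two servers could lease the same address. Addresses served by neither server may be intentional, for example a block set aside for statically configured devices.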

Distributed DHCP

[Figure: three-site distributed DHCP scenario]

In larger environments with many sites, it is important to provide local DHCP services to clients for immediate response to lease requests, but also to provide a centralized backup that is able to serve requests in the event that there is a problem with the local server. This removes the risk associated with relying on the WAN links for a mission-critical service like DHCP.

In this scenario, we will be relying on the fact that the WAN link is much slower than the on-segment network. The router/switch must be configured with an IP helper to route the DHCP request to the centralized server, but the time needed to make this round trip will be significantly longer than the time needed to serve the request on the local network.
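The effect of that latency gap, and what happens when the local server is down, can be sketched in Python. The server names and millisecond figures are assumed values chosen only to show the ordering.

```python
def first_responder(latencies_ms, down=()):
    """Pick the server whose reply arrives first, ignoring servers that are down."""
    alive = {name: ms for name, ms in latencies_ms.items() if name not in down}
    if not alive:
        return None
    return min(alive, key=alive.get)


latencies_ms = {"branch-local": 1.0, "hq-backup": 40.0}  # assumed figures
print(first_responder(latencies_ms))                         # branch-local
print(first_responder(latencies_ms, down={"branch-local"}))  # hq-backup
```

In normal operation the local server always wins the race, so the backup across the WAN only hands out addresses when the local server fails to answer.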

The sample scenario in the figure is a three-site layout comprising a main HQ site and two branch offices. In this configuration, the scopes have been configured in an 80/20 distribution, with 80% of the available IP address leases residing on the local network and 20% across the WAN as failover. Note that because the backup DHCP server also acts as the second local server at the HQ site, it splits the HQ DHCP load 50/50.

Often, network engineers will choose to have only a single DHCP server per site, but having a separate server at the main site allows the load to be controlled and avoids a single point of failure at the HQ site. This may seem a bit complicated, but it is relatively simple to configure and allows you to build all additional sites against a common design pattern.

Finally, you should let the network design and WAN connectivity guide the design of highly available DHCP configurations. If you are set up in a hub-and-spoke configuration, the solution will be slightly different than if you have a few main sites with spokes off of each. Just make sure that you take the entire topology into consideration as you plan the final solution.

Setup Steps:

  1. Plan and diagram out your scopes. You generally want to plan for at least a 50% growth margin for DHCP to accommodate network growth as well as a long-term outage of one of your DHCP servers.
  2. Make sure you understand your network topology and plan your DHCP setup to avoid awkward network hops.
  3. Configure the primary DHCP servers for each site.
  4. Configure all scopes on the backup DHCP server.
  5. Configure 80/20 reciprocal exclusions between the branch office DHCP servers and the backup server.
  6. Configure any manual reservations on both servers.
  7. Configure ip helper-address commands on the routers/switches at the branch offices.
  8. Test failover by disabling DHCP on each server in turn and forcing a renew on the client (ipconfig /release followed by ipconfig /renew).

 

Conclusion

Configuring highly available DHCP solutions is not complicated or tricky if you understand the technology and plan using your network topology as a guide. Whether you are configuring a single-site failover for DHCP or an enterprise-wide failover topology, reciprocal exclusions and IP helper addresses help keep DHCP services available when a server fails. One last note: be sure that you are monitoring your DHCP services. Even with HA servers backing up your primary scopes, you want to be notified of server problems when they happen so you can react to them quickly. The best failover scenario is one that you never have to use.
