TFS Custom Links in MS Project 2010

As a project manager, one of the most powerful features of Team Foundation Server 2010 is the ability to create links between work items. While there are several different kinds of links that are built into TFS by default, you’ll often want to create custom link types to streamline your reporting as well as the interface within VSTS.

MS Project integrates tightly with TFS and recognizes two different kinds of relationships when you import work items from the system.

Hierarchy – This is the tree-based relationship that allows work items to roll up into each other. In MS Project, this is represented by summary tasks with their children nested beneath them. By default, TFS uses System.LinkTypes.Hierarchy for this relationship.

Dependency – These are the relationships between work items that set the order in which they need to be accomplished in the plan, meaning that task ordering and resource leveling will follow this map. By default, TFS uses System.LinkTypes.Dependency for this relationship.

If you need to remap these to your custom LinkTypes, it is a simple change to the FileMapping.XML file in the template’s Classification directory. You simply need to add the following keys to the Mappings element in the XML, specifying the names of the LinkTypes you’ve created.

HierarchyLinkType
<HierarchyLinkType LinkType="MyNamespace.LinkTypes.MyHierarchyLinkType" />

DependencyLinkType
<DependencyLinkType LinkType="MyNamespace.LinkTypes.MyDependencyLinkType" />

Note that the documentation on MSDN has incorrect syntax and will throw an exception when you try to export to MS Project.
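
For orientation, the entries end up inside the Mappings element of FileMapping.xml, roughly like the sketch below. The existing field Mapping elements are omitted here, and the link type names are placeholders for whatever custom types you have defined:

<Mappings>
   <!-- existing field Mapping elements omitted -->
   <HierarchyLinkType LinkType="MyNamespace.LinkTypes.MyHierarchyLinkType" />
   <DependencyLinkType LinkType="MyNamespace.LinkTypes.MyDependencyLinkType" />
</Mappings>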

Bug in WebPage Control on TFS Process Editor

When building process guidance in Team Foundation Server 2010, I like to put a WebPageControl on a tab that can respond contextually to the work item, or even to the state that the work item is in. TFS gives you a really great way to present this, but there is a catch.

Below is the XML from the work item definition that looks to the SharePoint 2010 portal and opens an HTML document tied to that work item in the Process Guidance document library. While you might want to keep this flat, I have some additional folder structure there to give me flexibility to redirect on additional values in the future. The important part of this is the @PortalPage server variable, which tells the server to look up the SharePoint site for this project and build it into the URL.

<Control Type="WebpageControl" LabelPosition="Top" Dock="Fill">
   <WebpageControlOptions AllowScript="false" ReloadOnParamChange="false">
      <Link UrlRoot="@PortalPage" UrlPath="Process%20Guidance/Supporting%20Files/{0}/index.html">
         <Param Index="0" Value="System.WorkItemType" Type="Current" />
      </Link>
   </WebpageControlOptions>
</Control>
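
For illustration, when the control renders, the {0} token in UrlPath is replaced by the value of the Param at Index 0 (the work item type), so for a Bug work item the control would resolve to something along these lines (the SharePoint portion of the URL is hypothetical):

   http://sharepoint/sites/MyProject/Process%20Guidance/Supporting%20Files/Bug/index.html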
 

This is great and provides an intuitive experience for the users of the TFS template. The problem comes when using the TFS Process Editor (from the TFS Power Tools) to graphically edit the work item. Whenever you save the work item with the Process Editor, it reinterprets the @PortalPage variable, breaking the link:

<Control Type="WebpageControl" LabelPosition="Top" Dock="Fill">
   <WebpageControlOptions AllowScript="false" ReloadOnParamChange="false">
      <Link UrlRoot="http://portalpage" UrlPath="Process%20Guidance/Supporting%20Files/{0}/index.html">
         <Param Index="0" Value="System.WorkItemType" Type="Current" />
      </Link>
   </WebpageControlOptions>
</Control>

I hope this is something that gets fixed in a future version of the Process Editor, but for now, you must open the WIT file in an XML editor and adjust the value manually.  
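
If you do not already have the WIT file on disk, the witadmin tool included with Visual Studio 2010 can round-trip the definition; the collection URL, project, and type names below are placeholders:

  1. witadmin exportwitd /collection:http://tfsserver:8080/tfs/DefaultCollection /p:MyProject /n:Task /f:Task.xml
  2. Edit Task.xml in an XML editor, restoring the @PortalPage value
  3. witadmin importwitd /collection:http://tfsserver:8080/tfs/DefaultCollection /p:MyProject /f:Task.xml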

Using MS Project 2010 for Software Development Projects – Work vs. Duration

There is a general misconception that Microsoft Project, or many of the commercially-available project management applications, doesn’t work well for software development projects. When talking with Project Managers and Architects, their trouble usually arises from scheduling and resource leveling. They’ve usually been burned by situations where countless hours have been invested into building out a project plan only to find that a minor change skews the plan into an almost unusable mess. There might be many things going on at the same time, but it usually boils down to an understanding of the difference between Duration and Work.


Conceptually, this isn’t a hard distinction. Most people can tell you that Duration is the calendar time it takes to get a task done, whereas Work is the number of development hours it takes to complete a given objective. However, in the middle of a project, the distinction is easy to miss, especially when the business objectives and the actual project work-stream are expressed in different terms. The business stakeholders want to know when functionality will have to be handed off to other groups for activities like integration testing, security audit, and user acceptance testing – ultimately, when the functionality will be delivered. On the other hand, the development team is used to thinking in development hours and estimated work remaining. All of this is made more complex by the fact that MS Project’s default view is expressed in Duration.
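
To make the distinction concrete: a task estimated at 40 hours of Work occupies 5 days of Duration with one resource on a standard 8-hour calendar (40 / 8 = 5), but only 2.5 days of Duration with two resources assigned (40 / 16 = 2.5). The Work is identical in both cases; only the Duration changes.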

Let’s look at a simple example based on the following Work Breakdown Structure (Not complete, just an example):

WBS Item | Initial Estimate (Baseline) | Actual Hours | Work Remaining | Traceability
1.0 – Users can log into the web site (Authorization) and are granted access according to their assigned authorizations | 80 hours | 0 hours | 80 hours | Use Case 1.0
     1.1 – Develop custom logon control for the upper-right in the Master Page header | 20 hours | 0 hours | 20 hours | Use Case 1.0
     1.2 – UX Design for logon flows | 8 hours | 0 hours | 8 hours | Use Case 1.0
     1.3 – Forgot Password Functionality | 20 hours | 0 hours | 20 hours | Use Case 1.0, Alternate Flow (Forgot Password)
          1.3.1 – Forgot Password Challenge Question | 12 hours | 0 hours | 12 hours |
          1.3.2 – Forgot User Name e-mail process | 8 hours | 0 hours | 8 hours |
Other tasks…

I’ve put a little more information into this WBS to prove a point. As you go through your initial planning (at the user story/ product backlog/ use case level) you will be looking at a time-based measurement. Likewise, when you hand that work into the development queue, the estimates that you arrive at in your sprint plan will be based on effort. Don’t worry if you are using story points at the highest level; these eventually get translated into hours and can be considered a substitute for the effort measurement.

Pasting this into MS Project 2010, it is tempting to just use the standard Gantt chart, but that leads you down the wrong path.

Notice that the column on the default view is Duration – that is, calendar time that is required to get the task done. This works if there is only one person (or resource) working at a time, but this quickly breaks down when you have a team or several teams working on the project at once. Instead, you need to add the Work column to the Gantt chart.

  1. Scroll the tabular view to the right until you can see the ‘Add New Column’ heading
  2. Click on the menu arrow to see the list of available fields
  3. Select the Work Field

Once the actual work is entered for the individual work packages, all items will automatically be marked as Effort Driven and the project plan can be leveled against the resources. Here I have added myself as the resource and have allowed MS Project to rebuild the schedule.

The Final Project Plan

Allocating an additional resource to some tasks in the project (John Gault, in this case) keeps the development effort the same, but MS Project 2010 shortens the overall duration of the project to take advantage of the additional work capacity. With the additional resource, we’ve shaved 2 days of duration off of the project.

Just be sure that when you add additional resources, you tell MS Project how this should affect the task.

Conclusion

With a little planning and attention to work estimates rather than only duration, MS Project can be a huge asset to the software development team. This work/effort-based tracking is vital to effective sprint planning and allows the project team to quickly visualize the workstream against the sprint capacity. This also helps to focus the team on critical path issues that might be missed in a taskboard exercise.

Interviewing Active Directory Engineers

One of the most difficult challenges an IT manager faces is hiring the right people. Finding the right employee is more than just looking for strong performance in a single area; it requires that you consider several different skill domains, not only to measure technical ability, but to ensure that the prospective employee is a good fit for your organization.

Know Thyself

Before jumping into a technical interview, it is important to take a step back to look at the environment and team that you will be bringing this candidate into. Some questions you might want to ask yourself:

  1. What criteria do I use to describe a successful candidate?
  2. What qualities do I see in my best people that I am looking for in a new employee?
  3. Am I looking for someone to toe the line or be an agent of change? (Think hard and be honest)
  4. How would I describe my corporate culture? Challenges integrating with the team?

The Golden Mean

Once you have taken an honest look at your environment and the role that you are really looking for the new person to fill, it is time to look at specific interview questions that will help you narrow the field. It is always hard to find people who have a high level of technical competency, but I’ll often have to reject these people for other reasons. It is usually easier to train a quality candidate than it is to hammer someone with amazing book knowledge into the right mold.

There are a couple of key skills that can’t be easily taught and take a significant investment over time to develop:

  • Logical Progression through Fault Isolation – It is important that a candidate be able to troubleshoot a complex situation through methodical fault isolation, dealing with one variable at a time and narrowing the field of probability in a logical order. Lack of this skill leads to a network engineer who grasps at straws and can’t decide what is important and what is not.
  • Ability to Communicate in Written and Verbal Media – This is of the utmost importance to be able to lead a team and act as a focal point for end-user communication. This comes down to the ability to clearly express ideas and to be able to break problems into common experiences and clear ideas that anyone can understand.
    Lack of this skill leads to alienation of the working team as well as mistrust on the part of the users.
  • Ability to Understand the Business – This doesn’t require an MBA or advanced training, but every employee with system/ enterprise-level responsibility should be able to identify when a situation has the ability to impact the bottom line and how revenue streams are driven through the company. Lack of this skill leads to a mis-prioritization of tasks and a lack of ability to evaluate the risk involved in work items.
  • Ability to Work as a Team – This skill encompasses leadership and appropriately handling team dynamics. Active Directory and Windows Server often form the nexus of many disparate IT systems and will require team integration to serve these different consumers. It is important to be able to recognize politically charged situations and appropriately deal with these situations driving consensus. Lack of this skill leads to team alienation and infighting between teams and team members.
  • Ability to Face Mistakes and Learn from Them – Nothing is more frustrating than dealing with a coworker or employee who does the same thing over and over and expects a different result. Even worse is when the outcome is always blamed on other people. A strong candidate will be able to talk freely about mistakes made, as these are often the birthplace of high-level expertise. Lack of this skill is usually expressed through displays of bravado and ego contests, and reflects a lack of objective self-evaluation. In its worst expression, this leads to an institutional culture of blame and lying to cover missteps.

Counter-Example: Hiring by Skill Alone

About three years ago, I was looking for an engineer that had both high-end Active Directory skills and moderate-level Cisco infrastructure skills for a client I had in the Chicago metro area. On the resume and in the technical interview the candidate hit all of the technical requirements out of the park. He was able to answer all of my questions and was even able to push back to read into the questions and challenge the suppositions behind the relatively simple scenarios posed by the questions.

At the time, I was having a hard time finding qualified technical people to place with a client who was already contracting for the FTE – needless to say, I really wanted a qualified body to put into that role. This eagerness caused me to gloss over some of my non-technical questions and ignore minor idiosyncrasies that I was picking up on in the interview.

This misstep on my part led to a tumultuous working relationship that disrupted the rest of the team and caused us to lose weeks of work time to deal with personality conflicts.

This eventually came to a head with the engineer refusing to take criticism on his work and refusing to be held to the same standards as the rest of team. In a morning progress meeting where we were reviewing status and needs, the employee got so upset at having his work reviewed that he lost control of himself and even asked me, his manager, to leave the room so he didn’t have to punch me!

Needless to say, that was his last day.

Active Directory Interview Questions

Situational Questions

  1. What was the hardest technical challenge you’ve ever faced? How did you overcome that?
  2. Describe a situation where it was you against the technology and the technology won? How did you handle the communication during the crisis?
  3. What do you consider your best technical strength? How do you use that to give back to the community to mentor new people?
  4. Talk about a project with which you’ve been involved that was abandoned/ cancelled before it was completed? What went into making that decision? What positive things came out of the project in spite of its overall failure?

Team Dynamics

  1. What things do you do on a day-to-day basis to maintain your skills?
  2. What skills do you think most AD/ Windows engineers lack? How would you help them get up to speed?
  3. How would you describe your learning style? How do you learn best from experiences? Other coworkers?
  4. Who was the most frustrating person you’ve ever had to work with? What made that so painful? How did you help that person become a better team member?
  5. If a coworker were to tell you that you were at the top of their list for most frustrating teammate, how would you address that? What would they probably see as your worst trait?
  6. The last time your manager made you angry, how did you bring that to a positive conclusion?
  7. How would you describe your ideal team environment? What would make it rewarding for you?
  8. Describe your best day at work? What is the one thing you could do to make your life better?

Active Directory (Yes, finally)

  1. What are the 5 FSMO (Fizz-mō) roles and what do they do?
    (A lot of interviewers will skip a question like this because it sparks a long explanation on the part of the interviewee, but it is important to ask, as it demonstrates attention to detail, the ability to handle several things at once, and command of the core of AD. I’ve found that 75% of MCSEs can’t answer this question, and it can hurt the interview so much that this is the only technical question I’ll get to.)
    1. Primary Domain Controller Emulator (PDCe) – This role handles the urgent replication of password changes and some GPO operations. It also participates as a PDC for NT4 domains.
    2. Schema Master (SM) – Acts as the single point of authority in maintaining the AD Schema
    3. Infrastructure Master (IM) – This is responsible for the mapping of GUIDs to objects across the domains. This role maintains cross-domain object references.
    4. Domain Naming Master (DNM) – This is responsible for the addition, removal, and management of domains in the forest.
    5. Relative ID Master (RID Master) – This role is responsible for the assignment of unique SIDs to objects created in the domain.
  2. Which FSMO roles are forest-specific?
    1. There is only one Schema in a forest. This is managed by the Schema Master.
    2. The Domain Naming Master controls domain declarations and object identity (uniqueness) in the forest as a whole.
  3. What is the role of the Global Catalog (GC)? Why is this not a FSMO Role?
    1. The GC holds a flat/ denormalized version of the Active Directory that can be more quickly searched and accessed without having to traverse the hierarchical tree
    2. Since any DC can hold a copy of the GC, this is not a Single Master in the FSMO sense. This is an important piece of the AD system, though.
  4. Which role is no longer used if all Domain Controllers are holders of the GC?
    1. The Infrastructure Master is not used if all DCs are GCs
    2. Normally the IM and the GC should not be on the same server, but this is not the case if all DCs also have the GC
    3. In smaller networks (under a dozen DCs), all DCs should usually be GCs and ADi DNS holders
  5. What is the role of the PDC Emulator (PDCe) if you are in 2003 native mode and all of your clients are running Windows XP?
    1. The PDCe is required for GPO and Password Updates
    2. Unavailability of this service results in long logon times and policy application problems
    3. Time services are also bound here
    4. If the answer is that this service is only for backwards compatibility, this is a HUGE flag that the candidate only has book knowledge!
  6. Talk about the role of DNS in the Active Directory
    1. DNS is the backbone of AD as all resource location is done against the DNS
    2. DNS helps to shape all communication patterns and is integral to replication
    3. DNS also supports the location of servers and maps to the AD topology of the domain
    4. 80% of AD problems can be traced to DNS problems
  7. How would you repopulate the SRV records if you needed to update them?
    1. Restart the NETLOGON service
    2. You can also reboot the server, but this isn’t the preferred method as it requires downtime.
  8. Why would one choose AD integrated (ADi) DNS over standard DNS or BIND?
    1. ADi DNS uses the replication strategy of the AD itself providing a better replication topology and a differential replication of changes rather than whole zone transfers
    2. This is the most reliable DNS to support all of the features of Active Directory
  9. How would you check the health of your domain?
    1. You can use a combination of DCDIAG, NetDIAG and REPLMON (see the command sketch after this list)
    2. Event logs can also be used, but these will not give deep knowledge into the AD’s innerworkings
  10. What tools would you use to check replication?
    1. REPLMON in the Server Support Tools is the best tool
    2. You can also force replication in AD Sites & Services
  11. What does it mean when an object is tombstoned?
    1. This means that the object hasn’t replicated in longer than the tombstone lifetime (60 days by default in Windows 2000/2003; 180 days in forests created with Windows Server 2003 SP1 or later)
    2. This object gets sent to garbage collection
    3. This usually happens when a DC is offline or cannot replicate for a long time. The DC has to be rebuilt to rejoin the domain.
  12. How would you remove a failed Domain Controller from the domain if it couldn’t be demoted gracefully?
    1. DCPROMO /forceremoval
    2. Delete from AD Sites and Services
    3. Delete the DNS records
    4. Perform a metadata cleanup (NTDSUtil)
  13. What are the three objects to which you can attach GPOs?
    1. Domain
    2. Organizational Unit (OU)
    3. Site – this is the least common
  14. If your domain is in Windows 2003 Native mode and your clients are all Vista/ XP, when would you run WINS?
    1. When you cross subnet boundaries, it is still necessary to run WINS; otherwise the workstations will rely on NetBIOS elections to populate the master browser. WINS also provides a full list in My Network Places/ Computers Near Me that crosses subnet boundaries.
  15. What does DHCP do in your network? How does a workstation find a DHCP server?
    1. DHCP hands out IP addresses and network configuration information to workstations and hosts.
    2. A workstation will broadcast on the network to find a server and will use whichever server responds first
    3. If there isn’t one on the local broadcast domain, a router or layer-3 switch can forward the request to a server in another broadcast domain using an IP Helper-Address.
  16. What is the difference between an Authoritative and non-Authoritative AD restore?
    1. An authoritative restore marks the restored records as authoritative so they overwrite the current versions held by other DCs; a non-authoritative restore allows newer records to be replicated on top of the restored records.
  17. Under what circumstances would you need a new domain? A new Forest?
    1. New Domain
      1. In the case of needing a new domain security policy or password policy (no longer necessary in Server 2008, which adds fine-grained password policies)
      2. If you need a soft security boundary for administration
      3. If you need a separate IPSec policy
      4. Crossing geopolitical boundaries
    2. New Forest
      1. If you need to split Namespace (mycompany.com vs. yourcompany.com)
      2. If you need a hard security boundary without an implicit trust
  18. What tools would you use to join two domains together?
    1. Active Directory Migration Tool (ADMT)
    2. 3rd-party tools are available, like the Quest tools
  19. How would you copy files from one server to another and keep the NTFS permissions?
    1. Use xcopy with the /O switch (or the xcopy.vbs script)
    2. Use a 3rd party tool like robocopy
  20. What commands would you put in a logon script to map the W: drive to the Home Directory?
    1. Net use W: /home
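
As a quick reference, here is a command sketch behind several of the answers above. Server names and paths are placeholders; DCDIAG, NetDIAG, and REPLMON ship in the Windows Support Tools rather than in the base OS; and the interactive ntdsutil menu navigation is abbreviated:

    rem Health and replication checks (questions 9 and 10)
    dcdiag /v
    netdiag /v
    repadmin /replsummary

    rem Re-register the SRV records by restarting Netlogon (question 7)
    net stop netlogon && net start netlogon

    rem Force-remove a failed DC, then clean up its metadata (question 12)
    dcpromo /forceremoval
    ntdsutil
       metadata cleanup
       connections
          connect to server HealthyDC01
          quit
       select operations target
          (list and select the site, domain, and failed server, then quit)
       remove selected server

    rem Copy files while preserving NTFS permissions (question 19)
    xcopy C:\Source \\SERVER2\Share /E /O /X
    robocopy C:\Source \\SERVER2\Share /E /SEC

    rem Map W: to the home directory in a logon script (question 20)
    net use W: /home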

Conclusion

There are a number of considerations to weigh when looking for new Active Directory engineers, on the soft skills side as well as the technical. Of course, a list like this can never be complete and should be tailored on a case-by-case basis, but it can act as a guide for the careful interviewer.

High-Availability DHCP

The Dynamic Host Configuration Protocol (DHCP) is at the core of almost every enterprise network, forming a mission-critical service that is the cornerstone of reliable and simplified workstation management. Yet this service is usually configured as a single point of failure, with little thought given to high availability or disaster recovery. While longer lease durations are often used to provide some cushion in the event of unavailability, they should not be the only protection against system failure. Even in modestly-sized environments, the volume of lease requests and the risk associated with the failure of DHCP beg for a better solution.

Technology Overview

What IS DHCP?

When TCP/IP-based networks were first being developed and used, all network configuration was done manually. IP addresses, subnets and the like were all configured on the individual hosts and had to be manually tracked and documented. This worked for small businesses and controlled environments where constant attention by a single network administrator could be provided or where very tight controls on documentation could be achieved. As these networks grew and larger enterprises adopted the technology, the management burden grew to the point of being unmanageable – a centralized, enterprise solution was needed.

The Dynamic Host Configuration Protocol is the answer to this problem. DHCP provides a service to hosts on an enterprise network that allows the hosts to request and receive IP Addresses, subnet information, and other important configuration information. This also acts as a single enterprise database for this information where a network administrator can access all configuration assignments from one place.

The Role of DHCP in the Enterprise

In a large network, this database and the configuration tools surrounding the service act as a single point of management for all of these data. This means that wide-sweeping configuration changes can be controlled and effected from a single location further reducing the management burden on the network administration staff. Imagine trying to make a simple subnet change on 10,000 workstations by hand!

As an environment grows, the load on the DHCP servers will linearly increase and the addition of branch office sites can create the need for additional DHCP servers to manage these sites. While this handles the load, it introduces additional single points of failure to the enterprise.

Finding DHCP Servers and Scopes

When a workstation needs to find a DHCP server, it sends broadcast packets onto the network advertising its need for a DHCP server. Any DHCP servers that hear this message respond, advertising their ability to provide configuration information. The first acknowledgement the host receives determines the server the workstation will use for DHCP in this transaction – all subsequent responses from other DHCP servers are ignored. This is important, as we’ll be exploiting this behavior to provide highly-available DHCP service to the workstations.
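
For reference, that exchange is the standard four-message DHCP handshake, paraphrased here in the spirit of the dialogues later in this article (the address is illustrative):

(Client, broadcast) DHCPDISCOVER – “Is there a DHCP server out there?”
(Server) DHCPOFFER – “I am, and I can offer you 10.1.1.53.”
(Client, broadcast) DHCPREQUEST – “I’ll take 10.1.1.53, please.”
(Server) DHCPACK – “It’s yours, along with your subnet mask, gateway, and DNS servers.”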


IP Helper-Address

There are some limitations, though. Since workstations requesting configuration information do not yet have valid IP addresses, they must rely on network broadcasts to find a DHCP server, request an address, and secure a lease. This means all of these communications occur at Layer 2 of the network and are not routable – at least not without some help from the network infrastructure. Most enterprise-class routers and switches can forward DHCP packets directly to a specified server in another network or subnet. This is called an IP Helper, or just a helper address.

In the Cisco product line, this is specified on the router or switch interface serving the network or subnet:

    ip helper-address <destination IP address>
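
In context, the command lives on the interface (or VLAN interface) that serves the client subnet; the addresses in this sketch are illustrative:

    interface Vlan10
     ip address 10.1.1.1 255.255.255.0
     ip helper-address 10.2.1.10
     ip helper-address 10.2.1.11

Listing a helper address for each DHCP server forwards every client broadcast to both servers, which is the behavior the redundant configurations below rely on.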

The NAK Poisoning Issue

When designing redundant or highly-available DHCP configurations, there is one additional hurdle that we must overcome – the problem of Negative Acknowledgement (NAK) poisoning. The goal of installing additional DHCP servers is to provide uninterrupted service with a minimum of management overhead. It may seem intuitive to just configure a second DHCP server and give it a new scope so that workstations can receive configuration information from either server. Unfortunately, this will result in occasional “quirkiness” where some workstations will not be assigned addresses when they try to renew, leading to frustrating and difficult-to-track network problems.

When 50% of the lease duration is up, the host will attempt to contact the server from which it originally received the lease in order to renew it. If that server is unavailable or unresponsive, the host will try again later. Once 87.5% of the lease duration has expired, the host will broadcast on the network again and will bind to any DHCP server that responds to the request (rebinding). For example, with an eight-day lease, the client tries to renew with its original server at day four and starts broadcasting to any server at day seven. If the responding server cannot renew the lease, the workstation can end up in a state where it does not have a valid address and cannot participate in network communications.

Let’s look at what happens (translated into English for your enjoyment):

(Client) “Hey DHCP Server, I need to renew my address. 87.5% of my lease time is up.” (Rebinding)
(Server) “I can certainly do that for you. What address would you like to renew?”
(Client) “Well, I have been using 10.1.1.53/24. I’d like to keep using that one.”
(Server) “Hmm, I am authoritative for 10.1.2.0/24. Your address is not in my network. I can’t renew that one.” (This is the NAK)
(Client) “I guess I can’t get an address…”

At this point, the workstation is a bit lost as to what to do. It will continue to use its IP address until the lease is up and will try to renew again later. In some cases, the workstation will find the original DHCP server before the address lease expires, but in others the above communication will continue and the workstation will eventually stop communicating when it no longer has a valid address.

Avoiding the NAK

The best way to overcome this is to avoid it altogether. Your DHCP servers don’t all have to be able to provide addresses for every scope, but they should be aware of the scopes so that they can serve those clients. To do this, reciprocal exclusions should be used to divide the scope into pieces – let’s look at a 50/50 split of the 10.1.1.0/24 subnet as an example:

[Figure: a 50/50 reciprocal exclusion split of the 10.1.1.0/24 scope across two DHCP servers]
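
On Windows DHCP servers, the reciprocal exclusions can be scripted with netsh; here is an illustrative 50/50 split, assuming the full 10.1.1.0/24 scope is already configured on both servers:

    rem On Server 1 - serve the lower half, exclude the upper half
    netsh dhcp server scope 10.1.1.0 add excluderange 10.1.1.128 10.1.1.254

    rem On Server 2 - serve the upper half, exclude the lower half
    netsh dhcp server scope 10.1.1.0 add excluderange 10.1.1.1 10.1.1.127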

Now the communication looks like this:

(Client) “Hey DHCP Server, I need to renew my address. 87.5% of my lease time is up.”
(Server 2) “I can certainly do that for you. What address would you like to renew?”
(Client) “Well, I have been using 10.1.1.53/24. I’d like to keep using that one.”
(Server 2) “Hmm, I am authoritative for 10.1.1.0/24, but I have an exclusion on that address. How about 10.1.1.222?”
(Client) “Thanks! 10.1.1.222 it is.”

High-Availability Scenarios

Now we have all the pieces to pull together effective high-availability DHCP solutions to serve our enterprise. Just as there are a number of different network configurations, there are different configurations of DHCP that can be deployed to support them. It should also be noted that using MSCS clusters to support DHCP isn’t an ideal solution, as it tends to work inconsistently and is expensive.

Centralized/ Redundant DHCP

Centralizing DHCP can provide a single point of management for all of your workstation configuration changes and will allow a tightly-focused network administration staff to control the whole environment from one point. This kind of configuration works best when you have a single large site or when you are willing to accept the single points of failure associated with your WAN links. Generally, though, this is done to provide on-segment redundancy for a single DHCP server.

On a network with only one subnet or router, you will be able to rely on the local network broadcasts to associate servers with workstations – whichever one happens to respond the fastest will become the DHCP server for that request. If you have multiple sites or subnets/ VLANs, you’ll need IP Helper Addresses pointing at both of the servers. Depending on the network hardware, you may find that the order in which these are listed has an effect on the balance between the servers. (I have not found this to be the case on Cisco gear, though.) In this configuration, you will want to start with a 50/50 split in your exclusions. Over time, you may find that you have to adjust this split to compensate for differences in network speeds and hardware capacity.

Setup Steps:

  1. Plan and diagram out your scopes. You generally want to plan for at least a 50% growth margin for DHCP to accommodate network growth as well as a long-term outage of one of your DHCP servers.
  2. Configure all scopes on both servers
  3. Configure 50/50 reciprocal exclusions on both servers
  4. Configure any manual reservations on both servers
  5. Test failover by disabling DHCP on each server in turn and forcing a renew on the client (ipconfig /release, then ipconfig /renew)

Distributed DHCP

[Figure: a three-site distributed DHCP topology with 80/20 scope splits]

In larger environments with many sites, it is important to provide local DHCP services to clients for immediate response to lease requests, but also to provide a centralized backup that is able to serve requests in the event that there is a problem with the local server. This removes the risk associated with relying on the WAN links for a mission-critical service like DHCP.

In this scenario, we will be relying on the fact that the WAN link is much slower than the on-segment network. The router/ switch must be configured with an IP helper to route the DHCP request to the centralized server, but the time needed to make this round trip will be significantly longer than the time needed to serve the request on the local network.

The sample scenario to the right is a three-site design comprising a main HQ site and two branch offices. In this configuration, the scopes have been configured in an 80/20 distribution, with 80% of the available IP address leases residing on the local network and 20% across the WAN as failover. It should also be noted that since the backup DHCP server is the second local server at the HQ site, it splits the HQ DHCP load 50/50.

Often, network engineers will choose to have only a single DHCP server per site, but having a separate server at the main site allows the load to be controlled and avoids a single point of failure at the HQ site. This may seem a bit complicated, but it is relatively simple to configure and allows you to build all additional sites against a common design pattern.

Finally, you should let the network design and WAN connectivity act as a guide for designing highly available DHCP configurations. If you are set up in a hub-and-spoke configuration, the solution will be slightly different than if you have a few main sites with spokes off of each. Just make sure that you are taking the entire topology into consideration as you plan the final solution.

Setup Steps:

  1. Plan and diagram out your scopes. You generally want to plan for at least a 50% growth margin for DHCP to accommodate network growth as well as a long-term outage of one of your DHCP servers.
  2. Make sure you understand your network topology and plan your DHCP setup to avoid awkward network hops.
  3. Configure the Primary DHCP servers for each site
  4. Configure all scopes on the backup DHCP server
  5. Configure 80/20 reciprocal exclusions between the Branch Office DHCP servers and the Backup server (see the sketch after these steps).
  6. Configure any manual reservations on both servers
  7. Configure IP Helper-address commands on the routers/ switches at the branch offices.
  8. Test failover by disabling DHCP on each server in turn and forcing a renew on the client (ipconfig /release, then ipconfig /renew)
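
Continuing the earlier netsh sketch, an 80/20 split for a hypothetical branch scope of 10.2.1.0/24 might look like this, again assuming the full scope is defined on both servers:

    rem On the Branch Office DHCP server - keep roughly 80% of the range local
    netsh dhcp server scope 10.2.1.0 add excluderange 10.2.1.204 10.2.1.254

    rem On the central Backup DHCP server - hold the remaining 20%
    netsh dhcp server scope 10.2.1.0 add excluderange 10.2.1.1 10.2.1.203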

 

Conclusion

Configuring highly-available DHCP solutions is not complicated or tricky if you understand the technology and plan using your network topology as a guide. Whether it is just a single-site failover for DHCP or an enterprise-scoped failover topology, using reciprocal scopes and IP helper-addresses will ensure that DHCP services are always available. One last note: be sure that you are monitoring your DHCP services. Even with HA servers backing up your primary scopes, you want to be notified of server problems when they happen so you can react to them quickly. The best failover scenario is one that you never have to use.
