As a fan of debate, I’ll start with points that are interesting but have no real bearing on the topic.
- Slashdotters are clearly not qualified to make this assessment. Their Appeal to Authority fails.
- Microsoft wouldn’t issue an advisory and Fix-It if it weren’t relevant. My Appeal to Authority is better than theirs. ;-)
- The existence of a newer protocol, Kerberos, does not make NTLM simply disappear.
- Wikipedia actually doesn’t say NTLM is long dead. Wikipedia as an appeal to authority is a joke. I link to it regularly, not for its completeness, but because it is written for a layman audience. It is a great place to start if you don’t know something.
- I’m not a Linux fanboy looking to disgrace MS. I’m a longtime MCSE and even gave MS some props in my post.
- Finally, my work is what it is: probably the last of hundreds of nails in the coffin. I make no claim to it being inspired by God.
Now to the relevant details... When MS introduced Active Directory in Windows 2000, they implemented Kerberos 5 as the default authentication protocol FOR DOMAIN ACCOUNTS. This is a pretty important qualifier. If a machine is not domain joined, or the account is not a domain account, Kerberos is not an option. The upside is that when machines are in workgroups, the accounts are much less likely to have any value off the host. However, this does not stop the SSPI from trying to authenticate with your cached account credentials when you access resources that are not on the host. This means a workgroup host could still be vulnerable to my “Send the Hash” attack if it is not properly configured.
Even for machines that are domain joined, while Kerberos is the default, NTLM is used in several situations:
- If the service is not Kerberos enabled (Kerberized). Maybe it runs on an NT server?
- The service/server does not have a Service Principal Name (SPN) registered.
- The service/server has duplicate SPNs registered.
- When accessing the system by IP rather than name.
- The cluster is improperly built.
- A third-party system is implemented incorrectly.
- When accessing data across forests, using an older domain type trust.
- When the client can’t access a KDC/DC, such as when it is outside the firewall.
- When the KDC/DC is behind NAT.
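To make the list above concrete, here is a minimal sketch that encodes a few of those conditions as a decision function. It is only an illustration of the reasoning, not how Windows actually negotiates: the `AuthContext` fields and both function names are hypothetical, and the real decision is made inside the SSPI Negotiate package.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AuthContext:
    """Hypothetical description of one client-to-service connection."""
    domain_account: bool = True      # domain-joined machine using a domain account
    service_kerberized: bool = True  # the service supports Kerberos at all
    spn_count: int = 1               # SPNs registered for the target service
    accessed_by_ip: bool = False     # client used an IP address instead of a name
    kdc_reachable: bool = True       # client can reach a KDC/DC directly


def ntlm_fallback_reasons(ctx: AuthContext) -> List[str]:
    """Collect the conditions from the list above that apply to this connection."""
    reasons = []
    if not ctx.domain_account:
        reasons.append("not a domain account")
    if not ctx.service_kerberized:
        reasons.append("service is not Kerberized")
    if ctx.spn_count == 0:
        reasons.append("no SPN registered")
    if ctx.spn_count > 1:
        reasons.append("duplicate SPNs registered")
    if ctx.accessed_by_ip:
        reasons.append("accessed by IP rather than name")
    if not ctx.kdc_reachable:
        reasons.append("client cannot reach a KDC/DC")
    return reasons


def likely_protocol(ctx: AuthContext) -> str:
    """Any single fallback reason is enough to end up on NTLM."""
    return "NTLM" if ntlm_fallback_reasons(ctx) else "Kerberos"


if __name__ == "__main__":
    print(likely_protocol(AuthContext()))
    print(likely_protocol(AuthContext(accessed_by_ip=True)))
```

The point of the sketch is that Kerberos needs every condition to line up, while any one failure is enough to land on NTLM.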
Before going in depth on a couple of these cases, I’ll make a generalization: “Kerberos is very hard to get right, except under simple conditions.”
Outside the Firewall or Behind NAT
MS’s implementation of Kerberos requires that the clients, servers, and KDC/DCs all be on the same routed network, with AD-integrated DNS or DDNS that allows the DCs to register SRV records. The clients must be able to find and access the KDCs to get Kerberos tickets. I am not going to cover all the details of Kerberos, but this is a key difference. With NTLM, the server you want to access does the job of finding a DC and getting the DC to validate the challenge/response after your client has done its handshake. The resource server passes the challenge and response to the DC over RPC using packet privacy and gets back a pass/fail and a list of group memberships, which it uses to build the user’s access token. This is super simple and easy when the client is outside your firewall. You only need to open one port: the application’s port.
If you intend to make Windows Kerberos work across NAT or behind a firewall, prepare for pain. Each Windows client has a component called the dcLocator. Its exact operations vary slightly from version to version of Windows. You might think you just need to open up TCP 88 to a KDC and you’re set. You might get a pony in the mail too.
I’ll blog on the exact details at some point, but the dcLocator first needs to find the KDC DNS SRV records in the _msdcs.domainname.org zone. Right off the bat, this means that you need split DNS, as the answers inside your firewall will not be the same IPs as outside. Once you have your external DNS zone and main DC records, the client will ping all the DCs and select the fastest to respond. If ICMP is blocked, nothing proceeds.
The client then sends a CLDAP query to the fastest DC. This is connectionless LDAP over UDP port 389. The query asks which AD site the client is a member of. It is not answered in the traditional way, based on the filter. Instead, AD maps the source IP to an AD subnet, which maps to an AD site, and the LDAP search response contains that site. If your client is behind NAT, then the source IP will likely be a SNAT address, so the site it maps to may not be the client’s real site. From this response, the dcLocator does a second DNS SRV query to get the DCs that are in the AD site returned by the CLDAP query. The dcLocator then pings each of those DCs, queries the first to respond, and if the response is satisfactory, that DC becomes the default DC for a period of time. This time can vary by OS version.
Now we are ready to do Kerberos. Some versions of Windows try UDP 88 first and then, when they get back the “response too large” error, try TCP 88. If UDP is blocked, these versions may not try TCP 88 even if it is open. I will not be swearing to this in court, as it has been over a year since I configured this type of scenario and I am writing this without a net.
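The site lookup is the step NAT tends to break, so here is a small sketch of it using Python’s standard `ipaddress` module. The subnet table, the site names, and the function name are made up for illustration, and picking the most specific matching subnet is my assumption about how the match is resolved; it is not a dump of what AD does internally.

```python
import ipaddress

# Hypothetical subnet-to-site table, standing in for the AD subnet objects.
SUBNET_TO_SITE = {
    "10.1.0.0/16": "HQ",
    "10.2.0.0/16": "Branch",
    "203.0.113.0/24": "External",
}


def site_for_source_ip(source_ip, table=SUBNET_TO_SITE):
    """Return the site of the most specific subnet containing source_ip, or None."""
    addr = ipaddress.ip_address(source_ip)
    best = None  # (network, site) of the longest-prefix match so far
    for cidr, site in table.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, site)
    return best[1] if best else None


if __name__ == "__main__":
    # The client's real address maps to its real site...
    print(site_for_source_ip("10.2.5.9"))
    # ...but the DC only sees the SNAT address, which maps somewhere else.
    print(site_for_source_ip("203.0.113.7"))
    # An address with no matching subnet gets no site at all.
    print(site_for_source_ip("192.0.2.1"))
```

Whatever address the DC sees is the one that decides the site, which is why the second SRV query can send a NATed client to DCs you never intended it to use.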
This means that for Kerberos to work outside the firewall or behind NAT, you need to:
- Set up split DNS
- Create at least one domain-level SRV record pointing to the external IP address
- Create a site DNS SRV record for EVERY DC in the default site, pointing to the external IP address.
- Open port 389 UDP
- Open port 88 UDP
- Open port 88 TCP
- Open ICMP
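As a rough sketch, the checklist above can be written out as data for one domain. The function name, the example domain, and the IP are placeholders. The SRV names follow the commonly documented `_msdcs` naming pattern, but verify them against what your own DCs register before relying on them.

```python
def kerberos_outside_firewall_checklist(domain, site, external_ip):
    """Build the DNS records and firewall openings from the list above.

    All arguments are illustrative placeholders, not values from a real network.
    """
    srv_records = [
        # Domain-level locator record.
        {"name": f"_kerberos._tcp.dc._msdcs.{domain}", "resolves_to": external_ip},
        # Site-level locator record; one is needed for every DC in the site.
        {"name": f"_kerberos._tcp.{site}._sites.dc._msdcs.{domain}", "resolves_to": external_ip},
    ]
    # Note: an SRV target is a host name; "resolves_to" is the address that
    # host's A record would have to return in the external DNS zone.
    firewall = ["389/udp", "88/udp", "88/tcp", "icmp"]
    return {"split_dns": True, "srv_records": srv_records, "firewall": firewall}


if __name__ == "__main__":
    plan = kerberos_outside_firewall_checklist(
        "example.org", "Default-First-Site-Name", "203.0.113.10"
    )
    for record in plan["srv_records"]:
        print(record["name"], "->", record["resolves_to"])
    print("open:", ", ".join(plan["firewall"]))
```

Compare that with the NTLM case, where the whole list collapses to the application’s own port.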
NTLM is a lot easier to use in both NAT and outside the firewall scenarios.
Messed up SPN Scenarios
Connection by IP Address