Mark R. Gamache's Random Blog: 2010

Or Configuring the F5 BigIP to shoot you in the foot.

Ever have bizarre connection problems when a client hits a Big-IP, but not when it hits the pool members directly? Is it happening to you right now and you don’t even know it? If your application is not stateless, this post may help you a lot. If nothing else, it covers some fundamentals that we've all let rust. Also, just because you aren't seeing the problem doesn't mean it is not there…

For years, we have seen a few LDAP connection problems that could not be explained, but the volume was so low that no one was willing to take the time to work with us to find out what was going on. With the addition of some new high volume systems, the problem has become more frequent and had greater impact, probably due to poor LDAP exception handling. Our service is an LDAP with approximately 20,000,000 million objects in it, each with around 28 attributes. We service 17 to 20 million LDAP requests a day. This an average of 231 LDAP request per second, if you do the math, but as most hours are slow, our peak is closer to 600 per second. This service is actually 3 LDAP servers fronted by a single Big-IP HA pair.

Before we dig in, the F5 Big-IP or Local Traffic Manager (LTM) is a modern marvel. The people who build these things and make the “magic” they do are amazing. They can solve some really complex availability and scaling challenges. Like most marvels, when applied wrong, they may not help, or even do harm. Try taking 20,000 grams of Amoxicillin for a headache. :-P

The Big-IP was originally created to load balance traffic to web sites that take very high volumes of traffic. If you have gotten this far, you probably already know that the Big-IP is optimized to spread the traffic to a pool of servers so that they all get a portion of the traffic. The tricky part is that there are hundreds of levers, knobs and switches, most of which default to the most common use case, HTTP traffic.

While HTTP uses TCP as its transport, HTTP IS STATELESS. HTTP does not care if you have already opened a TCP socket or not. It doesn’t care if the socket the browser uses is dead. That is, your web browser is coded to deal with every possible TCP use case fast and efficiently. On the other hand, some Application Layer Protocols, such as LDAP, are stateful and use the connection oriented nature of TCP to maintain state. This is at the heart of the problems I saw last week (and ongoing for years…).

LDAP, in the most common use case, requires a client connection to bind to the server using a username and password. The entire session is authenticated in that user’s context and all reads and writes are filtered through an access control list in the directory. Just a reminded, an LDAP server is a TCP listener that translates the LDAP protocol to an x.500 directory. An LDAP server is not a directory in and of itself, it is just a port and a set of communications rules optimized for access to an x.500 directory.

Here is a typical LDAP session with 192.168.0.15 as the server:

1 14:48:40.484934 192.168.0.20 192.168.0.15 TCP 12403 > ldap [SYN] Seq=4167455526 Win=65535 Len=0 MSS=1260

2 14:48:40.523621 192.168.0.15 192.168.0.20 TCP ldap > 12403 [SYN, ACK] Seq=4126863585 Ack=4167455527 Win=3780 Len=0

3 14:48:40.523683 192.168.0.20 192.168.0.15 TCP 12403 > ldap [ACK] Seq=4167455527 Ack=4126863586 Win=65535 Len=0

4 14:48:40.603744 192.168.0.20 192.168.0.15 LDAP bindRequest(1) "mgamach@testldap.org" simple

5 14:48:40.642792 192.168.0.15 192.168.0.20 LDAP bindResponse(1) success

6 14:48:40.649008 192.168.0.20 192.168.0.15 LDAP searchRequest(2) "DC=testldap,DC=org" baseObject

7 14:48:40.675407 192.168.0.15 192.168.0.20 LDAP searchResEntry(2) "DC=testldap,DC=org" | searchResDone(2) success

8 14:49:12.076343 192.168.0.20 192.168.0.15 LDAP unbindRequest(8)

9 14:49:12.076718 192.168.0.20 192.168.0.15 TCP 12403 > ldap [FIN, ACK] Seq=4167456246 Ack=4127490647 Len=0

10 14:49:12.100687 192.168.0.15 192.168.0.20 TCP ldap > 12403 [FIN, ACK] Seq=4127490647 Ack=4167456246 Len=0

11 14:49:12.100768 192.168.0.20 192.168.0.15 TCP 12403 > ldap [ACK] Seq=4167456247 Ack=4127490648 Len=0

12 14:49:12.105203 192.168.0.15 192.168.0.20 TCP ldap > 12403 [FIN, ACK] Seq=4127490647 Ack=4167456247 Len=0

The things worth noting are that A) the bind can’t occur until after the TCP 3-way handshake, B) when the client sends the unbind the server responds with a FIN, starting the TCP session tear-down, and C) this is one TCP session for the whole LDAP session. An LDAP client can run multiple LDAP session at once, in which case, each would have its own ephemeral port. While some LDAP server and client code can perform multiple asynchronous calls inside one TCP session, it is not very common and often not done well, but he client code developers. By far, the most common method is to treat each LDAP session as a synchronous session, request, then reply then request, then reply and so on until the unbind.

If the TCP session fails for any reason, the LDAP session is gone, there is no way to revive it or reconnect to it. Hopefully the client app and underlying systems are written well enough to deal with a failure and retry in a well considered manner. This is often not the case; consider the Java bug that does not inform the LDAP stack of a TCP reset for 15 seconds. That is one very painful timeout if a customer is waiting.

Back to the problem that we were seeing… At an application level our internal customers were seeing various connection errors in their logs. Different OSs and JVM versions seemed to throw slightly different errors, adding a bit to the confusion. It took quite some time to get the data we needed to diagnose the issue. We needed a packet capture on both sides of the BigIP as well as time stamped failure logs and IPs from the apps. With such high volume traffic, trying to find an error without somewhere to start is like trying to find one special needle in a needle stack, while wearing someone else’s bifocals.

Having the data in hand and using several WireShark tricks that I will blog on later, I found the root cause. The Big-IP config we were given was not vetted for awesome. We assumed that the networking team that creates the VIP and configures the pool analyzed our app and set up a config to optimize our performance. In fact, the config seemed a bit random.

We had already visited the config once as the F5 was introducing a 13x delay in response times. This turned out to be a TCP Profile setting that was set to WAN optimized. The settings in question optimized for bandwidth savings, over speed. Ooops! The culprit that we also fixed, prior to this, was that the TPC Idle timeout on the F5 did not match that of our LDAP servers. The F5 was set to close idle threads after 5 minutes, while the LDAP servers allow 2 hours. Finally, there isn’t just one TCP profile; there is one server (S-NAT) side and one client (VIP) side. Ours had different idle timeouts, leaving the F5 stuck between two parties who expected the session to be held open for different amounts of time. With those finds and fixes in place, we still see about a .5% failure rates, which is totally unacceptable with such a high volume service. That is still an average or more that once per second.

Here’s the last bit of fundamentals needed to understand why we were still seeing failures. There was only one IP in our F5 S-NAT pool, this means that it could, theoretically, have 65536 simulations connections to our pool members. However, the default setting for the use of those ports does not force the use of them all. The F5 setting TM.PreserveClientPort, by default has the S-NAT IP use the same ephemeral port as the upstream side of the conversation; meaning the client to VIP. The laws of TCP/IP say that the S-NAT IP can only make one connection on one port. This means if two of our clients make calls using the same ephemeral port, at the same time, the one with the established connection stays live and the other gets a nice TCP reset in its Christmas stocking. This is a bit more confusing for the app because the F5 accepts the VIP side TCP handshake while it is still getting ready to try the S-NAT side connection. Only then does it realize that the port is taken and it can’t connect. Once that happens, the F5 sends and unexpected TCP reset to the client when it is expecting to be able to do an LDAP bind.

At this point, you are thinking, “but Mark, what are the odds of two of your clients using the same ephemeral port at the same time? They must be crazy low”. Well, there’s more. Different OSs limit the number of possible ephemeral ports that can be used. BSD for instance used to only use 1024 to 3976. In our case most of our clients are on Solaris 10, which allows anything above 32768. We have around 20 clients that pseudo-randomly pick a port between 32768 and 65536. This is actually a probability problem exactly like the Birthday Problem. We just have 20 people (servers) with 32768 days (ports) to choose from instead of 365. We are looking to know what the probability is of any 2 of the servers using the same port. I will ignore the complexity of adding time as a factor. That is, some clients may hold a port open for longer than others, depending on the operations being performed. If we just assume that all the machines connect once using a random port, then our odds come up as .6% which is right next to our observed impact. As we will see in a minute, this number may or may not make sense. It is really just to make the point that it is a lot more likely to have a collision, than might seem intuitive.

While there are still a couple more factors to look at, let’s look at a failure in action.

1 0.000000 192.168.113.59 192.168.0.10 TCP 61325 > ldap [SYN] Seq=2258030186 Win=32768 Len=0 MSS=1460 WS=0

2 0.000005 192.168.0.10 192.168.113.59 TCP ldap > 61325 [SYN, ACK] Seq=2422958290 Ack=2258030187 Win=4380 Len=0 MSS=1460 WS=0

3 0.042001 192.168.113.59 192.168.0.10 LDAP bindRequest(1) "uid=appBindAcct,ou=applications,c=us,dc=testLDAP,dc=com" simple

4 0.042010 10.99.248.20 10.99.26.25 TCP 61325 > ovsessionmgr [SYN] Seq=4287903691 Win=4380 Len=0 MSS=1460 WS=0

5 0.042013 192.168.0.10 192.168.113.59 TCP ldap > 61325 [ACK] Seq=2422958291 Ack=2258030284 Win=4477 Len=0

6 0.042734 10.99.26.25 10.99.248.20 TCP ovsessionmgr > 61325 [SYN, ACK] Seq=1298286822 Ack=4287903692 Win=49640 Len=0 MSS=1460 WS=0

7 0.042738 10.99.248.20 10.99.26.25 TCP 61325 > ovsessionmgr [ACK] Seq=4287903692 Ack=1298286823 Win=4380 Len=0

8 0.042741 10.99.248.20 10.99.26.25 LDAP bindRequest(1) "uid=appBindAcct,ou=applications,c=us,dc=testLDAP,dc=com" simple

9 0.043455 10.99.26.25 10.99.248.20 TCP ovsessionmgr > 61325 [ACK] Seq=1298286823 Ack=4287903789 Win=49543 Len=0

10 0.043699 10.99.26.25 10.99.248.20 LDAP bindResponse(1) success

11 0.043703 192.168.0.10 192.168.113.59 LDAP bindResponse(1) success

12 0.043706 10.99.248.20 10.99.26.25 TCP 61325 > ovsessionmgr [ACK] Seq=4287903789 Ack=1298286837 Win=4394 Len=0

13 0.088130 192.168.113.59 192.168.0.10 LDAP searchRequest(2) "mobile=2065551234,ou=subscribers,c=us,dc=testLDAP,dc=com" baseObject

14 0.088135 10.99.248.20 10.99.26.25 LDAP searchRequest(2) "mobile=2065551234,ou=subscribers,c=us,dc=testLDAP,dc=com" baseObject

15 0.088138 192.168.0.10 192.168.113.59 TCP ldap > 61325 [ACK] Seq=2422958305 Ack=2258030416 Win=4609 Len=0

16 0.088866 10.99.26.25 10.99.248.20 TCP ovsessionmgr > 61325 [ACK] Seq=1298286837 Ack=4287903921 Win=49640 Len=0

17 0.089591 10.99.26.25 10.99.248.20 LDAP searchResEntry(2) "mobile=2065551234,ou=subscribers,c=us,dc=testLDAP,dc=com"

18 0.089595 192.168.0.10 192.168.113.59 LDAP searchResEntry(2) "mobile=2065551234,ou=subscribers,c=us,dc=testLDAP,dc=com"

19 0.089598 10.99.248.20 10.99.26.25 TCP 61325 > ovsessionmgr [ACK] Seq=4287903921 Ack=1298286926 Win=4483 Len=0

20 0.089600 10.99.26.25 10.99.248.20 LDAP searchResDone(2) success

21 0.089602 192.168.0.10 192.168.113.59 LDAP searchResDone(2) success

22 0.089605 10.99.248.20 10.99.26.25 TCP 61325 > ovsessionmgr [ACK] Seq=4287903921 Ack=1298286940 Win=4497 Len=0

23 0.130614 192.168.113.59 192.168.0.10 TCP 61325 > ldap [ACK] Seq=2258030416 Ack=2422958408 Win=32768 Len=0

24 0.130861 192.168.113.59 192.168.0.10 LDAP unbindRequest(3)

25 0.130865 10.99.248.20 10.99.26.25 LDAP unbindRequest(3)

26 0.130867 192.168.0.10 192.168.113.59 TCP ldap > 61325 [ACK] Seq=2422958408 Ack=2258030452 Win=4645 Len=0

27 0.131347 192.168.113.59 192.168.0.10 TCP 61325 > ldap [FIN, ACK] Seq=2258030452 Ack=2422958408 Win=32768 Len=0

28 0.131351 192.168.0.10 192.168.113.59 TCP ldap > 61325 [ACK] Seq=2422958408 Ack=2258030453 Win=4645 Len=0

29 0.131354 10.99.248.20 10.99.26.25 TCP 61325 > ovsessionmgr [FIN, ACK] Seq=4287903957 Ack=1298286940 Win=4497 Len=0

30 0.131356 10.99.26.25 10.99.248.20 TCP ovsessionmgr > 61325 [FIN, ACK] Seq=1298286940 Ack=4287903957 Win=49640 Len=0

31 0.131359 10.99.248.20 10.99.26.25 TCP 61325 > ovsessionmgr [FIN, ACK] Seq=4287903957 Ack=1298286941 Win=4497 Len=0

32 0.131362 192.168.0.10 192.168.113.59 TCP ldap > 61325 [FIN, ACK] Seq=2422958408 Ack=2258030453 Win=4645 Len=0

33 0.132083 10.99.26.25 10.99.248.20 TCP ovsessionmgr > 61325 [ACK] Seq=1298286941 Ack=4287903958 Win=49640 Len=0

34 0.132088 10.99.26.25 10.99.248.20 TCP [TCP Dup ACK 33#1] ovsessionmgr > 61325 [ACK] Seq=1298286941 Ack=4287903958 Win=49640 Len=0

35 0.172359 192.168.113.59 192.168.0.10 TCP 61325 > ldap [ACK] Seq=2258030453 Ack=2422958409 Win=32768 Len=0

36 56.717368 192.168.113.58 192.168.0.10 TCP 61325 > ldap [SYN] Seq=3620104483 Win=32768 Len=0 MSS=1460 WS=0

37 56.717374 192.168.0.10 192.168.113.58 TCP ldap > 61325 [SYN, ACK] Seq=3332499193 Ack=3620104484 Win=4380 Len=0 MSS=1460 WS=0

38 56.758628 192.168.113.58 192.168.0.10 LDAP bindRequest(1) "uid=appBindAcct,ou=applications,c=us,dc=testLDAP,dc=com" simple

39 56.758638 10.99.248.20 10.99.26.22 TCP 61325 > ovsessionmgr [SYN] Seq=258910960 Win=4380 Len=0 MSS=1460 WS=0

40 56.758640 192.168.0.10 192.168.113.58 TCP ldap > 61325 [ACK] Seq=3332499194 Ack=3620104581 Win=4477 Len=0

41 56.759124 10.99.26.22 10.99.248.20 TCP ovsessionmgr > 61325 [SYN, ACK] Seq=3696686467 Ack=258910961 Win=49640 Len=0 MSS=1460 WS=0

42 56.759128 10.99.248.20 10.99.26.22 TCP 61325 > ovsessionmgr [ACK] Seq=258910961 Ack=3696686468 Win=4380 Len=0

43 56.759131 10.99.248.20 10.99.26.22 LDAP bindRequest(1) "uid=appBindAcct,ou=applications,c=us,dc=testLDAP,dc=com" simple

44 56.759605 10.99.26.22 10.99.248.20 TCP ovsessionmgr > 61325 [ACK] Seq=3696686468 Ack=258911058 Win=49543 Len=0

45 56.760583 10.99.26.22 10.99.248.20 LDAP bindResponse(1) success

46 56.760588 192.168.0.10 192.168.113.58 LDAP bindResponse(1) success

47 56.760591 10.99.248.20 10.99.26.22 TCP 61325 > ovsessionmgr [ACK] Seq=258911058 Ack=3696686482 Win=4394 Len=0

48 56.802823 192.168.113.58 192.168.0.10 LDAP modifyRequest(2) "mobile=4255552345,ou=subscribers,c=us,dc=testLDAP,dc=com"

49 56.802830 10.99.248.20 10.99.26.22 LDAP modifyRequest(2) "mobile=4255552345,ou=subscribers,c=us,dc=testLDAP,dc=com"

50 56.802833 192.168.0.10 192.168.113.58 TCP ldap > 61325 [ACK] Seq=3332499208 Ack=3620104727 Win=4623 Len=0

51 56.803309 10.99.26.22 10.99.248.20 TCP ovsessionmgr > 61325 [ACK] Seq=3696686482 Ack=258911204 Win=49640 Len=0

52 56.814303 10.99.26.22 10.99.248.20 LDAP modifyResponse(2) success

53 56.814309 192.168.0.10 192.168.113.58 LDAP modifyResponse(2) success

54 56.814312 10.99.248.20 10.99.26.22 TCP 61325 > ovsessionmgr [ACK] Seq=258911204 Ack=3696686496 Win=4408 Len=0

55 56.856552 192.168.113.58 192.168.0.10 LDAP unbindRequest(3)

56 56.856557 10.99.248.20 10.99.26.22 LDAP unbindRequest(3)

57 56.856560 192.168.0.10 192.168.113.58 TCP ldap > 61325 [ACK] Seq=3332499222 Ack=3620104763 Win=4659 Len=0

58 56.857061 10.99.26.22 10.99.248.20 TCP ovsessionmgr > 61325 [FIN, ACK] Seq=3696686496 Ack=258911240 Win=49640 Len=0

59 56.857066 10.99.248.20 10.99.26.22 TCP 61325 > ovsessionmgr [ACK] Seq=258911240 Ack=3696686497 Win=4408 Len=0

60 56.857068 192.168.0.10 192.168.113.58 TCP ldap > 61325 [FIN, ACK] Seq=3332499222 Ack=3620104763 Win=4659 Len=0

61 56.857269 192.168.113.58 192.168.0.10 TCP 61325 > ldap [FIN, ACK] Seq=3620104763 Ack=3332499222 Win=32768 Len=0

62 56.898034 192.168.113.58 192.168.0.10 TCP 61325 > ldap [ACK] Seq=3620104764 Ack=3332499223 Win=32768 Len=0

63 57.381196 192.168.113.58 192.168.0.10 TCP 61325 > ldap [FIN, ACK] Seq=3620104763 Ack=3332499223 Win=32768 Len=0

64 57.381200 192.168.0.10 192.168.113.58 TCP ldap > 61325 [ACK] Seq=3332499223 Ack=3620104764 Win=4659 Len=0

65 57.381204 10.99.248.20 10.99.26.22 TCP 61325 > ovsessionmgr [FIN, ACK] Seq=258911240 Ack=3696686497 Win=4408 Len=0

66 57.381931 10.99.26.22 10.99.248.20 TCP ovsessionmgr > 61325 [ACK] Seq=3696686497 Ack=258911241 Win=49640 Len=0

67 70.622841 192.168.113.38 192.168.0.10 TCP 61325 > ldap [SYN] Seq=2324271576 Win=32768 Len=0 MSS=1460 WS=0

68 70.622846 192.168.0.10 192.168.113.38 TCP ldap > 61325 [SYN, ACK] Seq=151461930 Ack=2324271577 Win=4380 Len=0 MSS=1460 WS=0

69 70.664112 192.168.113.38 192.168.0.10 LDAP bindRequest(1) "uid=appBindAcct,ou=applications,c=us,dc=testLDAP,dc=com" simple

70 70.664121 10.99.248.20 10.99.26.22 TCP [TCP Port numbers reused] 61325 > ovsessionmgr [SYN] Seq=273484448 Win=4380 Len=0 MSS=1460 WS=0

71 70.664124 192.168.0.10 192.168.113.38 TCP ldap > 61325 [ACK] Seq=151461931 Ack=2324271674 Win=4477 Len=0

72 71.664114 10.99.248.20 10.99.26.22 TCP 61325 > ovsessionmgr [SYN] Seq=273484448 Win=4380 Len=0 MSS=1460 WS=0

73 72.864329 10.99.248.20 10.99.26.22 TCP 61325 > ovsessionmgr [SYN] Seq=273484448 Win=4380 Len=0 MSS=1460 WS=0

74 74.063807 10.99.248.20 10.99.26.22 TCP 61325 > ovsessionmgr [SYN] Seq=273484448 Win=4380 Len=0 MSS=1460

75 75.264023 192.168.0.10 192.168.113.38 TCP ldap > 61325 [RST, ACK] Seq=151461931 Ack=2324271674 Win=4477 Len=0

76 95.794344 192.168.113.37 192.168.0.10 TCP 61325 > ldap [SYN] Seq=136650233 Win=32768 Len=0 MSS=1460 WS=0

77 95.794349 192.168.0.10 192.168.113.37 TCP ldap > 61325 [SYN, ACK] Seq=3257078415 Ack=136650234 Win=4380 Len=0 MSS=1460 WS=0

78 95.835606 192.168.113.37 192.168.0.10 LDAP bindRequest(1) "uid=appBindAcct,ou=applications,c=us,dc=testLDAP,dc=com" simple

79 95.835616 10.99.248.20 10.99.26.28 TCP 61325 > ovsessionmgr [SYN] Seq=3796674135 Win=4380 Len=0 MSS=1460 WS=0

80 95.835619 192.168.0.10 192.168.113.37 TCP ldap > 61325 [ACK] Seq=3257078416 Ack=136650331 Win=4477 Len=0

81 95.836114 10.99.26.28 10.99.248.20 TCP ovsessionmgr > 61325 [SYN, ACK] Seq=3190364148 Ack=3796674136 Win=49640 Len=0 MSS=1460 WS=0

82 95.836118 10.99.248.20 10.99.26.28 TCP 61325 > ovsessionmgr [ACK] Seq=3796674136 Ack=3190364149 Win=4380 Len=0

83 95.836121 10.99.248.20 10.99.26.28 LDAP bindRequest(1) "uid=appBindAcct,ou=applications,c=us,dc=testLDAP,dc=com" simple

84 95.836578 10.99.26.28 10.99.248.20 TCP ovsessionmgr > 61325 [ACK] Seq=3190364149 Ack=3796674233 Win=49543 Len=0

85 95.837317 10.99.26.28 10.99.248.20 LDAP bindResponse(1) success

86 95.837320 192.168.0.10 192.168.113.37 LDAP bindResponse(1) success

87 95.837323 10.99.248.20 10.99.26.28 TCP 61325 > ovsessionmgr [ACK] Seq=3796674233 Ack=3190364163 Win=4394 Len=0

88 95.879556 192.168.113.37 192.168.0.10 LDAP searchRequest(2) "mobile=3605551234,ou=subscribers,c=us,dc=testLDAP,dc=com" baseObject

89 95.879562 10.99.248.20 10.99.26.28 LDAP searchRequest(2) "mobile=3605551234,ou=subscribers,c=us,dc=testLDAP,dc=com" baseObject

90 95.879565 192.168.0.10 192.168.113.37 TCP ldap > 61325 [ACK] Seq=3257078430 Ack=136650861 Win=5007 Len=0

91 95.880280 10.99.26.28 10.99.248.20 TCP ovsessionmgr > 61325 [ACK] Seq=3190364163 Ack=3796674763 Win=49640 Len=0

92 95.881746 10.99.26.28 10.99.248.20 LDAP searchResEntry(2) "mobile=3605551234,ou=subscribers,c=us,dc=testLDAP,dc=com"

93 95.881751 192.168.0.10 192.168.113.37 LDAP searchResEntry(2) "mobile=3605551234,ou=subscribers,c=us,dc=testLDAP,dc=com"

94 95.881754 10.99.248.20 10.99.26.28 TCP 61325 > ovsessionmgr [ACK] Seq=3796674763 Ack=3190364902 Win=5133 Len=0

95 95.881756 10.99.26.28 10.99.248.20 LDAP searchResDone(2) success

96 95.881758 192.168.0.10 192.168.113.37 LDAP searchResDone(2) success

97 95.881761 10.99.248.20 10.99.26.28 TCP 61325 > ovsessionmgr [ACK] Seq=3796674763 Ack=3190364916 Win=5147 Len=0

98 95.922763 192.168.113.37 192.168.0.10 TCP 61325 > ldap [ACK] Seq=136650861 Ack=3257079183 Win=32768 Len=0

99 95.923035 192.168.113.37 192.168.0.10 LDAP unbindRequest(3)

100 95.923038 10.99.248.20 10.99.26.28 LDAP unbindRequest(3)

101 95.923040 192.168.0.10 192.168.113.37 TCP ldap > 61325 [ACK] Seq=3257079183 Ack=136650897 Win=5043 Len=0

102 95.923504 192.168.113.37 192.168.0.10 TCP 61325 > ldap [FIN, ACK] Seq=136650897 Ack=3257079183 Win=32768 Len=0

103 95.923507 192.168.0.10 192.168.113.37 TCP ldap > 61325 [ACK] Seq=3257079183 Ack=136650898 Win=5043 Len=0

104 95.923509 10.99.248.20 10.99.26.28 TCP 61325 > ovsessionmgr [FIN, ACK] Seq=3796674799 Ack=3190364916 Win=5147 Len=0

105 95.923511 10.99.26.28 10.99.248.20 TCP ovsessionmgr > 61325 [FIN, ACK] Seq=3190364916 Ack=3796674799 Win=49640 Len=0

106 95.923515 10.99.248.20 10.99.26.28 TCP 61325 > ovsessionmgr [FIN, ACK] Seq=3796674799 Ack=3190364917 Win=5147 Len=0

107 95.923517 192.168.0.10 192.168.113.37 TCP ldap > 61325 [FIN, ACK] Seq=3257079183 Ack=136650898 Win=5043 Len=0

108 95.924223 10.99.26.28 10.99.248.20 TCP ovsessionmgr > 61325 [ACK] Seq=3190364917 Ack=3796674800 Win=49640 Len=0

109 95.924228 10.99.26.28 10.99.248.20 TCP [TCP Dup ACK 108#1] ovsessionmgr > 61325 [ACK] Seq=3190364917 Ack=3796674800 Win=49640 Len=0

110 95.964510 192.168.113.37 192.168.0.10 TCP 61325 > ldap [ACK] Seq=136650898 Ack=3257079184 Win=32768 Len=0

First off, all of this traffic is from ephemeral port 61325, whether it is from the client or the S-NAT side of the F5. All the traffic to and from the ldap port is on the VIP side of the F5. All of the traffic to and from the ovsessionmgr port is on the S-NAT side of the F5, to the LDAP servers.

LDAP VIP: 192.168.0.10

S-NAT IP: 10.99.248.20

LDAP Servers: 10.99.26.22, 10.99.26.25, 10.99.26.28

The rest of the IPs are clients.

First off, this capture only lasts 95 seconds, yet we see 4 different clients use the same ephemeral port in that short time. This 13 KB capture is, of course, massively filtered. The actual 95 second window is about 250 MB.

During this time, there is only one TCP reset sent (Frame 75), so we will analyze what happens there first, then go back to why there aren’t more failures.

In frames 55 and 56, we see a client send an unbind to the F5 and the F5 sends the unbind to the LDAP server. This starts the end of the two TCP sessions.

58 to 66 are the proper tear downs of the TCP sessions on both sides of the F5. This means that Client, F5 VIP side, F5 S-NAT side and server all agree the TCP sessions are over.

67 is a different client staring a new TCP handshake with the VIP. It is using the same ephemeral port, 61325. As TCP Segments can arrive out of order, we can see that 68 and 71 complete the TCP handshake, then the client attempts to bind in 69. Keep in mind that the data will be delivered from the IP stack to the app in order.

70 is the S-NAT trying to establish a TCP session with a pool member, 10.99.26.22. There is no SYN/ACK segment, so the S-NAT IP tries again in 72, 73 and 74. After the four failures to establish the S-NAT side TCP session, the F5 gives up and has to do something with the established connection on the VIP side, so it sends the TCP reset in frame 75.

“But wait”, you cry, “In Frame 66 we saw the final ACK of the session close with 10.99.26.22”. WAIT indeed, TIME_WAIT to be more exact. RFC 793 says that the TCP session, while closed in a data transmission sense, is still left locked in a state called TIME_WAIT. This is designed to make sure that any segments that were out of order make it to their destination before the socket is completely closed and refuses the data. It also makes sure that there is enough time for the final ACK of the close process has arrived. After all you can’t ACK that ACK or it’s not the final ACK.

Here’s the rub, TIME_WAIT was defined as 2 times the Maximum Segment Lifetime (MSL) in RFC 793. This value was, as the FRC puts it, is “arbitrary”. To be more exact, it made sense based on the networking technologies when the RFC was written in 1981. This value, which is 4 minutes (2 X an MSL of 2 min), was appropriate for the slow networks of the day. Since then, some vendors have decided this is too long. There is good reason for this. If a server needs to make a large number of outbound connections, this 4 minute penalty before the port can be reused can lead to port exhaustion. This value is tunable on an OS level. The default for windows is still 4 minutes. Our LDAP servers have 60 second TIME_WAIT, as part of the Solaris settings. TIME_WAIT means different things to the client and server. The client cannot make any calls out on that ephemeral port, not to any server, until the wait is over. The server cannot accept traffic from the source host on that same source port until the wait is over. The server could accept new traffic from the same source host, if it were on a new ephemeral port.

Between frames 66 and 74, the final ACK and the final SYN attempt, we only see about 12 seconds. The LDAP server is clearly in its 60 second TIME_WAIT and properly refusing the connection. Now you’re confused and so was I. Well, there are a couple more F5 TCP Profile settings that come into play. First, the TIME_WAIT is settable in the TCP profile and defaults to 2 seconds!! The client and server don’t agree, by way of negotiation, on a TIME_WAIT, they both have their own. This mismatch makes collisions possible. This setting is great to have, but it must be set for your use case. The default is not right for our use case. Second, and even more interesting/scary, is the TIME_WAIT Recycle option, which is defaulted to on. This behavior is described as TIME-WAIT Truncation, in RFC 1644. This allows the S-NAT side to cut the TIME_WAIT short if it receives a packet wanting the port, just like we see in frame 72. In this case, the setting doesn’t matter, as the F5 2 second timer is long over. This setting makes a lot of sense, performance wise, IF the S-NAT does not try to hit a pool member that is still in TIME_WAIT. In our case, we see that all 3 pool members were hit in that 95 seconds window. As we only have one S-NAT IP to work with and the F5 can’t guess at the TIME_WAIT on the pool members, we are setup for failure.

The moral of the story is, all the F5's performance settings have tradeoffs. By keeping TIME_WAIT low, the F5 can save on memory, by dumping the session early and avoid port exhaustion, but unless you have the proper number of S-NAT IPs, you are shooting yourself in the foot. By setting the F5 TIME_WAIT to match the time wait of your severs, you avoid the risk of collisions, but you tie up the ports and RAM on the F5. In essence, Adding IPs to the S-NAT IP pool lets us use probability to avoid most collisions.

The bottom line is, make sure your F5 setting match your use case. You may also want to read this article from F5 on things that may cause the F5 to send a TCP reset.

Lessons learned:

Configuring a load balancer well requires great knowledge about not just networking, but the nature of the application being load balanced.

F5 has done such a great job optimizing the defaults for HTTP, that most customers may never even see problems, if they are there at all.

Fundamentals like TCP session teardown and TCP state are worth knowing more about. There is no better reference than TCP/IP Illustrated by Stevens.

Mark R. Gamache's Random Blog

Monday, December 6, 2010

Tuning F5 Big-IP Performance to Ruin Your Performance

Wednesday, November 10, 2010

IT is All Ones and Zeros, heavy on the Zeros...

Sunday, May 16, 2010

Top Rejected Names for "I Can't Believe it's Not Butter"

Tuesday, February 23, 2010

How I Learned to Stop Hating Java and Start Loving s_client

Blog Archive