Whether you’re load balancing two servers or scaling on-demand instances across clouds, understanding the underlying F5® load balancing methods is foundational to the BIG-IP® platform. Below you’ll find a de facto list of F5 load balancing methods from a local LTM® perspective. Keep in mind the DNS (aka GTM™) module also provides load balancing from a name-resolution standpoint – you can read more about GTM vs LTM here. We’ll get another article out soon focusing on DNS / GTM load balancing and how it works in concert with the LTM.
Local Load Balancing with the F5 BIG-IP Local Traffic Manager™ (LTM) – aka the “Good license”
In full proxy mode the BIG-IP LTM slices, dices, and transforms client and server side connections like a traffic ninja. The load balancing methods below are available when attaching servers (aka nodes) to pools. Before we get into the meat and potatoes of LTM load balancing, I’ll cover some key gotchas and concepts that have a significant bearing on how load balancing works in the F5 LTM. Note – the LTM module comes standard in F5’s Better & Best software bundles.
Member vs Node
You’ll notice some load balancing methods come in both a node and a member variety. This distinction is very important. “Node” distributes traffic based on that server’s metrics “globally” across all pools on the BIG-IP, whereas “Member” bases its load balancing decisions only on metrics from within that particular pool. This matters when you’re sharing servers across pools: “node” makes sense if you want to take into account a server’s metrics across all the pools it belongs to before making a load balancing decision, vs “member” for just that pool.
CMP® and Load Balancing:
You should understand the implications of CMP and how the BIG-IP distributes traffic with it activated. Clustered Multiprocessing™ (CMP) is a default BIG-IP traffic acceleration feature that creates a separate Traffic Management Microkernel (TMM) instance for each CPU, sharing the workload among all CPUs. Because each TMM handles load balancing independently of the other TMMs, traffic distribution across the pool members may appear uneven compared to what you’d see with CMP disabled. In short, “order” is independent across TMM instances. For example: if you have 4 TMM instances and you’re using round robin, each instance starts at the first pool member. This lil’ diagram should give you some context:
Imagine 2 pool members in this test pool, 10.0.0.1 and 10.0.0.2.
Client connections arrive at the BIG-IP virtual server and are distributed across the TMM instances, each running its own round robin:

Connection 1 –> TMM0 –> 10.0.0.1:80
Connection 2 –> TMM1 –> 10.0.0.1:80
Connection 3 –> TMM2 –> 10.0.0.1:80
Connection 4 –> TMM3 –> 10.0.0.1:80
Connection 5 –> TMM0 –> 10.0.0.2:80
Connection 6 –> TMM1 –> 10.0.0.2:80
Connection 7 –> TMM2 –> 10.0.0.2:80
Connection 8 –> TMM3 –> 10.0.0.2:80
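To make the per-TMM behavior concrete, here’s a tiny Python sketch – purely illustrative, not F5 code – simulating four TMM instances that each keep an independent round-robin position:

```python
from itertools import cycle

# Illustrative model only: each TMM keeps its own round-robin position,
# so per-TMM order is perfect even though pool-wide order looks "uneven".
pool = ["10.0.0.1:80", "10.0.0.2:80"]
tmms = [cycle(pool) for _ in range(4)]  # 4 TMM instances (one per CPU)

# Assume connections are spread across TMM0..TMM3 in turn (simplified).
for conn in range(1, 9):
    tmm_id = (conn - 1) % 4
    print(f"Connection {conn} -> TMM{tmm_id} -> {next(tmms[tmm_id])}")
```

With 4 TMMs and 2 pool members, the first four connections all land on 10.0.0.1:80 before any TMM advances to 10.0.0.2:80 – exactly the pattern in the diagram.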
Static vs Dynamic load balancing methods
Static load balancing methods do not use any traffic metrics from the node / pool member to distribute traffic – “Round Robin,” used in the CMP example above, is a static load balancing method. Dynamic load balancing methods like “Least Connections” DO use traffic metrics from the node to distribute traffic.
Performance-based load balancing methods
It’s important to understand that some dynamic load balancing methods rely on performance monitors. Performance monitors are not to be confused with health monitors. Health monitors keep a close eye on the health of a resource to deem it available or unavailable – they are independent of load balancing methods. Performance monitors measure the host’s performance and dynamically send more or less traffic to hosts in the pool – they work with corresponding dynamic load balancing methods. Health monitors can be applied at the node level or at the pool level, but performance monitors can only be applied at the node level – i.e. in the nodes list, not attached to a pool.
F5 LTM Load Balancing Methods
For each method below, the textbook description comes first, followed by Austin’s insight.
Round Robin
The Round Robin method passes each new connection request to the next server in the pool, eventually distributing connections evenly across the array of machines being load balanced. This is the default load balancing method.
Round Robin is a static LB method you pick in early application testing when you have little or no information about the application and backend servers. In other words, there are typically better options – but if you need to get something distributing traffic quickly with little background info, round robin will work.
It can also be a good baseline to identify whether the application is stateful – i.e. whether it would require a persistence profile. If the app breaks under round robin, it needs persistence.
Ratio (member) / Ratio (node)
The BIG-IP system distributes connections among pool members or nodes in a static rotation according to ratio weights that you define. In this case, the number of connections that each system receives over time is proportionate to the ratio weight you defined for each pool member or node. You set a ratio weight when you create each pool member or node.
Ratio load balancing is a static load balancing method basing traffic distribution on the ratio you set, e.g. 3:1, 2:1, or 5:2.
Sometimes folks will set ratios according to server size, i.e. double the server size, send it twice as much traffic. I’m not a huge fan of static ratio load balancing as things don’t always work out like that in the real world. I do however think ratios are useful for load balancing things you can’t easily measure and that are more static – like circuits in a gateway pool. For example, if you have a gateway pool with two circuits, one 1 Gb and the other 100 Mb, a static ratio might make sense – but it always depends.
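As a rough sketch of how a static ratio plays out, here’s the gateway-pool example in Python. The member names and the expand-by-weight scheduling are my own simplification for illustration, not how TMM schedules internally:

```python
# Hypothetical gateway pool: a 1 Gb circuit weighted 10, a 100 Mb circuit weighted 1.
ratios = {"1gb-circuit": 10, "100mb-circuit": 1}

def ratio_schedule(ratios):
    """Yield members in proportion to their static ratio weights, forever."""
    while True:
        for member, weight in ratios.items():
            for _ in range(weight):
                yield member

gen = ratio_schedule(ratios)
first_cycle = [next(gen) for _ in range(11)]
# 10 of the first 11 connections go to the 1 Gb circuit.
print(first_cycle.count("1gb-circuit"))  # 10
```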
Dynamic Ratio (member) / Dynamic Ratio (node)
The Dynamic Ratio methods select a server based on various aspects of real-time server performance analysis. These methods are similar to the Ratio methods, except that with Dynamic Ratio methods, the ratio weights are system-generated, and the values of the ratio weights are not static. These methods are based on continuous monitoring of the servers, and the ratio weights are therefore continually changing.
Note: To implement Dynamic Ratio load balancing, you must first install and configure the necessary server software for these systems, and then install the appropriate performance monitor.
Dynamic ratio load balancing is great for application traffic that can vary greatly from user to user. For example, one user of a payroll application might generate reports for 100 employees made up of big bulky PDFs, while another is just logging in to make a change to her account. If you base your traffic distribution on a static load balancing method, or even a simpler dynamic method like least connections, you have no good way of knowing that one server is working 500% harder than the other pool members and is subsequently slower – unless you can measure server performance. Let me introduce you to dynamic ratio load balancing… 😉
To use this load balancing method you’ll need to apply a performance monitor at the node level to the members in the pool, and ensure the server supports that data collection. Other than the SNMP performance monitor, each performance monitor requires its specific plug-in file to be installed on the actual server.
Fastest (node) / Fastest (application)
The Fastest methods select a server based on the least number of current outstanding sessions. These methods require that you assign both a Layer 7 and a TCP type of profile to the virtual server.
Note: If the OneConnect ™ feature is enabled, the Least Connections methods do not include idle connections in the calculations when selecting a pool member or node. The Least Connections methods use only active connections in their calculations.
The key to understanding the Fastest load balancing method is to grasp that an “outstanding request” is one that has not yet received a response. The BIG-IP keeps a counter on each pool member that increments when it receives an L7 request, and decrements as soon as the response is received.
This method comes in handy when your pool members are located in different networks / data centers where latency might become a factor.
Again, you’ll need a TCP profile and a layer 7 profile – for example an HTTP profile.
Note: You’ll see F5’s disclaimer text in the note above – “If the OneConnect feature is enabled, the Least Connections methods.. etc etc..” When they say “least connection methods” they are talking about the load balancing methods that in one way or another distribute traffic to pool members based on least connections. Those methods are: Least Connections, Weighted Least Connections, Fastest, Observed, and Predictive.
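Here’s a minimal model of that outstanding-request counter in Python – the member names are made up, and this is only a sketch of the idea, not F5’s implementation:

```python
# Outstanding L7 requests per member: incremented on request, decremented
# on response. "Fastest" picks the member with the fewest outstanding.
outstanding = {"app1:80": 0, "app2:80": 0}

def send_request(outstanding):
    member = min(outstanding, key=outstanding.get)
    outstanding[member] += 1
    return member

def receive_response(outstanding, member):
    outstanding[member] -= 1

a = send_request(outstanding)     # tie broken in dict order -> app1:80
b = send_request(outstanding)     # app1 has 1 outstanding -> app2:80
receive_response(outstanding, a)  # app1 responds quickly
c = send_request(outstanding)     # app1:80 again - fewest outstanding
print(a, b, c)                    # app1:80 app2:80 app1:80
```

A slow member accumulates outstanding requests and naturally gets fewer new ones – which is why this method helps when latency differs across data centers.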
Least Connections (member) / Least Connections (node)
The Least Connections methods are relatively simple in that the BIG-IP system passes a new connection to the pool member or node that has the least number of active connections.
Note: If the OneConnect feature is enabled, the Least Connections methods do not include idle connections in the calculations when selecting a pool member or node. The Least Connections methods use only active connections in their calculations.
The Least Connections method is a good choice when the servers you’re load balancing have similar performance capabilities, AND the application traffic on the servers DOESN’T vary greatly from user to user. Recall the payroll app discussed earlier – just because a server has fewer connections doesn’t necessarily mean it’s going to be faster. In those situations, take a look at dynamic ratio load balancing and investigate whether it meets your needs.
Since there are some dependencies and complexities to dynamic ratio load balancing, the weighted least connections method may be a good choice when you have servers with varying capacity that you can quantify.
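As a toy Python sketch of the selection (names and counts are hypothetical; per the note above, with OneConnect enabled the idle server-side connections don’t count):

```python
# Only active connections drive the decision; idle (OneConnect-pooled)
# connections are shown but ignored.
members = {
    "web1:80": {"active": 12, "idle": 30},
    "web2:80": {"active": 9,  "idle": 2},
}

def least_connections(members):
    return min(members, key=lambda m: members[m]["active"])

print(least_connections(members))  # web2:80
```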
Weighted Least Connections (member) / Weighted Least Connections (node)
Similar to the Least Connections methods, these load balancing methods select pool members or nodes based on the number of active connections. However, the Weighted Least Connections methods also base their selections on server capacity.
The Weighted Least Connections (member) method specifies that the system uses the value you specify in Connection Limit to establish a proportional algorithm for each pool member. The system bases the load balancing decision on that proportion and the number of current connections to that pool member. For example, member_a has 20 connections and its connection limit is 100, so it is at 20% of capacity. Similarly, member_b has 20 connections and its connection limit is 200, so it is at 10% of capacity. In this case, the system selects member_b. This algorithm requires all pool members to have a non-zero connection limit specified.
The Weighted Least Connections (node) method specifies that the system uses the value you specify in the node’s Connection Limit setting and the number of current connections to a node to establish a proportional algorithm. This algorithm requires all nodes used by pool members to have a non-zero connection limit specified. If all servers have equal capacity, these load balancing methods behave in the same way as the Least Connections methods.
Note: If the OneConnect feature is enabled, the Weighted Least Connections methods do not include idle connections in the calculations when selecting a pool member or node. The Weighted Least Connections methods use only active connections in their calculations.
Weighted least connections requires you to have a good handle on server capacity, which can be hard to quantify. Additionally, if your application has dynamic traffic varying from user to user, it can skew the limits you set. Moral of the story? If your pool is made up of servers with different capacities and the app is relatively static, weighted least connections can work for your situation – but it’s not the best for adaptive traffic distribution.
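The member_a / member_b example from the description can be written out in a few lines of Python (an illustrative sketch, not F5 internals):

```python
# Each member's load is its active connections as a fraction of its
# Connection Limit; the member at the lowest fraction wins.
members = {
    "member_a": {"connections": 20, "limit": 100},  # 20% of capacity
    "member_b": {"connections": 20, "limit": 200},  # 10% of capacity
}

def weighted_least_connections(members):
    # Every member must have a non-zero connection limit.
    return min(members, key=lambda m: members[m]["connections"] / members[m]["limit"])

print(weighted_least_connections(members))  # member_b
```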
Observed (member) / Observed (node)
The Observed mode dynamic load balancing algorithm calculates a dynamic ratio value, which is used to distribute connections among available pool members. The ratio is based on the number of Layer 4 (L4) connections last observed for each pool member. Every second, the BIG-IP system observes the number of L4 connections to each pool member and assigns a ratio value to each pool member. When a new connection is requested, Observed mode load balances the connections based on the ratio values assigned to each pool member, preferring the pool member with the greatest ratio value.
Observed load balancing is ratio load balancing where the ratios are dynamically assigned by the F5 every second based on connection counts. Observed can work well in small and large pools with varying server speeds, as long as a healthy volume of connections is consistently being distributed to the pool members/nodes. It does not perform well with low connection counts, and large pools are more susceptible to that situation. The crux of it: you want a large enough sample size at each pool member/node. In other words, the worry isn’t hitting some limit with a large pool – the worry is not having enough connections for the sample size. A large pool can work well as long as it’s taking a high volume of connections consistently.
Predictive (member) / Predictive (node)
The Predictive methods use the ranking methods of the Observed methods, where servers are rated according to the number of current connections. However, with the Predictive methods, the BIG-IP system analyzes the trend of the ranking over time, determining whether a node’s performance is currently improving or declining. Servers whose performance rankings are currently improving, rather than declining, receive a higher proportion of the connections.
Predictive is similar to Observed except the ratio is derived from a trend over time. Ahhh, so what length of time does the predictive method base its decision on, you ask? That window has never been confirmed or denied by F5. It’s rumored to be based on the monitoring interval, but some brief testing proved inconclusive.
Least Sessions
The Least Sessions method selects the server that currently has the least number of entries in the persistence table. Use of this load balancing method requires that the virtual server references a type of profile that tracks persistence connections, such as the Source Address Affinity or Universal profile type.
Note: The Least Sessions methods are incompatible with cookie persistence.
This is an interesting option for a load balancing method, as it bases the metric on persistence table entries. There are only a couple of persistence types the F5 maintains tables for: Source Address and Universal persistence.
Universal persistence allows you to persist traffic based on header or content data (in the client request and server response) that you specify in an iRule. Whether it’s source address or universal, the traffic distribution works the same way – the pool members with fewer persistence table entries get more traffic.
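A minimal sketch of the selection logic, assuming a simple map of member to persisted clients (all names and addresses are illustrative):

```python
# Fewest persistence-table entries wins the next new client.
persistence_table = {
    "172.16.1.10:80": {"192.0.2.10", "192.0.2.11"},  # 2 persisted clients
    "172.16.1.11:80": {"192.0.2.12"},                # 1 persisted client
}

def least_sessions(table):
    return min(table, key=lambda m: len(table[m]))

member = least_sessions(persistence_table)
persistence_table[member].add("192.0.2.99")  # the new client persists there
print(member)  # 172.16.1.11:80
```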
Ratio Least Connections
The Ratio Least Connections methods cause the system to select the pool member according to the ratio of the number of connections that each pool member has active.
Note – If a ratio weight is not specified, it will be treated as a default value of 1.
You don’t see Ratio Least Connections used very often in the wild, and for good reason – there are usually better options.
Need a suggestion for this requirement:
We have a pool of 5 members. The client requirement is that at least 2 servers are UP in the pool; only then is the VIP marked as UP.
How can we do this? Any iRule?
The Suhu says
Thanks for the explanation. I want to ask what’s suitable for my situation.
I have 4 node servers. My problem occurs when authenticating to a third-party OAuth provider.
Client –> Node 1 –> 3rd-party OAuth –> Node 1 (succeeds)
Sometimes in reality:
Client –> Node 1 –> 3rd-party OAuth –> random Node (fails)
What method is suitable for my situation?
Austin Geraci says
The consensus from the WTIT engineering team is: this is likely a persistence problem, not a load balancing method problem. In the OAuth flow the client leaves the F5, then comes back and opens a new TCP connection – the lack of persistence means you will sometimes land on the wrong backend server. Source or cookie persistence will likely help, and you will likely need to turn on OneConnect as well. You can also look into having APM federate the auth instead of the node.
The WTIT team loves tackling problems like this – contact us if you’re interested in services. When I was an engineer doing delivery, I could only dream of having something like our Always-On program to bounce questions like this off a team of experts. With our AO program you’re also covered for emergency engineering support. It won’t replace your ongoing support contract with F5 – that’s mainly for license entitlement, RMA, and bug fixes. The AO program is more like professional services on demand from the best of the best F5 engineers. Read more about the AO program here – WTIT Always On Program for Advanced F5 Support
Mike Gomes says
In your article, on the Observed method you mention that it is not suitable for large pools. How large do you consider a large pool?
Austin Geraci says
Hi Mike – The moral of the story is you need a large enough sample size of connections to establish a baseline and come up with good predictions. So in other words, the worry isn’t having too large of a pool because you’ll hit some limit, the worry is you would have too large of a pool and only a handful of connections per pool member/node, and subsequently not have a large enough sample size to determine the trends.
Amine Kadimi says
This post is a reference. I revisit it many times a year to refresh about topics like observed and predictive or to know when to use which LB method. Thanks a lot
Santosh Sharma says
I like your post. More than that, I like that you replied very quickly to the questions asked.
Most LB techniques revolve around Least Connections or Ratio – sometimes a manual ratio, sometimes automated.
Michael Voss says
We need exactly what you described under “Dynamic Ratio” for some proxy servers. But I don’t understand, and cannot find any articles on the web explaining, how the F5 calculates the ratio of a member.
We have two proxy servers in our lab, and I have set up a dca-base monitor with one variable for the OID .1.3.6.1.4.1.2021.10.1.3.1 (system load for 1 minute). Via the SNMP logging on the F5 I see values of 0.7 for one server and 0.01 for the other. But I see the F5 sending new sessions to the first one, with the higher load? Am I misunderstanding something?
Austin Geraci says
Hi Michael – Thanks, I’m glad you liked it! You might be misunderstanding how F5 treats the values: a higher weight actually means more traffic will be sent to that pool member.
“The higher the value of the weight, the more traffic is directed to the pool member. The lower the value, the less traffic is directed to the pool member.”
The weight is calculated with this formula, where N is the number of nodes in the pool:

weight = N^(Mem Coefficient × (Mem Threshold – Mem Utilization) / Mem Threshold)
       + N^(CPU Coefficient × (CPU Threshold – CPU Utilization) / CPU Threshold)
       + N^(Disk Coefficient × (Disk Threshold – Disk Utilization) / Disk Threshold)
You can read more about how the formula works and some of the nuances like how the value of “number of nodes in the pool” defaults to 10 when using snmp_dca because it’s associated with individual nodes vs pools here in F5’s overview of Dynamic Ratio Load Balancing – https://support.f5.com/csp/article/K9125
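To make the formula concrete, here it is in Python. The coefficient and threshold values below are made-up examples, not F5 defaults, and per the snmp_dca note the node-count term defaults to 10:

```python
def dynamic_ratio_weight(nodes, cpu_util, mem_util, disk_util,
                         cpu_coef=1.5, mem_coef=1.0, disk_coef=2.0,
                         cpu_thr=80, mem_thr=70, disk_thr=90):
    """Higher weight = more headroom = more traffic to that member."""
    return (nodes ** (mem_coef * (mem_thr - mem_util) / mem_thr)
            + nodes ** (cpu_coef * (cpu_thr - cpu_util) / cpu_thr)
            + nodes ** (disk_coef * (disk_thr - disk_util) / disk_thr))

# An idle server weighs more than a heavily loaded one:
idle = dynamic_ratio_weight(10, cpu_util=10, mem_util=10, disk_util=10)
busy = dynamic_ratio_weight(10, cpu_util=70, mem_util=60, disk_util=80)
print(idle > busy)  # True
```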