F5 Load Balancer: January 2020

Monday, 13 January 2020

What Is Round-Robin Load Balancing?

Round‑robin load balancing is one of the simplest methods for distributing client requests across a group of servers. Going down the list of servers in the group, the round‑robin load balancer forwards a client request to each server in turn. When it reaches the end of the list, the load balancer loops back and goes down the list again (sends the next request to the first listed server, the one after that to the second server, and so on).
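
The rotation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular load balancer implements it; the server names are hypothetical.

```python
from itertools import cycle

def round_robin(servers):
    """Return an endless iterator that walks the list and loops back to the start."""
    return cycle(servers)

# Hypothetical server names, for illustration only.
picker = round_robin(["server-a", "server-b", "server-c"])

# Five consecutive requests: the fourth wraps around to the first server.
first_five = [next(picker) for _ in range(5)]
print(first_five)
```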

For more information about load balancing, see Load Balancing: Scalable Traffic Management with NGINX Plus.

The main benefit of round‑robin load balancing is that it is extremely simple to implement. However, it does not always result in the most accurate or efficient distribution of traffic, because many round‑robin load balancers assume that all servers are the same: currently up, currently handling the same load, and with the same storage and computing capacity. The following variants of the round‑robin algorithm take additional factors into account and can result in better load balancing:

Weighted round robin – A weight is assigned to each server based on criteria chosen by the site administrator; the most commonly used criterion is the server’s traffic‑handling capacity. The higher the weight, the larger the proportion of client requests the server receives. If, for example, server A is assigned a weight of 3 and server B a weight of 1, the load balancer forwards 3 requests to server A for each 1 it sends to server B.
Dynamic round robin – A weight is assigned to each server dynamically, based on real‑time data about the server’s current load and idle capacity.
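
As a rough sketch of the weighted variant, the Python below repeats each server in the schedule as many times as its weight, using the A=3, B=1 example from the text. It is a naive illustration under that assumption; production load balancers may interleave the servers more smoothly instead of sending bursts to one server.

```python
from itertools import cycle

def weighted_round_robin(weights):
    """Return an endless iterator in which each server appears `weight` times per cycle."""
    schedule = [server for server, weight in weights.items() for _ in range(weight)]
    return cycle(schedule)

# Weights taken from the example in the text: server A gets 3 requests for each 1 sent to B.
picker = weighted_round_robin({"A": 3, "B": 1})
first_eight = [next(picker) for _ in range(8)]
print(first_eight)
```

A dynamic round-robin balancer would follow the same pattern but recompute the weights from live load data instead of using fixed values.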

Health Monitors and Load Balancing

One of the truisms of architecting highly available systems is that you never, ever want to load balance a request to a system that is down. Therefore, some sort of health (status) monitoring is required. For applications, that means not just pinging the network interface or opening a TCP connection; it means querying the application and verifying that the response is valid.

This, obviously, requires the application to respond. And respond often. Best practices suggest determining availability every 5 seconds or so. That means every 5 seconds or so the load balancing service is going to open up a connection to the application and make a request. Just like a user would do.

That adds load to the application. It consumes network, transport, application and (possibly) database resources. Resources that cannot be used to service customers. While the impact on a single application may appear trivial, it's not. Remember, as load increases performance decreases. And no matter how trivial it may appear, health monitoring is adding load to what may be an already heavily loaded application.

But Lori, you may be thinking, you expound on the importance of monitoring and visibility all the time! Are you saying we shouldn't be monitoring applications?

Nope, not at all. Visibility is paramount, providing the actionable data necessary to enable highly dynamic, automated operations such as elasticity. Visibility through health-monitoring is a critical means of ensuring availability at both the local and global level.

What we may need to do, however, is move from active to passive monitoring.

PASSIVE MONITORING

Passive monitoring, as the modifier suggests, is not an active process. The load balancer neither opens connections nor queries the application itself. Instead, it snoops on responses being returned to clients and from that infers the current status of the application.

For example, if a request for content results in an HTTP error message, the load balancer can determine whether or not the application is available and capable of processing subsequent requests. If the load balancer is a BIG-IP, it can mark the service as "down" and invoke an active monitor to probe the application status, as well as retry the request against another available instance – ensuring end users do not see the error.

Passive (inband) monitors are not binary. That is, they aren't simply "on" or "off" based on HTTP status codes. Such monitors can be configured to track the number of failures and evaluate failure rates against a configurable failure interval. When such thresholds are exceeded, the application can then be marked as "down".

Passive monitors aren't restricted to availability status, either. They can also monitor for performance (response time). Failure to meet response time expectations results in a failure, and the application continues to be watched for subsequent failures.
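
The threshold behaviour described above can be sketched as follows. This is a minimal Python illustration, not BIG-IP's implementation: the class name, the default thresholds, and the rule that any 5xx status or slow response counts as a failure are all assumptions, and recovery (marking the member up again) is left out.

```python
from collections import deque

class PassiveMonitor:
    """Mark a member down when too many failures fall inside a failure interval."""

    def __init__(self, max_failures=3, interval=30.0, max_response_time=2.0):
        self.max_failures = max_failures            # failures tolerated per interval (assumed)
        self.interval = interval                    # failure interval in seconds (assumed)
        self.max_response_time = max_response_time  # slowest acceptable response (assumed)
        self.failures = deque()                     # timestamps of recent failures
        self.up = True

    def observe(self, now, status_code, response_time):
        """Record one client response that passed through the load balancer."""
        failed = status_code >= 500 or response_time > self.max_response_time
        if failed:
            self.failures.append(now)
        # Forget failures that are older than the failure interval.
        while self.failures and now - self.failures[0] > self.interval:
            self.failures.popleft()
        if len(self.failures) >= self.max_failures:
            self.up = False
        return self.up

monitor = PassiveMonitor()
monitor.observe(0.0, 200, 0.1)   # healthy response
monitor.observe(1.0, 500, 0.1)   # error status counts as a failure
monitor.observe(2.0, 200, 5.0)   # too slow, also counts as a failure
state_after_two = monitor.up     # still up: only two failures so far
monitor.observe(3.0, 503, 0.1)   # third failure inside the interval
state_after_three = monitor.up   # now marked down
```

Because the monitor only looks at responses that were already flowing to clients, it adds no extra requests to the application.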

Passive monitors are, like most inline/inband technologies, transparent. They quietly monitor traffic and act upon that traffic without adding overhead to the process.

Passive monitoring gives operations the visibility necessary to enable predictable performance and to meet or exceed user expectations with respect to uptime, without negatively impacting performance or capacity of the applications it is monitoring.

F5 BIG-IP health checks and HTTP errors

When a web site or application becomes too large to run on a single server, it’s frequently placed on multiple servers with a load balancer in front of them to spread the load and also to remove faulty servers from the pool.

The F5 BIG-IP is one such load balancer.

To ensure that faulty servers are removed from the load balancer pool automatically, you need to set a health check (or monitor). These monitors poll the server every few seconds and, based on what they receive, decide whether to leave the server in the pool or mark it as faulty.

When monitoring web applications you create an HTTP (or HTTPS) monitor. However, the default out-of-the-box behaviour is to check only two conditions:

That content is returned by the server.
That the http/https port is open and responding.

Unfortunately applications can also fail in such a way that the above checks pass but the application is not available to the end user (eg: if it’s returning a 500 Internal Server Error). This will result in users intermittently receiving an error page, or worse – receiving an error page for an extended period of time if session persistence is turned on (ie: once a session is created, they’ll be directed to that one server).

To protect against this you need to create a new health check (monitor) that not only checks the above conditions, but also checks to ensure that the HTTP status code returned with the request is not an error code.

First of all you need to create a new HTTP monitor. You’ll have your typical send string (eg: HEAD / HTTP/1.0) but the receive string is what’s important.

Here’s the receive string I use:

HTTP/1.[01] [23]0[0-6]

This string will accept any returned HTTP status code from 200-206 and 300-306. Any error codes (ie: 4xx and 5xx) will result in the receive string not being matched and the check failing.
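
To see what the receive string accepts, it can be tried against a few sample status lines with Python's `re` module. This is only a sketch: it assumes the monitor searches for the pattern anywhere in the response, and the sample status lines are made up for illustration.

```python
import re

# The receive string from the text, used verbatim as a regular expression.
RECEIVE_STRING = re.compile(r"HTTP/1.[01] [23]0[0-6]")

def is_healthy(response_head):
    """Return True when the start of the response matches the receive string."""
    return RECEIVE_STRING.search(response_head) is not None

# Hypothetical status lines a server might return to the health check.
samples = [
    "HTTP/1.1 200 OK",
    "HTTP/1.0 302 Found",
    "HTTP/1.1 404 Not Found",
    "HTTP/1.1 500 Internal Server Error",
]
results = {line: is_healthy(line) for line in samples}
for line, ok in results.items():
    print(f"{line!r}: {'pass' if ok else 'fail'}")
```

Note that the pattern is deliberately narrow: a less common success code such as 226 would also fail the check.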

You’ll note that I’m using HTTP/1.0 for the check despite HTTP/1.1 being the current version of the protocol. This is because I use the one health check across multiple clients, and HTTP/1.1 would require me to include a host header, eg:

HEAD / HTTP/1.1
Host: shaun.net
Connection: close

Using HTTP/1.0 (which does not support virtual hosts) eliminates this requirement and makes using a single check for many different clients much easier.

Once you’ve created your new monitor to your requirements, you need to go to the pool and add the new check.

After you’ve changed this, all you need to do is update the configuration and your new health check is active. If your application starts returning a 500 Internal Server Error, it should automatically be taken out of the pool within seconds!

Speeding up Secure TCP Connections

Implementing SSL/TLS can significantly impact server performance, because the SSL handshake operation (a series of messages the client and server exchange to verify that the connection is trusted) is quite CPU-intensive. The default timeout for the SSL handshake is 60 seconds and it can be redefined with the ssl_handshake_timeout directive. We do not recommend setting this value too low or too high, as that might result in either a handshake failure or a long wait for the handshake to complete.

HTTPS Server Optimization

SSL operations consume extra CPU resources. The most CPU-intensive operation is the SSL handshake. There are two ways to minimize the number of these operations per client:

Enabling keepalive connections to send several requests via one connection
Reusing SSL session parameters to avoid SSL handshakes for parallel and subsequent connections

Sessions are stored in the SSL session cache shared between worker processes and configured by the ssl_session_cache directive. One megabyte of cache contains about 4000 sessions. The default cache timeout is 5 minutes. This timeout can be increased using the ssl_session_timeout directive.

Below is a sample configuration optimized for a multi-core system with a 10-megabyte shared session cache:

worker_processes auto;

http {
    ssl_session_cache   shared:SSL:10m;
    ssl_session_timeout 10m;

    server {
        listen              443 ssl;
        server_name         www.example.com;
        keepalive_timeout   70;

        ssl_certificate     www.example.com.crt;
        ssl_certificate_key www.example.com.key;
        ssl_protocols       TLSv1 TLSv1.1 TLSv1.2;
        ssl_ciphers         HIGH:!aNULL:!MD5;
        #...
    }
}
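
As a back-of-the-envelope check on the configuration above, the "about 4000 sessions per megabyte" figure from the text can be turned into a capacity estimate. The Python below is a rough sketch; the per-second figure assumes sessions arrive at a steady rate and all live for the full timeout, which real traffic will not do exactly.

```python
SESSIONS_PER_MEGABYTE = 4000  # approximate figure quoted in the text

def approx_session_capacity(cache_megabytes):
    """Estimate how many SSL sessions fit in a shared cache of the given size."""
    return cache_megabytes * SESSIONS_PER_MEGABYTE

def sustainable_new_sessions_per_second(cache_megabytes, timeout_seconds):
    """New sessions per second the cache can absorb before entries are evicted early."""
    return approx_session_capacity(cache_megabytes) / timeout_seconds

capacity = approx_session_capacity(10)                    # shared:SSL:10m
rate = sustainable_new_sessions_per_second(10, 10 * 60)   # ssl_session_timeout 10m
print(f"about {capacity} sessions, roughly {rate:.0f} new sessions per second")
```

If the expected rate of new sessions is much higher than this estimate, a larger cache or a shorter timeout may be worth considering.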

What is a Reverse Proxy vs. Load Balancer?

Reverse proxy servers and load balancers are components in a client-server computing architecture. Both act as intermediaries in the communication between the clients and servers, performing functions that improve efficiency. They can be implemented as dedicated, purpose-built devices, but increasingly in modern web architectures they are software applications that run on commodity hardware.

The basic definitions are simple:

A reverse proxy accepts a request from a client, forwards it to a server that can fulfill it, and returns the server’s response to the client.
A load balancer distributes incoming client requests among a group of servers, in each case returning the response from the selected server to the appropriate client.

But they sound pretty similar, right? Both types of application sit between clients and servers, accepting requests from the former and delivering responses from the latter. No wonder there’s confusion about what’s a reverse proxy vs. load balancer. To help tease them apart, let’s explore when and why they’re typically deployed at a website.

Load Balancing

Load balancers are most commonly deployed when a site needs multiple servers because the volume of requests is too much for a single server to handle efficiently. Deploying multiple servers also eliminates a single point of failure, making the website more reliable. Most commonly, the servers all host the same content, and the load balancer’s job is to distribute the workload in a way that makes the best use of each server’s capacity, prevents overload on any server, and results in the fastest possible response to the client.

A load balancer can also enhance the user experience by reducing the number of error responses the client sees. It does this by detecting when servers go down, and diverting requests away from them to the other servers in the group. In the simplest implementation, the load balancer detects server health by intercepting error responses to regular requests. Application health checks are a more flexible and sophisticated method in which the load balancer sends separate health-check requests and requires a specified type of response to consider the server healthy.

Another useful function provided by some load balancers is session persistence, which means sending all requests from a particular client to the same server. Even though HTTP is stateless in theory, many applications must store state information just to provide their core functionality – think of the shopping basket on an e-commerce site. Such applications underperform or can even fail in a load-balanced environment, if the load balancer distributes requests in a user session to different servers instead of directing them all to the server that responded to the initial request.
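
A minimal sketch of session persistence in Python is shown below. It assumes the load balancer can identify a client by some key (here a hypothetical `client_id`, which in practice might be a cookie or source address) and simply remembers the server chosen for that client's first request.

```python
from itertools import cycle

class StickyBalancer:
    """Round-robin for new clients; returning clients go back to their first server."""

    def __init__(self, servers):
        self._next_server = cycle(servers)
        self._assignments = {}  # client_id -> server chosen on the first request

    def route(self, client_id):
        if client_id not in self._assignments:
            self._assignments[client_id] = next(self._next_server)
        return self._assignments[client_id]

# Hypothetical servers and clients, for illustration only.
lb = StickyBalancer(["s1", "s2"])
routes = [lb.route(c) for c in ["alice", "bob", "alice", "carol", "bob"]]
print(routes)
```

The trade-off is that a client stays pinned to its server even when that server starts misbehaving, which is why persistence is usually combined with health checks.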

Reverse Proxy

Whereas deploying a load balancer makes sense only when you have multiple servers, it often makes sense to deploy a reverse proxy even with just one web server or application server. You can think of the reverse proxy as a website’s “public face.” Its address is the one advertised for the website, and it sits at the edge of the site’s network to accept requests from web browsers and mobile apps for the content hosted at the website. The benefits are two-fold:

Increased security – No information about your backend servers is visible outside your internal network, so malicious clients cannot access them directly to exploit any vulnerabilities. Many reverse proxy servers include features that help protect backend servers from distributed denial-of-service (DDoS) attacks, for example by rejecting traffic from particular client IP addresses (blacklisting), or limiting the number of connections accepted from each client.
Increased scalability and flexibility – Because clients see only the reverse proxy’s IP address, you are free to change the configuration of your backend infrastructure. This is particularly useful in a load-balanced environment, where you can scale the number of servers up and down to match fluctuations in traffic volume.

Another reason to deploy a reverse proxy is for web acceleration – reducing the time it takes to generate a response and return it to the client. Techniques for web acceleration include the following:

Compression – Compressing server responses before returning them to the client (for instance, with gzip) reduces the amount of bandwidth they require, which speeds their transit over the network.
SSL termination – Encrypting the traffic between clients and servers protects it as it crosses a public network like the Internet. But decryption and encryption can be computationally expensive. By decrypting incoming requests and encrypting server responses, the reverse proxy frees up resources on backend servers which they can then devote to their main purpose, serving content.
Caching – Before returning the backend server’s response to the client, the reverse proxy stores a copy of it locally. When the client (or any client) makes the same request, the reverse proxy can provide the response itself from the cache instead of forwarding the request to the backend server. This both decreases response time to the client and reduces the load on the backend server.
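
The caching technique can be illustrated with a small Python sketch. It is deliberately simplified: the `backend` function stands in for a real upstream server, and the cache never expires entries or honours HTTP cache headers, which a real reverse proxy would need to do.

```python
class CachingProxy:
    """Serve repeated requests from a local copy instead of asking the backend again."""

    def __init__(self, fetch_from_backend):
        self._fetch = fetch_from_backend
        self._cache = {}  # request path -> stored response

    def handle(self, path):
        if path not in self._cache:
            # First request for this path: forward it and keep a copy of the response.
            self._cache[path] = self._fetch(path)
        return self._cache[path]

backend_calls = []

def backend(path):
    """Hypothetical backend server; records each request it actually receives."""
    backend_calls.append(path)
    return f"content of {path}"

proxy = CachingProxy(backend)
responses = [proxy.handle("/index.html"), proxy.handle("/index.html"), proxy.handle("/about.html")]
print(backend_calls)
```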

How can I version control my F5 Big-IP LTM load balancer configs while allowing changes via iControl or the web interface?

Currently we use F5 Big-IP LTMs (IP load balancers) in our environment for load balancing. We have an existing process for making changes to LTM configs and pushing them out. I'm trying to figure out the best way to accommodate a new need of our environment.

This is our current method for making changes to our LTM configs (to create new VIPs or add or remove nodes for example) :

· Connect to a server where we maintain copies of our LTM configs

· check out a copy of the config from RCS version control

· vi the config file and make the change on this server

· diff the new config against the previous in version control

· paste this diff into a change control ticket in our ticketing system

· have a fellow network engineer buddy check the diff and sign off on it

· scp the new config file out to our secondary LTM server

· run a "bigpipe verify load /tmp/bigip.conf" to verify the new conf

· copy the staged bigip.conf over the live one in /config/bigip.conf

· run a "bigpipe load"

· tech logs into the web interface to confirm it looks good

· synchronize from our secondary LTM to our primary LTM by running "bigpipe config sync" on our secondary

We want to grant other teams the "Operator" permission to enable and disable nodes in a VIP pool. Doing so writes the changed node state into the config file.

How can we allow other teams to use iControl or the web interface to administratively enable and disable nodes, while maintaining our ability to version control and review config changes before deploying them while not overwriting the node states that have been set on the live production LTMs?

solution:--

I would use bigpipe export and import.

Then, when you are ready to deploy, export the current state and run a final sdiff against your modified, source-controlled SCF, keeping the pool setting differences from the current-state export. Finally, import the output of the sdiff.

Another option is to use something like Chef to make changes. You can source control your recipes.

Creating an F5 Pool and Assigning Multiple Health Monitors to It

Say I create two nodes SERVER1 and SERVER2

create ltm node SERVER1 description SERVER1 address 10.1.1.1%200
create ltm node SERVER2 description SERVER2 address 10.1.1.2%200

After I added the nodes I wanted to create a pool and assign it multiple Health Monitors instead of just a single one. In my script I have something like this

create ltm pool some_pool_1 members add { SERVER1:0 SERVER2:0 } monitor health_monitor_1 health_monitor_2 monitor_3 health_monitor_4 health_monitor_5

This will only assign health_monitor_1 before throwing a Syntax Error: "health_monitor_2" unknown property. When I go into health_monitor_1 I can see SERVER1 and SERVER2, but when I go into any of the other Health Monitors I do not see the nodes SERVER1 and SERVER2 in there. I have to go into the Pool and manually assign it the other Health Monitors. Can someone help me change my script

create ltm pool some_pool_1 members add { SERVER1:0 SERVER2:0 } monitor health_monitor_1 health_monitor_2 monitor_3 health_monitor_4 health_monitor_5

to be able to assign multiple health monitors to my pool?

solution:--

If you want to attach multiple monitors to the pool you are creating, you need to put them in quotation marks:

create ltm pool p1 members add { 10.1.1.1:80 10.1.1.2:80 } monitor "http and https"

or, if you want to have a minimum number of two monitors working:

create ltm pool p1 members add { 10.1.1.1:80 10.1.1.2:80 } monitor min 2 of { tcp http https }
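
The difference between the two forms can be sketched in Python. This is an illustration of the availability logic as I understand it, not F5 code: with a plain list of monitors every monitor must succeed, while "min 2 of" only requires two of the listed monitors to succeed. The monitor results are made up.

```python
def pool_member_up(monitor_results, minimum=None):
    """Decide member availability from per-monitor results.

    monitor_results maps a monitor name to True (passed) or False (failed).
    With minimum=None every monitor must pass; otherwise at least `minimum` must.
    """
    passed = sum(1 for ok in monitor_results.values() if ok)
    required = len(monitor_results) if minimum is None else minimum
    return passed >= required

# Hypothetical check results: the https monitor is currently failing.
results = {"tcp": True, "http": True, "https": False}

all_required = pool_member_up(results)           # every monitor must pass
min_two = pool_member_up(results, minimum=2)     # "min 2 of { tcp http https }"
print(all_required, min_two)
```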

How to tune TCP for high-frequency connections between two nodes

In our data center we have an F5 running on BigIP hardware that acts as a single ingress point for HTTPS requests from client machines in various office locations across the country. F5 terminates TLS and then forwards all requests to two Traefik load balancers, which distribute the requests to the various service instances (Traefik nodes are running in Docker on Red Hat Enterprise but I believe that is irrelevant for my problem). From a throughput, CPU and memory point of view, those three network components are more than capable of handling the amount of requests and traffic with plenty of capacity to spare.

However, we noticed frequent 1000ms delays in HTTP(S) requests that clients make, particularly during high-load times. We tracked the problem to the following root cause:

During high-load times, the F5 "client" initiates new TCP connections to the Traefik "server" nodes at a high frequency (possibly 100+ per second).
Those connections are terminated on the Traefik "server" side when the HTTP responses have been returned.
Each closed connection remains in a TIME_WAIT state for 60 seconds on the Traefik host.
When the F5 initiates a new connection, it randomly chooses an available port from its ephemeral port range.
Sometimes (often during high load), there is already a connection in Traefik in TIME_WAIT state with the same source IP + port, destination IP + port combination. When this happens, the TCP stack (?) on the Traefik host ignores the first SYN packet. Note: RFC 6056 calls this collision of instance-ids.
After 1000ms the retransmission timeout (RTO) mechanism kicks in on the F5 and resends the SYN packet. This time the Traefik host accepts the connection and completes the request correctly.

Obviously, those 1000ms delays are absolutely unacceptable. So we have considered the following solutions so far:

1. Reduce the RTO in F5 to retransmit faster, e.g. to 200ms.
2. Reduce net.ipv4.tcp_fin_timeout to close abandoned connections faster.
   Update: This only applies to connections abandoned by the other side, when no FIN is returned. It does not have any effect on connections in TIME_WAIT state.
3. Enable net.ipv4.tcp_tw_reuse: Useless for incoming connections.
4. Enable net.ipv4.tcp_tw_recycle: AFAIK contraindicated if the client sends randomized TCP timestamps. There is contradicting information (incl. empirical evidence) on whether this feature was removed from Linux or not. Also, generally recommended NOT to mess with.
5. Add more source IPs and/or make Traefik listen on multiple ports to increase the number of permutations in IP/port tuples.

I'll discard #1 because that's just a band-aid. Delays still occur, just a little less noticeable. #3 wouldn't have any effect anyway, #4 would most likely render the system non-functional. That leaves #2 and #5.

But based on what I learned after reading through dozens of posts and technical articles, both of them will ultimately only reduce the chance of those "collisions". Because, what ultimately prevents the sending side, F5, from (pseudo)randomly choosing a combination of ephemeral port, source IP and target port that still exists in TIME_WAIT state on the targeted Traefik host, regardless of how short the fin_timeout setting is (which should stay in the many sec range anyway)? We would only reduce the possibility of collisions, not eliminate it.

After all my research and in times of gigantic web applications, it really surprises me that this problem is not discussed more widely on the web (with solutions available). I'd really appreciate your thoughts and ideas on whether there is a better, more systematic solution in TCP land that will drive the occurrence of collisions near zero. I'm thinking along the lines of a TCP configuration that will allow the Traefik host to immediately accept a new connection despite an old connection being in TIME_WAIT state. But as of now, no luck in finding that.

Random thoughts and points:

At this point it is not feasible to change our various in-house applications to use longer-running HTTP(S) connections to reduce the number of requests/connections per second.
The network architecture of F5 and Traefik is not up for discussion, cannot be changed.
I recently investigated the ephemeral port selection on Windows clients. That algorithm seems to be sequential, not random. Maximizes time until port is reused, reduces security.
During load tests on an otherwise idle system, we generated ~100 HTTP requests/connections per second. The first collisions occurred already after a few seconds (say before 2000 requests total), even though the F5 is configured to use more than 60k ephemeral ports. I assume this is due to the pseudo-random nature of the port selection algorithm, which seems to do a fairly poor job of avoiding instance-id collisions.
The fact that the Traefik host accepts the TCP connection on SYN packet retransmission is probably a feature of the TCP implementation. RFC6056 speaks of TIME_WAIT assassination, which might be related to this.
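
A rough estimate suggests that early collisions would be expected even with perfectly uniform random port selection. The Python sketch below uses the figures from the points above (about 100 connections per second, 60 seconds in TIME_WAIT, about 60,000 ephemeral ports) and assumes, for simplicity, that all connections go to a single destination IP and port and that ports are picked uniformly at random; with traffic split across two Traefik nodes the per-node rate would be roughly half.

```python
import math

def steady_state_collision_probability(conn_per_sec, time_wait_sec, port_count):
    """Chance that one new connection picks a port whose tuple is still in TIME_WAIT."""
    in_time_wait = min(conn_per_sec * time_wait_sec, port_count)
    return in_time_wait / port_count

def prob_collision_within(connections, port_count):
    """Birthday-problem approximation: chance of at least one repeated port
    among `connections` uniformly random picks from `port_count` ports."""
    return 1.0 - math.exp(-connections * (connections - 1) / (2.0 * port_count))

p_steady = steady_state_collision_probability(100, 60, 60000)
p_first_300 = prob_collision_within(300, 60000)
p_first_2000 = prob_collision_within(2000, 60000)

print(f"steady state, per new connection: {p_steady:.0%}")
print(f"at least one collision in the first 300 connections: {p_first_300:.0%}")
print(f"at least one collision in the first 2000 connections: {p_first_2000:.0%}")
```

Under these assumptions a first collision within a few seconds is the expected outcome rather than a sign of a poor random number generator, which is consistent with the load-test observation.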

Update: Per The Star Experiment, the net.ipv4.tcp_fin_timeout setting does NOT affect the TIME_WAIT state, only the FIN_WAIT_2 state. And per Samir Jafferali, on Linux systems (incl. our Red Hat Linux) the TIME_WAIT period is hardcoded in the source code and cannot be configured. On BSD according to the source it is configurable but I haven't verified this.

solution:--

Turns out there was a very simple solution to this problem after all, which we figured out after working with the Traefik vendor for a while. Turns out also that the fact that we are running Traefik in Docker does matter. The problem and solution are very specific to our setup, but I still want to document it here in case others should encounter the same. Nevertheless, this does not invalidate the other, more general recommendations, as collisions of instance IDs are a real problem.

Long story short: All Traefik instances are configured as host-constrained containers (i.e. tied to specific hosts) running in a Docker Swarm cluster. Traefik instances need to expose a port at host level so that they become reachable from the F5, which obviously is not a Docker Swarm participant. Those exposed ports had been configured in ingress mode, which was not only unnecessary (no need to route traffic through the Docker Swarm ingress network) but was also the cause for the dropped/ignored SYN packets. Once we switched the port mode to host, the delays disappeared.

Before:

  ports:
  - target: 8080
    published: 8080
    protocol: tcp
    mode: ingress

After:

  ports:
  - target: 8080
    published: 8080
    protocol: tcp
    mode: host
