When you work with any technology there reaches a point where the “it’s a black box” approach is no longer valid and you have to dig in a little deeper and understand how the product works. With F5 BIG-IP this means understanding how traffic flows through the appliance and how to monitor and watch it.
TMOS – Client and Server Traffic
The F5 BIG-IP Traffic Management Operating System (TMOS) is a dual-stack full proxy which means the client terminates their TCP connection with the BIG-IP and the BIG-IP then makes a new TCP connection to the backend server. So as far as the client is concerned the F5 is the server and as far as the backend server is concerned the F5 is the client. So when you are troubleshooting it is important to understand there will always be client-side traffic and server-side traffic. The names are pretty self explaining but can be misleading in the troubleshooting process. What I mean by this is you’ll more than likely need to look at both client and server side traffic to gain a better understanding in how the application behaves/operated.
The F5 BIG-IP Traffic Management Operating System (TMOS) is a dual-stack full proxy which means the client terminates their TCP connection with the BIG-IP and the BIG-IP then makes a new TCP connection to the backend server. So as far as the client is concerned the F5 is the server and as far as the backend server is concerned the F5 is the client. So when you are troubleshooting it is important to understand there will always be client-side traffic and server-side traffic. The names are pretty self explaining but can be misleading in the troubleshooting process. What I mean by this is you’ll more than likely need to look at both client and server side traffic to gain a better understanding in how the application behaves/operated.
LTM Monitors
LTM monitors are used to evaluate the health of a pool member or a node. They typically run at a set interval (15 seconds by default) and will mark a pool member or node down after 3 failed intervals (this setting is configurable). If you are unsure why a monitor is failing the first place to look is the Local Traffic Manager logs. These logs are accessible via the GUI (System -> Logging) or the CLI (less /var/log/ltm) and will give you some basic information such as:
– when the resource was makes offline
– if the resource is flapping
LTM monitors are used to evaluate the health of a pool member or a node. They typically run at a set interval (15 seconds by default) and will mark a pool member or node down after 3 failed intervals (this setting is configurable). If you are unsure why a monitor is failing the first place to look is the Local Traffic Manager logs. These logs are accessible via the GUI (System -> Logging) or the CLI (less /var/log/ltm) and will give you some basic information such as:
– when the resource was makes offline
– if the resource is flapping
The LTM log will not however tell you why the monitor failed. To determine this you typically need to run a synthetic request using a CLI based tool such as curl or the TMSH monitor test command. Please note: if you can not access the application using these steps the F5 is probably not at fault – no matter how much the application owner swears everything works on the server
If this is a new application deployment I typically see monitor failures resulting from:
– Networking/firewall issues
– does the BIG-IP have a Self-IP on the sames network as the server?
– If not does the BIG-IP have a route to that network?
– Application issues
– Is the web server using name bases virtual directories? If so, what HTTP host header is it expecting?
– does the host OS have a firewall installed/configured?
– Networking/firewall issues
– does the BIG-IP have a Self-IP on the sames network as the server?
– If not does the BIG-IP have a route to that network?
– Application issues
– Is the web server using name bases virtual directories? If so, what HTTP host header is it expecting?
– does the host OS have a firewall installed/configured?
If this is an existing application you need to answer the tried and true “what changed” question. In these scenarios I typically work my way down this checklist to see where the problem lies:
– can I ping the server?
– can I telnet to the port?
– can I run a synthetic request using a CLI tool like curl?
– does the web server respond with the correct website (if you’re using customized HTTP monitors – which I highly recommend)
– can I ping the server?
– can I telnet to the port?
– can I run a synthetic request using a CLI tool like curl?
– does the web server respond with the correct website (if you’re using customized HTTP monitors – which I highly recommend)
These steps will usually lead me to the underlying issue or point me to the team who managed the device/server with the issue.
No comments:
Post a Comment