Tuesday, 17 September 2019

Troubleshooting BIG-IP - The Basics

Here are some of the terms used in this article that are handy to remember.
  • VIP - When we refer to a VIP, we're referring to the virtual IP assigned to a virtual server. Often we'll use the term VIP interchangeably with the virtual server itself. The VIP is an object within BIG-IP that listens for address and service requests. A client sends traffic to the VIP, which routes it according to the virtual server's configuration.
  • Node - The node is the server and service assigned to receive traffic from a virtual IP/Server. You will usually have more than one node defined to receive traffic from behind a virtual server.
  • Pool - The virtual server will have a pool defined to send traffic to. Server nodes are assigned to one or more pools and the pool defines how to balance the traffic between them.
  • ADC - BIG-IP is an Application Delivery Controller. Load balancing, SSL offloading, compression, acceleration, and traffic management are all features that define how an application delivery controller operates.
  • SNAT - SNAT, or secure network address translation, translates the source IP address within a connection to a BIG-IP system IP address that you define. The destination node then uses that new source address as its destination address when responding to the request. SNAT ensures server nodes always send traffic back through the BIG-IP system. There are always one-off cases where you don't want this but SNAT is your friend.
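
To make the SNAT entry a little more concrete, here is a minimal Python sketch of the rule of thumb behind it: a node's reply only comes back through BIG-IP on its own if the client is off the node's subnet and the node's default gateway is a BIG-IP self IP. The function name `needs_snat`, its parameters, and the addresses are all hypothetical, and the rule is a simplification for illustration rather than anything BIG-IP itself runs.

```python
import ipaddress

def needs_snat(client_ip, node_ip, node_prefix_len, node_gateway, bigip_self_ips):
    """Return True when the node's reply would bypass BIG-IP unless SNAT is used.

    Simplified rule of thumb (an assumption of this sketch, not F5 logic):
      1. If the client sits on the node's own subnet, the node replies directly.
      2. Otherwise the reply follows the node's default gateway, which only
         passes back through BIG-IP if that gateway is a BIG-IP self IP.
    """
    client = ipaddress.ip_address(client_ip)
    node_net = ipaddress.ip_network(f"{node_ip}/{node_prefix_len}", strict=False)
    if client in node_net:
        return True
    gateway = ipaddress.ip_address(node_gateway)
    self_ips = {ipaddress.ip_address(ip) for ip in bigip_self_ips}
    return gateway not in self_ips
```

For example, `needs_snat("192.0.2.9", "10.1.1.20", 24, "10.1.1.254", ["10.1.1.1"])` reports that SNAT is needed, because the node's replies would leave through a router that is not the BIG-IP.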

Test Offline

Hopefully you can test out your full application stack prior to going live.  There are those times though when a go-live scenario is an application release nightmare and you're pushing out features left and right following cutover.  That's no fun and it will make troubleshooting worse.  If you have disaster recovery scenarios in place, you SHOULD have a redundant environment or something resembling one.  You can test and troubleshoot against this offline "data center" or whatever you have running so you're not causing constant resets to your live application.
  • If you have n+1 redundant application stacks (in production or other environment levels), test against the one with the least traffic (no traffic is preferred).  Some people run backup procedures against offline data centers, which is great if you're not troubleshooting a problem.  Additional traffic will muddy the waters, especially if you're running vague tcpdumps.
  • Don't test half of the application stack.  Are you testing via IP only instead of using DNS to resolve the application FQDN?  Is the database in your offline instance synced?  Make sure you're testing the full stack regardless of it being offline or not.  If you had a DNS issue and were only using the IP, you'd never duplicate the problem.  Whatever your offline instance is (test, stage, production, development), be wary of variables that will skew troubleshooting results; at the very least, note them down for later inspection if needed.
  • Even offline, applications can be chatty if integrated with other systems: federation, other data integration systems, directory syncs.  If possible, temporarily suspend these external influences.  It could be as simple as pausing a script or it could be suspending OLAP cube generation within SQL.  Noise always introduces variation.  Be aware of these outside influences and inspect accordingly.
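
The "IP only versus FQDN" point above is easy to check from a client. The following is a minimal Python sketch, assuming you know which VIP address you have been testing by hand; the function names are hypothetical and it only uses the standard library resolver, so it sees whatever DNS your test client sees.

```python
import socket

def resolve_fqdn(fqdn, port=443):
    """Return the sorted, de-duplicated addresses the local resolver gives for fqdn."""
    infos = socket.getaddrinfo(fqdn, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

def check_dns_matches_vip(fqdn, expected_vip, port=443):
    """Compare what DNS returns with the VIP address you have been testing directly."""
    addresses = resolve_fqdn(fqdn, port)
    return {
        "fqdn": fqdn,
        "resolved": addresses,
        "matches_vip": expected_vip in addresses,
    }
```

Calling `check_dns_matches_vip("app.example.com", "203.0.113.10")` (both values are placeholders) from the client network shows whether the name your users type actually lands on the VIP you think it does. A `socket.gaierror` here is itself a finding: the name does not resolve from that client.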

Remember the Core Concepts

There are two core needs for any ADC to operate properly, and these need to work prior to dissecting your application.  First, an ADC has to properly operate on your network and be able to speak to both the client and server networks.  These can be the same network and you're simply hair-pinning your ADC traffic, or you have segregated networking needs: separate interfaces, properly configured trunks, tagged VLANs... you know the drill.  Trying to figure out why your application doesn't work is going to take a long time if BIG-IP isn't able to talk to your server network.  Part two of this is remembering the core concepts of a virtual IP.  You need a valid IP, you need a pool, you need a node, and you need a port to listen on.  These things do get overlooked, so if you're surprised, don't be.  It happens.
  • System Requirements
    • Can you reach the BIG-IP from the client network you're testing on?  An admin can slide in a firewall change affecting application A and inadvertently break access to applications B, C, D...  Making sure you can reach your BIG-IP from all required networks is a good thing to check.  Believe me, this is an issue more often than we like to admit.
    • Is BIG-IP accepting and distributing traffic properly?  If you're building your first application, this is a normal step.  If this is your 30th application, you assume BIG-IP is behaving properly.  There are cases where you'll need to step back and make sure BIG-IP is receiving traffic on listening interfaces and attempting to distribute traffic to your nodes.  You can check BIG-IP statistics for some basic sanity checks but it always helps to run a tcpdump or spin up Wireshark to just give you that warm and fuzzy feeling of self assurance.
  • Virtual IP Requirements
    • Is your VIP on a valid client network and listening?  It's easy to build a VIP for network X but select network Y for VLAN and Tunnel Traffic options.  Port scan from a client to validate!
    • Does your VIP have a valid pool and active pool members ready to receive client traffic?  A surprising number of support calls are resolved because the admin, in haste, just threw on a tcp or tcp_half_open monitor to get the node available in the pool, and the service behind the required port was actually down.  If you're hurrying the basics, you're going to have a bad time!  Make sure those nodes are up, checked by proper monitors, and available for use in your intended application pool.
    • Is your traffic going to BIG-IP but you're not seeing anything come back?  Are you running asymmetric routing?  If so, did you remember to SNAT?  A very common issue is misunderstanding when SNAT is needed.  Many times we'll have a developer or admin state "but I need to see the source IP of the client traffic"... that's a separate problem.  You're not going to see anything if your application doesn't work.  Either SNAT your traffic or make BIG-IP your outbound application gateway.  SNAT is not a discussion, it's a way of life!  Read up heavily on this, hopefully PRIOR to implementation, but if you don't, you'll just have some additional cleanup down the road.  Leave that for the intern.
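
The gap between "the port answers" and "the service works" from the monitor point above can be shown in a few lines. This is a minimal Python sketch, not a BIG-IP monitor: the function names are hypothetical, it assumes a plain HTTP service, and it treats only a 200 response as healthy. It can be pointed at a VIP from a client or at a pool member directly.

```python
import http.client
import socket

def tcp_port_open(host, port, timeout=3.0):
    """True if a TCP handshake completes. Roughly all a plain tcp check proves."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def http_status(host, port, path="/", timeout=3.0):
    """Return the status code for GET path, or None if no valid HTTP reply came back."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()

def check_member(host, port, path="/", timeout=3.0):
    """Report both views so a listening-but-broken service stands out."""
    listening = tcp_port_open(host, port, timeout)
    status = http_status(host, port, path, timeout) if listening else None
    return {"listening": listening, "http_status": status, "healthy": status == 200}
```

A result of `listening: True` with `healthy: False` is exactly the case where a tcp-only monitor would mark the member up while the application behind it is down.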

Reduce Complexity

Overly complex installations require a lot of troubleshooting if something goes wrong during go-live.  This is often why people do cutovers through staged releases; they're releasing smaller changes that can be easily managed.  When a problem arises, it's very helpful to isolate the issue quickly, and reducing complexity or starting with simple problem solving is your best bet.  Remember that firewall admin that slipped in an ACL change that broke your application?  If you didn't start with the basics, you'd still be checking certificate dates, http profiles, and iRule syntax before you had the epiphany to see if ANY traffic was reaching your BIG-IP.
  • As in testing offline, if possible use an offline datacenter.  This lowers the traffic significantly and can make tcpdumps quite manageable.
  • Disable all but one node.  Reducing the client traffic to a single server node eases traffic inspection by an order of magnitude.  If the problem you're solving is isolated to a single back-end server, this can also speed up the isolation process.
  • Drop out of SSL and go unencrypted.  If you're having "weird" issues, is it reproducible with non-SSL traffic?  This may not always be as easy as it sounds, but being able to determine if encryption or security is playing a problematic role can speed up troubleshooting significantly.
  • Does the application work without BIG-IP involved?  Sounds silly but it's a valid question and where you should generally start.  Make sure the application responds with basic functionality because ADC stacks, for all their value, do add complexity to your environment.  Being able to segregate the two for sanity checks is a good idea.  Your vendor may also force you to do this if you call them with an application question.  Or lie to them.  That's cool too.
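
One way to sketch the "with and without BIG-IP" sanity check above is to request the same page through the VIP and directly from a node, then compare what comes back. This is a minimal Python example under a few assumptions: the function names are hypothetical, the URLs you pass in are your own, and system proxies are deliberately bypassed so the request takes the path you think it takes.

```python
import hashlib
import http.client
import urllib.error
import urllib.request

# An opener with an empty ProxyHandler ignores any proxy set in the environment.
_NO_PROXY = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def fetch_summary(url, timeout=5.0):
    """Return (status, sha256 of body) for a GET, or (None, error text) on failure."""
    try:
        with _NO_PROXY.open(url, timeout=timeout) as resp:
            return resp.status, hashlib.sha256(resp.read()).hexdigest()
    except urllib.error.HTTPError as err:
        # A 4xx/5xx still means something answered, so keep its status and body.
        return err.code, hashlib.sha256(err.read()).hexdigest()
    except (OSError, http.client.HTTPException) as err:
        return None, str(err)

def compare_paths(via_vip_url, direct_url):
    """Fetch the same resource through the VIP and straight from a node."""
    vip = fetch_summary(via_vip_url)
    direct = fetch_summary(direct_url)
    same = vip == direct and vip[0] is not None
    return {"via_vip": vip, "direct": direct, "same": same}
```

Pick a static resource for this, since pages with timestamps or session tokens will hash differently on every request. If the direct request fails too, the problem is not BIG-IP.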

Additional Tools for Diagnosing Problems

I've run many applications behind BIG-IP and my toolset has remained mostly unchanged.  Sometimes I start a little too deep for basic troubleshooting by diving into a packet capture from the get-go, but I've used Wireshark enough that it's second nature now.  As an application owner, all of your tools for problem diagnosis should be second nature too.
  • Wireshark - I have to say I started out with Bloodhound (the Microsoft internal network monitor tool) way back in the NT 3.51 days.  But when Wireshark was released, it was a game changer.  Being able to easily reassemble VoIP traffic into a listenable wav file to illustrate to a customer the jitter analysis in the tcpdump was amazing.  Nowadays, there are plenty of players in the packet capture/analysis game, but there's a reason Wireshark is a verb; it's the standard... and we have an F5 Wireshark plugin for it too.
  • tcpdump/ssldump - Knowing how to run tcpdump and ssldump on your BIG-IP is a requirement when contacting support so you might as well learn it.  It'll end up coming in handy down the road when you also need to run ring dumps from a server looking for problematic traffic. 
  • Nmap - Install it everywhere.  It's available for every major operating system so there's no reason not to have it installed.  Quickly analyze system availability and determine if the application's even listening to your requests.  Nmap can do a lot more but as an advanced port scanner, it's all you'll ever need.
  • OpenSSL - It's good to run OpenSSL for many reasons, from certificate analysis and CSR creation, to running your own CA for testing.  The bonus for troubleshooting is the s_client SSL/TLS program.  Connect and see what happens behind the SSL/TLS negotiation without needing a packet capture.  Security professionals, network admins, and application owners rely on OpenSSL's s_client to validate their TLS configurations.
  • Curl - The website is down.  Is it?  Or is it your browser's inability to pass traffic due to the 30 extensions you have running?  Curl is your site's sanity check to see exactly what's loading.  It's quick and painless and can answer several initial troubleshooting questions right off the bat.  And it does TLS so you can even overlap your OpenSSL s_client tests if you need.
  • HttpWatch or Fiddler - These are the real winners here when troubleshooting an application response, especially when you don't own the entire application stack.  Each has its strengths and weaknesses but between the two, you can diagnose almost any web application issue quickly.  Is the web site responding?  Are you receiving the correct certificate?  Is the data loading after CSS?  What's that weird 3rd party script running?  All can be answered with either of these tools.
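
When the tools above aren't on the box you're testing from, a scripted stand-in for the quick s_client-style questions (what protocol was negotiated, which certificate came back, when does it expire) can be sketched with Python's standard `ssl` module. The function names and defaults here are hypothetical, and this is in no way a replacement for OpenSSL itself.

```python
import socket
import ssl
import time

def tls_summary(host, port=443, server_name=None, timeout=5.0):
    """Connect, complete a verified TLS handshake, and report what was negotiated.

    server_name lets you connect to a VIP by IP while sending the FQDN as SNI.
    A failed verification raises ssl.SSLCertVerificationError, which is itself
    a useful finding.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=server_name or host) as tls:
            cert = tls.getpeercert()
            return {
                "protocol": tls.version(),
                "cipher": tls.cipher()[0],
                "subject": cert.get("subject"),
                "not_after": cert.get("notAfter"),
            }

def days_until_expiry(not_after, now=None):
    """Whole days left before a certificate's notAfter date (negative if expired)."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return int((expires - now) // 86400)
```

Feeding the `not_after` value from `tls_summary` into `days_until_expiry` answers the "is the certificate about to expire?" question without opening a browser.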

All of these recommendations were written up based on real support calls made by competent administrators who are new to BIG-IP or are new to their role as an application administrator.  If you're a developer and are new to BIG-IP, welcome and don't feel bad; we all started out making the exact same mistakes.  Practice makes perfect so dive into your BIG-IP environment or purchase a BIG-IP Developer Lab License for yourself just to play around with.  Hopefully you're feeling only slightly frustrated, but just remember to break down your problems and take them one at a time.  It's a good life lesson but it's also how you're going to fix your BIG-IP.
Source: F5.com
