Saturday, 3 August 2019

BIG-IP HIGH AVAILABILITY PART ONE

BIG-IP High Availability Design

it is based on v11.2. This post implements HA using v11.6.0.

As much as I would like to have a full-blown lab with hardware, the finances don't yet allow it, so I have to make do with a virtual lab. I am running two Virtual Edition (VE) instances of BIG-IP within VMware Workstation. In a production environment you will, of course, need to observe F5's recommended best practices. These include:

  • Use both directly connected hardwired failover and network-based failover.
  • Use both a VLAN (either dedicated or shared) and the management interface for failover.
  • Use a dedicated mirroring VLAN & directly connected interface.
  • Use primary and secondary mirroring addresses.
Some other things to consider:
  • When implementing HA on a VIPRION system you will be setting this up between the vCMP guests only.
  • When implementing HA using VEs, you will want to deploy each VE on a separate physical host for resiliency.
  • If you wish to deploy an active/active scenario take a look here.
  • If the HA devices are Layer 2 adjacent, consider a non-routable VLAN for the failover link, with a subnet sized to conserve IP address space.
The following diagram shows the lab design. I will use a dedicated virtual interface and VLAN for each of the HA functions.

Device Service Clustering (DSC)

It's worth taking some time to understand the various components and technologies that make up DSC. A clear understanding helps the configuration progress smoothly and will help you troubleshoot any issues along the way.

DSC is just another name for HA. The key components of DSC include:

  • Device trust & trust domains: basically all BIG-IP systems need to be able to trust each other before they can form an HA relationship. They do this by exchanging digital certificates. A trust domain is therefore simply a collection of BIG-IP devices that trust each other.
  • Device groups: a DG is a collection of BIG-IP devices in the same trust domain that can synchronise their configuration and fail over to one another. There are two flavours:
    • Sync-Failover: your typical/standard DG that allows for failover between two or more BIG-IP devices.
    • Sync-Only: a DG that allows you to synchronise specific data within a folder. This post I wrote covers folders and Sync-Only DGs in more detail.
  • Centralized Management Infrastructure (CMI): used with the ConfigSync process, this component allows the local MCP daemon (see below) to exchange MCP messages and commit ID updates to its peers.
There are also some pre-requisites that need to be in place before configuring DSC:
  • Licensing: each device must have the same product licensing and module provisioning
  • Software version: each device must run the same BIG-IP software version
  • Management IP: each device must have a unique management IP address
  • NTP: in a production environment NTP must be configured and in sync
  • Ports: the following ports are required between the two devices:
    • TCP 4353: Used for ConfigSync
    • TCP 443: Used to set up the device trust
    • TCP 1029-1043 (v11.4+): Used for connection and persistence mirroring. A separate port starting at 1029 and incrementing by 1 is used for each traffic group.
    • UDP 1026: Used for network failover
    • UDP 1028 (v11.0 - 11.3): Used for connection and persistence mirroring. A single port is used for all traffic groups.

A quick note on the ports above. It's always worth checking basic connectivity between the two peers, as the physical network may be managed by a 3rd party. If you are deploying a VE, the host system, whether it's VMware or Hyper-V, may also be managed by a 3rd party, so it's definitely worth double checking.
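As a rough illustration of such a check, the sketch below lists the mirroring ports needed for a given number of traffic groups (following the "start at 1029 and increment by 1" rule above) and tests whether a TCP port on a peer accepts connections. It is a minimal sketch that can be run from any machine with Python, not an F5 tool: the function names and the peer address are hypothetical, and UDP 1026 is left out because a connectionless port cannot be probed reliably this way.

```python
import socket

FIRST_MIRRORING_PORT = 1029  # v11.4+: port used by the first traffic group
LAST_MIRRORING_PORT = 1043   # upper end of the range listed above


def mirroring_ports(traffic_groups):
    """Return the TCP mirroring ports used by the given number of traffic groups."""
    max_groups = LAST_MIRRORING_PORT - FIRST_MIRRORING_PORT + 1
    if not 1 <= traffic_groups <= max_groups:
        raise ValueError(f"expected 1 to {max_groups} traffic groups")
    return list(range(FIRST_MIRRORING_PORT, FIRST_MIRRORING_PORT + traffic_groups))


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    peer = "192.0.2.10"  # hypothetical peer address (documentation range)
    for port in [443, 4353] + mirroring_ports(1):
        state = "open" if tcp_port_open(peer, port) else "unreachable"
        print(f"TCP {port}: {state}")
```

Which address you test depends on the interface you have assigned to each function, and a port will only answer once the corresponding feature is configured on the peer, so treat an "unreachable" result as a prompt to investigate rather than proof of a network problem.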

BIG-IP Daemons

I thought it might be worth breaking this out into its own sub-section. A daemon is a computer program that runs as a background process, rather than being under the direct control of an interactive user. A BIG-IP system requires a number of core daemons to run in order for the system to be operational. With regard to DSC, the main one that will pop up is the MCP daemon.

The Master Control Program (MCP) daemon allows communication between userland processes and the TMM. Needless to say, it's pretty important.

What is a userland process? The term comes from a computer system's memory & hardware protection mechanism. Basically, userland or user space processes are those that run outside of the system's kernel. A daemon is a userland process. If a daemon requires access to the system's hardware (input/output etc.) it has to go through the kernel. This Wikipedia page and this Stack Exchange post help to explain the userland concept further, as the above description is a massive simplification.

In my mind's eye this is how I visualise the above concept.

Using the above diagram as a reference point, when a configuration update is made the change is communicated to the local MCPD process. MCPD passes it to the local TMM process, which then sends the configuration & commit ID to the remote TMM process, where the process is reversed.

BIG-IP Processes

Before deploying HA you should also check to ensure the following processes are running:
  • devmgmtd: Establishes and maintains device group functionality.
  • mcpd: Allows userland processes to communicate with TMM.
  • sod: Provides failover and restart capability.
  • tmm: Performs traffic management for the system. 
This can be checked with a simple command:

# bigstart status devmgmtd mcpd sod tmm

devmgmtd     run (pid 7139) 2 days
mcpd         run (pid 5974) 2 days
sod          run (pid 6951) 2 days
tmm          run (pid 10694) 2 days, 1 start
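
If you want to automate this check, the output is easy to parse. The following is a minimal sketch that works on text in the format shown above. It assumes a healthy daemon is always reported with the word `run` in the second column, which is all the sample output confirms, and the function names are my own.

```python
REQUIRED_DAEMONS = ("devmgmtd", "mcpd", "sod", "tmm")

# Output copied from the bigstart status command above.
SAMPLE_OUTPUT = """\
devmgmtd     run (pid 7139) 2 days
mcpd         run (pid 5974) 2 days
sod          run (pid 6951) 2 days
tmm          run (pid 10694) 2 days, 1 start
"""


def parse_bigstart_status(text):
    """Map each daemon name to the state word that follows it."""
    states = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            states[fields[0]] = fields[1]
    return states


def daemons_not_running(text, required=REQUIRED_DAEMONS):
    """Return the required daemons that are missing or not in the 'run' state."""
    states = parse_bigstart_status(text)
    return [name for name in required if states.get(name) != "run"]


if __name__ == "__main__":
    problems = daemons_not_running(SAMPLE_OUTPUT)
    print("all required daemons running" if not problems else f"check: {problems}")
```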

Certificates

As described above, a trust domain requires the exchange of digital certificates. So which certificates will the system exchange?

As described in part two of this series, when you set up device trust between the two peers, the first thing you must do is enter the management IP address of the peer device to retrieve some information. This first TLS exchange uses the device certificate stored in System > Device Certificates > Device Certificate. This certificate can either be self-signed (the default) or signed by your internal PKI.

During the process of establishing the MCPD mesh, the two peers communicate over TCP 4353 and use the identity certificates stored in Device Management  ››  Device Trust : Identity.

When you set up a BIG-IP system, the system acts as its own Certificate Authority (CA) and creates its own identity certificate. We can verify this by looking at the certificates in the following locations:

  • CA: /Common/dtca.crt
  • Identity: /Common/dtdi.crt 
The following shows the CA cert and identity of each BIG-IP device prior to any HA configuration. As you can see each BIG-IP has signed its own identity certificate.
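
The same relationship can be checked by comparing names: the identity certificate's issuer should match the CA certificate's subject. The sketch below does that comparison on the text printed by `openssl x509 -noout -subject -issuer` for each certificate. It is only a sketch: the sample names are hypothetical placeholders rather than real BIG-IP values, the function names are my own, and a name match is not a cryptographic signature check.

```python
def parse_names(openssl_text):
    """Extract the subject and issuer strings from openssl's text output."""
    names = {}
    for line in openssl_text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip().lower()
        if sep and key in ("subject", "issuer"):
            # Drop whitespace so "CN = x" and "CN=x" compare equal.
            names[key] = "".join(value.split())
    return names


def issued_by(identity_text, ca_text):
    """Return True if the identity cert's issuer equals the CA cert's subject."""
    identity = parse_names(identity_text)
    ca = parse_names(ca_text)
    return "issuer" in identity and identity["issuer"] == ca.get("subject")


# Hypothetical output for one device's CA and identity certificates.
CA_TEXT = "subject=CN = example-device-ca\nissuer=CN = example-device-ca\n"
IDENTITY_TEXT = "subject=CN = bigip1.example.test\nissuer=CN = example-device-ca\n"

if __name__ == "__main__":
    print("identity issued by local CA:", issued_by(IDENTITY_TEXT, CA_TEXT))
```

Running the same comparison with one device's identity certificate against the other device's CA certificate should return False before any HA configuration, since each BIG-IP has signed only its own identity certificate at that point.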

In the next post we'll go ahead and start configuring HA.


