Saturday 3 August 2019

BIG-IP HIGH AVAILABILITY PART TWO

The previous post in this three-part series provided an overview of some design decisions and requirements to set up HA.

Now that we have done all the pre-checks and have an understanding of the various DSC components, we can go ahead and start configuring. In my experience the order in which you do these next few steps is not important, except for setting up the trust between the devices, which must be done last.

Each of these steps is required on each physical device, with the exception of the trust configuration, which can be done on either device.

Self IPs and VLANs

The very first thing you need to do when setting up HA between two BIG-IP devices is to configure the basic network settings. The two critical pieces of information are the VLANs and a self IP for each VLAN. Using the topology from the first post, I create each of the VLANs and self IPs. Do not forget to create the floating IP(s) on both units.

ConfigSync

First we configure ConfigSync. Go to Device Management ›› Devices ›› <DEVICE> ›› Device Connectivity ›› ConfigSync and enter the Local Address.

Failover

Next we'll configure failover. In a production environment you may want to configure two Failover Unicast addresses: a TMM self IP and the management IP address. In this lab I am selecting the HA VLAN self IP. The setting is under Device Management ›› Devices ›› <DEVICE> ›› Device Connectivity ›› Failover Network.

Mirroring

F5's recommendation is to use a dedicated VLAN and interface for mirroring and to configure a primary and secondary address. In my lab I will use a single dedicated virtual interface and VLAN for this purpose. The setting is under Device Management ›› Devices ›› <DEVICE> ›› Device Connectivity ›› Mirroring.

Device Trust

We're now ready to set up the trust domain. Go to Device Management ›› Device Trust : Peer List, click Add, enter the management IP address, administrator username and password of the peer, and then click Retrieve Device Information. You will be presented with the peer's certificate details.

So what is being presented here? As we are entering the management IP address of the peer to establish the device trust, the peer presents the certificate associated with this interface. This happens to be the certificate found in System ›› Device Certificates : Device Certificate ›› Device Certificate. In this instance I have yet to go through the process of getting a device certificate from a PKI, so here the peer device is presenting a self-signed certificate.

After you click Finished, based on my observations the following sequence of events occurs:

1. The BIG-IP device you initiate the device trust from acts as the TLS client and initiates a TCP 3-way handshake on port 4353 to its peer over its ConfigSync IP address. The devices then go through a TLS mutual authentication process with each other whereby each presents its identity certificate to the other.

If we take a look at the /var/log/ltm file during this process we see the following:

ltm-1 notice mcpd[6608]: 01071436:5: CMI listener established at 10.128.2.110 port 6699
ltm-1 notice mcpd[6608]: 01071431:5: Attempting to connect to CMI peer 10.128.2.111 port 6699
ltm-1 err mcpd[6608]: 0107142f:3: Can't connect to CMI peer 10.128.2.111, TMM outbound listener not yet created

ltm-1 notice mcpd[6608]: 0107143a:5: CMI reconnect timer: enabled
ltm-1 notice mcpd[6608]: 01071431:5: Attempting to connect to CMI peer 10.128.2.111 port 6699
ltm-1 notice mcpd[6608]: 01071432:5: CMI peer connection established to 10.128.2.111 port 6699
ltm-1 notice mcpd[6608]: 01071451:5: Received CMI hello from /Common/ltm-2.lab.com
ltm-1 notice mcpd[6608]: 01071038:5: Master Key updated by user %cmi-mcpd-peer-10.128.2.111
ltm-1 notice mcpd[6608]: 010714a0:5: Sync of device group /Common/device_trust_group to commit id 17 6407634457465257616 /Common/ltm-2.lab.com 0 from device /Common/ltm-2.lab.com complete.
ltm-1 notice mcpd[6608]: 0107143a:5: CMI reconnect timer: disabled, all peers are connected


What is going on here? Remember that the CMI process is used as part of the ConfigSync process to allow the local MCP daemon to exchange MCP messages and commit ID updates with its peers. If we look at each line we can work out what is happening:

  • CMI listener established at 10.128.2.110 port 6699
    • The local MCPD has initialised and created a listener on port 6699 so it can now accept incoming CMI connections.
  • Attempting to connect to CMI peer 10.128.2.111 port 6699:
    • The local MCPD process is attempting to set up a CMI connection to its peer in the trust domain. 
  • Can't connect to CMI peer 10.128.2.111, TMM outbound listener not yet created:
    • Not an issue per se but the local process has not yet established a listener (or failed to bind the socket). 
  • CMI reconnect timer: enabled:
    • The local device is starting up a timer to try reconnecting every five seconds. 
  • CMI peer connection established to 10.128.2.111 port 6699:
    • This device has successfully created a CMI connection to another device in the trust domain. 
  • Received CMI hello from /Common/ltm-2.lab.com:
    • The peer device has established a CMI connection to this device. 
  • Master Key updated by user %cmi-mcpd-peer-10.128.2.111 
    • Not 100% on this but I suspect this has something to do with the exchange of certificates and the encryption used in the communication exchange(s).
  • Sync of device group /Common/device_trust_group to commit id 17 6407634457465257616 /Common/ltm-2.lab.com 0 from device /Common/ltm-2.lab.com complete.
    • The two devices are now part of the trust domain and are in sync for the default device_trust_group. The commit ID 17 is the ID for this transaction.
  • CMI reconnect timer: disabled, all peers are connected
    • Fairly self-explanatory this one.

There now exists a full mesh between the MCP processes on each device.
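
The log walkthrough above can be turned into a quick check. What follows is only a minimal sketch, assuming the trimmed line format shown in this post (a real /var/log/ltm line also carries a timestamp prefix, which this pattern does not handle). The function names parse_cmi_line and cmi_mesh_ready are my own, not anything shipped by F5.

```python
import re

# Matches the trimmed format used in this post (assumption: no timestamp prefix):
# "<host> <severity> mcpd[<pid>]: <msg id>:<level>: <text>"
LINE_RE = re.compile(
    r"^(?P<host>\S+) (?P<severity>\w+) mcpd\[(?P<pid>\d+)\]: "
    r"(?P<msg_id>[0-9a-f]{8}):(?P<level>\d): (?P<text>.*)$"
)


def parse_cmi_line(line):
    """Return a dict of fields for one mcpd log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def cmi_mesh_ready(lines):
    """True if a peer connection was established and the reconnect timer was disabled."""
    texts = [parsed["text"] for parsed in map(parse_cmi_line, lines) if parsed]
    connected = any(t.startswith("CMI peer connection established") for t in texts)
    timer_off = any(t.startswith("CMI reconnect timer: disabled") for t in texts)
    return connected and timer_off


# The log lines from this post, copied as shown above.
SAMPLE = [
    "ltm-1 notice mcpd[6608]: 01071436:5: CMI listener established at 10.128.2.110 port 6699",
    "ltm-1 notice mcpd[6608]: 01071431:5: Attempting to connect to CMI peer 10.128.2.111 port 6699",
    "ltm-1 err mcpd[6608]: 0107142f:3: Can't connect to CMI peer 10.128.2.111, TMM outbound listener not yet created",
    "ltm-1 notice mcpd[6608]: 0107143a:5: CMI reconnect timer: enabled",
    "ltm-1 notice mcpd[6608]: 01071431:5: Attempting to connect to CMI peer 10.128.2.111 port 6699",
    "ltm-1 notice mcpd[6608]: 01071432:5: CMI peer connection established to 10.128.2.111 port 6699",
    "ltm-1 notice mcpd[6608]: 01071451:5: Received CMI hello from /Common/ltm-2.lab.com",
    "ltm-1 notice mcpd[6608]: 01071038:5: Master Key updated by user %cmi-mcpd-peer-10.128.2.111",
    "ltm-1 notice mcpd[6608]: 010714a0:5: Sync of device group /Common/device_trust_group to commit id 17 6407634457465257616 /Common/ltm-2.lab.com 0 from device /Common/ltm-2.lab.com complete.",
    "ltm-1 notice mcpd[6608]: 0107143a:5: CMI reconnect timer: disabled, all peers are connected",
]

if __name__ == "__main__":
    for entry in filter(None, map(parse_cmi_line, SAMPLE)):
        print(entry["msg_id"], entry["severity"], entry["text"])
    print("mesh ready:", cmi_mesh_ready(SAMPLE))
```

The check only looks for the two messages that mark the end of the sequence, so the early "Can't connect" error does not count against it, which matches the observation that it is not an issue per se.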

2. What happens next is that the failover communication channel is established between the devices.
What is interesting is that if we look at any of these messages in Wireshark, we can see that they are relaying information about the Traffic Group that has automatically been created. This traffic group contains (or will contain) objects that need to fail over between the two devices if a Sync Failover group is created.

3. Next the mirroring communication channel is established between the two devices.

4. We can now confirm there is an MCP mesh between the two devices via the CLI:

[root@ltm-1:Standby:In Sync] config # netstat -pan | grep -E 6699
tcp        0      0 ::ffff:10.128.2.110:40872   ::ffff:10.128.2.111:6699    ESTABLISHED 6608/mcpd
tcp        0      0 ::ffff:10.128.2.110:6699    ::ffff:10.128.2.111:57790   ESTABLISHED 6608/mcpd


[root@ltm-2:Active:In Sync] config # netstat -pan | grep -E 6699
tcp        0      0 ::ffff:10.128.2.111:6699    ::ffff:10.128.2.110:40872   ESTABLISHED 6806/mcpd
tcp        0      0 ::ffff:10.128.2.111:57790   ::ffff:10.128.2.110:6699    ESTABLISHED 6806/mcpd
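
If you want to check the mesh from a script rather than by eye, the netstat output above is easy to parse. This is a rough sketch only: it assumes the column layout shown above (local address in the fourth column, foreign address in the fifth, state in the sixth), and the function name cmi_directions is hypothetical.

```python
def cmi_directions(netstat_output, cmi_port=6699):
    """Return which directions have an ESTABLISHED connection on the CMI port.

    "inbound" means the peer connected to our listener (local port is cmi_port);
    "outbound" means we connected to the peer's listener (foreign port is cmi_port).
    """
    directions = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, listeners and anything that is not an established TCP session.
        if len(fields) < 6 or fields[5] != "ESTABLISHED":
            continue
        # Addresses look like ::ffff:10.128.2.110:40872, so the port follows the last colon.
        local_port = int(fields[3].rsplit(":", 1)[1])
        foreign_port = int(fields[4].rsplit(":", 1)[1])
        if local_port == cmi_port:
            directions.add("inbound")
        if foreign_port == cmi_port:
            directions.add("outbound")
    return directions


# Output captured on ltm-1, copied from above.
LTM1_NETSTAT = """\
tcp        0      0 ::ffff:10.128.2.110:40872   ::ffff:10.128.2.111:6699    ESTABLISHED 6608/mcpd
tcp        0      0 ::ffff:10.128.2.110:6699    ::ffff:10.128.2.111:57790   ESTABLISHED 6608/mcpd
"""

if __name__ == "__main__":
    found = cmi_directions(LTM1_NETSTAT)
    print("full mesh" if found == {"inbound", "outbound"} else "partial: %s" % sorted(found))
```

Seeing both directions on each device is what I am calling the mesh here: each MCPD has connected out to its peer and has accepted a connection from it.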

Device Groups

At this stage all three channels of communication are open and established and there exists an MCP mesh. The two devices will both be in the active state as we have not created a Sync Failover Device Group.

There is, however, a default Sync Only Device Group which is automatically created called device_trust_group. This is a system-generated and system-managed device group used to synchronise trust information across all devices.

As both devices are now in the Active state, we want to go ahead and create a Sync Failover Device Group, which will turn this into an active/standby pair.

For most implementations you'll have two members. As per F5's best practice recommendation, the Network Failover checkbox is ticked.

The decision as to whether you enable Automatic Sync is going to be based on your policy. Enabling it reduces the number of steps during implementation: one less thing to think about. However, you lose the ability to confirm all your configuration before committing it to the peer device.

In our network we do not enable Full Sync. I personally don't see the value in enabling it. With it disabled, the system sends only an incremental ConfigSync to its peer, up to the size set in the Maximum Incremental Sync Size (KB) box. If the ConfigSync is bigger than this, a full sync is done anyway.
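
To make that behaviour concrete, here is a small sketch of the decision as I have described it. It is an illustration of the rule in the paragraph above, not F5's implementation, and the function and parameter names are hypothetical.

```python
def choose_sync_type(pending_change_kb, max_incremental_kb, full_sync_enabled=False):
    """Pick the kind of ConfigSync that would be sent, per the rule described above.

    pending_change_kb:  size of the changes waiting to be synced (hypothetical input)
    max_incremental_kb: the size limit configured on the device group
    full_sync_enabled:  whether the Full Sync checkbox is ticked
    """
    if full_sync_enabled:
        # Full Sync ticked: every sync sends the whole configuration.
        return "full"
    if pending_change_kb > max_incremental_kb:
        # Too big for an incremental sync, so a full sync is done anyway.
        return "full"
    return "incremental"


if __name__ == "__main__":
    print(choose_sync_type(200, 1024))
    print(choose_sync_type(2048, 1024))
    print(choose_sync_type(10, 1024, full_sync_enabled=True))
```

The point of the sketch is that ticking Full Sync only removes the cheaper incremental path; an oversized change falls back to a full sync either way.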

Once the group is created, we should see the devices assume their active and standby roles. Both will show a prompt in the top-left corner displaying 'Awaiting Initial Sync'.

If you click that, it will take you to Device Management ›› Overview. Here, under the Devices section, click the name of the device you are currently logged into (indicated by the word 'Self'), then click Sync Device to Group and then Sync.

The two devices are now in an active/standby configuration. In the next post we'll verify the ConfigSync is working as expected.