Failover of F5 LTM

 

1, We normally use an HA group (fast failover), because failover triggered by VLAN fail-safe or gateway fail-safe takes about 10 seconds, whereas HA group failover happens almost immediately.

2, We are running version 11.6, and I found that the failover method (in the traffic group) has to be changed to HA Group for HA group failover to work.
You can check the HA score with the command show /sys ha-group

When the failover method is set to HA Order, the output looks like this:
LB(Active)(/Common)(tmos)# show /sys ha-group detail

--------------------------
Sys::HA Group: lb01-ha
--------------------------
State enabled
Active Bonus 10
Score 0

--------------------------------------------
| Sys::HA Group Trunk: nko-lb01-ha:lb-trunk
--------------------------------------------
| Threshold 1
| Percent Up 100
| Weight 20

The HA group score is always 0, so no failover happens even if you shut down the trunk. After you change the failover method to HA Group, the output looks like this:
LB(Active)(/Common)(tmos)# show /sys ha-group

--------------------------
Sys::HA Group: lb01-ha
--------------------------
State enabled
Active Bonus 10
Score 20

--------------------------------------------
| Sys::HA Group Trunk: nko-lb01-ha:lb-trunk
--------------------------------------------
| Threshold 1
| Percent Up 100
| Weight 20
| Score Contribution 20
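
To check the score from a script rather than by eye, the output above can be parsed. The following is a minimal Python sketch, not an F5 tool: the function names and the SAMPLE text are assumptions made for illustration, and the parser relies only on the group-level "Score" line appearing as shown above.

```python
import re
from typing import Optional

# Matches the group-level "Score <n>" line only. The per-trunk line
# "| Score Contribution <n>" starts with "|" and has no digit right
# after "Score", so it is not matched.
_SCORE_RE = re.compile(r"^[ \t]*Score[ \t]+(\d+)[ \t]*$", re.MULTILINE)


def parse_ha_group_score(output: str) -> Optional[int]:
    """Return the HA group score found in the command output, or None."""
    match = _SCORE_RE.search(output)
    return int(match.group(1)) if match else None


def is_ha_group_effective(output: str) -> bool:
    """A score of 0 (or no score line) suggests the HA group is not driving failover."""
    score = parse_ha_group_score(output)
    return score is not None and score > 0


# Illustrative text in the same shape as the output shown above.
SAMPLE = """\
Sys::HA Group: lb01-ha
State enabled
Active Bonus 10
Score 20
| Threshold 1
| Percent Up 100
| Weight 20
| Score Contribution 20
"""

if __name__ == "__main__":
    print(parse_ha_group_score(SAMPLE))
    print(is_ha_group_effective(SAMPLE))
```

A score that stays at 0 while the trunk is up is the symptom described above for a traffic group still set to HA Order.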

3, HA failover unicast configuration
You always need to configure two IPs for failover to work: the management IP and the failover IP. The failover IP, in particular, sits on a dedicated failover link between the LTM nodes.
Removing the management IP causes both LTM nodes to switch to the active state, even if the failover IP is configured and reachable. Removing the failover IP causes the same issue, even if the management IP is configured and reachable.

The config sync and mirroring IPs can be set to the failover IP alone; the management IP is not needed there.
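
The two-address rule is easy to check against an inventory before touching a pair. This is a minimal Python sketch under stated assumptions: DeviceFailoverConfig is a hypothetical record I made up for illustration (not a BIG-IP API object), and the addresses come from documentation ranges.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DeviceFailoverConfig:
    """Hypothetical inventory record for one LTM node (not an F5 object)."""
    name: str
    mgmt_ip: str
    failover_ip: str
    unicast_addresses: List[str] = field(default_factory=list)


def missing_unicast_addresses(device: DeviceFailoverConfig) -> List[str]:
    """Return the required addresses that are absent from the unicast list."""
    required = [device.mgmt_ip, device.failover_ip]
    return [ip for ip in required if ip not in device.unicast_addresses]


if __name__ == "__main__":
    lb01 = DeviceFailoverConfig(
        name="lb01",
        mgmt_ip="192.0.2.11",
        failover_ip="198.51.100.11",
        unicast_addresses=["192.0.2.11", "198.51.100.11"],
    )
    lb02 = DeviceFailoverConfig(
        name="lb02",
        mgmt_ip="192.0.2.12",
        failover_ip="198.51.100.12",
        unicast_addresses=["198.51.100.12"],  # management IP was removed
    )
    for dev in (lb01, lb02):
        print(dev.name, missing_unicast_addresses(dev))
```

A non-empty result for either node flags the situation described above, where both nodes can end up active.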

4, What will trigger failover?
https://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/bigip-device-service-clustering-admin-11-5-0/8.html

From the link above:

The BIG-IP system initiates failover according to any of several events that you define. These events fall into these categories:

System fail-safe
With system fail-safe, the BIG-IP system monitors various hardware components, as well as the heartbeat of various system services. You can configure the system to initiate failover whenever it detects a heartbeat failure.
Gateway fail-safe
With gateway fail-safe, the BIG-IP system monitors traffic between an active BIG-IP system in a device group and a pool containing a gateway router. You can configure the system to initiate failover whenever some number of gateway routers in a pool of routers becomes unreachable.
VLAN fail-safe
With VLAN fail-safe, the BIG-IP system monitors network traffic going through a specified VLAN. You can configure the system to initiate failover whenever the system detects a loss of traffic on the VLAN and the fail-safe timeout period has elapsed.
HA groups
With an HA group, the BIG-IP system monitors trunk, pool, or cluster health to create an HA health score for a device. You can configure the system to initiate failover whenever the health score falls below configurable levels.
Auto-failback
When you enable auto-failback, a traffic group that has failed over to another device fails back to a preferred device when that device is available. If you do not enable auto-failback for a traffic group, and the traffic group fails over to another device, the traffic group remains active on that device until that device becomes unavailable.
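
The HA group description above can be made concrete with a small calculation. This is a rough Python sketch, not F5's implementation. It assumes each monitored object contributes its weight scaled by the fraction of members that are up, and contributes nothing when the number of members up is below its threshold. It also assumes the active bonus only matters when the active device is compared with a peer. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class HaGroupMember:
    """One monitored trunk, pool, or cluster (hypothetical model)."""
    weight: int
    members_up: int
    members_total: int
    threshold: int = 0  # 0 means no threshold is configured


def score_contribution(member: HaGroupMember) -> float:
    """Weight scaled by the fraction of members up; 0 below the threshold."""
    if member.members_total <= 0:
        return 0.0
    if member.threshold and member.members_up < member.threshold:
        return 0.0
    return member.weight * member.members_up / member.members_total


def ha_score(members: Iterable[HaGroupMember]) -> float:
    """Sum of all contributions, without any active bonus."""
    return sum(score_contribution(m) for m in members)


def peer_should_take_over(active_score: float, peer_score: float,
                          active_bonus: int) -> bool:
    """Assumption: the peer wins only if it beats the active score plus bonus."""
    return peer_score > active_score + active_bonus


if __name__ == "__main__":
    healthy = [HaGroupMember(weight=20, members_up=2, members_total=2, threshold=1)]
    degraded = [HaGroupMember(weight=20, members_up=0, members_total=2, threshold=1)]
    print(ha_score(healthy), ha_score(degraded))
    print(peer_should_take_over(ha_score(degraded), ha_score(healthy), 10))
```

With a single trunk of weight 20 fully up, the sketch yields a score of 20, which lines up with the Score and Score Contribution values in the output in item 2.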

5, Failover methods:

Refer to the link https://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/bigip-device-service-clustering-admin-11-5-0/8.html

  • Select Load Aware when the device group contains heterogeneous platforms and you want to ensure that a traffic group fails over to the device with the most capacity at the moment that failover occurs.
  • Select HA Order to cause the traffic group to fail over to the first available device in the Failover Order list.
  • Select HA Group to cause the BIG-IP system to trigger failover based on an HA health score for the device.
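
The difference between HA Order and Load Aware is mostly in how the next device is picked. Below is a minimal Python sketch of the two selection rules as the list above describes them. The Device record and its spare_capacity field are hypothetical stand-ins; the real system derives capacity from its own settings, which are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    """Hypothetical view of a device group member."""
    name: str
    available: bool
    spare_capacity: float = 0.0  # assumed metric, higher means more headroom


def next_by_ha_order(failover_order: List[Device]) -> Optional[Device]:
    """HA Order: first available device in the failover order list."""
    for device in failover_order:
        if device.available:
            return device
    return None


def next_by_load_aware(devices: List[Device]) -> Optional[Device]:
    """Load Aware: available device with the most capacity right now."""
    candidates = [d for d in devices if d.available]
    return max(candidates, key=lambda d: d.spare_capacity, default=None)


if __name__ == "__main__":
    group = [
        Device("lb-a", available=False, spare_capacity=0.9),
        Device("lb-b", available=True, spare_capacity=0.3),
        Device("lb-c", available=True, spare_capacity=0.7),
    ]
    print(next_by_ha_order(group).name)
    print(next_by_load_aware(group).name)
```

HA Group is not shown here because it does not pick from a list; it triggers failover from the health score.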

F5 reset troubleshooting

The following are among the most common reasons that clients receive a reset from the F5:

1, Retransmission limit reached: after 5 retransmissions plus a timeout, the F5 resets the connection.

2, If the F5 does not support any of the SSL versions/ciphers the client wants to use, it responds immediately with a TCP RST.

3, SSL handshake timeout, which is 10 seconds by default.

4, Application-caused reset. The simplest case is when you close the socket and then write more data on the output stream. By closing the socket, you told your peer that you are done talking, and it can forget about your connection. When you send more data on that stream anyway, the peer rejects it with an RST to let you know it isn't listening.

5, One-arm scenario: the VIP needs SNAT configured in case the backend server's default gateway bypasses the F5. Without SNAT in that case, the F5's connection toward the backend server times out, after which the F5 sends a reset to the client side.

6, Following on from item 5: if automap is configured, the source is translated to the self IP on the egress interface facing the servers. If no self IP is configured on that VLAN on the F5, the F5 sends a reset packet.

7, The Server SSL profile Secure Renegotiate setting is set to Require or Require Strict, and the back-end SSL server lacks support for the Transport Layer Security (TLS) Renegotiation Indication Extension.

8, HTTP header size exceeded by server

9, HTTP header size exceeded by client

10, When an existing client-side connection has been detached from the server-side connection and reselects a new server, the BIG-IP system sends a TCP RST to the server to close the existing server-side connection. This behavior typically comes from using iRule commands such as LB::reselect.

11, No route to host

12, The BIG-IP system receives a SYN under either of the following conditions:

  • A virtual server of type reject
  • A port that is protected by the Port Lockdown settings on a self IP address
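
Items 5 and 6 describe a decision that is easy to get wrong on paper, so here is a minimal Python sketch of it. It is a simplified model, not BIG-IP logic: the function names and the VLAN-to-self-IP mapping are hypothetical, and the sketch simply takes the first self IP on the egress VLAN, whereas the real selection among several self IPs is not modeled.

```python
from typing import Dict, List, Optional


def automap_source(egress_vlan: str,
                   self_ips: Dict[str, List[str]]) -> Optional[str]:
    """Return the self IP that would become the translated source, if any."""
    candidates = self_ips.get(egress_vlan, [])
    return candidates[0] if candidates else None


def server_side_action(egress_vlan: str,
                       self_ips: Dict[str, List[str]]) -> str:
    """Model of item 6: no self IP on the egress VLAN means a reset."""
    source = automap_source(egress_vlan, self_ips)
    if source is None:
        return "reset"
    return f"snat to {source}"


if __name__ == "__main__":
    # Hypothetical layout: only the "internal" VLAN has a self IP.
    self_ips = {"internal": ["198.51.100.5"], "dmz": []}
    print(server_side_action("internal", self_ips))
    print(server_side_action("dmz", self_ips))
```

When a VIP with automap resets clients, checking which VLAN the pool members are reached through and whether that VLAN has a self IP is a quick first step.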