2.11. Split brain notification and automatic recovery

Split brain is a situation where, due to temporary failure of all network links between cluster nodes, and possibly due to intervention by a cluster management software or human error, both nodes switched to the Primary role while disconnected. This is a potentially harmful state, as it implies that modifications to the data might have been made on either node, without having been replicated to the peer. Thus, it is likely in this situation that two diverging sets of data have been created, which cannot be trivially merged.

DRBD split brain is distinct from cluster split brain, which is the loss of all connectivity between hosts managed by a distributed cluster management application such as Heartbeat. To avoid confusion, this guide uses the following convention:

  • Split brain refers to DRBD split brain as described in the paragraph above.
  • Loss of all cluster connectivity is referred to as a cluster partition, an alternative term for cluster split brain.

DRBD allows for automatic operator notification (by email or other means) when it detects split brain. See Section 5.17.1, “Split brain notification” for details on how to configure this feature.

While the recommended course of action in this scenario is to manually resolve the split brain and then eliminate its root cause, it may be desirable, in some cases, to automate the process. DRBD has several resolution algorithms available for doing so:

  • Discarding modifications made on the younger primary. In this mode, when the network connection is re-established and split brain is discovered, DRBD will discard modifications made, in the meantime, on the node which switched to the primary role last.
  • Discarding modifications made on the older primary. In this mode, DRBD will discard modifications made, in the meantime, on the node which switched to the primary role first.
  • Discarding modifications on the primary with fewer changes. In this mode, DRBD will check which of the two nodes has recorded fewer modifications, and will then discard all modifications made on that host.
  • Graceful recovery from split brain if one host has had no intermediate changes. In this mode, if one of the hosts has made no modifications at all during split brain, DRBD will simply recover gracefully and declare the split brain resolved. Note that this is a fairly unlikely scenario. Even if both hosts only mounted the file system on the DRBD block device (even read-only), the device contents typically would be modified (eg. by filesystem journal replay), ruling out the possibility of automatic recovery.

Whether or not automatic split brain recovery is acceptable depends largely on the individual application. Consider the example of DRBD hosting a database. The discard modifications from host with fewer changes approach may be fine for a web application click-through database. By contrast, it may be totally unacceptable to automatically discard any modifications made to a financial database, requiring manual recovery in any split brain event. Consider your application’s requirements carefully before enabling automatic split brain recovery.

Refer to Section 5.17.2, “Automatic split brain recovery policies” for details on configuring DRBD’s automatic split brain recovery policies.