The LINSTOR User’s Guide

Please Read This First

This guide is intended to serve users of the software-defined storage (SDS) solution LINSTOR as a definitive reference guide and handbook.

This guide assumes, throughout, that you are using the latest version of LINSTOR and related tools.

This guide is organized as follows:

LINSTOR

1. Basic administrative tasks / Setup

LINSTOR is a configuration management system for storage on Linux systems. It manages LVM logical volumes and/or ZFS ZVOLs on a cluster of nodes. It leverages DRBD for replication between different nodes and to provide block storage devices to users and applications. It manages snapshots, encryption, and caching of HDD-backed data in SSDs via bcache.

1.1. Concepts and Terms

This section goes over core concepts and terms that you will need to familiarize yourself with to understand how LINSTOR works and deploys storage. The section is laid out in a "ground up" approach.

1.1.1. Installable Components

linstor-controller

A LINSTOR setup requires at least one active controller and one or more satellites.

The linstor-controller contains the database that holds all configuration information for the whole cluster. It makes all decisions that require a view of the whole cluster. Because it is a crucial part of the system, the controller is typically deployed as an HA service using Pacemaker and DRBD. Multiple controllers can be used for LINSTOR, but only one can be active at a time.

linstor-satellite

The linstor-satellite runs on each node where LINSTOR consumes local storage or provides storage to services. It is stateless; it receives all the information it needs from the controller. It runs programs like lvcreate and drbdadm. It acts like a node agent.

linstor-client

The linstor-client is a command line utility that you use to issue commands to the system and to investigate the status of the system.

1.1.2. Objects

Objects are the end results that LINSTOR presents to the end user or application, such as Kubernetes/OpenShift, a replicated block device (DRBD), an NVMe-oF target, and so on.

Node

A Node is a server or container that participates in a LINSTOR cluster. The Node object defines:

  • Which LINSTOR cluster the node participates in

  • The role of the node: Controller, Satellite, Auxiliary

  • The node’s connectivity, via its associated NetInterface objects

NetInterface

As the name implies, this is how you define the interface/address of a node’s network interface.

Definitions

Definitions define attributes of an object; they can be thought of as a profile or template. Objects created will inherit the configuration defined in the definitions. A definition must be created prior to creating the associated object. For example, you must create a ResourceDefinition prior to creating the Resource.

StoragePoolDefinition
  • Defines the name of a storage pool

ResourceDefinition

Resource definitions define the following attributes of a resource:

  • The name of a DRBD resource

  • The TCP port for DRBD to use for the resource’s connection

VolumeDefinition

Volume definitions define the following:

  • A volume of a DRBD resource

  • The size of the volume

  • The volume number of the DRBD resource’s volume

  • The meta data properties of the volume

  • The minor number to use for the DRBD device associated with the DRBD volume

StoragePool

The StoragePool identifies storage in the context of LINSTOR. It defines:

  • The configuration of a storage pool on a specific node

  • The storage back-end driver to use for the storage pool on the cluster node (LVM, ZFS, etc)

  • The parameters and configuration to pass to the storage back-end driver

Resource

LINSTOR has expanded its capabilities to manage a broader set of storage technologies beyond DRBD. A Resource:

  • Represents the placement of a DRBD resource, as defined within the ResourceDefinition

  • Places a resource on a node in the cluster

  • Defines the placement of a ResourceDefinition on a node

Volume

Volumes are a subset of a Resource. A Resource can have multiple volumes; for example, you may wish to have your database stored on slower storage than your logs in your MySQL cluster. By keeping the volumes under a single resource you are essentially creating a consistency group. The Volume object can also define attributes on a more granular level.

1.2. Broader Context

While LINSTOR might be used to make the management of DRBD more convenient, it is often integrated with software stacks higher up. Such integrations already exist for Kubernetes, OpenStack, OpenNebula and Proxmox. Chapters specific to deploying LINSTOR in these environments are included in this guide.

The southbound drivers used by LINSTOR are LVM, thin LVM and ZFS, with support for Swordfish in progress.

1.3. Packages

LINSTOR is packaged in both the .rpm and the .deb variants:

  1. linstor-client contains the command line client program. It depends on Python, which is usually already installed. On RHEL 8 systems you will need to symlink python (see the sketch after this list).

  2. linstor-controller and linstor-satellite both contain systemd unit files for their services. They depend on a Java Runtime Environment (JRE) version 1.8 (headless) or higher.
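
On RHEL 8, one way to provide the unversioned python command is via the alternatives mechanism. This is a minimal sketch, assuming you use the python2 package mentioned in the CentOS section below:

# yum install -y python2
# alternatives --set python /usr/bin/python2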

For further detail about these packages see the Installable Components section above.

If you have a support subscription to LINBIT, you will have access to our certified binaries via our repositories.

1.4. Installation

If you want to use LINSTOR in containers, skip this section and use the "Containers" section below for the installation.

1.4.1. Ubuntu Linux

If you want to have the option of creating replicated storage using DRBD, you will need to install drbd-dkms and drbd-utils. These packages need to be installed on all nodes. You will also need to choose a volume manager, either ZFS or LVM; in this instance we’re using LVM.

# apt install -y drbd-dkms drbd-utils lvm2

Whether your node is a LINSTOR controller, satellite, or both (combined) determines which packages are required on that node. For combined nodes, we’ll need both the controller and satellite LINSTOR packages.

Combined node:

# apt install linstor-controller linstor-satellite linstor-client

That will make our remaining nodes our Satellites, so we’ll need to install the following packages on them:

# apt install linstor-satellite linstor-client

1.4.2. SUSE Linux Enterprise Server

SLES High Availability Extension (HAE) includes DRBD.

On SLES, DRBD is normally installed via the software installation component of YaST2. It comes bundled with the High Availability package selection.

While downloading DRBD’s newest module, we can also check that the LVM tools are up to date. Users who prefer a command line install may simply issue the following command to get the newest DRBD and LVM versions:

# zypper install drbd lvm2

Whether your node is a LINSTOR controller, satellite, or both (combined) determines which packages are required on that node. For combined nodes, we’ll need both the controller and satellite LINSTOR packages.

Combined node:

# zypper install linstor-controller linstor-satellite linstor-client

That will make our remaining nodes our Satellites, so we’ll need to install the following packages on them:

# zypper install linstor-satellite linstor-client

1.4.3. CentOS

CentOS has had DRBD 8 since release 5. For DRBD 9 you’ll need to look at EPEL and similar sources. Alternatively, if you have an active support contract with LINBIT you can use our RHEL 8 repositories. DRBD can be installed using yum. We can also check for the newest version of the LVM tools.

LINSTOR requires DRBD 9 if you wish to have replicated storage. This requires an external repository to be configured, either LINBIT’s or a third party’s.
# yum install drbd kmod-drbd lvm2

Whether your node is a LINSTOR controller, satellite, or both (combined) determines which packages are required on that node. For combined nodes, we’ll need both the controller and satellite LINSTOR packages.

On RHEL 8 systems you will need to install python2 for the linstor-client to work.

Combined node:

# yum install linstor-controller linstor-satellite linstor-client

That will make our remaining nodes our Satellites, so we’ll need to install the following packages on them:

# yum install linstor-satellite linstor-client

1.5. Containers

LINSTOR is also available as containers. The base images are available in LINBIT’s container registry, drbd.io.

In order to access the images, you first have to login to the registry (reach out to sales@linbit.com for credentials):

# docker login drbd.io

The containers available in this repo are:

  • drbd.io/drbd9:rhel8

  • drbd.io/drbd9:rhel7

  • drbd.io/drbd9:bionic

  • drbd.io/linstor-csi

  • drbd.io/linstor-controller

  • drbd.io/linstor-satellite

  • drbd.io/linstor-client

An up-to-date list of available images with versions can be retrieved by opening http://drbd.io in your browser. Make sure to access the host via "http", as the registry’s images themselves are served via "https".

To load the kernel module, needed only for LINSTOR satellites, you’ll need to run a drbd9 container in privileged mode. The kernel module containers either retrieve an official LINBIT package from a customer repository, or they try to build the kernel modules from source. If you intend to build from source, you need to have the corresponding kernel headers (e.g., kernel-devel) installed on the host. There are three ways to execute such a module load container:

  • Specifying a LINBIT node hash and a distribution.

  • Bind-mounting an existing repository configuration.

  • Doing neither of the above, which triggers a build from source.

Example using a hash and a distribution:

# docker run -it --rm --privileged -v /lib/modules:/lib/modules \
  -e LB_DIST=rhel7.7 -e LB_HASH=ThisIsMyNodeHash \
  drbd.io/drbd9:rhel7

Example using an existing repo config:

# docker run -it --rm --privileged -v /lib/modules:/lib/modules \
  -v /etc/yum.repos.d/linbit.repo:/etc/yum.repos.d/linbit.repo:ro \
  drbd.io/drbd9:rhel7
In both cases (hash plus distribution, as well as bind-mounting) the hash or config has to be from a node that has a special property set. Feel free to contact our support, and we will set this property.

Example building from shipped source (RHEL based):

# docker run -it --rm --privileged -v /lib/modules:/lib/modules \
  -v /usr/src:/usr/src:ro \
  drbd.io/drbd9:rhel7

Example building from shipped source (Debian based):

# docker run -it --rm --privileged -v /lib/modules:/lib/modules \
  -v /usr/src:/usr/src:ro -v /usr/lib:/usr/lib:ro \
  drbd.io/drbd9:bionic

Do not bind-mount /usr/lib on RHEL based systems! And do not forget to bind-mount it on Debian based ones.

For now (i.e., pre DRBD 9 version "9.0.17"), you must use the containerized DRBD kernel module, as opposed to loading a kernel module onto the host system. If you intend to use the containers you should not install the DRBD kernel module on your host systems. For DRBD version 9.0.17 or greater, you can install the kernel module as usual on the host system, but you need to make sure to load the module with the usermode_helper=disabled parameter (e.g., modprobe drbd usermode_helper=disabled).

Then run the LINSTOR satellite container, also privileged, as a daemon:

# docker run -d --name=linstor-satellite --net=host -v /dev:/dev --privileged drbd.io/linstor-satellite
net=host is required for the containerized drbd-utils to be able to communicate with the host-kernel via netlink.

To run the LINSTOR controller container as a daemon, mapping ports 3370, 3376 and 3377 on the host to the container:

# docker run -d --name=linstor-controller -p 3370:3370 -p 3376:3376 -p 3377:3377 drbd.io/linstor-controller

To interact with the containerized LINSTOR cluster, you can either use a LINSTOR client installed on a system via packages, or via the containerized LINSTOR client. To use the LINSTOR client container:

# docker run -it --rm -e LS_CONTROLLERS=<controller-host-IP-address> drbd.io/linstor-client node list

From this point you would use the LINSTOR client to initialize your cluster and begin creating resources using the typical LINSTOR patterns.

To stop and remove a daemonized container and image:

# docker stop linstor-controller
# docker rm linstor-controller

1.6. Initializing your cluster

We assume that the following steps are accomplished on all cluster nodes:

  1. The DRBD9 kernel module is installed and loaded

  2. drbd-utils are installed

  3. LVM tools are installed

  4. linstor-controller and/or linstor-satellite and their dependencies are installed

  5. The linstor-client is installed on the linstor-controller node

Start and enable the linstor-controller service on the host where it has been installed:

# systemctl enable --now linstor-controller

If you are sure the linstor-controller service gets enabled automatically on installation, you can use the following command instead:

# systemctl start linstor-controller

1.7. Using the LINSTOR client

Whenever you run the LINSTOR command line client, it needs to know where your linstor-controller runs. If you do not specify it, it will try to reach a locally running linstor-controller listening on IP 127.0.0.1 port 3376. Therefore we will use the linstor-client on the same host as the linstor-controller.

The linstor-satellite requires ports 3366 and 3367. The linstor-controller requires ports 3376 and 3377. Make sure you have these ports allowed on your firewall.
# linstor node list

should give you an empty list and not an error message.

You can use the linstor command on any other machine, but then you need to tell the client how to find the linstor-controller. As shown, this can be specified as a command line option, an environment variable, or in a global file:

# linstor --controllers=alice node list
# LS_CONTROLLERS=alice linstor node list

Alternatively you can create the /etc/linstor/linstor-client.conf file and populate it like below.

[global]
controllers=alice

If you have multiple linstor-controllers configured, you can simply specify them all in a comma-separated list, as shown in the sketch below. The linstor-client will try them in the order listed.
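
For example, with three controllers the client configuration file could look like the following sketch (the hostnames bob and charlie are placeholders for your additional controller nodes):

[global]
controllers=alice,bob,charlie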

The linstor-client commands can also be used in a much faster and more convenient way by only writing the starting letters of the parameters, e.g.: linstor node list → linstor n l

1.8. Adding nodes to your cluster

The next step is to add nodes to your LINSTOR cluster. You need to provide:

  1. A node name which must match the output of uname -n

  2. The IP address of the node.

# linstor node create bravo 10.43.70.3

When you use linstor node list you will see that the new node is marked as offline. Now start and enable the linstor-satellite on that node so that the service comes up on reboot as well:

# systemctl enable --now linstor-satellite

You can also use systemctl start linstor-satellite if you are sure that the service is already enabled as default and comes up on reboot.

About 10 seconds later you will see the status in linstor node list become online. Of course, the satellite process may be started before the controller knows about the existence of the satellite node.

In case the node which hosts your controller should also contribute storage to the LINSTOR cluster, you have to add it as a node and start the linstor-satellite on it as well, for example as in the sketch below.
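
A minimal sketch of registering the controller host itself, assuming it is named alpha with the address 10.43.70.2 (the --node-type option lets you mark it as a Combined node):

# linstor node create alpha 10.43.70.2 --node-type Combined
# systemctl enable --now linstor-satellite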

1.9. Storage pools

StoragePools identify storage in the context of LINSTOR. To group storage pools from multiple nodes, simply use the same name on each node. For example, one valid approach is to give all SSDs one name and all HDDs another.

On each host contributing storage, you need to create either an LVM VG or a ZFS zPool. The VGs and zPools identified with one LINSTOR storage pool name may have different VG or zPool names on the hosts, but do yourself a favor and use the same VG or zPool name on all nodes.

# vgcreate vg_ssd /dev/nvme0n1 /dev/nvme1n1 [...]

These then need to be registered with LINSTOR:

# linstor storage-pool create lvm alpha pool_ssd vg_ssd
# linstor storage-pool create lvm bravo pool_ssd vg_ssd
The storage pool name and common metadata are referred to as a storage pool definition. The commands listed above create a storage pool definition implicitly. You can see that by using linstor storage-pool-definition list. Creating storage pool definitions explicitly is possible but not necessary.
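
If your nodes use ZFS instead of LVM, the analogous steps would look like the following sketch; the zpool name zpool_ssd and the device names are assumptions:

# zpool create zpool_ssd /dev/nvme0n1 /dev/nvme1n1
# linstor storage-pool create zfs alpha pool_ssd zpool_ssd
# linstor storage-pool create zfs bravo pool_ssd zpool_ssd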

To list your storage-pools you can use:

# linstor storage-pool list

or using the short version

# linstor sp l

In case anything goes wrong with a storage pool’s VG/zPool, e.g. because the VG has been renamed or has somehow become invalid, you can delete the storage pool in LINSTOR with the following command, provided that only resources with all their volumes in the so-called 'lost' storage pool are attached. This feature is available since LINSTOR v0.9.13.

# linstor storage-pool lost alpha pool_ssd

or using the short version

# linstor sp lo alpha pool_ssd

Should the deletion of the storage pool be prevented because attached resources or snapshots have some of their volumes in another, still functional storage pool, hints will be given in the 'status' column of the corresponding list command (e.g. linstor resource list). After manually deleting the LINSTOR objects in the lost storage pool, the lost command can be executed again to ensure a complete deletion of the storage pool and its remaining objects.

1.9.1. A storage pool per backend device

In clusters where you have only one kind of storage and the capability to hot-repair storage devices, you may choose a model where you create one storage pool per physical backing device. The advantage of this model is to confine failure domains to a single storage device.
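
A sketch of this model, assuming two NVMe devices on node alpha; the VG and storage pool names are illustrative:

# vgcreate vg_nvme0 /dev/nvme0n1
# vgcreate vg_nvme1 /dev/nvme1n1
# linstor storage-pool create lvm alpha pool_nvme0 vg_nvme0
# linstor storage-pool create lvm alpha pool_nvme1 vg_nvme1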

1.10. Resource groups

A resource group is a parent object of a resource definition where all property changes made on a resource group will be inherited by its resource definition children. The resource group also stores settings for automatic placement rules and can spawn resource definitions depending on the stored rules.

In simpler terms, resource groups are like templates that define characteristics of resources created from them. Changes to these pseudo templates will be applied to all resources that were created from the resource group, retroactively.

Using resource groups to define how you’d like your resources provisioned should be considered the de facto method for deploying volumes provisioned by LINSTOR. Chapters that follow which describe creating each resource from a resource-definition and volume-definition should only be used in special scenarios.
Even if you choose not to create and use resource-groups in your LINSTOR cluster, all resources created from resource-definitions and volume-definitions will exist in the 'DfltRscGrp' resource-group.

A simple pattern for deploying resources using resource groups would look like this:

# linstor resource-group create my_ssd_group --storage-pool pool_ssd --place-count 2
# linstor volume-group create my_ssd_group
# linstor resource-group spawn-resources my_ssd_group my_ssd_res 20G

The commands above would result in a resource named 'my_ssd_res' with a 20GB volume replicated twice being automatically provisioned from nodes that participate in the storage pool named 'pool_ssd'.

A more useful pattern could be to create a resource group with settings you’ve determined are optimal for your use case. Perhaps you have to run nightly online verifications of your volumes' consistency; in that case, you could create a resource group with the 'verify-alg' of your choice already set, so that resources spawned from the group are pre-configured with 'verify-alg':

# linstor resource-group create my_verify_group --storage-pool pool_ssd --place-count 2
# linstor resource-group drbd-options --verify-alg crc32c my_verify_group
# linstor volume-group create my_verify_group
# for i in {00..19}; do
    linstor resource-group spawn-resources my_verify_group res$i 10G
  done

The commands above result in twenty 10GiB resources being created each with the 'crc32c' 'verify-alg' pre-configured.

You can tune the settings of individual resources or volumes spawned from resource groups by setting options on the respective resource-definition or volume-definition. For example, if 'res11' from the example above is used by a very active database receiving lots of small random writes, you might want to increase the 'al-extents' for that specific resource:

# linstor resource-definition drbd-options --al-extents 6007 res11

If you configure a setting in a resource-definition that is already configured on the resource-group it was spawned from, the value set in the resource-definition will override the value set on the parent resource-group. For example, if the same 'res11' was required to use the slower but more secure 'sha256' hash algorithm in its verifications, setting the 'verify-alg' on the resource-definition for 'res11' would override the value set on the resource-group:

# linstor resource-definition drbd-options --verify-alg sha256 res11
A rule of thumb for the hierarchy in which settings are inherited is the value "closer" to the resource or volume wins: volume-definition settings take precedence over volume-group settings, and resource-definition settings take precedence over resource-group settings.

1.11. Cluster configuration

1.11.1. Available storage plugins

As of this writing, LINSTOR supports the following storage plugins (the sketch after this list shows how a storage pool of each type could be registered):

  • Thick LVM

  • Thin LVM with a single thin pool

  • Thick ZFS

  • Thin ZFS
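
The following sketch registers one storage pool of each type on a node named alpha; the backing VG, thin pool, and zpool names are assumptions:

# linstor storage-pool create lvm alpha pool_lvm vg_thick
# linstor storage-pool create lvmthin alpha pool_lvmthin drbdpool/thinpool
# linstor storage-pool create zfs alpha pool_zfs zpool_thick
# linstor storage-pool create zfsthin alpha pool_zfsthin zpool_thin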

1.12. Creating and deploying resources/volumes

In the following scenario we assume that the goal is to create a resource 'backups' with a size of '500 GB' that is replicated among three cluster nodes.

First, we create a new resource definition:

# linstor resource-definition create backups

Second, we create a new volume definition within that resource definition:

# linstor volume-definition create backups 500G

If you want to change the size of the volume-definition you can simply do that by:

# linstor volume-definition set-size backups 0 100G

The parameter 0 is the number of the volume in the resource backups. You have to provide this parameter because resources can have multiple volumes and they are identified by a so-called volume number. This number can be found by listing the volume-definitions.

The size of a volume-definition can only be decreased if it has no deployed resource. The size can, however, be increased even with a deployed resource.

So far we have only created objects in LINSTOR’s database, not a single LV was created on the storage nodes. Now you have the choice of delegating the task of placement to LINSTOR or doing it yourself.

1.12.1. Manual placement

With the resource create command you may assign a resource definition to named nodes explicitly.

# linstor resource create alpha backups --storage-pool pool_hdd
# linstor resource create bravo backups --storage-pool pool_hdd
# linstor resource create charlie backups --storage-pool pool_hdd

1.12.2. Autoplace

The value after --auto-place tells LINSTOR how many replicas you want to have. The --storage-pool option should be obvious.

# linstor resource create backups --auto-place 3 --storage-pool pool_hdd

Maybe not so obvious is that you may omit the --storage-pool option; LINSTOR will then select a storage pool on its own. The selection follows these rules:

  • Ignore all nodes and storage pools the current user has no access to

  • Ignore all diskless storage pools

  • Ignore all storage pools not having enough free space

From the remaining storage pools, LINSTOR currently chooses the one with the most available free space.

If everything went right, the DRBD resource has now been created by LINSTOR. This can be checked by looking for the DRBD block device with the lsblk command; it should show up as drbd0000 or similar.

Now we should be able to mount the block device of our resource and start using LINSTOR.
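
A minimal sketch of putting the device to use, assuming lsblk showed the new DRBD device as /dev/drbd1000 and /mnt/backups is an arbitrary mount point:

# mkfs.ext4 /dev/drbd1000
# mkdir -p /mnt/backups
# mount /dev/drbd1000 /mnt/backups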

2. Further LINSTOR tasks

2.1. DRBD clients

By using the --diskless option instead of --storage-pool you can have a permanently diskless DRBD device on a node. This means that the resource will appear as a block device and can be mounted to the filesystem without an existing storage device. The data of the resource is accessed over the network from other nodes with the same resource.

# linstor resource create delta backups --diskless

2.2. LINSTOR - DRBD consistency group/multiple volumes

The so-called consistency group is a feature of DRBD. It is mentioned in this user’s guide because one of LINSTOR’s main functions is to manage storage clusters with DRBD. Multiple volumes in one resource form a consistency group.

This means that changes on different volumes of one resource are replicated in the same chronological order on the other satellites.

Therefore you don’t have to worry about the timing if you have interdependent data on different volumes in a resource.

To deploy more than one volume in a LINSTOR resource, create multiple volume-definitions for the same resource name.

# linstor volume-definition create backups 500G
# linstor volume-definition create backups 100G

2.3. Volumes of one resource to different Storage-Pools

This can be achieved by setting the StorPoolName property to the volume definitions before the resource is deployed to the nodes:

# linstor resource-definition create backups
# linstor volume-definition create backups 500G
# linstor volume-definition create backups 100G
# linstor volume-definition set-property backups 0 StorPoolName pool_hdd
# linstor volume-definition set-property backups 1 StorPoolName pool_ssd
# linstor resource create alpha backups
# linstor resource create bravo backups
# linstor resource create charlie backups
Since the volume-definition create command is used without the --vlmnr option, LINSTOR assigns the volume numbers starting at 0. In the following two lines, the 0 and 1 refer to these automatically assigned volume numbers.

Here the 'resource create' commands do not need a --storage-pool option. In this case LINSTOR uses a 'fallback' storage pool. To find that storage pool, LINSTOR queries the properties of the following objects in the following order:

  • Volume definition

  • Resource

  • Resource definition

  • Node

If none of those objects contain a StorPoolName property, the controller falls back to a hard-coded 'DfltStorPool' string as a storage pool.

This also means that, if you forgot to define a storage pool prior to deploying a resource, you will get an error message saying that LINSTOR could not find the storage pool named 'DfltStorPool'.
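
The fallback can also be set higher up in that lookup list. For example, this sketch sets the StorPoolName property on the resource definition, so that all of its volumes without an explicit setting use pool_hdd:

# linstor resource-definition set-property backups StorPoolName pool_hdd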

2.4. LINSTOR without DRBD

LINSTOR can be used without DRBD as well. Without DRBD, LINSTOR is able to provision volumes from LVM and ZFS backed storage pools, and create those volumes on individual nodes in your LINSTOR cluster.

Currently LINSTOR supports the creation of LVM and ZFS volumes with the option of layering some combinations of LUKS, DRBD, and/or NVMe-oF/NVMe-TCP on top of those volumes.

For example, assume we have a Thin LVM backed storage pool defined in our LINSTOR cluster named thin-lvm:

# linstor --no-utf8 storage-pool list
+--------------------------------------------------------------+
| StoragePool | Node      | Driver   | PoolName          | ... |
|--------------------------------------------------------------|
| thin-lvm    | linstor-a | LVM_THIN | drbdpool/thinpool | ... |
| thin-lvm    | linstor-b | LVM_THIN | drbdpool/thinpool | ... |
| thin-lvm    | linstor-c | LVM_THIN | drbdpool/thinpool | ... |
| thin-lvm    | linstor-d | LVM_THIN | drbdpool/thinpool | ... |
+--------------------------------------------------------------+

We could use LINSTOR to create a Thin LVM on linstor-d that’s 100GiB in size using the following commands:

# linstor resource-definition create rsc-1
# linstor volume-definition create rsc-1 100GiB
# linstor resource create --layer-list storage \
          --storage-pool thin-lvm linstor-d rsc-1

You should then see that you have a new Thin LVM on linstor-d. You can extract the device path from LINSTOR by listing your LINSTOR resources with the --machine-readable flag set:

# linstor --machine-readable resource list | grep device_path
            "device_path": "/dev/drbdpool/rsc-1_00000",

If you wanted to layer DRBD on top of this volume, which is the default --layer-list option in LINSTOR for ZFS or LVM backed volumes, you would use the following resource creation pattern instead:

# linstor resource-definition create rsc-1
# linstor volume-definition create rsc-1 100GiB
# linstor resource create --layer-list drbd,storage \
          --storage-pool thin-lvm linstor-d rsc-1

You would then see that you have a new Thin LVM backing a DRBD volume on linstor-d:

# linstor --machine-readable resource list | grep -e device_path -e backing_disk
            "device_path": "/dev/drbd1000",
            "backing_disk": "/dev/drbdpool/rsc-1_00000",

The complete list of currently supported --layer-list combinations is as follows:

  • drbd,luks,storage

  • drbd,storage

  • luks,storage

  • nvme,storage

  • storage

For information about the prerequisites for the luks layer, refer to the Encrypted Volumes section of this User’s Guide.

2.4.1. NVMe-oF/NVMe-TCP LINSTOR Layer

NVMe-oF/NVMe-TCP allows LINSTOR to connect diskless resources, over NVMe fabrics, to a node with the same resource where the data is stored. This has the advantage that resources can be mounted without using local storage, by accessing the data over the network. LINSTOR is not using DRBD in this case; therefore NVMe resources provisioned by LINSTOR are not replicated, and the data is stored on one node.

NVMe-oF only works on RDMA-capable networks, while NVMe-TCP works on every network that can carry IP traffic. If you want to know more about NVMe-oF/NVMe-TCP, visit https://www.linbit.com/en/nvme-linstor-swordfish/.

To use NVMe-oF/NVMe-TCP with LINSTOR, the package nvme-cli needs to be installed on every node which acts as a satellite and will use NVMe-oF/NVMe-TCP for a resource:

If you are not using Ubuntu, use the suitable command for installing packages on your OS (SLES: zypper; CentOS: yum).
# apt install nvme-cli

To make a resource which uses NVMe-oF/NVMe-TCP, an additional parameter has to be given as you create the resource-definition:

# linstor resource-definition create nvmedata -l nvme,storage
By default the -l (layer-stack) parameter is set to drbd,storage when DRBD is used. If you want to create LINSTOR resources with neither NVMe nor DRBD, you have to set the -l parameter to storage only.

Create the volume-definition for our resource:

# linstor volume-definition create nvmedata 500G

Before you create the resource on your nodes you have to know where the data will be stored locally and which node accesses it over the network.

First we create the resource on the node where our data will be stored:

# linstor resource create alpha nvmedata --storage-pool pool_ssd

On the nodes where the resource-data will be accessed over the network, the resource has to be defined as diskless:

# linstor resource create beta nvmedata -d

The -d parameter creates the resource on this node as diskless.

Now you can mount the resource nvmedata on one of your nodes.

If your nodes have more than one NIC, you should force the route between them for NVMe-oF/NVMe-TCP; otherwise multiple NICs could cause trouble.

2.5. Managing Network Interface Cards

LINSTOR can deal with multiple network interface cards (NICs) in a machine; they are called netif in LINSTOR speak.

When a satellite node is created, a first netif gets created implicitly with the name default. Using the --interface-name option of the node create command, you can give it a different name.

Additional NICs are created like this:

# linstor node interface create alpha 100G_nic 192.168.43.221
# linstor node interface create alpha 10G_nic 192.168.43.231

NICs are identified by IP address only; the name is arbitrary and is not related to the interface name used by Linux. NICs can be assigned to storage pools so that, whenever a resource is created in such a storage pool, the DRBD traffic will be routed through the specified NIC.

# linstor storage-pool set-property alpha pool_hdd PrefNic 10G_nic
# linstor storage-pool set-property alpha pool_ssd PrefNic 100G_nic

FIXME describe how to route the controller <-> client communication through a specific netif.

2.6. Encrypted volumes

LINSTOR can handle transparent encryption of DRBD volumes. dm-crypt is used to encrypt the storage provided by the storage device.

Basic steps to use encryption:

  1. Disable user security on the controller (this will be obsolete once authentication works)

  2. Create a master passphrase

  3. Add luks to the layer-list

  4. Don’t forget to re-enter the master passphrase after a controller restart.

2.6.1. Disable user security

Disabling the user security on the LINSTOR controller is a one-time operation and is persisted afterwards.

  1. Stop the running linstor-controller via systemd: systemctl stop linstor-controller

  2. Start a linstor-controller in debug mode: /usr/share/linstor-server/bin/Controller -c /etc/linstor -d

  3. In the debug console enter: setSecLvl secLvl(NO_SECURITY)

  4. Stop linstor-controller with the debug shutdown command: shutdown

  5. Start the controller again with systemd: systemctl start linstor-controller

2.6.2. Encrypt commands

Below are details about the commands.

Before LINSTOR can encrypt any volume a master passphrase needs to be created. This can be done with the linstor-client.

# linstor encryption create-passphrase

crypt-create-passphrase will wait for the user to input the initial master passphrase (as all other crypt commands will with no arguments).

If you ever want to change the master passphrase this can be done with:

# linstor encryption modify-passphrase

The luks layer can be added when creating the resource-definition or the resource itself. The former method is recommended, since it will automatically be applied to all resources created from that resource-definition.

# linstor resource-definition create crypt_rsc --layer-list luks,storage
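
Continuing that example, a sketch of deploying an encrypted volume from the resource-definition above; the size, replica count and the pool_ssd storage pool are assumptions:

# linstor volume-definition create crypt_rsc 500G
# linstor resource create crypt_rsc --auto-place 2 --storage-pool pool_ssd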

To enter the master passphrase (after controller restart) use the following command:

# linstor encryption enter-passphrase
Whenever the linstor-controller is restarted, the user has to send the master passphrase to the controller, otherwise LINSTOR is unable to reopen or create encrypted volumes.

2.7. Checking the state of your cluster

LINSTOR provides various commands to check the state of your cluster. These commands start with a 'list-' prefix and provide various filtering and sorting options. The '--groupby' option can be used to group and sort the output in multiple dimensions.

# linstor node list
# linstor storage-pool list --groupby Size

2.8. Managing snapshots

Snapshots are supported with thin LVM and ZFS storage pools.

2.8.1. Creating a snapshot

Assuming a resource definition named 'resource1' which has been placed on some nodes, a snapshot can be created as follows:

# linstor snapshot create resource1 snap1

This will create snapshots on all nodes where the resource is present. LINSTOR will ensure that consistent snapshots are taken even when the resource is in active use.

2.8.2. Restoring a snapshot

The following steps restore a snapshot to a new resource. This is possible even when the original resource has been removed from the nodes where the snapshots were taken.

First define the new resource with volumes matching those from the snapshot:

# linstor resource-definition create resource2
# linstor snapshot volume-definition restore --from-resource resource1 --from-snapshot snap1 --to-resource resource2

At this point, additional configuration can be applied if necessary. Then, when ready, create resources based on the snapshots:

# linstor snapshot resource restore --from-resource resource1 --from-snapshot snap1 --to-resource resource2

This will place the new resource on all nodes where the snapshot is present. The nodes on which to place the resource can also be selected explicitly; see the help (linstor snapshot resource restore -h).

2.8.3. Rolling back to a snapshot

LINSTOR can roll a resource back to a snapshot state. The resource must not be in use. That is, it may not be mounted on any nodes. If the resource is in use, consider whether you can achieve your goal by restoring the snapshot instead.

Rollback is performed as follows:

# linstor snapshot rollback resource1 snap1

A resource can only be rolled back to the most recent snapshot. To roll back to an older snapshot, first delete the intermediate snapshots.

2.8.4. Removing a snapshot

An existing snapshot can be removed as follows:

# linstor snapshot delete resource1 snap1

2.9. Setting options for resources

DRBD options are set using LINSTOR commands. Configuration in files such as /etc/drbd.d/global_common.conf that are not managed by LINSTOR will be ignored. The following commands show the usage and available options:

# linstor controller drbd-options -h
# linstor resource-definition drbd-options -h
# linstor volume-definition drbd-options -h
# linstor resource drbd-peer-options -h

For instance, it is easy to set the DRBD protocol for a resource named backups:

# linstor resource-definition drbd-options --protocol C backups

2.10. Adding and removing disks

LINSTOR can convert resources between diskless and having a disk. This is achieved with the resource toggle-disk command, which has syntax similar to resource create.

For instance, add a disk to the diskless resource backups on 'alpha':

# linstor resource toggle-disk alpha backups --storage-pool pool_ssd

Remove this disk again:

# linstor resource toggle-disk alpha backups --diskless

2.10.1. Migrating disks

In order to move a resource between nodes without reducing redundancy at any point, LINSTOR’s disk migrate feature can be used. First create a diskless resource on the target node, and then add a disk using the --migrate-from option. This will wait until the data has been synced to the new disk and then remove the source disk.

For example, to migrate a resource backups from 'alpha' to 'bravo':

# linstor resource create bravo backups --diskless
# linstor resource toggle-disk bravo backups --storage-pool pool_ssd --migrate-from alpha

2.11. DRBD Proxy with LINSTOR

LINSTOR expects DRBD Proxy to be running on the nodes which are involved in the relevant connections. It does not currently support connections via DRBD Proxy on a separate node.

Suppose our cluster consists of nodes 'alpha' and 'bravo' in a local network and 'charlie' at a remote site, with a resource definition named backups deployed to each of the nodes. Then DRBD Proxy can be enabled for the connections to 'charlie' as follows:

# linstor drbd-proxy enable alpha charlie backups
# linstor drbd-proxy enable bravo charlie backups

The DRBD Proxy configuration can be tailored with commands such as:

# linstor drbd-proxy options backups --memlimit 100000000
# linstor drbd-proxy compression zlib backups --level 9

LINSTOR does not automatically optimize the DRBD configuration for long-distance replication, so you will probably want to set some configuration options such as the protocol:

# linstor resource-connection drbd-options alpha charlie backups --protocol A
# linstor resource-connection drbd-options bravo charlie backups --protocol A

Please contact LINBIT for assistance optimizing your configuration.

2.12. External database

LINSTOR can work with an external database provider like PostgreSQL or MariaDB, and since version 1.1.0 the etcd key-value store is also supported.

Using an external database requires a few additional configuration steps. For an external SQL database you need to download the JDBC driver and copy it to LINSTOR’s lib folder; etcd only needs to be configured correctly in the LINSTOR configuration.

2.12.1. Postgresql

The PostgreSQL JDBC driver can be downloaded here:

And afterwards copied to: /usr/share/linstor-server/lib/

A sample Postgresql linstor.toml looks like this:

[db]
user = "linstor"
password = "linstor"
connection_url = "jdbc:postgresql://localhost/linstor"

2.12.2. MariaDB/Mysql

The MariaDB JDBC driver can be downloaded here:

And afterwards copied to: /usr/share/linstor-server/lib/

A sample MariaDB linstor.toml looks like this:

[db]
user = "linstor"
password = "linstor"
connection_url = "jdbc:mariadb://localhost/LINSTOR?createDatabaseIfNotExist=true"
The LINSTOR schema/database is created as LINSTOR, so make sure the MariaDB connection string refers to the LINSTOR schema, as in the example above.

2.12.3. ETCD

ETCD is a distributed key-value store that makes it easy to keep your LINSTOR database distributed in an HA setup. The ETCD driver is already included in the linstor-controller package and only needs to be configured in the linstor.toml.

More information on how to install and configure ETCD can be found here: ETCD docs

And here is a sample [db] section from the linstor.toml:

[db]
## only set user/password if you want to use authentication, only since LINSTOR 1.2.1
# user = "linstor"
# password = "linstor"

## for etcd
## do not set user field if no authentication required
connection_url = "etcd://etcdhost1:2379,etcdhost2:2379,etcdhost3:2379"

## if you want to use TLS, only since LINSTOR 1.2.1
# ca_certificate = "ca.pem"
# client_certificate = "client.pem"

## if you want to use client TLS authentication too, only since LINSTOR 1.2.1
# client_key_pcks8_pem = "client-key.pcks8"
## set client_key_password if private key has a password
# client_key_password = "mysecret"

2.13. LINSTOR REST-API

To make LINSTOR’s administrative tasks more accessible and also available to web frontends, a REST-API has been created. The REST-API is embedded in the linstor-controller and, since LINSTOR 0.9.13, is configured via the linstor.toml configuration file.

[http]
  enabled = true
  port = 3370
  listen_addr = "127.0.0.1"  # to disable remote access

If you want to use the REST-API the current documentation can be found on the following link: https://app.swaggerhub.com/apis-docs/Linstor/Linstor/
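
As a quick check that the REST-API is reachable, a plain HTTP request against the nodes endpoint should return a JSON list of your nodes (assuming the controller runs locally on the default port):

# curl -s http://localhost:3370/v1/nodes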

2.13.1. LINSTOR REST-API HTTPS

The HTTP REST-API can also run secured by HTTPS, which is highly recommended if you use any features that require authorization. To do so you have to create a Java keystore file with a valid certificate that will be used to encrypt all HTTPS traffic.

Here is a simple example of how you can create a self-signed certificate with the keytool that is included in the Java runtime:

keytool -keyalg rsa -keysize 2048 -genkey -keystore ./keystore_linstor.jks\
 -alias linstor_controller\
 -dname "CN=localhost, OU=SecureUnit, O=ExampleOrg, L=Vienna, ST=Austria, C=AT"

keytool will ask for a password to secure the generated keystore file, which is needed for the LINSTOR controller configuration. In your linstor.toml file you have to add the following section:

[https]
  keystore = "/path/to/keystore_linstor.jks"
  keystore_password = "linstor"

Now (re)start the linstor-controller and the HTTPS REST-API should be available on port 3371.

More information on how to import other certificates can be found here: https://docs.oracle.com/javase/8/docs/technotes/tools/unix/keytool.html

When HTTPS is enabled, all requests to the HTTP /v1/ REST-API will be redirected to HTTPS.
LINSTOR REST-API HTTPS restricted client access

Client access can be restricted by using an SSL truststore on the controller. Basically, you create a certificate for your client and add it to your truststore; the client then uses this certificate for authentication.

First create a client certificate:

keytool -keyalg rsa -keysize 2048 -genkey -keystore client.jks\
 -storepass linstor -keypass linstor\
 -alias client1\
 -dname "CN=Client Cert, OU=client, O=Example, L=Vienna, ST=Austria, C=AT"

Then we import this certificate to our controller truststore:

keytool -importkeystore\
 -srcstorepass linstor -deststorepass linstor -keypass linstor\
 -srckeystore client.jks -destkeystore trustore_client.jks

And enable the truststore in the linstor.toml configuration file:

[https]
  keystore = "/path/to/keystore_linstor.jks"
  keystore_password = "linstor"
  truststore = "/path/to/trustore_client.jks"
  truststore_password = "linstor"

Now restart the Controller and it will no longer be possible to access the controller API without a correct certificate.

The LINSTOR client needs the certificate in PEM format, so before we can use it we have to convert the java keystore certificate to the PEM format.

# Convert to pkcs12
keytool -importkeystore -srckeystore client.jks -destkeystore client.p12\
 -storepass linstor -keypass linstor\
 -srcalias client1 -srcstoretype jks -deststoretype pkcs12

# use openssl to convert to PEM
openssl pkcs12 -in client.p12 -out client_with_pass.pem

To avoid entering the PEM file password all the time it might be convenient to remove the password.

openssl rsa -in client_with_pass.pem -out client1.pem
openssl x509 -in client_with_pass.pem >> client1.pem

Now this PEM file can easily be used in the client:

linstor --certfile client1.pem node list

The --certfile parameter can also be added to the client configuration file; see Using the LINSTOR client for more details.

2.14. Logging

LINSTOR uses SLF4J with Logback as the binding. This gives LINSTOR the possibility to distinguish between the log levels ERROR, WARN, INFO, DEBUG and TRACE (in order of increasing verbosity). In the current LINSTOR version (1.1.2) the user has the following four methods to control the logging level, ordered by priority (first has highest priority):

  1. TRACE mode can be enabled or disabled using the debug console:

    Command ==> SetTrcMode MODE(enabled)
    SetTrcMode           Set TRACE level logging mode
    New TRACE level logging mode: ENABLED
  2. When starting the controller or satellite a command line argument can be passed:

    java ... com.linbit.linstor.core.Controller ... --log-level INFO
    java ... com.linbit.linstor.core.Satellite  ... --log-level INFO
  3. The recommended place is the logging section in the /etc/linstor/linstor.toml file:

    [logging]
       level="INFO"
  4. As Linstor is using Logback as an implementation, /usr/share/linstor-server/lib/logback.xml can also be used. Currently only this approach supports different log levels for different components, like shown in the example below:

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration scan="false" scanPeriod="60 seconds">
    <!--
     Values for scanPeriod can be specified in units of milliseconds, seconds, minutes or hours
     https://logback.qos.ch/manual/configuration.html
    -->
     <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
       <!-- encoders are assigned the type
            ch.qos.logback.classic.encoder.PatternLayoutEncoder by default -->
       <encoder>
         <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger - %msg%n</pattern>
       </encoder>
     </appender>
     <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
       <file>${log.directory}/linstor-${log.module}.log</file>
       <append>true</append>
       <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
         <Pattern>%d{yyyy_MM_dd HH:mm:ss.SSS} [%thread] %-5level %logger - %msg%n</Pattern>
       </encoder>
       <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
         <FileNamePattern>logs/linstor-${log.module}.%i.log.zip</FileNamePattern>
         <MinIndex>1</MinIndex>
         <MaxIndex>10</MaxIndex>
       </rollingPolicy>
       <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
         <MaxFileSize>2MB</MaxFileSize>
       </triggeringPolicy>
     </appender>
     <logger name="LINSTOR/Controller" level="INFO" additivity="false">
       <appender-ref ref="STDOUT" />
       <!-- <appender-ref ref="FILE" /> -->
     </logger>
     <logger name="LINSTOR/Satellite" level="INFO" additivity="false">
       <appender-ref ref="STDOUT" />
       <!-- <appender-ref ref="FILE" /> -->
     </logger>
     <root level="WARN">
       <appender-ref ref="STDOUT" />
       <!-- <appender-ref ref="FILE" /> -->
     </root>
    </configuration>

See the Logback Manual to find more details about logback.xml.

When none of the configuration methods above is used, LINSTOR will default to the INFO log level.

2.15. Secure Satellite connections

It is possible to have LINSTOR use SSL-secured TCP connections between the controller and the satellites. Without going into further detail on how Java’s SSL engine works, we will give you command line snippets, using the keytool from Java’s runtime environment, that show how to configure a 3 node setup using secure connections. The node setup looks like this:

Node alpha is just the controller. Nodes bravo and charlie are just satellites.

Here are the commands to generate such a keystore setup; the values should of course be edited for your environment.

# create directories to hold the key files
mkdir -p /tmp/linstor-ssl
cd /tmp/linstor-ssl
mkdir alpha bravo charlie


# create private keys for all nodes
keytool -keyalg rsa -keysize 2048 -genkey -keystore alpha/keystore.jks\
 -storepass linstor -keypass linstor\
 -alias alpha\
 -dname "CN=Max Mustermann, OU=alpha, O=Example, L=Vienna, ST=Austria, C=AT"

keytool -keyalg rsa -keysize 2048 -genkey -keystore bravo/keystore.jks\
 -storepass linstor -keypass linstor\
 -alias bravo\
 -dname "CN=Max Mustermann, OU=bravo, O=Example, L=Vienna, ST=Austria, C=AT"

keytool -keyalg rsa -keysize 2048 -genkey -keystore charlie/keystore.jks\
 -storepass linstor -keypass linstor\
 -alias charlie\
 -dname "CN=Max Mustermann, OU=charlie, O=Example, L=Vienna, ST=Austria, C=AT"

# import truststore certificates for alpha (needs all satellite certificates)
keytool -importkeystore\
 -srcstorepass linstor -deststorepass linstor -keypass linstor\
 -srckeystore bravo/keystore.jks -destkeystore alpha/certificates.jks

keytool -importkeystore\
 -srcstorepass linstor -deststorepass linstor -keypass linstor\
 -srckeystore charlie/keystore.jks -destkeystore alpha/certificates.jks

# import controller certificate into satellite truststores
keytool -importkeystore\
 -srcstorepass linstor -deststorepass linstor -keypass linstor\
 -srckeystore alpha/keystore.jks -destkeystore bravo/certificates.jks

keytool -importkeystore\
 -srcstorepass linstor -deststorepass linstor -keypass linstor\
 -srckeystore alpha/keystore.jks -destkeystore charlie/certificates.jks

# now copy the keystore files to their host destinations
ssh root@alpha mkdir /etc/linstor/ssl
scp alpha/* root@alpha:/etc/linstor/ssl/
ssh root@bravo mkdir /etc/linstor/ssl
scp bravo/* root@bravo:/etc/linstor/ssl/
ssh root@charlie mkdir /etc/linstor/ssl
scp charlie/* root@charlie:/etc/linstor/ssl/

# generate the satellite ssl config entry
echo '[netcom]
  type="ssl"
  port=3367
  server_certificate="ssl/keystore.jks"
  trusted_certificates="ssl/certificates.jks"
  key_password="linstor"
  keystore_password="linstor"
  truststore_password="linstor"
  ssl_protocol="TLSv1.2"
' | ssh root@bravo "cat > /etc/linstor/linstor_satellite.toml"

echo '[netcom]
  type="ssl"
  port=3367
  server_certificate="ssl/keystore.jks"
  trusted_certificates="ssl/certificates.jks"
  key_password="linstor"
  keystore_password="linstor"
  truststore_password="linstor"
  ssl_protocol="TLSv1.2"
' | ssh root@charlie "cat > /etc/linstor/linstor_satellite.toml"

Now just start the controller and satellites and add the nodes with --communication-type SSL, for example as in the sketch below.
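
For example, adding the two satellites from the setup above could look like the following sketch; the IP addresses are placeholders and port 3367 matches the netcom sections written above:

# linstor node create bravo 192.168.1.2 --communication-type SSL --port 3367
# linstor node create charlie 192.168.1.3 --communication-type SSL --port 3367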

2.16. AutoQuorum Policies

LINSTOR automatically configures quorum policies on resources when quorum is achievable. This means, whenever you have at least two diskful and one or more diskless resource assignments, or three or more diskful resource assignments, LINSTOR will enable quorum policies for your resources automatically.

Inversely, LINSTOR will automatically disable quorum policies whenever there are fewer than the minimum number of resource assignments required to achieve quorum.

This is controlled via the DrbdOptions/auto-quorum property, which can be applied to the linstor-controller, resource-group, and resource-definition. Accepted values for the DrbdOptions/auto-quorum property are disabled, suspend-io, and io-error.

Setting the DrbdOptions/auto-quorum property to disabled will allow you to manually, or more granularly, control the quorum policies of your resources should you so desire.

The default policies for DrbdOptions/auto-quorum are quorum majority, and on-no-quorum io-error. For more information on DRBD’s quorum features and their behavior, please refer to the quorum section of the DRBD user’s guide.
The DrbdOptions/auto-quorum policies will override any manually configured properties if DrbdOptions/auto-quorum is not disabled.

For example, to manually set the quorum policies of a resource-group named my_ssd_group, you would use the following commands:

# linstor resource-group set-property my_ssd_group DrbdOptions/auto-quorum disabled
# linstor resource-group set-property my_ssd_group DrbdOptions/Resource/quorum majority
# linstor resource-group set-property my_ssd_group DrbdOptions/Resource/on-no-quorum suspend-io

You may wish to disable DRBD’s quorum features completely. To do that, you would need to first disable DrbdOptions/auto-quorum on the appropriate LINSTOR object, and then set the DRBD quorum features accordingly. For example, use the following commands to disable quorum entirely on the my_ssd_group resource-group:

# linstor resource-group set-property my_ssd_group DrbdOptions/auto-quorum disabled
# linstor resource-group set-property my_ssd_group DrbdOptions/Resource/quorum off
# linstor resource-group set-property my_ssd_group DrbdOptions/Resource/on-no-quorum
Setting DrbdOptions/Resource/on-no-quorum to an empty value in the commands above deletes the property from the object entirely.

2.17. Getting help

2.17.1. From the command line

A quick way to list available commands on the command line is to type linstor.

Further information on subcommands (e.g., list-nodes) can be retrieved in two ways:

# linstor node list -h
# linstor help node list

Using the 'help' subcommand is especially helpful when LINSTOR is executed in interactive mode (linstor interactive).

One of the most helpful features of LINSTOR is its rich tab-completion, which can be used to complete basically every object LINSTOR knows about (e.g., node names, IP addresses, resource names, …​). In the following examples, we show some possible completions, and their results:

# linstor node create alpha 1<tab> # completes the IP address if hostname can be resolved
# linstor resource create b<tab> c<tab> # linstor assign-resource backups charlie

If tab-completion does not work out of the box, please try to source the appropriate file:

# source /etc/bash_completion.d/linstor # or
# source /usr/share/bash_completion/completions/linstor

For zsh shell users, linstor-client can generate a zsh completion file, which has basic support for command and argument completion.

# linstor gen-zsh-completer > /usr/share/zsh/functions/Completion/Linux/_linstor

2.17.2. From the community

For help from the community please subscribe to our mailing list located here: https://lists.linbit.com/listinfo/drbd-user

2.17.3. GitHub

To file a bug or feature request, please check out our GitHub page https://github.com/linbit

2.17.4. Paid support and development

Alternatively, if you wish to purchase remote installation services, 24/7 support, access to certified repositories, or feature development please contact us: +1-877-454-6248 (1-877-4LINBIT) , International: +43-1-8178292-0 | sales@linbit.com

3. LINSTOR Volumes in Kubernetes

This chapter describes LINSTOR volumes in Kubernetes as managed by the LINSTOR CSI plugin.

3.1. Kubernetes Overview

Kubernetes is a container orchestrator (CO) made by Google. Kubernetes defines the behavior of containers and related services via declarative specifications. In this guide, we’ll focus on using kubectl to manipulate .yaml files that define the specifications of Kubernetes objects.

3.2. LINSTOR CSI Plugin Deployment

Instructions for deploying the CSI plugin can be found on the project’s GitHub page. Deployment will result in a linstor-csi-controller StatefulSet and a linstor-csi-node DaemonSet running in the kube-system namespace.

NAME                       READY   STATUS    RESTARTS   AGE     IP              NODE
linstor-csi-controller-0   5/5     Running   0          3h10m   191.168.1.200   kubelet-a
linstor-csi-node-4fcnn     2/2     Running   0          3h10m   192.168.1.202   kubelet-c
linstor-csi-node-f2dr7     2/2     Running   0          3h10m   192.168.1.203   kubelet-d
linstor-csi-node-j66bc     2/2     Running   0          3h10m   192.168.1.201   kubelet-b
linstor-csi-node-qb7fw     2/2     Running   0          3h10m   192.168.1.200   kubelet-a
linstor-csi-node-zr75z     2/2     Running   0          3h10m   192.168.1.204   kubelet-e

3.3. Basic Configuration and Deployment

Once all linstor-csi Pods are up and running, we can provision volumes using the usual Kubernetes workflows.

Configuring the behavior and properties of LINSTOR volumes deployed via Kubernetes is accomplished via the use of StorageClasses. Below is the simplest practical StorageClass that can be used to deploy volumes:

Listing 1. linstor-basic-sc.yaml
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  # The name used to identify this StorageClass.
  name: linstor-basic-storage-class
  # The name used to match this StorageClass with a provisioner.
  # linstor.csi.linbit.com is the name that the LINSTOR CSI plugin uses to identify itself
provisioner: linstor.csi.linbit.com
parameters:
  # LINSTOR will provision volumes from the drbdpool storage pool configured
  # On the satellite nodes in the LINSTOR cluster specified in the plugin's deployment
  storagePool: "drbdpool"

We can create the StorageClass with the following command:

kubectl create -f linstor-basic-sc.yaml

Now that our StorageClass is created, we can now create a PersistentVolumeClaim which can be used to provision volumes known both to Kubernetes and LINSTOR:

Listing 2. my-first-linstor-volume-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-first-linstor-volume
  annotations:
    # This line matches the PersistentVolumeClaim with our StorageClass
    # and therefore our provisioner.
    volume.beta.kubernetes.io/storage-class: linstor-basic-storage-class
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi

We can create the PersistentVolumeClaim with the following command:

kubectl create -f my-first-linstor-volume-pvc.yaml

This will create a PersistentVolumeClaim known to Kubernetes, which will have a PersistentVolume bound to it. Additionally, LINSTOR will now create this volume according to the configuration defined in the linstor-basic-storage-class StorageClass. The LINSTOR volume’s name will be a UUID prefixed with csi-. This volume can be observed with the usual linstor resource list. Once that volume is created, we can attach it to a Pod. The following Pod spec will spawn a Fedora container with our volume attached that busy waits so it is not unscheduled before we can interact with it:

Listing 3. my-first-linstor-volume-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: fedora
  namespace: default
spec:
  containers:
  - name: fedora
    image: fedora
    command: [/bin/bash]
    args: ["-c", "while true; do sleep 10; done"]
    volumeMounts:
    - name: my-first-linstor-volume
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: my-first-linstor-volume
    persistentVolumeClaim:
      claimName: "my-first-linstor-volume"

We can create the Pod with the following command:

kubectl create -f my-first-linstor-volume-pod.yaml

Running kubectl describe pod fedora can be used to confirm that Pod scheduling and volume attachment succeeded.
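You can also verify from inside the container that the LINSTOR-backed volume is mounted and writable, for example by writing and reading back a test file in the Pod created above:

kubectl exec -it fedora -- df -h /data
kubectl exec -it fedora -- sh -c 'echo hello > /data/hello.txt && cat /data/hello.txt'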

To remove a volume, please ensure that no pod is using it and then delete the PersistentVolumeClaim via kubectl. For example, to remove the volume that we just made, run the following two commands, noting that the Pod must be unscheduled before the PersistentVolumeClaim will be removed:

kubectl delete pod fedora # unschedule the pod.

kubectl get pod -w # wait for pod to be unscheduled

kubectl delete pvc my-first-linstor-volume # remove the PersistentVolumeClaim, the PersistentVolume, and the LINSTOR Volume.

3.4. Snapshots

Creating snapshots and creating new volumes from snapshots is done via the use of VolumeSnapshots, VolumeSnapshotClasses, and PVCs. First, you’ll need to create a VolumeSnapshotClass:

Listing 4. my-first-linstor-snapshot-class.yaml
kind: VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1alpha1
metadata:
  name: my-first-linstor-snapshot-class
  namespace: kube-system
snapshotter: io.drbd.linstor-csi

Create the VolumeSnapshotClass with kubectl:

kubectl create -f my-first-linstor-snapshot-class.yaml

Now we will create a volume snapshot for the volume that we created above. This is done with a VolumeSnapshot:

Listing 5. my-first-linstor-snapshot.yaml
apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshot
metadata:
  name: my-first-linstor-snapshot
spec:
  snapshotClassName: my-first-linstor-snapshot-class
  source:
    name: my-first-linstor-volume
    kind: PersistentVolumeClaim

Create the VolumeSnapshot with kubectl:

kubectl create -f my-first-linstor-snapshot.yaml

Finally, we’ll create a new volume from the snapshot with a PVC.

Listing 6. my-first-linstor-volume-from-snapshot.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-first-linstor-volume-from-snapshot
spec:
  storageClassName: linstor-basic-storage-class
  dataSource:
    name: my-first-linstor-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi

Create the PVC with kubectl:

kubectl create -f my-first-linstor-volume-from-snapshot.yaml
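As before, you can verify that the claim is bound and that a corresponding LINSTOR resource has been created:

kubectl get pvc my-first-linstor-volume-from-snapshot
linstor resource list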

3.5. Volume Accessibility

LINSTOR volumes are typically accessible both locally and over the network.

By default, the CSI plugin will attach volumes directly if the Pod happens to be scheduled on a kubelet where its underlying storage is present. However, Pod scheduling does not currently take volume locality into account. The replicasOnSame parameter can be used to restrict where the underlying storage may be provisioned, if locally attached volumes are desired.

See localStoragePolicy for how this default behavior can be modified.

3.6. Advanced Configuration

In general, all configuration for LINSTOR volumes in Kubernetes should be done via the StorageClass parameters, as seen with the storagePool in the basic example above. We’ll give all the available options an in-depth treatment here.

3.6.1. nodeList

nodeList is a list of nodes for volumes to be assigned to. This will assign the volume to each node and it will be replicated among all of them. This can also be used to select a single node by hostname, but it’s more flexible to use replicasOnSame to select a single node.

If you use this option, you must not use autoPlace.
This option determines on which LINSTOR nodes the underlying storage for volumes will be provisioned and is orthogonal to which kubelets these volumes will be accessible from.

Example: nodeList: "node-a node-b node-c"

Example: nodeList: "node-a"

3.6.2. autoPlace

autoPlace is an integer that determines the number of replicas a volume of this StorageClass will have. For instance, autoPlace: 3 will produce volumes with three-way replication. If neither autoPlace nor nodeList is set, volumes will be automatically placed on one node.

If you use this option, you must not use nodeList.
This option (and all options which affect autoplacement behavior) modifies the number of LINSTOR nodes on which the underlying storage for volumes will be provisioned and is orthogonal to which kubelets those volumes will be accessible from.

Example: autoPlace: 2

Default: autoPlace: 1

3.6.3. replicasOnSame

replicasOnSame is a list of key=value pairs used as required autoplacement selection labels when autoplace is used to determine where to provision storage. These labels correspond to LINSTOR node aux props. Please note both the key and value names are user-defined and arbitrary. Let’s explore this behavior with examples assuming a LINSTOR cluster such that node-a is configured with the following aux props zone=z1 and role=backups, while node-b is configured with only zone=z1.

If we configure a StorageClass with autoPlace: "1" and replicasOnSame: "zone=z1 role=backups", then all volumes created from that StorageClass will be provisioned on node-a, since that is the only node with all of the correct key=value pairs in the LINSTOR cluster. This is the most flexible way to select a single node for provisioning.

If we configure a StorageClass with autoPlace: "1" and replicasOnSame: "zone=z1", then volumes will be provisioned on either node-a or node-b as they both have the zone=z1 aux prop.

If we configure a StorageClass with autoPlace: "2" and replicasOnSame: "zone=z1 role=backups", then provisioning will fail, as there are not two or more nodes that have the appropriate aux props.

If we configure a StorageClass with autoPlace: "2" and replicasOnSame: "zone=z1", then volumes will be provisioned on both node-a and node-b as they both have the zone=z1 aux prop.

Example: replicasOnSame: "zone=z1 role=backups"

3.6.4. replicasOnDifferent

replicasOnDifferent is a list of key=value pairs to avoid as autoplacement selection. It is the inverse of replicasOnSame.

Example: replicasOnDifferent: "no-csi-volumes=true"

3.6.5. localStoragePolicy

localStoragePolicy determines, via volume topology, on which LINSTOR satellites volumes should be assigned and from where Kubernetes will access volumes. The behavior of each option is explained below in detail.

If you specify a nodeList, volumes will be created on those nodes, irrespective of the localStoragePolicy; however, the accessibility reporting will still be as described.
For required or preferred to work properly, you must set volumeBindingMode: WaitForFirstConsumer in the StorageClass, and the LINSTOR satellites running on the kubelets must be able to support the diskful placement of volumes as they are configured in that StorageClass.
Use topologyKey: "linbit.com/hostname" rather than topologyKey: "kubernetes.io/hostname" if you are setting affinity in your Pod or StatefulSet specs.

Example: localStoragePolicy: required

ignore (default)

When localStoragePolicy is set to ignore, regular autoplacement occurs based on autoplace, replicasOnSame, and replicasOnDifferent. Volume location will not affect Pod scheduling in Kubernetes and the volumes will be accessed over the network if they’re not local to the kubelet where the Pod was scheduled.

required

When localStoragePolicy is set to required, Kubernetes will report a list of places that it wants to schedule a Pod in order of preference. The plugin will attempt to provision the volume(s) according to that preference. The total number of volumes to be provisioned is based on autoPlace.

If all preferences have been attempted, but no volumes were successfully assigned, volume creation will fail.

In the case of multiple replicas, if all preferences have been attempted and at least one has succeeded, but there are still replicas remaining to be provisioned, autoplace behavior will apply for the remaining volumes.

With this option set, Kubernetes will consider volumes that are not locally present on a kubelet to be inaccessible from that kubelet.

preferred

When localStoragePolicy is set to preferred, volume placement behavior will be the same as when it’s set to required, with the exception that volume creation will not fail if no preference could be satisfied. Volume accessibility will be the same as when set to ignore.
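As noted above, required and preferred also need volumeBindingMode: WaitForFirstConsumer set on the StorageClass. A minimal sketch of such a StorageClass follows; the class name and storage pool are placeholders for your own values:

apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: linstor-local-storage-class
provisioner: linstor.csi.linbit.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  storagePool: "drbdpool"
  autoPlace: "1"
  localStoragePolicy: "required"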

3.6.6. storagePool

storagePool is the name of the LINSTOR storage pool that will be used to provide storage to the newly-created volumes.

Only nodes configured with this same storage pool will be considered for autoplacement. Likewise, for StorageClasses using nodeList, all nodes specified in that list must have this storage pool configured on them.

Example: storagePool: my-storage-pool

3.6.7. disklessStoragePool

disklessStoragePool is an optional parameter that only affects LINSTOR volumes assigned disklessly to kubelets, i.e., as clients. If you have a custom diskless storage pool defined in LINSTOR, you’ll specify that here.

Example: disklessStoragePool: my-custom-diskless-pool

3.6.8. encryption

encryption is an optional parameter that determines whether to encrypt volumes. LINSTOR must be configured for encryption for this to work properly.

Example: encryption: "true"

3.6.9. filesystem

filesystem is an optional parameter to specify the file system for non-raw block volumes. Currently supported options are xfs and ext4.

Example: filesystem: "xfs"

Default: filesystem: "ext4"

3.6.10. fsOpts

fsOpts is an optional parameter that passes options to the volume’s filesystem at creation time.

Please note these values are specific to your chosen filesystem.

Example: fsOpts: "-b 2048"

3.6.11. mountOpts

mountOpts is an optional parameter that passes options to the volume’s filesystem at mount time.

Example: mountOpts: "sync,noatime"
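To tie these parameters together, here is a sketch of a StorageClass that combines several of them; the class name, storage pool name, and aux prop value are examples only and must match your own cluster:

apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: linstor-advanced-storage-class
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: "my-storage-pool"
  replicasOnSame: "zone=z1"
  filesystem: "xfs"
  mountOpts: "noatime"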

4. LINSTOR Volumes in Proxmox VE

This chapter describes DRBD in Proxmox VE via the LINSTOR Proxmox Plugin.

4.1. Proxmox VE Overview

Proxmox VE is an easy to use, complete server virtualization environment with KVM, Linux Containers and HA.

'linstor-proxmox' is a Perl plugin for Proxmox that, in combination with LINSTOR, allows you to replicate VM disks on several Proxmox VE nodes. This makes it possible to live-migrate active VMs within a few seconds and with no downtime, without needing a central SAN, as the data is already replicated to multiple nodes.

4.2. Proxmox Plugin Installation

LINBIT provides a dedicated public repository for Proxmox VE users. This repository not only contains the Proxmox plugin, but the whole DRBD SDS stack including a DRBD SDS kernel module and user space utilities.

The DRBD9 kernel module is installed as a dkms package (i.e., drbd-dkms), therefore you’ll have to install the pve-headers package before you set up/install the software packages from LINBIT’s repositories. Following that order ensures that the kernel module will build properly for your kernel. If you don’t plan to install the latest Proxmox kernel, you have to install kernel headers matching your currently running kernel (e.g., pve-headers-$(uname -r)). If you missed this step, you can still rebuild the dkms package against your current kernel (kernel headers have to be installed in advance) by issuing the apt install --reinstall drbd-dkms command.

LINBIT’s repository can be enabled as follows, where "$PVERS" should be set to your Proxmox VE major version (e.g., "5", not "5.2"):

# wget -O- https://packages.linbit.com/package-signing-pubkey.asc | apt-key add -
# PVERS=5 && echo "deb http://packages.linbit.com/proxmox/ proxmox-$PVERS drbd-9.0" > \
	/etc/apt/sources.list.d/linbit.list
# apt update && apt install linstor-proxmox

4.3. LINSTOR Configuration

For the rest of this guide we assume that you have a LINSTOR cluster configured as described in Initializing your cluster. Also make sure to set up each node as a "Combined" node. Start the "linstor-controller" on one node, and the "linstor-satellite" on all nodes. The preferred way to use the plugin, starting from version 4.1.0, is via LINSTOR resource groups and a single volume group within every resource group. LINSTOR resource groups are described in Resource groups. All the required LINSTOR configuration (e.g., redundancy count) has to be set on the resource group.
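For example, a resource group matching the defaultpool entry used in the configuration below could be created like this (a sketch; adjust the place count, and add a --storage-pool option, to match your setup):

# linstor resource-group create defaultpool --place-count 2
# linstor volume-group create defaultpool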

4.4. Proxmox Plugin Configuration

The final step is to provide a configuration for Proxmox itself. This can be done by adding an entry in the /etc/pve/storage.cfg file, with a content similar to the following.

drbd: drbdstorage
   content images,rootdir
   controller 10.11.12.13
   resourcegroup defaultpool

The "drbd" entry is fixed and you are not allowed to modify it, as it tells to Proxmox to use DRBD as storage backend. The "drbdstorage" entry can be modified and is used as a friendly name that will be shown in the PVE web GUI to locate the DRBD storage. The "content" entry is also fixed, so do not change it. The redundancy (specified in the resource group) specifies how many replicas of the data will be stored in the cluster. The recommendation is to set it to 2 or 3 depending on your setup. The data is accessible from all nodes, even if some of them do not have local copies of the data. For example, in a 5 node cluster, all nodes will be able to access 3 copies of the data, no matter where they are stored in. The "controller" parameter must be set to the IP of the node that runs the LINSTOR controller service. Only one node can be set to run as LINSTOR controller at the same time. If that node fails, start the LINSTOR controller on another node and change that value to its IP address. There are more elegant ways to deal with this problem. For more, see later in this chapter how to setup a highly available LINSTOR controller VM in Proxmox.

Recent versions of the plugin allow you to define multiple different storage pools. Such a configuration would look like this:

drbd: drbdstorage
   content images,rootdir
   controller 10.11.12.13
   resourcegroup defaultpool

drbd: fastdrbd
   content images,rootdir
   controller 10.11.12.13
   resourcegroup ssd

drbd: slowdrbd
   content images,rootdir
   controller 10.11.12.13
   resourcegroup backup

By now, you should be able to create VMs via Proxmox’s web GUI by selecting "drbdstorage", or any other of the defined pools as storage location.

NOTE: DRBD supports only the raw disk format at the moment.

At this point you can try to live migrate the VM - as all data is accessible on all nodes (even on Diskless nodes) - it will take just a few seconds. The overall process might take a bit longer if the VM is under load and if there is a lot of RAM being dirtied all the time. But in any case, the downtime should be minimal and you will see no interruption at all.

4.5. Making the Controller Highly-Available

For the rest of this guide we assume that you installed LINSTOR and the Proxmox Plugin as described in LINSTOR Configuration.

The basic idea is to execute the LINSTOR controller within a VM that is controlled by Proxmox and its HA features, where the storage resides on DRBD managed by LINSTOR itself.

The first step is to allocate storage for the VM: Create a VM as usual and select "Do not use any media" on the "OS" section. The hard disk should of course reside on DRBD (e.g., "drbdstorage"). 2GB disk space should be enough, and for RAM we chose 1GB. These are the minimum requirements for the appliance LINBIT provides to its customers (see below). If you wish to set up your own controller VM, and you have enough hardware resources available, you can increase these minimum values. In the following use case, we assume that the controller VM was created with ID 100, but it is fine if this VM was created at a later time and has a different ID.

LINBIT provides an appliance for its customers that can be used to populate the created storage. For the appliance to work, we first create a "Serial Port". Click on "Hardware", then on "Add", and finally on "Serial Port":

pm add serial1 controller vm
Figure 1. Adding a Serial Port

If everything worked as expected the VM definition should then look like this:

pm add serial2 controller vm
Figure 2. VM with Serial Port

The next step is to copy the VM appliance to the VM disk storage. This can be done with qemu-img.

Make sure to replace the VM ID with the correct one.
# qemu-img dd -O raw if=/tmp/linbit-linstor-controller-amd64.img \
  of=/dev/drbd/by-res/vm-100-disk-1/0

Once completed you can start the VM and connect to it via the Proxmox VNC viewer. The default user name and password are both "linbit". Note that we kept the default configuration for the ssh server, so you will not be able to log in to the VM via ssh and username/password. If you want to enable that (and/or "root" login), enable these settings in /etc/ssh/sshd_config and restart the ssh service. As this VM is based on "Ubuntu Bionic", you should change your network settings (e.g., static IP) in /etc/netplan/config.yaml. After that you should be able to ssh to the VM:

pm ssh controller vm
Figure 3. LINBIT LINSTOR Controller Appliance

In the next step you add the controller VM to the existing cluster:

# linstor node create --node-type Controller \
  linstor-controller 10.43.7.254
As the Controller VM will be handled in a special way by the Proxmox storage plugin (compared to the rest of the VMs), we must make sure all hosts have access to its backing storage before PVE HA starts the VM; otherwise the VM will fail to start. See below for the details on how to achieve this.

In our test cluster the Controller VM disk was created in DRBD storage and it was initially assigned to one host (use linstor resource list to check the assignments). Then, we used the linstor resource create command to create additional resource assignments to the other nodes of the cluster for this VM. In our lab consisting of four nodes, we created all resource assignments as diskful, but diskless assignments are fine as well. As a rule of thumb, keep the redundancy count at "3" (more usually does not make sense), and assign the rest as diskless.
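For example, assuming the Controller VM's disk is the LINSTOR resource vm-100-disk-1, and pve-b and pve-c are hypothetical names of the remaining cluster nodes, the additional assignments could be created like this:

# linstor resource list
# linstor resource create pve-b vm-100-disk-1
# linstor resource create pve-c vm-100-disk-1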

As the storage for the Controller VM must be made available on all PVE hosts in some way, we must make sure to enable the drbd.service on all hosts (given that it is not controlled by LINSTOR at this stage):

# systemctl enable drbd
# systemctl start drbd

By default, at startup the linstor-satellite service deletes all of its resource files (.res) and regenerates them. This conflicts with drbd.service, which needs these resource files to start the controller VM. It is good enough to first bring up the resources via drbd.service and to ensure that linstor-satellite.service, which brings up the controller resource, never deletes the corresponding res file. To make the necessary changes, you need to create a drop-in for the linstor-satellite.service via systemctl (do not edit the file directly).

systemctl edit linstor-satellite
[Service]
Environment=LS_KEEP_RES=vm-100-disk
[Unit]
After=drbd.service

Of course, adapt the name of the controller VM in the LS_KEEP_RES variable. Note that the value given is interpreted as a regex, so you don’t need to specify the exact name.

Don’t forget to restart the linstor-satellite.service.
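For example, you can verify that the drop-in is active and then restart the service like this:

# systemctl cat linstor-satellite.service
# systemctl restart linstor-satellite.service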

After that, it is time for the final steps, namely switching from the existing controller (residing on the physical host) to the new one in the VM. So let’s stop the old controller service on the physical host, and copy the LINSTOR controller database to the VM host:

# systemctl stop linstor-controller
# systemctl disable linstor-controller
# scp /var/lib/linstor/* root@10.43.7.254:/var/lib/linstor/

Finally, we can enable the controller in the VM:

# systemctl start linstor-controller # in the VM
# systemctl enable linstor-controller # in the VM

To check if everything worked as expected, you can query the cluster nodes on a physical PVE host by asking the controller in the VM: linstor --controllers=10.43.7.254 node list. It is perfectly fine that the controller (which is just a Controller and not a "Combined" host) is shown as "OFFLINE". This might change in the future to something more reasonable.

As the last — but crucial — step, you need to add the "controllervm" option to /etc/pve/storage.cfg, and change the controller IP address to the IP address of the Controller VM:

drbd: drbdstorage
   content images,rootdir
   resourcegroup defaultpool
   controller 10.43.7.254
   controllervm 100

Please note the additional setting "controllervm". This setting is very important, as it tells PVE to handle the Controller VM differently than the rest of the VMs stored in the DRBD storage. Specifically, it instructs PVE to NOT use the LINSTOR storage plugin for handling the Controller VM, but to use other methods instead. The reason for this is simply that the LINSTOR backend is not available at this stage. Once the Controller VM is up and running (and the associated LINSTOR controller service inside the VM), then the PVE hosts will be able to start the rest of the virtual machines stored in the DRBD storage by using the LINSTOR storage plugin. Please make sure to set the correct VM ID in the "controllervm" setting. In this case it is set to "100", which is the ID assigned to our Controller VM.

It is very important to make sure that the Controller VM is up and running at all times and that you are backing it up regularly (especially when you make modifications to the LINSTOR cluster). Once the VM is gone and there are no backups, the LINSTOR cluster must be recreated from scratch.

To prevent accidental deletion of the VM, you can go to the "Options" tab of the VM in the PVE GUI and enable the "Protection" option. If, however, you accidentally deleted the VM, such deletion requests are ignored by our storage plugin, so the VM disk will NOT be deleted from the LINSTOR cluster. Therefore, it is possible to recreate the VM with the same ID as before (simply recreate the VM configuration file in PVE and assign the same DRBD storage device used by the old VM). The plugin will just return "OK", and the old VM with the old data can be used again. In general, be careful not to delete the controller VM and "protect" it accordingly.

Currently, we have the controller executed as VM, but we should make sure that one instance of the VM is started at all times. For that we use Proxmox’s HA feature. Click on the VM, then on "More", and then on "Manage HA". We set the following parameters for our controller VM:

pm manage ha controller vm
Figure 4. HA settings for the controller VM

As long as there are surviving nodes in your Proxmox cluster, everything should be fine, and in case the node hosting the Controller VM is shut down or lost, Proxmox HA will make sure the controller is started on another host. Obviously the IP of the controller VM should not change. It is up to you as an administrator to make sure this is the case (e.g., setting a static IP, or always providing the same IP via DHCP on the bridged interface).

It is important to mention at this point that if you are using a dedicated network for the LINSTOR cluster, you must make sure that the network interfaces configured for the cluster traffic are configured as bridges (i.e., vmbr1, vmbr2, etc.) on the PVE hosts. If they are set up as direct interfaces (i.e., eth0, eth1, etc.), then you will not be able to set up the Controller VM vNIC to communicate with the rest of the LINSTOR nodes in the cluster, as you cannot assign direct network interfaces to the VM, but only bridged interfaces.

One limitation that is not fully handled with this setup is a total cluster outage (e.g., common power supply failure) with a restart of all cluster nodes. Proxmox is unfortunately pretty limited in this regard. You can enable the "HA Feature" for a VM, and you can define "Start and Shutdown Order" constraints. But both are completely separated from each other. Therefore it is hard/impossible to guarantee that the Controller VM will be up and running, before all other VMs are started.

It might be possible to work around that by delaying VM startup in the Proxmox plugin itself until the controller VM is up (i.e., if the plugin is asked to start the controller VM it does it, otherwise it waits and pings the controller). While a nice idea, this would horribly fail in a serialized, non-concurrent VM start/plugin call event stream where some VMs should be started (which are then blocked) before the Controller VM is scheduled to be started. That would obviously result in a deadlock.

We will discuss these options with Proxmox, but we think the current solution is valuable in most typical use cases as is, especially compared to the complexity of a Pacemaker setup. Use cases where one can expect that not the whole cluster goes down at the same time are covered. And even if that is the case, only the automatic startup of the VMs would not work when the whole cluster is started. In such a scenario the admin just has to wait until the Proxmox HA service starts the controller VM. After that, all VMs can be started manually or via scripts on the command line.

5. LINSTOR Volumes in OpenNebula

This chapter describes DRBD in OpenNebula via the usage of the LINSTOR storage driver addon.

Detailed installation and configuration instructions can be found in the README.md file of the driver’s source.

5.1. OpenNebula Overview

OpenNebula is a flexible and open source cloud management platform which allows its functionality to be extended via the use of addons.

The LINSTOR addon allows the deployment of virtual machines with highly available images backed by DRBD and attached across the network via DRBD’s own transport protocol.

5.2. OpenNebula addon Installation

Installation of the LINSTOR storage addon for OpenNebula requires a working OpenNebula cluster as well as a working LINSTOR cluster.

With access to LINBIT’s customer repositories you can install the linstor-opennebula package with

# apt install linstor-opennebula

or

# yum install linstor-opennebula

Without access to LINBIT’s prepared packages you need to fall back to the instructions on its GitHub page.

A DRBD cluster with LINSTOR can be installed and configured by following the instructions in this guide, see Initializing your cluster.

The OpenNebula and DRBD clusters can be somewhat independent of one another with the following exception: OpenNebula’s Front-End and Host nodes must be included in both clusters.

Host nodes do not need a local LINSTOR storage pool, as virtual machine images are attached to them across the network [1].

5.3. Deployment Options

It is recommended to use LINSTOR resource groups to configure the deployment how you like it; see OpenNebula resource group. The previous auto-place and deployment-nodes modes are deprecated.

5.4. Configuration

5.4.1. Adding the driver to OpenNebula

Modify the following sections of /etc/one/oned.conf

Add linstor to the list of drivers in the TM_MAD and DATASTORE_MAD sections:

TM_MAD = [
  executable = "one_tm",
  arguments = "-t 15 -d dummy,lvm,shared,fs_lvm,qcow2,ssh,vmfs,ceph,linstor"
]
DATASTORE_MAD = [
    EXECUTABLE = "one_datastore",
    ARGUMENTS  = "-t 15 -d dummy,fs,lvm,ceph,dev,iscsi_libvirt,vcenter,linstor -s shared,ssh,ceph,fs_lvm,qcow2,linstor"

Add new TM_MAD_CONF and DS_MAD_CONF sections:

TM_MAD_CONF = [
    NAME = "linstor", LN_TARGET = "NONE", CLONE_TARGET = "SELF", SHARED = "yes", ALLOW_ORPHANS="yes",
    TM_MAD_SYSTEM = "ssh,shared", LN_TARGET_SSH = "NONE", CLONE_TARGET_SSH = "SELF", DISK_TYPE_SSH = "BLOCK",
    LN_TARGET_SHARED = "NONE", CLONE_TARGET_SHARED = "SELF", DISK_TYPE_SHARED = "BLOCK"
]
DS_MAD_CONF = [
    NAME = "linstor", REQUIRED_ATTRS = "BRIDGE_LIST", PERSISTENT_ONLY = "NO",
    MARKETPLACE_ACTIONS = "export"
]

After making these changes, restart the opennebula service.

5.4.2. Configuring the Nodes

The Front-End node issues commands to the Storage and Host nodes via Linstor.

Storage nodes hold disk images of VMs locally.

Host nodes are responsible for running instantiated VMs and typically have the storage for the images they need attached across the network via Linstor diskless mode.

All nodes must have DRBD9 and Linstor installed. This process is detailed in the User’s Guide for DRBD9.

It is possible to have Front-End and Host nodes act as storage nodes in addition to their primary role, as long as they meet all the requirements for both roles.

Front-End Configuration

Please verify that the control node(s) that you hope to communicate with are reachable from the Front-End node. Running linstor node list for locally running Linstor controllers, or linstor --controllers "<IP:PORT>" node list for remotely running Linstor controllers, is a handy way to test this.

Host Configuration

Host nodes must have Linstor satellite processes running on them and be members of the same Linstor cluster as the Front-End and Storage nodes, and may optionally have storage locally. If the oneadmin user is able to passwordlessly ssh between hosts, then live migration may be used even with the ssh system datastore.

Storage Node Configuration

Only the Front-End and Host nodes require OpenNebula to be installed, but the oneadmin user must be able to passwordlessly access storage nodes. Refer to the OpenNebula install guide for your distribution on how to manually configure the oneadmin user account.

The Storage nodes must use storage pools created with a driver that’s capable of making snapshots, such as the thin LVM plugin.

To prepare thinly-provisioned storage using LVM for Linstor in this example, you must create a volume group and a thin LV using LVM on each storage node.

The following is an example of this process using two physical volumes (/dev/sdX and /dev/sdY) and generic names for the volume group and thin pool. Make sure to set the thin LV’s metadata volume to a reasonable size; once it becomes full, it can be difficult to resize:

pvcreate /dev/sdX /dev/sdY
vgcreate drbdpool /dev/sdX /dev/sdY
lvcreate -l 95%VG --poolmetadatasize 8g -T /dev/drbdpool/drbdthinpool

Then you’ll create storage pool(s) on Linstor using this as the backing storage.
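For example, using the volume group and thin pool created above and hypothetical storage node names alice, bob, and charlie, a storage pool named opennebula-storagepool (reused in the resource group example below) could be created like this:

linstor storage-pool create lvmthin alice opennebula-storagepool drbdpool/drbdthinpool
linstor storage-pool create lvmthin bob opennebula-storagepool drbdpool/drbdthinpool
linstor storage-pool create lvmthin charlie opennebula-storagepool drbdpool/drbdthinpool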

5.4.3. Permissions for Oneadmin

The oneadmin user must have passwordless sudo access to the mkfs command on the Storage nodes:

oneadmin ALL=(root) NOPASSWD: /sbin/mkfs
Groups

Be sure to consider the groups that oneadmin should be added to in order to gain access to the devices and programs needed to access storage and instantiate VMs. For this addon, the oneadmin user must belong to the disk group on all nodes in order to access the DRBD devices where images are held.

usermod -a -G disk oneadmin

5.4.4. Creating a New Linstor Datastore

Create a datastore configuration file named ds.conf and use the onedatastore tool to create a new datastore based on that configuration. There are two mutually exclusive deployment options: LINSTOR_AUTO_PLACE and LINSTOR_DEPLOYMENT_NODES. If both are configured, LINSTOR_AUTO_PLACE is ignored. For both of these options, BRIDGE_LIST must be a space separated list of all storage nodes in the Linstor cluster.

5.4.5. OpenNebula resource group

Since version 1.0.0 LINSTOR supports resource groups. A resource group is a centralized point for settings that all resources linked to that resource group share.

Create a resource group and volume group for your datastore; here we create one with two-node redundancy and use a previously created storage pool named opennebula-storagepool:

linstor resource-group create OneRscGrp --place-count 2 --storage-pool opennebula-storagepool
linstor volume-group create OneRscGrp

Now add an OpenNebula datastore using the LINSTOR plugin:

cat >ds.conf <<EOI
NAME = linstor_datastore
DS_MAD = linstor
TM_MAD = linstor
TYPE = IMAGE_DS
DISK_TYPE = BLOCK
LINSTOR_RESOURCE_GROUP = "OneRscGrp"
COMPATIBLE_SYS_DS = 0
BRIDGE_LIST = "alice bob charlie"  #node names
EOI

onedatastore create ds.conf

5.4.6. Plugin attributes

LINSTOR_CONTROLLERS

LINSTOR_CONTROLLERS can be used to pass a comma-separated list of controller IPs and ports to the Linstor client in the case where a Linstor controller process is not running locally on the Front-End, e.g.:

LINSTOR_CONTROLLERS = "192.168.1.10:8080,192.168.1.11:6000"

LINSTOR_CLONE_MODE

Linstor supports two different clone modes, which are set via the LINSTOR_CLONE_MODE attribute:

  • snapshot

The default mode is snapshot. It uses a Linstor snapshot and restores a new resource from this snapshot, which is then a clone of the image. This mode is usually faster than the copy mode, as snapshots are cheap copies.

  • copy

The second mode is copy. It creates a new resource with the same size as the original and copies the data with dd to the new resource. This mode will be slower than snapshot, but is more robust as it doesn’t rely on any snapshot mechanism. It is also used if you are cloning an image into a different Linstor datastore.

5.4.7. Deprecated attributes

The following attributes are deprecated and will be removed in a version after the 1.0.0 release.

LINSTOR_STORAGE_POOL

The LINSTOR_STORAGE_POOL attribute is used to select the LINSTOR storage pool your datastore should use. If resource groups are used, this attribute isn’t needed, as the storage pool can be selected by the auto-select filter options. If LINSTOR_AUTO_PLACE or LINSTOR_DEPLOYMENT_NODES is used and LINSTOR_STORAGE_POOL is not set, it will fall back to the DfltStorPool in LINSTOR.

LINSTOR_AUTO_PLACE

The LINSTOR_AUTO_PLACE option takes a level of redundancy which is a number between one and the total number of storage nodes. Resources are assigned to storage nodes automatically based on the level of redundancy.

LINSTOR_DEPLOYMENT_NODES

Using LINSTOR_DEPLOYMENT_NODES allows you to select a group of nodes that resources will always be assigned to. Please note that the bridge list still contains all of the storage nodes in the Linstor cluster.

5.4.8. LINSTOR as system datastore

The Linstor driver can also be used as a system datastore. The configuration is pretty similar to that of normal datastores, with a few changes:

cat >system_ds.conf <<EOI
NAME = linstor_system_datastore
TM_MAD = linstor
TYPE = SYSTEM_DS
LINSTOR_RESOURCE_GROUP = "OneSysRscGrp"
BRIDGE_LIST = "alice bob charlie"  # node names
EOI

onedatastore create system_ds.conf

Also add the new system datastore ID to the COMPATIBLE_SYS_DS attribute of your image datastores (comma separated); otherwise the scheduler will ignore them.

If you want live migration with volatile disks, you need to enable the --unsafe option for KVM; see: opennebula-doc

5.5. Live Migration

Live migration is supported even with the use of the ssh system datastore, as well as the nfs shared system datastore.

5.6. Free Space Reporting

Free space is calculated differently depending on whether resources are deployed automatically or on a per node basis.

For datastores which place per node, free space is reported based on the most restrictive storage pools from all nodes where resources are being deployed. For example, the capacity of the node with the smallest amount of total storage space is used to determine the total size of the datastore and the node with the least free space is used to determine the remaining space in the datastore.

For a datastore which uses automatic placement, size and remaining space are determined based on the aggregate storage pool used by the datastore as reported by LINSTOR.

6. LINSTOR volumes in Openstack

This chapter describes DRBD in OpenStack for persistent, replicated, and high-performance block storage with the LINSTOR driver.

6.1. Openstack Overview

OpenStack consists of a wide range of individual services; the two that are most relevant to DRBD are Cinder and Nova. Cinder is the block storage service, while Nova is the compute node service that’s responsible for making the volumes available for the VMs.

The LINSTOR driver for OpenStack manages DRBD/LINSTOR clusters and makes them available within the OpenStack environment, especially within Nova compute instances. LINSTOR-backed Cinder volumes will seamlessly provide all the features of DRBD/LINSTOR while allowing OpenStack to manage all their deployment and management. The driver will allow OpenStack to create and delete persistent LINSTOR volumes as well as managing and deploying volume snapshots and raw volume images.

Aside from using the kernel-native DRBD protocols for replication, the LINSTOR driver also allows using iSCSI with LINSTOR cluster(s) to provide maximum compatibility. For more information on these two options, please see Choosing the Transport Protocol.

6.2. LINSTOR for Openstack Installation

An initial installation and configuration of DRBD and LINSTOR must be completed prior to installing the OpenStack driver. Each LINSTOR node in a cluster should also have a storage pool defined. Details about LINSTOR installation can be found here.

6.2.1. Here’s a synopsis on quickly setting up a LINSTOR cluster on Ubuntu:

Install DRBD and LINSTOR on Cinder node as a LINSTOR Controller node:
# First, set up LINBIT repository per support contract

# Install DRBD and LINSTOR packages
sudo apt update
sudo apt install -y drbd-dkms lvm2
sudo apt install -y linstor-controller linstor-satellite linstor-client
sudo apt install -y drbdtop

# Start both LINSTOR Controller and Satellite Services
systemctl enable linstor-controller.service
systemctl start linstor-controller.service
systemctl enable linstor-satellite.service
systemctl start linstor-satellite.service

# For Diskless Controller, skip the following two 'sudo' commands

# For Diskful Controller, create backend storage for DRBD/LINSTOR by creating
# a Volume Group 'drbdpool' and specify appropriate volume location (/dev/vdb)
sudo vgcreate drbdpool /dev/vdb

# Create a Logical Volume 'thinpool' within 'drbdpool'
# Specify appropriate thin volume size (64G)
sudo lvcreate -L 64G -T drbdpool/thinpool
OpenStack measures storage size in GiBs.
Install DRBD and LINSTOR on other node(s) on the LINSTOR cluster:
# First, set up LINBIT repository per support contract

# Install DRBD and LINSTOR packages
sudo apt update
sudo apt install -y drbd-dkms lvm2
sudo apt install -y linstor-satellite
sudo apt install -y drbdtop

# Start only the LINSTOR Satellite service
systemctl enable linstor-satellite.service
systemctl start linstor-satellite.service

# Create backend storage for DRBD/LINSTOR by creating a Volume Group 'drbdpool'
# Specify appropriate volume location (/dev/vdb)
sudo vgcreate drbdpool /dev/vdb

# Create a Logical Volume 'thinpool' within 'drbdpool'
# Specify appropriate thin volume size (64G)
sudo lvcreate -L 64G -T drbdpool/thinpool
Lastly, from the Cinder node, create LINSTOR Satellite Node(s) and Storage Pool(s)
# Create a LINSTOR cluster, including the Cinder node as one of the nodes
# For each node, specify node name, its IP address, volume type (diskless) and
# volume location (drbdpool/thinpool)

# Create the controller node as combined controller and satellite node
linstor node create cinder-node-name 192.168.1.100 --node-type Combined

# Create the satellite node(s)
linstor node create another-node-name 192.168.1.101
# repeat to add more satellite nodes in the LINSTOR cluster

# Create LINSTOR Storage Pool on each nodes
# For each node, specify node name, its IP address,
# storage pool name (DfltStorPool),
# volume type (diskless / lvmthin) and node type (Combined)

# Create diskless Controller node on the Cinder controller
linstor storage-pool create diskless cinder-node-name DfltStorPool

# Create diskful Satellite nodes
linstor storage-pool create lvmthin another-node-name DfltStorPool drbdpool/thinpool
# repeat to add a storage pool to each node in the LINSTOR cluster

6.2.2. Install the LINSTOR driver file

The linstor driver will be officially available starting with the OpenStack Stein release. The latest release is located at the LINBIT OpenStack Repo. It is a single Python file called linstordrv.py. Depending on your OpenStack installation, its destination may vary.

Place the driver ( linstordrv.py ) in an appropriate location within your OpenStack Cinder node.

For Devstack:

/opt/stack/cinder/cinder/volume/drivers/linstordrv.py

For Ubuntu:

/usr/lib/python2.7/dist-packages/cinder/volume/drivers/linstordrv.py

For RDO Packstack:

/usr/lib/python2.7/site-packages/cinder/volume/drivers/linstordrv.py
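For example, on an Ubuntu-based installation you could copy the downloaded driver file into place like this (a sketch; adjust the destination path to match your installation as listed above):

sudo cp linstordrv.py /usr/lib/python2.7/dist-packages/cinder/volume/drivers/linstordrv.py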

6.3. Cinder Configuration for LINSTOR

6.3.1. Edit Cinder configuration file cinder.conf in /etc/cinder/ as follows:

Enable LINSTOR driver by adding 'linstor' to enabled_backends
[DEFAULT]
...
enabled_backends=lvm, linstor
...
Add the following configuration options at the end of the cinder.conf
[linstor]
volume_backend_name = linstor
volume_driver = cinder.volume.drivers.linstordrv.LinstorDrbdDriver
linstor_default_volume_group_name=drbdpool
linstor_default_uri=linstor://localhost
linstor_default_storage_pool_name=DfltStorPool
linstor_default_resource_size=1
linstor_volume_downsize_factor=4096

6.3.2. Update Python libraries for the driver

sudo pip install google --upgrade
sudo pip install protobuf --upgrade
sudo pip install eventlet --upgrade

6.3.3. Create a new backend type for LINSTOR

Run these commands from the Cinder node once environment variables are configured for OpenStack command line operation.

cinder type-create linstor
cinder type-key linstor set volume_backend_name=linstor

6.3.4. Restart the Cinder services to finalize

For Devstack:

sudo systemctl restart devstack@c-vol.service
sudo systemctl restart devstack@c-api.service
sudo systemctl restart devstack@c-sch.service

For RDO Packstack:

sudo systemctl restart openstack-cinder-volume.service
sudo systemctl restart openstack-cinder-api.service
sudo systemctl restart openstack-cinder-scheduler.service

For full OpenStack:

sudo systemctl restart cinder-volume.service
sudo systemctl restart cinder-api.service
sudo systemctl restart cinder-scheduler.service

6.3.5. Verify proper installation:

Once the Cinder services are restarted, a new Cinder volume with the LINSTOR backend may be created using the Horizon GUI or the command line. Use the following as a guide for creating a volume with the command line.

# Check to see if there are any recurring errors with the driver.
# Occasional 'ERROR' keyword associated with the database is normal.
# Use Ctrl-C to stop the log output to move on.
sudo journalctl -f -u devstack@c-* | grep error

# Create a LINSTOR test volume.  Once the volume is created, volume list
# command should show one new Cinder volume.  The 'linstor' command then
# should list actual resource nodes within the LINSTOR cluster backing that
# Cinder volume.
openstack volume create --type linstor --size 1 --availability-zone nova linstor-test-vol
openstack volume list
linstor resource list

6.3.6. Additional Configuration

More to come

6.4. Choosing the Transport Protocol

There are two main ways to run DRBD/LINSTOR with Cinder: using the iSCSI transport, or using the DRBD/LINSTOR transport. Both are described below.

These are not exclusive; you can define multiple backends, have some of them use iSCSI, and others the DRBD protocol.

6.4.1. iSCSI Transport

The default way to export Cinder volumes is via iSCSI. This brings the advantage of maximum compatibility - iSCSI can be used with every hypervisor, be it VMware, Xen, Hyper-V, or KVM.

The drawback is that all data has to be sent to a Cinder node, to be processed by a (userspace) iSCSI daemon; that means that the data needs to pass the kernel/userspace border, and these transitions will cost some performance.

6.4.2. DRBD/LINSTOR Transport

The alternative is to get the data to the VMs by using DRBD as the transport protocol. This means that DRBD 9[2] needs to be installed on the Cinder node as well.

Since OpenStack only runs on Linux, using the DRBD/LINSTOR transport currently restricts deployment to Linux hosts with KVM.

One advantage of that solution is that the storage access requests of the VMs can be sent via the DRBD kernel module to the storage nodes, which can then directly access the allocated LVs; this means no Kernel/Userspace transitions on the data path, and consequently better performance. Combined with RDMA capable hardware you should get about the same performance as with VMs accessing a FC backend directly.

Another advantage is that you will be implicitly benefitting from the HA background of DRBD: using multiple storage nodes, possibly available over different network connections, means redundancy and avoiding a single point of failure.

The default configuration options for the Cinder driver assume the Cinder node to be a diskless LINSTOR node. If the node is a diskful node, please change 'linstor_controller_diskless=True' to 'linstor_controller_diskless=False' and restart the Cinder services.
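A sketch of the corresponding cinder.conf excerpt, using the option name quoted above, would be the following; add it to the [linstor] section and restart the Cinder services afterwards:

[linstor]
...
linstor_controller_diskless=False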

6.4.3. Configuring the Transport Protocol

In the LINSTOR section in cinder.conf you can define which transport protocol to use. The initial setup described at the beginning of this chapter is set to use DRBD transport. You can configure as necessary as shown below. Then Horizon[3] should offer these storage backends at volume creation time.

  • To use iSCSI with LINSTOR:

        volume_driver=cinder.volume.drivers.linstordrv.LinstorIscsiDriver
  • To use DRBD Kernel Module with LINSTOR:

        volume_driver=cinder.volume.drivers.linstordrv.LinstorDrbdDriver

The old class name "DrbdManageDriver" is being kept for the time because of compatibility reasons; it’s just an alias to the iSCSI driver.

To summarize:

  • You’ll need the LINSTOR Cinder driver 0.1.0 or later, and LINSTOR 0.6.5 or later.

  • The DRBD transport protocol should be preferred whenever possible; iSCSI won’t offer any locality benefits.

  • Take care to not run out of disk space, especially with thin volumes.

7. LINSTOR Volumes in Docker

This chapter describes LINSTOR volumes in Docker as managed by the LINSTOR Docker Volume Plugin.

7.1. Docker Overview

Docker is a platform for developing, shipping, and running applications in the form of Linux containers. For stateful applications that require data persistence, Docker supports the use of persistent volumes and volume_drivers.

The LINSTOR Docker Volume Plugin is a volume driver that provisions persistent volumes from a LINSTOR cluster for Docker containers.

7.2. LINSTOR Plugin for Docker Installation

To install the linstor-docker-volume package provided by LINBIT, you’ll need to have LINBIT’s drbd-9.0 repository enabled. Once enabled, use your package manager to install the linstor-docker-volume package:

# yum install linstor-docker-volume

  -- OR --

# apt install linstor-docker-volume

Alternatively, you can build and install from source:

# git clone https://github.com/LINBIT/linstor-docker-volume
# cd ./linstor-docker-volume
# make
  ..snip..
# make doc
  ..snip..
# make install
  ..snip..

Once installed, enable and start the linstor-docker-volume.socket and linstor-docker-volume.service via systemd:

# systemctl enable linstor-docker-volume.socket linstor-docker-volume.service --now

7.3. LINSTOR Plugin for Docker Configuration

As the plugin has to communicate with the LINSTOR controller via the LINSTOR Python library, we must tell the plugin where to find the LINSTOR Controller node in its configuration file:

# cat /etc/linstor/docker-volume.conf
[global]
controllers = linstor://hostnameofcontroller

It is also possible to set all of the command line options found in man linstor-docker-volume in the docker-volume.conf. For example:

# cat /etc/linstor/docker-volume.conf
[global]
storagepool = thin-lvm
fs = ext4
fsopts = -E discard
size = 100MB
replicas = 2
nodes = alpha,bravo,charlie
If neither replicas nor nodes is defined, the default for replicas will be used, which is two diskful replicas. If both replicas and nodes are defined, nodes will be used and replicas will be ignored.

7.4. Example Usage

The following are some examples of how you might use the LINSTOR Docker Volume Plugin. In the following we expect a cluster consisting of three nodes (alpha, bravo, and charlie).

7.4.1. Example 1 - typical docker pattern

On node alpha:

$ docker volume create -d linstor \
             --opt fs=xfs --opt size=200 lsvol
$ docker run -it --rm --name=cont \
             -v lsvol:/data --volume-driver=linstor busybox sh
$ root@cont: echo "foo" > /data/test.txt
$ root@cont: exit

On node bravo:

$ docker run -it --rm --name=cont \
             -v lsvol:/data --volume-driver=linstor busybox sh
$ root@cont: cat /data/test.txt
  foo
$ root@cont: exit
$ docker volume rm lsvol

7.4.2. Example 2 - one diskful assignment by name, two nodes diskless

$ docker volume create -d linstor --opt hosts=bravo lsvol

7.4.3. Example 3 - one diskful assignment, no matter where, two nodes diskless

$ docker volume create -d linstor --opt replicas=1 lsvol

7.4.4. Example 4 - two diskful assignments by name, charlie diskless

$ docker volume create -d linstor --opt hosts=alpha,bravo lsvol

7.4.5. Example 5 - two diskful assignments, no matter where, one node diskless

$ docker volume create -d linstor --opt replicas=2 lsvol

7.4.6. Example 6 - using curl without docker, useful for developers

$ curl --unix-socket /run/docker/plugins/linstor.sock \
          -X POST -H "Content-Type: application/json" \
          -d '{"Name":"linstorvol"}' http://localhost/VolumeDriver.List

Appendices

8. DRBD Manage

DRBD Manage will reach its EoL at the end of 2018 and will be replaced by LINSTOR.

DRBD Manage is an abstraction layer which takes over management of logical volumes (LVM) and management of configuration files for DRBD. Features of DRBD Manage include creating, resizing, and removing of replicated volumes. Additionally, DRBD Manage handles taking snapshots and creating volumes in consistency groups.

This chapter outlines typical administrative tasks encountered during day-to-day operations. It does not cover troubleshooting tasks; these are covered in detail in [ch-troubleshooting]. If you plan to use 'LVM' as the storage plugin, please read the section Configuring LVM now, and then return to this point.

8.1. Migrating resources from DRBDManage to LINSTOR

The LINSTOR client contains a sub-command that can generate a migration script that adds existing DRBDManage nodes and resources to a LINSTOR cluster. Migration can be done without downtime. If you do not plan to migrate existing resources, continue with the next section.

The first thing to check is whether the DRBDManage cluster is in a healthy state. If the output of drbdmanage assignments looks good, you can export the existing cluster database via drbdmanage export-ctrlvol > ctrlvol.json. You can then use that as input for the LINSTOR client. The client does not immediately migrate your resources; it just generates a shell script. Therefore, you can run the migration assistant multiple times and review/modify the generated shell script before actually executing it. Migration script generation is started via linstor dm-migrate ctrlvol.json dmmigrate.sh. The script will ask a few questions and then generate the shell script. After carefully reading the script, you can then shut down DRBDManage and rename the following files. If you do not rename them, the lower-level drbd-utils will pick up both kinds of resource files, the ones from DRBDManage and the ones from LINSTOR.

Obviously, you need the linstor-controller service started on one node and the linstor-satellite service on all nodes.

# drbdmanage shutdown -qc # on all nodes
# mv /etc/drbd.d/drbdctrl.res{,.dis} # on all nodes
# mv /etc/drbd.d/drbdmanage-resources.res{,.dis} # on all nodes
# bash dmmigrate.sh

8.2. Initializing your cluster

We assume that the following steps are accomplished on all cluster nodes:

  1. The DRBD9 kernel module is installed and loaded

  2. drbd-utils are installed

  3. LVM tools are installed

  4. drbdmanage and its dependencies are installed

Note that drbdmanage uses dbus-activation to start its server component when necessary; do not start the server manually.

The first step is to review the configuration file of drbdmanage (/etc/drbdmanaged.cfg) and to create an LVM volume group with the name specified in the configuration. In the following we use the default name, which is drbdpool, and assume that the volume group consists of /dev/sda6 and /dev/sda7. Creating the volume group is a step that has to be executed on every cluster node:

# vgcreate drbdpool /dev/sda6 /dev/sda7

The second step is to initialize the so-called control volume, which is then used to redundantly store your cluster configuration. If the node has multiple interfaces, you have to specify the IP address of the network interface that DRBD should use to communicate with other nodes in the cluster; otherwise the IP is optional. This step must only be done on exactly one cluster node.

# drbdmanage init 10.43.70.2

We recommend using 'drbdpool' as the name of your LVM volume group as it is the default value and makes your administration life easier. If, for whatever reason, you decide to use a different name, make sure that the option drbdctrl-vg is set accordingly in /etc/drbdmanaged.cfg. Configuration will be discussed in Cluster configuration.

8.3. Adding nodes to your cluster

Adding nodes to your cluster is easy and requires a single command with two parameters:

  1. A node name which must match the output of uname -n

  2. The IP address of the node.

Note

If DNS is configured properly, the tab-completion of drbdmanage is able to complete the IP of the given node name.

# drbdmanage add-node bravo 10.43.70.3

Here we assume that the command was executed on node 'alpha'. If the 'root' user is allowed to execute commands as 'root' on 'bravo' via ssh, then the node 'bravo' will automatically join your cluster.

If ssh access with public-key authentication is not possible, drbdmanage will print a join command that has to be executed on node 'bravo'. You can always query drbdmanage to output the join command for a specific node:

# drbdmanage howto-join bravo
# drbdmanage join -p 6999 10.43.70.3 1 alpha 10.43.70.2 0 cOQutgNMrinBXb09A3io

8.3.1. Types of DRBD Manage nodes

There are quite a few different types of DRBD Manage nodes; please see the diagram below.

Figure 5. DRBD Manage node types

The rationale behind the different types of nodes is as follows: currently, a DRBD9/DRBD Manage cluster is limited to roughly 30 nodes, which is the current DRBD9 limit of nodes per replicated resource. As DRBD Manage uses a DRBD volume itself (i.e., the control volume) to distribute the cluster information, DRBD Manage was also limited by the maximum number of DRBD9 nodes per resource.

The satellites concept relaxes that limit by splitting the cluster into:

  • Control nodes: Nodes having direct access to the control volume.

  • Satellite nodes: Nodes that are not directly connected to the control volume, but are able to receive the content of the control volume via a control node.

In a cluster there is one special node, which we call the "leader". The leader is selected from the set of control nodes and it is the only node that writes data to the control volume (i.e., it has the control volume in DRBD Primary role). All the other control nodes in the cluster automatically switch their role to a "satellite node" and receive their cluster information via TCP/IP, like ordinary satellite nodes. If the current leader node fails, the cluster automatically selects a new leader node among the control nodes.

Control nodes have:

  • Direct access to the control volume

  • One of them is in the leader role; the rest act like satellite nodes.

  • Local storage if it is a normal control node

  • No local storage if it is a pure controller

Satellite nodes have:

  • No direct access to the control volume, they receive a copy of the cluster configuration via TCP/IP.

  • Local storage if it is a normal satellite node

  • No local storage if it is a pure client

Figure 6. Cluster consisting of satellite nodes

External nodes:

  • Have no access to the control volume at all (no dedicated TCP/IP connection to a control node) and no local storage

  • Get their configuration via a different channel (e.g., DRBD configuration copied over via scp); see the sketch after this list

  • These are not the droids you are looking for: if you are not sure whether you want to use this type of node, you most likely do not need it.
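
As a purely illustrative sketch, handing the DRBD configuration of a resource 'backups' to the external node 'delta' could look like this (the path of the generated resource file may differ on your installation; locate it on a control node first):

# scp /var/lib/drbd.d/backups.res root@delta:/etc/drbd.d/   # source path is illustrative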

8.3.2. Adding a control node

# drbdmanage add-node bravo 10.43.70.3

8.3.3. Adding a pure controller node

# drbdmanage add-node --no-storage bravo 10.43.70.3

8.3.4. Adding a satellite node

Here we assume that the node charlie was not added to the cluster so far. The following command adds charlie as a satellite node.

# drbdmanage add-node --satellite charlie 10.43.70.4

8.3.5. Adding a pure client node

# drbdmanage add-node --satellite --no-storage charlie 10.43.70.4

8.3.6. Adding an external node

# drbdmanage add-node --external delta 10.43.70.5

8.4. Cluster configuration

Drbdmanage knows many configuration settings, such as the log level or the storage plugin that should be used (i.e., LVM, ThinLV, ThinPool, ZPool, or ThinZpool). Executing drbdmanage modify-config starts an editor that is used to specify these settings. The configuration is split into several sections. If an option is specified in the [GLOBAL] section, this setting is used in the entire cluster. Additionally, it is possible to specify settings per node and per site. Node sections follow the syntax [Node:nodename]. If an option is set both globally and per node, the node setting overrules the global setting.

It is also possible to group nodes into sites. In order to make node 'alpha' part of site 'mysite', you have to specify the 'site' option in alpha’s node section:

# drbdmanage modify-config
[Node:alpha]
site = mysite

It is then also possible to specify drbdmanage settings per site using [Site:] sections. Let's assume that you want to set the 'loglevel' option in general to 'INFO', for site 'mysite' to 'WARN', and for node 'alpha', which is also part of site 'mysite', to 'DEBUG'. This would result in the following configuration:

# drbdmanage modify-config
[GLOBAL]
loglevel = INFO

[Site:mysite]
loglevel = WARN

[Node:alpha]
site = mysite
loglevel = DEBUG

By executing drbdmanage modify-config without any options, you can edit global, per site and per node settings. It is also possible to execute 'modify-config' for a specific node. In this per-node view, it is possible to set further per-node specific settings like the storage plugin discussed in Configuring storage plugins.

8.5. Configuring storage plugins

Storage plugins are per node settings that are set with the help of the 'modify-config' sub command.

Let's assume you want to use the 'ThinLV' plugin for node 'bravo' and set its 'pool-name' option to 'mythinpool':

# drbdmanage modify-config --node bravo
[GLOBAL]
loglevel = INFO

[Node:bravo]
storage-plugin = drbdmanage.storage.lvm_thinlv.LvmThinLv

[Plugin:ThinLV]
pool-name = mythinpool

8.5.1. Configuring LVM

More recent versions of the 'LVM tools' support detection of file system signatures. Unfortunately, the feature set of lvcreate varies a lot between distributions: some versions support --wipesignatures, some support --yes, in all possible combinations, and none of them supports a generic force flag. If lvcreate detects an existing file system signature, it prompts for input and therefore halts processing. If you use modern 'LVM tools', set the following option in /etc/lvm/lvm.conf: wipe_signatures_when_zeroing_new_lvs = 0. Drbdmanage itself executes wipefs on created block devices.

If you use a version of 'LVM' where volumes restored from snapshots are not activated, which we have seen with the 'LvmThinPool' plugin, also set auto_set_activation_skip = 0 in /etc/lvm/lvm.conf.
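
Put together, the relevant /etc/lvm/lvm.conf excerpt could look like the following sketch (the section placement follows the stock lvm.conf layout; verify it against your distribution's file):

allocation {
    wipe_signatures_when_zeroing_new_lvs = 0
}
activation {
    auto_set_activation_skip = 0
}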

8.5.2. Configuring ZFS

For ZFS the same configuration steps apply, such as setting the 'storage-plugin' for the node that should make use of ZFS volumes. Please note that ZFS is not used as a file system here, but as a logical volume manager; the administrator is free to create any file system on top of the DRBD device backed by a ZFS volume. It is also important to note that if you use the ZFS plugin, all DRBD resources are created on ZFS, but if the node is a control node, it still needs LVM for its control volume.

In the most common case only the following steps are necessary.

# zpool create drbdpool /dev/sdX /dev/sdY
# drbdmanage modify-config --node bravo
[Node:bravo]
storage-plugin = drbdmanage.storage.zvol2.Zvol2

Currently it is not supported to switch storage plugins on the fly. The workflow is: add a new node, modify the configuration for that node, then make use of the node. Changing other settings (like the log level) on the fly is perfectly fine.

8.5.3. Discussion of the storage plugins

DRBD Manage has five supported storage plugins as of this writing:

  • Thick LVM (drbdmanage.storage.lvm.Lvm)

  • Thin LVM with a single thin pool (drbdmanage.storage.lvm_thinlv.LvmThinLv)

  • Thin LVM with thin pools for each volume (drbdmanage.storage.lvm_thinpool.LvmThinPool)

  • Thick ZFS (drbdmanage.storage.zvol2.Zvol2)

  • Thin ZFS (drbdmanage.storage.zvol2_thinlv.ZvolThinLv2)

For ZFS, legacy plugins (without the "2") also exist. New users, and users who have not used ZFS snapshots, should use or switch to the newer versions. In this particular case, an on-the-fly storage plugin switch is supported.

Here’s a short discussion of the relative advantages and disadvantages of these plugins.

Table 1. DRBD Manage storage plugins, comparison

Pools

  • lvm.Lvm: the VG is the pool

  • lvm_thinlv.LvmThinLv: a single thin pool

  • lvm_thinpool.LvmThinPool: one thin pool for each volume

Free space reporting

  • lvm.Lvm: exact

  • lvm_thinlv.LvmThinLv: free space goes down as data is written and snapshots are taken; needs monitoring

  • lvm_thinpool.LvmThinPool: each pool carves some space out of the VG, but still needs to be monitored if snapshots are used

Allocation

  • lvm.Lvm: fully pre-allocated

  • both thin plugins: thinly allocated; needs nearly zero space initially

Snapshots

  • lvm.Lvm: not supported

  • both thin plugins: fast, efficient (copy-on-write)

Stability

  • lvm.Lvm: well established, known code, very stable

  • both thin plugins: some kernel versions have bugs regarding thin LVs, destroying data

Recovery

  • lvm.Lvm: easiest; a text editor and/or the LVM configuration archives in /etc/lvm/, in the worst case dd with offset/length

  • lvm_thinlv.LvmThinLv: all data in one pool; might require running thin_check across everything (needs CPU, memory, time)

  • lvm_thinpool.LvmThinPool: independent pools, so not all volumes are damaged at the same time; faster thin_check (less CPU, memory, time)

8.6. Creating and deploying resources/volumes

In the following scenario we assume that the goal is to create a resource 'backups' with a size of 500 GB that is replicated among 3 cluster nodes. First we show how to achieve this goal in individual steps, then we show a shortcut that achieves the same result in a single command:

First, we create a new resource:

# drbdmanage add-resource backups

Second, we create a new volume within that resource:

# drbdmanage add-volume backups 500GB

If we had not used 'add-resource' in the first step, drbdmanage would have noticed that the resource did not exist and would have created it automatically.

The third step is to deploy the resource to 3 cluster nodes:

# drbdmanage deploy-resource backups 3

In this case drbdmanage chooses 3 nodes that fit all requirements best, which is by default the set of nodes with the most free space in the drbdpool volume group. We will see how to manually assign resources to specific nodes in a moment.

As deploying a new resource/volume to a set of nodes is a very common task, drbdmanage provides the following short-cut:

# drbdmanage add-volume backups 500GB --deploy 3

Manual deployment can be achieved by assigning a resource to specific nodes. For example, if you decide to assign the 'backups' resource to 'bravo' and 'charlie', you should execute the following steps:

# drbdmanage add-volume backups 500GB
# drbdmanage assign-resource backups bravo
# drbdmanage assign-resource backups charlie

8.7. Managing snapshots

In the following we assume that the ThinLV plugin is used on all nodes that have deployed resources from which snapshots should be taken. For further information on how to configure the storage plugin, please refer to Cluster configuration.

8.7.1. Creating a snapshot

Here we continue the example presented in the previous sections, namely nodes 'alpha', 'bravo', 'charlie', and 'delta' with a resource 'backups' deployed on the first three nodes. The name of the snapshot will be 'snap_backups', and we want the snapshot to be taken on nodes 'bravo' and 'charlie'.

# drbdmanage create-snapshot snap_backups backups bravo charlie

8.7.2. Restoring a snapshot

In the following we want to restore the content of the snapshot 'snap_backups' to a new resource named 'res_backup_from_snap'.

# drbdmanage restore-snapshot res_backup_from_snap backups snap_backups

This will create a new resource with the name 'res_backup_from_snap'. This resource is then automatically deployed to those nodes where the resource 'backups' is currently deployed.
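
To verify where the restored resource ended up, you can, for example, reuse the assignments overview mentioned at the beginning of this chapter:

# drbdmanage assignments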

8.7.3. Removing a snapshot

An existing snapshot can be removed as follows:

# drbdmanage remove-snapshot backups snap_backups

8.8. Checking the state of your cluster

Drbdmanage provides various commands to check the state of your cluster. These commands start with a 'list-' prefix and provide various filtering and sorting options. The '--groupby' option can be used to group and sort the output in multiple dimensions. Additional output can be turned on by using the '--show' option. In the following we show some typical examples:

# drbdmanage list-nodes
# drbdmanage list-volumes --groupby Size
# drbdmanage list-volumes --groupby Size --groupby Minor
# drbdmanage list-volumes --groupby Size --show Port

8.9. Setting options for resources

Currently, it is possible to set the following drbdsetup options:

  1. net-options

  2. peer-device-options

  3. disk-options

  4. resource-options

Additionally, it is possible to set DRBD event handlers.

Because options such as net-options can be set in the 'common' section as well as per resource, these commands provide corresponding switches.

Setting max-buffers for a resource 'backups' looks like this:

# drbdmanage net-options --max-buffers 2048 --resource backups

Setting this option in the common section looks like this:

# drbdmanage net-options --max-buffers 2048 --common

Additionally, there is always an '--unset-' option for every option that can be specified. So, unsetting max-buffers for a resource 'backups' looks like this:

# drbdmanage net-options --unset-max-buffers --resource backups

It is possible to visualize currently set options with the 'show-options' subcommand.
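
For example (further filter switches may be available; see the subcommand's help):

# drbdmanage show-options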

Setting net-options per site is also supported. Let's assume 'alpha' and 'bravo' should be part of site 'first', and 'charlie' and 'delta' should be part of site 'second'. Further, we want to use DRBD protocol 'C' within the two sites, and protocol 'A' between the sites 'first' and 'second'. This would be set up as follows:

# drbdmanage modify-config
[Node:alpha]
site = first

[Node:bravo]
site = first

[Node:charlie]
site = second

[Node:delta]
site = second
# drbdmanage net-options --protocol C --sites 'first:first'
# drbdmanage net-options --protocol C --sites 'second:second'
# drbdmanage net-options --protocol A --sites 'first:second'

The '--sites' parameter follows a 'from:to' syntax, where currently 'from' and 'to' have symmetric semantics: setting an option for 'first:second' also sets it for 'second:first'.

DRBD event handlers can be set in the 'common' section and per resource:

# drbdmanage handlers --common --after-resync-target /path/to/script.sh
# drbdmanage handlers --common --unset-after-resync-target
# drbdmanage handlers --resource backups --after-resync-target /path/to/script.sh

8.10. Rebalancing data with DRBD Manage

Rebalancing data means moving some assignments around, to make better use of the available resources. We’ll discuss the same example as for the manual workflow.

Assume, as an example policy, that data needs to be available on 3 nodes, so you need at least 3 servers for your setup.

Now, as your storage demands grow, you will encounter the need for additional servers. Rather than having to buy 3 more servers at the same time, you can rebalance your data across a single additional node.

Figure 7. DRBD data rebalancing

First, you need to add the new machine to the cluster; see Adding nodes to your cluster for the commands.

The next step is to add the assignment:

# drbdmanage assign <resource> <new-node>

Now you need to wait for the (initial) sync to finish; you can, for example, use the drbdadm status command, optionally followed by the resource name.
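
For example, using the 'backups' resource from the earlier examples:

# drbdadm status backups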

One of the nodes that still has the data will show a status like

replication:SyncSource peer-disk:Inconsistent done:5.34

while the target node will have a state of SyncTarget.

When the target assignment reaches a state of UpToDate, you have a full additional copy of your data on this node; now it is safe to remove the assignment from another node:

# drbdmanage unassign <resource> <old-node>

And voilà - you moved one assignment, in two[4] easy steps!

8.11. Getting help

The easiest way to get an overview about drbdmanage’s subcommands is to read the main man-page (man drbdmanage).

A quick way to list available commands on the command line is to type drbdmanage list.

Further information on subcommands (e.g., list-nodes) can be retrieved in three ways:

# man drbdmanage-list-nodes
# drbdmanage list-nodes -h
# drbdmanage help list-nodes

Using the 'help' subcommand is especially helpful when drbdmanage is executed in interactive mode (drbdmanage interactive).
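
For example, asking for help on a subcommand from within the interactive shell might look like this (the interactive prompt shown here is illustrative):

# drbdmanage interactive
> help list-nodes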

One of the most helpful features of drbdmanage is its rich tab-completion, which can be used to complete basically every object drbdmanage knows about (e.g., node names, IP addresses, resource names, and so on). In the following we show some possible completions and their results:

# drbdmanage add-node alpha 1<tab> # completes the IP address if hostname can be resolved
# drbdmanage assign-resource b<tab> c<tab> # drbdmanage assign-resource backups charlie

If tab-completion does not work out of the box, try to source the appropriate file:

# source /etc/bash_completion.d/drbdmanage # or
# source /usr/share/bash_completion/completions/drbdmanage

1. If a host is also a storage node, it will use a local copy of an image if that is available
2. LINSTOR must be installed on the Cinder node. Please see the note at [s-openstack-linstor-drbd-external-NOTE].
3. The OpenStack GUI
4. Or three, if you count waiting for the UpToDate state.