Troubleshooting

The main tool used to investigate issues between Ember-CSI and the Orchestrator are OpenShift/Kubernetes status and logs.

Ember-CSI runs 2 types of services, one is the controller and the other is the node type. While the controller takes care of the management operations (create, delete, map/unmap, snapshots, etc.) the node mostly takes care of doing the local attach and detach on the hosts that are running the pods.

These services follow the CSI specification, exposing all their operations through a gRPC interface that needs to be translated into OpenShift/Kubernetes objects. The sidecars present in the Ember-CSI pods are responsible for the translation.

Status

The first thing we need to do when we encounter an issue is make sure that all the containers in the Ember-CSI pods, the driver container and the sidecars, are running and that their restart counts are not increasing.

Instead of looking at all the pod in our deployment we can use the fact that the operator adds the embercsi_cr label to filter for the pods of a specific backend:

$ # On OpenShift
$ oc get pod -n <cluster-namespace> -l embercsi_cr=<backend_name> -o wide

$ # On Kubernetes
$ kubectl get pod -n <cluster-namespace> -l embercsi_cr=<backend_name> -o wide

Or the pods for all the Ember-CSI backends:

$ oc get pod -n <cluster-namespace> -l embercsi_cr -o wide

When using an iSCSI or FC backend we need to make sure that the system daemons required for the connections are running and they are not reporting errors if we encounter issues on the following operations:

  • Creating a volume from a source (volume or snapshot): On some drivers this is not a backend assisted operation, so the resources in the backend need to be accessed in the controller node.
  • Creating or destroying a pod that uses an Ember-CSI volume:

If we are running the daemons as systemd services on baremetal, we can check them running:

$ systemctl status iscsid multipathd
$ sudo journalctl -u iscsid -u multipathd

On the other hand, if we are running the daemons in the foreground inside containers, we’ll have to check the containers status and logs themselves.

Logs

One of the most versatile tools to debug issues in general are the logs, and Ember-CSI is no different.

The logs we’ll have to check will depend on the operations that are failing:

  • If it’s creating/deleting a volume or creating/deleting a snapshot, we should look into the Ember-CSI controller pod, primarily the driver container.
  • Creating/destroying a pod that uses a volume is one of the most complex operations, and it requires the controller pod, the node pod, and the kubelet, so we’ll have to look into all their logs.

By default Ember-CSI logs will be on INFO level and they can only be changed to DEBUG when creating the Storage Backend in the Advanced Settings section:

_images/advanced-settings.png

By setting the Debug logs checkbox:

_images/01-debug-logs.png

CSC

When debugging issues on complex flows, it’s very convenient to be able to test the individual tasks that form the flows. For that purpose the Ember-CSI has created containers with the csc tool for each of the CSI specs.

The csc tool allows us to execute specific CSI operations directly against an Ember-CSI service.

For example, we could run a create volume operation completely bypassing the Orchestrator. This way we could focus on the Ember-CSI code itself and the interactions with the storage solutions, removing the interactions with other elements such as OpenShift/Kubernetes scheduler and the sidecars.

Neither Kubernetes nor OpenShift allows adding containers to a running Pod, but there is an Alpha feature called Ephemeral Containers designed for debugging purposes that can do it.

We need to have the feature gate EphemeralContainers enabled in our Orchestrator. Specifically on the API, Scheduler, and Kubelet: --feature-gates=EphemeralContainers=true.

If it’s enabled we can add an Ephemeral container with the csc command to our running pod.

For the following steps we’ll assume we have used the name example as our Backend name.

First we check the CSI version that is using Ember-CSI:

$ oc describe pod example-controller-0|grep X_CSI_SPEC_VERSION
X_CSI_SPEC_VERSION:        1.0

Now that we know we are running CSI v1.0 we know the csc container we want to use: embercsi/csc:v1.0.0

With that we can write the csc.json file to add the Ephemeral Container:

{
  "apiVersion": "v1",
  "kind": "EphemeralContainers",
  "metadata": {
    "name": "example-controller-0"
  },
  "ephemeralContainers": [
    {
      "command": ["tail"],
      "args": ["-f", "/dev/null"],
      "image": "embercsi/csc:v1.0.0",
      "imagePullPolicy": "IfNotPresent",
      "name": "csc",
      "stdin": true,
      "tty": true,
      "terminationMessagePolicy": "File",
      "env": [ {"name": "CSI_ENDPOINT",
                "value": "unix:///csi-data/csi.sock"} ],
      "volumeMounts": [
          {
              "mountPath": "/csi-data",
              "mountPropagation": "HostToContainer",
              "name": "socket-dir"
          }
      ]
    }
  ]
}

And, assuming we don’t have any other Ephemeral Containers, we add it by replace the current value:

$ oc replace --raw /api/v1/namespaces/default/pods/example-controller-0/ephemeralcontainers -f csc.json

If we don’t want to create a file we can do a one-liner by using echo a piping it to the oc replace command and setting the file contents to stdin with -f -.

Now that we have added the Ephemeral Container we can confirm it is running looking at the description of the controller pod and going to the Ephemeral Containers section and checking the State:

$ oc describe pod example-controller-0

...

Ephemeral Containers:
  csc:
    Container ID:  docker://e52d25a53af77a6f660d171504aa9dc6c2c3d405a9af20451054fadba969c84a
    Image:         embercsi/csc:v1.0.0
    Image ID:      docker-pullable://embercsi/csc@sha256:5433e0042725398b9398be1b73d43cc96c77893cf4b77cafca77001fa533cd29
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
    State:          Running
      Started:      Thu, 13 Aug 2020 14:18:23 +0000
    Ready:          False
    Restart Count:  0
    Environment:
      CSI_ENDPOINT:  unix:///csi-data/csi.sock
    Mounts:
      /csi-data from socket-dir (rw)

When we have the shell container running we can run csc commands by attaching to the shell. For example to see the help:

$ oc attach -it example-controller-0 -c csc
If you don't see a command prompt, try pressing enter.
/ # csc
NAME
    csc -- a command line container storage interface (CSI) client

SYNOPSIS
    csc [flags] CMD

AVAILABLE COMMANDS
    controller
    identity
    node

Use "csc -h,--help" for more information

Warning

Just like with normal containers, once you add an Ephemeral Container to a pod you cannot remove it, so be sure to detach from the container and not exit the shell, or the container will no longer be running and you won’t be able to use it (you cannot run exec on an Ephemeral Container).

Note

To detach from the csc container shell you must type the escape sequence Ctrl+P followed by Ctrl+Q.

CRDs

Ember-CSI uses OpenShift/Kubernets etcd service to store metadata of its resources in the form of CRDs. Existing CRDs are:

  • Volume: Stores each volume’s status as well as the information necessary to locate them in the storage solution.
  • Snapshot: Stores the information necessary to locate each snapshot in the storage solution.
  • Connection: Stores the connection information needed for a node to connect to a volume.
  • KeyValue: Stores the connector information needed to map the volumes to the nodes on the storage solution.

These CRDs are just JSON dictionaries with all the information Ember-CSI needs to operate, and in some cases it can be useful to examine them to see internal information.