Backup ArangoDB on Kubernetes

In a recent project, our architect decided to use ArangoDB as the database. I am not familiar with it, but it seems like one of those fancy hipster NoSQL things - at least it embeds a JavaScript engine (the core itself is written in C++). But who am I to judge?

As the responsible DevOps engineer, I needed a solution for backing up the database. There is this fancy Backup Operator, but unfortunately it is not available for the community version of ArangoDB. Of course, my customer, being a huge IT corporation with thousands of employees, did not want to pay money for the main data-holding component of this very project. A very reasonable decision. So I needed another way to back this up.

Fortunately, the ArangoDB container image comes with a handy little binary called arangodump which, surprise surprise, dumps the content of a database to the file system. From there, we just need to grab the dump and push it to the desired destination - in our case, an S3 instance, or Object Block Storage, as the hyperscaling provider of our choice calls it.

To put everything together, I created a Kubernetes CronJob. It uses a dedicated ConfigMap and a dedicated Secret. This is the ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: backup-config
  namespace: arangodb
data:
  admin-user: root
  db-url: ${endpoint}
  obs-endpoint: ${obs-endpoint}
  obs-bucket: ${obs-bucket}

Fill in the variables as needed. As you may have noticed, we are using Terraform to provision all of this, hence the placeholders. db-url is the service URL of the ArangoDB deployment (arangodump expects an endpoint scheme such as tcp:// or ssl://); both obs variables describe the desired backup destination.

This is the associated secret:

apiVersion: v1
kind: Secret
metadata:
  name: backup-secret
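  namespace: arangodb
# type is a top-level field of the Secret, not part of metadata
type: Opaque
# values under data must be base64-encoded
data: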
  access-key: ${accesskey}
  secret: ${secret}
  root-password: ${root-password}

In this case, access-key and secret are the API credentials of a technical user at the hyperscaler who is allowed to access the object storage and the bucket defined in the ConfigMap. root-password is for the database access.
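
One thing to watch out for: values under data in a Secret have to be base64-encoded. If your template variables carry the plain-text values instead, a stringData variant might be more convenient, because Kubernetes does the encoding when the object is stored. A minimal sketch, assuming the same placeholder names as above hold plain text:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: backup-secret
  namespace: arangodb
type: Opaque
# stringData accepts plain-text values; they end up base64-encoded under data
stringData:
  # quoted so that special characters in the rendered values do not break the YAML
  access-key: "${accesskey}"
  secret: "${secret}"
  root-password: "${root-password}"
```

Either form results in the same Secret in the cluster, so the CronJob does not need to change.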

The CronJob itself consists of two containers. An init container dumps the DB content to a temporary folder (as described above); once it has finished, the main container picks up the content and pushes it to the S3 instance with the help of the MinIO client. Both share the folder through an emptyDir volume. Check it out:

# batch/v1 needs Kubernetes 1.21+; batch/v1beta1 was removed in 1.25
apiVersion: batch/v1
kind: CronJob
metadata:
  name: arangodb-backup-cronjob
  namespace: arangodb
spec:
  schedule: "22 3 * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          name: arangodb-backup-job
          namespace: arangodb
        spec:
          initContainers:
            - name: dump-create
              image: "arangodb:latest"
              args:
                - "arangodump"
                - "--server.endpoint=$(ENDPOINT)"
                - "--server.username=$(USERNAME)"
                - "--server.password=$(PASSWORD)"
                - "--all-databases"
                - "--output-directory=/tmp/dump"
                - "--overwrite"
              volumeMounts:
                - name: dump
                  mountPath: /tmp/dump
              env:
              - name: "PASSWORD"
                valueFrom:
                  secretKeyRef:
                    name: backup-secret
                    key: root-password
              - name: "USERNAME"
                valueFrom:
                  configMapKeyRef:
                    name: backup-config
                    key: "admin-user"
              - name: "ENDPOINT"
                valueFrom:
                  configMapKeyRef:
                    name: backup-config
                    key: db-url
          restartPolicy: OnFailure
          containers:
            - name: db-dump-upload
              image: "${docker_registry}minio/mc"
              imagePullPolicy: IfNotPresent
              command: ["/bin/sh","-c"]
              args: ["mc alias set obs $OBS_ENDPOINT $ACCESSKEY $SECRETKEY && mc mirror /tmp/dump obs/$OBS_BUCKET/$(date -I)"]
              volumeMounts:
                - name: dump
                  mountPath: /tmp/dump
              env:
              - name: SECRETKEY
                valueFrom:
                  secretKeyRef:
                    name: backup-secret
                    key: secret
              - name: ACCESSKEY
                valueFrom:
                  secretKeyRef:
                    name: backup-secret
                    key: access-key
              - name: OBS_ENDPOINT
                valueFrom:
                  configMapKeyRef:
                    name: backup-config
                    key: obs-endpoint
              - name: OBS_BUCKET
                valueFrom:
                  configMapKeyRef:
                    name: backup-config
                    key: obs-bucket
          volumes:
            - name: dump
              emptyDir: {}

Go ahead and adjust the schedule and, of course, the S3 retention policies of the bucket to your needs - but that's it.
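
Besides the schedule, a few optional CronJob fields may be worth a look. The fragment below is only a sketch: none of these settings are part of the setup described above, and all values are assumptions you would tune yourself. It keeps runs from overlapping, limits retries and runtime, and trims the job history.

```yaml
spec:
  schedule: "22 3 * * *"          # every day at 03:22
  concurrencyPolicy: Forbid       # skip a run while the previous one is still active
  startingDeadlineSeconds: 600    # assumed value: give up if the job cannot start within 10 minutes
  successfulJobsHistoryLimit: 3   # keep the last three successful jobs around
  failedJobsHistoryLimit: 3       # keep the last three failed jobs for debugging
  jobTemplate:
    spec:
      backoffLimit: 2             # assumed value: retry a failed job at most twice
      activeDeadlineSeconds: 3600 # assumed value: stop a job that runs longer than an hour
      # template: the pod template stays exactly as in the CronJob above
```

Merge these fields into the existing manifest rather than applying the fragment on its own.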
