Hi all!

I hope that you are doing well, safe and healthy!

In this journal, I would like to share and explain a case study related to one of the great features provided by Google Kubernetes Engine (GKE). Yup, it is Backup for GKE. As the name suggests, this feature is intended to back up and restore the workloads in the Kubernetes clusters we have on Google Cloud.

What can be backed up?

  • Configuration: Kubernetes resources, including manifests and the cluster state.
  • Volume backups: Application data corresponding to PersistentVolumeClaim resources.

There are many possible backup and restore scenarios to choose from. For example, we can back up only the configuration and restore it to a newly created cluster, or we can back up the entire cluster and restore it to the source cluster for disaster recovery. Besides that, we can also set a scheduled job to run backups automatically. It will save our lives when shocking incidents unexpectedly happen.

From the main documentation here, there are two main components to focus on: a service and an agent. The service acts as the controller for Backup for GKE, while the agent runs automatically in every cluster where backups or restores are performed. Below is the architecture diagram:

Now, in this journal, we will do a simple backup and restore scenario. I have a GKE cluster in us-central1 named gke-central1-a. There, I deploy a WordPress application with a persistent disk provided by a PersistentVolumeClaim and run a backup every hour. Then I create a new GKE cluster named gke-central1-b and restore the backup created before, to test and ensure that the backup runs smoothly and successfully.

What will I back up? The GKE cluster configuration and the WordPress data stored on the persistent disk. So, when the restore is complete, WordPress is automatically loaded and can be accessed from outside.

Okay! Let’s jump in!

Here, I will use the Google Cloud SDK instead of the Google Cloud Console. So, I will define the environment variables first, like below:

# Define env variables
export PROJECT_ID=$(gcloud config get-value project)
export PROJECT_USER=$(gcloud config get-value core/account)
export PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")
export IDNS=${PROJECT_ID}.svc.id.goog

export GCP_REGION="us-central1"
export GCP_ZONE="us-central1-c"

export NETWORK_NAME="default"

Make sure the configuration is correct, and then set the default region and zone based on the variables defined before.

gcloud config set compute/region $GCP_REGION
gcloud config set compute/zone $GCP_ZONE

We can verify with the command below:

gcloud config list

We need to enable the APIs used by each service: the Compute Engine and Kubernetes Engine APIs for the cluster and its nodes, the Cloud Storage API for storage, and the Backup for GKE API for backup and restore tasks.

gcloud services enable compute.googleapis.com \
    container.googleapis.com \
    storage.googleapis.com \
    gkebackup.googleapis.com

We can verify the enabled services with the command below:

gcloud services list --enabled
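
If you only want to make sure the Backup for GKE API made it into that list, a quick grep is enough:

# quick sanity check that the Backup for GKE API is enabled
gcloud services list --enabled | grep gkebackup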

Next, we have to create a GKE cluster. I create it as a public cluster with a multi-zone configuration in the us-central1 region. Do not forget to add --addons=BackupRestore to install the Backup for GKE agent in this cluster.

export CLUSTER_NAME="gke-central1-a"
gcloud beta container clusters create $CLUSTER_NAME \
    --project=$PROJECT_ID  \
    --region=$GCP_REGION \
    --addons=BackupRestore \
    --num-nodes=1 \
    --enable-autoupgrade --no-enable-basic-auth \
    --no-issue-client-certificate --enable-ip-alias \
    --metadata disable-legacy-endpoints=true \
    --workload-pool=$IDNS  

Verify the cluster and make sure all worker nodes are in the Ready state.

gcloud container clusters get-credentials $CLUSTER_NAME \
--region $GCP_REGION --project $PROJECT_ID

kubectl get nodes -o wide
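
Since --addons=BackupRestore is what installs the agent, it is also worth double-checking that the addon really landed on the cluster. Here is a small sketch, assuming the addon shows up under addonsConfig.gkeBackupAgentConfig in the cluster description (the field name may differ across gcloud versions):

# confirm the Backup for GKE agent addon is enabled on the cluster
gcloud container clusters describe $CLUSTER_NAME \
    --region $GCP_REGION --project $PROJECT_ID \
    --format="value(addonsConfig.gkeBackupAgentConfig)"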

For the WordPress application, I am using the tutorial from here. You can follow along, or use your own application instead.

# create Kustomize file
cat > ./kustomization.yaml << EOF
secretGenerator:
- name: mysql-pass
  literals:
  - password=gkebackup2022
EOF

# download manifests
curl -LO https://k8s.io/examples/application/wordpress/mysql-deployment.yaml
curl -LO https://k8s.io/examples/application/wordpress/wordpress-deployment.yaml 

# update Kustomize file (note the >> which appends to the original file)
cat >> ./kustomization.yaml << EOF
resources:
  - mysql-deployment.yaml
  - wordpress-deployment.yaml
EOF

# deploy (using built-in kustomize feature of kubectl)
kubectl apply -k ./

Make sure the pods are running, the external IP of the service appears, and the volume state is Bound.

kubectl get pods
kubectl get svc
kubectl get pvc
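
To grab the external IP directly, assuming the service from the tutorial manifests is named wordpress, a jsonpath query like this should work once the LoadBalancer has been provisioned:

# fetch the LoadBalancer IP of the wordpress service
kubectl get svc wordpress \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'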

Try to access the application from the browser, like below.

Now, we are going to set up the backup strategy. We will back up the entire Kubernetes cluster environment, including volume data and secrets.

Before that, make sure the region supports Backup for GKE, because not all regions do. We can check with the command below:

gcloud alpha container backup-restore locations list \
    --project $PROJECT_ID

For the backup strategy, we have to create a BackupPlan first. I will create it using the environment variables defined below:

# Define env variables for GKE Backup
export BACKUP_PLAN="gke-central1-a-backup"
export LOCATION="us-central1"
export CLUSTER="projects/$PROJECT_ID/locations/$GCP_REGION/clusters/$CLUSTER_NAME"
export RETAIN_DAYS="3"

Then, create a BackupPlan that backs up all namespaces, including secrets and volume data. I also set a cron schedule that runs a backup automatically every hour and set the backup retention to 3 days.

gcloud alpha container backup-restore backup-plans create $BACKUP_PLAN \
    --project=$PROJECT_ID \
    --location=$LOCATION \
    --cluster=$CLUSTER \
    --all-namespaces \
    --include-secrets \
    --include-volume-data \
    --cron-schedule="0 * * * *" \
    --backup-retain-days=$RETAIN_DAYS \
    --locked

Verify the newly created BackupPlan:

gcloud alpha container backup-restore backup-plans list \
--project=$PROJECT_ID \
--location=$LOCATION
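
To inspect the full details of the plan (schedule, retention, scope), describe should also be available in this alpha command group:

gcloud alpha container backup-restore backup-plans describe $BACKUP_PLAN \
    --project=$PROJECT_ID \
    --location=$LOCATION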

We can also verify it from the Cloud Console:

How do we back up using the BackupPlan created before? The command below creates a manual backup named manual-backup1 with the appropriate BackupPlan, location, and project.

export BACKUP=manual-backup1
gcloud alpha container backup-restore backups create $BACKUP \
    --project=$PROJECT_ID \
    --location=$LOCATION \
    --backup-plan=$BACKUP_PLAN \
    --wait-for-completion

Wait and ensure the backup is successful.
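
We can also check the backup state from the CLI. Assuming the backups subgroup supports describe in your gcloud version, something like this shows the state, which should end up as SUCCEEDED:

gcloud alpha container backup-restore backups describe $BACKUP \
    --project=$PROJECT_ID \
    --location=$LOCATION \
    --backup-plan=$BACKUP_PLAN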

All right. After the backup task is done, we need to perform a restore to make sure the backup works correctly.

First, I will create a second GKE cluster with the same specifications as the first one.

export CLUSTER_NAME2="gke-central1-b"
gcloud beta container clusters create $CLUSTER_NAME2 \
    --project=$PROJECT_ID  \
    --region=$LOCATION \
    --addons=BackupRestore \
    --num-nodes=1 \
    --enable-autoupgrade --no-enable-basic-auth \
    --no-issue-client-certificate --enable-ip-alias \
    --metadata disable-legacy-endpoints=true \
    --workload-pool=$IDNS   
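
Then fetch the credentials for the new cluster and make sure its worker nodes are Ready, just like before:

gcloud container clusters get-credentials $CLUSTER_NAME2 \
    --region $GCP_REGION --project $PROJECT_ID

kubectl get nodes -o wide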

As with the BackupPlan, I define the environment variables first:

export RESTORE_LOCATION="us-central1"
export RESTORE_PLAN="gke-central1-b-restore"
export CLUSTER2="projects/$PROJECT_ID/locations/$GCP_REGION/clusters/$CLUSTER_NAME2"
export CLUSTER_RESOURCE_CONFLICT_POLICY="use-backup-version"
export NAMESPACED_RESOURCE_RESTORE_MODE="delete-and-restore"
export VOLUME_DATA_RESTORE_POLICY="restore-volume-data-from-backup"

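Next, create the RestorePlan that ties those policies to the second cluster. A minimal sketch of that command, built from the variables above, looks like this (flag names in this alpha command group can vary between gcloud versions, so double-check with gcloud alpha container backup-restore restore-plans create --help):

gcloud alpha container backup-restore restore-plans create $RESTORE_PLAN \
    --project=$PROJECT_ID \
    --location=$RESTORE_LOCATION \
    --backup-plan=projects/$PROJECT_ID/locations/$LOCATION/backupPlans/$BACKUP_PLAN \
    --cluster=$CLUSTER2 \
    --all-namespaces \
    --cluster-resource-conflict-policy=$CLUSTER_RESOURCE_CONFLICT_POLICY \
    --namespaced-resource-restore-mode=$NAMESPACED_RESOURCE_RESTORE_MODE \
    --volume-data-restore-policy=$VOLUME_DATA_RESTORE_POLICY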

Then, run the manual restore with the appropriate RestorePlan and the backup created before.

export RESTORE="manual-restore1"
gcloud alpha container backup-restore restores create $RESTORE \
    --project=$PROJECT_ID \
    --location=$GCP_REGION \
    --restore-plan=$RESTORE_PLAN \
    --backup=projects/$PROJECT_ID/locations/$LOCATION/backupPlans/$BACKUP_PLAN/backups/$BACKUP \
    --wait-for-completion

Wait until the restore is successful. Verify that the pods are running, the PVCs are Bound, and the external IP is accessible.
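
The same checks we ran on the first cluster apply here, and if the restores subgroup in your gcloud version supports describe, we can confirm the restore state from the CLI as well:

# check the restored workloads on gke-central1-b
kubectl get pods
kubectl get svc
kubectl get pvc

# optionally inspect the restore itself
gcloud alpha container backup-restore restores describe $RESTORE \
    --project=$PROJECT_ID \
    --location=$GCP_REGION \
    --restore-plan=$RESTORE_PLAN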

Cheers!