Choose the backup strategy

Consider the strategies for backing up your ThoughtSpot cluster. Depending on your situation and your goals, you can choose to use either a snapshot or a backup.


ThoughtSpot Training

  • For best results when considering backup strategies and processes, we recommend that you take the following ThoughtSpot U course: Snapshots and Backups.

  • See other training resources at ThoughtSpot U.


Contact ThoughtSpot Support for help restoring from a snapshot or backup.

Snapshots

A snapshot is a point-in-time image of your running cluster. You take and restore snapshots while the cluster is running. Each cluster has a periodic snapshot configuration enabled by default, which instructs the system to take snapshots at regular intervals. Creating a snapshot can take as little as 20 seconds, but the time depends on the number of objects in your cluster. After creation, a snapshot persists on disk in the cluster’s HDFS.

You can also create a snapshot manually. You should create a snapshot before making any changes to your cluster’s environment, loading a large amount of new data, or changing the structure of a table. A snapshot may only be restored to the same cluster on which it was taken. The cluster software release version must match the snapshot release version.

If you need to move data between clusters or restore to a cluster that was updated to a new release, contact ThoughtSpot Support.
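
The two restore rules above (same cluster, matching release) can be expressed as a simple pre-check. The following Python sketch is illustrative only: `SnapshotInfo`, `ClusterInfo`, and `snapshot_restore_blockers` are hypothetical names, not part of any ThoughtSpot tool, and the identifier and release strings are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotInfo:
    cluster_id: str  # hypothetical ID of the cluster the snapshot was taken on
    release: str     # hypothetical release string recorded with the snapshot


@dataclass(frozen=True)
class ClusterInfo:
    cluster_id: str  # hypothetical ID of the restore target
    release: str     # release the target cluster currently runs


def snapshot_restore_blockers(snapshot: SnapshotInfo, cluster: ClusterInfo) -> list[str]:
    """Return the reasons a restore is not allowed; an empty list means both rules hold."""
    blockers = []
    if snapshot.cluster_id != cluster.cluster_id:
        blockers.append("snapshot was taken on a different cluster")
    if snapshot.release != cluster.release:
        blockers.append("cluster release does not match snapshot release")
    return blockers
```

If the returned list is not empty, the snapshot cannot be restored directly, and the guidance above applies: contact ThoughtSpot Support.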

In summary, here are the important considerations when choosing the snapshot strategy:

Purpose

Restore a cluster to a particular point in time.

Storage

In the cluster’s HDFS.

Advantages

Can be taken on, or restored to, a running cluster.

Fastest create and restore.

Limitations

Restoring a snapshot discards all data, state, and metadata created between snapshot creation and restore.

Snapshots do not copy over anything that is in the home directories or root partitions of an instance. If you routinely add flat files or scripts directly, make separate copies of these flat files and scripts.

Lost if the HDFS name node fails, if you lose multiple disks, or if the entire cluster is destroyed.

Can be restored only to the cluster on which the snapshot was taken.

Backups

A backup is a procedure that stores a snapshot outside of a ThoughtSpot cluster. Backups are stored in a directory on a local or network file system. You can store all of the data associated with a snapshot, a portion of that data, or only metadata.

There is no default configuration enabled for backing up a cluster. You can configure a periodic backup policy yourself, or you can take backups manually. Periodic backups protect your company from losing data or user work.

You can use a backup to restore a cluster to a prior state or to a differently configured appliance. You can also use a backup to move a cluster from an appliance to a virtual cluster, or vice versa.

In summary, here are the important considerations when choosing the backup strategy:

Purpose

Restore a cluster to a prior state.

Move a cluster to a different hardware, cloud, or VMware appliance.

Change a cluster that is not High Availability (HA) to a High Availability cluster, or change a High Availability cluster to a cluster that is not High Availability.

Restore to a cluster that runs a different release from the one where the backup was taken.

Storage

Several options:

  • Outside the cluster on a local disk

  • Outside the cluster on an NAS disk

  • In an S3 bucket, for an AWS cluster

  • In a GCS bucket, for a GCP cluster

Advantages
  • Very stable.

  • Can be used to recover from data loss or corruption, even if the cluster is destroyed.

  • Can be full, lightweight, or dataless.

Limitations
  • Requires deleting the existing cluster first.

  • You are responsible for validating your backup configuration as viable for restoring a cluster.

  • Backups do not copy over anything that is in the home directories or root partitions of an instance. If you routinely add flat files or scripts directly, make separate copies of these flat files and scripts.

  • As a best practice, maintain multiple backups.

  • Typically very large in size.
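
The considerations in the two summaries can be condensed into a small decision helper. The following Python sketch is illustrative only: `RestoreNeeds` and `choose_strategy` are hypothetical names, not part of any ThoughtSpot tool, and the rules cover only the points listed in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestoreNeeds:
    same_cluster: bool = True           # restore target is the cluster the image was taken on
    same_release: bool = True           # target runs the release the image was taken on
    survive_cluster_loss: bool = False  # image must outlive HDFS failure or cluster destruction
    change_ha_mode: bool = False        # switching between HA and non-HA


def choose_strategy(needs: RestoreNeeds) -> str:
    """Return "snapshot" or "backup" based on the summarized considerations."""
    needs_backup = (
        not needs.same_cluster
        or not needs.same_release
        or needs.survive_cluster_loss
        or needs.change_ha_mode
    )
    # A snapshot is the fastest option, but only when none of the
    # backup-only requirements apply.
    return "backup" if needs_backup else "snapshot"
```

For example, `choose_strategy(RestoreNeeds(survive_cluster_loss=True))` returns `"backup"`, because snapshots live in the cluster’s HDFS and are lost with it. In practice the two strategies complement each other, so treat the result as the minimum required, not as an exclusive choice.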

Offline backup cluster

The most robust strategy for backup and recovery requires having a backup cluster offline that is kept in sync with the production cluster. Then, if the production cluster fails, the backup cluster can be drafted to take its place with minimal loss of work and disruption to operations.

Details on this architecture, and instructions on setting it up, are available in the ThoughtSpot Disaster Recovery Guide.

Use secondary disks or your NAS for backups and snapshots. Do NOT use locations on the primary disk, such as /tmp, /home/admin, or /export/home/admin.
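
A script that writes backups can guard against the primary-disk locations named above. The following Python sketch is a hypothetical helper, not a ThoughtSpot utility. It checks only the three example paths from this article, and the comparison is purely lexical: it does not resolve symlinks, "..", or mount points.

```python
from pathlib import PurePosixPath

# The three primary-disk locations named in this article. The text says
# "such as", so this list is illustrative, not exhaustive.
PRIMARY_DISK_EXAMPLES = (
    PurePosixPath("/tmp"),
    PurePosixPath("/home/admin"),
    PurePosixPath("/export/home/admin"),
)


def is_on_named_primary_location(target: str) -> bool:
    """True if target equals, or sits under, one of the named primary-disk paths."""
    path = PurePosixPath(target)
    return any(path == base or base in path.parents for base in PRIMARY_DISK_EXAMPLES)
```

A backup script could call `is_on_named_primary_location(target_dir)` and refuse to continue when it returns `True`.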