Data protection and business continuity

This article describes ThoughtSpot’s data protection and business continuity strategy for ThoughtSpot Cloud.

ThoughtSpot employs several methods to protect your cluster data and ensure business continuity in the event of a hardware or software failure or a catastrophic event.

Recovery Point Objective and Recovery Time Objective

A Recovery Point Objective (RPO) is the maximum interval between backups of a system, and therefore the maximum window of recent changes that can be lost in a failure. ThoughtSpot’s RPO is 24 hours. Every 24 hours, ThoughtSpot automatically takes a backup and stores it in Amazon S3, in the same region as the cluster, but in a different Availability Zone.

A Recovery Time Objective (RTO) is the targeted duration between a failure and the return to operation. ThoughtSpot’s RTO is 2 hours.
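As a worked illustration of these two objectives, the following minimal Python sketch computes how much recent change is at risk for a given failure time and when service is targeted to resume. It considers only the daily backup. The function name `recovery_window` and the sample timestamps are hypothetical and are not part of any ThoughtSpot tooling.

```python
from datetime import datetime, timedelta

RPO = timedelta(hours=24)  # backup interval stated above
RTO = timedelta(hours=2)   # targeted time to return to operation

def recovery_window(last_backup: datetime, failure: datetime) -> tuple[timedelta, datetime]:
    """Return (changes at risk since the last backup, targeted return to operation).

    Hypothetical helper for illustration only.
    """
    if failure < last_backup:
        raise ValueError("failure must not precede the last backup")
    at_risk = failure - last_backup
    return at_risk, failure + RTO

# Sample values (assumed): backup at midnight, failure at 18:30 the same day.
last_backup = datetime(2024, 1, 10, 0, 0)
failure = datetime(2024, 1, 10, 18, 30)
at_risk, target = recovery_window(last_backup, failure)
print(at_risk)  # 18:30:00
print(target)   # 2024-01-10 20:30:00
```

With a 24-hour RPO, the at-risk window in this sketch never exceeds 24 hours, and the targeted return to operation is always 2 hours after the failure.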

Snapshots and backups

ThoughtSpot uses both snapshots and backups to protect the data in your ThoughtSpot cluster.

Snapshots

A snapshot is a point-in-time image of your running cluster that can later be restored to the same cluster. ThoughtSpot takes snapshots of your cluster once an hour and stores them in persistent storage attached to your cluster. If an error occurs while you are changing your cluster’s environment or the structure of a table, you can file an issue with ThoughtSpot Support to restore the cluster from a snapshot.

Backups

A backup is a procedure that stores a snapshot outside a ThoughtSpot cluster. ThoughtSpot takes dataless backups of your cluster once a day, at midnight local cluster time, and stores them in an Amazon S3 bucket in the same region as the cluster. ThoughtSpot can use a backup to restore a cluster to a prior state, in any Availability Zone in the same region as the original cluster.

Backups contain the following information:

  • Metadata:

    • Users, groups, answers, Liveboards, visualizations, worksheets, data modeling settings, row level security filters, tables, columns

    • Configuration of connections to external cloud data warehouses

  • Scheduled jobs (for example, scheduled Liveboards)

  • Cluster details: ID, name, version

  • AWS configuration (label, region, S3 bucket name)

  • Cluster manager configuration (e.g., backup policy), and service configuration (e.g., service enabled or disabled, memory limits for service)

  • End-user license agreement (EULA) policy and file

  • ThoughtSpot Software artifacts: version, checksum of binaries

  • Hadoop layout

  • Firewall configuration

  • mailname and mailfromname

  • SAML configuration

  • Consumption pricing user activity

Backups DO NOT include the following information:

  • Search index tokens

  • Usage information

  • Traces
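To compare the two cadences described in this section, the sketch below estimates the age of the most recent snapshot and the most recent backup at the moment of a failure. It assumes that hourly snapshots land on the top of the hour, which this article does not state, and that timestamps are naive values in local cluster time. The function names are hypothetical.

```python
from datetime import datetime

def latest_snapshot(failure: datetime) -> datetime:
    """Most recent hourly snapshot, ASSUMING snapshots land on the hour."""
    return failure.replace(minute=0, second=0, microsecond=0)

def latest_backup(failure: datetime) -> datetime:
    """Most recent daily backup, taken at midnight local cluster time."""
    return failure.replace(hour=0, minute=0, second=0, microsecond=0)

# Sample failure time (assumed), expressed in local cluster time.
failure = datetime(2024, 1, 10, 18, 30)
print(failure - latest_snapshot(failure))  # 0:30:00
print(failure - latest_backup(failure))    # 18:30:00
```

Under these assumptions, a snapshot is at most an hour old but lives on storage attached to the cluster, while a backup can be up to a day old but is stored outside the cluster in Amazon S3.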

Automation of data protection

ThoughtSpot’s data protection procedure is fully automated. ThoughtSpot automatically takes snapshots of your cluster once an hour and stores them on EBS disks attached to your VM instance, and automatically takes backups of your cluster once a day and stores them in Amazon S3. You do not need to file an issue with ThoughtSpot to request recovery. However, if you have an issue with cluster availability and the cluster does not recover automatically, you can also open a support request with ThoughtSpot Support.

If a cluster is not available, ThoughtSpot Support receives an alert after 15 minutes of unavailability. Then, ThoughtSpot Support investigates the cluster to determine the problem and tries to fix it. If necessary, ThoughtSpot Support restores the cluster from a snapshot or backup.
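The 15-minute alert threshold described above can be expressed as a simple check. This is only an illustrative sketch, not ThoughtSpot’s actual monitoring logic. The function name `should_alert` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

ALERT_AFTER = timedelta(minutes=15)  # unavailability threshold stated above

def should_alert(unavailable_since: datetime, now: datetime) -> bool:
    """True once the cluster has been unavailable for at least 15 minutes."""
    return now - unavailable_since >= ALERT_AFTER

# Sample values (assumed): the cluster became unavailable at 18:30.
down = datetime(2024, 1, 10, 18, 30)
print(should_alert(down, datetime(2024, 1, 10, 18, 40)))  # False
print(should_alert(down, datetime(2024, 1, 10, 18, 45)))  # True
```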