Alerts code reference

Learn about the alerts ThoughtSpot may generate.

This reference lists the alert messages that can appear in System Health > Overview > Critical Alerts and in the Alerts dashboard.
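The placeholders in the messages below, such as {{.Machine}} or {{.Perc}}, use the same syntax as Go's text/template package. Assuming they are expanded that way (this reference does not say how ThoughtSpot fills them in), the following sketch renders two of the messages from this page. The placeholder values and the `render` helper are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Message texts copied from this reference.
const (
	taskTerminatedMsg = "Task {{.Service}}.{{.Task}} terminated on machine {{.Machine}}"
	taskFlappingMsg   = "Task {{.Service}}.{{.Task}} terminated {{._actual_num_occurrences}} times in last {{._earliest_duration_str}}"
)

// render expands one alert message (hypothetical helper). A map holds the
// values so that keys such as "_actual_num_occurrences" work as well as
// "Machine"; missingkey=error turns an absent value into an error.
func render(msg string, values map[string]string) (string, error) {
	tmpl, err := template.New("alert").Option("missingkey=error").Parse(msg)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, values); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Hypothetical placeholder values, for illustration only.
	out, err := render(taskTerminatedMsg, map[string]string{
		"Service": "my_service", "Task": "worker", "Machine": "host-01",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // Task my_service.worker terminated on machine host-01
}
```

Using a map rather than a struct is a design choice for the sketch: struct fields beginning with an underscore would be unexported and invisible to the template.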

Informational alerts

TASK_TERMINATED

Msg: Task {{.Service}}.{{.Task}} terminated on machine {{.Machine}}

Type: INFO

This alert is raised when a task terminates.

DISK_ERROR

Msg: Machine {{.Machine}} has disk errors

Type: INFO

Raised when a machine has disk errors.

ZK_AVG_LATENCY

Msg: Average Zookeeper latency is more than {{.Num}} msec

Type: INFO

Raised when average Zookeeper latency is above a threshold.

ZK_MAX_LATENCY

Msg: Max Zookeeper latency is more than {{.Num}} msec

Type: INFO

Raised when max Zookeeper latency is above a threshold.

ZK_MIN_LATENCY

Msg: Min Zookeeper latency is more than {{.Num}} msec

Type: INFO

Raised when min Zookeeper latency is above a threshold.

ZK_OUTSTANDING_REQUESTS

Msg: Number of outstanding Zookeeper requests exceeds {{.Num}}

Type: INFO

Raised when there are too many outstanding Zookeeper requests.

ZK_NUM_WATCHERS

Msg: Number of Zookeeper watchers exceeds {{.Num}}

Type: INFO

Raised when there are too many Zookeeper watchers.
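The Zookeeper alerts above correspond by name to metrics that Zookeeper itself reports through its `mntr` four-letter-word command (`zk_avg_latency`, `zk_max_latency`, `zk_min_latency`, `zk_outstanding_requests`, `zk_watch_count`). Whether ThoughtSpot reads these metrics this way is an assumption. The sketch below parses `mntr`-style output (tab-separated key/value lines, as returned by `echo mntr | nc <host> 2181`; ZooKeeper 3.5 and later require the command to be allowed in `4lw.commands.whitelist`) and flags every metric above a threshold. The thresholds and helper names are hypothetical; the real limits appear as {{.Num}} in the messages.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// metricToAlert pairs mntr keys with alert codes from this reference.
// The pairing is an assumption based on the names.
var metricToAlert = map[string]string{
	"zk_avg_latency":          "ZK_AVG_LATENCY",
	"zk_max_latency":          "ZK_MAX_LATENCY",
	"zk_min_latency":          "ZK_MIN_LATENCY",
	"zk_outstanding_requests": "ZK_OUTSTANDING_REQUESTS",
	"zk_watch_count":          "ZK_NUM_WATCHERS",
}

// parseMntr reads "key<TAB>value" lines and keeps numeric values only.
func parseMntr(output string) map[string]float64 {
	metrics := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue // skips multi-word values such as the version string
		}
		if v, err := strconv.ParseFloat(fields[1], 64); err == nil {
			metrics[fields[0]] = v
		}
	}
	return metrics
}

// checkZookeeper returns the alert codes whose metric exceeds its threshold.
func checkZookeeper(metrics, thresholds map[string]float64) []string {
	var alerts []string
	for key, limit := range thresholds {
		if v, ok := metrics[key]; ok && v > limit {
			alerts = append(alerts, metricToAlert[key])
		}
	}
	sort.Strings(alerts) // map iteration order is random; sort for stable output
	return alerts
}

func main() {
	sample := "zk_version\t3.4.6\nzk_avg_latency\t12\nzk_max_latency\t950\nzk_min_latency\t0\nzk_outstanding_requests\t3\nzk_watch_count\t40\n"
	// Hypothetical thresholds: msec for latencies, counts otherwise.
	thresholds := map[string]float64{
		"zk_avg_latency": 50, "zk_max_latency": 500, "zk_min_latency": 20,
		"zk_outstanding_requests": 10, "zk_watch_count": 10000,
	}
	fmt.Println(checkZookeeper(parseMntr(sample), thresholds))
}
```

With the sample input, only the maximum latency (950 msec against a limit of 500) is over its threshold, so the program prints `[ZK_MAX_LATENCY]`.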

MASTER_ELECTION

Msg: {{.Machine}} elected as Orion Master

Type: INFO

Raised when a new Orion Master is elected.

PERIODIC_BACKUP

Msg: {{.Process}} periodic backup for policy {{.Name}} failed.

Type: INFO

Raised when a periodic backup fails.

PERIODIC_SNAPSHOT

Msg: {{.Process}} periodic snapshot {{.Name}} failed.

Type: INFO

Raised when a periodic snapshot fails.

HDFS_CORRUPTION

Msg: HDFS root directory is in a corrupted state.

Type: INFO

Raised when the HDFS root directory is corrupted.

APPLICATION_INVALID_STATE

Msg: {{.Service}}.{{.Task}} on {{.Machine}} at location {{.Location}}

Type: INFO

Raised when an application reports an invalid state.

UPDATE_START

Msg: Starting update of ThoughtSpot cluster {{.Cluster}}

Type: INFO

Raised when a cluster update starts.

UPDATE_END

Msg: Finished update of ThoughtSpot cluster {{.Cluster}} to release {{.Release}}

Type: INFO

Raised when a cluster update completes.

Errors

TIMELY_JOB_RUN_ERROR

Msg: Job run {{.Message}}

Type: ERROR

Raised when a job run fails.

TIMELY_ERROR

Msg: Job manager {{.Message}}

Type: ERROR

Raised when a job manager runs into an inconsistent state.

Warnings

DISK_SPACE

Msg: Machine {{.Machine}} has less than {{.Perc}}% disk space free

Type: WARNING

Raised when a disk is low on available space. This alert applies only to ThoughtSpot version 3.2.

ROOT_DISK_SPACE

Msg: Machine {{.Machine}} has less than {{.Perc}}% disk space free on root partition

Type: WARNING

Raised when a machine is low on available disk space on the root partition.

BOOT_DISK_SPACE

Msg: Machine {{.Machine}} has less than {{.Perc}}% disk space free on boot partition

Type: WARNING

Raised when a machine is low on available disk space on the boot partition.

UPDATE_DISK_SPACE

Msg: Machine {{.Machine}} has less than {{.Perc}}% disk space free on update partition

Type: WARNING

Raised when a machine is low on available disk space on the update partition.

EXPORT_DISK_SPACE

Msg: Machine {{.Machine}} has less than {{.Perc}}% disk space free on export partition

Type: WARNING

Raised when a machine is low on available disk space on the export partition.

HDFS_NAMENODE_DISK_SPACE

Msg: Machine {{.Machine}} has less than {{.Perc}}% disk space free on HDFS namenode drive

Type: WARNING

Raised when a machine is low on available disk space on the HDFS namenode drive.
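Each disk-space warning above compares the percentage of free space on one filesystem with the threshold shown as {{.Perc}}. A minimal sketch of that comparison for Linux or macOS, using `syscall.Statfs`, follows. This reference does not say which mount points back the root, boot, update, export, or HDFS namenode partitions on a ThoughtSpot node, nor exactly how free space is counted, so the path, the threshold, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"syscall"
)

// freePercent returns the blocks available to unprivileged users on the
// filesystem that contains path, as a percentage of all blocks.
func freePercent(path string) (float64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	if st.Blocks == 0 {
		return 0, fmt.Errorf("%s reports zero blocks", path)
	}
	return float64(st.Bavail) / float64(st.Blocks) * 100, nil
}

// lowOnSpace reports whether free space is below the alert threshold.
func lowOnSpace(free, threshold float64) bool {
	return free < threshold
}

func main() {
	const threshold = 10.0 // hypothetical {{.Perc}} value
	free, err := freePercent("/")
	if err != nil {
		panic(err)
	}
	fmt.Printf("free: %.1f%%, alert: %v\n", free, lowOnSpace(free, threshold))
}
```

Repeating the check per mount point with its own threshold would give one result per partition-specific alert.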

MEMORY

Msg: Machine {{.Machine}} has less than {{.Perc}}% memory free

Type: WARNING

Raised when a machine is low on free memory.
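On Linux, one common way to measure free memory is to compare `MemAvailable` with `MemTotal` in `/proc/meminfo`. Which counter ThoughtSpot actually uses is not documented here, so the sketch below assumes `MemAvailable`. It parses text in the `/proc/meminfo` format rather than reading the file, so it runs on any system; `memFreePercent` and the threshold are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// memFreePercent parses /proc/meminfo-style text and returns
// MemAvailable as a percentage of MemTotal.
func memFreePercent(meminfo string) (float64, error) {
	values := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		// Lines look like "MemTotal:       16318412 kB".
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		if v, err := strconv.ParseFloat(fields[1], 64); err == nil {
			values[strings.TrimSuffix(fields[0], ":")] = v
		}
	}
	total, okTotal := values["MemTotal"]
	avail, okAvail := values["MemAvailable"]
	if !okTotal || !okAvail || total == 0 {
		return 0, fmt.Errorf("MemTotal or MemAvailable missing")
	}
	return avail / total * 100, nil
}

func main() {
	sample := "MemTotal:       16000000 kB\nMemFree:         1200000 kB\nMemAvailable:    2400000 kB\n"
	pct, err := memFreePercent(sample)
	if err != nil {
		panic(err)
	}
	const threshold = 20.0 // hypothetical {{.Perc}} value
	fmt.Printf("%.1f%% free, alert: %v\n", pct, pct < threshold)
}
```

For the sample, 2400000 of 16000000 kB is available, so the program prints `15.0% free, alert: true`.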

OS_USERS

Msg: Machine {{.Machine}} has more than {{.Num}} logged in users

Type: WARNING

Raised when a machine has too many users logged in.

OS_PROCS

Msg: Machine {{.Machine}} has more than {{.Num}} processes

Type: WARNING

Raised when a machine has too many processes.

SSH

Msg: Machine {{.Machine}} doesn’t have an active SSH server

Type: WARNING

Raised when a machine does not have an active SSH server.

DISK_ERROR_EXTERNAL

Msg: Machine {{.Machine}} has disk errors

Type: WARNING

Raised when more than 2 disk errors happen in a day.

ZK_FD_COUNT

Msg: Zookeeper has more than {{.Num}} open file descriptors

Type: WARNING

Raised when Zookeeper has too many open file descriptors.

ZK_EPHEMERAL_COUNT

Msg: Zookeeper has more than {{.Num}} ephemeral files

Type: WARNING

Raised when there are too many Zookeeper ephemeral files.

HOST_DOWN

Msg: {{.Machine}} is down

Type: WARNING

Raised when a host is down.

TASK_UNREACHABLE

Msg: {{.ServiceDesc}} on {{.Machine}} is unreachable over HTTP

Type: WARNING

Raised when a task is unreachable over HTTP.

TASK_NOT_RUNNING

Msg: {{.ServiceDesc}} is not running

Type: WARNING

Raised when a service task is not running on any machine in the cluster.

Critical alerts

TASK_FLAPPING

Msg: Task {{.Service}}.{{.Task}} terminated {{._actual_num_occurrences}} times in last {{._earliest_duration_str}}

Type: CRITICAL

This alert is raised when a task crashes repeatedly. Crashes are counted across the whole cluster, not per node. For example, if a service crashes 5 times in a day across all nodes in the cluster, this alert is generated.

OREO_TERMINATED

Msg: Oreo terminated on machine {{.Machine}}

Type: CRITICAL

This alert is raised when the Oreo daemon on a machine terminates due to an error. This typically happens due to an error accessing Zookeeper, HDFS, or a hardware issue.

HDFS_DISK_SPACE

Msg: HDFS has less than {{.Perc}}% space free

Type: CRITICAL

Raised when an HDFS cluster is low on total available disk space.

ZK_INACCESSIBLE

Msg: Zookeeper is not accessible

Type: CRITICAL

Raised when Zookeeper is inaccessible.

PERIODIC_BACKUP_FLAPPING

Msg: Periodic backup failed {{._actual_num_occurrences}} times in last {{._earliest_duration_str}}

Type: CRITICAL

This alert is raised when a periodic backup fails repeatedly.

PERIODIC_SNAPSHOT_FLAPPING

Msg: Periodic snapshot failed {{._actual_num_occurrences}} times in last {{._earliest_duration_str}}

Type: CRITICAL

This alert is raised when a periodic snapshot fails repeatedly.

APPLICATION_INVALID_STATE_EXTERNAL

Msg: {{.Service}}.{{.Task}} on {{.Machine}} at location {{.Location}}

Type: CRITICAL

Raised when an application reports an invalid state.