Connections

Using Connections, you can perform live queries on external databases.

If your company stores source data externally in data warehouses, you can use ThoughtSpot Connections to directly query that data and use ThoughtSpot’s analysis and visualization features, without moving the data into ThoughtSpot.

You can establish direct connections to the following external databases:

  • Amazon Redshift

  • Azure Synapse

  • Databricks

  • Dremio (New!)

  • Google BigQuery

  • Oracle (New!)

  • SAP HANA

  • Snowflake

  • Starburst

  • Teradata

How it works

You create a connection to the external database and choose the columns from each table that you want to explore in your live query. ThoughtSpot imports primary key and foreign key relationships along with the primary and foreign key tables, and it also imports any joins defined on the tables in your connection. When the connection is complete, it becomes a linked data source in ThoughtSpot that lets you query the external database directly. You can also apply transformations and filter the data.

Key benefits

  • Set up and deploy ThoughtSpot faster by connecting directly to the external database.

  • Eliminate the need to move data into ThoughtSpot for analysis.

  • Centralize data management and governance in the external database.

  • Save significant time and money by avoiding ETL pipelines.

  • Connect to multiple external databases.

The following sections list the supported and recommended instance types for ThoughtSpot deployments in AWS, Azure, and GCP that use direct data connections. When setting up your cluster, use this information to select an instance type, determine the number of instances required for the storage you need, and add data volumes to your cluster.

AWS

VMs with EBS-only persistent storage

EBS-only persistent storage: recommended capacity and volume, based on user data capacity, on a "per VM" basis

| User data capacity[1] | Instance type | CPU/RAM | EBS volume[1] | Required boot volume |
|---|---|---|---|---|
| Up to 1B rows | r4.4xlarge, r5.4xlarge | 16/122, 16/128 | 2x 400 GB | 200 GB for each node |
| Up to 4B rows | r5.8xlarge | 32/256 | 2x 400 GB | 200 GB for each node |
| 4B+ rows | r5.16xlarge | 64/512 | 2x 1 TB | 200 GB for each node |

VMs with EBS and S3 persistent storage

EBS and S3 persistent storage: recommended capacity and volume, based on user data capacity, on a "per VM" basis

| User data capacity[1] | Instance type | CPU/RAM | EBS volume[1] | Required boot volume |
|---|---|---|---|---|
| Up to 1B rows | r4.4xlarge, r5.4xlarge | 16/122, 16/128 | 1x 500 GB | 200 GB for each node |
| Up to 4B rows | r5.8xlarge | 32/256 | 1x 500 GB | 200 GB for each node |
| 4B+ rows | r5.16xlarge | 64/512 | 1x 500 GB | 200 GB for each node |

Azure

Recommended capacity and volume, based on user data capacity, on a "per VM" basis

| User data capacity[1] | Instance type | CPU/RAM | Premium SSD Managed Disk volume[1] | Required boot volume |
|---|---|---|---|---|
| Up to 1B rows | E16s_v3 | 16/128 | 2x 400 GB | 200 GB for each node |
| Up to 4B rows | E32s_v3 | 32/256 | 2x 400 GB | 200 GB for each node |
| 4B+ rows | E64s_v3 | 64/432 | 2x 1 TB | 200 GB for each node |

GCP

VMs with Persistent Disk-only storage

Persistent Disk storage: recommended capacity and volume, based on user data capacity, on a "per VM" basis

| User data capacity[1] | Instance type | CPU/RAM | Zonal Persistent SSD Disk volume[1] | Required boot volume |
|---|---|---|---|---|
| Up to 1B rows | n1-highmem-16 | 16/104 | 2x 400 GB | 200 GB for each node |
| Up to 4B rows | n1-highmem-32 | 32/208 | 2x 400 GB | 200 GB for each node |
| 4B+ rows | n1-highmem-64 | 64/416 | 2x 1 TB | 200 GB for each node |

VMs with Persistent Disk and Google Cloud storage

Persistent Disk and Google Cloud storage: recommended capacity and volume, based on user data capacity, on a "per VM" basis

| User data capacity[1] | Instance type | CPU/RAM | Zonal Persistent SSD Disk volume[1] | Required boot volume |
|---|---|---|---|---|
| Up to 1B rows | n1-highmem-16 | 16/104 | 1x 500 GB | 200 GB for each node |
| Up to 4B rows | n1-highmem-32 | 32/208 | 1x 500 GB | 200 GB for each node |
| 4B+ rows | n1-highmem-64 | 64/416 | 1x 500 GB | 200 GB for each node |
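The sizing tables for AWS, Azure, and GCP can be encoded as a small lookup, which may help when scripting cluster provisioning. The following is a minimal sketch, not part of ThoughtSpot: the names `Sizing`, `SIZING`, and `recommend`, and the storage keys such as `"ebs+s3"`, are hypothetical. It also assumes that exactly 4B rows falls in the "Up to 4B rows" tier, which the tables do not state explicitly.

```python
from dataclasses import dataclass

BILLION = 1_000_000_000


@dataclass(frozen=True)
class Sizing:
    instance_types: tuple   # one or more acceptable instance types
    data_volume: str        # data volume to attach, per VM
    boot_volume: str = "200 GB for each node"


def _tiers(small, medium, large, volumes):
    """Build the three tiers in table order: up to 1B, up to 4B, 4B+ rows."""
    return tuple(
        Sizing(types, volume)
        for types, volume in zip((small, medium, large), volumes)
    )


AWS = (("r4.4xlarge", "r5.4xlarge"), ("r5.8xlarge",), ("r5.16xlarge",))
AZURE = (("E16s_v3",), ("E32s_v3",), ("E64s_v3",))
GCP = (("n1-highmem-16",), ("n1-highmem-32",), ("n1-highmem-64",))

DISK_ONLY = ("2x 400 GB", "2x 400 GB", "2x 1 TB")
DISK_PLUS_OBJECT_STORE = ("1x 500 GB", "1x 500 GB", "1x 500 GB")

# Hypothetical keys: (cloud, storage option).
SIZING = {
    ("aws", "ebs"): _tiers(*AWS, DISK_ONLY),
    ("aws", "ebs+s3"): _tiers(*AWS, DISK_PLUS_OBJECT_STORE),
    ("azure", "managed-disk"): _tiers(*AZURE, DISK_ONLY),
    ("gcp", "pd"): _tiers(*GCP, DISK_ONLY),
    ("gcp", "pd+gcs"): _tiers(*GCP, DISK_PLUS_OBJECT_STORE),
}


def recommend(cloud: str, storage: str, rows: int) -> Sizing:
    """Return the per-VM recommendation for a given user data capacity."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    if rows <= BILLION:
        tier = 0
    elif rows <= 4 * BILLION:   # assumption: 4B itself counts as "up to 4B"
        tier = 1
    else:
        tier = 2
    return SIZING[(cloud, storage)][tier]
```

For example, `recommend("aws", "ebs+s3", 3 * BILLION)` returns the r5.8xlarge row with a 1x 500 GB data volume.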

Limitations

ThoughtSpot does not support joins across connections.
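When planning a data model, it can help to check in advance whether any intended join would span two connections. The sketch below is illustrative only: the function name, the table names, and the dictionary-based model are all hypothetical and do not correspond to a ThoughtSpot API.

```python
def cross_connection_joins(table_connection, joins):
    """Return the joins whose two tables belong to different connections.

    table_connection maps a table name to the name of its connection.
    joins is an iterable of (left_table, right_table) pairs.
    """
    return [
        (left, right)
        for left, right in joins
        if table_connection[left] != table_connection[right]
    ]


# Hypothetical example: two Snowflake tables and one Redshift table.
tables = {
    "orders": "snowflake_sales",
    "customers": "snowflake_sales",
    "web_events": "redshift_clickstream",
}
planned = [("orders", "customers"), ("orders", "web_events")]
unsupported = cross_connection_joins(tables, planned)
```

Here `unsupported` contains only the `orders` to `web_events` pair, because those two tables come from different connections.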

Feature availability

The following matrix compares the features that are available in our internal high-performance database, Falcon, and the ones available through Connections:

Feature Name Falcon Connections

Simple and complex searches: Versus, Inline Subquerying, Growth

Search Suggestions for column names and values

Headlines that summarize tables

All chart types and configurations

SpotIQ: Analyze

Table and Column remapping through TML files

Custom calendar

Materialized view

Function availability

The following matrix compares the specific function support across the different databases you can access through Connections. Functions not listed here have full support.

Function Snowflake Amazon Redshift Google BigQuery Azure Synapse Teradata SAP HANA

SOUNDS_LIKE

STRING_MATCH_SCORE

EDIT_DISTANCE_WITH_CAP

APPROX_SET_CARDINALITY

COUNT_NOT_NULL

SPELLS_LIKE

EDIT_DISTANCE

MEDIAN

PERCENTILE

Data type availability

The following matrix captures the specific data type support limitations across the different databases accessible through Connections. Data types not listed here have full support.

Data Type Snowflake Amazon Redshift Google BigQuery Azure Synapse Teradata SAP HANA

BINARY

VARBINARY

GEOMETRY

BYTES

DATETIMEOFFSET

Additional specific exceptions

The following list captures the specific limitations across the different databases supported through Connections. Databases not listed here have full support.

General for all databases
Sample values

ThoughtSpot does not internationalize sample values in tables.

Google BigQuery
Join support

Google BigQuery does not support PK-FK joins. Therefore, when using Connections, you must create joins explicitly in ThoughtSpot.

Partitioned tables

When you query a partitioned table that has the Require partition filter option enabled, the query must include a WHERE clause; without one, the query returns an error. To ensure that queries on such tables honor the partition condition, create a worksheet filter in ThoughtSpot.
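To illustrate the kind of query that satisfies this requirement, the following sketch builds a BigQuery SQL string with a WHERE clause on a date partitioning column. It only constructs text and does not call BigQuery or ThoughtSpot. The function name, table name, and column name are hypothetical, and the exact SQL that ThoughtSpot generates from a worksheet filter may differ.

```python
from datetime import date


def partitioned_count_query(table: str, partition_column: str,
                            start: date, end: date) -> str:
    """Build a row-count query restricted to a half-open date range
    [start, end) on the partitioning column."""
    return (
        f"SELECT COUNT(*) AS row_count FROM `{table}` "
        f"WHERE {partition_column} >= DATE '{start.isoformat()}' "
        f"AND {partition_column} < DATE '{end.isoformat()}'"
    )


# Hypothetical table partitioned by the event_date column.
query = partitioned_count_query(
    "my_project.analytics.events", "event_date",
    date(2024, 1, 1), date(2024, 2, 1),
)
```

The same query without the WHERE clause is the case that fails on a table with Require partition filter enabled.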

Azure Synapse

Azure Synapse supports up to 10 IF THEN ELSE statements in a single query.

Azure Synapse does not support foreign keys, so no PK-FK joins can be defined in Synapse.

Teradata

Teradata does not support the function AGGREGATE_DISTINCT.

Teradata does not support the following data types: JSON, INTERVAL, VARBYTE, BLOB, CLOB, PERIOD, XML, GEOSPATIAL.

SAP HANA

SAP HANA does not support the following functions: PERCENTILE, AGGREGATE_DISTINCT, SPELLS_LIKE, EDIT_DISTANCE.

SAP HANA does not support the following data types: BLOB, CLOB, NCLOB, TEXT, POINT.

SAP HANA does not support calculation views with mandatory input parameters. If you need to use calculation views in ThoughtSpot, you must remove the mandatory parameter requirement.
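The unsupported data types listed above for Teradata and SAP HANA can be checked against a table's schema before you create a connection. The following is a minimal sketch with hypothetical names (`UNSUPPORTED_TYPES`, `unsupported_columns`). It compares type names exactly, ignoring case, so parameterized or multi-word type strings would need extra normalization, and it covers only the two lists stated in this section.

```python
# Unsupported data types, copied from the Teradata and SAP HANA lists above.
UNSUPPORTED_TYPES = {
    "Teradata": {"JSON", "INTERVAL", "VARBYTE", "BLOB", "CLOB",
                 "PERIOD", "XML", "GEOSPATIAL"},
    "SAP HANA": {"BLOB", "CLOB", "NCLOB", "TEXT", "POINT"},
}


def unsupported_columns(database: str, columns: dict) -> list:
    """Return the column names whose declared type is in the unsupported list.

    columns maps a column name to its data type name. Databases without an
    entry in UNSUPPORTED_TYPES yield an empty list, which only means this
    sketch has no list for them.
    """
    blocked = UNSUPPORTED_TYPES.get(database, set())
    return [name for name, dtype in columns.items() if dtype.upper() in blocked]


# Hypothetical SAP HANA table schema.
schema = {"id": "INTEGER", "notes": "nclob", "location": "POINT"}
flagged = unsupported_columns("SAP HANA", schema)
```

In this example `flagged` contains `notes` and `location`, the two columns to leave out of the connection.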


[1] Per VM