Optimizing ThoughtSpot Workloads with Databricks

To improve performance and cost-efficiency for BI workloads, combine serverless SQL warehouses, Databricks' Photon engine, and Delta Cache. This document outlines key recommendations for optimizing ThoughtSpot workloads on Databricks.

Recommendations

Utilize Serverless SQL Warehouse Clusters

Benefits

Instant Start: Serverless SQL Warehouse clusters start in seconds, compared with minutes for general-purpose, non-serverless clusters.
Elastic Scaling: Automatically scales with the workload between a configured minimum and maximum.
Fully Managed Service: Simplifies operations with no need for manual cluster management or software updates.

Strategy

Auto Stop: Set clusters to auto-stop after n minutes of inactivity to prevent unnecessary costs.
Concurrency Tuning: As a starting point, scale between a minimum of 2 and a maximum of 10 workers, then monitor query concurrency and queueing and adjust the limits to fit the workload.
Engage Databricks Team: Collaborate with the Databricks account team to fine-tune SQL Warehouses for optimal performance and cost.
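
The auto-stop and scaling settings above can be captured in a warehouse definition. The following is a minimal sketch, not an official configuration: it only builds and prints a request body and makes no API call. The field names (min_num_clusters, max_num_clusters, auto_stop_mins, enable_serverless_compute, enable_photon, warehouse_type) follow the Databricks SQL Warehouses REST API as understood here and should be checked against the current API reference. The warehouse name, the "Small" size, and the 10-minute auto-stop are illustrative assumptions, and the sketch assumes that the minimum and maximum counts above map to the warehouse's cluster counts.

```python
import json


def build_warehouse_config(name, min_clusters=2, max_clusters=10, auto_stop_mins=10):
    """Build a request body for a serverless SQL warehouse (sketch only)."""
    if min_clusters < 1 or max_clusters < min_clusters:
        raise ValueError("expected 1 <= min_clusters <= max_clusters")
    if auto_stop_mins < 0:
        raise ValueError("auto_stop_mins must not be negative")
    return {
        "name": name,                         # hypothetical warehouse name
        "cluster_size": "Small",              # assumption: size to taste
        "min_num_clusters": min_clusters,     # lower scaling bound
        "max_num_clusters": max_clusters,     # upper scaling bound
        "auto_stop_mins": auto_stop_mins,     # stop after this many idle minutes
        "enable_serverless_compute": True,    # serverless, as recommended above
        "enable_photon": True,                # Photon engine
        "warehouse_type": "PRO",              # assumption: required for serverless
    }


if __name__ == "__main__":
    config = build_warehouse_config("thoughtspot-bi")
    print(json.dumps(config, indent=2))
```

Keeping the definition in code makes it easy to review changes to the scaling bounds after each round of monitoring.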

Leverage Photon

Benefits

High Performance: Uses CPU-level optimizations and efficient memory management to run queries faster.
Optimized Parquet Writing: A C++ Parquet writer speeds up writes to Parquet and Delta files.
Serverless Integration: Available by default with serverless clusters, enhancing performance without additional configuration.

Implement Delta Cache

Benefits

Faster Access: Keeps frequently accessed data on the workers' local SSDs, which significantly reduces query times.
Automatic Inclusion: Enabled by default on serverless SQL warehouses, with no extra setup required.

Usage Tip

Preload Data: Run CACHE SELECT * FROM table when a SQL warehouse (formerly called a SQL endpoint) starts to preload "hot" tables, so that the first queries against them are served from the cache.
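
The preload tip can be scripted so that the same statements run after every warehouse start. This is a minimal sketch that only generates the CACHE SELECT statements; how they are submitted (a scheduled query, a SQL connector, or another mechanism) is left open. The table names are hypothetical, and the name check is a simple assumption meant to keep unexpected input out of the generated SQL.

```python
import re

# Accept plain identifiers with up to three dot-separated parts,
# for example catalog.schema.table (a deliberately strict assumption).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}$")


def cache_statements(tables):
    """Return one CACHE SELECT statement per hot table."""
    statements = []
    for table in tables:
        if not _IDENTIFIER.match(table):
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append(f"CACHE SELECT * FROM {table}")
    return statements


if __name__ == "__main__":
    hot_tables = ["sales.orders", "sales.customers"]  # hypothetical names
    for statement in cache_statements(hot_tables):
        print(statement)
```

Limiting the list to tables that ThoughtSpot queries most often keeps the warm-up short and avoids filling the cache with rarely used data.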

Be Cognizant of Other Tunables

Lazy Evaluation: Matters for data engineering and write pipelines, but does not directly affect ThoughtSpot workloads.
Z-Order Optimize: Run Z-Ordering regularly to co-locate related data in the same files. Queries can then skip more files, which speeds them up and lowers cost because less data is read from cloud storage.
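
Z-Ordering is applied with the Delta Lake OPTIMIZE ... ZORDER BY command. The sketch below only builds that statement for a given table and column list, so it can be scheduled alongside other maintenance jobs. The table and column names are hypothetical; in practice, choose columns that appear often in ThoughtSpot filters and joins.

```python
def zorder_statement(table, columns):
    """Build an OPTIMIZE ... ZORDER BY statement for a Delta table."""
    if not columns:
        raise ValueError("at least one Z-Order column is required")
    column_list = ", ".join(columns)
    return f"OPTIMIZE {table} ZORDER BY ({column_list})"


if __name__ == "__main__":
    # Hypothetical table and filter columns.
    print(zorder_statement("sales.orders", ["customer_id", "order_date"]))
```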

Related information

Add a Databricks connection

Edit a Databricks connection

Remap a Databricks connection

Delete a table from a Databricks connection

Delete a table with dependent objects from a Databricks connection

Delete a Databricks connection

Configure OAuth for a Databricks connection

Configure OAuth with AAD for a Databricks connection

Enabling an AWS PrivateLink between ThoughtSpot Cloud and your Databricks data warehouse

Connection reference for {connection}

Passthrough functions for Databricks
