Optimizing ThoughtSpot Workloads with Databricks
To improve performance and cost efficiency for ThoughtSpot BI workloads on Databricks, combine three features: serverless SQL warehouses, Databricks' Photon engine, and Delta Cache. This document outlines key recommendations for each.
Recommendations
Utilize Serverless SQL Warehouse Clusters
- Benefits
  - Instant Start: Serverless SQL warehouses start in seconds, compared with minutes for general-purpose, non-serverless clusters.
  - Elastic Scaling: Automatically adjusts to the workload within a configured minimum and maximum number of clusters.
  - Fully Managed Service: Simplifies operations, with no manual cluster management or software updates.
- Strategy
  - Auto Stop: Configure warehouses to stop automatically after n minutes of inactivity so that idle compute does not accrue cost.
  - Concurrency Tuning: Scale between a minimum of 2 and a maximum of 10 clusters, depending on the workload, then monitor query load and adjust these limits.
  - Engage Databricks Team: Work with the Databricks account team to fine-tune SQL warehouses for performance and cost.
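The auto-stop and scaling settings above can be captured in a warehouse definition. The sketch below is a minimal example, assuming the Databricks SQL Warehouses REST API (POST /api/2.0/sql/warehouses), that builds the request body in Python. SQL warehouses scale by adding or removing clusters, so the minimum and maximum above map to min_num_clusters and max_num_clusters. The WarehouseSettings helper, the warehouse name, and the cluster_size value are hypothetical, and field names and allowed ranges should be checked against the current API reference before use.

```python
import json
from dataclasses import dataclass


@dataclass
class WarehouseSettings:
    """Hypothetical container for the tunables discussed above."""
    name: str
    cluster_size: str = "Small"   # assumption: choose a size per workload
    auto_stop_mins: int = 10      # stop after this many idle minutes
    min_num_clusters: int = 2     # lower bound for concurrency scaling
    max_num_clusters: int = 10    # upper bound for concurrency scaling


def build_warehouse_payload(s: WarehouseSettings) -> dict:
    """Build a JSON body for creating a serverless SQL warehouse.

    Field names follow the Databricks SQL Warehouses REST API; verify them
    against the current API reference before sending the request.
    """
    if s.auto_stop_mins <= 0:
        raise ValueError("auto_stop_mins must be positive to enable auto stop")
    if not 1 <= s.min_num_clusters <= s.max_num_clusters:
        raise ValueError("need 1 <= min_num_clusters <= max_num_clusters")
    return {
        "name": s.name,
        "cluster_size": s.cluster_size,
        "auto_stop_mins": s.auto_stop_mins,
        "min_num_clusters": s.min_num_clusters,
        "max_num_clusters": s.max_num_clusters,
        "enable_photon": True,              # Photon on (the serverless default)
        "enable_serverless_compute": True,  # serverless requires a PRO warehouse
        "warehouse_type": "PRO",
    }


if __name__ == "__main__":
    payload = build_warehouse_payload(WarehouseSettings(name="thoughtspot-bi"))
    print(json.dumps(payload, indent=2))
```

Validating the bounds locally catches an inverted minimum and maximum before the request reaches the workspace.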
Leverage Photon
- Benefits
  - High Performance: Uses CPU-level optimizations and efficient memory management to speed up queries.
  - Optimized Parquet Writing: A C++ Parquet writer speeds up operations that write Parquet and Delta files.
  - Serverless Integration: Enabled by default on serverless SQL warehouses, so it improves performance with no additional configuration.
Implement Delta Cache
- Benefits
  - Faster Access: Keeps copies of frequently accessed data on the workers' local SSDs, significantly reducing query times.
  - Automatic Inclusion: Enabled by default on serverless SQL warehouses; no extra setup is required.
- Usage Tip
  - Preload Data: Run CACHE SELECT * FROM table_name when a SQL warehouse (formerly called a SQL endpoint) starts, to preload "hot" tables and ensure rapid access to them.
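The preload tip can be scripted so that every hot table is cached once after the warehouse starts. The sketch below is a minimal Python example, assuming any DB-API style cursor; the table names are hypothetical. With the databricks-sql-connector package, such a cursor could come from databricks.sql.connect(...), passing the workspace host name, the warehouse HTTP path, and an access token. The name check is a simple safeguard against malformed input and does not handle backtick-quoted identifiers.

```python
import re
from typing import Iterable, List

# Accepts plain or dotted identifiers such as catalog.schema.table.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def cache_statements(tables: Iterable[str]) -> List[str]:
    """Return one CACHE SELECT statement per hot table."""
    stmts = []
    for t in tables:
        if not _IDENT.match(t):
            raise ValueError(f"unexpected table name: {t!r}")
        stmts.append(f"CACHE SELECT * FROM {t}")
    return stmts


def warm_cache(cursor, tables: Iterable[str]) -> int:
    """Run the cache statements on a DB-API cursor; return how many ran.

    All names are validated before the first statement executes, so a bad
    name aborts the warm-up without running any statement.
    """
    count = 0
    for stmt in cache_statements(tables):
        cursor.execute(stmt)
        count += 1
    return count


# Hypothetical list of "hot" tables behind the main ThoughtSpot dashboards.
HOT_TABLES = ["sales.fact_orders", "sales.dim_customer"]
```

For example, warm_cache(cursor, HOT_TABLES) issues CACHE SELECT * FROM sales.fact_orders followed by CACHE SELECT * FROM sales.dim_customer.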
Be Cognizant of Other Tunables
- Lazy Evaluation: Important for data engineering and pipeline writing, although it does not directly affect ThoughtSpot workloads.
- Z-Order Optimize: Regularly apply Z-ordering to co-locate related data in the same files. Queries can then skip irrelevant files, which speeds them up and lowers cost by reading less data from cloud storage.
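Regular Z-ordering is typically done with the Delta Lake command OPTIMIZE table_name ZORDER BY (col1, col2). The sketch below is a minimal Python helper that builds that statement so it can be run on a schedule through any SQL client; the table and column names are hypothetical, and the identifier check is a simple safeguard rather than full SQL quoting.

```python
import re
from typing import Sequence

# Accepts plain or dotted identifiers such as catalog.schema.table.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def zorder_statement(table: str, columns: Sequence[str]) -> str:
    """Build an OPTIMIZE ... ZORDER BY statement for a Delta table."""
    if not _IDENT.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not columns:
        raise ValueError("ZORDER BY needs at least one column")
    for c in columns:
        if not _IDENT.match(c):
            raise ValueError(f"unexpected column name: {c!r}")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"


# Hypothetical example: columns that dashboards filter on most often.
print(zorder_statement("sales.fact_orders", ["order_date", "customer_id"]))
# prints: OPTIMIZE sales.fact_orders ZORDER BY (order_date, customer_id)
```

Z-ordering works best on a small number of columns that appear frequently in query filters; each additional column weakens the data locality it provides.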