Aggregate-aware Model switching
Aggregate-aware Model switching in ThoughtSpot is a transparent query optimisation feature where Search Data automatically switches from detailed Models to associated aggregate Models. This provides a mechanism to manage and reduce cloud spend and improve query latency whilst maintaining insight accuracy.
When a user queries the data, ThoughtSpot automatically switches between Models based on the granularity of the query — the user does not need to manually switch between Models or be aware that switching has occurred.
Aggregate-aware query execution is designed to:
-
Reduce resource consumption — Minimize the number of rows scanned and reduce the required size or capacity of the underlying data warehouse for high-volume queries.
-
Improve query performance — Achieve a measurable reduction in average query latency across key platform components (Liveboards, Search Data, Spotter, Agents) by automatically routing queries to smaller aggregate Models.
-
Ensure transparency — Enable automatic query switching across all major ThoughtSpot consumption surfaces with no visible change to the user experience.
To enable this feature contact ThoughtSpot support.
How it works
Aggregate awareness is achieved through ThoughtSpot’s proprietary search. When a user submits a query:
-
ThoughtSpot evaluates the tokens in the query (columns, date buckets, filters, etc.).
-
If an associated aggregate Model has a matching token set that can fully answer the query, ThoughtSpot re-routes the query to that Model.
-
If the query cannot be answered by an aggregate Model — for example, if the user drills down to a transactional level — ThoughtSpot falls back to the detailed Model automatically.
This allows the user to follow their train of thought from aggregate-level analysis to detailed and vice versa. The analytical path is not constrained.
Example
Consider a Sales Model where the detailed data has one row per order line item (product, quantity, revenue). This dataset may contain hundreds of millions of records. Aggregate Models can serve common query patterns far more efficiently:
| Query pattern | Model used | Approximate row count |
|---|---|---|
Total revenue by month for the last 7 years |
Sales Monthly Aggregate |
86 rows |
Revenue by department and product category |
Sales Company Aggregate |
10s–100s of millions |
Show me individual order line items for last week |
Sales (detailed) |
100s of millions |
Setup
-
Create the detailed Model.
-
Create a ThoughtSpot Model at the detailed level, including all columns and formulas required to meet the business case. This is your primary Model.
-
-
Create aggregate Models.
-
Create one or more aggregate Models in ThoughtSpot. These models must:
-
Use the same column names as defined in the primary Model (query switching is based on column name matching).
-
Have formulas defined to return accurate results at the aggregated grain (e.g.,
SUM(revenue)notrevenueat line-item level). -
Have the same semantic rules applied as the primary Model.
Aggregate Models don’t need to be defined in the same connection as the primary Model.
-
-
-
Define the association in TML.
-
In the primary Model’s TML file, add the
aggregated_modelsblock to declare which aggregate Models are associated and at what granularity each operates.
-
TML definition
Add the aggregated_models key to your primary Model’s .model.tml file. Each entry specifies an aggregate Model by its id and, optionally, the date column and bucket it is optimised for.
Syntax
aggregated_models:
- id: <model_id>
date_aggregation_info:
- column_id: <date_column_id>
bucket: <BUCKET_LEVEL>
Properties:
| Property | Required | Description |
|---|---|---|
|
Yes |
The unique name of the associated aggregate Model. |
|
No |
Declares which date column and bucket this Model is optimised for. Omit for dimension-only or company-level aggregates. |
|
Yes (if |
The date column in the source Model to match on. Must exactly match the column name in the aggregate Model. |
|
Yes (if |
The aggregation granularity. Supported values: |
Example: Three aggregate Models on a Sales Model
The following TML snippet defines three aggregate Models. One optimised for monthly queries, one for yearly queries, and one as a general company-level aggregate with no date bucketing:
aggregated_models:
- id: Sales Monthly Aggregate
date_aggregation_info:
- column_id: order_date
bucket: MONTHLY
- id: Sales Yearly Aggregate
date_aggregation_info:
- column_id: order_date
bucket: YEARLY
- id: Sales Company Aggregate
How ThoughtSpot selects the Model at query time:
| User query | Model selected |
|---|---|
Revenue by month last year |
Sales Monthly Aggregate |
Total sales by year |
Sales Yearly Aggregate |
Revenue by region (no date dimension) |
Sales Company Aggregate |
Order line items for last week |
Sales (detailed) — fallback |
Considerations
-
Column name matching
-
Query switching is based on column names. Column names in the aggregate Models must exactly match the column names in the primary Model, including case.
-
-
Schema differences
-
The associated Models will not have the same logical table structure as the primary Model. The primary Model may be a complex star or chasm trap schema, whereas the associated aggregate Models are typically single logical tables.
-
-
Connection flexibility
-
Aggregate Models do not have to be defined in the same connection as the primary Model. This enables:
-
Independent sizing: The detailed Model may require large instances where aggregate Models may only need small instances, which reduce cost.
-
ThoughtSpot cache layer: Aggregate Models can be stored in ThoughtSpot’s caching layer, removing computation costs from the database layer entirely.
Data latency must be considered when using the cache layer. Cached data should be synchronised with the underlying database on an appropriate schedule.
-
-
-
Semantic rule parity
-
The data analyst must ensure that all semantic rules applied to the detailed Model are also applied to the associated aggregate Models. This includes:
-
Formula correctness: Formulas must be defined at the correct grain for the aggregate. An
AVGthat is additive at the line-item level may need to be treated differently at a pre-aggregated level. -
Row-level security (RLS): Apply the same RLS rules to prevent unauthorised data access.
-
Column-level security (CLS): Apply the same CLS rules to restrict sensitive column visibility.
-
-