Basic use of group aggregation
In this article, we walk through examples of basic aggregation. To test for yourself, use the data set we provide in prerequisites.
Prerequisites
Before attempting the following practice examples, follow these steps to upload the sample dataset:
Click the dropdown to see the steps to upload the data.
-
Click here to download the sample data set.
-
Sign in to your ThoughtSpot account, and click the Data tab.
-
Click the more menu icon in the top-right of the screen. From the menu, select Upload data.
-
Upload the data file, either by dragging and dropping it into the field provided, or by clicking Browse your files and selecting Meetings_tiny.csv.
-
To answer Are the column names already defined in the file header?, select Yes.
-
To answer Are the fields separated by?, select Comma (,).
-
-
Select Next at the bottom right of the screen to proceed.
-
Select Next to proceed without changing the column names. For the sake of the formulas used in the examples, column names should not be changed.
-
Review the column types in Step 3 and select Upload to finish.
-
Click the search icon on the final screen of the upload process to begin a search based on the sample data set.
Basic aggregation: Aggregate the sum of hours at account level of detail
First, let’s try a formula that aggregates sum(hours) at the account level. If the search level of detail is coarser than account, it "aggregates up" by averaging the aggregation at the account level.
To try using this formula, follow these steps:
-
Begin your search, using Meetings_tiny.csv as your data source.
-
Enter
hours
account
in the search bar and click Go. -
Click the more options icon and select Create formula.
-
In the formula pop-up window, name your formula. Here, we name the formula hours per account.
-
Enter the following formula and click Save:
average( group_aggregate( sum(hours), {account}, {} ) )
-
In the search bar, enter the following terms:
account
hours
hours per account
sort by account
. -
The resulting table shows that the search level aggregation of hours (Total hours) is the same as the aggregation specified by our formula, hours per account. We see this result because both columns aggregate to the account level.
Table 1. Example 1.1 Account Total hours Hours per account Amazon
11
11
Oracle
18
18
Redshift
6
6
Snowflake
7
7
Aggregation with unspecified level of detail in search
Using the same formula shown in the previous example, let’s see what changes when our search does not specify any level of detail:
-
Search for:
hours
hours per account
. -
The following table results:
Table 2. Example 1.2 Total hours Hours per account 42
10.5
In this example, we do not specify a dimension in the search bar, so the search defines the level of detail as the whole table. In other words, our search is at the coarsest level of detail while the formula specifies an aggregation at a finer level of detail (the account level of detail). The first column of the table above displays the aggregation of all hours to the table (coarsest) level.
The second column, which shows the result of our formula, computes as follows:
-
First, aggregate using the inner_aggregation function (sum) to the account level of detail, which gives results like those shown in example 1.1 (values as 7, 6, 11, and 18 for the account level hours).
-
Then, aggregate to the coarser search level of detail by using the outer aggregation function, which is Average. That average gives us (7 + 6 + 11 + 18)/4, which results in the 10.5 value shown above.
-
Aggregating when the search level of detail is finer than formula
In this example, we use the same formula as above. This time, we use a search that specifies a level of detail that is finer than the account level.
-
Search for:
account
activity classification
hours
hours per account
sort by account
.Activity classification is a finer level of detail because there are multiple activity classes for each account. In this case, our formula aggregates to the account level and then "copies down" or "replicates" to the finer level of detail.
-
The resulting table looks like this:
Table 3. Example 1.3 Account Activity classification Total hours Total hours per account Amazon
meeting
11
11
Oracle
meeting
18
18
Redshift
meeting
5
6
Redshift
call
1
6
Snowflake
meeting
2
7
Snowflake
call
5
7
Aggregating with multiple levels of detail in your formula
If we want to specify multiple levels of detail within the formula, we can do that by using a comma-separated list of dimensions. Our small data set makes coming up with a "useful" example difficult, but it can still demonstrate this functionality.
In the example below, our inner aggregation aggregates to the account and activity classification level of detail.
-
Enter
hours
account
in the search bar and click Go. -
Click the more options icon and select Create formula.
-
In the formula pop-up window, name your formula. Here, we name the formula average of average by account and activity classification.
-
Enter the following formula and click Save:
average( group_aggregate( average(hours), {account, activity classification}, {} ) )
-
When the search specifies
account
andactivity classification
as the level of detail, this formula generates the same values as those generated by the simple average hours aggregation. You can see this with the following search:-
Search for:
account
activity classification
average hours
average of average by account and activity classification
sort by account
sort by activity classification
The following table results:
Table 4. Example 1.4 - Supporting Account Activity Classification Average Hours Average of average by account and activity classification Amazon
meeting
2.75
2.75
Oracle
meeting
6
6
Redshift
call
1
1
Redshift
meeting
2.5
2.5
Snowflake
call
1.67
1.67
Snowflake
meeting
2
2
-
-
If we aggregate to the table level by removing account and activity classification from the search, our formula "aggregates" up, resulting in a table with a single value that represents the average of the above values. If we put both average hours and our formula in this table level table, we can see that they result in different values.:
-
Search for:
average of average by account and activity classification
average hours
.The following table results:
-
-
Average of average by account and activity classification | Average hours |
---|---|
2.65 |
3 |
+ We receive this result because our formula adds the six values that represent the average hours by account and activity classification and then divides by six, while "average hours" sums the hours value from each of the 14 lines in the original table, and then divides by 14.
There are many business use cases that require very specific aggregation; understanding the subtleties of level of detail calculations and how to implement them with group_aggregate functions key to being successful at implementing these more complex aggregations.
For more examples of aggregation, see Using "query_groups() +/-".