"Querygroups() +" and "querygroups() -"
In this article, we walk through examples of aggregation using "querygroups() +" and "querygroups() -". To test for yourself, use the data set we provide in the prerequisite.
Prerequisites
Before attempting the following practice examples, follow these steps to upload the sample dataset:
Click on the dropdown to see the steps to upload the data.
-
Click here to download the sample data set.
-
Sign in to your ThoughtSpot account, and click the Data tab.
-
Click the more menu icon in the top-right of the screen. From the menu, select Upload data.
-
Upload the data file, either by dragging and dropping it into the field provided, or by clicking Browse your files and selecting Meetings_tiny.csv.
-
To answer Are the column names already defined in the file header?, select Yes.
-
To answer Are the fields separated by?, select Comma (,).
-
-
Select Next at the bottom right of the screen to proceed.
-
Select Next to proceed without changing the column names. For the sake of the formulas used in the examples, column names should not be changed.
-
Review the column types in Step 3 and select Upload to finish.
-
Click the search icon on the final screen of the upload process to begin a search based on the sample data set.
Aggregating across multiple levels of detail
In the basic aggregation examples, we described basic usage for group_aggregate functions and how to specify a level of detail. When using group_aggregate(), the first argument specifies the type of aggregation that should be performed, and the second argument specifies the level of detail at which it should be performed. Your group_aggregate function should be enclosed in an outer aggregation function that specifies the secondary aggregation that should be used when aggregating to a search level of detail that is coarser than the level of detail specified within the group_aggregate function.
Performing a secondary level of aggregation across disjoint levels of detail may give nonsensical results. For example, consider aggregating to the activity classification level and then "aggregating up" to the account level.
Multiple accounts have multiple classes. When you aggregate to the activity classification
level, you are aggregating without including the account
level of detail. You are creating a small virtual table with two rows, one for meetings and one for calls. The row for meetings provides the total number of meetings and the row for calls provides the total number of calls.
There is no way to then "aggregate" up to the account level of detail because this virtual table is now at a level of detail that is disjoint from the account level of detail. The results you get in this case will not make sense. In this situation, where your data includes a hierarchy that needs to be maintained in order for your query to make sense, you must use the "query_groups() +" syntax.
Using "query_groups() +"
When you use "query_groups() +", you specify that you want to use the level of detail defined in the search bar and possibly add to it.
If you use query_groups() without the "+", your group_aggregate function uses exactly the level of detail specified in the search bar. This is less useful since you could get the same result without using group_aggregate(). Add arguments after the "+" to specify the additional attributes that you want to add to the level of detail specified in the search bar.
Use "query_groups() +" to add level of detail
To practice using "query_groups () +", complete the following steps:
-
Begin your search, using Meetings_tiny.csv as your data source.
-
Enter
hours
account
in the search bar and click Go. -
Click the more options icon and select Create formula.
-
In the formula pop-up window, name your formula. Here, we have named the formula average hours per activity classification.
-
Enter the following formula and click Save:
average( group_aggregate( sum(hours), query_groups() + {activity classification}, {} ) )
In this example, the search bar specifies the "account" level of detail. The formula adds "activity classification", thus summing hours to the {account, activity classification} level of detail. The outer level of aggregation then averages the values computed at the {account, activity classification} level of detail to the account level of detail. -
Search for:
account
average hours per activity classification
sort by account
. -
The following table results:
Table 1. Example 2.1.1 Account Hours per activity classification averaged Amazon
11
Oracle
18
Redshift
3
Snowflake
3.5
One way to understand this result is to create a chart without the formula, using the following search:
-
Search for:
account
hours
activity classification
sort by account
sort by activity classification
. -
This creates the table shown below:
Table 2. Example 2.1.2 Account Activity Classification Total Hours Amazon
meeting
11
Oracle
meeting
18
Redshift
call
1
Redshift
meeting
5
Snowflake
call
5
Snowflake
meeting
2
The portion of the hours per activity formula within the "group_aggregate" argument aggregates (sums) hours to the {account, activity classification} level of detail:
group_aggregate( sum(hours), query_groups() + {activity classification}, {} )
This is comparable to what is shown in Table 2.1.2 above.
The second level of aggregation specified by the average surrounding the group_aggregate averages the values associated with each account.:
average( group_aggregate( sum(hours), query_groups() + {activity classification}, {} )
In this case, the function performs the following calculations:
Snowflake: (5 + 2)/2 = 3.5
Redshift: (1 + 5)/2 = 3
Oracle: 18/1 = 18
Amazon: 11/1 = 11
This results in the values computed by the hours per activity classification averaged formula shown in Table 2.1.1 above.
Using "query_groups() -"
In this section we explore how to create a formula that is based off of the level of detail in the search bar, but which subtracts out certain dimensions specified in the search bar. We do this by using the "query_groups() -" syntax. This syntax means, "use the level of detail specified by the enclosing context, which is the search bar, and then remove the portion specified by text following the "-"."
Use "query_groups() -" to remove level of detail
-
Begin your search, using Meetings_tiny.csv as your data source.
-
Enter
account
activity date
in the search bar and click Go. -
Click the more options icon and select Create formula.
-
In the formula pop-up window, name your formula. Here, we name the formula hours ignoring activity date.
-
Enter the following formula and click Save:
average( group_aggregate( average(hours), query_groups() - {activity date}, {} ) )
In this example, the search bar specifies the "account" and "activity date" level of detail. The formula removes the activity date and averages hours to the account level of detail. The outer level of aggregation has no effect in this case, but it is good practice to always provide the outer level of aggregation. -
Search for:
account
activity date
hours ignoring activity date
sort by account
sort by activity date
. -
The following table results:
Table 3. Example 3.1 Account Activity Date Hours ignoring activity date Amazon
04/01/2020
2.75
Amazon
04/05/2020
2.75
Amazon
04/16/2020
2.75
Amazon
04/18/2020
2.75
Oracle
04/01/2020
6.00
Oracle
04/15/2020
6.00
Oracle
04/16/2020
6.00
Redshift
02/01/2020
2.00
Redshift
04/05/2020
2.00
Redshift
04/06/2020
2.00
Snowflake
02/01/2020
1.75
Snowflake
04/01/2020
1.75
Snowflake
04/04/2020
1.75
Snowflake
04/15/2020
1.75
Note that in the table above, the Hours value for each row of an account is the same. For example, the four rows for Snowflake all show an Hours value of 1.75. Similarly, the Hours value for Amazon (2.75) is the same in each of the Amazon rows. This is because Activity Date is being ignored and Hours ignoring activity date is computed at the Account level of detail.
Aggregate hours ignoring activity date
In this example we use the same formula as that used in Example 3.1.
In Example 3.1, our search bar contained account
and activity date
. Here we add activity classification
, so we are now computing Hours ignoring activity date at the {Account, Activity Classification} level of detail.
-
Search for:
account
activity classification
activity date
hours ignoring activity date
sort by account sort by activity date
. -
The following table results:
Table 4. Table 3.2 Account Activity Classification Activity Date Hours ignoring activity date Amazon
meeting
04/01/2020
2.75
Amazon
meeting
04/05/2020
2.75
Amazon
meeting
04/16/2020
2.75
Amazon
meeting
04/18/2020
2.75
Oracle
meeting
04/01/2020
6.00
Oracle
meeting
04/15/2020
6.00
Oracle
meeting
04/16/2020
6.00
Redshift
meeting
02/01/2020
2.50
Redshift
meeting
04/05/2020
2.50
Redshift
call
04/06/2020
1.00
Snowflake
meeting
02/01/2020
2.00
Snowflake
call
04/01/2020
1.67
Snowflake
call
04/04/2020
1.67
Snowflake
call
04/15/2020
1.67
In the table above, note that the three rows for Snowflake where the Activity Classification is "call" have an Hours value of 1.67, while the line for Snowflake where the activity classification is "meeting" has an Hours value of 2.
Related information: