"query_groups() +" and "query_groups() -"

In this article, we walk through examples of aggregation using "query_groups() +" and "query_groups() -". To test for yourself, use the data set we provide in the Prerequisites section.

Prerequisites

Before attempting the following practice examples, follow these steps to upload the sample dataset:

  1. Download the sample data set, Meetings_tiny.csv.

  2. Sign in to your ThoughtSpot account, and click the Data tab.

  3. Click the more options icon in the top-right of the screen. From the menu, select Upload data.

  4. Upload the data file, either by dragging and dropping it into the field provided, or by clicking Browse your files and selecting Meetings_tiny.csv.

    1. To answer Are the column names already defined in the file header?, select Yes.

    2. To answer Are the fields separated by?, select Comma (,).

  5. Select Next at the bottom right of the screen to proceed.

  6. Select Next to proceed without changing the column names. For the sake of the formulas used in the examples, column names should not be changed.

  7. Review the column types in Step 3 and select Upload to finish.

  8. Click the search icon on the final screen of the upload process to begin a search based on the sample data set.

Aggregating across multiple levels of detail

In the basic aggregation examples, we describe basic usage of group_aggregate functions and how to specify a level of detail. In group_aggregate(), the first argument specifies the type of aggregation to perform, and the second argument specifies the level of detail at which to perform it. Enclose your group_aggregate function in an outer aggregation function; the outer function specifies the secondary aggregation to use when aggregating up to a search level of detail that is coarser than the level of detail specified within the group_aggregate function.

Performing a secondary level of aggregation across disjoint levels of detail may give nonsensical results. For example, consider aggregating to the activity classification level and then "aggregating up" to the account level.

Multiple accounts have activities in multiple classifications. When you aggregate to the activity classification level, you are aggregating without including the account level of detail. You are creating a small virtual table with two rows, one for meetings and one for calls. The row for meetings provides the total number of meetings and the row for calls provides the total number of calls.

There is no way to then "aggregate" up to the account level of detail because this virtual table is now at a level of detail that is disjoint from the account level of detail. The results you get in this case will not make sense. In this situation, where your data includes a hierarchy that needs to be maintained in order for your query to make sense, you must use the "query_groups() +" syntax.
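The problem can be sketched in Python. Using the hour totals from Table 2 (Example 2.1.2) later in this article as stand-in data, aggregating to the activity classification level collapses away the account column entirely:

```python
from collections import defaultdict

# sum(hours) per {account, activity classification}, taken from
# Table 2 (Example 2.1.2) later in this article (stand-in data)
rows = [
    ("Amazon", "meeting", 11),
    ("Oracle", "meeting", 18),
    ("Redshift", "call", 1),
    ("Redshift", "meeting", 5),
    ("Snowflake", "call", 5),
    ("Snowflake", "meeting", 2),
]

# aggregating to the activity classification level alone produces the
# two-row "virtual table" described above
by_class = defaultdict(int)
for account, classification, hours in rows:
    by_class[classification] += hours

print(dict(by_class))  # {'meeting': 36, 'call': 6}
# the account column is gone, so there is no way to "aggregate up"
# from this virtual table to the account level of detail
```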

Using "query_groups() +"

When you use "query_groups() +", you specify that you want to use the level of detail defined in the search bar and possibly add to it.

If you use query_groups() without the "+", your group_aggregate function uses exactly the level of detail specified in the search bar. This is less useful since you could get the same result without using group_aggregate(). Add arguments after the "+" to specify the additional attributes that you want to add to the level of detail specified in the search bar.
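Conceptually, you can think of a level of detail as a set of grouping columns; "query_groups() +" is the union of the search-bar set and the columns you list after the "+". A minimal sketch of this idea:

```python
# the grouping columns implied by the search bar, i.e. query_groups()
search_bar_groups = {"account"}

# query_groups() + {activity classification}: union the two sets
level_of_detail = search_bar_groups | {"activity classification"}

# query_groups() with no "+" would be just the search-bar set itself
print(sorted(level_of_detail))  # ['account', 'activity classification']
```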

Use "query_groups() +" to add level of detail

To practice using "query_groups() +", complete the following steps:

  1. Begin your search, using Meetings_tiny.csv as your data source.

  2. Enter hours account in the search bar and click Go.

  3. Click the more options icon and select Create formula.

  4. In the formula pop-up window, name your formula. Here, we have named the formula average hours per activity classification.

  5. Enter the following formula and click Save:

    average(
      group_aggregate(
        sum(hours),
        query_groups() + {activity classification},
        {}
      )
    )
    In this example, the search bar specifies the "account" level of detail. The formula adds "activity classification", thus summing hours to the {account, activity classification} level of detail. The outer level of aggregation then averages the values computed at the {account, activity classification} level of detail to the account level of detail.
  6. Search for: account average hours per activity classification sort by account.

  7. The following table results:

    Table 1. Example 2.1.1
    Account     Hours per activity classification averaged
    Amazon      11
    Oracle      18
    Redshift    3
    Snowflake   3.5

One way to understand this result is to create a chart without the formula, using the following search:

  1. Search for: account hours activity classification sort by account sort by activity classification.

  2. This creates the table shown below:

    Table 2. Example 2.1.2
    Account     Activity Classification   Total Hours
    Amazon      meeting                   11
    Oracle      meeting                   18
    Redshift    call                      1
    Redshift    meeting                   5
    Snowflake   call                      5
    Snowflake   meeting                   2

    The portion of the hours per activity formula within the "group_aggregate" argument aggregates (sums) hours to the {account, activity classification} level of detail:

    group_aggregate(
      sum(hours),
      query_groups() + {activity classification},
      {}
    )

    This is comparable to what is shown in Table 2 (Example 2.1.2) above.

    The second level of aggregation, specified by the average surrounding the group_aggregate, averages the values associated with each account:

    average(
      group_aggregate(
        sum(hours),
        query_groups() + {activity classification},
        {}
      )
    )

    In this case, the function performs the following calculations:

    Snowflake: (5 + 2)/2 = 3.5

    Redshift: (1 + 5)/2 = 3

    Oracle: 18/1 = 18

    Amazon: 11/1 = 11

This results in the values computed by the average hours per activity classification formula shown in Table 1 (Example 2.1.1) above.
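The full two-step computation can be sketched in Python, taking the sums from Table 2 (Example 2.1.2) as the result of the inner group_aggregate:

```python
from collections import defaultdict

# sum(hours) at the {account, activity classification} level of detail,
# as shown in Table 2 (Example 2.1.2)
summed = {
    ("Amazon", "meeting"): 11,
    ("Oracle", "meeting"): 18,
    ("Redshift", "call"): 1,
    ("Redshift", "meeting"): 5,
    ("Snowflake", "call"): 5,
    ("Snowflake", "meeting"): 2,
}

# outer average(): aggregate those sums up to the account level of detail
per_account = defaultdict(list)
for (account, _classification), hours in summed.items():
    per_account[account].append(hours)

averaged = {acct: sum(vals) / len(vals) for acct, vals in per_account.items()}
print(averaged)
# {'Amazon': 11.0, 'Oracle': 18.0, 'Redshift': 3.0, 'Snowflake': 3.5}
```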

Using "query_groups() -"

In this section, we explore how to create a formula that is based on the level of detail in the search bar, but which removes certain dimensions specified in the search bar. We do this by using the "query_groups() -" syntax. This syntax means: use the level of detail specified by the enclosing context, which is the search bar, and then remove the dimensions listed after the "-".

Use "query_groups() -" to remove level of detail

  1. Begin your search, using Meetings_tiny.csv as your data source.

  2. Enter account activity date in the search bar and click Go.

  3. Click the more options icon and select Create formula.

  4. In the formula pop-up window, name your formula. Here, we name the formula hours ignoring activity date.

  5. Enter the following formula and click Save:

    average(
      group_aggregate(
        average(hours),
        query_groups() - {activity date},
        {}
      )
    )
    In this example, the search bar specifies the "account" and "activity date" level of detail. The formula removes the activity date and averages hours to the account level of detail. The outer level of aggregation has no effect in this case, but it is good practice to always provide the outer level of aggregation.
  6. Search for: account activity date hours ignoring activity date sort by account sort by activity date.

  7. The following table results:

    Table 3. Example 3.1
    Account     Activity Date   Hours ignoring activity date
    Amazon      04/01/2020      2.75
    Amazon      04/05/2020      2.75
    Amazon      04/16/2020      2.75
    Amazon      04/18/2020      2.75
    Oracle      04/01/2020      6.00
    Oracle      04/15/2020      6.00
    Oracle      04/16/2020      6.00
    Redshift    02/01/2020      2.00
    Redshift    04/05/2020      2.00
    Redshift    04/06/2020      2.00
    Snowflake   02/01/2020      1.75
    Snowflake   04/01/2020      1.75
    Snowflake   04/04/2020      1.75
    Snowflake   04/15/2020      1.75

    Note that in the table above, the Hours value for each row of an account is the same. For example, the four rows for Snowflake all show an Hours value of 1.75. Similarly, the Hours value for Amazon (2.75) is the same in each of the Amazon rows. This is because Activity Date is being ignored and Hours ignoring activity date is computed at the Account level of detail.
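The mechanism can be sketched in Python. The per-activity hour values below are hypothetical (the article shows only the resulting averages), but the broadcast behavior is the same: the account-level average repeats on every date row for that account:

```python
from collections import defaultdict

# hypothetical per-activity rows: (account, activity date, hours);
# the individual hour values are illustrative, not from the sample data
rows = [
    ("Redshift", "02/01/2020", 1.0),
    ("Redshift", "04/05/2020", 2.0),
    ("Redshift", "04/06/2020", 3.0),
    ("Snowflake", "02/01/2020", 2.0),
    ("Snowflake", "04/01/2020", 1.0),
]

# query_groups() - {activity date}: drop the date and average hours
# at the remaining {account} level of detail
per_account = defaultdict(list)
for account, _date, hours in rows:
    per_account[account].append(hours)
account_avg = {a: sum(v) / len(v) for a, v in per_account.items()}

# the search result keeps one row per (account, activity date), but the
# formula column repeats the account-level average on every row
result = [(account, date, account_avg[account]) for account, date, _ in rows]
for row in result:
    print(row)
```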

Aggregate hours ignoring activity date

In this example we use the same formula as that used in Example 3.1.

In Example 3.1, our search bar contained account and activity date. Here we add activity classification, so we are now computing Hours ignoring activity date at the {Account, Activity Classification} level of detail.

  1. Search for: account activity classification activity date hours ignoring activity date sort by account sort by activity date.

  2. The following table results:

    Table 4. Example 3.2
    Account     Activity Classification   Activity Date   Hours ignoring activity date
    Amazon      meeting                   04/01/2020      2.75
    Amazon      meeting                   04/05/2020      2.75
    Amazon      meeting                   04/16/2020      2.75
    Amazon      meeting                   04/18/2020      2.75
    Oracle      meeting                   04/01/2020      6.00
    Oracle      meeting                   04/15/2020      6.00
    Oracle      meeting                   04/16/2020      6.00
    Redshift    meeting                   02/01/2020      2.50
    Redshift    meeting                   04/05/2020      2.50
    Redshift    call                      04/06/2020      1.00
    Snowflake   meeting                   02/01/2020      2.00
    Snowflake   call                      04/01/2020      1.67
    Snowflake   call                      04/04/2020      1.67
    Snowflake   call                      04/15/2020      1.67

    In the table above, note that the three rows for Snowflake where the Activity Classification is "call" have an Hours value of 1.67, while the row for Snowflake where the Activity Classification is "meeting" has an Hours value of 2.00.
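These values can be cross-checked against the earlier tables: dividing each sum(hours) from Table 2 (Example 2.1.2) by the number of activity-date rows for that pair in Table 4 reproduces the averages. A quick Python check:

```python
# sum(hours) per {account, activity classification} from Table 2, and the
# number of activity-date rows per pair counted from Table 4
totals = {
    ("Amazon", "meeting"): 11, ("Oracle", "meeting"): 18,
    ("Redshift", "meeting"): 5, ("Redshift", "call"): 1,
    ("Snowflake", "meeting"): 2, ("Snowflake", "call"): 5,
}
counts = {
    ("Amazon", "meeting"): 4, ("Oracle", "meeting"): 3,
    ("Redshift", "meeting"): 2, ("Redshift", "call"): 1,
    ("Snowflake", "meeting"): 1, ("Snowflake", "call"): 3,
}

# average(hours) at the {account, activity classification} level of detail
avg = {pair: round(totals[pair] / counts[pair], 2) for pair in totals}
print(avg[("Snowflake", "call")])     # 1.67
print(avg[("Snowflake", "meeting")])  # 2.0
```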
