"query_groups() +" and "query_groups() -"

In this article, we walk through examples of aggregation using "query_groups() +" and "query_groups() -". To test for yourself, use the data set we provide in the Prerequisites section.

Prerequisites

Before attempting the following practice examples, follow these steps to upload the sample dataset:

  1. Download the sample data set, Meetings_tiny.csv.

  2. Sign in to your ThoughtSpot account, and click the Data tab.

  3. Click the more options icon in the top-right of the screen. From the menu, select Upload data.

  4. Upload the data file, either by dragging and dropping it into the field provided, or by clicking Browse your files and selecting Meetings_tiny.csv.

    1. To answer Are the column names already defined in the file header?, select Yes.

    2. To answer Are the fields separated by?, select Comma (,).

  5. Select Next at the bottom right of the screen to proceed.

  6. Select Next to proceed without changing the column names. For the sake of the formulas used in the examples, column names should not be changed.

  7. Review the column types in Step 3 and select Upload to finish.

  8. Click the search icon on the final screen of the upload process to begin a search based on the sample data set.

Aggregating across multiple levels of detail

In the basic aggregation examples, we describe basic usage of group_aggregate functions and how to specify a level of detail. In group_aggregate(), the first argument specifies the type of aggregation to perform, and the second argument specifies the level of detail at which to perform it. Enclose your group_aggregate function in an outer aggregation function; the outer function specifies the secondary aggregation to use when aggregating up to a search level of detail that is coarser than the level of detail specified within the group_aggregate function.

Performing a secondary level of aggregation across disjoint levels of detail may give nonsensical results. For example, consider aggregating to the activity classification level and then "aggregating up" to the account level.

Multiple accounts have activities in multiple classifications. When you aggregate to the activity classification level, you are aggregating without including the account level of detail. You are creating a small virtual table with two rows, one for meetings and one for calls. The row for meetings provides the total number of meetings and the row for calls provides the total number of calls.

There is no way to then "aggregate" up to the account level of detail because this virtual table is now at a level of detail that is disjoint from the account level of detail. The results you get in this case will not make sense. In this situation, where your data includes a hierarchy that needs to be maintained in order for your query to make sense, you must use the "query_groups() +" syntax.
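The problem can be sketched in Python. Using the hour totals from Table 2 (Example 2.1.2) later in this article as stand-in data, aggregating to the activity classification level collapses away the account column entirely:

```python
from collections import defaultdict

# sum(hours) per {account, activity classification}, taken from
# Table 2 (Example 2.1.2) later in this article (stand-in data)
rows = [
    ("Amazon", "meeting", 11),
    ("Oracle", "meeting", 18),
    ("Redshift", "call", 1),
    ("Redshift", "meeting", 5),
    ("Snowflake", "call", 5),
    ("Snowflake", "meeting", 2),
]

# aggregating to the activity classification level alone produces the
# two-row "virtual table" described above
by_class = defaultdict(int)
for account, classification, hours in rows:
    by_class[classification] += hours

print(dict(by_class))  # {'meeting': 36, 'call': 6}
# the account column is gone, so there is no way to "aggregate up"
# from this virtual table to the account level of detail
```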

Using "query_groups() +"

When you use "query_groups() +", you specify that you want to use the level of detail defined in the search bar and possibly add to it.

If you use query_groups() without the "+", your group_aggregate function uses exactly the level of detail specified in the search bar. This is less useful since you could get the same result without using group_aggregate(). Add arguments after the "+" to specify the additional attributes that you want to add to the level of detail specified in the search bar.
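Conceptually, you can think of a level of detail as a set of grouping columns; "query_groups() +" is the union of the search-bar set and the columns you list after the "+". A minimal sketch of this idea:

```python
# the grouping columns implied by the search bar, i.e. query_groups()
search_bar_groups = {"account"}

# query_groups() + {activity classification}: union the two sets
level_of_detail = search_bar_groups | {"activity classification"}

# query_groups() with no "+" would be just the search-bar set itself
print(sorted(level_of_detail))  # ['account', 'activity classification']
```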

Use "query_groups() +" to add level of detail

To practice using "query_groups() +", complete the following steps:

  1. Begin your search, using Meetings_tiny.csv as your data source.

  2. Enter hours account in the search bar and click Go.

  3. Click the more options icon and select Create formula.

  4. In the formula pop-up window, name your formula. Here, we have named the formula average hours per activity classification.

  5. Enter the following formula and click Save:

    average(
      group_aggregate(
        sum(hours),
        query_groups() + {activity classification},
        {}
      )
    )
    In this example, the search bar specifies the "account" level of detail. The formula adds "activity classification", thus summing hours to the {account, activity classification} level of detail. The outer level of aggregation then averages the values computed at the {account, activity classification} level of detail to the account level of detail.
  6. Search for: account average hours per activity classification sort by account.

  7. The following table results:

    Table 1. Example 2.1.1
    Account     Hours per activity classification averaged
    Amazon      11
    Oracle      18
    Redshift    3
    Snowflake   3.5

One way to understand this result is to create a chart without the formula, using the following search:

  1. Search for: account hours activity classification sort by account sort by activity classification.

  2. This creates the table shown below:

    Table 2. Example 2.1.2
    Account     Activity Classification   Total Hours
    Amazon      meeting                   11
    Oracle      meeting                   18
    Redshift    call                      1
    Redshift    meeting                   5
    Snowflake   call                      5
    Snowflake   meeting                   2

    The portion of the hours per activity formula within the "group_aggregate" argument aggregates (sums) hours to the {account, activity classification} level of detail:

    group_aggregate(
      sum(hours),
      query_groups() + {activity classification},
      {}
    )

    This is comparable to what is shown in Table 2 (Example 2.1.2) above.

    The second level of aggregation, specified by the average surrounding the group_aggregate, averages the values associated with each account:

    average(
      group_aggregate(
        sum(hours),
        query_groups() + {activity classification},
        {}
      )
    )

    In this case, the function performs the following calculations:

    Snowflake: (5 + 2)/2 = 3.5

    Redshift: (1 + 5)/2 = 3

    Oracle: 18/1 = 18

    Amazon: 11/1 = 11

This results in the values computed by the average hours per activity classification formula shown in Table 1 (Example 2.1.1) above.
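The full two-step computation can be sketched in Python, taking the sums from Table 2 (Example 2.1.2) as the result of the inner group_aggregate:

```python
from collections import defaultdict

# sum(hours) at the {account, activity classification} level of detail,
# as shown in Table 2 (Example 2.1.2)
summed = {
    ("Amazon", "meeting"): 11,
    ("Oracle", "meeting"): 18,
    ("Redshift", "call"): 1,
    ("Redshift", "meeting"): 5,
    ("Snowflake", "call"): 5,
    ("Snowflake", "meeting"): 2,
}

# outer average(): aggregate those sums up to the account level of detail
per_account = defaultdict(list)
for (account, _classification), hours in summed.items():
    per_account[account].append(hours)

averaged = {acct: sum(vals) / len(vals) for acct, vals in per_account.items()}
print(averaged)
# {'Amazon': 11.0, 'Oracle': 18.0, 'Redshift': 3.0, 'Snowflake': 3.5}
```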

Using "query_groups() -"

In this section, we explore how to create a formula that is based on the level of detail in the search bar, but which removes certain dimensions specified in the search bar. We do this by using the "query_groups() -" syntax. This syntax means: use the level of detail specified by the enclosing context, which is the search bar, and then remove the dimensions listed after the "-".

Use "query_groups() -" to remove level of detail

  1. Begin your search, using Meetings_tiny.csv as your data source.

  2. Enter account activity date in the search bar and click Go.

  3. Click the more options icon and select Create formula.

  4. In the formula pop-up window, name your formula. Here, we name the formula hours ignoring activity date.

  5. Enter the following formula and click Save:

    average(
      group_aggregate(
        average(hours),
        query_groups() - {activity date},
        {}
      )
    )
    In this example, the search bar specifies the "account" and "activity date" level of detail. The formula removes the activity date and averages hours to the account level of detail. The outer level of aggregation has no effect in this case, but it is good practice to always provide the outer level of aggregation.
  6. Search for: account activity date hours ignoring activity date sort by account sort by activity date.

  7. The following table results:

    Table 3. Example 3.1
    Account     Activity Date   Hours ignoring activity date
    Amazon      04/01/2020      2.75
    Amazon      04/05/2020      2.75
    Amazon      04/16/2020      2.75
    Amazon      04/18/2020      2.75
    Oracle      04/01/2020      6.00
    Oracle      04/15/2020      6.00
    Oracle      04/16/2020      6.00
    Redshift    02/01/2020      2.00
    Redshift    04/05/2020      2.00
    Redshift    04/06/2020      2.00
    Snowflake   02/01/2020      1.75
    Snowflake   04/01/2020      1.75
    Snowflake   04/04/2020      1.75
    Snowflake   04/15/2020      1.75

    Note that in the table above, the Hours value for each row of an account is the same. For example, the four rows for Snowflake all show an Hours value of 1.75. Similarly, the Hours value for Amazon (2.75) is the same in each of the Amazon rows. This is because Activity Date is being ignored and Hours ignoring activity date is computed at the Account level of detail.
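The mechanism can be sketched in Python. The per-activity hour values below are hypothetical (the article shows only the resulting averages), but the broadcast behavior is the same: the account-level average repeats on every date row for that account:

```python
from collections import defaultdict

# hypothetical per-activity rows: (account, activity date, hours);
# the individual hour values are illustrative, not from the sample data
rows = [
    ("Redshift", "02/01/2020", 1.0),
    ("Redshift", "04/05/2020", 2.0),
    ("Redshift", "04/06/2020", 3.0),
    ("Snowflake", "02/01/2020", 2.0),
    ("Snowflake", "04/01/2020", 1.0),
]

# query_groups() - {activity date}: drop the date and average hours
# at the remaining {account} level of detail
per_account = defaultdict(list)
for account, _date, hours in rows:
    per_account[account].append(hours)
account_avg = {a: sum(v) / len(v) for a, v in per_account.items()}

# the search result keeps one row per (account, activity date), but the
# formula column repeats the account-level average on every row
result = [(account, date, account_avg[account]) for account, date, _ in rows]
for row in result:
    print(row)
```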

Aggregate hours ignoring activity date

In this example we use the same formula as that used in Example 3.1.

In Example 3.1, our search bar contained account and activity date. Here we add activity classification, so we are now computing Hours ignoring activity date at the {Account, Activity Classification} level of detail.

  1. Search for: account activity classification activity date hours ignoring activity date sort by account sort by activity date.

  2. The following table results:

    Table 4. Example 3.2
    Account     Activity Classification   Activity Date   Hours ignoring activity date
    Amazon      meeting                   04/01/2020      2.75
    Amazon      meeting                   04/05/2020      2.75
    Amazon      meeting                   04/16/2020      2.75
    Amazon      meeting                   04/18/2020      2.75
    Oracle      meeting                   04/01/2020      6.00
    Oracle      meeting                   04/15/2020      6.00
    Oracle      meeting                   04/16/2020      6.00
    Redshift    meeting                   02/01/2020      2.50
    Redshift    meeting                   04/05/2020      2.50
    Redshift    call                      04/06/2020      1.00
    Snowflake   meeting                   02/01/2020      2.00
    Snowflake   call                      04/01/2020      1.67
    Snowflake   call                      04/04/2020      1.67
    Snowflake   call                      04/15/2020      1.67

    In the table above, note that the three rows for Snowflake where the Activity Classification is "call" have an Hours value of 1.67, while the row for Snowflake where the Activity Classification is "meeting" has an Hours value of 2.00.
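These values can be cross-checked against the earlier tables: dividing each sum(hours) from Table 2 (Example 2.1.2) by the number of activity-date rows for that pair in Table 4 reproduces the averages. A quick Python check:

```python
# sum(hours) per {account, activity classification} from Table 2, and the
# number of activity-date rows per pair counted from Table 4
totals = {
    ("Amazon", "meeting"): 11, ("Oracle", "meeting"): 18,
    ("Redshift", "meeting"): 5, ("Redshift", "call"): 1,
    ("Snowflake", "meeting"): 2, ("Snowflake", "call"): 5,
}
counts = {
    ("Amazon", "meeting"): 4, ("Oracle", "meeting"): 3,
    ("Redshift", "meeting"): 2, ("Redshift", "call"): 1,
    ("Snowflake", "meeting"): 1, ("Snowflake", "call"): 3,
}

# average(hours) at the {account, activity classification} level of detail
avg = {pair: round(totals[pair] / counts[pair], 2) for pair in totals}
print(avg[("Snowflake", "call")])     # 1.67
print(avg[("Snowflake", "meeting")])  # 2.0
```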
