Basic use of group aggregation

In this article, we walk through examples of basic aggregation. To test for yourself, use the data set we provide in prerequisites.

Prerequisites

Before attempting the following practice examples, follow these steps to upload the sample dataset:

Click the dropdown to see the steps to upload the dataset.
  1. Click here to download the sample data set.

  2. Sign in to your ThoughtSpot account, and click the Data tab.

  3. Click the more More options menu icon menu icon in the upper-right of the screen. From the menu, select Upload data.

  4. Upload the data file, either by dragging and dropping it into the field provided, or by clicking Browse your files and selecting Meetings_tiny.csv.

    1. To answer Are the column names already defined in the file header?, select Yes.

    2. To answer Are the fields separated by?, select Comma (,).

  5. Select Next at the lower right of the screen to proceed.

  6. Select Next to proceed without changing the column names. For the sake of the formulas used in the examples, column names should not be changed.

  7. Review the column types in Step 3 and select Upload to finish.

  8. Click the search icon on the final screen of the upload process to begin a search based on the sample data set.

Basic aggregation: Aggregate the sum of hours at account level of detail

First, let’s try a formula that aggregates sum(hours) at the account level. If the search level of detail is coarser than account, it "aggregates up" by averaging the aggregation at the account level.

To try using this formula, follow these steps:

  1. Begin your search, using Meetings_tiny.csv as your data source.

  2. Enter hours account in the search bar and select Go.

  3. Click the more options icon More options menu icon and select Add formula.

  4. In the formula pop-up window, name your formula. Here, we name the formula hours per account.

  5. Enter the following formula and select Save:

    average(
      group_aggregate(
        sum(hours),
        {account},
        {}
        )
      )
  6. In the search bar, enter the following terms: account hours hours per account sort by account.

  7. The resulting table shows that the search level aggregation of hours (Total hours) is the same as the aggregation specified by our formula, hours per account. We see this result because both columns aggregate to the account level.

    Table 1. Example 1.1
    Account Total hours Hours per account

    Amazon

    11

    11

    Oracle

    18

    18

    Redshift

    6

    6

    Snowflake

    7

    7

Using the same formula shown in the previous example, let’s see what changes when our search does not specify any level of detail:

  1. Search for: hours hours per account.

  2. The following table results:

    Table 2. Example 1.2
    Total hours Hours per account

    42

    10.5

    In this example, we do not specify a dimension in the search bar, so the search defines the level of detail as the whole table. In other words, our search is at the coarsest level of detail while the formula specifies an aggregation at a finer level of detail (the account level of detail). The first column of the preceding table displays the aggregation of all hours to the table (coarsest) level.

    The second column, which shows the result of our formula, computes as follows:

    • First, aggregate using the inner_aggregation function (sum) to the account level of detail, which gives results like those shown in example 1.1 (values as 7, 6, 11, and 18 for the account level hours).

    • Then, aggregate to the coarser search level of detail by using the outer aggregation function, which is Average. That average gives us (7 + 6 + 11 + 18)/4, which results in the 10.5 value shown in the preceding example.

Aggregating when the search level of detail is finer than formula

In this example, we use the same formula as we used earlier in the article. This time, we use a search that specifies a level of detail that is finer than the account level.

  1. Search for: account activity classification hours hours per account sort by account.

    Activity classification is a finer level of detail because there are multiple activity classes for each account. In this case, our formula aggregates to the account level and then "copies down" or "replicates" to the finer level of detail.

  2. The resulting table looks like this:

    Table 3. Example 1.3
    Account Activity classification Total hours Total hours per account

    Amazon

    meeting

    11

    11

    Oracle

    meeting

    18

    18

    Redshift

    meeting

    5

    6

    Redshift

    call

    1

    6

    Snowflake

    meeting

    2

    7

    Snowflake

    call

    5

    7

Aggregating with multiple levels of detail in your formula

If we want to specify multiple levels of detail within the formula, we can do that by using a comma-separated list of dimensions. Our small data set makes coming up with a "useful" example difficult, but it can still demonstrate this functionality.

In the following example, our inner aggregation aggregates to the account and activity classification level of detail.

  1. Enter hours account in the search bar and select Go.

  2. Click the more options icon More options menu icon and select Add formula.

  3. In the formula pop-up window, name your formula. Here, we name the formula average of average by account and activity classification.

  4. Enter the following formula and select Save:

    average(
      group_aggregate(
        average(hours),
        {account, activity classification},
        {}
        )
      )
    1. When the search specifies account and activity classification as the level of detail, this formula generates the same values as those generated by the simple average hours aggregation. You can see this with the following search:

      1. Search for: account activity classification average hours average of average by account and activity classification sort by account sort by activity classification

        The following table results:

        Table 4. Example 1.4 - Supporting
        Account Activity Classification Average Hours Average of average by account and activity classification

        Amazon

        meeting

        2.75

        2.75

        Oracle

        meeting

        6

        6

        Redshift

        call

        1

        1

        Redshift

        meeting

        2.5

        2.5

        Snowflake

        call

        1.67

        1.67

        Snowflake

        meeting

        2

        2

    2. If we aggregate to the table level by removing account and activity classification from the search, our formula "aggregates" up, resulting in a table with a single value that represents the average of the preceding values. If we put both average hours and our formula in this table level table, we can see that they result in different values.:

      1. Search for: average of average by account and activity classification average hours.

        The following table results:

        Table 5. Example 1.4
        Average of average by account and activity classification Average hours

        2.65

        3

        We receive this result because our formula adds the six values that represent the average hours by account and activity classification and then divides by six, while "average hours" sums the hours value from each of the 14 lines in the original table, and then divides by 14.

There are many business use cases that require very specific aggregation; understanding the subtleties of level of detail calculations and how to implement them with group_aggregate functions key to being successful at implementing these more complex aggregations.

For more examples of aggregation, see Using "query_groups() +/-".


Was this page helpful?