Modeling semi-additive measure data for slowly-changing dimensions

Sometimes, you want to model data for search using a model that captures both the current value and the historical values of an attribute. You need to plan for the data you expect to see as a visualization; Do you expect to see the current value of an attribute and apply that across all historical transactions, or do you expect to see the attribute at the time of the transaction?

Surrogate keys

We recommend using surrogate keys to model slowly-changing dimensions. Surrogate keys are when fact and dimensional tables are joined on the key column rather than on the product unique identifier. For the following table, the surrogate key is the Productscdkey column:

Productscdkey Productid Product Name Partition Date

1_1

1

name1_1

01/07/2023

1_2

1

name1_2

01/08/2023

1_3

1

name1_3

01/15/2023

1_4

1

name1_4

01/31/2023

2_1

2

name2_1

01/31/2023

3_1

3

name3_1

01/29/2023

3_2

3

name3_2

01/31/2023

The surrogate key uniquely identifies the dimension row that was valid at that transaction date. For example, this table shows three products, with IDs 1, 2, and 3. The names of products 1 and 3 changed over time. For product 1, name1_1 was valid until the 7th, name 1_2 was valid from the 8th to the 15th, name1_3 was valid from the 15th to the 31st, etc.

In order to make tables of slowly-changing dimensions easier to read, we recommend including a zeroed transaction so that the balance for date periods (weekly, monthly, quarterly, yearly) displays only the balance for the most recent record. If you do not zero the transactions, ThoughtSpot displays the value of the balance at each attribute change.

To zero the transactions for this table, you would enter a balance of zero for each change of the Productscdkey column, for the date after which each product name was changed. This shows a zeroed balance for visualizations, without affecting the data for the actual attribute.

If you are unable to change the underlying data of a table to zero out transactions, you can use range joins rather than surrogate keys. In this method, you join fact and dimensional tables via a range join for the transaction date and product identifier. For example, you join fact.productid to dim.productid and join fact.date to dim.date with fact.date >=dim.fromdate and fact.date <= dim.todate.

For date periods (weekly, monthly, quarterly, yearly), ThoughtSpot displays the value for each of the attributes in the period.

ThoughtSpot does not display the balance for the end of the period, but rather the last balance for the value that changed in the period.

We do not recommend data models which include slow-changing dimension attributes on the fact table, causing the fact grain to be lower than the foreign key columns.

Consider the following table:

Date Productid Storeid Balance Currentprice Markdownflag Regularprice

01/01/2023

1

1

10

9

c

10

01/02/2023

1

1

10

{Null}

{Null}

10

01/03/2023

1

1

9

9

c

10

01/04/2023

1

1

0

9

c

10

01/05/2023

1

1

20

5

d

10

01/06/2023

1

1

5

5

d

10

01/07/2023

1

1

2

5

d

10

The correct result for the last value of this balance on a weekly granularity is:

Weekly Transaction Date Product Name Total Balance Partition Date

12/26/2022

name1

10

01/01/2023

01/02/2023

name1

2

01/07/2023

Here, ThoughtSpot displays the weekly transaction date as the first date of the week.

However, if you include the attributes current price markdown flag regular price in your search, the following table results:

Weekly Transaction Date Product Name Current Price Markdown flag Regular Price Total Balance

12/26/2022

name1

9

c

10

10

01/02/2023

name1

{Null}

{Null}

10

10

01/02/2023

name1

9

c

10

0

01/02/2023

name1

5

d

10

2

Note that due to the attribute changes, the table records multiple balances for the week beginning on 01/02/2023.

We recommend capturing the current price, markdown flag, and regular price as slowly-changing dimensions with a surrogate key. If you cannot modify the underlying data structure, you can either create formulas to ignore the attributes in a table that give you repeating balances for the same date or write last_value formulas for the attribute columns.

Formulas that ignore attributes in a fact table

To create formulas that ignore the attributes in a table, you could use the following base formula:

balance = last_value(sum(balance),query_groups(),{transaction date})

In this example, the query groups would be the weekly date, the product name, the current price, the markdown flag, and the regular price. The formula returns the value for each combination of attributes, like the table above.

To modify the formula so it ignores the attribute grouping, you can use the following formula:

balance ignoring attributes grouping = last_value(sum(balance),query_groups() -(currentprice, markdownflag, regularprice),{transaction date})

This formula returns the correct value for the balance at the end of the week, assuming you haven’t applied any filters.

To modify the formula so it ignores attribute grouping and filters, use the following formula:

Balance ignore attribute filters = group_aggregate(sum(last_value(sum(balance), query_groups()-(currentprice, regularprice,markdownflag),{transaction date})), query_groups(), query_filters() - (currentprice, regularprice, markdownflag))

Formulas for attributes in a fact table

If you want to instead view the last value of attributes in a given period, you can create formulas like the following:

Last Current Price = max(last_value(max(currentprice), query_groups(), {transaction date}))

Last Regular Price = max(last_value(max(Regularprice), query_groups(), {transaction date}))

Last Markdown Flag = max(last_value(max(Markdownflag), query_groups(), {transaction date}))

Limitations

  • Semi-additive functions cannot span multiple fact tables.

  • Semi-additive functions cannot contain only constant expressions. For example, if you create a formula with no references to a column, such as last_value(sum(1), {}, {true}), ThoughtSpot will not support the function.

  • You cannot combine different partitioning and ordering clauses in different semi-additive functions from the same table, in the same query. That is, a case where formula 1 partitions on Date and Product, and formula 2 partitions on Date, Product, and Client.

  • Average, Variance, Standard Deviation and Unique Count do not work with semi-additive functions across an attribution query. That is, multiple fact tables with at least one non-shared attribute.

  • We do not support semi-additive functions and unique count functions from the same table. Note that a work-around exists by wrapping the unique count in a group_aggregate function. For example, group_aggregate(unique_count(product),query_groups()+{},query_filters()).

  • Advanced aggregates (group, cumulative, moving and rank) cannot be used within the definition of semi-additive functions. Note they can be used to wrap these functions as outer aggregation.



Was this page helpful?