Reaggregation Scenario 4: Average period value for semi-additive numbers II

Semi-additive numbers may be aggregated across some, but not all, dimensions. They commonly apply to specific time positions. In this scenario, we have daily position values for home loans, and therefore cannot aggregate on the date dimension.

Here, we consider a somewhat different situation than in reaggregation-seimi-additive-1.adoc.

In some financial circumstances, the average daily balance has to be calculated, even if the balance does not exist. For example, if a banking account was opened on the 15th of June, business requirements have to consider all the days in the same month, starting with the 1st of June. Importantly, we cannot add these ‘missing’ data rows to the data set; note that the solution used in Scenario 3 returns an average only for the period that has data, such as June 15th to 30th, not for the entire month of June. The challenge is to ensure that in the daily average formula, the denominator returns the total days in the selected period, not just the days that have transactions:

sum(loans) / sum(days_in_period)

To solve for this, consider the data model:

  • The fact table transactions reports the daily position for each account, and uses a loan column.

  • The dimension table date tracks information for each date, starting with the very first transaction, all the way through the most recent transaction. This table includes the expected date column, and days_in_period column that has a value of 1 in each row.

  • Worksheets use the date column with keywords such as weekly, monthly, yearly to change the selected period.

  • When users run a search with the monthly keyword, the denominator must reflect the number of days in each month.

Valid solution

A valid query that meets our objective may look something like this:

The following code in the denominator definition returns the total number of days for the period, regardless whether there are transactions, or what filters apply:

group_aggregate (sum(days_in_period),{Date},{})

Resolution

  1. The sum(days_in_period) function aggregates to:

    • {Date}

      No other search columns appear.

    • {}

      We require the entire period, so there are no filters.

      The date keywords yearly, quarterly, monthly, and weekly apply because we use the same column in both the search and the aggregation function.

So, the function results in the following output when it runs with the yearly keyword in search:

Year Result

2016

366

2017

365

2018

365

2019

365

2020

366

  1. This data is not reaggregated because we want to return the result at the appropriate date level.

Alternate Solution

To return only the number of days that have existing transactions, use the following code in the denominator:

sum(days_in_period)