Reaggregation Scenario 4: Average period value for semi-additive numbers II
Semi-additive numbers may be aggregated across some, but not all, dimensions. They commonly apply to specific time positions. In this scenario, we have daily position values for home loans, and therefore cannot aggregate on the date dimension.
Here, we consider a somewhat different situation than in Reaggregation Scenario 3: Average period value for semi-additive numbers I.
In some financial circumstances, the average daily balance has to be calculated, even if the balance does not exist. For example, if a banking account was opened on the 15th of June, business requirements have to consider all the days in the same month, starting with the 1st of June. Importantly, we cannot add these ‘missing’ data rows to the data set; note that the solution used in Scenario 3 returns an average only for the period that has data, such as June 15th to 30th, not for the entire month of June. The challenge is to ensure that in the daily average formula, the denominator returns the total days in the selected period, not just the days that have transactions:
sum(loans) / sum(days_in_period)
To solve for this, consider the data model:
The fact table
transactionsreports the daily position for each account, and uses a
The dimension table
datetracks information for each date, starting with the very first transaction, all the way through the most recent transaction. This table includes the expected
days_in_periodcolumn that has a value of 1 in each row.
Worksheets use the
datecolumn with keywords such as weekly, monthly, yearly to change the selected period.
When users run a search with the monthly keyword, the denominator must reflect the number of days in each month.
A valid query that meets our objective may look something like this:
The following code in the denominator definition returns the total number of days for the period, regardless whether there are transactions, or what filters apply:
sum(days_in_period)function aggregates to:
No other search columns appear.
We require the entire period, so there are no filters.
datekeywords yearly, quarterly, monthly, and weekly apply because we use the same column in both the search and the aggregation function.
So, the function results in the following output when it runs with the yearly keyword in search:
This data is not reaggregated because we want to return the result at the appropriate