Introduction
The choice to use calculated in columns in DAX often comes down to model performance. Experienced developers will prefer coding dynamic solutions to adding unnecessarily to the model’s memory footprint.
Then there are times when judicious use of calculated columns can save hours of aggravation and massively boost performance. At other times, things may simply not be possible without a well placed calculated column.
Introducing EARLIER
EARLIER is an interesting function, it’s very simple, but when we come understand this function, we are beginning to understand DAX.
Although EARLIER has mostly been superseded with the addition of variables, it remains a worthwhile function to know for calculated columns. We will look at a simple example from the Contoso database available here. Then explain what this function is actually doing, and why it works.
Adding a Column
We have deleted the relationship between Channel and the Sales table. Instead of having this as a dimension, we will embed the values into the fact table with help from EARLIER.
Let’s click on the Sales table and add a new calculated column.
Next we will add the measure below.
ChannelName = CALCULATE(LASTNONBLANK(Channel[ChannelName],1),
Channel[Channel] = EARLIER(Sales[channelKey]))
Like magic, ChannelName is now embedded in our fact table.
A function like MAX, MIN, or AVERAGE would be also be fine to use here depending what we plan to return. In this case we have used LASTNONBLANK to return the last non-blank value from the column, where the expression is true. Since we specify “1” for the expression, that effectively means out of all existing column values.
Note that LASTNONBLANK will traverse in the original order that data was sorted when it was loaded so sometimes we would have to narrow the field with additional filter terms when using this approach.
Other Use Cases
A real use case for this might be if you have data that is the wrong granularity to key to a table. Another might be if we want to summarize customer baskets. Still another is where we have a list of promotions with end dates, but no date key and we need to somehow indicate which products were on offer.
Below is a measure that will calculate a cumulative sum of Contoso Sales for each date, we don’t have to embed this into a fact table, it could be a calculated table used as a helper for calculations.
DailyContoso = CALCULATE(SUM(Sales[SalesAmount]),
Sales[DateKey] = EARLIER(Sales[DateKey]), 'Product'[BrandName] = "Contoso")
The $304.12 value above is the sales amount for Contoso on March 22, 2013. It will only be embedded on lines containing Contoso, and each date will have its own cumulative total.
To understand why this works we need understand a bit about DAX evaluation contexts.
DAX Contexts
DAX has two primary types of evaluation context.
- Row Context: An algorithm parses a table one row at a time.
- Filter Context: Calculations are evaluated with the specified filters engaged.
When we specify a calculated column the formula engine will traverse each row in the table, in sequence, and perform our calculation. When we use CALCULATE that will tell DAX to exit the row context and evaluate the expression on the table, after applying the filters we have specified.
With CALCULATE the sum without filters will end up being the entire column from the bottom to the top, that is a complete unconstrained sum of the column.
This is where EARLIER comes into play. When we use this function we tell the DAX formula engine to go back to the context it just exited when we called CALCULATE, that is the original row context.
The column name passed to EARLIER then tells the formula engine to find the value, in the row it was just iterating. In our original example, we asked EARLIER to return the value of ChannelKey.
ChannelName = CALCULATE(LASTNONBLANK(Channel[ChannelName],1),
Channel[Channel] = EARLIER(Sales[channelKey]))
In the above example, we are filtering the column Channel[Channel], which is the foreign key, to be only the value of the ChannelKey in the current row context. Then we return the last value that is not blank, and that becomes the new value for the row in the calculated column.
Conclusion
There is a trade off between performance, and model memory, and recognizing when these type of techniques are appropriate is an important part of PowerBI development. We can often achieve performance boosts with judicious use of calculated columns, or tables, while still adhering to good design principals.
EARLIER captures the spirit of DAX language, and contemplating this useful little function can help us gain a better understanding of evaluation contexts in PowerBI.
References
Russo, Marco; Ferrari, Alberto. The Definitive Guide to Dax, 2nd Ed. Microsoft Press, 2019.
Recent Post
Peeking inside the basket with lists
- 31 December 2024
- 5 min read
Streamline Workflows in R Studio
- 23 November 2024
- 6 min read
Customer Clusters with Gaussian Mixed Models
- 22 October 2024
- 8 min read
Text Sentiment Analysis with Hugging Face
- 28 September 2024
- 4 min read
Product Graph Analytics
- 21 August 2024
- 11 min read