Tuesday, June 23, 2009

Look forward to look back - Make your MDX problem part of the DB

A recent issue I encountered with a customer of mine was a performance issue with the MDX that had been created to handle a particular type of last year comparisons. For the sakes of simplicity, I will change the scenario slightly. There is a newsgroup post about this issue on http://social.msdn.microsoft.com/Forums/en-US/sqlanalysisservices/thread/ff4f2fcb-ae33-46e1-aca4-06504e23743e/?prof=required

The problem is to do when you are comparing sales of a group of shops to the sales of the same shops last year. Imagine you had a set of the shop that are coloured as follows
Shop Colour
A Green
B Green
C Green
D Red
E Red
F Blue

Now if you want to compare the sales of shops that are blue with the sales for last year, initial you might try something like; (I am doing a simple Last year Variance, i.e. This year less Last Year)

([Measures].[Sales],[Shops].[Colour].&[Blue])

-

(parallelperiod([Time].[Time].[Year],1,[Time].[Time].currentmember),[Measures].[Sales],[Shops].[Colour].&[Blue])

This compares my sales of blue shops to the sales of blue shops from last year. Right? Well in my customers case, wrong. It turns out that in their case, they want to compare sales of shops that are blue this year with the sales of those same shops last year.

So if my data set is

Shop Colour/Sales This Year Colour Last Year
A Green/1000 Green/1500
B Blue/2000 Blue/2100
C Green/1500 Green/1400
D Red/1250 Blue/1100
E Red/1500 Red/1750
F Blue/2100 Red/1500

Then my result is (2000 + 2100) - (2100 + 1500) = 500 ( (B+F)_ThisYear - (B+F)_LastYear)
and not (2000 + 2100) - (2100 + 1100) = 1,000 ( (B+F)_ThisYear - (B+D)_LastYear)

Now it turn out the new way of doing this is a bit more difficult than the original method. The MDX was correct but slow. For the current time member and colour, you would have to get the set of matching current stores. ({B,F} in my data set above.) Now for this set of shops, you have to perform a sum across this set of store. So you calculation looks something like

([Measures].[Sales],[Shops].[Colour].&[Blue])
-
sum(exists([Shops].[Shops].children, [Shops].[Colour].&[Blue]),
(parallelperiod([Time].[Time].[Year],1,[Time].[Time].currentmember),[Measures].[Sales]))

That sum and exists is a killer for performance and knocked the query into the slow Cell Evaluation mode and not the fast Block evaluation mode. All manner of rewrites of the MDX did little to change the performance of the query.

In the end, some of the solution had to be moved into the DB Layer. We created a table that had time and the shop as the composite key. With this key there was an attribute called "NextYearColour". This table was used to create a new dimension called ShopVersion with an Attributes call NextYearColour. The table was also used as measure group. This was important so we could join time to the table. Now our query looks like

([Measures].[Sales],[Shops].[Colour].&[Blue])
-
(parallelperiod([Time].[Time].Currentmember,[Time].[Year].[Year],[Measures].[Sales],[ShopVersion].[NextYearColour].&[Blue]))

Now that the sum and the exists are gone, we are back to fast performance again.

I guess the moral of the story here is to always consider putting something into the DB layer and making it real in the cube. MDX is great and you can almost always get MDX to get you the result, but the trade off may often be in performance.

I hope this blog post (my first as it happens!) proves to be useful to someone.

Mark