Analysis Services Partitions

Partitions are a great feature in Analysis Services that allow you to split measure groups in to separate physical files. All measure groups by default have one partition but by splitting that partition you will gain improved query and processing performance.

Partitions can be split in any way you see fit. Many people choose to separate measure group data by date. This makes sense because just about every fact table is going to have a date dimension associated with it. It also makes for a clear way to draw a line in the sand where each partition can be separated. For example, your company has three years worth of sales data so you decide to split the sales measure group into three partitions, one for each year. Depending on the sales volume you may decide to split it even further down to each quarter or even daily.

Multiple partitions can only be deployed to a server running SQL Server enterprise edition. However, if you are developing on a machine that uses standard but will be deploying to enterprise then you can impersonate developing in enterprise by right-clicking on the project file in the Solution Explorer and clicking Properties. Here you can change to the edition of SQL Server you are deploying to and this will also change the restrictions in BIDS on what you can and cannot do.

1

So why do you want to add partitions to your cube? Well the leading reason is to increase performance. So why does adding partitions actually help performance? It’s going to help in several ways.

Query Performance

Query performance will be increased because rather than querying an entire measure group Analysis Services can isolate a single partition to search. It can also search multiple partitions in parallel if need be.

Processing Performance

Partitions will also increase processing performance. Processing performance is optimized for the same reason it was while querying with partitions. Instead of processing the entire measure group you can just process your most current data skipping over your partitions that store data that is years old and doesn’t need to be reprocessed. Analysis Services will also allow you to process partitions in parallel. When processing the cube you can

Recommendations

My recommendation for setting up partitions is to not let a measure group exceed 50 million rows before creating a second partition. This is very subjective though and 50 million rows is the maximum you would ever want to see a partition. There is nothing wrong with creating a partition before you reach this number. Again, this is very subjective you could decide to create a new partition after one million rows. I also generally recommend using the default MOLAP storage mode.

2

Here are the very basic differences between the different storage modes:

Storage Mode Description
MOLAP

· Data and aggregations are stored in multi-dimensional format. Makes for slower processing time but faster querying time.

· Cube must be reprocessed to get updated data.

Scheduled MOLAP

· Same as MOLAP but cube refreshes every 24 hours

Automatic MOLAP · Same as MOLAP but updates in the relational database raise events that trigger cube refresh.
Medium Latency MOLAP · Same as MOLAP but updates in the relational database will trigger a switch to Real-time ROLAP while cube is processing. When cube completes processing returns back to MOLAP. Default latency is 4 hours.
Low Latency MOLAP · Same as Medium Latency. Default latency is 30 minutes.
Real-time HOLAP · Data is stored in the relational data source but aggregations are stored in multi-dimensional format. Fast for processing but slow querying.
Real-time ROLAP · Data and aggregations are stored in multi-dimensional format. Fastest for processing but very slow querying.
· No data latency.

Example

Now that you know what partitions are and why they are helpful let’s go through the steps of splitting a measure group into two new partitions. In this example you want to place internet sales that occur before 2003 into a partition called Internet Sales Old and everything after that date should go into a partition called Internet Sales New.

Step One – Restrict Rows to Original Partition

Remember all measure groups have at least one partition by default. Before creating a new partition you must first change the old partition to restrict what rows are brought back. If you don’t do this before trying to create a new partition you will get the following warning:

3

Click on the Source for the original partition so you can restrict the rows that it stores.

4

· Change the Binding type from Table Binding to Query Binding. You could leave this as Table Binding if you separate your measure group data into multiple tables or views.

· When you change this property to Query Binding you will see that it automatically provides you the query that will return back the table with a blank WHERE clause at the end. Remember we want to have a partition with all data prior to 2003 so the WHERE clause needs to be changed to only return data prior to that date (Ex. WHERE OrderDateKey <= 20021231). Once the query has been changed hit OK.

5

Step Two – Create a Second Partition

Rows are now being restricted from your first partition so you can now create a new one.

· Select New Partition under the measure group that is ready for a second partition and hit Next to start the Wizard.

6

· Check the table(s) used for this measure group that should be used in this partition then hit Next.

· Check the box the reads Specify a query to restrict rows and add to the WHERE clause like you did in step one to bring back all the dates after 2002 (WHERE OrderDateKey > 20021231). Be very careful when writing these WHERE clauses. If you accidently did >= instead on > then there would be overlapping sales data for 20021231. I could also accidently exclude a day if on the first partition I used < and the second partition >. This would exclude one days worth of data from my measure group. On the bottom of the dialog box you will see a warning describing the possibility of overlap and missing days. Hit Next when you finish typing the query.

7

· You can select a storage location other then the default if you would like then hit Next.

· On the last screen you can give the partition a name like Internet Sales Old and decide whether you want to design an aggregation now or later. I will write a second article explaining aggregations so select Design aggregations later. After you have named the partition hit Finish.

· Rename the original partition Internet Sales Old

8

Now you can make optional change to Storage Settings that were discussed earlier in the article under the recommendations section. You may also find an option called Enable Proactive Caching in the Storage Settings, which is an option that will also be discussed in a future article. This is a great first step to Performance Tuning your cube.

Analysis Services Dynamic Security

Imagine you have a requirement to capture the user accounts that log into your cube and based on their username they will only be permitted to see specific data.  This can be accomplished by creating dynamic or table based security in Analysis Services which is much less difficult to setup than it sounds like.  In this article I will walk you through the setup needed on the relational database side as well as Analysis Services side to get this working in your environment.

You will start by identifying what attributes you want to base your security off of.  For example, maybe I want User1 to only be able to see the sales for the states Florida, Georgia and Louisiana, while User2 can see California, Oregon and Washington.   Typically, if I were to create this security I would create a role for each grouping of states and add the appropriate users.  An easier way to maintain security like this example would be to store all this information in relational tables that way a DBA that may not be as familiar with Analysis Services can easily handle adding new users or modifying existing ones.  This table structure would look something like this:

1

FactSales is storing all the company sales, while DimState has all the possible distinct states that sales can occur in.  This is a very small scale example, but here’s in detail what DimUser and UserStateBridge tables store:

DimUser

This table simply has a key which is an identity column in this case and an account column which is the user account that you wish to grant access.  So example records would look like this:

UserID Account
1 Domain\User1
2 Domain\User2

UserStateBridge

This is a table that has a many-to-many relationship between the DimUser table and DimState.  This many-to-many relationship allows for one user to see many states and one state to accept many users.  Here’s a sample of what this table may look like:

UserID StateID
1 5
1 27
1 8
1 14
2 42
2 27
2 6

I’m not going to show you what the State dimension would look like because this is just a typical dimension you see day to day.  The StateID here is the identity column from the state table.

Once this table structure is created your next step is to bring it into Analysis Services.  Here are the steps to follow inside BIDS to leverage these tables (I’ll assume you’ve already created a basic cube first):

· Add these new tables (DimUser and UserStateBridge) to you Data Source View.

· Create a new dimension based off the DimUser table.  Hide all the columns by changing AttributeHierarchyVisible to False.  You don’t want users to be able to view this dimension.  Process this dimension.

· Create a new Measure Group in the cube based off the UserStateBridge.  Delete all measure it creates except for the automatically generated Count measure.  Select the single measure left to be hidden by selecting it and changing the Visible property to False.  This will hide the entire Measure Group from your users but you can still use it when writing MDX.

· Ensure these two have a Regular relationship setup in the Dimension Usage table of the cube browser.  Often this relationship is created automatically.

2

· In the Solution Explorer, right-click on Roles and select New Role.  Give Read access in the General, Data Sources and Cubes tabs.  On the Membership tab add any groups or user accounts this Role will effect.

· On the Dimension Data tab select the State dimension from the dropdown list.  Then select the StateID(you would likely have this column hidden in the dimension) in the Attribute Hierarchy dropdown list.

· Inside the Dimension Data tab select the Advanced tab.  In the Allowed member set area enter this MDX:

Template Sample

NonEmpty (

[Dimension To Secure Name].[Dimension Key].Members,

(

StrToMember ("[User Dimension].[User Key].[" + UserName () + "]"),

[Measures].[Bridge Table Count Measure]

)

)

NonEmpty (

[State].[State ID].Members,

(

StrToMember ("[User].[User ID].[" + UserName () + "]"),

[Measures].[User State Bridge Count]

)

)

The UserName() function will grab the user account that attempts to access the cube.  StrToMember converts a literal string to a dimension member.

· Last in the bottom left of your current screen check the box called Enable Visual Totals.

3

· Reprocess the entire SSAS database now.  When you navigate to the cube browser now you can test and make sure that your security is working properly by emulating one of the users you’ve added to the role.

4

After verifying you security in the browser if you still find problems with the results then your next step is to check the user and bridge table.  Again this is a very basic example of creating dynamic security so if you did something more complicated like based a role off of two bridge tables then you may run into conflicts between the two.  For example, say you not only want to base your security off of state but also on the store the items was sold at.  In that case you would create two bridge tables.  One for state, which we’ve already done, and a second for store built the same way the first one was.  So image your fact table had some results similar below:

StateID StoreID SalesAmount
3 4546 1000
23 6434 1200
14 1554 900
43 7664 500
3 4578 1100
35 6789 3000

When using multiple bridge tables in a single role you need to make sure that if User1 has rights to StateID 3 in the UserStateBridge table then he also needs rights to StoreID 4546 and 4578 in the store bridge table to be able to see each record when browsing.  This means the bridge tables can actually over rule each other if they do not have permission for both keys.  This is especially important if you’re using inferred members in you data warehouse using some kind of indicator like -1.  If there is a store in an unknown store and you assign unknowns to a -1 key value when you will want to make sure your users have access to that -1 store.

Adding new accounts is very simple now.  Add them to the DimUser table and then reprocess.  If you need additional states associated with users simply add them to the bridge table then reprocess the cube.

After reading this article if you still need more information on this topic I highly recommend the book Expert Cube Development with SSAS Multidimensional Models.  This book shows many different types of dynamic security including security based off of a stored procedure. 

MDX Time Calculations Built to Scale

When creating time calculations in Analysis Services you may take several approaches. For people that have just begin learning SSAS they likely have spent several days writing calculations that return things like year to date, rolling twelve month, and period over period for every measure they have.

For the more adventurous developer they have tried using the Business Intelligence wizard which automates the process of writing time calculations on selected measures. It creates an attribute in the date dimension that can then be used to view calculations. This sounds like a great idea but it is very inflexible when changes are needed and is still specific to the measures that were selected during the wizard configuration. Another problem with the BI wizard is it can only be run against one date dimension hierarchy at a time.With many dimensional models date can mean many different things, it can be a calendar date, fiscal date, ship date, order date, and many others. There is likely only one date table in the data warehouse but plays the role of many different dates making it a role playing dimension. If you wanted those same time calculations for each role playing dimension using the BI wizard you would have to run through the wizard multiple times and then enjoy managing that mess later.

The best way to handle time calculation is detailed in the book Expert Cube Development with SSAS Multidimensional Models. You may have heard me sing praises of this book before because the content is that useful for real world problems. What I’m going to show you in this article is my variation of what they show in that book. The benefit of this method is multifold. It is a lot easier to manage than the BI wizard because you have total control of it. It’s dynamic because the formula will work no matter what measure is being viewed. Also, it is easy to filter just the calculations you want because it’s just like any other attribute that you would filter by.

To follow this example you can download the sample database from www.codeplex.com. When you download and install this sample database you also receive sample files for deploying an Analysis Services cube that can be found here C:\Program Files\Microsoft SQL Server\100\Tools\Samples.

To start this example create a SQL Server view that will simply store the name of the calculation that you want to create and an ID column.

Create VIEW [dbo].[DateCalcs] AS

SELECT ID = 1, Calculation = ‘Actuals’

UNION ALL

SELECT ID = 2, Calculation = ‘Calendar Year To Date’

UNION ALL

SELECT ID = 3, Calculation = ‘Fiscal Year To Date’

UNION ALL

SELECT ID = 4, Calculation = ‘Calendar Previous Period’

UNION ALL

SELECT ID = 5, Calculation = ‘Fiscal Previous Period’

UNION ALL

SELECT ID = 6, Calculation = ‘Calendar Previous Period Diff’

UNION ALL

SELECT ID = 7, Calculation = ‘Fiscal Previous Period Diff’

The resulting view looks like this:

ID Calculation

1 Actuals

2 Calendar Year To Date

3 Fiscal Year To Date

4 Calendar Previous Period

5 Fiscal Previous Period

6 Calendar Previous Period Diff

7 Fiscal Previous Period Diff

Create as many time values for the Calculation column as you will need time calculations in the cube. Here I have created a set of calculations for Calendar and Fiscal because it is likely I will be required to have these calculations for both my Calendar and Fiscal dates.

Once this is created add the view to your data source view. Notice after adding the view that it has no relationships to any other object, which is fine because no relationships are needed.

Create a new dimension based off the DateCalcs view with the ID column as the Key column and the Calculation column as a regular attribute. Select the ID attribute and change the AttributeHierarchyVisible property to False to hide it from the users. Select the Calculation attribute and change the IsAggregatable property to False. This will remove the All level you are used to seeing in dimensions which isn’t necessary for the dimension.

1

The last step is to select the Calculation attribute and change the DefaultMember property to the Actuals member [Date Calcs].[Calculation].&[Actuals]. When all these changes have been made you can process this dimension. After the processing completes add the new dimension to be used in this cube by going to the Cube Structure tab in the cube designer. To add the dimension right click in the Dimensions pane in the bottom left and select Add Cube Dimension. Select Date Calcs from the list to make it usable to this cube. You will notice if you view the Dimension Usage tab that it has no relationship to any Measure Group, which is not a problem.

2

With the Date Calcs dimension prepped you can now open the Cube designer and open the Calculations tab. Here you will be using the mdx SCOPE statement to handle each of the calculations you wish to use. It does not matter what order we write each of these calculation but the first one we will tackle is Calendar Year To Date. Hit the New Script Command button ScriptIcon to get started.

Calendar Year To Date

SCOPE ([Date Calcs].[Calculation].[Calendar Year To Date]);

THIS =

AGGREGATE (

YTD ([Order Date].[Calendar Date].CurrentMember),

[Date Calcs].[Calculation].[Actuals]

);

END SCOPE

Fiscal Year To Date (Using a different method to calculate YTD)

SCOPE ([Date Calcs].[Calculation].[Fiscal Year To Date]);

THIS =

Aggregate(PeriodsToDate(

[Order Date].[Fiscal Date].[Fiscal Year],

[Order Date].[Fiscal Date].CurrentMember

),

([Date Calcs].[Calculation].[Actuals])

)

;

END SCOPE

Calendar Previous Period

SCOPE ([Date Calcs].[Calculation].[Calendar Previous Period]);

THIS =

([Order Date].[Calendar Date].CurrentMember.PrevMember,

[Date Calcs].[Calculation].[Actuals]);

END SCOPE

Fiscal Previous Period

SCOPE ([Date Calcs].[Calculation].[Fiscal Previous Period]);

THIS =

([Order Date].[Fiscal Date].CurrentMember.PrevMember,

[Date Calcs].[Calculation].[Actuals]);

END SCOPE

Calendar Previous Period Diff

SCOPE ([Date Calcs].[Calculation].[Calendar Previous Period Diff]);

THIS =

([Order Date].[Calendar Date].CurrentMember,

[Date Calcs].[Calculation].[Actuals])-

([Order Date].[Calendar Date].CurrentMember.PrevMember,

[Date Calcs].[Calculation].[Actuals]);

END SCOPE

Fiscal Previous Period Diff

SCOPE ([Date Calcs].[Calculation].[Fiscal Previous Period Diff]);

THIS =

([Order Date].[Fiscal Date].CurrentMember,

[Date Calcs].[Calculation].[Actuals])-

([Order Date].[Fiscal Date].CurrentMember.PrevMember,

[Date Calcs].[Calculation].[Actuals]);

END SCOPE

This gives you a good start on a Time Calculation dimension. Once these calculations are written you will need to process the cube. You may have noticed that in each of these calculations I never actually identify a measure that exists in the cube. That is actually the beautiful thing about this technique is that it works across all measures now! So instead of having to create a calculation for Sales, Profit, and all the other measures in the cube you only have to create this calculation once and it will work on all measures. Try it out yourself!

3

Understanding Analysis Services Relationships using Dimension Usage

As a part of designing an Analysis Services cube you must define how each dimension is related to each measure group in your cube through the Dimension Usage tab.  The Dimension Usage tab is found when editing the cube in the cube designer.  The screenshot below shows dimensions on rows and measure groups on columns.  The points at which they intersect define how they are related.  These relationships are automatically defined when the two objects (measure groups and dimensions) are added to the cube but to manually change the relationships click the ellipses intersecting relationship boxes.

DimUsage1

The first type of relationship option you should know about really is not a relationship at all.  This type is called No Relationship as shown in the image below.  Anywhere you find this relationship type it will appear as the greyed out box at the intersecting points on the Dimension Usage tab.  This relationship type would be used when a dimension has no relationship to a measure group.  For example, if you have a measure group that holds nothing but sales that occur on the internet and a dimension that has employee data then you are likely going to use the No Relationship type.  Sales that transpire on the internet likely have no employee associated with them so in this case it makes sense that the cube has no relationship defined.

DimUsage7

Keep in mind that having no relationship can cause confusion to end users if they try to browse two unrelated objects.  That’s why it is important to properly train users how to browse the cube.  The measure group property IgnoreUnrelatedDimensions can also help end users from getting confused when browsing unrelated cube objects.

The Regular relationship type is the most traditional relationships that exist in data warehouses.  Think of this as a typical one-to-many relationship.  For example, a product dimension is full of a unique list of products but the fact table that it relates to has many instances of that product for each time it is sold.  In a traditional data warehouse design the product dimension has a unique key that represents each distinct instance of a product, while the fact table may store that product key several times for each transaction that sold the same product.  In this case the product key is a primary key in the product dimension and a foreign key in the fact table.

The screenshot below shows the FactInternetSales measure group related to the DimCustomer dimension using the CustomerKey.  The diagram in the image depicts the relationship with the yellow table as the fact table and the blue table as the dimension table.

DimUsage2

          The Fact relationship type is used when a measure group is also used as a dimension.  Sounds a little confusing right? Sometimes there are business cases when you not only want to aggregate data that is in a measure group but you also want to slice by values in it as well.  When this is a requirement you create the measure group and dimensions separately and then relate them in the Dimension Usage tab with a Fact relationship.  This type of relationship is also known as a degenerate dimension.

          An example when you may use this relationship type is with a measure group that stores sales orders.  In this case not only do you want to aggregate sales order data but you also want to slice by Order Numbers in the same table.

DimUsage3

          Referenced relationship types are often used when your data warehouse design utilizes a snowflake schema design.  Without getting into a lengthy data warehouse design discussion snowflake design is different from a star schema design because a snowflake design is a more normalized view of the data with dimensions that “branch” off of each other.  A star schema is a more denormalized view of the data with dimensions all directly relating to the fact table.

DimUsage9

          Imagine you run a data warehouse for a movie store (yes I realize movie stores are pretty much extinct).  The fact table (measure group) holds sales that occur in the store and a dimension will all the movies that could possibly be sold.  There is another dimension with the movie genre that is related to the movie dimension instead of the fact table.  If I want to have an independent dimension in my cube for genre then I must use a Referenced relationship type to relate it to the appropriate measure groups.    There is no direct relationship between genre and the fact table which is where the “referenced” part of this relationship becomes relevant.  The cube “hops” through the movie dimension to simulate a direct relationship between genre and the fact table.  The dotted line in the diagram on the below images represents this.  Referenced dimensions are not optimal for performance and if you can optionally avoid them you should do so.

DimUsage4

          Using a Many-to-Many relationship is done when a data warehouse design implements a bridge table between dimensions to appropriately represent all combinations of data.  For example, a dating website has members and members have hobbies.  A member can have many hobbies and a hobby can be associated with multiple members.  To accomplish this in a data warehouse design a bridge table can be used to relate each combination of hobby and member.  This bridge table is often called a factless fact table because when it is brought into the cube it is necessary to define it as a measure group.

DimUsage8

          When defining this relationship in the cube you the factless fact table is the intermediate dimension that is used to relate the two dimensions to the measure group.

DimUsage5

          The rarely used Data Mining relationship relates a mining model that was created from a cube dimension to the appropriate measure group.

DimUsage6

          If you have any other questions about data warehouse design or cube creation feel free to email me at dknight@pragmaticworks.com

Choosing the Right Analysis Services: MOLAP vs. Tabular Recording and Q&A

I hope you were able to attend my free webinar on Choosing the Right Analysis Services: MOLAP vs. Tabular on February 11, 2014 that I delivered with Dustin Ryan.  If you weren’t you can now watch the recording here.

Traditionally following a webinar I try to post a quick Q&A session for everything I was not able to answer in the allotted time.

Q: Since Tabular solutions are many ways better than Multidimensional..then my question is when to go for Multidimensional solution

You would likely still choose Multidimensional for things that Tabular does not have full feature parity.  For example, Tabular does not have the ability do writeback. 

Some still choose Multidimensional because their data model is very complex and Multidimensional is more forgiving in this regard.  Last, Multidimensional is a more scalable for larger datasets because it is less (still uses memory, CPU, etc.. excessively) reliant on pure memory to get the performance you need, unlike Tabular.  As we mentioned in the webinar, Yahoo chose Multidimensional because it would better scale their 26 terabyte cube.

Q: When deciding whether to use Tabular or Multidimensional, do you have to have SSAS installed / set up specifically for one or the other?

Yes, when setting up SSAS Multidimensional or Tabular they are separate installs.  Meaning they can’t be installed at the same time with the SQL Server installer.  Because of their resource demands you traditionally want to install these instances on separate servers if possible.

Q: How do i link if column have more than one column is key column in tabular?

Great question.  Usually, if you have tables that need to be joined together on multiple columns you have create a key column that combine the values into a single column either through DAX or through the source query extract.

SQL Server Analysis Services Webinar

Join me tomorrow, April 2. 2013, at 11:00 PM EST for a free webinar which is part of an entire month of Back to Basics themed free training.  This webinar is for anyone trying to understand the basics behind Analysis Services and how to get started design a cube.  With the release of SQL Server 2012 there are now two forms of Analysis Services so we’ll also spend time discussing Multidimensional and Tabular and why you would choose one over the other.

Also, during this webinar I reference a Decision Matrix file to help you with making the decision on which Analysis Services is right for you.  You can find that file here.  If you would like the PowerPoint slides for this session you can download them here.

Register for the webinar here and if you won’t be able to make it at this time all of the webinars are recorded and can be watched at a later date by using the same link.

OLAP PivotTable Extensions Update

OLAP PivotTable Extensions is a really cool free add-in for Excel that’s been around for quite a while now.  It allows you to have some exposure to what’s going on under the covers (Among many other features) while you browse a cube using an Excel PivotTable. This weekend there has been an updated version and you should check out the features and download it now!

http://olappivottableextend.codeplex.com

The new features in this release are:

  • Support for Excel 2013
  • Upgrading to the AS2012 version of ADOMD.NET to now support connections to .bism files. (This upgrade required adding the .NET Framework v3.5 SP1 as a prerequisite. Use a prior version of OLAP PivotTable Extensions if this prerequisite is a problem for your organization.)
Follow

Get every new post delivered to your Inbox.

Join 37 other followers