SSAS Dimension Attribute Discretization Methods

Discretization (one of my favorite tech words to say J) is a really great feature of Analysis Services that is often forgotten. Basically what discretization does is it can take a dimension attribute and automatically place its members in buckets for you. For example, if you have a product dimension that has an attribute for price you could create buckets for price instead of listing every distinct value.

To set this property you select the attribute to configure and open the properties menu. The properties you’re concerned with are DiscretizationMethod and DiscretizationBucketCount.

The DiscretizationMethod property controls the way the buckets are created. The below chart, provided by technet, breaks down what each option does.

DiscretizationMethod Setting	Description
None	Displays the members.
Automatic	Selects the method that best represents the data: either the EqualAreas method or the Clusters method.
EqualAreas	Tries to divide the members in the attribute into groups that contain an equal number of members.
Clusters	Tries to divide the members in the attribute into groups by sampling the training data, initializing to a number of random points, and running several iterations of the Expectation-Maximization (EM) clustering algorithm. This method is useful because it works on any distribution curve, but is more expensive in terms of processing time.

The DiscretizationBucketSize controls the number of buckets to be created. When it’s set to 0 the buckets are automatically created.

So if this is such a great feature why is it not used more often? Well like many things it’s all a matter of control. Using these discretization properties you have some control of how the buckets are created and displayed but you may have a specific bucket size requirement and these properties don’t permit specifications other than the ones highlighted earlier to get the exact bucket size you may require.

So what alternative do you have? You can manually create your buckets in the actual table, in a view using a case statement, or in the data source view also with a case statement. That way you have complete control over the buckets. This is what I see most commonly done but it’s nice to know you have options if you don’t have specific bucket requirements.