Understanding Analysis Services Relationships using Dimension Usage

As a part of designing an Analysis Services cube you must define how each dimension is related to each measure group in your cube through the Dimension Usage tab.  The Dimension Usage tab is found when editing the cube in the cube designer.  The screenshot below shows dimensions on rows and measure groups on columns.  The points at which they intersect define how they are related.  These relationships are automatically defined when the two objects (measure groups and dimensions) are added to the cube but to manually change the relationships click the ellipses intersecting relationship boxes.


The first type of relationship option you should know about really is not a relationship at all.  This type is called No Relationship as shown in the image below.  Anywhere you find this relationship type it will appear as the greyed out box at the intersecting points on the Dimension Usage tab.  This relationship type would be used when a dimension has no relationship to a measure group.  For example, if you have a measure group that holds nothing but sales that occur on the internet and a dimension that has employee data then you are likely going to use the No Relationship type.  Sales that transpire on the internet likely have no employee associated with them so in this case it makes sense that the cube has no relationship defined.


Keep in mind that having no relationship can cause confusion to end users if they try to browse two unrelated objects.  That’s why it is important to properly train users how to browse the cube.  The measure group property IgnoreUnrelatedDimensions can also help end users from getting confused when browsing unrelated cube objects.

The Regular relationship type is the most traditional relationships that exist in data warehouses.  Think of this as a typical one-to-many relationship.  For example, a product dimension is full of a unique list of products but the fact table that it relates to has many instances of that product for each time it is sold.  In a traditional data warehouse design the product dimension has a unique key that represents each distinct instance of a product, while the fact table may store that product key several times for each transaction that sold the same product.  In this case the product key is a primary key in the product dimension and a foreign key in the fact table.

The screenshot below shows the FactInternetSales measure group related to the DimCustomer dimension using the CustomerKey.  The diagram in the image depicts the relationship with the yellow table as the fact table and the blue table as the dimension table.


          The Fact relationship type is used when a measure group is also used as a dimension.  Sounds a little confusing right? Sometimes there are business cases when you not only want to aggregate data that is in a measure group but you also want to slice by values in it as well.  When this is a requirement you create the measure group and dimensions separately and then relate them in the Dimension Usage tab with a Fact relationship.  This type of relationship is also known as a degenerate dimension.

          An example when you may use this relationship type is with a measure group that stores sales orders.  In this case not only do you want to aggregate sales order data but you also want to slice by Order Numbers in the same table.


          Referenced relationship types are often used when your data warehouse design utilizes a snowflake schema design.  Without getting into a lengthy data warehouse design discussion snowflake design is different from a star schema design because a snowflake design is a more normalized view of the data with dimensions that “branch” off of each other.  A star schema is a more denormalized view of the data with dimensions all directly relating to the fact table.


          Imagine you run a data warehouse for a movie store (yes I realize movie stores are pretty much extinct).  The fact table (measure group) holds sales that occur in the store and a dimension will all the movies that could possibly be sold.  There is another dimension with the movie genre that is related to the movie dimension instead of the fact table.  If I want to have an independent dimension in my cube for genre then I must use a Referenced relationship type to relate it to the appropriate measure groups.    There is no direct relationship between genre and the fact table which is where the “referenced” part of this relationship becomes relevant.  The cube “hops” through the movie dimension to simulate a direct relationship between genre and the fact table.  The dotted line in the diagram on the below images represents this.  Referenced dimensions are not optimal for performance and if you can optionally avoid them you should do so.


          Using a Many-to-Many relationship is done when a data warehouse design implements a bridge table between dimensions to appropriately represent all combinations of data.  For example, a dating website has members and members have hobbies.  A member can have many hobbies and a hobby can be associated with multiple members.  To accomplish this in a data warehouse design a bridge table can be used to relate each combination of hobby and member.  This bridge table is often called a factless fact table because when it is brought into the cube it is necessary to define it as a measure group.


          When defining this relationship in the cube you the factless fact table is the intermediate dimension that is used to relate the two dimensions to the measure group.


          The rarely used Data Mining relationship relates a mining model that was created from a cube dimension to the appropriate measure group.


          If you have any other questions about data warehouse design or cube creation feel free to email me at dknight@pragmaticworks.com

Using PowerShell to Build your Azure VM Environment

In a previous post I wrote about Using a Hyper-V VHD in an Azure VM. Today I’d like to show you what my next steps were.  My goal with using Azure is to create an lab environment for student that fits the following needs:

  • Easy to create new virtual environments
  • Easy to power down virtual environments
  • Easy to power up virtual environments
  • Easy to delete all virtual environments

The point of this post is to guide you through how to easily create virtual machines for multiple students or employees in a quick and easy script. By the way, I have previously posted how to solve the problem of powering up and powering down the virtual lab environments in my blog about Setting up Automation in Azure.

Creating an Image

These steps will guide you through how to take what you learned in my blog about Using a Hyper-V VHD in an Azure VM and create and image from your VHD you uploaded. You would also create an image from a Virtual Machine that you created from the Azure Gallery. Once an image is created you can then spin off as many virtual machines as you would like from that image.

  • Login to the Azure Management Portal.
  • Navigate to the Virtual Machines page and select Images on the top of the screen.
  • Select Create on the bottom of the screen to build a new image.


  • Name the image and provide the VHD url location where you are storing your vhd. If you followed my previous blog on uploading you local VHD then this would be the storage account and container that you uploaded to. You must run Sysprep on your virtual machine if you want to create an image from it.


PowerShell to Create your VMs

The the image created you can now either manually create new virtual machines from the image or use PowerShell to scale your solution better. For me Powershell made most sense because I’m not trying to build just one virtual machine. In my case I actually need to build 15 identical virtual machines (for 15 students) based off the same image.

  • If you haven’t already download and Install the Windows Azure PowerShell module http://go.microsoft.com/fwlink/p/?linkid=320376&clcid=0×409
  • Launch Windows Azure PowerShell
  • Before you start using the Windows Azure PowerShell cmdlets, you need to configure connectivity between your machine and Windows Azure. One way to do this is by downloading the PublishSettings file from Windows Azure and importing it. Using the PowerShell prompt enter the following commands:
    • Get-AzurePublishSettingsFile
    • Import-AzurePublishSettingsFile “C:\SubscriptionCredentials.publishsettings”
  • Download my PowerShell script called CreateVMs.ps1. Essentially you point this script to your image and cloud service and tell it how many VMs you want and it loops over your image until all the VMs you asked for are created.

param([Int32]$vmcount = 3)# Create Azure VMs for Class
# run in Powershell window by typing .\CreateVMs.ps1 -vmcount 3

$startnumber = 1
$vmName = “VirtualMachineName”
$password = “pass@word01″
$adminUsername = “Student”
$cloudSvcName = “CloudServiceName”
$image = “ImageName”
$size = “Large”
$vms = @()
for($i = $startnumber ; $i -le $vmcount; $i++)
$vmn = $vmName + $i
New-AzureVMConfig -Name $vmn -InstanceSize $size -ImageName $image |
Add-AzureEndpoint -Protocol tcp -LocalPort 3389 -PublicPort 3389 -Name “RemoteDesktop” |
Add-AzureProvisioningConfig -Windows -AdminUsername $adminUsername -Password $password |
New-AzureVM -ServiceName $cloudSvcName

  • Modify any of my generic parameters I’ve provided in the file.  You’ll likely need to modify the default values of the following
    • $vmName
    • $password
    • $adminUsername
    • $cloudSvcName
    • $image
    • $size (Optional, right now mine creates large VMs so you may want to adjust this)
  • This script also accepts one parameter that you can key in the value for externally. The –VMCount parameter allows you to specify exactly how many virtual machines you would like based on your image. To run this script use this command in your PowerShell window. Using the number 3 here will produce 3 virtual machines for me.

.\CreateVMs.ps1 -vmcount 3

  • If I were to run this script without changing any of the embedded parameters it would produce 3 virtual machines with the following names:
    • VirtualMachineName1
    • VirtualMachineName2
    • VirtualMachineName3

Once your virtual machines are created your next step is to consider if you want to keep them running all the time or only during business hours. Keep in mind you only pay for the time your virtual machines are started. For my scenario I wanted to turn my machines off during non business hours and back on in the morning, which can be automated through Azure Automation. Read this post on Setting up Azure Automation.

Reducing the Disk Size of a Hyper-V Virtual Machine

Have you ever created a Hyper-V virtual machine and later needed to change the size of the disk on the VM? When you initially create a virtual machine you provide a size for the virtual hard disk (as shown below).


It’s pretty straight forward to expand the size of this virtual disk but if you need to shrink the disk size there’s a few extra steps to make the change.

Here’s the simple steps for expanding a virtual disk:

To expand a disk you can go into the Settings of your virtual machine when it is attached to you Hyper-V Manager.


By clicking Edit on the Hard Disk you can select Expand and then provide the size that you would like to expand the virtual disk.


You’ll notice here there is no shrink option. Compact is not going to do the job because if you read the description it tells you that the storage capacity is not impacted by compacting it. So let’s focus on how shrink our disk.

Shrink the Disk Volume on the Virtual Machine

  • Connect and login to your virtual machine operating system.
  • Launch the Disk Management utility by doing a Run diskmgmt.msc.


  • Right-click on the volume you wish to shrink and select Shrink Volume.


  • Carefully fill out the property called Enter the amount of space to shrink in MB. I say carefully because it is easy to assume that this number is what you want to new disk size to be, but in reality this is the size to reduce your disk by.


  • Every time I tried this I got an error on the first try. If this happens to you try to shrink it a again and it will likely work.
  • Notice below I now have roughly 20 GB unallocated.


  • With this change done you can now shut down the virtual machine.
  • In the Hyper-V Manager right-click on the virtual machine and select Settings.
  • Find your virtual hard disk and select Edit.


  • Now that we’ve created the extra disk space on the volume inside the virtual machine you can select Shrink here. If you look back to the third screenshot of this blog you’ll notice the Shrink option was not available. It took us first reducing the volume of the disk inside the VM to make this appear.  Select Shrink then click Next.


  • Provide the new desired disk (notice the minimum). Then click Finish


  • You can verify the disk size change by click Inspect.


I needed to personally do this because I was migrating many of my VMs to Azure.  Unfortunately Azure will only let you use OS disk that is128 GB or less so this helped me change my disk size to something Azure could handle.  Here’s my post on Using a Hyper-V VHD in an Azure VM.

Setting up Automation in Azure

As I’ve started digging deeper into using Azure the need to automate tasks (turn off all VMs, turn on all VMs, automatically resize all VMs, etc…) has becomes very apparent.  In April 2014 Microsoft announced the release of Microsoft Azure Automation to handle such operations.

Activate Azure Automation

This new feature is still in beta but you can sign up your Windows Azure account to try the feature by going here:




Once the Automation feature is enabled for your Azure account you will see it listed with all other available feature on your account.


Microsoft has some documentation on what to do next:


This link gives you a basic “Hello World” example, but thought I would take you through a more interesting example that I’m using to turn off all my VMs at a certain time. The high level steps are:

  1. Create an Automation Account
  2. Create a Management Certificate
  3. Upload the Management Certificate
  4. Create a Connection Asset
  5. Create a Credential Asset
  6. Upload a Runbook
  7. Schedule the Runbook

Create an Automation Account

  • Log in to the Azure Management Portal.
  • Navigate to the Automation feature on the side pane and select Create an Automation Account.
  • I’m creating this automation to turn off and back on Azure VMs that I use for training classes so I’ll name my account appropriately.


Create a Management Certificate

Azure Automation authenticates to Microsoft Azure subscriptions using certificate-based authentication. There are several ways to create a new management certificate. I used Visual Studio’s makecert utility to create my certificate but if you don’t have Visual Studio there are other methods for creating a certificate.

  • Launch the Visual Studio Command Prompt with Run as Administrator.


  • Run the following command to produce the certificate:

makecert -sky exchange -r -n “CN=<CertificateName>” -pe -a sha1 -len 2048 -ss My “<CertificateName>.cer”

Microsoft Documentation on this command: http://msdn.microsoft.com/en-us/library/azure/gg551722.aspx

  • Next you’ll need to export the private key and certificate from the Certificate Manager. You can launch this by typing certmgr.msc from your start menu.
  • Once the Certificate Manager launches open the Personal and then Certificates folder.
  • Right-click on the certificate that you created previously and select All Tasks and then Export.
  • Let’s start by exporting the private key. In the Certificate Export Wizard select Yes, export the private key.


  • Leave the default selection of Personal Information Exchange and then click Next.


  • Provide a password to the private key. You will be required to type this in later in Azure.


  • Name the private key and store it in a local drive for now.  It will be imported into Azure later.


  • Click Finish to execute the export.


  • We still need to export the certificate file. Right-click on the certificate in the Certificate Manager again and select All Tasks and then Export.
  • This time select No, do not export the private key.


  • Keep the default selection of the format and then click Next.


  • Name the certificate and store it in a local drive for now.  It will be imported into Azure later.


Upload the Management Certificate

With the certificate now created we’re ready to upload it to Azure.


  • Select Management Certificates from the Settings page.
  • Then select Upload from the bottom of the page.
  • Browse to the certificate file we created in the previous steps and select the Azure subscription you would like to apply it to. Click the check to upload the certificate.


Create a Connection Asset

The next step is to create a connection asset. Doing this allows you to easily pass in information about your Azure subscription into your scripts we will create later called runbooks.This also helps as information about your Azure subscription changes you only have to change it one place instead of all your runbooks.

  • Go back to the Automation page in the Azure Management Portal.
  • Click the arrow on the Automation account created earlier.
  • Select Assets from the top of the account page.
  • Then select Add Setting from the bottom of the Assets view.


  • Select Add Connection.


  • In the Configure connection page select Azure as the Connection Type and then provide a name for the connection. Click the arrow in the bottom right to move to the next configuration step.


  • On the next page you must provide a name for the automation certificate name. You can use any name you want but you must remember this exact name for our next step so I’d recommend copying somewhere for later use.
  • You’re also required to provide the Subscription ID that you wish to apply this connection to. You can find the Subscription id under the Settings page. Click the check to complete the connection asset.


Create a Credential Asset

Before we can create an automation to shutdown VMs automatically we have to add credentials. This is required to authenticate to the subscription that is hosting the virtual machine.

  • Select Add Setting again.


  • Select Add Credential.


  • Chose Certificate for the Credential Type and ensure that you name the credential the exact same name you provided in the connection asset found in the last section of steps. Click the arrow to continue.


  • Browse to and select the private key (.pfx) created earlier and then provide the password you entered when creating the private key. Click the check to complete the creation of the certificate asset.


Upload a Runbook

We are now ready to scheduling scripts like turning on and off VMs through runbooks.

There is are few lines of code that are used to connect a runbook to your Azure subscription through the certificate and connection assets we just created. To make this easier the Azure Automation team has consolidated this script into a runbook called Connect-Azure. There are many template runbooks available for you to download at the Azure Automation Script Center. The great thing about how runbooks work is you can call other runbooks from inside your code.

Let’s start by downloading and publishing the Connect-Azure runbook provided by the Micorosoft team to your Azure portal.

  • Go to the script center and download the Connect-Azure runbook template.
  • Select your automation account by clicking on the arrow next to the name of the account.
  • Select Runbooks to import the Connect-Azure runbook.


  • At the bottom of the page click Import and browse to the location you saved the Connect-Azure.ps1 file.



  • On the Runbooks page click on the newly imported Connect-Azure runbook.
  • Then click the Author tab and then select Publish on the bottom of the page. This makes this runbook ready to use. You’ll notice you can also test and even edit the script from inside the Azure Portal.


  • With this fundamental runbook imported we can start creating our own. In my case I wish to turn off all the VMs that are part of a cloud service so I select Import to another runbook I’ve created to do this.
  • If you would like you can download my PowerOffVM.ps1 I’m using for this demonstration and then Import it as shown below.


  • **Note that the PowerShell script must be within a workflow to be consumed as an automation. I’m new to PowerShell so here’s the TechNet description of what a workflow is “A workflow is a sequence of programmed, connected steps that perform long-running tasks or require the coordination of multiple steps across multiple devices or managed nodes. The benefits of a workflow over a normal script include the ability to simultaneously perform an action against multiple devices and the ability to automatically recover from failures.”Here’s what the inside of my PowerShell Script looks like in case your curious:

Automates the process of powering down all you Azure VMs in a particular cloud services

Author: Devin Knight
Last Updated: 6/16/2014

workflow PowerOffVMs {

# Optional parameter of type string.
# If you do not enter anything, the default value of Name
# will be MyCloudService
[String]$cloudSvcName = “MyCloudService”,
[String]$vmName = “*”


# Select the subscription to use
Connect-Azure -AzureConnectionName “AzureAutomationConnection”
Select-AzureSubscription -SubscriptionName “AzureAutomationConnection”

# Turn off Virtual Machines
Stop-AzureVM -ServiceName $cloudSvcName -Name $vmName -Force


  • You’ll notice that the name of the VM is being populated by a wildcard “*” to return back all VMs that are part of my cloud service. If you’re just finding this script and attempting to use it without Automation you do not need a workflow. You can simply run the following:

Stop-AzureVM -ServiceName “MyCloudService” -Name “*” -Force

  • With the runbook imported you will see your runbook now available.  Click on the arrow next to the name of the workflow


  • Go to the Author page. You will find here you can do things like modify, test and publish the script. Select Publish to make this runbook available to schedule.


Schedule the Runbook

  • Next click the Schedule page and select Link to a New Schedule
  • Provide a name for the schedule then click the arrow in the bottom right to continue


  • I need my VMs to power down everyday at 8:00 PM so I’ve configured the schedule to shut down daily at 20:00. Then click the check to complete the configuration of the schedule.


  • This runbook has two parameters so you can type the values that should be used when executing on this schedule. I provided my cloud service that my VMs are located on and a ‘*’ as a wildcard to include all VMs in the cloud service.


Congratulations! You’ve just automated the powering down of all your Azure VMs in a cloud service. You can of course create another runbook for powering up all you VMs in the morning. If you’d like you can download my runbook called PowerOnVMs.ps1 for doing this.


I was sent a brilliant update to my PowerOnVMs script from Ken Ross (@kzhen | http://kzhendev.wordpress.com).  You can download my script above again and now it will skip Saturday and Sunday when it automatically turns on your VMs.  Thanks Ken!

Using a Hyper-V VHD in an Azure VM

I’m in the process of moving some my lab environments that I use for teaching SQL Server classes to Azure. I’m pretty new to the Azure infrastructure world but I thought I could share some of my lessons learned in doing this. My first post will walk you through moving an existing Hyper-V virtual machine image to be hosted in Azure.


Before you start this process here’s a couple things to be aware of:

  • Virtual Machines in Azure must be .vhd files.  They cannot be .vhdx or other platforms. There are easy ways to convert to .vhd though.Supposedly the Add-AzureVHD command will automatically convert .vhdx files to traditional .vhd files however  when I attempted this it did not work, and produced the following error:

    Add-AzureVHD : ‘C:\VM\Virtual Hard disks\MyVM.vhdx’ is not a valid VHD file.

    No problem though. With a quick Windows PowerShell command I converted my .vhdx file to a .vhd file type.

    PS C:\>Convert-VHD –Path “C:\VM\Virtual Hard Disks\MyVM.vhdx” –DestinationPath “C:\VM\Virtual Hard Disks\MyNewVM.vhd”

  • image
  • You should also be aware that there are limits to the size of the VM that you can attach. I went through the process I’m about to describe to you only to find out that at my .vhd was too large.  The maximum supported OS disk size is 128 GB.  Any larger and you will see an error like the screen shot below. Notice this is just the OS disk size so that tells you that other disks that are larger can be attached later. Use this post to guide you through shrinking your virtual disk size Reducing the Disk Size of a Hyper-V Virtual Machine.
  • image

Uploading Your VM

These first set of steps will walk you through uploading an existing Hyper-V VHD into Azure.

  • If you haven’t already download and Install the Windows Azure PowerShell module http://go.microsoft.com/fwlink/p/?linkid=320376&clcid=0x409
  • Launch Windows Azure PowerShell
  • Before you start using the Windows Azure PowerShell cmdlets, you need to configure connectivity between your machine and Windows Azure. One way to do this is by downloading the PublishSettings file from Windows Azure and importing it. Using the PowerShell prompt enter the following commands:
    • Get-AzurePublishSettingsFile
    • Import-AzurePublishSettingsFile “C:\Subscription-6-5-2014-credentials.publishsettings”
  • Next setup source and destination parameters for your .vhd file.  Using the PowerShell prompt enter the following commands:
    • $sourceVHD = “C:\VM\Virtual Hard Disks\MyNewVM.vhd”
    • $destinationVHD =  “https://yourcontainer.blob.core.windows.net/vhds/MyNewVM.vhd”
  • Using these parameters us the Add-AzureVHD cmdlet to upload your .vhd to an Azure blob storage container.  You can also specify the number of threads you would like to use for the upload.
    • Add-AzureVHD –LocalFilePath $sourceVHD –Destination $destinationVHD –NumberOfUploaderThreads 3


Map the VHD to an Azure Virtual Machine Disk

  • Once the upload completes login to your Azure portal. https://manage.windowsazure.com
  • Click on the Virtual Machines page and then select Disks and then Create on the bottom of the screen.
  • Provide a Name and then click browse to find the VHD URL that you uploaded in the previous steps.
  • You should also Check the box indicating that The VHD contains an operating system.  Then select the appropriate OS from the Operating System Family property. Click OK.


Creating the Azure VM

With this complete you can now create an Azure Virtual Machine from you .vhd you upload from your local machine.

  • Click on the Virtual Machine page again and click New in the bottom of the screen.
  • Select From Gallery.


  • Click on My Disks and you will find the .vhd uploaded and assigned as a Virtual Machine disk.
  • Select your machine then click the arrow in the bottom right to continue.


  • Provide a name in the Virtual Machine Name property and select the resource plan you would like this virtual machine to use.
  • Click the arrow in the bottom right to continue.


  • Configure the DNS name for the virtual machine and which virtual group you would like to join it to then click the arrow in the bottom right to continue.


  • Optionally you could add third-party additional extensions to the VM. Click the check to confirm your configuration of the new VM.


Your VM will be provisioned and should be available immediately once these steps are complete.  Keep in mind a lot of these steps that I did through the management portal could certainly be done using PowerShell as well.

Power Pivot as the Core to your Power BI Solution

I’ve noticed a recent trend with many presentations and articles on Power BI and Excel functionality. Many of these presentations show how quickly you can take tools like Power Query to find and discover data and then quickly present it through tools like Power View and Power Map. These are really impressive demonstrations of what you can do in Excel in a matter of a few minutes to start with no data to quickly making business decisions through some form of a reporting layer.

While I really enjoy these demonstrations my one gripe is it doesn’t tell the full story. Take the example I just gave of discovering data with Power Query then presenting in in Power View.  In the excitement of taking the results of Power Query into Power View we often forget there is another important tool involved. I’m certainly guilty of this as well but I thought it was worth some explanation of what’s really going on behind the scenes.

By now you have probably learned that Power Query provides the option to put your data into Power Pivot immediately after you have finish transforming the data by selecting Load to Data Model.


But what if you don’t choose this option? You go with the default Load to worksheet setting and go about your business. Your data gets imported into an Excel worksheet and then you have to make a decision on how to visualize it.

This is what is often skipped in presentations. As soon as the decision is made to present your data through either Power View or Power Map the data in this Excel worksheet is automatically pushed into Power Pivot.

Let’s get a bit of an expanded view of how all these parts work together:


Another way of looking at this is based on the tool:

Power View

Power View reports can be developed from either Excel 2013 or SharePoint (On Premises) 2010 SP1 with SQL Server 2012.

If using Excel 2013 at some point the data must be placed in Power Pivot to visualize it in Power View.

If designing reports through SharePoint you can design reports on top of a Power Pivot workbook deployed to SharePoint, a connection to Analysis Services Tabular deployed to SharePoint or a connection to Analysis Services Multidimensional deployed to SharePoint.

Power Map

Power Map can currently only be done within Excel 2013 and Office 365 (Power Map preview is still available for On Premises Office). The data in Power Map must be based on top of a Power Pivot workbook.

Power BI Q&A

Power BI Q&A can only be done through a Power BI site and Office 365 subscription. The Q&A interaction is all based on top of a Power Pivot workbook that has been deployed to a Power BI site. You can optimize the Q&A interaction by adding things like synonyms to your Power Pivot workbook.


What you can get from this diagram is that Power Pivot is central to creating a complete Power BI solution. A well designed data model can only increase the chances of your reporting layer being successful. A poor design data model can make it difficult to make appropriate visualizations. So as you start exploring the Power BI tools make it a point to have a well designed Power Pivot data model.

Power Query to HDFS – Remote Name Could Not Be Resolved

I wanted to share a quick resolution I had to a problem with connecting Power Query to data stored in HDFS.

Seems easy enough. Launch Excel and through Power Query select From Other Sources –> From Hadoop Files (HDFS)


Next you’re prompted to provide your Hadoop server name then click OK.  I’ll be using the Hortonworks Sandbox environment for this demo.


I then see all the files available on HDFS.


If I try to explore the data in these files by clicking Binary next to any of the files listed I’m presented with the following error: DataSource.Error: HDFS cannot connect to server ‘sandbox.hortonworks.com’. The remote name could not be resolved: ‘sandbox.hortonworks.com’.


The issue here is with the name resolution and is a configuration issue. To fix this you must put your Hadoop cluster details in the HOSTS file on your machine that’s running Power Query. 

To edit the HOSTS file (I recommend you take a backup of it first) you’ll need to launch a text editor as administrator then open the HOSTS file found in the following folder:


Provide the IP address and host name of the server as shown below then save.


Next time I try to connect to a file on HDFS everything now works properly.


Hope this helps!


Get every new post delivered to your Inbox.

Join 37 other followers