Power Query to HDFS – Remote Name Could Not Be Resolved

I wanted to share a quick resolution I had to a problem with connecting Power Query to data stored in HDFS.

Seems easy enough. Launch Excel and through Power Query select From Other Sources –> From Hadoop Files (HDFS)

image

Next you’re prompted to provide your Hadoop server name then click OK.  I’ll be using the Hortonworks Sandbox environment for this demo.

image

I then see all the files available on HDFS.

image

If I try to explore the data in these files by clicking Binary next to any of the files listed I’m presented with the following error: DataSource.Error: HDFS cannot connect to server ‘sandbox.hortonworks.com’. The remote name could not be resolved: ‘sandbox.hortonworks.com’.

image

The issue here is with the name resolution and is a configuration issue. To fix this you must put your Hadoop cluster details in the HOSTS file on your machine that’s running Power Query. 

To edit the HOSTS file (I recommend you take a backup of it first) you’ll need to launch a text editor as administrator then open the HOSTS file found in the following folder:

C:\Windows\System32\drivers\etc

Provide the IP address and host name of the server as shown below then save.

image

Next time I try to connect to a file on HDFS everything now works properly.

image

Hope this helps!

3 comments

  1. Hi,

    I am experience a similar issue, but with this error:

    DataSource.Error: HDFS cannot connect to server ‘192.168.56.102’. Unable to connect to the remote server.

    What did you change in the Hortonworks Sandbox 2.1 to get it to work?

    BR

    Lars

Leave a Reply