Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk. Microsoft Data Platform MVP.

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Friday 9 November 2012

Pass Summit 2012: Polybase What, Why, How.

[The original material links in this article are no longer available on the web and have been removed. This was a talk that I attended in 2012.]

This session was very interesting and covered the technical workings of Polybase. It was delivered by David DeWitt, a Director at Microsoft, and drew on work from the Gray Systems Lab.

He set the scene by explaining the Hadoop ecosystem, then defined its main components.

This is a two-universe world of structured and unstructured data. The two alternative solutions for bridging those universes are Sqoop, which has limitations, and Polybase, which was presented as the superior alternative.

As stated on the Gray Systems Lab site, "the goal of the Polybase project is to allow SQL Server PDW users to execute queries against data stored in Hadoop, specifically the Hadoop distributed file system (HDFS). Polybase is agnostic on both the type of the Hadoop cluster (Linux or Windows)"

The session discussed the approach and drawbacks of each method. Several phases of delivery are planned:
  • Phase 1: in PDW next year
  • Phase 2: being worked on
  • Phase 3: being thought about

David DeWitt ended the presentation with these points:
  • The world has changed.
  • MapReduce really is not the right tool.
  • Polybase for PDW is a first step.
