[The original material links in this article are no longer available on the web and have been removed. This was a talk in 2012 that I attended.]
This session was very interesting and covered the technical workings of Polybase. It was delivered by David DeWitt, Director at Microsoft, and was based on work from the Gray Systems Lab.
He set the scene by explaining the Hadoop ecosystem.
He then defined its main components.
This is a two-universe world of both structured and unstructured data. There are two alternative ways to work across them: Sqoop, which has limitations, and Polybase, which was presented as the superior alternative.
As stated on the Gray Systems Lab site, "the goal of the Polybase project is to allow SQL Server PDW users to execute queries against data stored in Hadoop, specifically the Hadoop distributed file system (HDFS). Polybase is agnostic on both the type of the Hadoop cluster (Linux or Windows)".
The session discussed the approach and drawbacks of each method. There are several phases of delivery planned:
- Phase 1: in PDW next year
- Phase 2: currently being worked on
- Phase 3: still being thought about

The closing points of the talk were:
- The world has changed
- MapReduce really is not the right tool
- Polybase for PDW is a first step