Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk. Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Wednesday 14 November 2012

SQLBits XI Announced

The SQLBits Committee announced the dates of the next SQLBits conference. SQLBits XI will take place on May 2nd-4th 2013 at the East Midlands Conference Centre in Nottingham, UK. More details will appear at http://www.sqlbits.com/ over the next few weeks.

Friday 9 November 2012

PASS Summit 2012: Polybase What, Why, How.

[The original material links in this article are no longer available on the web and have been removed. This was a talk in 2012 that I attended.]

This session was very interesting and covered the technical workings of Polybase. It was delivered by David DeWitt, a Director at Microsoft, and drew on work from the Gray Systems Lab.

He set the scene by explaining the Hadoop ecosystem and then defined the main components.

This is a two-universe world of structured and unstructured data. The two alternative solutions are Sqoop, which has limitations, and Polybase, which was presented as the superior alternative.

As stated on the Gray Systems Lab site, "the goal of the Polybase project is to allow SQL Server PDW users to execute queries against data stored in Hadoop, specifically the Hadoop distributed file system (HDFS). Polybase is agnostic on both the type of the Hadoop cluster (Linux or Windows)".

The session discussed the approach and drawbacks of each method. There are several phases of delivery planned:
  • Phase 1: in PDW next year
  • Phase 2: working on
  • Phase 3: thinking about
David DeWitt ended the presentation with:
  • The world has changed
  • MapReduce really is not the right tool
  • Polybase for PDW is a first step.

Thursday 8 November 2012

The Data Lifecycle: Turning Data Into Business Value

The second PASS Summit keynote was delivered by Quentin Clark, Corporate Vice President, Data Platform Group. The keynote took a journey through the data lifecycle, which was broken down into five areas:
  • Collaborate
  • Operationalize
  • Manage
  • Discover and refine
  • Visualize
The lifecycle starts with managing all the data: combining any data wherever it lives, whether relational or non-relational, in the cloud or on-premises, and whatever the model architecture. The next step is to find all the relevant information in that data and to unlock its value with models and analysis. After that come visualizations of the data and collaboration on it. The data requires governance and control, transforming insights into repeatable business processes. Self-service is the key element being delivered by the new tools. The demos showed many of the available tools, including Data Explorer (http://www.microsoft.com/en-us/sqlazurelabs/labs/dataexplorer.aspx), and how AlwaysOn allows you to add Azure as an availability replica in SSMS, which adds scalability.

The session ended with the message that Microsoft are providing a complete platform solution to let you find what the data is trying to say and embrace the new value of data.

Wednesday 7 November 2012

PASS Summit 2012 Day One Keynote: Accelerate Insight on any Data

Today is the official start of the 2012 PASS Summit. The Day One keynote was delivered by Ted Kummert, Microsoft corporate vice president of the Data Platform Group (DPG). The keynote was packed with announcements on changes in the database engines.

Service Pack 1

The first announcement made at PASS is the release of SQL Server 2012 SP1. Some of the new features include:
  • Cross-Cluster Migration of AlwaysOn Availability Groups for OS Upgrade
  • Selective XML Index to improve performance
  • DBCC SHOW_STATISTICS works with SELECT permission
  • New dynamic management function which returns statistics properties (sys.dm_db_stats_properties)
  • Express now comes with the complete SQL Server Management Studio (SSMS)
  • SlipStream Full Installation
  • Management Object Support Added for Resource Governor DDL
To read more, see what's new in SQL Server 2012 SP1: http://bit.ly/YKmft4
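As a rough illustration of the new statistics function in the list above, here is a minimal Python sketch that only builds a T-SQL query string joining sys.stats to sys.dm_db_stats_properties. The helper name build_stats_query and the table dbo.Orders are my own hypothetical examples, not something shown in the keynote, and the sketch does not connect to a server.

```python
import re


def build_stats_query(table_name: str) -> str:
    """Return T-SQL that lists statistics properties for one table.

    Hypothetical helper for illustration: it only builds a string.
    """
    # Accept only simple one- or two-part names such as "Orders" or "dbo.Orders",
    # so the name can be placed safely inside the OBJECT_ID() literal.
    pattern = r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?"
    if not re.fullmatch(pattern, table_name):
        raise ValueError(f"unsupported table name: {table_name!r}")
    return (
        "SELECT s.name, sp.last_updated, sp.rows, sp.rows_sampled, "
        "sp.modification_counter\n"
        "FROM sys.stats AS s\n"
        "CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp\n"
        f"WHERE s.object_id = OBJECT_ID(N'{table_name}');"
    )


if __name__ == "__main__":
    # dbo.Orders is an assumed table name used only for this example.
    print(build_stats_query("dbo.Orders"))
```

The generated query can be pasted into SSMS. It returns one row per statistics object on the table, which avoids parsing the output of DBCC SHOW_STATISTICS.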

Ted Kummert continued, saying the world of data is changing, with superabundant supplies. It is approaching a tipping point with volume, variety, velocity, hardware innovation, software innovation and cloud. There is also a change in architectural assumptions, with big data providing new insights from new sources of data. There is a need to accelerate business process and insight, manage any data, any size, anywhere and enable pervasive insight.


The second announcement is the project code-named ‘Hekaton’, which accelerates in-memory processing for business process and insight. Project Hekaton brings an in-memory engine to the transactional world. After converting tables to the in-memory engine there is almost a 10x performance increase. It also allows stored procedures to be recompiled so they run in memory, which could offer a 30x performance increase.
This will be in the next release of SQL Server.
The Hekaton AMR tool is designed to help identify hot spots in the database application and provide assistance in migrating things such as tables and stored procedures.


For non-relational data, Microsoft has released HDInsight Server to CTP. HDInsight is Microsoft’s Apache Hadoop-based solution for Windows Server and Windows Azure. http://www.microsoft.com/en-us/download/details.aspx?id=35397

SQL Server 2012 Parallel Data Warehouse
Microsoft announced the new SQL Server 2012 Parallel Data Warehouse. Queries that previously took 20 minutes to run only take 20 seconds. It offers up to a 50x performance gain with an optimized architecture. It is due out in H1 2013.


Polybase is a breakthrough in data processing. It integrates Hadoop data and relational data to allow combined T-SQL queries with an optimized architecture. It will allow future expansion to other data sets. Polybase will unify the relational and non-relational world. It is built for big data and is coming in H1 2013.
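To picture what a combined query over both worlds means, here is a toy Python sketch. It joins a few rows standing in for a relational table with tab-delimited text standing in for a file in HDFS. This is only a conceptual illustration and says nothing about how Polybase is implemented. All of the data, field names and function names are hypothetical.

```python
import csv
import io

# Hypothetical relational rows, standing in for a SQL Server table.
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
]

# Hypothetical tab-delimited text, standing in for a file stored in HDFS.
hdfs_clicks = "1\t/home\n2\t/cart\n1\t/checkout\n"


def parse_clicks(text):
    """Turn delimited text lines into rows with typed fields."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [{"customer_id": int(cid), "url": url} for cid, url in reader]


def join_clicks_to_customers(customer_rows, click_rows):
    """Inner join on customer_id, returning (name, url) pairs in click order."""
    names = {c["customer_id"]: c["name"] for c in customer_rows}
    return [
        (names[k["customer_id"]], k["url"])
        for k in click_rows
        if k["customer_id"] in names
    ]


result = join_clicks_to_customers(customers, parse_clicks(hdfs_clicks))

if __name__ == "__main__":
    for name, url in result:
        print(name, url)
```

The point of the sketch is that the unstructured side has to be given a shape (columns and types) before it can be joined to the relational side.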

Microsoft’s strategy is to design an enterprise platform for integrating data. We have to learn one new concept: joining up data coming from multiple sources.

Additional announcements mentioned:
  • Clustered column store index with updatable tables
  • Power View and PowerPivot fully integrated in Excel, now a complete BI tool
  • Fully interactive maps inside Excel
  • DAX queries on top of MOLAP cubes

In conclusion, data is bringing about lots of changes, which add richness to the solutions.

PASS is running the first ever Business Intelligence conference in spring 2013 in Chicago.