Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk . Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Sunday 30 September 2018

MVP Wall

At Microsoft Ignite 2018 Microsoft devoted an entire wall to list all the names of the MVPs. I felt very humbled to have my name on the MVP wall with so many amazing people. It is such a privilege to be a part of the Microsoft Data Community. #datafamily #MVPbuzz


And there is my name.

Tuesday 25 September 2018

SQLBits 2019

SQLBits 2019 has been announced. It is in the heart of Manchester. The last time it was in Manchester was in 2009. I am already excited about this next event. 

Monday 24 September 2018

Azure SQL Database Managed Instance GA

At Microsoft Ignite it was announced that Azure SQL Database Managed Instance will be general availability on October 1, 2018.

Azure SQL Database Managed Instance is a deployment model of Azure SQL Database.  This service enables customers to migrate existing databases to a fully managed PaaS cloud environment.  It is possible to use the Data Migration Service (DMS) in Azure to lift and shift their on-premises SQL Server.  This can be a useful tool to use for secure  databases that reduces the management overhead. including automatic patching and version updates, automated backups and high-availability. 

Azure SQL Database Managed Instance, General Purpose tier general availability
Azure Database Migration Service and tool updates – Ignite 2018

SQL Server 2019, Big Data and AI

At Microsoft Ignite SQL Server 2019 was launched. An amazing product for the future combining SQL Server 2019 with big data and analytics. It is great to see the combining of multiple tools in once place, a one stop shop for large and small data, structured and unstructured and from multiple sources.

There are 3 major components to SQL Server 2019.

The creation of a data virtualization layer that handles complexity of all data sources and format.  Enabling the integration of structured and unstructured data without moving the data.

The streamlining of data management with SQL Server 2019 big data clusters deployed in Kubernetes integrating HDFS and Spark. The architecture is explained in more depth here and looks like

The creation of a complete AI platform that can use Spark to analyse both structured and unstructured data anywhere, use SQL Server machine learning services and SparkML.

In summary SQL Server big data clusters allow you to deploy scalable clusters of SQL Server, Spark, and HDFS Docker containers running on Kubernetes.

Read More

Sunday 23 September 2018

Microsoft Ignite - watch live

Microsoft Ignite is happening this week. Unfortunately I won't be there but the keynote and some deep dive sessions will be streamed live. I am looking forward to seeing what Microsoft CEO Satya Nadella shares as his vision for the future of tech. I will be interested to see what other tools and technologies will play an important part in the next year and be excited to see how data fits into this forthcoming vision. 

To watch the live stream, the meeting invite is for

Start Time: 09:00 - 17:15 (UK time 14:00)
Date: Monday 24 September 2018
Time Zone: Eastern Time (US and Canada) 

Friday 14 September 2018

Azure Cosmos DB multi-model database

Azure Cosmos DB has to be one of my favorite databases due to the breadth of available database types, its choice of consistency models and elastic scale out.

An introduction can be read here.

A definition for each of these types of databases is given.

A key-value pair (KVP) is a set of two linked data items: a key, which is a unique identifier for some item of data, and the value, which is either the data that is identified or a pointer to the location of that data. Key-value pairs are frequently used in lookup tables, hash tables and configuration files.

A column-oriented DBMS (or columnar database management system) is a database management system (DBMS) that stores data tables by column rather than by row.

Document stores, also called document-oriented database systems, are characterized by their schema-free organization of data.That means records do not need to have a uniform structure, i.e. different records may have different columns. The types of the values ​​of individual columns can be different for each record. Columns can have more than one value (arrays). Records can have a nested structure. E.g. MongoDB

A graph database, also called a graph-oriented database, is a type of NoSQL database that uses graph theory to store, map and query relationships. Every node in a graph database is defined by a unique identifier, a set of outgoing edges and/or incoming edges and a set of properties expressed as key/value pairs.

The five consistency levels offer predictable low latency guarantees and multiple well-defined relaxed consistency models.

Consistency Levels and guarantees

Consistency Level
Linearizability. Reads are guaranteed to return the most recent version of an item.
Bounded Staleness
Consistent Prefix. Reads lag behind writes by at most k prefixes or t interval
Consistent Prefix. Monotonic reads, monotonic writes, read-your-writes, write-follows-reads
Consistent Prefix
Updates returned are some prefix of all the updates, with no gaps
Out of order reads

There is a useful capacity planer that looks at request units throughput per second, request unit consumption and the amount of data storage needed by your application.

Thursday 13 September 2018

Hortonworks Data Analytics Studio and Open Hybrid Architecture

Hortonworks has announced the general availability of Hortonworks Data Analytics Studio (DAS). A new service to enable enhanced productivity of business analysts by delivering faster insights from data at scale. DAS is part of the Hortonworks DataPlane Service (DPS). DPS enables businesses to discover, manage, govern and now optimize their data spread across hybrid environments. DAS leverages open-source technologies such as Apache Hive to share and extend the value of a modern data architecture in heterogeneous environments. It includes a useful database heat map.

Hortonworks have also shared the Open Hybrid Architecture Initiative, designed to enable big data workloads to run in a hybrid manner across on-premises, multi-cloud and edge architectures.

The Open Hybrid Architecture initiative will

  • De-coupling storage, with both file system interfaces and an object-store interface to data.
  • Containerizing compute resources for elasticity and software isolation.
  • Sharing services for metadata, governance and security across all tiers.
  • Providing DevOps/orchestration tools for managing services/workloads via the “infrastructure is code” paradigm to allow spin-up/down in a programmatic manner.
  • Designating workloads specific to use cases such as EDW, data science, rather than sharing everything in a multi-tenant Hadoop cluster.

AI the art of the possible

During SQL Saturday Cambridge I attended a session by Terry McCann on using AI to write a session submission to SQL Saturday. It was a great session and I would recommend you attending it if you get a chance.

In the new data world it is important to understand the difference between AI, Machine Learning and Deep Learning.

Then breaking this down further the differences between how machine learning works and deep learning is shown here. Deep learning is really a black box.

Image : https://www.upwork.com/hiring/for-clients/log-analytics-deep-learning-machine-learning/

Terry mentioned a free book to read to learn more Neural Networks and Deep Learning. 

He mentioned also the book Harry Potter and the Portrait of what Looked Like a Large Pile of Ash which was written by an AI bot.  I hadn't come across this before but it lets you see the art of the possible in the future.

Image: https://imgur.com/gallery/gkLFz

Wednesday 12 September 2018

Azure Data Factory Data Flows

There is a preview for a new feature called 'Data Flows' that enables visual data transformation capability within Azure Data Factory (ADF). This is scaled out using Databricks. This is available in the ADF Resource Explorer. Data Flow should enable a deeper understanding of your data. Monitoring and inspecting pipelines can be done in ADF. The transformations occur inside Azure Databricks which uses the Spark engine to distribute data across partitions. 


Microsoft Azure Data Factory documentation for Data Flow during preview is available here: https://aka.ms/adfdataflowdocs

Tuesday 11 September 2018

Azure DevOps Services

Microsoft have announced new Azure DevOps services than span the breadth of the development lifecycle to help developers ship software faster and with higher quality. Azure DevOps includes:

CI/CD that works with any language, platform, and cloud. Connect to GitHub or any Git repository and deploy continuously. 
Powerful work tracking with Kanban boards, backlogs, team dashboards, and custom reporting. 
Maven, npm, and NuGet package feeds from public and private sources. 
Unlimited cloud-hosted private Git repos for your project. Collaborative pull requests, advanced file management, and more. 
All in one planned and exploratory testing solution. 

PowerBI Report Server

Power BI Report Server is similar to SQL Server Reporting Services (SSRS). Power BI Report Server is on premises, and hosts paginated reports in addition to the Power BI features. There are several types of reports
  • Paginated (standard SSRS type reports)
  • Interactive (PowerBI Desktop)
  • Mobile
  • Analytical (Excel)

PowerBI Pro enables sharing and collaboration with the following features 
  • Build dashboards that deliver a 360-degree, real-time view of the business
  • Keep data up-to-date automatically, including on-premises sources
  • Collaborate on shared data
  • Audit and govern how data is accessed and used
  • Package content and distribute to users with app

There is increased functionality within PowerBI Pro. A comparison between the free PowerBI desktop edition and Pro version is in the table below. Only Pro users can publish content to app workspaces, consume apps without Premium capacity, share dashboards and subscribe to dashboards and reports. Free users can connect to all data sources using connectivity options such as DirectQuery, live connection and use the data gateway. 

Licensing Power BI Report Server

Power BI Report Server is available in two different license modes: 
A Power BI Premium license enables creation of a hybrid deployment (cloud and on-premises).


The session delivered by Isabelle Van Campenhoudt at SQL Saturday Cambridge shared this useful capacity planning  paper. It is based on the sample Power BI Report Server topology using virtual machines. 

Monday 10 September 2018

Changing our concept of the world

In Ackoff’s Best His Classic Writings on Management (1999), he shares a way of thinking that as fascinated me for years. It can be used for problem solving and has helped me manage database systems for years. It is about looking at things in a holistic way.  This way of thinking, synthesis, often yields contradictory or conflicting results to traditional analytical thinking. Systems thinking, as it is known, helps expand your viewpoint considering the holistic nature of the environment. Ackoff shared the difference between the two modes of thinking: 

Machine Age Thinking (Analysis)
Synthesis (Systems Thinking)
Analysis precedes synthesis
Synthesis precedes analysis
Focuses on structure
Focuses on function
Reveals how things work
Reveals why things operate as they do
Yields knowledge
Yields understanding
Enables us to describe
Enables us to explain
Looks into things
Looks out of things
Iteration of the parts of the thing to be explained
Iteration of the parts of the thing to be explained, interactions between things in its environment and with its environment itself concerned with functional interaction of the parts of system
Break down large complex problems into solvable or manageable parts outputs assembled into a solution of the whole
Some of the best solution obtained from the parts taken separately is not the best solution of the whole.
The best performance of the whole can be reduced to the sum of the best performance of its parts taken separately
Asserts this is not possible
Reduces the focus of the investigator
Expands it focus of the investigator

As a result, I never look only at the immediate problem but consider all of the interconnecting parts.

Sunday 9 September 2018

SQLSaturday Cambridge 2018 - Britannia Edition!

I attended PASS SQL Saturday Cambridge 8 September 2018. This was held at The Møller Centre, Churchill College, Storey's Way, Cambridge. Another great training day packed full of lots of new and exciting data technology.

I attended some amazing sessions. These were

  • Data Classification in SQL Server and Azure SQL Database - Mark Pryce-Maher
  • Azure Cosmos DB - What you need to know to build globally distrib - Satya SK Jayanty
  • Using AI to write session submission to SQLSaturday - Terry McCann
  • Power BI server and Office Online server, modernize your on-premises BI approach. - Isabelle Van Campenhoudt
  • Bricking it: An Introduction to Azure Databricks - Simon Whiteley 
  • SQLOpsStudio Vs SSMS - There can be only one - Warwick Rudd
  • Branding Yourself for a Dream Job - Steve Jones

There was some very interesting learning, that I took away, which I will write about in a follow up blog.

Saturday 8 September 2018

SQL Relay: the travelling conference

Registration is open for SQL Relay. This is a conference with a difference. It tours round the UK for 5 consecutive days bringing international and MVP speakers to your local area. It is an amazing conference to attend with lots of sessions to choose from. This years events are at

Newcastle - Monday 8 October

Leeds - Tuesday 9 October

Birmingham - Wednesday 10 October

Reading - Thursday 11 October

Bristol - Friday 12 October