Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous; the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk. Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Friday, 9 November 2018

SQLBits 2019: The Great Data Heist

SQLBits 2019 registration is open. It runs from 27 February to 2 March 2019 at Manchester Central. There are many amazing reasons to attend this data conference. Hope to see you there.

Thursday, 8 November 2018

PASS Summit 2018 Day 2 Keynote

The day 2 keynote today was given by Microsoft Data Platform CTO Raghu Ramakrishnan, on the internals of Microsoft's next evolution in engine architecture, which will form the foundation for the next 25 years of the Microsoft data platform.

It covered Azure SQL DB Hyperscale. The changing landscape of data brings many challenges: how to leverage unbounded storage and elastic compute, and how to tackle the perennial problems of slow size-of-data operations and long, painful recovery times, all while masking network latencies.

There isn’t one database system that can do it all well. Users need to move data across systems, which is slow and complicates governance.

The most challenging aspects of state management are ACID properties, transactional updates, a high velocity of data changes and the lowest response times. These issues led to SQL Hyperscale.

There were various technical themes: full separation of compute and storage, the complexity of the quorum (log), uniquely skewed access patterns, and the network simply extending the memory hierarchy. He shared a newsflash about Multi-Version Timestamp concurrency control (CC) rules resulting from two-phase locking, MVCC (Hekaton) and lock-free data structures.
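
To make the MVCC idea concrete, here is a minimal Python sketch of a multi-version store: every commit adds a new timestamped version instead of overwriting the old one, and a reader sees the newest version committed at or before its snapshot timestamp, so readers never block writers. It assumes a single logical clock and leaves out write-write conflict detection, locking and cleanup of old versions. The class and method names are hypothetical, and this is not how SQL Server, Hekaton or Hyperscale actually implement versioning.

```python
from bisect import bisect_right
from collections import defaultdict


class VersionStore:
    """Toy multi-version store (hypothetical, for illustration only).

    Each key keeps a list of (commit_ts, value) pairs in ascending
    timestamp order; old versions are never overwritten.
    """

    def __init__(self) -> None:
        self._versions: dict[str, list[tuple[int, object]]] = defaultdict(list)
        self._clock = 0  # single logical commit clock (simplifying assumption)

    def commit(self, writes: dict[str, object]) -> int:
        """Install every write of one transaction at a new commit timestamp."""
        self._clock += 1
        for key, value in writes.items():
            self._versions[key].append((self._clock, value))
        return self._clock

    def snapshot(self) -> int:
        """A reader's snapshot is the latest committed timestamp."""
        return self._clock

    def read(self, key: str, snapshot_ts: int):
        """Return the newest version committed at or before snapshot_ts."""
        versions = self._versions.get(key, [])
        timestamps = [ts for ts, _ in versions]
        idx = bisect_right(timestamps, snapshot_ts)
        return versions[idx - 1][1] if idx else None


store = VersionStore()
store.commit({"balance": 100})
reader_ts = store.snapshot()       # a long-running reader takes its snapshot
store.commit({"balance": 50})      # a later writer neither blocks nor disturbs it
print(store.read("balance", reader_ts))         # 100
print(store.read("balance", store.snapshot()))  # 50
```

The reader that took its snapshot before the second commit keeps seeing 100, while a new snapshot sees 50; that separation of old and new versions is what lets reads proceed without waiting on writers.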

Persistent Version Store (PVS)
This in-depth talk was packed full of technical content about SQL Hyperscale, and I would recommend watching the recording to learn about this new product and new era of database delivery.

Wednesday, 7 November 2018

PASS Summit 2018 Keynote Day 1

The first keynote of PASS Summit was delivered by Rohan Kumar, entitled SQL Server and Azure Data Services: Harness the ultimate hybrid platform for data and AI.

Customer priorities for a modernized data estate are: modernizing on-premises, modernizing to the cloud, building cloud-native apps and unlocking insights.

The announcements follow:

SQL Server 2019
SQL Server 2019 Public Preview is a great way to celebrate the 25th anniversary of SQL Server.

SQL Server 2019 introduces big data clusters, which combine Apache Spark and Hadoop with SQL Server in a single data platform. This brings the power of Spark together with SQL Server over the relational and non-relational data sitting in SQL Server, HDFS and other systems such as Oracle, Teradata and Cosmos DB.

There are new capabilities around performance, availability and security for mission-critical environments, along with the capability to leverage hardware innovations like persistent memory and enclaves.

Hadoop, Apache Spark, Kubernetes and Java are native capabilities in the database engine.

Accelerated Database Recovery (ADR) was demonstrated and is incredible. It is in public preview. The benefits of ADR are:
  • Fast and consistent Database Recovery
  • Instantaneous Transaction rollback
  • Aggressive Log Truncation

Azure HDInsight 4.0

HDInsight 4.0 is now available in public preview.

There are several Apache Hadoop 3.0 innovations. Hive LLAP (Low Latency Analytical Processing, known as Interactive Query in HDInsight) delivers ultra-fast SQL queries. The performance metrics provide useful insight.

It integrates with Power BI DirectQuery, Apache Zeppelin and other tools. To learn more, watch HDInsight Interactive Query with Power BI.

Data quality and GDPR compliance enabled by Apache Hive transactions
Improved ACID capabilities handle data quality (update/delete) issues at row level. This means that GDPR compliance requirements can now be met, with the ability to erase data at row level. Spark can read and write to Hive ACID tables via Hive Warehouse Connector.

Apache Hive LLAP + Druid = single tool for multiple SQL use cases

Druid is a high-performance, column-oriented, distributed data store, which is well suited for user-facing analytic applications and real-time architectures. Druid is optimized for sub-second queries to slice-and-dice, drill down, search, filter and aggregate event streams. Druid is commonly used to power interactive applications where sub-second performance with thousands of concurrent users is expected.

Hive Spark Integration
Apache Spark gets updatable tables and ACID transactions with Hive Warehouse Connector

Apache HBase and Apache Phoenix
Apache HBase 2.0 and Apache Phoenix 5.0 get new performance and stability features, and all of the above have enterprise-grade security.

Azure Event Hubs for Kafka is generally available.
Azure Data Explorer is in public preview.

Azure Databricks Delta is in public preview
  • Connect data scientists and engineers
  • Prepare and clean data at massive scales
  • Build/train models with pre-configured ML

Azure Cosmos DB multi-master replication was demoed with a drawing app, Azure Cosmos DB PxDraw.
Azure SQL DB Managed Instance will reach General Availability (GA) on Dec 1st. This provides Availability Groups managed by Microsoft.

Power BI

The new Dataflows feature enables self-service data prep in Power BI.

Power BI Desktop November Update
  • Follow-up questions for Q&A explorer: it is possible to ask follow-up questions inside the Q&A explorer pop-up, which take into account the previous questions you asked.
  • Copy and paste between PBIX files
  • New modelling view makes it easier to work with large models.
  • Expand and collapse matrix row headers

Friday, 2 November 2018

Future Decoded Day 2

Live stream updates had this great picture summary of the keynote.

The Day 2 Keynote at Future Decoded by Satya Nadella was inspiring. He talked about the simple, self-evident formula Tech intensity = (Tech adoption) ^ Tech capability, and about the Intelligent Cloud and Intelligent Edge in an era of digital transformation.

In any society you need three actors for growth: government, academia, and entrepreneurs and the private sector.

The core areas to consider and build on in the future

We need to protect privacy as a fundamental human right. Trust and GDPR are important to achieve this.

We need to act with collective responsibility across the tech industry to help keep the world safe. Cybersecurity threat detection and removal need to be embedded at the core of any platform. Microsoft have been leading the Tech Accord.

Ethical AI 
We need to ask ourselves not only what computers can do, but what computers should do

Thursday, 1 November 2018

Future Decoded The AI Future

Future Decoded, at ExCeL London from 31 October to 1 November, is an exciting place to be. The event is packed full of AI innovations. AI is groundbreaking and will change the face of the marketplace, but businesses and people need substantial learning to maximise its capability. Here are three takeaways from today:

Maximising the AI Opportunity

Artificial intelligence is changing the UK so fast that nearly half of today's business models won’t exist by 2023, a new Microsoft report has revealed. The article can be read here:
UK companies at risk of falling behind due to a lack of AI strategy, Microsoft research reveals
The report Maximising the AI Opportunity shares insights on the potential of AI, including skills and learning, based on a survey of, and interviews with, thousands of UK leaders.

Microsoft AI Academy
A new addition to Microsoft's commitment to advancing digital skills in the UK, the Microsoft AI Academy will run face-to-face and online training sessions for business and public sector leaders, IT professionals, developers and start-ups.


Microsoft Research and Cambridge University

Some amazing news is that Microsoft is partnering with the University of Cambridge to boost the number of AI researchers in the UK. The Microsoft Research-Cambridge University Machine Learning Initiative will provide support for Ph.D. students at the world-leading university and offer a postdoctoral research position at the Microsoft Research Lab in Cambridge. The aim is to realise artificial intelligence’s potential in enhancing the human experience and to nurture the next generation of researchers and talent in the field.

Read More:
Microsoft Research and Cambridge University strengthen their commitment to AI innovation and the field’s future leaders

Wednesday, 31 October 2018

Always Encrypted with Secure Enclaves

In the SQL Server 2019 preview, Always Encrypted uses an enclave technology called Virtualization Based Security (VBS) memory enclaves. A VBS enclave is an isolated region of memory within the address space of a user-mode process.

The capabilities this brings are:

  • In-place encryption. Encrypt a column, rotate a column encryption key, or change the encryption type of a column, without moving your data out of the database.
  • Rich computations. The engine can delegate some operations on encrypted database columns to the enclave. It can decrypt the sensitive data and execute requested operations in a query on plain text values.

Always Encrypted with secure enclaves allows computations on plain text data inside a secure enclave on the server side. Microsoft define a secure enclave as a protected region of memory within the SQL Server process. It acts as a trusted execution environment for processing sensitive data inside the SQL Server engine. A secure enclave is a black box to SQL Server and other processes on the server. It is not possible to view any data or code inside the enclave from the outside, even with a debugger.

You can now try and evaluate Always Encrypted with secure enclaves in the preview of SQL Server 2019.

This shows what an admin would see when browsing the enclave memory using a debugger (note the question marks, as opposed to the actual memory content).


Always Encrypted with Secure Enclaves – Try It Now in SQL Server 2019 Preview! 
Always Encrypted with Secure Enclaves

Monday, 29 October 2018

Open Data Initiative

At Microsoft Ignite a groundbreaking partnership was announced, with a new vision for renewable data and intelligent applications. The vision, jointly developed by Adobe, Microsoft and SAP, aims to deliver unparalleled business insight from your behavioral, transactional, financial and operational data. It provides a single view of data built on one data model, AI-driven insights and an open, extensible platform.

Announcing the Open Data Initiative

Friday, 26 October 2018

Machine Learning on Azure

At Microsoft Ignite there were many data announcements. Azure AI is another such area, covering the next wave of innovation aimed at transforming business. There are three solution areas.

Predictive models to optimise business process

These include a set of pretrained models in Azure Cognitive Services, and ONNX (Open Neural Network Exchange), which enables model interoperability across frameworks. Machine learning is available with Azure Databricks, Azure Machine Learning and Machine Learning VMs.

AI powered apps to integrate vision, speech and language

There are now services specifically designed to help build AI powered apps & agents.

Knowledge mining to uncover insight from documents
There is valuable information hidden in documents, forms, PDFs and images. Azure Cognitive Search (in preview) adds Cognitive Services on top of Azure Search.


Azure AI – Making AI real for business

Wednesday, 24 October 2018

Azure SQL Database Hyperscale in preview

Azure SQL Database Hyperscale is a new, highly scalable service tier. It adapts on demand to different workloads and auto-scales up to 100 TB per database, which eliminates the need to pre-provision storage resources. The new service tier can scale compute and storage resources independently, giving the flexibility to optimize performance for workloads. Azure SQL Database Hyperscale will initially be available for single database deployments. Not being limited by storage size is useful for apps.

Further Reading
Announcing Azure SQL Database Hyperscale public preview

Saturday, 20 October 2018

Microsoft Learn

There is a new approach to learning, with hands-on training at Microsoft Learn. At the Ignite keynote, Scott Guthrie announced the availability of Microsoft Learn. The new site says it will help you achieve your goals faster. Microsoft have launched more than 80 hours of learning for Azure, Dynamics 365, Power BI, PowerApps and Microsoft Flow, organised into modules. The new learning platform should help you up-level your skills, prepare for the new role-based certification exams, and explore additional training offerings such as instructor-led training and Pluralsight. There are two tracks so far: Learn Azure and Learn Business Applications. To save your learning progress you need to sign in first, and your username, display name, achievements and activities will be publicly visible at https://techprofile.microsoft.com/en-gb/

There are other learning options too: Microsoft Virtual Academy (MVA), which is video-based, and edX, which provides courses where you can learn about Microsoft technologies and follow different career paths.

Monday, 15 October 2018

The Data Relay Journey

This year I was Head of Marketing and Social Media for Data Relay, the conference previously known as SQL Relay. It is a privilege to help put on an event that travels to five different UK cities, across the breadth of the country, on five consecutive days. This year we invested a lot of time in improving the conference. We re-branded to Data Relay to better reflect the breadth of the Microsoft Data Platform, and we introduced a Code of Conduct to help encourage diversity, among several other changes. Our aim is to keep improving the conference.

I was the Bristol Venue Event Owner and this time delivered a session about the end-to-end process of data management, things to consider when improving data quality, and data science in industry, based on findings from my PhD research. The session also covered data collection areas that are often led by marketing teams. I shared my research findings about the complexity of managing database systems, the use of the Microsoft Data Platform for research, and possible future AI developments to help people manage database systems with greater ease.

Thursday, 11 October 2018

Ignite 2018 key announcements

The Microsoft Ignite 2018 news and highlights from the event.

Research skills for industry experts

It is great to see the name of the SQL Relay conference change to Data Relay. The conference encompasses the full breadth of the Microsoft data platform and provides free Microsoft Data, AI & Analytics training conferences on your doorstep.

I am privileged to be speaking on Friday at Data Relay in Bristol to share my experience of providing high quality data analytics for my research using the Microsoft Data Platform. 

Monday, 1 October 2018

SQL Relay 2018

I am speaking at SQL Relay 2018 in Bristol, Friday 12 October. My session is: Research skills for industry experts.

There is an ever-increasing need to present analytic outcomes. Analytics are only ever as good as the robustness of the data collection and analysis. This session will cover the raft of research skills that can be applied in industry to improve the quality of your investigative work.

Sunday, 30 September 2018

MVP Wall

At Microsoft Ignite 2018, Microsoft devoted an entire wall to listing the names of all the MVPs. I felt very humbled to have my name on the MVP wall with so many amazing people. It is such a privilege to be a part of the Microsoft Data Community. #datafamily #MVPbuzz


And there is my name.

Tuesday, 25 September 2018

SQLBits 2019

SQLBits 2019 has been announced. It is in the heart of Manchester. The last time it was in Manchester was in 2009. I am already excited about this next event. 

Monday, 24 September 2018

Azure SQL Database Managed Instance GA

At Microsoft Ignite it was announced that Azure SQL Database Managed Instance will be generally available on 1 October 2018.

Azure SQL Database Managed Instance is a deployment model of Azure SQL Database. This service enables customers to migrate existing databases to a fully managed PaaS cloud environment. Customers can use the Azure Database Migration Service (DMS) to lift and shift their on-premises SQL Server databases. It can be a useful option for secure databases because it reduces the management overhead, with automatic patching and version updates, automated backups and high availability.

Azure SQL Database Managed Instance, General Purpose tier general availability
Azure Database Migration Service and tool updates – Ignite 2018

SQL Server 2019, Big Data and AI

At Microsoft Ignite SQL Server 2019 was launched. It is an amazing product for the future, combining SQL Server with big data and analytics. It is great to see multiple tools combined in one place: a one-stop shop for large and small data, structured and unstructured, from multiple sources.

There are 3 major components to SQL Server 2019.

The creation of a data virtualization layer that handles the complexity of all data sources and formats, enabling the integration of structured and unstructured data without moving the data.

The streamlining of data management with SQL Server 2019 big data clusters, deployed in Kubernetes and integrating HDFS and Spark. The architecture is explained in more depth here.

The creation of a complete AI platform that can use Spark to analyse both structured and unstructured data anywhere, using SQL Server Machine Learning Services and SparkML.

In summary SQL Server big data clusters allow you to deploy scalable clusters of SQL Server, Spark, and HDFS Docker containers running on Kubernetes.

Read More

Sunday, 23 September 2018

Microsoft Ignite - watch live

Microsoft Ignite is happening this week. Unfortunately I won't be there, but the keynote and some deep-dive sessions will be streamed live. I am looking forward to seeing what Microsoft CEO Satya Nadella shares as his vision for the future of tech. I will be interested to see which other tools and technologies will play an important part in the next year, and I am excited to see how data fits into this forthcoming vision.

To watch the live stream, the meeting invite details are:

Start Time: 09:00 - 17:15 (UK time 14:00)
Date: Monday 24 September 2018
Time Zone: Eastern Time (US and Canada) 

Friday, 14 September 2018

Azure Cosmos DB multi-model database

Azure Cosmos DB has to be one of my favorite databases due to the breadth of available database types, its choice of consistency models and elastic scale out.

An introduction can be read here.

A definition of each of these types of database follows.

A key-value pair (KVP) is a set of two linked data items: a key, which is a unique identifier for some item of data, and the value, which is either the data that is identified or a pointer to the location of that data. Key-value pairs are frequently used in lookup tables, hash tables and configuration files.
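
As a minimal sketch, a key-value store behaves much like a Python dictionary, which is itself a hash table: each unique key maps to a value that is either the data itself or a pointer to where the data lives. The `KeyValueStore` and `Pointer` names below are hypothetical and do not correspond to any particular product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    """Hypothetical location of a value stored elsewhere (file + byte offset)."""
    file: str
    offset: int


class KeyValueStore:
    """Toy in-memory key-value store backed by a Python dict (a hash table)."""

    def __init__(self) -> None:
        self._table: dict[str, object] = {}

    def put(self, key: str, value: object) -> None:
        # Keys are unique identifiers, so writing an existing key replaces its value.
        self._table[key] = value

    def get(self, key: str, default: object = None) -> object:
        return self._table.get(key, default)

    def delete(self, key: str) -> None:
        self._table.pop(key, None)


store = KeyValueStore()
store.put("config:max_connections", 100)                   # value is the data itself
store.put("user:42:avatar", Pointer("avatars.bin", 8192))  # value points at the data
print(store.get("config:max_connections"))  # 100
print(store.get("user:42:avatar"))           # Pointer(file='avatars.bin', offset=8192)
```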

A column-oriented DBMS (or columnar database management system) is a database management system (DBMS) that stores data tables by column rather than by row.
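
A small illustration of the difference, assuming a toy in-memory table with made-up values: the same three rows are held once as a list of rows and once pivoted into one list per column. A query that only needs `sales` has to visit every whole row in the row layout, whereas the column layout lets it read just that one column, which is the main reason columnar systems suit analytic aggregates.

```python
# Toy table with made-up values, stored row by row.
rows = [
    {"id": 1, "city": "Bristol", "sales": 120},
    {"id": 2, "city": "Leeds", "sales": 80},
    {"id": 3, "city": "Bristol", "sales": 200},
]


def to_columns(rows: list[dict]) -> dict[str, list]:
    """Pivot a list of row dicts into one list of values per column."""
    columns: dict[str, list] = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


columns = to_columns(rows)

# Row layout: summing one attribute still visits every complete row.
total_from_rows = sum(row["sales"] for row in rows)

# Column layout: the same aggregate reads only the "sales" column.
total_from_columns = sum(columns["sales"])

print(columns["sales"])                      # [120, 80, 200]
print(total_from_rows, total_from_columns)   # 400 400
```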

Document stores, also called document-oriented database systems, are characterized by their schema-free organization of data. That means records do not need to have a uniform structure, i.e. different records may have different columns. The types of the values of individual columns can be different for each record. Columns can have more than one value (arrays). Records can have a nested structure. MongoDB is an example.
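
A sketch of a schema-free collection using plain Python dictionaries (not the MongoDB API): the three hypothetical order documents have different fields, one field holds an array, one record is nested, and `items` has a different type in different records. Queries simply treat missing fields as absent rather than as errors.

```python
import json

# Hypothetical schema-free collection: documents need not share the same fields.
orders = [
    {"_id": 1, "customer": "Ann", "items": ["kettle", "toaster"]},     # array value
    {"_id": 2, "customer": "Raj", "address": {"city": "Manchester"}},  # nested record
    {"_id": 3, "customer": "Li", "items": "mug", "gift": True},        # "items" is a string here
]


def find(collection: list[dict], predicate) -> list[dict]:
    """Return every document for which predicate(doc) is true."""
    return [doc for doc in collection if predicate(doc)]


# Documents without an address are simply skipped.
in_manchester = find(orders, lambda d: d.get("address", {}).get("city") == "Manchester")
has_items = find(orders, lambda d: "items" in d)

print(json.dumps(in_manchester))
print([d["_id"] for d in has_items])  # [1, 3]
```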

A graph database, also called a graph-oriented database, is a type of NoSQL database that uses graph theory to store, map and query relationships. Every node in a graph database is defined by a unique identifier, a set of outgoing edges and/or incoming edges and a set of properties expressed as key/value pairs.
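
The node structure described above can be sketched in Python with hypothetical `Node` and `Graph` classes: each node has a unique identifier, key/value properties and lists of outgoing and incoming edges, and a breadth-first traversal follows relationships with one label. This illustrates the data model only; real graph databases add indexing, persistence and a query language.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str                                     # unique identifier
    properties: dict = field(default_factory=dict)   # key/value pairs
    outgoing: list = field(default_factory=list)     # (label, target_id)
    incoming: list = field(default_factory=list)     # (label, source_id)


class Graph:
    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}

    def add_node(self, node_id: str, **properties) -> Node:
        node = Node(node_id, properties)
        self.nodes[node_id] = node
        return node

    def add_edge(self, source: str, label: str, target: str) -> None:
        # Record the edge on both endpoints so it can be walked either way.
        self.nodes[source].outgoing.append((label, target))
        self.nodes[target].incoming.append((label, source))

    def reachable(self, start: str, label: str) -> list[str]:
        """Breadth-first traversal following only edges with the given label."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            current = queue.popleft()
            for edge_label, target in self.nodes[current].outgoing:
                if edge_label == label and target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


g = Graph()
g.add_node("ann", role="DBA")
g.add_node("bob", role="Developer")
g.add_node("cat", role="Analyst")
g.add_edge("ann", "MANAGES", "bob")
g.add_edge("bob", "MANAGES", "cat")
print(g.reachable("ann", "MANAGES"))  # ['bob', 'cat']
```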

The five consistency levels offer predictable low-latency guarantees and multiple well-defined relaxed consistency models.

Consistency Levels and guarantees

  • Strong: Linearizability. Reads are guaranteed to return the most recent version of an item.
  • Bounded Staleness: Consistent Prefix. Reads lag behind writes by at most k prefixes or t interval.
  • Session: Consistent Prefix. Monotonic reads, monotonic writes, read-your-writes, write-follows-reads.
  • Consistent Prefix: Updates returned are some prefix of all the updates, with no gaps.
  • Eventual: Out of order reads.
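
To make these guarantees more concrete, this Python sketch checks what a replica returned against the sequence of committed writes. It is a simplified model, not the Cosmos DB SDK: it assumes a single ordered write log, models bounded staleness only by the version bound k (not the time interval t), and leaves out the per-session guarantees of Session consistency. The function names are hypothetical.

```python
def is_prefix(observed: list, log: list) -> bool:
    """Consistent prefix: the updates seen are the first n writes, in order, with no gaps."""
    return observed == log[: len(observed)]


def within_bounded_staleness(observed: list, log: list, k: int) -> bool:
    """Bounded staleness (version bound only): a consistent prefix lagging by at most k writes."""
    return is_prefix(observed, log) and len(log) - len(observed) <= k


def is_strong(observed: list, log: list) -> bool:
    """Strong: the read reflects every committed write."""
    return observed == log


log = ["w1", "w2", "w3", "w4"]   # writes in commit order
replica_a = ["w1", "w2", "w3"]   # one write behind, but no gaps
replica_b = ["w1", "w3"]         # gap: w2 missing (possible only under eventual)

print(is_strong(replica_a, log))                      # False
print(within_bounded_staleness(replica_a, log, k=2))  # True
print(is_prefix(replica_b, log))                      # False
```

Under this model every level from Bounded Staleness downwards accepts `replica_a`, while only Eventual would accept the gap in `replica_b`.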

There is a useful capacity planner that looks at request unit (RU) throughput per second, request unit consumption and the amount of data storage needed by your application.

Thursday, 13 September 2018

Hortonworks Data Analytics Studio and Open Hybrid Architecture

Hortonworks has announced the general availability of Hortonworks Data Analytics Studio (DAS), a new service that enhances the productivity of business analysts by delivering faster insights from data at scale. DAS is part of the Hortonworks DataPlane Service (DPS), which enables businesses to discover, manage, govern and now optimize their data spread across hybrid environments. DAS leverages open-source technologies such as Apache Hive to share and extend the value of a modern data architecture in heterogeneous environments. It includes a useful database heat map.

Hortonworks have also shared the Open Hybrid Architecture Initiative, designed to enable big data workloads to run in a hybrid manner across on-premises, multi-cloud and edge architectures.

The Open Hybrid Architecture initiative involves:

  • De-coupling storage, with both file system interfaces and an object-store interface to data.
  • Containerizing compute resources for elasticity and software isolation.
  • Sharing services for metadata, governance and security across all tiers.
  • Providing DevOps/orchestration tools for managing services/workloads via the “infrastructure is code” paradigm to allow spin-up/down in a programmatic manner.
  • Designating workloads specific to use cases such as EDW, data science, rather than sharing everything in a multi-tenant Hadoop cluster.