Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk. Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Saturday, 16 March 2019

Data Relay Session Submission is Open

It is that time of year again already. Data Relay session submission is open. There is a great blog post on why to speak at Data Relay. Submit your sessions and start your journey.

Sunday, 10 March 2019

Women in Data Science (WiDS) Scotland

Stanford University's Women in Data Science (WiDS) initiative, The Data Lab and Turing's Testers, along with their primary sponsor Mudano, have created an event that celebrates women, tech, innovation and codebreaking! This event brings together women data scientists and schoolgirls to showcase what a data career looks like, and to inspire the female data leaders of the future. The aim is to inspire schoolgirls to consider STEM and data-related careers by bringing them together with inspiring women working in the field of data science, and to expose them to some fun activities that are powered by data. I have the privilege of attending the event, on behalf of my employer CGI, to participate as a mentor and to speak to the girls to share the wonders of working with data.

Women in Data Science is on 11 March 2019 at the National Museum of Scotland in Edinburgh. It is one of the fringe events of DataFest, the UK's first two-week festival of data innovation, which runs in Scotland from 11 to 22 March 2019 and is in its third year. DataFest will showcase Scotland's leading role in data science and artificial intelligence, with networking across industry, academia and data enthusiasts.

The Event details:

The Cyber Treasure Hunt has been created by Turing's Testers, a group of motivated pupils and STEM ambassadors; inspiring, engaging and supporting girls into the technology sector. This has been running over a number of months with codes being released every few weeks. For a school to gain invitation to this event they must crack the codes. This event will be the final code cracking session with the winner being announced at the end of the day.

The event will be broken into a number of sessions and vary between workshops and talks. Talks will be led by various female thought leaders from the world of tech, including none other than Hannah Fry.

Attendee spaces are limited at this event. We expect tickets to sell out quickly; however, we will have a waitlist and will inform you if you have secured a place. Attendees will have a hands-on role as part of this event and will be asked to help out with mentoring and guidance to each of the groups throughout the day.

This event is also part of DataFest, kicking off proceedings on the first day of the two-week festival of data science.


10.00 - Registration
10.30 - First Rotation of Workshops and Talks
12.00 - Lunch
12.40 - Second Rotation of Workshops and Talks
14.10 - Prize Giving
15.00 - Closing Comments

Tuesday, 5 March 2019

International Women's Day

International Women's Day is fast approaching. It is celebrated on 8 March every year. It is a celebration of women globally. It is a chance to network, be inspired and share your stories to empower each other.
The Official UN theme for 2019 is Think Equal, Build Smart, Innovate for Change. The theme will focus on innovative ways to advance gender equality and the empowerment of women, particularly in the areas of social protection systems, access to public services and sustainable infrastructure.

I am excited to be contributing to a webinar for International Women's Day 2019. The webinar discusses women's roles in the field of technology.

Sunday, 3 March 2019

Tree of Learning sculpture

The Tree of Learning sculpture is a stunning celebration of the last 50 years of The Open University and an inspiration for the future. As part of the 50th Anniversary celebrations the “Tree of Learning” sculpture is being created and will be installed on campus later in the Anniversary year. It will consist of hundreds of individually personalised gold-coloured OU logo-shaped shields hung as leaves on the tree.
If you’ve studied with, worked with or been involved in some way with The Open University, it is a great way to still be a part of the amazing story to come. I am proud to have been a part of The Open University for over 13 years.

Saturday, 2 March 2019

SQL Bits 2019 Keynote

What an amazing SQLBits in Manchester. Four days packed full of leading edge data technology covering:

  • SQL Server 2019 Big Data
  • Azure SQL Database Managed Instance
  • Power BI
  • Kubernetes
  • Machine Learning
  • Python
  • Spark

This year SQLBits 2019 had a keynote. It was nice for the event to have a keynote again. The theme was Data Never Rests. The speakers from the Microsoft Data Platform product group were Buck Woody, Bob Ward, Anna Thomas, Alain Dormehl, Adam Saxton and Patrick LeBlanc. An amazing set of speakers and a fun keynote. They shared details of the evolution of the data platform to enable people to keep their skills up to date. The keynote is available to watch. There were several major announcements.

SQL Server 2019 will RTM in the second half of the year. SQL Server 2019 CTP 2.3 is available now with:
  • Big data cluster enhancements
  • Accelerated database recovery
  • Performance enhancements
  • Graph data enhancements
  • SSAS enhancements

SQL Server 2019 is a modern innovation and there are various forms of the product.
  • On Premises
  • SQL Server Azure VM (IaaS)
  • Azure SQL DB Managed Instance (PaaS)
  • Azure SQL Data Warehouse

Azure SQL Database Hyperscale can autoscale up to 100TB and scale compute and storage independently.
During the keynote they showed Azure SQL Database Hyperscale where a 50TB database was restored in just under 8 minutes. That is nice accelerated database recovery.

Data virtualization with SQL Server 2019 big data clusters is a game changer, combining data lake scale, machine learning and AI. Multiple data sources can be connected as external tables, through the compute pool, using PolyBase connectors at the source. Data persisted from multiple data sources is stored in shards in the data pool, which serves as the data mart for SQL Server 2019 big data clusters.

SQL Server 2019 will push query predicates down to other data platforms via PolyBase, so that SQL data can be joined with Oracle, MongoDB and Cosmos DB data in one place efficiently.
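The benefit of pushing predicates down to the source can be sketched in plain Python. This is only a conceptual illustration, not PolyBase itself: the `RemoteSource` class, the `join_remote` function and the sample rows are all hypothetical stand-ins for an external platform and the engine joining against it. The point it tries to show is that the join result is the same either way, but far fewer rows leave the source when the filter runs there.

```python
from dataclasses import dataclass


@dataclass
class RemoteSource:
    """Hypothetical stand-in for an external data platform (not a real connector)."""
    rows: list
    rows_transferred: int = 0  # counts rows that leave the source

    def scan(self, predicate=None):
        # With a predicate, filtering happens at the source, so only matches travel.
        matched = [row for row in self.rows if predicate is None or predicate(row)]
        self.rows_transferred += len(matched)
        return matched


def join_remote(local_rows, remote, key, predicate, pushdown):
    """Join local rows to filtered remote rows on `key`."""
    if pushdown:
        remote_rows = remote.scan(predicate)  # filter evaluated remotely
    else:
        # Everything is transferred first, then filtered locally.
        remote_rows = [row for row in remote.scan() if predicate(row)]
    by_key = {}
    for row in remote_rows:
        by_key.setdefault(row[key], []).append(row)
    return [
        {**local, **match}
        for local in local_rows
        for match in by_key.get(local[key], [])
    ]


if __name__ == "__main__":
    # Illustrative data only.
    customers = [{"customer_id": 1, "name": "Ann"}]
    orders = RemoteSource([
        {"customer_id": 1, "region": "UK", "total": 10},
        {"customer_id": 1, "region": "US", "total": 40},
    ])
    result = join_remote(customers, orders, "customer_id",
                         lambda r: r["region"] == "UK", pushdown=True)
    print(result, orders.rows_transferred)
```

Comparing `rows_transferred` with `pushdown=True` and `pushdown=False` on the same data shows the saving in rows moved.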

SQL notebooks in Azure Data Studio is an awesome new feature. 

There is documentation to read, new courses for learning, and a summary of all exams and certifications launched in January 2019.


Friday, 1 March 2019

Data Relay 2019

Data Relay (formerly SQL Relay) announced their 2019 training conferences covering Microsoft Azure, Data, AI and Analytics. They are visiting the cities of Newcastle, Leeds, Nottingham, Birmingham and Bristol. There is a new website https://datarelay.co.uk/ to share all the latest news. The Twitter handle is @DataRelay_uk, and there is also a LinkedIn group and a Facebook page.

Thursday, 28 February 2019

Azure Data Architecture Guide

There is a useful guide to read which discusses a structured approach for designing data-centric solutions on Microsoft Azure. The two different approaches are:

Traditional RDBMS workloads. These designs are for online transaction processing (OLTP) and online analytical processing (OLAP).

Big data solutions. This design looks at big data architecture to handle the ingestion, processing, and analysis of data that is too large or complex for traditional database systems.

There are useful pages to read on machine learning at scale and non-relational data.

Wednesday, 27 February 2019

SQLBits 2019 The event

The first day of SQLBits 2019 at Manchester Central Convention Centre. What a great venue.

Monday, 25 February 2019

Monday, 11 February 2019

Data Trends for 2019

I created a survey question on Twitter to look at data trends. I was interested to see whether people felt that improving the quality of their data was more important than AI data ethics. Data quality is heavily influenced by data ingest, so I added this as an option, as I felt it is often overlooked but is a foundation stone of good data quality.

A few definitions:

"Data Ethics describe a code of behaviour, specifically what is right and wrong, encompassing the following: Data Handling: generation, recording, curation, processing, dissemination, sharing, and use."

Data Quality (DQ), as stated in the DAMA International Data Management Body of Knowledge: "Refers to both the characteristics associated with and to the processes used to measure or improve the quality of data." "Data is considered high quality to the degree it is fit for the purposes data consumers want to apply it."

"Data ingestion is the process of obtaining and importing data for immediate use or storage in a database. To ingest something is to 'take something in or absorb something.' Data can be streamed in real time or ingested in batches."

"Data ingestion tools provide a framework that allows companies to collect, import, load, transfer, integrate, and process data from a wide range of data sources."
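To make the link between ingest and data quality concrete, here is a minimal sketch of checking records as a batch is ingested. It is an illustration under stated assumptions, not a reference to any particular ingestion tool: the field names in `REQUIRED_FIELDS`, the date format and the function names are all hypothetical.

```python
from datetime import datetime

# Hypothetical schema for the example; a real feed would define its own.
REQUIRED_FIELDS = ("id", "amount", "recorded_at")


def validate_record(record):
    """Return a list of data quality problems; an empty list means the record passes."""
    problems = []
    for name in REQUIRED_FIELDS:
        if record.get(name) in (None, ""):
            problems.append(f"missing {name}")

    amount = record.get("amount")
    if amount not in (None, ""):
        try:
            float(amount)
        except (TypeError, ValueError):
            problems.append("amount is not numeric")

    recorded_at = record.get("recorded_at")
    if recorded_at not in (None, ""):
        try:
            datetime.strptime(recorded_at, "%Y-%m-%d")
        except (TypeError, ValueError):
            problems.append("recorded_at is not YYYY-MM-DD")
    return problems


def ingest_batch(records):
    """Split a batch into accepted records and rejected (record, problems) pairs."""
    accepted, rejected = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected
```

Keeping the rejected records together with the reasons, rather than silently dropping them, gives something to measure and improve over time.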

The survey question had 267 votes.

What do you think will be the most important #Data trend for 2019 out of the following options?

In addition to the poll results I received a few additional comments.
  • Neither
  • The biggest thing in my opinion is just ethics. How is the data collected?
  • Also, what is it being used for. What are the impacts of high or low accuracy models.
  • Improving quality and ethics seem to me, to be related tasks
  • All of the above?

The results are quite interesting with AI Data Ethics and Improving Data Quality being the trends that the respondents thought were the most important.

Wednesday, 6 February 2019

Improved Microsoft Docs

A cool image from http://www.thinksinc.org/ about Microsoft Docs.

I was looking at the Microsoft Docs pages and their new design. I have found it is much easier to navigate, which speeds up searching.

At the top of the page there are 3 helpful options 

  • Download SQL Server
  • Get an Azure VM with SQL Server
  • Download SQL Server Management Studio

Then the Microsoft SQL Documentation has 3 categories covering on premises and cloud.
  • SQL Server on Windows
  • SQL as an Azure Service
  • SQL Server on Linux
There are technology areas to drill down further.

Then a further collection of links to enable a deeper dive into the technology.

  • Design
  • Tools
  • Reference
  • Reporting
  • Data Analytics
  • AI and Machine Learning
I was looking for design documentation and the link takes you to a page with easy-to-select images and text.

Thursday, 24 January 2019

SQLBits 2019 is fast approaching

SQLBits 2019 is fast approaching. This year it is in Manchester, 27 February - 2 March 2019. There is an informative article about The Great Data Heist. My insights on what to expect of the conference are here.

There are some interesting training sessions on the Wednesday and Thursday to attend. These are

Wednesday 27 February 2019
with Alexander Klein and Gabi Münster
with Itzik Ben-Gan
with Kalen Delaney
with Jason Horner
with Alberto Ferrari
with Mark Whitehorn and Kate Kilgour
with Kevin Kline, Richard Douglas, Andy Yun and Andy Mallon
Thursday 28 February 2019
with David Klee and Bob Pusateri
with Erik Darling
with Marco Russo
with Terry McCann and Simon Whiteley
with Theo van Kraay

I hope to see you there.

Friday, 18 January 2019

Data Science Activities linked to Business

Managing Big Data, AI and Data Science all need new processes and methods to be efficient. I am always on the lookout for new tools that help refine my thinking and usage. William Schmarzo shared the Hypothesis Development Canvas as a tool to connect data science to the organization. It is to be used to develop business hypotheses.

He also shared the Thinking Like a Data Scientist process.

Monday, 14 January 2019

The AI Journey

The AI Journey is an interesting blog post that discusses a pragmatic approach to AI and its use, the patterns for AI and the journey.

The patterns seen are for virtual agents, ambient intelligence, AI assisted professionals, knowledge mining and autonomous systems. More details are discussed here.

The question of where to start is being asked in many circles and BI is still the foundation. Without good quality data there is no AI. The largest hurdle I think that needs to be overcome is data ingest quality.

Sunday, 13 January 2019

Thursday, 10 January 2019

Cloudera vision and strategy

Today the joint vision for the new Cloudera was shared. It was interesting to hear their strategy going forward. I was expecting to hear something revolutionary and new, but it seems very much the same as other companies at the moment.

Here is a summary of the points.

They will be the only provider to run across all cloud providers: Azure, AWS, Google Cloud, IBM and Oracle. Both companies had the same vision: to make the impossible possible, to transform data into clear and actionable insights, and to be committed to open source to give flexibility to their customers.
Cloudera want to

  • Invest in real time streaming at the edge
  • Be enterprise grade
  • Be cloud native
  • Provide a data warehouse
  • Provide AI industrialization
  • Deliver the industry's first enterprise data cloud

They are developing the next generation platform called the Cloudera Data Platform. It will consist of

  • 100% open source
  • The best of HDP 3 and CDH 6
  • Hybrid and multi-cloud
  • Unified, from the edge to AI
  • Supported through to at least January 2022
  • Predictable and flexible migration paths
  • Separation of compute and storage using technologies like Kubernetes
  • A consistent security ecosystem

There are two application changes:

  • The Cloudera Data Science Workbench will now work with HDP.
  • HDF will work with CDH.

Cloudera talked about the industrialization of AI, which requires strategy, people and organization, security, governance and compliance, and technology for an enterprise grade AI operation.

Cloudera have launched a new machine learning platform powered by Kubernetes. It is in preview.

Tuesday, 8 January 2019

GitHub announces free private repositories

GitHub has made two announcements to start the new year off.

  • Unlimited free private repositories for up to three collaborators per repository
  • GitHub Enterprise is the new unified product for Enterprise Cloud (formerly GitHub Business Cloud) and Enterprise Server (formerly GitHub Enterprise).

Friday, 4 January 2019

From the Edge to AI

Hortonworks completed their merger with Cloudera to make them the second largest open source software company in the world. The company is now called Cloudera. The combined platform will enable enterprises to create greater value from data with:

  • The right data analytics, running on data anywhere
  • Strong enterprise-grade and enterprise-wide data security, governance and management
  • Flexibility to choose among multi and hybrid clouds

There is a virtual event on 10 January, From the Edge to AI, to hear about their vision and direction.

Wednesday, 2 January 2019

Microsoft Ignite | The Tour

The Microsoft Ignite | The Tour is at ExCeL in London 26-27 February 2019.

This conference is travelling round the world to enable you to learn new ways to code, optimise your cloud infrastructure, and modernise your organisation with deep technical training.