Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk . Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Tuesday 24 December 2019

Christmas BINGO

Here is something to keep you amused whilst reading those technical articles over Christmas.

How many articles do you have to read to say BINGO?

T: @victoria_holt

Thursday 19 December 2019

SQLBits 2020

SQLBits 2020 runs from 31st March to 4th April at the ExCel Centre, London. This year it lasts 5 days. Registration is open.

The agenda is:

Tuesday 31st March: Full day training (track registration mandatory)
Wednesday 1st April: Full day training (track registration mandatory)
Thursday 2nd April: Multiple 50 minute sessions (choose agenda on the day)
Friday 3rd April: Multiple 50 minute sessions (choose agenda on the day)
Saturday 4th April: FREE - Multiple 50 minute sessions (choose agenda on the day)

The training days have been announced. 

Tuesday 31 March 2020

A Day Full of Azure Data Factory with Cathrine Wilhelmsen
Mastering Index Tuning with Brent Ozar
A to Z Azure Cosmos DB with Hasan Savran
Getting started with Azure Machine Learning with Nico Jacobs and Aniek Sies
Further your Power BI development skills with Ásgeir Gunnarsson and Bent Nissen Pedersen

Wednesday 1 April 2020

Open Hack: From ingestion to consumption with Stijn Wynants, Steve Verschaeve and Rick Meijvogel
Succeed with DevOps as a DBA with Grant Fritchey

Sunday 1 December 2019

Azure HDInsight Learning Path

Microsoft Learn enables you to gain hands-on learning for different products, and Azure HDInsight is one of them. The course entitled Building Open Source Software (OSS) Analytical Solutions with Azure HDInsight has various modules:

  • Introduction to the Open source Analytics Offering
  • Choose the correct HDInsight Configuration to build open source analytical solutions
  • Creating and configuring a HDInsight cluster
  • Perform Zero ETL analytics with HDInsight Interactive Query

There is a wealth of information on the site worth exploring, including learning paths, certification and docs. It is great to be able to learn at your own pace.

Saturday 16 November 2019

The Tree of Learning is now fully grown

The Tree of Learning is now fully grown and installed in the Betty Boothroyd Library on the Walton Hall Campus in Milton Keynes. It reflects on the last 50 years of life-changing education. As one of the alumni, I supported this great cause, which helps give potential students the opportunity to access higher education through the 50th Anniversary Scholarships Fund. The Fund aims to offer free University education to 50 carers. Each donor could choose to be recognized and have a shield personalized with their name and a year that is significant to their OU journey. Over 7,200 people donated and, like me, chose to have a shield. The Tree is 22 feet (6.7 m) tall and is organised with the earliest dates, the 1970s, at the bottom, moving through the decades to the most recent years at the top.

Tuesday 12 November 2019

The Fourth Industrial Revolution Report 2019

The Fourth Industrial Revolution Report 2019 is independent research, commissioned by Big Data LDN and sponsored by Cloudera, that surveyed 500 of the UK’s most influential data leaders.

Key Fourth Industrial Revolution Report 2019 findings include:

  1. The human barriers to data-driven culture - The majority (86%) of UK data experts feel a lack of enthusiasm and support prevents creating a data-driven culture.
  2. Lack of data visualisation holds UK organisations back - 45% of UK data leaders see the lack of data visualisation skills as the biggest barrier to business requirements.
  3. Data leaders investing in humans to cross the skills chasm - Over half of UK organisations bridge the business (59%) and technology (57%) skills gap by upskilling employees.
  4. AI and ML slow down - Despite AI and ML being considered as the technology of the moment, this year’s respondents have cited a 21% decrease in its usage.
  5. GDPR still top concern - GDPR is still the dominating regulation for many (46%) UK data experts - its power over UK organisations’ data governance programs grows stronger each year. 
Download the report here

Friday 8 November 2019

Big Data LDN 2019

Register for free for the UK's largest data & analytics conference and exhibition: https://bigdataldn.com/register

Plan your two days at Big Data LDN. Create your own personalised itinerary, search speaker & exhibitor profiles and network with other delegates. https://guidebook.com/g/bigdataldn19/

Wednesday 6 November 2019

Book of News Ignite 2019

This is the second edition of the Microsoft Ignite Book of News. It is a guide to all the announcements that Microsoft are making. The book makes it easy to navigate through all the latest information and the vast number of exciting new products and features.

Monday 4 November 2019

Day 1 Microsoft Ignite 2019

Microsoft Ignite day 1 shared many exciting announcements. Three of them are:

SQL Server 2019 is now generally available.
SQL Server 2019 brings enhancements to the core SQL engine and offers a scale-up and scale-out system with built-in support for Big Data (Apache Spark, Data Lake), state-of-the-art data virtualization technology and built-in machine learning capabilities.
Get the free e-book on SQL Server 2019.

Simply unmatched, truly limitless:  Azure Synapse Analytics

Azure Synapse Analytics is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives the freedom to query data using either serverless on-demand or provisioned resources at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs.

Azure services run anywhere with new hybrid capabilities: Azure Arc

Azure Arc is a set of technologies that unlocks new hybrid scenarios for customers by bringing Azure services and management to any infrastructure. Azure Arc is available in preview.

HDFS tiering in SQL Server Big Data Clusters

SQL Server Big Data Clusters has its own built-in local HDFS data lake for storing unstructured and high-volume data. This data virtualization capability has a feature called HDFS tiering, which is a major new contribution to the Apache Hadoop project.

With HDFS tiering you can access other data lakes by mounting a remote HDFS- or S3-compatible data source into your local HDFS data lake. Access is seamlessly available from SQL Server or Apache Spark. Currently you can mount the following storage: Azure Data Lake Storage Gen2, AWS S3, Isilon, StorageGRID and FlashBlade.

Wednesday 30 October 2019

Two Great Conferences

Next week will be a really exciting week with two great conferences.

Microsoft Ignite
November 4–8, 2019 | Orlando, FL

Learn innovative ways to build solutions and migrate and manage your infrastructure. Connect with over 25,000 individuals focused on software development, security, architecture, and IT. Explore new hands-on experiences that will help you innovate in areas such as security, cloud, and hybrid infrastructure and development.

PASS Summit
November 5 - 8, 2019 | Seattle, WA

The summit has three tracks: architecture, data management and analytics. There is interactive training on the latest technologies, and spotlights on hot topics such as security, cloud and AI will be led by the best data minds in technology.

Tuesday 29 October 2019

Introducing SQL Server 2019

SQL Server 2019 is the latest version of SQL Server. It redefines SQL Server from a traditional relational database system to a data platform for every data scenario, from OLTP and data warehousing to big data and analytics.

There is a great video on Channel 9 which gives a quick overview of all the new things in SQL Server 2019.

Sunday 20 October 2019

Cloud Migration Strategy

I had to share this article because of the picture. This article looks at the merits of each migratory path to plan your journey. The options discussed are: 
  • Lift and Shift
  • Evolve - between lift and shift and a full rebuild
  • Go Native - Cloud-native apps are designed for the cloud, so assume that the infrastructure they run on is inherently unreliable, but also controllable. They should be largely self-aware, self-scaling and self-healing. Most of the functionality is simply consumed rather than written into the code.

Friday 18 October 2019

European Spark and AI summit 2019

The keynotes for the European Spark and AI summit 2019, which took place on 16 October 2019 in Amsterdam, are online to watch if you missed the event. A few of the topics are:

Unified Data Analytics: Helping Data Teams Solve the World’s Toughest Problems
New Developments in the Open Source Ecosystem
Saving Energy in Homes with a Unified Approach to Data and AI

Simplifying Model Management with MLflow
Scalable AI for Good
Forecasting 'What-if' Scenarios in Retail Using ML-Powered Interactive Tools
Democratizing Machine Learning: Perspective from a scikit-learn Creator
Imaging the Unseen: Taking the First Picture of a Black Hole

Some Announcements

Databricks Simplifies Machine Learning Model Management At Scale With MLflow Model Registry
Databricks brings its Delta Lake project to the Linux Foundation

A design pattern is emerging called 'Lake House'. This is a pattern where Spark is not just replacing Hadoop and ETL, but is also associated with data warehouses, business intelligence and reporting.

Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.

Wednesday 16 October 2019

After Data Relay

Data Relay has finished for the 2019 season. This year I was involved as Head of Marketing and was the Bristol event owner again. I enjoyed being a part of the Data Relay team. It is great to be able to put on an event with free training across the breadth of the Microsoft Data Platform.  

I also presented on “The big data and database management paradigm shift”. The abstract:

The emergence of big data has created a new paradigm for data and database management. Structured and unstructured data must now work together to produce actionable insights. This session will share details on how to embrace this new era of database management, to prepare you for SQL Server 2019.

The slides summary 


Wednesday 9 October 2019

Apache Spark Under the Hood Free ebook

Databricks has created an ebook to share excerpts from the book Spark: The Definitive Guide.
The ebook covers:
  • The past, present, and future of Apache Spark.
  • Basic steps to install and run Spark yourself.
  • A summary of Spark's core architecture and concepts.
  • Spark's powerful language APIs and how you can use them.

Monday 7 October 2019

Data Relay starts today

The week has finally arrived and it is Data Relay 2019. The agendas are here:

#Newcastle Mon 7 https://bit.ly/2mktU73 #Leeds Tues 8 https://bit.ly/2kP99jh  #Nottingham Wed 9 https://bit.ly/2kkct5P #Birmingham Thurs 10 https://bit.ly/2lRh6F2  #Bristol Fri 11 https://bit.ly/2kkcwi1

The sessions are covering a raft of exciting topics.

Friday 4 October 2019

Two New Datasets to Improve Natural Language Understanding Models

Google’s PAWS data set helps AI models capture word order and structure. Yuan Zhang, Research Scientist, and Yinfei Yang, Software Engineer, Google Research, posted:

Word order and syntactic structure have a large impact on sentence meaning — even small perturbations in word order can completely change interpretation. For example, consider the following related sentences:

  1. Flights from New York to Florida.
  2. Flights to Florida from New York.
  3. Flights from Florida to New York.

All three have the same set of words. However, 1 and 2 have the same meaning (known as paraphrase pairs), while 1 and 3 have very different meanings (known as non-paraphrase pairs). The task of identifying whether pairs are paraphrases or not is called paraphrase identification, and this task is important to many real-world natural language understanding (NLU) applications such as question answering.
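
To see why word order matters here, a minimal Python sketch may help. It is my own illustration and is not taken from the Google post or the PAWS data set. The helper names bag_of_words and route are hypothetical, and the tiny rule-based parser only works for "from X to Y" style phrases like the three above. It shows that a model which only counts words cannot separate the three sentences, while even a very simple order-aware reading can.

```python
import re
from collections import Counter

SENTENCES = [
    "Flights from New York to Florida.",
    "Flights to Florida from New York.",
    "Flights from Florida to New York.",
]


def tokens(sentence):
    """Lower-case the sentence and split it into alphabetic words."""
    return re.findall(r"[a-z]+", sentence.lower())


def bag_of_words(sentence):
    """Count each word while ignoring the order the words appear in."""
    return Counter(tokens(sentence))


def route(sentence):
    """Toy order-aware reading (hypothetical helper): collect the words
    that follow 'from' and 'to' and return (origin, destination)."""
    parts = {"from": [], "to": []}
    current = None
    for word in tokens(sentence):
        if word in parts:
            current = word          # switch to collecting origin or destination
        elif current is not None:
            parts[current].append(word)
    return " ".join(parts["from"]), " ".join(parts["to"])


bags = [bag_of_words(s) for s in SENTENCES]
routes = [route(s) for s in SENTENCES]

# Every sentence has exactly the same bag of words.
print(bags[0] == bags[1] == bags[2])
# Sentences 1 and 2 describe the same route; sentence 3 reverses it.
print(routes[0] == routes[1], routes[0] == routes[2])
```

The word counts are identical for all three sentences, so any difference in meaning has to come from structure. That is the gap a paraphrase identification data set is designed to probe.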

Monday 30 September 2019

Accelerating Competitive advantage with AI

Microsoft have released a new report, Accelerating Competitive advantage with AI. The report highlights that the UK is well placed to succeed. To download the report, visit https://aka.ms/AcceleratingAI

The initial report, also worth reading, was entitled Maximizing the AI opportunity: How to harness the potential of AI effectively and ethically. To download that report, visit https://aka.ms/UKAIreport

The anatomy of an AI enabled organisation is explained in the diagram below.

Thursday 19 September 2019

Data Relay 2019

Data Relay is only a few weeks away. The agendas for the events are:

The Newcastle agenda  https://bit.ly/2mktU73 Register now for the event Mon 7 http://bit.ly/datarelaynew

The Leeds agenda https://bit.ly/2kP99jh Register now for the event Tues 8 http://bit.ly/datarelayleeds

The Nottingham agenda https://bit.ly/2kkct5P Register now for the event Wed 9 http://bit.ly/datarelaynott 

The Birmingham agenda https://bit.ly/2lRh6F2 Register now for the event Thurs 10 http://bit.ly/datarelaybham

The Bristol agenda  https://bit.ly/2kkcwi1 Register now for the event Fri 11 http://bit.ly/datarelaybsl

Saturday 7 September 2019

The big data and database management paradigm shift

I am really pleased to be speaking at Data Relay this year in Bristol on Friday 11 October. The agenda is here. The abstract for my session is below.

The emergence of big data has created a new paradigm for data and database management. Structured and unstructured data must now work together to produce actionable insights. This session will share details on how to embrace this new era of database management, to prepare you for SQL Server 2019.

Thursday 5 September 2019

The Mirage and Metamorphosis of Data and AI

I have just taken a break to rejuvenate my creative juices. It was a time to reflect and innovate. We are often so busy in our day to day lives we don't stop and reflect. I spent my time reading and catching up on bleeding edge technology. I am always fascinated to see what is coming next, what problems researchers are trying to address and how Data and AI could be utilised to benefit industry and the world around us.

The role I enjoy the most is as a Data and AI philosopher providing thought leadership. We are at an exciting time in history to witness and contribute to the mirage and metamorphosis of Data and AI. My explorations find exciting challenges in diversity, with Data and AI at the centre of most things we want to achieve. Research is increasingly needed in industry to achieve business success because of the growing complexity within industry and the world around us. For certain tasks we need to move away from agile and use systems thinking techniques so that complexity can be understood. My finding on the future mirage and metamorphosis of Data and AI is that the world around Data and AI is complex and interconnected, and mastering that complexity is the key to success.


The data and AI market landscape 2019: The next wave of hybrid emerges
Part I: A Turbulent Year: The 2019 Data & AI Landscape
Part II: Major Trends in the 2019 Data & AI Landscape
Navigating AI hype in search of success, Oliver Pickup (Sunday Times 12 May 2019)
The real big-data problem and why only machine learning can fix it
Big Data is just Data
Maximising the AI opportunity
The Data Ethics Framework principles

Thursday 22 August 2019

Microsoft ML for Apache Spark

Microsoft Research announce a new version of Microsoft ML for Apache Spark, an open-source and distributed ML and microservice library. v0.18 brings Vowpal Wabbit on Spark, Speech to Text & more!

Microsoft Machine Learning for Apache Spark (MMLSpark) is an ecosystem of enhancements that expand the Apache Spark distributed computing library to tackle problems in Deep Learning. It enables sending streaming data to Power BI.
Website: http://aka.ms/spark Paper: http://aka.ms/spark-paper