Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous; the database universe is dichotomous (structured and unstructured), expanding and complex. Find my database research at SQLToolkit.co.uk. Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Saturday 30 December 2017

Diagrams help explain complex data

Diagrams are useful for explaining qualitative data, and there are various diagramming tools that can help with understanding complex systems. These are the main ones that I use:

  • Rich Pictures
  • Spray Diagrams
  • Systems Map
  • Influence Diagram
  • Multiple Cause Diagram

The diagram types above were provided by the Open University, and guidelines for constructing such diagrams are explained there.

The Open University guide to using diagrams can be seen in this video.

Wednesday 20 December 2017

Continual Change and Complexity

This year has been one of constant change for me, and that will continue into the new year. Continual change is the way of the new world: with data and AI being embedded into every realm of technology, we can expect smaller, more frequent changes on a day-to-day basis. I have enjoyed my research immensely, and being able to apply it to understanding the complexity of real-world database problems.

As the holidays approach, I wish you all a very Merry Christmas and a Happy New Year.

Thursday 14 December 2017

The DevOps Model

TechNet UK live-streamed a set of talks back in September, and this was a diagram they shared. I think it is a helpful picture of the process.

Friday 17 November 2017

Big Data LDN 2017

I attended Big Data LDN, 15–16 November 2017, where leading data and analytics experts showcased tools to help with delivering a data-driven strategy. The conference launched the Fourth Industrial Revolution report, which explains what the UK’s data leaders think about the state of the UK data economy.

The things I found interesting during the two-day event are summarized here.

Machine learning is a topical discussion point, but it is not that difficult to get started. An initial area to look at is co-occurrence and recommendation. Co-occurrence helps you find behaviours, and you can use those to drive recommendations in areas such as textual analysis and intrusion detection.
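As a rough sketch of the idea (my own toy example, not code from the talk): count how often items appear together across baskets, then recommend the items that most often co-occur with something the user already has.

```python
from collections import Counter

# Hypothetical purchase histories (illustrative data only)
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter", "jam"},
]

# Count how often each ordered pair of items occurs in the same basket
cooccur = Counter()
for basket in baskets:
    for a in basket:
        for b in basket:
            if a != b:
                cooccur[(a, b)] += 1

def recommend(item, top_n=2):
    """Return the items most often seen alongside `item`."""
    scores = {b: n for (a, b), n in cooccur.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("milk"))
```

The same counting scheme extends to word co-occurrence for textual analysis or event co-occurrence for intrusion detection.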

Machine learning was described as the integration between analytics and operations. The three questions to ask were: what algorithm, what tools and what process? 90% of machine learning success is in data logistics (being able to handle lots of data types), not in learning.
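A minimal sketch of what "data logistics" means in practice (my own illustration, assuming records arrive as dicts, JSON strings or CSV rows): normalise heterogeneous inputs into one uniform shape before any learning happens.

```python
import csv
import io
import json

def normalise(record):
    """Coerce a dict, JSON string, or CSV row string into a plain dict."""
    if isinstance(record, dict):
        return record
    if isinstance(record, str) and record.lstrip().startswith("{"):
        return json.loads(record)
    if isinstance(record, str):
        # Assume a two-column CSV row: id,value
        row = next(csv.reader(io.StringIO(record)))
        return {"id": row[0], "value": row[1]}
    raise TypeError(f"unsupported record type: {type(record)}")

# Three formats in, one shape out
records = [{"id": "1", "value": "a"}, '{"id": "2", "value": "b"}', "3,c"]
uniform = [normalise(r) for r in records]
```

Most of the effort in a real pipeline sits in this kind of plumbing rather than in the learning step itself.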

The CDO’s playbook was launched. Chief Data Officer is a rapidly expanding role, and this book offers practical advice on what the role is, how it fits in with other C-suite roles, and actionable tips.

There are many challenges when dealing with citizen data. At the heart of audience engagement are:

- single view of the customer
- deeper engagement
- supported intelligence
- relationship management

The main challenge is data quality and having a high enough quality of data to provide insight.

Citizens want to be data scientists, able to dive into the data with ease. This self-service model has its challenges: better governance, data management and operational efficiency are required, together with the rise of managed services to remove the complexities of running these services.

The keynote on day 2, machine learning, AI and the future of big data analytics, by Dr Amr Awadallah, co-founder of Cloudera, talked about a history of waves:

- wave 1 automation of knowledge transfer
- wave 2 automation of food
- wave 3 automation of discovery
- wave 4 making and moving stuff (Industrial revolution)
- wave 5 automation of processes (IT revolution)
- wave 6 automation of decisions.

We are in wave 6, which is about collecting and leveraging data to make decisions. It differs from the BI wave, where humans made the decisions; the new wave is about learning how decisions are made and automating them. Things to consider for success are:

- build a data driven culture
- develop the right team and skills
- be agile/lean in development
- leverage DevOps for production
- right size data governance 

There were discussions about data narrative and telling a story to the audience. The five steps learnt for better storytelling are:

- identify the right data
- choose the right visualizations
- calibrate visuals to your message
- remove unnecessary noise
- focus attention on what’s important

Matt Aslett talked on pervasive intelligence: the future of big data, machine learning and IoT, the details of which have been published in a report. He discussed trends and implications across the AI automation spectrum. It will bring about fundamental and wide-ranging positive societal implications that will change the way we live, work, play, transact and travel. He mentioned the risk of a small number of platform-oriented companies controlling the forces of production for generating value from data. The 4sight report on the future of IT is coming soon and sounds like an interesting read.

Deep learning demystified explained why neural networks, which are not new, have only just come to the fore: they were originally thought of as part of a failed experiment when, in fact, they simply did not use enough data. For supervised learning, deep learning works well with very large data sets. The key things to think about when considering it are:
- you must have large data, a minimum of 10 million labels of data
- what level of accuracy do you need?
- can something simpler work? Start with classical models such as linear models
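To make the last point concrete, here is a classical baseline (my own toy example): a one-variable linear model fitted in closed form. If something this simple already reaches the accuracy you need, deep learning may be unnecessary.

```python
# Toy data that is roughly y = 2x + 1 with a little noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares for a single feature, in closed form
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))
```

A model like this needs five points, not 10 million labels, which is exactly the trade-off the session highlighted.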

There is a deep learning institute to learn more. 

The conference was useful and provided a wide range of discussions on high level data topics.

Monday 13 November 2017

A Guide to Complexity of Database Systems

The PhD research I undertook examined the complexity of database systems. A summary of the findings is provided below.

In the turbulent, fast-moving field of database systems, complexity is found everywhere. The volume, variety and velocity of data are continually expanding, as is the realization that businesses hold a wealth of untapped data that can be democratised. New technology, shifting business markets, organisational change, the knowledge required of operating staff and the number of stakeholders all add to the complexity. Many of these complexities are discussed in the Claremont and Beckman reports (Agrawal et al. 2009; Abadi et al. 2016).

This guide takes a 360-degree view of the situation through a systems thinking lens, providing synthesis across the cross-disciplinary fields. Being able to explain what complexity is shapes our understanding of the situation, and a basic visualisation of this is shared through the use of a graph. The use of graphs as a visual representation is discussed, with the presentation of the graph metrics leading to the CODEX, a blueprint for the management of database systems. The CODEX could enable the transformation of database systems management so that actionable insight can be achieved.

Friday 10 November 2017

Data Story Telling Booklet and Video

Storytelling is an important facet of working with data. While working on my PhD research, I thought it was important to share its main stages. The storyboard is shared as two resources:

A PhD storyboard booklet
A collated storyboard video

For more research articles, see http://sqltoolkit.co.uk/

Tuesday 7 November 2017

Innovative Designs for Innovative Thinking

These Spheres in Seattle will be home to a botanical garden, waterfalls, a river and tree-house-like spaces. They are disruptive, pioneering and futuristic, letting Amazonians break free for innovative thinking. What an amazingly cool idea.

Thursday 2 November 2017

PASS Summit 2017 Day 2 Keynote

I attended the Day 2 Keynote at PASS Summit presented by Rimma Nehme on Globally Distributed Databases Made Simple. This was an amazing presentation. It was presented seamlessly, explaining the technicalities of CosmosDB and how the globally distributed database works from the ground up.

Rimma raised the question: do we need another database? Databases need to meet the data needs of today and the future. Data is global and interconnected, with large and continually growing volumes created every 60 seconds. The balance is shifting in the type of data, and we need data to sit globally, next to users, for processing, meaning the architecture needs to be different.

Cosmos DB was originally called Project Florence, named after the place where the Renaissance began. It was built as a cloud database for global distribution, with a fully resource-governed stack and a schema-agnostic service. A single system image is used for all globally distributed resources.

The resource model has a database account and database that may span clusters and regions. The database is scaled out in terms of containers, and is designed to scale throughput and storage independently. There are two parts to the design. The physical system design is:

The partitioning system design is:

The design is to enable elastically scalable storage, throughput, anywhere, anytime.

Resource governance cannot be an afterthought. The request unit per second (RU/s) is the normalized currency.
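As a sketch of how such a partitioning design works in general (my own simplification, not Cosmos DB internals): hash each logical partition key to choose the physical partition that holds its documents, so data and throughput spread across partitions automatically.

```python
import zlib

# Hypothetical cluster size, for illustration only
NUM_PHYSICAL_PARTITIONS = 4

def physical_partition(partition_key: str) -> int:
    """Map a logical partition key onto a physical partition by hashing."""
    return zlib.crc32(partition_key.encode()) % NUM_PHYSICAL_PARTITIONS

# Documents with different partition keys land on different partitions,
# but the same key always lands on the same partition.
keys = ["user-1", "user-2", "user-3", "user-42"]
placement = {k: physical_partition(k) for k in keys}
print(placement)
```

Because placement is a pure function of the key, any node can route a request without a central lookup, which is what makes independent scaling of storage and throughput possible.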

There are five well-defined consistency models in Azure Cosmos DB, with clear trade-offs: strong, bounded staleness, session, consistent prefix and eventual.

There is native support for multiple data models with more coming in the future.

The talk went on to cover how indexing works in depth, and the key points to remember about Cosmos DB are:

The talk concluded with a great quote “It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is most adaptable to change.”

The slides can be downloaded.

Wednesday 1 November 2017

PASS Summit 2017 Day 1 Keynote

I attended PASS Summit 2017 which was my second year of attendance. I enjoyed the conference enormously. It is enjoyable being immersed in data and being with people who are enthusiastic in the field.

The Day 1 Keynote "Microsoft for the Modern Data Estate" was presented by Rohan Kumar. Data is driving transformation; data, cloud and AI are the three most disruptive trends of our time. The modern data estate enables simplicity and common sense. It takes any data from any source: structured or unstructured, large or small. It provides a seamless infrastructure between on-premises, private and public cloud, enabling a hybrid set-up that hides the dichotomy of these disparate systems, with seamless flexibility and a choice of engines.

New features in SQL Server 2017

There are many changes in SQL Server 2017, which brings industry-leading performance and security to Linux and Docker. The key engine changes:
  • Support for graph data and queries
  • Advanced Machine Learning with R and Python
  • Native T-SQL scoring
  • Adaptive Query Processing and Automatic Plan Correction

SQL Server 2017 will enable deployment in seconds on Linux and Windows containers and has special pricing for SQL Server on Linux and Red Hat Enterprise Linux.

New Features Azure SQL Database

Azure SQL Database offers an intelligent DBaaS, privacy and trust, seamless compatibility and competitive TCO. There is seamless migration to the cloud, with the cloud-first approach breeding faster innovation. The list of changes was presented.

Azure Data Factory now provides a managed environment for SQL Server Integration Services (SSIS) packages, making it easy to move SSIS workloads to the cloud.

There was an announcement of a new tool called Microsoft SQL Operations Studio, a free, lightweight, modern data operations tool for SQL everywhere.

These are but some of the changes coming to the products.

Thursday 19 October 2017

Machines that learn to see and move: The future of artificial intelligence

I attended the Institute for Mathematical Innovation (IMI) public lecture by Professor Andrew Blake, Research Director at The Alan Turing Institute, on 18 October. Professor Blake is a pioneer in the development of algorithms that make it possible for computers to behave as seeing machines. Before joining the Institute in 2015, Professor Blake held the position of Microsoft Distinguished Scientist and Laboratory Director at the Microsoft Research Lab in Cambridge, and he has been on the faculty at Oxford University. He is part of a new startup, FiveAI.

The session abstract:

Neural networks have taken the world of computing in general and artificial intelligence (AI) in particular by storm.

But in the future, AI will need to revisit these generative models which are used to make predictions. There are several reasons for this – system robustness, precision issues, transparency, and the high cost of labelling data.

This is particularly true for perceptual AI, needed for autonomous vehicles, where the need for simulators and the need to confront novel situations, will demand further development of generative, probabilistic models. 

He talked about the empirical detector and the generative model. At the moment it is the era of deep learning and neural networks, which sit within the empirical-detector area: a black box of big data and optimal predictive power. The generative model is analysis by synthesis and comes with an ‘explanation’, like a model; it starts with a hypothesis, typically probabilistic. Professor Blake believes the generative model will come back because perceptual models need it:

  • to simulate labelled data
  • for data fusion - to increase reliability
  • to make detailed interpretations 
  • for online simulation - to explain hard to read situations
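A toy sketch of analysis by synthesis (my own illustration, not Professor Blake's code): propose a hypothesis, synthesise the observation it would generate, and keep the hypothesis whose synthesis best explains the data.

```python
# Observation secretly generated by y = x ** 2
observed = [0.0, 1.0, 4.0, 9.0, 16.0]

def synthesise(power):
    """Generate the data a given hypothesis (power law) would produce."""
    return [x ** power for x in range(5)]

def fit(obs):
    """Pick the candidate hypothesis whose synthesis best matches obs."""
    best, best_err = None, float("inf")
    for power in (1, 2, 3):  # candidate hypotheses
        err = sum((o - s) ** 2 for o, s in zip(obs, synthesise(power)))
        if err < best_err:
            best, best_err = power, err
    return best

print(fit(observed))  # → 2
```

The chosen hypothesis doubles as the 'explanation': unlike a black-box detector, it tells you *why* the data looks the way it does.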

This was a very insightful lecture, and it was very interesting to see analysis by synthesis mentioned.

Monday 16 October 2017

Agilience Authority Index

I came across the Agilience Authority Index placing me in the top 250 for SQL Server.

The Agilience Authority Index shows how influential you are, based on your Twitter profile. Agilience state that your influence is more than your audience: it is your recognized expertise on a topic. Your profile on agilience.com shows your main topics of influence based on the Agilience Authority Index.

PhD Thesis

My PhD thesis is now available online.

Holt, Victoria (2017). A Study into Best Practices and Procedures used in the Management of Database Systems. PhD thesis, The Open University.

Saturday 14 October 2017

SQL Relay 2017

SQL Relay took place between 9 and 13 October 2017. It was the end of my first year on the organising committee, which has been great fun.

The relay began in Reading, moving to Nottingham, Leeds and Birmingham, and ending in Bristol. This year I helped out on site at two events, Reading and Bristol. The event brought four tracks, three general tracks and a workshop track, to each venue. With only an hour to set up before the event started, it was all hands on deck to register all the attendees and start the event on time. We were lucky to have so many amazing volunteers helping during the days, and without the sponsors and speakers we wouldn’t have been able to run the events. I became the event lead for Bristol, and it was nice to bring the event back to the city this year. We enabled around 1,000 people to be trained, helped the SQL community grow and gave people the chance to learn something new. It is a privilege to be a part of this unique event.

Friday 6 October 2017

Machina Summit.AI

I attended IPExpo Europe, 4–5 October at ExCeL in London, specifically to attend the Machina Summit.AI.

The opening keynote was by Professor Brian Cox OBE on ‘Where IT & Physics Collide’. The talk interlinked big data, quantum mechanics and quantum computing. The whistle-stop tour mentioned the Sloan Digital Sky Survey, which has produced the most detailed three-dimensional maps of the universe; general relativity; the history of space and time; the theory of cosmology; and quantum mechanics, ending with quantum theory and predicting the distribution of galaxies. This was an amazing talk and gave a glimpse of the interconnected future.

This was followed by Brad Anderson, Corporate Vice President at Microsoft, on ‘Business as usual in a digital war zone’. We live in turbulent times, with a 300% increase in user account attacks this year; 96% of malware is automated and polymorphic, costing business $15 million. Attacks happen in increasing waves, and old defences never stand up against them. In this intelligent war you need an intelligent graph. He introduced the Microsoft Azure Active Directory service as the new control plane. There is a need to eliminate false positives, classify email, guarantee data never leaves the browser and be able to use a real-time evaluation engine.

A few other talks covered the practice of monitoring with machine data. There are two types of monitoring: traditional IT and the new data-driven IT. For the latter there is a need to rethink and improve how IT operates, using machine learning to be proactive. Organizational silos need to be broken down, and quality increased, to address the velocity of data in a more agile way and produce actionable insights.

Conrad Wolfram, Strategic Director of Wolfram Research, talked about ‘Enterprise computation: the next frontier in AI and data science’. Today’s data challenge is about the accessibility of data, the personalisation of data and providing insightful answers. Data science is multi-paradigm, and machine learning does not have all the answers. Computation, with smart automation, and computational thinking are needed for everyone. Data science needs to be personalised and multifaceted, but unified.

The day 2 keynote was given by Stuart Russell, Professor of Electrical Engineering and Computer Science at the University of California, Berkeley, on ‘Human-Compatible AI’. He discussed what is coming soon: basic language understanding with web-scale question answering; intelligent assistants for health, education, finances and life (not chatbots!); robots for unstructured tasks (home, construction, agriculture); and new tools for economics, management and scientific research. He discussed the premise that eventually AI systems will make better* decisions than humans (*taking into account more information and looking further into the future). He argued that, in the case of super-intelligent AI, you can’t simply switch off the machine, and addressed the claim that AI will never succeed.

Other sessions discussed the journey of chaos and how everything fails all the time; to address this, remember that every journey begins with a single step. There is also the inevitable question of skills versus knowledge, and the answer is practice.

Microsoft talked about 'AI and Analytics in the Enterprise'. There is now a need to look beyond the rear-view mirror of what happened. There is a convergence of cloud, data and AI, and with that Microsoft have created an AI platform that is fast and agile, with AI built in and enterprise-proven from on-premises to the edge, to create insights. The evolution of the data estate takes into account increasing data volumes, new data sources and types, and open-source languages. There are three stages between the heterogeneous sources and the apps and insights they feed:
  • Ingest – data orchestration and monitoring
  • Store – data lake and storage
  • Machine learning – prepare and train (Hadoop/Spark/SQL and ML), then model and serve (on-premises, cloud, IoT)

In summary the 2 day conference provided great insight into many new technical areas and raised thought provoking questions about the future of data and AI. 

Sunday 1 October 2017

Microsoft for the Modern Data Estate

The Microsoft Ignite session on the modern data estate was full of announcements. All around us, data is driving digital transformation: modernize with SQL Server 2017 on Linux and Windows and with Azure Data Services, and deliver modern intelligent applications using technologies like Azure Cosmos DB and Azure Database for PostgreSQL.

The world is changing, and we need to invest in the future without being tied to the past. AI is a fundamental pillar for leveraging that. Businesses that invest in data outperform other companies.

Data doesn’t need to leave the database for data science to take place.

The cloud first approach breeds faster innovation and SQL Server 2017 is proof of that.

SQL Server 2017 on Linux, Docker and Windows Server
Support for graph data and queries
Advanced Machine Learning with R & Python
Native T-SQL scoring
Adaptive Query Processing and Automatic Plan Correction
Vulnerability assessment for GDPR – preview
Intelligent insights into performance – preview
Support for graph data and queries – GA
Adaptive query processing – GA
Native scoring and support for Azure Machine Learning – GA

SQL Database and Database Migration Service
Migration to the cloud is easy with this new service. 
Azure SQL Database is the intelligent cloud database for app developers. It learns and adapts, scales on the fly, enables multi-tenant SaaS apps, works in your environment, secures and protects. The systems of intelligence on SQL Database is shared.

Globally Distributed Applications

Announcing Azure Functions for Azure Cosmos DB, to build apps faster with a serverless infrastructure.

Uncovering insights with big data and advanced analytics

The new Azure Data Factory allows easy modelling of diverse data integration scenarios, and with the preview service you can easily move your SQL Server Integration Services (SSIS) workloads to the cloud. There is also data movement as a service, with 30+ connectors. Azure SQL Data Warehouse has a compute-optimized tier and unlimited columnar storage in preview. The last announcement was the Power BI Report Server.

Saturday 30 September 2017

Microsoft Ignite Vision Keynote

The three key technologies that will fundamentally transform our industry are mixed reality, artificial intelligence and quantum computing.

There is a new computing paradigm, AI first, for how we support and transform the apps that we build, and it is changing the culture inside organizations. Microsoft are joining graphs to produce systems of intelligence for sales, field service, customer service, talent and operations: one pervasive graph that is extensible. This will change the frontier of simplicity of tasks, unlocking new innovation and creativity to create the next platform.

The aim is to democratize access to technology, to unlock the unimaginable and solve the impossible; that is the quest we are on.

Tuesday 26 September 2017

SQL Server 2017

Microsoft have announced the general availability of SQL Server 2017 and Machine Learning Services. SQL Server 2017 is available for purchase on October 2nd.

SQL Server 2017 is a milestone, the first version of SQL Server to run on Windows Server, Linux and Docker. It has already been pulled on Docker 2 million times since November.