Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk . Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Friday 17 November 2017

Big Data LDN 2017

I attended Big Data London 15-16 Nov 2017 with leading data & analytics experts showcasing their tools to help with delivering data-driven strategy. The conference showcased the fourth industrial revolution report which explains what the UK’s data leaders think about the state of the UK data economy.

The things I found interesting during the two-day event are summarized here.

Machine Learning is such a topical discussion point, but it is not that difficult to get started. An area to initially look at is co-occurrence and recommendation.  Co-occurrence helps you find behaviours and you can use that to find recommendations in areas such as textual analysis and intrusion detection.
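As a rough illustration of the co-occurrence idea, the sketch below counts how often pairs of items appear together and uses those counts to suggest related items. The session data and the function names `build_cooccurrence` and `recommend` are made up for this example; it is one simple way to get started, not the method shown at the conference.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # De-duplicate so an item repeated in one basket is only counted once.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, item, top_n=3):
    """Return the items that most often co-occur with `item`."""
    return [other for other, _ in counts[item].most_common(top_n)]


# Hypothetical sessions: each list is what one user looked at.
baskets = [
    ["sql", "python", "spark"],
    ["sql", "python"],
    ["sql", "spark"],
    ["python", "r"],
    ["sql", "python", "r"],
]

counts = build_cooccurrence(baskets)
print(recommend(counts, "sql", top_n=2))
```

The same counting works for words in documents or events in a log, which is why it carries over to textual analysis and intrusion detection.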

Machine learning was described as the integration between analytics and operations. The three questions to ask were: what algorithm, what tools and what process. 90% of machine learning success is in data logistics (being able to handle lots of data types), not learning.

The CDO’s playbook was launched. The Chief Data Officer is a rapidly expanding role and this book offers practical advice on what the role is and how it fits in with other C-suite roles, and provides actionable tips.

There are many challenges when dealing with citizen data. At the heart of audiences are:

- single view of the customer
- deeper engagement
- supported intelligence
- relationship management

The main challenge is data quality and having a high enough quality of data to provide insight.

Citizens want to be data scientists and be able to dive into the data with ease. This self-service model can have challenges. Better governance, data management and operational efficiency are required, together with the rise of managed services to remove the complexities of running these services.

The keynote on day 2, machine learning, AI and the future of big data analytics by Dr Amr Awadallah, Co-founder of Cloudera, talked about a history of waves.

- wave 1 automation of knowledge transfer
- wave 2 automation of food
- wave 3 automation of discovery
- wave 4 making and moving stuff (Industrial revolution)
- wave 5 automation of processes (IT revolution)
- wave 6 automation of decisions.

We are in wave 6 which is about collecting data and leveraging data to make decisions. It is different from the BI wave where humans made decisions. The new wave is learning how decisions are made and automating them. Things to consider for success are

- build a data driven culture
- develop the right team and skills
- be agile/lean in development
- leverage DevOps for production
- right size data governance 

There were discussions about data narrative and telling a story to the audience. The five steps learnt for better storytelling were:

- identify the right data
- choose the right visualizations
- calibrate visuals to your message
- remove unnecessary noise
- focus attention on what’s important

Matt Aslett talked on pervasive intelligence: the future of big data, machine learning and IoT, the details of which have been published in a report. He discussed trends and implications of the AI automation spectrum. It will bring about fundamental and wide-ranging positive societal implications that will change the way we live, work, play, transact and travel. He mentioned a risk of having a small number of platform-oriented companies that control the forces of production for generating value from data. The 4sight report on the future of IT is coming soon and sounds like an interesting read.

Deep learning demystified explained why neural networks, which are not new, have only just come to the fore. They were originally thought of as part of a failed experiment. In fact, the problem was that they were not given enough data. Supervised deep learning works well with very large data sets. The key things to think about when considering deep learning are:
- do you have enough data? A minimum of 10 million labelled examples was suggested
- what level of accuracy do you need?
- can something simple work? Start with classical models such as linear models
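To make the "can something simple work?" question concrete, here is a minimal sketch of a classical linear model fitted with ordinary least squares and compared against simply predicting the average. The data points and the function names `fit_line` and `mean_squared_error` are invented for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_squared_error(ys, preds):
    """Average squared difference between actual and predicted values."""
    return mean((y - p) ** 2 for y, p in zip(ys, preds))


# Made-up data that roughly follows y = 2x + 1.
xs = [0, 1, 2, 3, 4, 5]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0]

slope, intercept = fit_line(xs, ys)
preds = [slope * x + intercept for x in xs]

# Naive baseline: always predict the average of the observed values.
baseline_mse = mean_squared_error(ys, [mean(ys)] * len(ys))
linear_mse = mean_squared_error(ys, preds)

print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"baseline MSE={baseline_mse:.3f} linear MSE={linear_mse:.3f}")
```

If a model this small already reaches the accuracy you need, the cost of collecting millions of labels for deep learning may not be justified.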

There is a deep learning institute to learn more. 

The conference was useful and provided a wide range of discussions on high level data topics.

Monday 13 November 2017

A Guide to Complexity of Database Systems

The PhD research I undertook examined the complexity of database systems. A summary of the findings is provided below.

In the turbulent, fast-moving field of database systems, complexity is found everywhere. The volume, variety and velocity of data are continually expanding, as are its accessibility and the realization that businesses have a wealth of untapped data that can be democratised. Changes in technology, shifts in business markets, organisational changes, the knowledge required by operating staff and the numerous stakeholders all add to the complexity. Many of these complexities have been discussed in the Claremont and Beckman Reports (Agrawal et al. 2009; Abadi et al. 2016).

This guide takes a 360-degree view of the situation through a systems thinking lens, providing synthesis across the cross-disciplinary fields. Being able to explain what complexity is shapes our understanding of the situation, and a basic visualisation of this is shared through the use of a graph. The use of graphs as a visual representation is discussed, with the presentation of the graph metrics leading to the CODEX, a blueprint for the management of database systems. The CODEX could enable transformation of the management of database systems so that actionable insight can be achieved.
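For readers new to graph metrics, the sketch below shows two common ones, node degree and graph density, on a tiny made-up graph. The nodes, the edges and the choice of metrics are assumptions for illustration only; they are not taken from the research or the CODEX.

```python
from itertools import chain

# Hypothetical links between parts of a database estate (illustrative only).
edges = [
    ("application", "database"),
    ("database", "storage"),
    ("database", "backup"),
    ("database", "monitoring"),
    ("application", "monitoring"),
]

nodes = sorted(set(chain.from_iterable(edges)))

# Degree: how many links touch each node.
degree = {node: 0 for node in nodes}
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Density: actual links divided by the maximum possible in an undirected graph.
n = len(nodes)
density = 2 * len(edges) / (n * (n - 1))

most_connected = max(degree, key=degree.get)
print(degree, density, most_connected)
```

A highly connected node is one place where complexity concentrates, which is the kind of insight a graph view is meant to surface.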

Friday 10 November 2017

Data Story Telling Booklet and Video

Storytelling is an important facet of working with data. Whilst I was working on my research PhD, I thought it was important to share the main stages of my research. The storyboard is shared as two resources. These are:

A PhD storyboard booklet
A collated storyboard video

For more research articles, see http://sqltoolkit.co.uk/

Tuesday 7 November 2017

Innovative Designs for Innovative Thinking

These Spheres in Seattle will be home to a botanical garden, waterfalls, a river and tree-house-like spaces. They are disruptive, pioneering and futuristic, to let Amazonians break free for innovative thinking. What an amazingly cool idea.

Thursday 2 November 2017

PASS Summit 2017 Day 2 Keynote

I attended the Day 2 Keynote at PASS Summit presented by Rimma Nehme on Globally Distributed Databases Made Simple. This was an amazing presentation. It was presented seamlessly, explaining the technicalities of Cosmos DB and how the globally distributed database works from the ground up.

Rimma raised the question: do we need another database? Databases need to meet the data needs of today and the future. Data is global and interconnected, and the large volumes created every 60 seconds are continually growing. The balance is shifting in the type of data, and we need to have data globally next to users for processing, meaning the architecture needs to be different.

Cosmos DB was originally called Project Florence, named after the place where the Renaissance began. It was built as a cloud database for global distribution, with a fully resource-governed stack and a schema-agnostic service. A single system image is used for all globally distributed resources.

The resource model may have a database account / database that may span clusters and regions. The database is scaled out in terms of containers. It is designed to scale throughput and storage independently. There are two parts to the design: the physical system design and the partitioning system design.

The design is to enable elastically scalable storage, throughput, anywhere, anytime.

Resource governance cannot be an afterthought. The request unit/sec (RU) is the normalized currency.

There are 5 well-defined consistency models in Azure Cosmos DB with clear trade-offs: strong; bounded staleness; session; consistent prefix and eventual.
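One way to picture the trade-off is as an ordered scale, with eventual consistency the weakest and strong the strongest. The sketch below models that ordering in plain Python. The `ConsistencyLevel` class and the `meets` helper are hypothetical names for this example and are not part of any Azure SDK.

```python
from enum import IntEnum


class ConsistencyLevel(IntEnum):
    """The five levels, ordered from weakest (1) to strongest (5)."""
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


def meets(required, offered):
    """True if the offered level is at least as strong as the required one."""
    return offered >= required


# List the levels from strongest to weakest.
strongest_first = sorted(ConsistencyLevel, reverse=True)
for level in strongest_first:
    print(level.name)

print(meets(ConsistencyLevel.SESSION, ConsistencyLevel.STRONG))
```

Moving down the scale generally trades stricter read guarantees for lower latency and higher availability.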

There is native support for multiple data models with more coming in the future.

The talk continued to cover how indexing works in depth, along with the key points to remember about Cosmos DB.

The talk concluded with a great quote “It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is most adaptable to change.”

The slides can be downloaded.

Wednesday 1 November 2017

PASS Summit 2017 Day 1 Keynote

I attended PASS Summit 2017, which was my second year of attendance. I enjoyed the conference enormously. It is enjoyable being immersed in data and being with people who are enthusiastic about the field.

The Day 1 Keynote, "Microsoft for the Modern Data Estate", was presented by Rohan Kumar. Data is driving transformation. Data, Cloud and AI are the three most disruptive trends of our time. The modern data estate enables simplicity and common sense. It takes any data from any source, whether structured or unstructured, large or small. The modern data estate provides a seamless infrastructure across on-premises, private and public cloud, enabling a hybrid setup that hides the dichotomy of these disparate systems. It offers seamless flexibility and a choice of engines.

New features in SQL Server 2017

There are many changes to SQL Server 2017. SQL Server 2017 has industry-leading performance and security, now on Linux and Docker. The key engine changes are:
  • Support for graph data and queries
  • Advanced Machine Learning with R and Python
  • Native T-SQL scoring
  • Adaptive Query Processing and Automatic Plan Correction

SQL Server 2017 will enable deployment in seconds on Linux and Windows containers and has special pricing for SQL Server on Linux and Red Hat Enterprise Linux.

New Features Azure SQL Database

Azure SQL Database offers intelligent DBaaS, privacy and trust, seamless compatibility and competitive TCO. There is seamless migration to the cloud, with the cloud-first approach breeding faster innovation. A list of changes was presented.

Azure Data Factory now provides a managed environment for SQL Server Integration Services (SSIS) packages, making it easy to move SSIS workloads to the cloud.

A new tool was announced: Microsoft SQL Operations Studio, a free, lightweight, modern data operations tool for SQL everywhere.

These are but some of the changes coming to the products.