Welcome

Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein



Saturday, 17 February 2018

SQLBits 2018 The Magic of Data



It is nearly time for the 2018 edition of SQLBits. The four-day conference is at Olympia, London. It is the leading data professional conference in Europe. It offers the opportunity to learn, network, develop and share your data knowledge. There is a great shift in the provision of data management, allowing operational and predictive insight and embedded AI on any platform. This conference offers attendees the chance to experience new paradigms and to access an array of data-related events.
  • 4 days of world class training
  • Over 170 specialist sessions
  • More than 10 hours of best practice sharing with your peers
  • Networking events with the Microsoft Product Group
  • Introductions to software vendors and consultants
  • The infamous SQLBits Party
The conference has gone from strength to strength over the years and it is a privilege to be a conference helper for an 8th year.

Tuesday, 13 February 2018

Introduction to Splunk for operational intelligence


Machine data is everywhere. Machine data is digital information created by computers, mobile phones, embedded systems, sensors and other networked devices. The data types:




Machine data can intersect with and improve human lives. A summary is in this video




Splunk, a software platform, can leverage machine data for data management and analytics. It can be used for

  • Data driven decision making
  • Alerts for network security threats
  • Reports on system failures
  • Analysing and improving functionality

It enables performance analysis, dashboard creation, monitoring, troubleshooting and investigation of the real-time data collected. An Edureka learning video showed the Splunk components and gave an overview of Splunk.

The very useful, free Splunk Fundamentals 1 course is self-paced and covers search, navigation, use of fields, statistics, reports, dashboards and alerts.
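
To make the ideas of fields, statistics, reports and alerts more concrete, here is a minimal sketch in plain Python. It is not Splunk and does not use any Splunk API. The log format, the host names and the alert threshold are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical machine data: "timestamp host level message" lines.
LOG_LINES = [
    "2018-02-13T09:00:01 web01 INFO request served",
    "2018-02-13T09:00:02 web01 ERROR database timeout",
    "2018-02-13T09:00:03 web02 ERROR database timeout",
    "2018-02-13T09:00:04 web01 WARN slow response",
    "2018-02-13T09:00:05 web01 ERROR disk full",
]


def parse_line(line):
    """Split a line into named fields; the message keeps its spaces."""
    timestamp, host, level, message = line.split(" ", 3)
    return {"timestamp": timestamp, "host": host, "level": level, "message": message}


def errors_by_host(lines):
    """Count ERROR events per host (a simple statistic for a report)."""
    events = (parse_line(line) for line in lines)
    return Counter(e["host"] for e in events if e["level"] == "ERROR")


def hosts_to_alert(error_counts, threshold=2):
    """Return hosts whose error count reaches the assumed alert threshold."""
    return sorted(host for host, n in error_counts.items() if n >= threshold)


report = errors_by_host(LOG_LINES)
print(report)
print(hosts_to_alert(report))
```

The same pattern of extracting fields, aggregating and then applying a threshold is roughly what a search, a report and an alert do over machine data at a much larger scale.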

Saturday, 27 January 2018

Data Scientist Skills

There is an avalanche of skills required to become a data scientist. I came across this useful diagram.


The hierarchy of needs for data science could help you be more effective with AI and machine learning.




Wednesday, 17 January 2018

Field Guide to Data Science

I read this really useful guide about data science. The Field Guide to Data Science was created to help organizations of all types and missions understand how to make use of data as a resource.

More details about understanding the DNA of data can be found here.



Wednesday, 20 December 2017

Continual Change and Complexity

This has been an entire year of change for me, and that will continue into the new year. Continual change is the way of the new world. With data and AI being embedded into every realm of technology, we can expect more frequent and smaller changes on a day-to-day basis. I have immensely enjoyed researching and being able to apply that research to understanding the complexity of real-world database problems.

As the holidays approach I wish you all a very Merry Christmas and a Happy New Year.

Friday, 17 November 2017

Big Data LDN 2017

I attended Big Data London on 15-16 November 2017, where leading data and analytics experts showcased tools to help deliver a data-driven strategy. The conference showcased the Fourth Industrial Revolution report, which explains what the UK’s data leaders think about the state of the UK data economy.

The things I found interesting during the two-day event are summarised here.

Machine learning is a very topical discussion point, but it is not that difficult to get started. A good area to look at first is co-occurrence and recommendation. Co-occurrence helps you find behaviours, and you can use those to make recommendations in areas such as textual analysis and intrusion detection.
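
As a rough illustration of co-occurrence and recommendation, here is a minimal sketch in Python. It counts how often pairs of items appear together and recommends the most frequent companions of a given item. The session data and the function names are hypothetical, and this is only one simple way of doing it, not the method presented at the conference.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(sessions):
    """Count how often each pair of items appears in the same session."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Use a set so repeats within one session are counted once.
        for a, b in combinations(sorted(set(session)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, item, top_n=3):
    """Return the items that most often co-occur with `item`."""
    return [other for other, _ in counts[item].most_common(top_n)]


# Hypothetical sessions: each list holds the topics one user looked at.
sessions = [
    ["sql", "indexing", "backup"],
    ["sql", "indexing"],
    ["sql", "backup", "monitoring"],
    ["indexing", "monitoring"],
]

cooccurrence = build_cooccurrence(sessions)
print(recommend(cooccurrence, "sql", top_n=2))
```

The same counting works for words that appear together in text, or for events that appear together in a network log, which is why the idea carries over to textual analysis and intrusion detection.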


Machine learning was described as the integration between analytics and operations. The three questions to ask were: what algorithm, what tools and what process. 90% of machine learning success is in data logistics (being able to handle lots of data types), not learning.

The CDO’s playbook was launched. The Chief Data Officer is a rapidly expanding role and this book offers practical advice on what the role is and how it fits in with other C-suite roles, and provides actionable tips.

There are many challenges when dealing with citizen data. At the heart of audiences are:

- single view of the customer
- deeper engagement
- supported intelligence
- relationship management

The main challenge is data quality: having data of a high enough quality to provide insight.

Citizens want to be data scientists and to be able to dive into the data with ease. This self-service model can have challenges. Better governance, data management and operational efficiency are required, together with the rise of managed services to remove the complexities of running these services.

The keynote on day 2, Machine learning, AI and the future of big data analytics, by Dr Amr Awadallah, co-founder of Cloudera, talked about a history of waves.

- wave 1 automation of knowledge transfer
- wave 2 automation of food
- wave 3 automation of discovery
- wave 4 making and moving stuff (Industrial revolution)
- wave 5 automation of processes (IT revolution)
- wave 6 automation of decisions

We are in wave 6 which is about collecting data and leveraging data to make decisions. It is different from the BI wave where humans made decisions. The new wave is learning how decisions are made and automating them. Things to consider for success are

- build a data driven culture
- develop the right team and skills
- be agile/lean in development
- leverage DevOps for production
- right-size data governance

There were discussions about data narrative and telling a story to the audience. The five steps for better storytelling were:

- identify the right data
- choose the right visualizations
- calibrate visuals to your message
- remove unnecessary noise
- focus attention on what’s important

Matt Aslett talked on pervasive intelligence: the future of big data, machine learning and IoT, the details of which have been published in a report. He discussed trends and implications of the AI automation spectrum. It will bring about fundamental and wide-ranging positive societal implications that will change the way we live, work, play, transact and travel. He mentioned the risk of having a small number of platform-oriented companies that control the forces of production for generating value from data. The 4sight report on the future of IT is coming soon and sounds like an interesting read.

The session Deep learning demystified explained why neural networks, which are not new, have only recently come to the fore. They were originally thought of as a failed experiment, when in fact they had not been given enough data. Deep learning works well for supervised learning with very large data sets. The key things to think about when considering deep learning are:
- do you have enough data? A minimum of 10 million labelled examples is needed
- what level of accuracy do you need?
- can something simple work? Start with classical models such as linear models
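
The three checks above can be sketched as a small decision helper. This is only an illustration of the reasoning: the function name, the return messages and the order of the checks are my assumptions, and the 10 million figure is the one quoted in the talk.

```python
# Minimum number of labelled examples quoted in the talk.
MIN_LABELLED_EXAMPLES = 10_000_000


def suggest_approach(n_labelled, required_accuracy, baseline_accuracy):
    """Apply the three checks, trying the cheapest option first.

    baseline_accuracy is the accuracy already achieved by a simple
    model, such as a linear model, on held-out data.
    """
    if baseline_accuracy >= required_accuracy:
        return "keep the simple model"
    if n_labelled < MIN_LABELLED_EXAMPLES:
        return "collect more labelled data or improve the simple model"
    return "consider deep learning"


# Hypothetical project: 50,000 labels, 90% accuracy needed, linear model gets 92%.
print(suggest_approach(50_000, required_accuracy=0.90, baseline_accuracy=0.92))
```

Starting with the simple model keeps the cost low and gives a baseline to compare against if deep learning is tried later.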

There is a deep learning institute to learn more. 

The conference was useful and provided a wide range of discussions on high level data topics.

Monday, 13 November 2017

A Guide to Complexity of Database Systems


The PhD research I undertook examined the complexity of database systems. A summary of the findings is provided here.

In the turbulent, fast-moving field of database systems, complexity is found everywhere. The volume, variety and velocity of data are continually expanding, as are the accessibility of data and the realisation that businesses have a wealth of untapped data that can be democratised. Changes in technology, shifts in business markets, organisational changes, the knowledge required by operating staff and the numerous stakeholders all add to the complexity. Many of these complexities have been discussed in the Claremont and Beckman Reports (Agrawal et al. 2009; Abadi et al. 2016).

This guide takes a 360 degree view of the situation through a systems thinking lens, providing synthesis between the cross-disciplinary fields. Being able to explain what complexity is shapes our understanding of the situation, and a basic visualisation of this is shared through the use of a graph. The use of graphs as a visual representation is discussed, with the presentation of the graph metrics leading to the CODEX, a blueprint for the management of database systems. The CODEX could enable transformation of the management of database systems so that actionable insight can be achieved.
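
To give a feel for what graph metrics look like, here is a minimal sketch in Python that computes node degree and graph density for a small undirected graph. The graph itself is hypothetical and the choice of these two metrics is my assumption for illustration. It is not the set of metrics used in the research or in the CODEX.

```python
# Hypothetical undirected graph: nodes are parts of a database system,
# edges are dependencies between them. Names are illustrative only.
EDGES = [
    ("application", "database"),
    ("database", "storage"),
    ("database", "backup"),
    ("backup", "storage"),
    ("monitoring", "database"),
]


def degrees(edges):
    """Return the number of edges touching each node."""
    counts = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


def density(edges):
    """Return edges present divided by edges possible.

    Assumes each undirected edge is listed once and there are no self-loops.
    """
    n = len(degrees(edges))
    if n < 2:
        return 0.0
    return 2 * len(edges) / (n * (n - 1))


print(degrees(EDGES))
print(density(EDGES))
```

A node with a high degree, such as the database in this example, is one that many other parts depend on, and a higher density means more interconnection overall. Both are simple signals of where complexity concentrates.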