Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous; the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk. Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Friday 17 November 2017

Big Data LDN 2017

I attended Big Data London on 15-16 Nov 2017, where leading data and analytics experts showcased tools to help deliver a data-driven strategy. The conference also showcased the fourth industrial revolution report, which explains what the UK’s data leaders think about the state of the UK data economy.

Here is a summary of the things I found interesting during the two-day event.

Machine learning is a very topical discussion point, but it is not that difficult to get started. A good first area to look at is co-occurrence and recommendation. Co-occurrence analysis finds items or events that tend to appear together, revealing patterns of behaviour that you can then use to make recommendations in areas such as textual analysis and intrusion detection.
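To make the co-occurrence idea concrete, here is a minimal sketch in Python. It assumes the data arrives as simple "baskets" of items; shopping baskets are used here, but a basket could equally be the words in a document or the events in a log session. The function names and sample data are hypothetical and illustrative only, not taken from the talk.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # De-duplicate and sort so (a, b) and (b, a) are counted as the same pair
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return counts


def recommend(item, counts, top_n=3):
    """Return the items most often seen alongside `item`, strongest first."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]


# Hypothetical example data
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
]
counts = cooccurrence_counts(baskets)
print(recommend("bread", counts))
```

Pair counting like this is the simplest form of co-occurrence analysis; real recommenders usually normalise the counts (for example, by how common each item is overall) so that very popular items do not dominate every recommendation.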

Machine learning was described as the integration of analytics and operations. The three questions to ask were: what algorithm, what tools and what process? 90% of machine learning success lies in data logistics (being able to handle lots of data types), not in the learning itself.

The CDO’s playbook was launched. The Chief Data Officer role is expanding rapidly, and this book offers practical advice on what the role is and how it fits in with other C-suite roles, along with actionable tips.

There are many challenges when dealing with citizen data. At the heart of serving audiences are:

- single view of the customer
- deeper engagement
- supported intelligence
- relationship management

The main challenge is data quality: having data of high enough quality to provide insight.
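One simple way to begin measuring data quality is to check how complete each field is. The sketch below assumes customer records held as Python dictionaries; the field names, sample records and the 80% threshold are all hypothetical choices for illustration, and completeness is only one of several quality dimensions (accuracy, consistency and timeliness matter too).

```python
def completeness(records, fields):
    """Share of records with a non-empty value for each field (1.0 = fully populated)."""
    total = len(records)
    result = {}
    for field in fields:
        # A field counts as filled if it is present and not None or an empty string
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = filled / total if total else 0.0
    return result


# Hypothetical customer records with some gaps
customers = [
    {"id": 1, "email": "a@example.com", "postcode": "AB1 2CD"},
    {"id": 2, "email": "", "postcode": "EF3 4GH"},
    {"id": 3, "email": "c@example.com", "postcode": None},
    {"id": 4, "email": "d@example.com"},
]
scores = completeness(customers, ["id", "email", "postcode"])

# Flag fields below a hypothetical 80% completeness threshold
low_quality = [field for field, score in scores.items() if score < 0.8]
print(scores, low_quality)
```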

Citizens want to be data scientists and to dive into the data with ease. This self-service model brings its own challenges: it requires better governance, data management and operational efficiency, together with managed services that remove the complexity of running these services.

The day 2 keynote, machine learning, AI and the future of big data analytics, by Dr Amr Awadallah, co-founder of Cloudera, described a history of waves:

- wave 1 automation of knowledge transfer
- wave 2 automation of food
- wave 3 automation of discovery
- wave 4 making and moving stuff (Industrial revolution)
- wave 5 automation of processes (IT revolution)
- wave 6 automation of decisions.

We are in wave 6, which is about collecting data and leveraging it to make decisions. It differs from the BI wave, in which humans made the decisions; the new wave is about learning how decisions are made and automating them. Things to consider for success are:

- build a data driven culture
- develop the right team and skills
- be agile/lean in development
- leverage DevOps for production
- right size data governance 

There were discussions about data narrative and telling a story to the audience. The five steps for better storytelling were:

- identify the right data
- choose the right visualizations
- calibrate visuals to your message
- remove unnecessary noise
- focus attention on what’s important

Matt Aslett spoke on pervasive intelligence: the future of big data, machine learning and IoT, the details of which have been published in a report. He discussed trends in, and implications of, the AI automation spectrum, which will bring fundamental, wide-ranging and positive societal implications that will change the way we live, work, play, transact and travel. He also mentioned the risk of a small number of platform-oriented companies controlling the forces of production for generating value from data. The 4sight report on the future of IT is coming soon and sounds an interesting read.

Deep learning demystified explained why neural networks, which are not new, have only just come to the fore. They were originally regarded as a failed experiment, but in fact the problem was that they had not been trained on enough data. For supervised learning, deep learning works well with very large data sets. The key things to consider before choosing deep learning are:

- do you have large data? A minimum of 10 million labels of data was suggested
- what level of accuracy do you need?
- can something simple work? Start with classical models such as linear models
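As a sketch of the "start simple" advice, the Python below fits an ordinary least-squares line to a single feature and measures its error. You can then compare that error with the accuracy you need before reaching for deep learning. The data points are made up for illustration, and a single-feature linear fit stands in for whatever classical baseline suits your problem.

```python
def fit_linear(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the common 1/n factors cancel
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(xs, ys, slope, intercept):
    """Average absolute gap between observed and predicted values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_linear(xs, ys)
error = mean_absolute_error(xs, ys, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} MAE={error:.2f}")
```

If a baseline like this already meets the required accuracy, the cost of gathering millions of labels and training a deep network may not be justified.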

There is a deep learning institute where you can learn more.

The conference was useful and provided a wide range of discussions on high level data topics.
