Keynote
The day 2 keynote, entitled "Big Data, Disruption and the 800 pound gorilla in the corner", was given by Michael Stonebraker, Turing Award winner, IEEE John von Neumann Medal holder, Professor at MIT and co-founder of Tamr.
A few vignettes were mentioned: Hamiltons, Dewitts and Amadeus. Hadoop (meaning MapReduce) started to be used in 2010, and Google stopped using it in 2011. Hadoop now means the HDFS file system. Cloudera's big problem is that no one wants MapReduce; it is not used for anything.
The data warehouse is yesterday's problem. BI is simple SQL. Data science deals with complex problems and needs a different skill set: it is based on deep learning, machine learning and linear algebra, which have nothing to do with SQL. Deep learning is all the rage, but it needs vast amounts of training data. It is also not possible to explain why the black box gives certain recommendations, so it is a poor fit when data provenance is required.
Big velocity is a big problem over time. Pattern matching and CEP (Complex Event Processing) tools like Storm are not competitive. Don't run Oracle; run MongoDB, Cassandra or Redis instead. NoSQL means no standards and no ACID, and ACID is a good idea. With NoSQL you always give something up, as per the CAP theorem. Declarative languages are a great idea.
Data discovery is a big problem. You spend 90% of your time finding and cleaning data, then 10% finding and cleaning the errors. Very little time is spent doing data integration. It is a data integration challenge.
Graphs
Jim Webber from Neo4j gave an insightful talk about how useful graphs are for solving problems and predicting outcomes. There were some great examples of how to use graphs. He talked about triadic closure and strong and weak ties, and mentioned a couple of papers to read:
Effects of organizational support on organizational commitment, Fakhraei M, Imami R, Manuchehri S (2015)
Semi-Supervised Classification with Graph Convolutional Networks, Thomas N. Kipf, Max Welling (2017)
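As an illustration of the triadic closure idea mentioned in the talk, here is a minimal Python sketch (not taken from the session). It assumes a small, made-up friendship graph stored as a dictionary of neighbour sets, and a hypothetical helper named open_triads. It lists pairs of people who share at least one friend but are not yet connected, which are the links triadic closure suggests are most likely to form.

```python
from itertools import combinations


def open_triads(friends):
    """Count common friends for every pair that is not yet directly linked.

    `friends` maps each person to the set of people they are connected to
    (an undirected graph, so every edge appears in both directions).
    """
    counts = {}
    for person, neighbours in friends.items():
        # Every pair of this person's neighbours forms a triad through them.
        for b, c in combinations(sorted(neighbours), 2):
            if c not in friends.get(b, set()):
                # The triad is open: b and c share `person` but have no edge.
                counts[(b, c)] = counts.get((b, c), 0) + 1
    return counts


# Hypothetical toy data, purely for illustration.
toy_graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
}

for pair, shared in sorted(open_triads(toy_graph).items()):
    print(pair, "share", shared, "friend(s)")
```

Pairs with more shared friends would be the stronger candidates for a new tie; a real system would also weigh the strength of the existing ties.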
It is important to have semantic domain knowledge for inference and understanding in graphs, as graphs depend on context. With graph ML and graph convolutional networks, the graph will be the data structure for AI.
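To give a feel for what a graph convolutional network does with a graph, here is a minimal pure-Python sketch of a single layer using the propagation rule from the Kipf and Welling paper listed above: add self-loops to the adjacency matrix, normalise it symmetrically by node degree, mix each node's features with its neighbours', apply a weight matrix, then a ReLU. The function names, the tiny graph and the numbers are assumptions made for illustration, and a real model would use a numerical library and learned weights.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adjacency, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 . H . W)."""
    n = len(adjacency)
    # Add self-loops so each node keeps part of its own features.
    a_tilde = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    degree = [sum(row) for row in a_tilde]
    # Symmetric normalisation by the square roots of the node degrees.
    a_hat = [[a_tilde[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
             for i in range(n)]
    hidden = matmul(matmul(a_hat, features), weights)
    # ReLU non-linearity.
    return [[max(0.0, value) for value in row] for row in hidden]


# Hypothetical two-node graph joined by one edge, one feature per node.
adjacency = [[0, 1], [1, 0]]
features = [[1.0], [3.0]]
weights = [[1.0]]  # identity weight, so only the neighbour averaging shows

print(gcn_layer(adjacency, features, weights))
```

With an identity weight the layer simply averages each node with its neighbour, which is the sense in which the graph structure itself shapes what the model learns.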
The Joy of Data
The closing session of the event was delivered by Dr Hannah Fry, Associate Professor in the Mathematics of Cities at UCL. This was an amazing session exploring the visualizations and insights that can come from understanding data.
She started the talk with the strange Wikipedia phenomenon that all routes lead to Philosophy: if you click the first proper link on a page, and keep doing the same on each page you land on, you will eventually end up on the Philosophy page.
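The first-link walk is easy to picture as code. The sketch below is an illustration only: it does not call Wikipedia, and the page names and links in the toy dictionary are made up. The hypothetical follow_first_links function keeps following the first link until it reaches the target page, hits a page with no link, or falls into a loop.

```python
def follow_first_links(first_link, start, target="Philosophy", max_hops=100):
    """Follow first links from `start`; return (path, reached_target).

    `first_link` maps a page title to the title of the first proper link
    on that page. Pages missing from the mapping are treated as dead ends.
    """
    path = [start]
    seen = {start}
    while path[-1] != target and len(path) <= max_hops:
        next_page = first_link.get(path[-1])
        if next_page is None or next_page in seen:
            # Dead end or loop: the walk never reaches the target.
            return path, False
        path.append(next_page)
        seen.add(next_page)
    return path, path[-1] == target


# Made-up links for illustration; these are not real Wikipedia data.
toy_links = {
    "Bicycle": "Vehicle",
    "Vehicle": "Machine",
    "Machine": "Physics",
    "Physics": "Science",
    "Science": "Knowledge",
    "Knowledge": "Philosophy",
}

path, reached = follow_first_links(toy_links, "Bicycle")
print(" -> ".join(path), "| reached Philosophy:", reached)
```

The loop check matters because some chains of first links can cycle back on themselves rather than reaching the target.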
There are 2 parallel universes: the one where people click the link, and the mathematical universe. Data is the bridge between them.
She showed how data could be used to investigate why the bicycle transport scheme in London was seeing all the bikes end up in the wrong place. Vans have to go round moving bikes back to the right places during the day. This was the result of people liking to cycle down hills but not up them.
Another example showed that Islington station was a bottleneck, causing a cascading problem because of the lack of transport routes from there. There were many other interesting examples, including how gossip can pay off when network science is used to track a problem down.
Big Data LDN had some amazing sessions and insightful content. Big Data LDN will be back next year on 13-14 Nov 2019.