Welcome

Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk. Microsoft Data Platform MVP.

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein



Wednesday 30 October 2019

Two Great Conferences


Next week will be a really exciting week, with two great conferences.

Microsoft Ignite
November 4–8, 2019 | Orlando, FL

Learn innovative ways to build solutions and migrate and manage your infrastructure. Connect with over 25,000 individuals focused on software development, security, architecture, and IT. Explore new hands-on experiences that will help you innovate in areas such as security, cloud, and hybrid infrastructure and development.



PASS Summit
November 5–8, 2019 | Seattle, WA

The summit has three tracks: architecture, data management and analytics. There is interactive training on the latest technologies, and spotlights on hot topics such as security, cloud, and AI will be led by the best data minds in technology.






Tuesday 29 October 2019

Introducing SQL Server 2019

SQL Server 2019 is the latest version of SQL Server. It redefines SQL Server from a traditional relational database system to a data platform for every data scenario, from OLTP to data warehousing and now big data and analytics.

There is a great video on Channel 9 which gives a quick overview of all the new things in SQL Server 2019.



Sunday 20 October 2019

Cloud Migration Strategy


I had to share this article because of the picture. It looks at the merits of each migration path to help you plan your journey. The options discussed are:
  • Lift and Shift
  • Evolve - between lift and shift and a full rebuild
  • Go Native - Cloud-native apps are designed for the cloud, so they assume that the infrastructure they run on is inherently unreliable, but also controllable. They should be largely self-aware, self-scaling and self-healing. Most of the functionality is simply consumed rather than written into the code.



Friday 18 October 2019

European Spark and AI summit 2019




The keynotes for the European Spark and AI summit 2019, which took place on 16 October 2019 in Amsterdam, are online to watch if you missed the event. A few of the topics are:



  • Unified Data Analytics: Helping Data Teams Solve the World’s Toughest Problems
  • New Developments in the Open Source Ecosystem
  • Saving Energy in Homes with a Unified Approach to Data and AI
  • Simplifying Model Management with MLflow
  • Scalable AI for Good
  • Forecasting 'What-if' Scenarios in Retail Using ML-Powered Interactive Tools
  • Democratizing Machine Learning: Perspective from a scikit-learn Creator
  • Imaging the Unseen: Taking the First Picture of a Black Hole


Some Announcements

Databricks Simplifies Machine Learning Model Management At Scale With MLflow Model Registry
Databricks brings its Delta Lake project to the Linux Foundation

A design pattern is emerging called 'Lake House'. This is a pattern where Spark is not just replacing Hadoop and ETL, but is also associated with data warehouses, business intelligence and reporting.

Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
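
As a rough illustration of how an ordered commit log can make changes to a set of data files atomic, here is a toy Python sketch. It is not the Delta Lake API and does not use Spark. The function names `commit` and `snapshot`, the `add`/`remove` actions and the file names are all hypothetical, and the only idea borrowed from Delta Lake is the general notion of a transaction log made of numbered commits.

```python
import json
import os
import tempfile


def commit(log_dir, version, actions):
    """Try to write commit number `version`; return False if it already exists."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # Mode "x" fails if the file already exists, so only one writer
        # can claim a given version number (a toy form of optimistic concurrency).
        with open(path, "x") as f:
            json.dump(actions, f)
    except FileExistsError:
        return False
    return True


def snapshot(log_dir):
    """Replay every commit in order and return the set of live data files."""
    live = set()
    for name in sorted(os.listdir(log_dir)):
        with open(os.path.join(log_dir, name)) as f:
            for action in json.load(f):
                if action["op"] == "add":
                    live.add(action["file"])
                elif action["op"] == "remove":
                    live.discard(action["file"])
    return live


# Hypothetical usage: two successful commits and one conflicting writer.
log_dir = tempfile.mkdtemp()
first = commit(log_dir, 0, [{"op": "add", "file": "part-0.parquet"}])
second = commit(log_dir, 1, [{"op": "remove", "file": "part-0.parquet"},
                             {"op": "add", "file": "part-1.parquet"}])
conflict = commit(log_dir, 1, [{"op": "add", "file": "part-2.parquet"}])
table_files = snapshot(log_dir)
```

Readers only ever see whole commits, so the swap of `part-0.parquet` for `part-1.parquet` appears as a single step, and the conflicting writer has to retry against a newer version rather than silently overwriting it.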



Wednesday 16 October 2019

After Data Relay


Data Relay has finished for the 2019 season. This year I was involved as Head of Marketing and was the Bristol event owner again. I enjoyed being a part of the Data Relay team. It is great to be able to put on an event with free training across the breadth of the Microsoft Data Platform.  

I also presented on “The big data and database management paradigm shift”. The abstract:

The emergence of big data has created a new paradigm for data and database management. Structured and unstructured data must now work together to produce actionable insights. This session will share details on how to embrace this new era of database management and prepare you for SQL Server 2019.

The slides summary 

References



Wednesday 9 October 2019

Apache Spark Under the Hood Free ebook

Databricks has created an ebook to share excerpts from the book Spark: The Definitive Guide.
The ebook covers:
  • The past, present, and future of Apache Spark.
  • Basic steps to install and run Spark yourself.
  • A summary of Spark's core architecture and concepts.
  • Spark's powerful language APIs and how you can use them.

Monday 7 October 2019

Data Relay starts today

The week has finally arrived and it is Data Relay 2019. The agendas are here:

  • #Newcastle Mon 7: https://bit.ly/2mktU73
  • #Leeds Tues 8: https://bit.ly/2kP99jh
  • #Nottingham Wed 9: https://bit.ly/2kkct5P
  • #Birmingham Thurs 10: https://bit.ly/2lRh6F2
  • #Bristol Fri 11: https://bit.ly/2kkcwi1

The sessions are covering a raft of exciting topics.


Friday 4 October 2019

Two New Datasets to Improve Natural Language Understanding Models

Google’s PAWS data set helps AI models capture word order and structure. Yuan Zhang, Research Scientist, and Yinfei Yang, Software Engineer, Google Research, posted:

Word order and syntactic structure have a large impact on sentence meaning — even small perturbations in word order can completely change interpretation. For example, consider the following related sentences:

1. Flights from New York to Florida.
2. Flights to Florida from New York.
3. Flights from Florida to New York.

All three have the same set of words. However, 1 and 2 have the same meaning (known as paraphrase pairs), while 1 and 3 have very different meanings (known as non-paraphrase pairs). The task of identifying whether pairs are paraphrases or not is called paraphrase identification, and this task is important to many real-world natural language understanding (NLU) applications such as question answering.
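
To see why word order matters here, a small Python sketch can compare the three flight sentences. This is only an illustration, not anything from the PAWS data set or Google's models. The helper names `bag_of_words` and `route` are hypothetical, and `route` is a deliberately naive parser that only understands the "from X to Y" wording used in these examples.

```python
from collections import Counter

sentences = [
    "Flights from New York to Florida.",
    "Flights to Florida from New York.",
    "Flights from Florida to New York.",
]


def bag_of_words(sentence):
    """Count words while ignoring their order (and the final full stop)."""
    return Counter(sentence.lower().rstrip(".").split())


def route(sentence):
    """Return (origin, destination) by tracking which keyword each word follows."""
    slots = {"from": [], "to": []}
    current = None
    for word in sentence.rstrip(".").split():
        if word in slots:
            current = word          # switch to filling the origin or destination
        elif current is not None:
            slots[current].append(word)
    return " ".join(slots["from"]), " ".join(slots["to"])


bags = [bag_of_words(s) for s in sentences]
all_same_bag = bags[0] == bags[1] == bags[2]   # order-blind view: identical
routes = [route(s) for s in sentences]         # order-aware view: 3 differs
```

A model that only sees the bag of words cannot tell the three sentences apart, while even this crude order-aware view groups sentences 1 and 2 together and separates sentence 3.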