Dr Victoria Holt: life, the universe and everything: 2019

Microsoft Learn offers hands-on learning for different products, and Azure HDInsight is covered by one such course. The course, entitled Building Open Source Software (OSS) Analytical Solutions with Azure HDInsight, has the following modules:

Introduction to the Open source Analytics Offering
Choose the correct HDInsight Configuration to build open source analytical solutions
Creating and configuring a HDInsight cluster
Perform Zero ETL analytics with HDInsight Interactive Query

The site has a wealth of information worth exploring, including learning paths, certifications and docs. It is great to be able to learn at your own pace.

Saturday, 16 November 2019

The Tree of Learning is now fully grown

The Tree of Learning is now fully grown and installed in the Betty Boothroyd Library on the Walton Hall campus in Milton Keynes. It reflects on the last 50 years of life-changing education. As an alumna, I supported this great cause, which helps give potential students the opportunity to access higher education through the 50th Anniversary Scholarships Fund. The Fund aims to offer free University education to 50 carers. Each donor could choose to be recognized and have a shield personalized with their name and a year that is significant to their OU journey. Over 7,200 people donated and chose to have a shield, as I did. The Tree is 22 feet (6.7 m) tall. It is organised with the earliest dates, the 1970s, at the bottom, moving through the decades to the most recent years at the top.

Tuesday, 12 November 2019

The Fourth Industrial Revolution Report 2019

The Fourth Industrial Revolution Report 2019 is independent research, commissioned by Big Data LDN and sponsored by Cloudera, that surveyed 500 of the UK’s most influential data leaders.

Key Fourth Industrial Revolution Report 2019 findings include:

The human barriers to data-driven culture - The majority (86%) of UK data experts feel a lack of enthusiasm and support prevents creating a data-driven culture.
Lack of data visualisation holds UK organisations back - 45% of UK data leaders see the lack of data visualisation skills as the biggest barrier to business requirements.
Data leaders investing in humans to cross the skills chasm - Over half of UK organisations bridge the business (59%) and technology (57%) skills gap by upskilling employees.
AI and ML slow down - Despite AI and ML being considered as the technology of the moment, this year’s respondents have cited a 21% decrease in its usage.
GDPR still top concern - GDPR is still the dominating regulation for many (46%) UK data experts - its power over UK organisations’ data governance programs grows stronger each year.

Download the report here

Friday, 8 November 2019

Big Data LDN 2019

Register for free for the UK's largest data & analytics conference and exhibition: https://bigdataldn.com/register

Plan your two days at Big Data LDN. Create your own personalised itinerary, search speaker & exhibitor profiles and network with other delegates. https://guidebook.com/g/bigdataldn19/

Wednesday, 6 November 2019

Book of News Ignite 2019

This is the second edition of the Microsoft Ignite Book of News. It is a guide to all the announcements that Microsoft is making. The book makes it easy to navigate through all the latest information and the vast number of exciting new products and features.

Tuesday, 5 November 2019

Microsoft MVP Wall 2019

Incredibly honored to have my name listed on the MVP wall at Microsoft Ignite.

Kubernetes Learning Path v2.0

A guide for anyone interested in learning more about Kubernetes in just 50 days

Monday, 4 November 2019

Day 1 Microsoft Ignite 2019

Day 1 of Microsoft Ignite brought many exciting announcements. Three of them are:

SQL Server 2019 is now generally available.

SQL Server 2019 brings enhancements to the core SQL engine and offers a scale-up and scale-out system with built-in support for Big Data (Apache Spark, Data Lake), state-of-the-art data virtualization technology and built-in machine learning capabilities.
Get the free e-book on SQL Server 2019.

Simply unmatched, truly limitless: Azure Synapse Analytics

Azure Synapse Analytics is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data using either serverless on-demand or provisioned resources at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs.

Azure services run anywhere with new hybrid capabilities: Azure Arc

Azure Arc is a set of technologies that unlocks new hybrid scenarios for customers by bringing Azure services and management to any infrastructure. Azure Arc is available in preview.

HDFS tiering in SQL Server Big Data Clusters

SQL Server Big Data Clusters have their own built-in local HDFS data lake for storing unstructured and high-volume data. This data virtualization capability includes a feature called HDFS tiering, which is a major new contribution to the Apache Hadoop project.

With HDFS tiering you can access other data lakes by mounting a remote HDFS- or S3-compatible data source in your local HDFS data lake. Access is then seamlessly available from SQL Server or Apache Spark. Currently you can mount the following storage: Azure Data Lake Storage Gen2, AWS S3, Isilon, StorageGRID and FlashBlade.

Wednesday, 30 October 2019

Two Great Conferences

Next week will be a really exciting week with two great conferences:

Microsoft Ignite

November 4–8, 2019 | Orlando, FL

Learn innovative ways to build solutions and migrate and manage your infrastructure. Connect with over 25,000 individuals focused on software development, security, architecture, and IT. Explore new hands-on experiences that will help you innovate in areas such as security, cloud, and hybrid infrastructure and development.

PASS Summit
November 5–8, 2019 | Seattle, WA

The summit has three tracks: architecture, data management and analytics. There is interactive training on the latest technologies, and spotlights on hot topics such as security, cloud and AI are led by the best data minds in technology.

Tuesday, 29 October 2019

Introducing SQL Server 2019

SQL Server 2019 is the latest version of SQL Server. It redefines SQL Server from a traditional relational database system to a data platform for every data scenario, from OLTP to data warehousing and now big data and analytics.

There is a great video on Channel 9 which gives a quick overview of all the new things in SQL Server 2019.

Sunday, 20 October 2019

Cloud Migration Strategy

I had to share this article because of the picture. This article looks at the merits of each migratory path to plan your journey. The options discussed are:

Lift and Shift
Evolve - between lift and shift and a full rebuild
Go Native - Cloud-native apps are designed for the cloud, so they assume that the infrastructure they run on is inherently unreliable, but also controllable. They should be largely self-aware, self-scaling and self-healing. Most of the functionality is simply consumed rather than written into the code.
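
As a rough illustration of the self-healing idea above, here is a minimal Python sketch of retrying a call to an unreliable dependency with exponential backoff. It is my own illustration rather than anything from the article: `call_with_retries` and `FlakyService` are hypothetical names, and the failure is simulated.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.0, sleep=time.sleep):
    """Call operation(), retrying on any exception with exponential backoff.

    base_delay defaults to 0 so the sketch runs instantly; a real service
    would use a positive delay (an assumption, not a recommendation).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as error:  # treat every failure as transient in this sketch
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error


class FlakyService:
    """Hypothetical dependency that fails a fixed number of times, then succeeds."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("simulated transient failure")
        return "ok"


service = FlakyService(failures_before_success=2)
result = call_with_retries(service.fetch, attempts=3)
```

The caller never sees the two simulated failures. Whether retrying is the right response depends on the operation being safe to repeat.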

Friday, 18 October 2019

European Spark and AI summit 2019

The keynotes for the European Spark and AI summit 2019, which took place on 16 October 2019 in Amsterdam, are online to watch if you missed the event. A few of the topics are:

Day 1 videos

Unified Data Analytics: Helping Data Teams Solve the World’s Toughest Problems

New Developments in the Open Source Ecosystem

Saving Energy in Homes with a Unified Approach to Data and AI

Day 2 videos

Simplifying Model Management with MLflow

Scalable AI for Good

Forecasting 'What-if' Scenarios in Retail Using ML-Powered Interactive Tools

Democratizing Machine Learning: Perspective from a scikit-learn Creator

Imaging the Unseen: Taking the First Picture of a Black Hole

Some Announcements

Databricks Simplifies Machine Learning Model Management At Scale With MLflow Model Registry
Databricks brings its Delta Lake project to the Linux Foundation

A design pattern called 'Lake House' is emerging. In this pattern Spark is not just replacing Hadoop and ETL, but is also associated with data warehouses, business intelligence and reporting.

Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.

Wednesday, 16 October 2019

After Data Relay

Data Relay has finished for the 2019 season. This year I was involved as Head of Marketing and was the Bristol event owner again. I enjoyed being a part of the Data Relay team. It is great to be able to put on an event with free training across the breadth of the Microsoft Data Platform.

I also presented on “The big data and database management paradigm shift”. The abstract:

The emergence of big data has created a new paradigm for data and database management. Structured and unstructured data must now work together to produce actionable insights. This session will share details on how to embrace this new era of database management, to prepare you for SQL Server 2019.

The slides summary

References

SQL Server on virtual machines

Azure SQL Database

Azure SQL Database managed instance

SQL Data Warehouse

Azure Cosmos DB

https://www.ibmbigdatahub.com/infographic/four-vs-big-data

Apache Hadoop

https://hadoop.apache.org/ozone/

https://hadoop.apache.org/submarine/

https://hadoop.apache.org/docs/r3.0.3/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html

https://dama.org/content/what-data-governance

https://azure.microsoft.com/en-us/services/open-datasets/

https://github.com/cvdfoundation/google-landmark

https://data.gov.uk/

Microsoft Maximising AI Opportunity Report

Accelerating competitive advantage with AI

SQL Server 2019

SQL Server 2019 Ground to Cloud course

ODI The data ethics canvas

Wednesday, 9 October 2019

Apache Spark Under the Hood Free ebook

Databricks has created an ebook to share excerpts from the book, Spark: The Definitive Guide.

The ebook covers:

The past, present, and future of Apache Spark.
Basic steps to install and run Spark yourself.
A summary of Spark's core architecture and concepts.
Spark's powerful language APIs and how you can use them.

Monday, 7 October 2019

Data Relay starts today

The week has finally arrived and it is Data Relay 2019. The agendas are here:

#Newcastle Mon 7 https://bit.ly/2mktU73
#Leeds Tues 8 https://bit.ly/2kP99jh
#Nottingham Wed 9 https://bit.ly/2kkct5P
#Birmingham Thurs 10 https://bit.ly/2lRh6F2
#Bristol Fri 11 https://bit.ly/2kkcwi1

The sessions are covering a raft of exciting topics.

Friday, 4 October 2019

Two New Datasets to Improve Natural Language Understanding Models

Google’s PAWS dataset helps AI models capture word order and structure. Yuan Zhang, Research Scientist, and Yinfei Yang, Software Engineer, Google Research, posted:

Word order and syntactic structure have a large impact on sentence meaning — even small perturbations in word order can completely change interpretation. For example, consider the following related sentences:

1. Flights from New York to Florida.
2. Flights to Florida from New York.
3. Flights from Florida to New York.

All three have the same set of words. However, 1 and 2 have the same meaning — known as paraphrase pairs — while 1 and 3 have very different meanings — known as non-paraphrase pairs. The task of identifying whether pairs are paraphrase or not is called paraphrase identification, and this task is important to many real-world natural language understanding (NLU) applications such as question answering Read on
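
To make the point concrete, here is a small Python sketch of my own (it is not from the Google post or the PAWS dataset). It shows that a bag-of-words view cannot tell the three sentences apart, while a toy order-aware check can. The helper names `tokens`, `bag_of_words` and `route` are hypothetical, and `route` only understands the "from X to Y" pattern used in these examples.

```python
from collections import Counter
import string


def tokens(sentence):
    """Lowercase the sentence, strip punctuation and split on whitespace."""
    cleaned = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def bag_of_words(sentence):
    """Word counts only; all word-order information is discarded."""
    return Counter(tokens(sentence))


def phrase_after(words, marker, stop):
    """Collect the words following `marker` until `stop` or the end of the sentence."""
    start = words.index(marker) + 1
    phrase = []
    for word in words[start:]:
        if word == stop:
            break
        phrase.append(word)
    return " ".join(phrase)


def route(sentence):
    """Toy order-aware reading: return (origin, destination)."""
    words = tokens(sentence)
    return phrase_after(words, "from", "to"), phrase_after(words, "to", "from")


sentences = [
    "Flights from New York to Florida.",
    "Flights to Florida from New York.",
    "Flights from Florida to New York.",
]

bags = [bag_of_words(s) for s in sentences]
same_bag = bags[0] == bags[1] == bags[2]   # word counts are identical for all three
routes = [route(s) for s in sentences]     # order-aware view separates sentence 3
```

Here `same_bag` is true, yet `routes` agrees for the first two sentences and differs for the third, which is exactly the distinction a paraphrase identification model has to learn.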

Monday, 30 September 2019

Accelerating Competitive advantage with AI

Microsoft have released a new report, Accelerating Competitive advantage with AI. The report highlights that the UK is well placed to succeed. Download the report at https://aka.ms/AcceleratingAI

The earlier report, also worth reading, was entitled Maximizing the AI opportunity: How to harness the potential of AI effectively and ethically. Download that report at https://aka.ms/UKAIreport

The anatomy of an AI enabled organisation is explained in the diagram below.

Thursday, 19 September 2019

Data Relay 2019

Data Relay is only a few weeks away. The agendas for the events are:

The Newcastle agenda https://bit.ly/2mktU73 Register now for the event Mon 7 http://bit.ly/datarelaynew

The Leeds agenda https://bit.ly/2kP99jh Register now for the event Tues 8 http://bit.ly/datarelayleeds

The Nottingham agenda https://bit.ly/2kkct5P Register now for the event Wed 9 http://bit.ly/datarelaynott

The Birmingham agenda https://bit.ly/2lRh6F2 Register now for the event Thurs 10 http://bit.ly/datarelaybham

The Bristol agenda https://bit.ly/2kkcwi1 Register now for the event Fri 11 http://bit.ly/datarelaybsl

Saturday, 7 September 2019

The big data and database management paradigm shift

I am really pleased to be speaking at Data Relay this year in Bristol on Friday 11 October. The agenda is here. The abstract for my session is below.

Thursday, 5 September 2019

The Mirage and Metamorphosis of Data and AI

I have just taken a break to rejuvenate my creative juices. It was a time to reflect and innovate. We are often so busy in our day-to-day lives that we don't stop and reflect. I spent my time reading and catching up on bleeding-edge technology. I am always fascinated to see what is coming next, what problems researchers are trying to address and how Data and AI could be utilised to benefit industry and the world around us.

The role I enjoy the most is as a Data and AI philosopher providing thought leadership. We are at an exciting time in history to witness and contribute to the mirage and metamorphosis of Data and AI. My explorations find exciting challenges in diversity, with Data and AI at the centre of most things we want to achieve. Research is increasingly needed in industry to achieve business success because of the growing complexity within industry and the world around us. For certain tasks we need to move away from agile and use systems thinking techniques so that complexity can be understood. My finding on the future mirage and metamorphosis of Data and AI is that the world around Data and AI is complex and interconnected, and mastering that complexity is the key to success.

References
The data and AI market landscape 2019: The next wave of hybrid emerges
https://www.zdnet.com/article/the-data-and-ai-market-landscape-2019-the-next-wave-of-hybrid-emerges/
Part I: A Turbulent Year: The 2019 Data & AI Landscape
https://mattturck.com/data2019/
Part II: Major Trends in the 2019 Data & AI Landscape
https://mattturck.com/2019trends/
Navigating AI hype in search of success, Oliver Pickup (Sunday Times 12 May 2019)
The real big-data problem and why only machine learning can fix it
https://siliconangle.com/2019/08/09/real-big-data-problem-machine-learning-can-fix-mitcdoiq-startupoftheweek/
Big Data is just Data
https://buckwoody.wordpress.com/2019/08/26/big-data-is-just-data/
Maximising the AI opportunity
https://info.microsoft.com/rs/157-GQE-382/images/UK-DIGTRNS-CNTNT-content-MGC0003240.pdf
The Data Ethics Framework principles
https://www.gov.uk/government/publications/data-ethics-framework/data-ethics-framework

Thursday, 22 August 2019

Microsoft ML for Apache Spark

Microsoft Research has announced a new version of Microsoft ML for Apache Spark, an open-source and distributed ML and microservice library. v0.18 brings Vowpal Wabbit on Spark, Speech to Text & more!

Microsoft Machine Learning for Apache Spark (MMLSpark) is an ecosystem of enhancements that expand the Apache Spark distributed computing library to tackle problems in Deep Learning. It enables sending streaming data to Power BI.
Website: http://aka.ms/spark Paper: http://aka.ms/spark-paper
