Welcome

Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous; the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk. Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein



Wednesday, 6 November 2019

Book of News Ignite 2019

This is the second edition of the Microsoft Ignite Book of News. It is a guide to all the announcements that Microsoft are making. The book makes it easy to navigate through all the latest information and the vast number of exciting new products and features.


Monday, 4 November 2019

Day 1 Microsoft Ignite 2019

Day 1 of Microsoft Ignite brought many exciting announcements. Three of them are:

SQL Server 2019 is now generally available.
SQL Server 2019 brings enhancements in the core SQL engine and offers a scale-up and scale-out system with built-in support for Big Data (Apache Spark, Data Lake), state-of-the-art data virtualization technology and built-in machine learning capabilities.
Get the free e-book on SQL Server 2019.

Simply unmatched, truly limitless: Azure Synapse Analytics


Azure Synapse Analytics is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data using either serverless on-demand or provisioned resources at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage and serve data for immediate business intelligence and machine learning needs.

Azure services run anywhere with new hybrid capabilities: Azure Arc

Azure Arc is a set of technologies that unlocks new hybrid scenarios for customers by bringing Azure services and management to any infrastructure. Azure Arc is available in preview.

HDFS tiering in SQL Server Big Data Clusters


SQL Server Big Data Clusters have their own built-in local HDFS data lake for storing unstructured and high-volume data. Their data virtualization capabilities include a feature called HDFS tiering, which is a major new contribution to the Apache Hadoop project.

With HDFS tiering you can access other data lakes by mounting a remote HDFS- or S3-compatible data source into your local HDFS data lake. The mounted data is then seamlessly accessible from SQL Server or Apache Spark. Currently you can mount the following storage: Azure Data Lake Storage Gen2, AWS S3, Isilon, StorageGRID and FlashBlade.
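Conceptually, a mount behaves like an entry in a mount table: any path under the mount point is redirected to the remote store, while everything else stays in local HDFS. The Python sketch below models only that path-resolution idea. `MountTable`, the mount paths and the bucket URI are hypothetical, and this is not how SQL Server Big Data Clusters implements HDFS tiering or how mounts are configured there.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mount:
    """Hypothetical mount entry: a local HDFS directory backed by a remote store."""

    local_path: str  # mount point in the local HDFS namespace
    remote_uri: str  # remote HDFS/S3-compatible location it maps to


class MountTable:
    """Toy mount table that resolves local HDFS paths to remote URIs."""

    def __init__(self) -> None:
        self._mounts: list[Mount] = []

    def mount(self, local_path: str, remote_uri: str) -> None:
        # Strip trailing slashes so "/mounts/sales/" and "/mounts/sales" are the same mount.
        self._mounts.append(Mount(local_path.rstrip("/"), remote_uri.rstrip("/")))

    def resolve(self, path: str) -> Optional[str]:
        """Return the remote URI backing `path`, or None if it is ordinary local HDFS data."""
        best: Optional[Mount] = None
        for m in self._mounts:
            # Only whole path components match: "/mounts/salesforce" is not under "/mounts/sales".
            if path == m.local_path or path.startswith(m.local_path + "/"):
                # Prefer the longest (most specific) mount point.
                if best is None or len(m.local_path) > len(best.local_path):
                    best = m
        if best is None:
            return None
        return best.remote_uri + path[len(best.local_path):]


table = MountTable()
table.mount("/mounts/sales", "s3a://example-bucket/sales")
print(table.resolve("/mounts/sales/2019/q3.csv"))  # s3a://example-bucket/sales/2019/q3.csv
print(table.resolve("/user/local/file.csv"))        # None
```

The point of the design is that SQL Server or Spark only ever sees local paths; the redirection to the remote store happens underneath them.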



Wednesday, 30 October 2019

Two Great Conferences


Next week will be a really exciting week with two great conferences:

Microsoft Ignite
November 4–8, 2019 | Orlando, FL

Learn innovative ways to build solutions and migrate and manage your infrastructure. Connect with over 25,000 individuals focused on software development, security, architecture, and IT. Explore new hands-on experiences that will help you innovate in areas such as security, cloud, and hybrid infrastructure and development.



PASS Summit
November 5–8, 2019 | Seattle, WA

The summit has three tracks: architecture, data management and analytics. Interactive training on the latest technologies and spotlights on hot topics such as security, cloud and AI will be led by the best data minds in technology.






Tuesday, 29 October 2019

Introducing SQL Server 2019

SQL Server 2019 is the latest version of SQL Server. It redefines SQL Server from a traditional relational database system into a data platform for every data scenario, from OLTP and data warehousing (DW) to, now, big data and analytics.

There is a great video on Channel 9 which gives a quick overview of all the new things in SQL Server 2019.



Sunday, 20 October 2019

Cloud Migration Strategy

I had to share this article because of the picture. The article looks at the merits of each migration path to help you plan your journey. The options discussed are:
  • Lift and Shift
  • Evolve - between lift and shift and a full rebuild
  • Go Native - Cloud-native apps are designed for the cloud, so assume that the infrastructure they run on is inherently unreliable, but also controllable. They should be largely self-aware, self-scaling and self-healing. Most of the functionality is simply consumed rather than written into the code.



Friday, 18 October 2019

European Spark and AI summit 2019




The keynotes from the European Spark and AI Summit 2019, which took place on 16 October 2019 in Amsterdam, are online to watch if you missed the event. A few of the topics are:



Unified Data Analytics: Helping Data Teams Solve the World’s Toughest Problems
New Developments in the Open Source Ecosystem
Saving Energy in Homes with a Unified Approach to Data and AI
Simplifying Model Management with MLflow
Scalable AI for Good
Forecasting 'What-if' Scenarios in Retail Using ML-Powered Interactive Tools
Democratizing Machine Learning: Perspective from a scikit-learn Creator
Imaging the Unseen: Taking the First Picture of a Black Hole


Some Announcements

Databricks Simplifies Machine Learning Model Management At Scale With MLflow Model Registry
Databricks brings its Delta Lake project to the Linux Foundation

A design pattern called the 'Lake House' is emerging. In this pattern Spark is not just replacing Hadoop and ETL but is also being associated with data warehouses, business intelligence and reporting.

Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
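Delta Lake gets its ACID guarantees from a transaction log (the `_delta_log` directory of ordered commit files) that readers replay to find the current set of data files. The Python snippet below is a toy, in-memory model of that log-replay idea with optimistic concurrency. `ToyTransactionLog`, `commit`, `snapshot` and the file names are hypothetical; this is not Delta Lake's file format or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


class ConcurrentCommitError(Exception):
    """Raised when another writer has already committed the requested version."""


@dataclass
class ToyTransactionLog:
    """Toy append-only log: version number -> list of ("add"/"remove", file) actions."""

    commits: dict[int, list[tuple[str, str]]] = field(default_factory=dict)

    def latest_version(self) -> int:
        # -1 means the table has no commits yet.
        return max(self.commits, default=-1)

    def commit(self, read_version: int, actions: list[tuple[str, str]]) -> int:
        """Record all `actions` as version read_version + 1, or nothing at all.

        Optimistic concurrency: if another writer already claimed that version,
        the commit is rejected instead of overwriting it, and the caller must
        re-read the table and retry.
        """
        new_version = read_version + 1
        if new_version in self.commits:
            raise ConcurrentCommitError(f"version {new_version} already exists")
        self.commits[new_version] = list(actions)
        return new_version

    def snapshot(self, version: Optional[int] = None) -> set[str]:
        """Replay commits 0..version to get the set of live data files."""
        if version is None:
            version = self.latest_version()
        files: set[str] = set()
        for v in sorted(self.commits):
            if v > version:
                break
            for action, path in self.commits[v]:
                if action == "add":
                    files.add(path)
                elif action == "remove":
                    files.discard(path)
        return files


log = ToyTransactionLog()
v0 = log.commit(-1, [("add", "part-000.parquet"), ("add", "part-001.parquet")])
v1 = log.commit(v0, [("remove", "part-000.parquet"), ("add", "part-002.parquet")])
print(sorted(log.snapshot()))    # ['part-001.parquet', 'part-002.parquet']
print(sorted(log.snapshot(v0)))  # ['part-000.parquet', 'part-001.parquet']

# A writer that read version 0 and tries to commit after v1 exists is rejected.
try:
    log.commit(v0, [("add", "stale.parquet")])
except ConcurrentCommitError as err:
    print("retry needed:", err)  # retry needed: version 1 already exists
```

Because readers only ever see whole committed versions, a batch job and a streaming job can write to the same table without readers observing half-finished changes.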



Wednesday, 16 October 2019

After Data Relay


Data Relay has finished for the 2019 season. This year I was involved as Head of Marketing and was the Bristol event owner again. I enjoyed being a part of the Data Relay team. It is great to be able to put on an event with free training across the breadth of the Microsoft Data Platform.  

I also presented “The big data and database management paradigm shift”. The abstract:

The emergence of big data has created a new paradigm for data and database management. Structured and unstructured data must now work together to produce actionable insights. This session will share details on how to embrace this new era of database management, to prepare you for SQL Server 2019.

The slides summary 

References



Wednesday, 9 October 2019

Apache Spark Under the Hood Free ebook

Databricks has created a free eBook with excerpts from the book Spark: The Definitive Guide.
The eBook covers:
  • The past, present, and future of Apache Spark.
  • Basic steps to install and run Spark yourself.
  • A summary of Spark's core architecture and concepts.
  • Spark's powerful language APIs and how you can use them.

Monday, 7 October 2019

Data Relay starts today

The week has finally arrived and it is Data Relay 2019. The agendas are here:

#Newcastle Mon 7 https://bit.ly/2mktU73
#Leeds Tues 8 https://bit.ly/2kP99jh
#Nottingham Wed 9 https://bit.ly/2kkct5P
#Birmingham Thurs 10 https://bit.ly/2lRh6F2
#Bristol Fri 11 https://bit.ly/2kkcwi1

The sessions are covering a raft of exciting topics.


Friday, 4 October 2019

Two New Datasets to Improve Natural Language Understanding Models

Google’s PAWS dataset helps AI models capture word order and structure. Yuan Zhang, Research Scientist, and Yinfei Yang, Software Engineer, Google Research, posted:

Word order and syntactic structure have a large impact on sentence meaning — even small perturbations in word order can completely change interpretation. For example, consider the following related sentences:

1. Flights from New York to Florida.
2. Flights to Florida from New York.
3. Flights from Florida to New York.

All three have the same set of words. However, 1 and 2 have the same meaning (known as a paraphrase pair), while 1 and 3 have very different meanings (known as a non-paraphrase pair). The task of identifying whether a pair is a paraphrase or not is called paraphrase identification, and it is important to many real-world natural language understanding (NLU) applications such as question answering.
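Why word order matters can be seen with a model that only counts words. In the minimal Python sketch below (the helper names are illustrative), sentences 1, 2 and 3 produce exactly the same bag of words, so an order-blind similarity score such as Jaccard overlap rates the paraphrase pair and the non-paraphrase pair identically:

```python
from __future__ import annotations

from collections import Counter

# The three example sentences, in the order numbered above.
sentences = [
    "Flights from New York to Florida.",
    "Flights to Florida from New York.",
    "Flights from Florida to New York.",
]


def tokens(text: str) -> list[str]:
    """Lower-case, drop the final full stop and split on whitespace."""
    return text.lower().rstrip(".").split()


def bag_of_words(text: str) -> Counter:
    """Word counts with the order thrown away."""
    return Counter(tokens(text))


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the two sentences' word sets (order-blind)."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb)


# Paraphrase pair (1, 2) and non-paraphrase pair (1, 3) look identical to a bag of words...
print(word_overlap(sentences[0], sentences[1]))  # 1.0
print(word_overlap(sentences[0], sentences[2]))  # 1.0
# ...even though the token sequences, which carry the meaning, differ.
print(tokens(sentences[0]) == tokens(sentences[2]))  # False
```

Datasets such as PAWS are built from exactly these word-scrambled pairs, so a model can only score well on them if it uses order and structure rather than word counts.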

Monday, 30 September 2019

Accelerating Competitive advantage with AI

Microsoft have released a new report, Accelerating Competitive Advantage with AI. The report highlights that the UK is well placed to succeed. Download the report at https://aka.ms/AcceleratingAI

The initial report, also worth reading, was entitled Maximizing the AI opportunity: How to harness the potential of AI effectively and ethically. Download that report at https://aka.ms/UKAIreport




The anatomy of an AI-enabled organisation is explained in the diagram below.




Thursday, 19 September 2019

Data Relay 2019

Data Relay is only a few weeks away. The agendas for the events are:

The Newcastle agenda: https://bit.ly/2mktU73 | Register now for the event on Mon 7: http://bit.ly/datarelaynew

The Leeds agenda: https://bit.ly/2kP99jh | Register now for the event on Tues 8: http://bit.ly/datarelayleeds

The Nottingham agenda: https://bit.ly/2kkct5P | Register now for the event on Wed 9: http://bit.ly/datarelaynott

The Birmingham agenda: https://bit.ly/2lRh6F2 | Register now for the event on Thurs 10: http://bit.ly/datarelaybham

The Bristol agenda: https://bit.ly/2kkcwi1 | Register now for the event on Fri 11: http://bit.ly/datarelaybsl

Saturday, 7 September 2019

The big data and database management paradigm shift

I am really pleased to be speaking at Data Relay this year in Bristol on Friday 11 October. The agenda is here. The abstract for my session is below:

The emergence of big data has created a new paradigm for data and database management. Structured and unstructured data must now work together to produce actionable insights. This session will share details on how to embrace this new era of database management, to prepare you for SQL Server 2019.


Thursday, 5 September 2019

The Mirage and Metamorphosis of Data and AI



I have just taken a break to rejuvenate my creative juices. It was a time to reflect and innovate. We are often so busy in our day-to-day lives that we don't stop and reflect. I spent my time reading and catching up on bleeding-edge technology. I am always fascinated to see what is coming next, what problems researchers are trying to address and how Data and AI could be utilised to benefit industry and the world around us.

The role I enjoy the most is as a Data and AI philosopher providing thought leadership. We are at an exciting time in history to witness and contribute to the mirage and metamorphosis of Data and AI. My explorations find exciting challenges in diversity, with Data and AI at the centre of most things we want to achieve. Research is increasingly needed in industry to achieve business success, because of the growing complexity within industry and the world around us. For certain tasks we need to move away from agile and use systems thinking techniques so that complexity can be understood. My findings are that the future mirage and metamorphosis of Data and AI is a complex, interconnected world, and mastering that complexity is the key to success.

References
The data and AI market landscape 2019: The next wave of hybrid emerges
https://www.zdnet.com/article/the-data-and-ai-market-landscape-2019-the-next-wave-of-hybrid-emerges/
Part I: A Turbulent Year: The 2019 Data & AI Landscape
https://mattturck.com/data2019/
Part II: Major Trends in the 2019 Data & AI Landscape
https://mattturck.com/2019trends/
Navigating AI hype in search of success, Oliver Pickup (Sunday Times 12 May 2019)
The real big-data problem and why only machine learning can fix it
https://siliconangle.com/2019/08/09/real-big-data-problem-machine-learning-can-fix-mitcdoiq-startupoftheweek/
Big Data is just Data
https://buckwoody.wordpress.com/2019/08/26/big-data-is-just-data/
Maximising the AI opportunity
https://info.microsoft.com/rs/157-GQE-382/images/UK-DIGTRNS-CNTNT-content-MGC0003240.pdf
The Data Ethics Framework principles
https://www.gov.uk/government/publications/data-ethics-framework/data-ethics-framework

Thursday, 22 August 2019

Microsoft ML for Apache Spark

Microsoft Research has announced a new version of Microsoft ML for Apache Spark, an open-source, distributed ML and microservice library. Version 0.18 brings Vowpal Wabbit on Spark, Speech to Text and more!

Microsoft Machine Learning for Apache Spark (MMLSpark) is an ecosystem of enhancements that expand the Apache Spark distributed computing library to tackle problems in Deep Learning. It enables sending streaming data to Power BI.
Website: http://aka.ms/spark
Paper: http://aka.ms/spark-paper



Global AI Nights 2019


The Global AI Night is a free evening event organized in London by community members who are passionate about artificial intelligence on Microsoft Azure. It is at the Microsoft Reactor in London on Thursday 5 September 2019, 5:45 PM – 10:00 PM. Register here.

Friday, 16 August 2019

Database Trends Awards




Database Trends and Applications has named the best relational database and the best big data platform.






Best relational database: SQL Server
"According to Craig S. Mullins, president & principal consultant, Mullins Consulting, Inc, relational continues to dominate: IDC forecasts that relational DBs will still account for more than 80% of the total operational database market through 2022, and Gartner forecasts that through 2020, relational technology will continue to be used for at least 70% of new applications and projects."

Best big data platform: Cloudera Enterprise Data Cloud

"To leverage the immense power of their data, organizations need a solid strategy that incorporates everything from security to data governance to the right big data technologies. Enabling both on-prem and cloud deployments—or a hybrid strategy—big data platforms today support data warehouses, data lakes, data science, engineering, machine learning, myriad database management systems, and much more.  And while Hadoop is a key element of big data platforms today, there are also many other open source components, support capabilities, and advanced features that round out a big data platform to give data-driven companies the big data capabilities they need"

Wednesday, 7 August 2019

Discover Datasets

There are many thousands of data repositories around the world, and to make this data easy to access Google have launched a Dataset Search service.

https://toolbox.google.com/datasetsearch



It is intended to be a companion of sorts to Google Scholar, the company’s popular search engine for academic studies and reports.

Read more about the service here.

Friday, 2 August 2019

The Big Data Problem

The article The real big-data problem and why only machine learning can fix it, and the accompanying video from the MIT CDO conference in Cambridge, MA, contain an interesting discussion of why ETL and MDM don't scale and why applying a schema later doesn't deliver usable data. The key, they argue, is using machine learning to classify and prepare data.



Thursday, 25 July 2019

SQL Server 2019 Workshop Lab

SQL Server 2019 is a modern data platform designed to tackle the challenges of today's data professional.

A new free, self-paced lab is available for learning some of these concepts and how to solve modern data challenges using a hands-on approach.

SQL Server 2019 provides many new capabilities including:

  • Data Virtualization with Polybase and Big Data Clusters to reduce the need for data movement
  • Intelligent Performance to boost query performance with no application changes
  • Security enhancements such as Always Encrypted and Data Classification
  • Mission Critical Availability including Availability Groups on Kubernetes and Accelerated Database Recovery
  • Modern Development capabilities including Machine Learning Services and Extensibility with Java and the language of your choice
  • SQL Server on the platform of your choice with compatibility including Windows, Linux, Docker, Kubernetes, and Arm64 (Azure SQL Database Edge)

Monday, 22 July 2019

Data Relay 2019 Registration is open

Data Relay is open for registration at all five venues:

Newcastle Monday 7 October
Leeds Tuesday 8 October
Nottingham Wednesday 9 October
Birmingham Thursday 10 October
Bristol Friday 11 October





Data Relay is returning for its 9th year, and is heading your way in October 2019!

Data Relay features top quality Microsoft Data Platform content from Microsoft and internationally renowned speakers.

With over 1,000 registrations for the last Relay, reserve your place quickly.

The full agenda will be published shortly. Each day will comprise a series of 55-minute technical presentations at beginner, intermediate and advanced levels. You can switch tracks at will throughout the day to select the sessions that are most relevant to you.



Thursday, 18 July 2019

MVP Award Package

So excited to receive my #MVP award package and disc. It is such an honour to receive this for a second year. What an amazing #data community we have. Thank you #Microsoft #MVPBuzz

On the award it says "We recognize and value your exceptional contributions to technical communities worldwide."

The thing I value the most is helping and sharing data innovations with the community. There are so many exciting developments, opportunities and benefits for the communities that data can bring. People are having extraordinary visions for the future, brought about with data and advanced analytics working together to solve complex problems.

I am looking forward to the next year of exciting data events.

Tuesday, 16 July 2019

Microsoft Inspire 2019 Corenotes

The Microsoft Inspire 2019 Corenotes with Satya Nadella and Brad Smith are being livestreamed tomorrow, starting at 8:30 AM PT. They can be watched on YouTube. Microsoft Inspire is where partners meet to connect, collaborate and celebrate as one community.


Tuesday, 9 July 2019

What is Azure SQL Database Hyperscale

Azure SQL Database is based on the SQL Server Database Engine architecture, adapted for the cloud environment. There are three architectural models used in Azure SQL Database:

  • General Purpose/Standard
  • Hyperscale
  • Business Critical/Premium

The Hyperscale service tier in Azure SQL Database is a new service tier in the vCore-based purchasing model. The Hyperscale service tier in Azure SQL Database provides the following additional capabilities:

  • Support for up to 100 TB of database size
  • Nearly instantaneous database backups
  • Fast database restores, based on file snapshots, that take minutes rather than hours or days
  • Rapid scale-out to provision one or more read-only nodes for offloading your read workload
  • Rapid scale-up and scale-down to accommodate intermittent heavy workloads

To learn more watch the video.

Monday, 1 July 2019

A Second Data Platform MVP Award

I am very excited to have received my second Data Platform MVP award. What an honour to receive this along with so many amazing Most Valuable Professionals (MVP).  It is such a privilege to share my passion for data with the community #MVPBuzz #SQLFamily 


The Microsoft Data Platform MVP award is an amazing award. It recognizes exceptional technology community leaders worldwide who actively share their high-quality, real-world expertise with users and Microsoft.

There are so many exciting things approaching to share with the community. A few highlights are:

  • Microsoft Ignite
  • PASS Summit
  • Data Relay UK
  • SQL Server 2019 Big Data Clusters when it becomes available

I am keen to see what advancements may come from AI integration, and I am always thinking about diversity and inclusion. It is great to think about data strategy and the power it holds over the future of business.

Thursday, 20 June 2019

The Common Data Model

The Common Data Model (CDM) is a standard and extensible collection of schemas (entities, attributes, relationships) that represents business concepts and activities with well-defined semantics, to facilitate data interoperability. 

The Common Data Model was announced at Ignite as part of the Open Data Initiative, a vision jointly developed by Microsoft, Adobe and SAP. CDM is already supported in the Common Data Service, Dynamics 365, PowerApps, Power BI and upcoming Azure data services. The model is continually evolving at Microsoft. I do wonder what consistency there will be between the Microsoft Common Data Model and the Splunk Common Information Model.
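To make "entities, attributes, relationships" and "extensible" concrete, here is a minimal Python sketch of how such a schema collection might be modelled. The classes, the `Account`, `Contact` and `PartnerAccount` entities and their attribute names are illustrative assumptions, not the official CDM entity definition format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    data_type: str  # e.g. "string", "dateTime", "decimal"


@dataclass
class Relationship:
    from_attribute: str  # attribute on this entity that holds the reference
    to_entity: str       # name of the referenced entity
    to_attribute: str    # key attribute on the referenced entity


@dataclass
class Entity:
    name: str
    attributes: list[Attribute] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)

    def extend(self, name: str, extra: list[Attribute]) -> Entity:
        """New entity that keeps every shared attribute and relationship and adds its own."""
        return Entity(name, self.attributes + extra, list(self.relationships))

    def attribute_names(self) -> list[str]:
        return [a.name for a in self.attributes]


# Illustrative "standard" entities (names and attributes are assumptions).
account = Entity("Account", [Attribute("accountId", "string"), Attribute("name", "string")])
contact = Entity(
    "Contact",
    [Attribute("contactId", "string"), Attribute("fullName", "string"),
     Attribute("parentAccountId", "string")],
    [Relationship("parentAccountId", "Account", "accountId")],
)

# An organisation-specific extension of the shared Account schema.
partner_account = account.extend("PartnerAccount", [Attribute("partnerTier", "string")])
print(partner_account.attribute_names())  # ['accountId', 'name', 'partnerTier']
```

Extending the shared entity rather than redefining it keeps the common attributes intact, which is what lets different applications interoperate on the same data.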

Tuesday, 18 June 2019

Lineage Power BI

The lineage view was announced at the Business Applications Summit. It is coming to the Power BI service so you can soon trace all your data from source to report.

Sunday, 16 June 2019

Shared and certified datasets

Shared and certified datasets, announced at the Business Applications Summit, will let you discover and reuse trusted data assets in your organization.


Tuesday, 11 June 2019

Microsoft Business Applications Summit

The Microsoft Business Applications Summit is taking place in Atlanta, Georgia, on 10–11 June 2019, and yesterday a raft of new AI and enterprise features was launched for Power BI. Notes on the key features are below:

New AI and Enterprise features for Power BI are covered in depth
What’s new and planned for business intelligence is shared, including future release dates

Microsoft Power BI: The future of modern BI - roadmap and vision video

Watch on-demand sessions from the event

The June release of Power BI Desktop incorporates some exciting Q&A changes.