Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk. Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Saturday 24 June 2023

Data Quality inspiration, erudition and expedience

Excited to share my session on Data Quality inspiration, erudition and expedience at 17:45 at Data Toboggan Cool Runnings on Saturday 24 June 2023.

This session covers

Great data quality is a key output of data systems. We often get wrapped up in the technology, like Microsoft Fabric, Synapse and Purview. This session covers the what, why and how of data quality, to help provide a deeper understanding and a roadmap for adoption.

Wednesday 14 June 2023

Data Toboggan Cool Runnings 2023

The Piste maps for the Data Toboggan agenda have become a regular feature of our event. There is one for each room. They enable you to dream, learn and have fun.

Event Date: Sat 24 June 2023

Register now: https://bit.ly/DTCR2023

Agenda: https://bit.ly/DTCR2023Agenda

Friday 9 June 2023

Fabric an enabler for better data quality

I was reading the article How Microsoft Fabric aims to beat Amazon and Google in the cloud war, and it sparked some thoughts on data quality.

Technical architectures are always changing, but over the last few weeks we have witnessed a pivotal change with the introduction of Microsoft Fabric, creating a paradigm shift. OneLake has advantages for cost saving, transparency, flexibility, data governance and data quality. Data governance and data quality are two of the areas that I feel need the most work in the data stack. Both rely heavily on people and on how they perceive their importance. The new style of data governance acts as an enabler through distributed teams in the business. Data quality is high on the list of areas that need improvement, but it has not seen any significant change with the move to cloud-based services. A Fabric feature called shortcuts helps enterprises build that single virtualized data lake across multiple clouds and is a stepping stone to better data quality.

Previously, Synapse combined services into a single place for data lake and data warehouse, with integrated Microsoft Purview helping to provide a more holistic view of data. Fabric, as a SaaS service, goes one step further by providing a single place for data management and data governance on OneLake. It enables improved consistency and trustworthiness of data, and by bridging the gap between BI and AI it brings a unique opportunity to improve data quality. Good-quality data you can trust is the foundation stone of successful business growth.

There is a vast amount of documentation currently available to help us learn about Fabric. A few key resources are below.





Why Data Quality is important for your business

I originally published this article on LinkedIn.

Data Quality

Data volumes are increasing significantly. It is hard to keep track of the data used in a business and to know how accurate the data being used for business decisions is. Accurate data is critical to the success of a business. Ensuring that data comes from the right source, is not duplicated, has no syntax issues and is not incorrectly updated requires good data management.

What is data quality?

DAMA UK defines data quality as ‘the planning, implementation, and control of activities that apply quality management techniques to data, in order to assure it is fit for consumption and meets the needs of data consumers’. So good data quality needs accurate data that meets the needs of a business.

There is also an international standard for data quality, ISO 8000. This states ‘Quality is actually the conformance of characteristics to requirements and, thus, any item of data can be of high quality for one use but not for another use that has differing requirements.’ ISO 8000 identifies that data quality has syntactic (format), semantic (meaning) and pragmatic (usefulness) characteristics.

Why is data quality important?

Good data quality enables strong business decisions to be made, leading to better business outcomes. Decisions based on data can lead to greater profitability, help improve business situations and provide the starting place for predictive analytics using AI systems. Poor data quality often stems from human error, so it is important to put checks and balances in place that show the current state of the data.

Measuring data quality

It is important for every business to understand its level of data quality maturity. This enables trustworthy decisions to be made and can even inform an understanding of data ethics. There are six main dimensions to consider.

  • Completeness – this metric addresses the requirement that all data sets and data items contain all the relevant information. It measures missing information: the proportion of data received, incomplete or missing data, and data loss.
  • Accuracy – does the data reflect the real world it describes? Is it truthful, containing the correct data entries?
  • Uniqueness – is there a single view of the data? This metric looks at the extent of duplication in the data, with consideration of how data is controlled.
  • Validity – does the data match the rules of its definition, such as syntax (format, type, range)?
  • Consistency – does the data match across two or more records or data sets? Does the pattern or frequency of the data match?
  • Timeliness – the degree to which the data represents reality at a point in time. Is the data available when required?
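As a rough illustration, three of these dimensions can be expressed as simple ratios. The sketch below is not the DAMA calculation method; it assumes hypothetical customer records, a simple ratio definition for each metric, and an illustrative email syntax rule standing in for "validity".

```python
import re

# Hypothetical sample records; field names and values are illustrative assumptions.
records = [
    {"id": 1, "email": "ann@example.com", "country": "UK"},
    {"id": 2, "email": "bob@example",     "country": None},
    {"id": 2, "email": "bob@example",     "country": None},
    {"id": 3, "email": None,              "country": "FR"},
]

# Assumed validity rule: a simple email syntax check, not a full RFC 5322 parser.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, fields):
    """Share of required field values that are populated (not None)."""
    total = len(rows) * len(fields)
    filled = sum(1 for r in rows for f in fields if r.get(f) is not None)
    return filled / total if total else 1.0


def uniqueness(rows, key):
    """Distinct key values divided by row count (1.0 means no duplicates)."""
    return len({r[key] for r in rows}) / len(rows) if rows else 1.0


def validity(rows, field, pattern):
    """Share of populated values in `field` that match the syntax rule."""
    values = [r[field] for r in rows if r.get(field) is not None]
    if not values:
        return 1.0
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)


if __name__ == "__main__":
    print(f"completeness: {completeness(records, ['email', 'country']):.2f}")
    print(f"uniqueness:   {uniqueness(records, 'id'):.2f}")
    print(f"validity:     {validity(records, 'email', EMAIL_PATTERN):.2f}")
```

Ratios like these can be tracked over time and surfaced on a data quality dashboard; the thresholds that count as "good" would depend on the business use case.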

DAMA provides details on how to calculate these measures and report them as KPIs to the business. A useful technique to help provide a view of the current state of the data is data profiling. This can look at things like counting nulls in the data, the max/min value, max/min length, frequency distribution, and data type and format.
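The profiling checks listed above can be sketched for a single column in plain Python. This is a minimal illustration, assuming an in-memory list of values and a hypothetical `order_date` column; the "format pattern" (digits mapped to `9`, letters to `A`) is one common profiling convention, not something prescribed by DAMA.

```python
from collections import Counter


def format_pattern(value):
    """Map digits to '9' and letters to 'A' so values with the same shape group together."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in str(value))


def profile_column(values):
    """Return basic profile statistics for one column of values."""
    non_null = [v for v in values if v is not None]
    lengths = [len(str(v)) for v in non_null]
    # Min/max only make sense when all non-null values are mutually comparable.
    try:
        min_value = min(non_null) if non_null else None
        max_value = max(non_null) if non_null else None
    except TypeError:
        min_value = max_value = None
    return {
        "null_count": len(values) - len(non_null),
        "min_value": min_value,
        "max_value": max_value,
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "frequency": Counter(non_null).most_common(),
        "types": Counter(type(v).__name__ for v in non_null),
        "patterns": Counter(format_pattern(v) for v in non_null),
    }


# Hypothetical column with a missing value and a date in an unexpected format.
order_date = ["2023-06-24", "2023-06-09", None, "24/06/2023", "2023-06-24"]

if __name__ == "__main__":
    for name, stat in profile_column(order_date).items():
        print(f"{name}: {stat}")
```

In this sample the pattern counts make the odd `99/99/9999` date stand out next to the expected `9999-99-99` shape, which is the kind of format issue profiling is meant to surface.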

Data enhancements

The main takeaway is to always consider the quality of the data you use and to embed checks into business processes. Always have a dashboard showing the current state of business data quality. If you are looking for a framework to help guide your thought processes, the UK government has created a data management framework which is worth reviewing. In conclusion, start with a review of the data quality of the core data sources in your data catalog and record their current state.

Key next steps to improve Data Quality

To start improving data quality for the organisation immediately and enable better analytics:

  • Understand what data exists
  • Assign a relevant data owner to the data sources
  • For each business use case, understand the lineage of the data used for reports and dashboards.
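To make the lineage and ownership steps concrete, the sketch below walks a small lineage graph upstream from a dashboard to its root sources and flags any source without an assigned owner. The asset names, the `LINEAGE` map and the `OWNERS` register are all hypothetical; in practice this information would come from a catalog tool such as Microsoft Purview rather than a hand-written dictionary, and this is not the Purview API.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets it is built from.
LINEAGE = {
    "sales_dashboard": ["sales_model"],
    "sales_model": ["orders_clean", "customers_clean"],
    "orders_clean": ["erp_orders"],
    "customers_clean": ["crm_customers", "erp_orders"],
}

# Hypothetical ownership register for source systems.
OWNERS = {"erp_orders": "Finance"}


def upstream_sources(asset, lineage):
    """Breadth-first walk upstream; return the root sources that feed `asset`."""
    seen, roots = {asset}, set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        parents = lineage.get(current, [])
        # An asset with no recorded parents is treated as a root source.
        if not parents and current != asset:
            roots.add(current)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return roots


sources = upstream_sources("sales_dashboard", LINEAGE)
unowned = sorted(s for s in sources if s not in OWNERS)

if __name__ == "__main__":
    print("sources:", sorted(sources))
    print("sources without an owner:", unowned)
```

The `seen` set stops a source that feeds several intermediate assets (here `erp_orders`) from being visited twice.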

Here are some helpful resources to assist with the above steps!

Creating a Data Catalog with Microsoft Purview - Cloud Adoption Framework

Improving Data Lineage with Microsoft Purview - Cloud Adoption Framework

Data quality considerations - Cloud Adoption Framework

Friday 2 June 2023

Microsoft Fabric ebook

Microsoft Fabric brings many tools together, and governance is embedded as an integral part of the solution. Governance is backed by Microsoft Purview, which is one of the tools used to safeguard data. It provides that single pane of governance and works automatically to help govern the solution.

OneLake is the unified data foundation.

The unified tools

  • SaaS product experience
  • Security and governance
  • Compute and storage
  • Business model

There are more high-level details about Microsoft Fabric in the ebook.

Unlocking Transformative Data Value with Microsoft Fabric