Chaos, complexity, curiosity and database systems. A place where research meets industry
Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous; the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk. Microsoft Data Platform MVP
"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein
Data volumes are increasing significantly. It is hard to keep track of the data used in a business and to know how accurate the data behind business decisions really is. Accurate data is critical to the success of a business. Ensuring that data comes from the right source, is not duplicated, has no syntax issues and is not incorrectly updated requires good data management.
What is data quality?
DAMA UK defines data quality as ‘the planning, implementation, and control of activities that apply quality management techniques to data, in order to assure it is fit for consumption and meets the needs of data consumers’. Good data quality therefore requires accurate data that meets the needs of the business.
There is also an international standard for data quality, ISO 8000. It states ‘Quality is actually the conformance of characteristics to requirements and, thus, any item of data can be of high quality for one use but not for another use that has differing requirements.’ ISO 8000 identifies that data quality has syntactic (format), semantic (meaning) and pragmatic (usefulness) characteristics.
Why is data quality important?
Good data quality enables strong business decisions, leading to better business outcomes. Decisions based on data can lead to greater profitability, help improve situations, and be the starting point for predictive analytics using AI systems. Poor data quality often results from human error, so it is important to put checks and balances in place to know the current state of the data.
Measuring data quality
It is important for every business to understand its level of data quality maturity. This enables trustworthy decisions to be made and can even affect how data ethics are understood. There are 6 main dimensions to consider.
Completeness – do all data sets and data items contain all the relevant information? The measure focuses on missing information, considering the proportion of data received, data that is incomplete or missing, and data loss.
Accuracy – does the data reflect the real world? Is it truthful, containing the correct data entries?
Uniqueness – is there a single view of the data? This metric looks at the extent of data duplication, with consideration of how the data is controlled.
Validity – does the data match the rules of its definition, such as syntax (format, type, range)?
Consistency – do the data sets match across data stored in two or more records? Does the pattern or frequency of the data match?
Timeliness – the degree to which the data represents reality at a point in time. Is the data available when required?
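To make a few of these dimensions concrete, here is a minimal Python sketch that scores completeness, uniqueness and validity as simple ratios. The sample records, the field names and the email rule are all hypothetical, and the ratio formulas are an illustrative assumption rather than the calculations DAMA publishes.

```python
import re

# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "email": "ann@example.com", "postcode": "AB1 2CD"},
    {"id": 2, "email": None, "postcode": "EF3 4GH"},
    {"id": 3, "email": "bob@example", "postcode": "EF3 4GH"},
    {"id": 3, "email": "bob@example", "postcode": "EF3 4GH"},
]

# Assumed format rule for this sketch: something@something.something
EMAIL_RULE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Share of rows where the field is populated (not None or empty)."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows, key):
    """Share of rows that carry a distinct value of the key field."""
    if not rows:
        return 0.0
    return len({r[key] for r in rows}) / len(rows)


def validity(rows, field, pattern):
    """Share of populated values that match the given format rule."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


print("email completeness:", completeness(records, "email"))
print("id uniqueness:", uniqueness(records, "id"))
print("email validity:", validity(records, "email", EMAIL_RULE))
```

Accuracy, consistency and timeliness are harder to score in isolation because they need a trusted reference, a second data set or a timestamp to compare against, so they are left out of this sketch.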
DAMA provides details on how to calculate these measures so that they can be reported to the business as KPIs. A useful technique for getting a view of the current state of the data is data profiling. This can look at things like counts of nulls, max/min values, max/min lengths, frequency distributions, and data types and formats.
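As a rough illustration of data profiling, the sketch below profiles a single column using only the Python standard library. The function name `profile_column` and the sample values are hypothetical, and the sketch assumes the non-null values in a column can be compared with one another.

```python
from collections import Counter


def profile_column(values):
    """Return simple profiling statistics for one column of values.

    Assumes the non-null values are mutually comparable (for example
    all strings or all numbers); mixed types would make min/max fail.
    """
    present = [v for v in values if v is not None]
    lengths = [len(str(v)) for v in present]
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "frequency": Counter(present),
        "types": Counter(type(v).__name__ for v in present),
    }


# Hypothetical column of country codes with one missing value.
countries = ["GB", "FR", None, "GB", "DEU"]
profile = profile_column(countries)

for name, value in profile.items():
    print(f"{name}: {value}")
```

In this made-up column the profile exposes one null and a three-character code among two-character ones, which is the kind of signal that prompts a closer look at the source.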
The main takeaway is to always consider the quality of the data you use and to embed checks into business processes. Always have a dashboard showing the current state of business data quality. If you are looking for a framework to help guide your thinking, the UK government has created a data management framework which is worth reviewing. In conclusion, start by reviewing the data quality of the core data sources in your data catalog and recording their current state.
Key next steps to improve Data Quality
To start improving data quality for the organisation straight away and to enable better analytics:
Understand what data exists
Assign a relevant data owner to the data sources
For each business use case, understand the lineage of the data used for reports and dashboards
Here are some helpful resources to assist with the above steps!