Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous; the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk. Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Thursday 25 April 2024

The Age of Data Governance

Microsoft Purview is changing rapidly in the data governance space. It supports data value creation, combining essential defense with a responsive offense. This new addition helps businesses address a core issue: AI outputs are only as good as the quality of the data behind them.

Peter Aiken's new definition of data governance is 'Managing data decisions with guidance'.

Suma Manohar has written a great article about data quality in the era of AI. Microsoft Purview has introduced domains and data products, which add clear business context and terminology mapping. Enhanced search using Copilot is available to give a better understanding of the data, and Copilot can also help by suggesting data quality rules. These autogenerated rules are context-specific.

Data quality rules created manually in Purview should follow the six standard data quality dimensions. The rule types available are:

  • Freshness – confirms that all values are up to date.
  • Duplicate rows – checks rows to find repeated values across two or more columns.
  • Empty/blank fields – looks for blank and empty fields in a column where there should be values.
  • Unique values – confirms that values in a column are unique.
  • Data type match – confirms that values in a column match data type requirements.
  • String format match – confirms that text values in a column match a specific format or other requirements.
  • Table lookup – confirms that a value in one table can be found in a specific column of another table.
  • Custom – create a custom rule with the visual expression builder.

Regular expressions can be used for pattern matching in the rules above.
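To make the rule types more concrete, here is a minimal sketch of what each check computes, written in plain Python against a small in-memory table. This is not the Purview API or Purview's implementation; the sample data, function names, the seven-day freshness window and the email regex are all hypothetical choices for illustration. Most checks return the fraction of rows that pass, which is the kind of number a quality score is built from.

```python
import re
from datetime import datetime, timedelta

# Hypothetical sample data: a list of row dicts standing in for a table.
customers = [
    {"id": 1, "email": "ann@example.com", "country": "GB", "updated": datetime(2024, 4, 24)},
    {"id": 2, "email": "bob@example", "country": "US", "updated": datetime(2024, 4, 20)},
    {"id": 2, "email": "", "country": "FR", "updated": datetime(2023, 1, 1)},
]
# Hypothetical reference table used by the table lookup check.
countries = [{"code": "GB"}, {"code": "US"}]


def freshness(rows, col, now, max_age):
    """Fraction of rows whose timestamp in col is no older than max_age."""
    return sum(now - r[col] <= max_age for r in rows) / len(rows)


def duplicate_rows(rows, cols):
    """Count rows whose combined values in cols repeat an earlier row."""
    seen, dupes = set(), 0
    for r in rows:
        key = tuple(r[c] for c in cols)
        if key in seen:
            dupes += 1
        else:
            seen.add(key)
    return dupes


def empty_blank(rows, col):
    """Fraction of rows where col is None or only whitespace."""
    return sum(r[col] is None or str(r[col]).strip() == "" for r in rows) / len(rows)


def unique_values(rows, col):
    """True if every value in col is distinct."""
    values = [r[col] for r in rows]
    return len(values) == len(set(values))


def data_type_match(rows, col, expected_type):
    """Fraction of rows whose value in col is an instance of expected_type."""
    return sum(isinstance(r[col], expected_type) for r in rows) / len(rows)


def string_format_match(rows, col, pattern):
    """Fraction of rows whose text in col fully matches a regular expression."""
    rx = re.compile(pattern)
    return sum(bool(rx.fullmatch(r[col] or "")) for r in rows) / len(rows)


def table_lookup(rows, col, ref_rows, ref_col):
    """Fraction of rows whose value in col exists in ref_col of another table."""
    ref = {r[ref_col] for r in ref_rows}
    return sum(r[col] in ref for r in rows) / len(rows)


now = datetime(2024, 4, 25)
print(round(freshness(customers, "updated", now, timedelta(days=7)), 2))  # 0.67
print(duplicate_rows(customers, ["id"]))  # 1
print(round(empty_blank(customers, "email"), 2))  # 0.33
print(unique_values(customers, "id"))  # False
print(data_type_match(customers, "id", int))  # 1.0
print(round(string_format_match(customers, "email", r"[^@\s]+@[^@\s]+\.[^@\s]+"), 2))  # 0.33
print(round(table_lookup(customers, "country", countries, "code"), 2))  # 0.67
```

A custom rule would be another function of the same shape, combining columns with whatever expression the business needs.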

When working on data quality, standard guidelines can help. My approach is to start with the DAMA-DMBOK and then use the Data Management Capability Assessment Model (DCAM).

Data quality scans produce quality scores and trends, which appear in the data quality dashboard; the scores are also shown on the data product page.
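How Purview calculates its score is not covered here, so the sketch below simply assumes a score is a weighted average of per-rule pass rates, expressed as a percentage, and that a trend is the change between consecutive scans. The `RuleResult` type, the default weight of 1.0 and the scan figures are hypothetical, purely to show how rule results can roll up into a single score and a trend.

```python
from dataclasses import dataclass


@dataclass
class RuleResult:
    rule: str    # rule name, e.g. "Freshness"
    passed: int  # rows that passed the rule
    total: int   # rows evaluated by the rule


def quality_score(results, weights=None):
    """Hypothetical score: weighted average pass rate (%) over rules that evaluated rows."""
    weights = weights or {}
    scored = [r for r in results if r.total > 0]  # skip rules with nothing to evaluate
    num = sum(weights.get(r.rule, 1.0) * r.passed / r.total for r in scored)
    den = sum(weights.get(r.rule, 1.0) for r in scored)
    return round(100.0 * num / den, 1)


def trend(scores):
    """Change in score between each pair of consecutive scans."""
    return [round(b - a, 1) for a, b in zip(scores, scores[1:])]


scan_1 = [RuleResult("Freshness", 80, 100), RuleResult("Unique values", 100, 100),
          RuleResult("Empty/blank fields", 90, 100)]
scan_2 = [RuleResult("Freshness", 95, 100), RuleResult("Unique values", 100, 100),
          RuleResult("Empty/blank fields", 96, 100)]

scores = [quality_score(scan_1), quality_score(scan_2)]
print(scores)         # [90.0, 97.0]
print(trend(scores))  # [7.0]
```

Passing a `weights` dictionary lets more business-critical rules count for more in the overall score.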

The rollout of the new solution across the regions is shared here.
