Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk . Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Monday 29 November 2021

PASS Summit Keynote: Unified Data Governance with Azure Purview

Raghu Ramakrishnan, CTO for Data and Technical Fellow at Microsoft, spoke at PASS Community Summit in November and explained the next part of the vision for data governance: policy. Microsoft see data governance as the emerging data pillar, the three pillars being operational databases, a unified analytics platform, and unified automated data governance. The unified part is the important element going forward: a single pane to extend governance across the entire data estate. Automated data classification aims to remove the PII headache of missing personal data and to push control up the stack to knowledge workers. Microsoft intend to have dynamic data provenance that is fully integrated with the 6 responsible AI principles. Azure Purview will operate a central RBAC control and is the future governing permission state for SQL Server, with full propagation. With AI integrated, the policy feature will be human readable. The link to watch the session .

Data governance is increasingly interdisciplinary, and the discovery of data is core to a business. Questions often asked: What data do we have? Where did the data originate? Can I trust the data?

Compliance has been a major area of data governance. Questions often asked here:
What's my exposure to risk? Is my data usage compliant? How do I control access to and use of the data? What is required by regulation X?

Raghu talked about the data governance journey through the lens of GDPR compliance. Their approach was to create a 'Data Map' of all data across Microsoft and use that map to support GDPR compliance. Data discovery covers search and discover, the information supply chain, stewards/curators and the business glossary. Data use governance covers policy authoring and management, reporting, access and governance enforcement, and industry compliance. Both areas are built on an intelligent data inventory: a data map with automated structure and lineage collection, automated and custom classification, and publication/subscription APIs.

The Purview data catalog is a self-service tool filled with details from knowledge workers. Areas include:
  • self-service search and browse
  • curated and standardized business glossaries
  • interactive lineage visualization
  • simplified data curation and stewardship
The data estate insights currently show:
  • data asset distribution
  • business glossary
  • data classification and labelling
  • data location and movement (in progress)
The Microsoft vision is that data in the Microsoft cloud is always governed and that, beyond Azure, Purview offers a single pane to extend governance across the entire data estate.

Still looking through the lens of GDPR compliance, data classification is an important feature.

Dynamic lineage deep dive

He talked about the increased efficiency of extracting dynamic SQL provenance and the 6 responsible AI areas: fairness, inclusiveness, reliability and safety, transparency, privacy and security, and accountability. He also discussed responsible AI and provenance in machine learning (ML): training and audit, with the provenance of ML models as a requirement. There are a number of challenges to address to enable this.

Centralized data access control

Proactive governance controls look at things like

Policy enforcement inside data services - access control was explained (in the early stages so may change)

For the future, there was mention of an ABAC policy language, where ABAC = RBAC + Conditions: a human-readable policy language for business users such as data officers or data owners. A policy statement can be represented as a tuple of {Effect, Action, Data resource, Subject, Condition}. The propagation of Purview policies to data repositories is asynchronous by design, with Purview as the single source of truth. SQL Server pulls updates asynchronously, so updates are not immediately visible locally, similar to AAD logins.
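To make the tuple idea concrete, here is a minimal sketch in Python of how a statement of the form {Effect, Action, Data resource, Subject, Condition} could be modelled and evaluated. This is not Purview's policy language or API, which was not shown in detail. The names `PolicyStatement` and `is_allowed`, the resource paths, the role name, and the default-deny and deny-overrides rules are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Attributes = Dict[str, str]


@dataclass(frozen=True)
class PolicyStatement:
    """Hypothetical model of {Effect, Action, Data resource, Subject, Condition}."""
    effect: str                              # "Allow" or "Deny"
    action: str                              # e.g. "Read"
    data_resource: str                       # illustrative resource path prefix
    subject: str                             # role name: the RBAC part
    condition: Callable[[Attributes], bool]  # extra predicate: the "+ Conditions" part


def is_allowed(policies: List[PolicyStatement], role: str, action: str,
               resource: str, attributes: Attributes) -> bool:
    """Evaluate a request. Assumed rules: default deny, and any matching Deny wins."""
    allowed = False
    for p in policies:
        matches = (
            p.subject == role
            and p.action == action
            and resource.startswith(p.data_resource)
            and p.condition(attributes)
        )
        if not matches:
            continue
        if p.effect == "Deny":
            return False  # deny-overrides (an assumption, not a documented rule)
        allowed = True
    return allowed


# Illustrative policies: analysts may read sales data unless it is classified
# as PII, and may never read anything under the HR folder.
policies = [
    PolicyStatement("Allow", "Read", "sales/", "DataAnalyst",
                    lambda attrs: attrs.get("classification") != "PII"),
    PolicyStatement("Deny", "Read", "sales/hr/", "DataAnalyst",
                    lambda attrs: True),
]
```

With these sample policies, a `DataAnalyst` reading `sales/orders` is allowed when the data is classified as `Public` and refused when it is classified as `PII`, which shows how a condition on a data classification refines a plain role check.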

In summary, Purview is creating a new data pillar of unified governance across the entire data estate. It is deeply integrated with SQL Server, extending its governance capabilities significantly.
