It is such a privilege to have an article published in the Communications of the ACM journal.
July 2021, Vol. 64 No. 7, Page 7
It can be read here
Initially published on the Coeo blog.
Data Governance is a core area that businesses need to adopt in the data-driven world. Data has been around since the earliest of times, from the first libraries in the ancient world that started to collect and store information.
Starting with Azure Purview requires a few prerequisites. The checklist sets out 4 phases.
Identify where to start in order to establish a baseline governance foundation for your organization's general cloud governance.
Purview-Deployment-Checklist will help prepare for data governance and data democratization in your environment.
There are four sections
The Azure Purview deployment prerequisites can help with those proof of concept deployments
Azure Purview automated readiness checklist
A set of scripts has been written to help evaluate your existing environment for missing configuration that might prevent data sources from being scanned. The PowerShell scripts are
Data Catalogs are becoming an essential component in the new data world. They are an inventory of an organisation's data assets. The metadata collected helps in finding the most appropriate data at speed and in knowing what data is held, its security levels and its lineage. Microsoft has a tool in preview that helps with data governance called Azure Purview.
Azure Purview is a unified data governance service that helps you manage and govern your on-premises, multicloud and software-as-a-service (SaaS) data. Easily create a holistic, up-to-date map of your data landscape with automated data discovery, sensitive data classification and end-to-end data lineage. Empower data consumers to find valuable, trustworthy data.
Once installed it moves into the hands of data governors. There are predefined data plane roles dictating who can access what.
Purview Data Reader - access to the Purview portal and can read all content except for scan bindings
Purview Data Curator - access to the Purview portal and can read all content except for scan bindings, can edit information about assets, classification definitions and glossary terms, and can apply classifications and glossary terms to assets.
Purview Data Source Administrator - Can manage all aspects of scanning data into Azure Purview but does not have read or write access to content beyond those related to scanning. The role does not have access to the Purview Portal (the user needs to also be in the Data Reader or Data Curator roles)
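The three roles above can be modelled as sets of capabilities to make the overlap easier to see. This is only a minimal sketch for illustration: the `Capability` names and the `can` helper are hypothetical and are not part of any Purview SDK; the role-to-capability mapping simply restates the descriptions above.

```python
from enum import Flag, auto


class Capability(Flag):
    # Hypothetical capability names, invented for this illustration only.
    NONE = 0
    PORTAL_ACCESS = auto()   # can open the Purview portal
    READ_CONTENT = auto()    # can read content (except scan bindings)
    EDIT_ASSETS = auto()     # can edit assets, classifications, glossary terms
    MANAGE_SCANS = auto()    # can manage scanning of data sources


# Mapping taken from the role descriptions in the text above.
ROLE_CAPABILITIES = {
    "Purview Data Reader": Capability.PORTAL_ACCESS | Capability.READ_CONTENT,
    "Purview Data Curator": (
        Capability.PORTAL_ACCESS | Capability.READ_CONTENT | Capability.EDIT_ASSETS
    ),
    "Purview Data Source Administrator": Capability.MANAGE_SCANS,
}


def effective_capabilities(roles):
    """Combine the capabilities of every role a user holds."""
    caps = Capability.NONE
    for role in roles:
        caps |= ROLE_CAPABILITIES[role]
    return caps


def can(roles, capability):
    """Return True if the combined roles include the given capability."""
    return capability in effective_capabilities(roles)
```

For example, `can(["Purview Data Source Administrator"], Capability.PORTAL_ACCESS)` is false, which is why a scan administrator also needs the Data Reader or Data Curator role to use the portal.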
Access to Purview is through Purview Studio. On the home page there are 2 menus. The quick access menus in the centre of the page take you to
The left hand menu has 5 icons
Home – to see a summary of sources
Sources – to add and manage sources
Glossary – to add and manage glossary term collections
Insights – the visual dashboards of the assets
Management – a place to perform administrative tasks
There are 3 areas: Data Map, Data Catalog and Data Insights
The data map shows your sources in collection groupings of your choice. Many data types are included by default, with more being added all the time. Azure Purview is built on top of Apache Atlas, and with its API you can gain extra functionality.
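As a rough sketch of what using the Atlas API might look like, the snippet below builds (but does not send) a request for the Apache Atlas v2 basic search endpoint. The base URL, the bearer token parameter and the function name are assumptions made for illustration; the exact host name and authentication for a Purview account should be taken from the official documentation.

```python
import json
import urllib.request


def build_basic_search_request(atlas_base_url, keywords, type_name=None,
                               limit=25, token=None):
    """Build a POST request for the Atlas v2 basic search endpoint.

    atlas_base_url is assumed to end in ".../api/atlas/v2"; the real
    host for a given Purview account is not shown here.
    """
    body = {"query": keywords, "limit": limit, "offset": 0}
    if type_name:
        body["typeName"] = type_name  # restrict results to one entity type

    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the service accepts an OAuth bearer token.
        headers["Authorization"] = f"Bearer {token}"

    return urllib.request.Request(
        url=atlas_base_url.rstrip("/") + "/search/basic",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Placeholder host; nothing is sent over the network here.
request = build_basic_search_request(
    "https://atlas.example.invalid/api/atlas/v2", "customer", type_name="hive_table"
)
```

Sending the request (for example with `urllib.request.urlopen(request)`) is left out on purpose, since it needs a real endpoint and credentials.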
Once a data source is added, with the related permissions granted on the asset, regular scans can be scheduled. Network isolation gives the ability to scan on-premises and Azure data sources behind a VNet using a self-hosted integration runtime (SHIR), ensuring end-to-end (E2E) network isolation.
Governing your scan has these steps
This contains a set of terms the business uses. There might be multiple terms in the business that mean the same thing.
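The idea of several business terms meaning the same thing can be sketched as a lookup from synonyms to one agreed term. The glossary entries, field names and helper functions below are made up for illustration and do not reflect how Purview stores its glossary.

```python
# Hypothetical glossary: each agreed term lists the other names the business uses.
GLOSSARY = {
    "Customer": {
        "definition": "A person or organisation that buys from us.",
        "synonyms": {"Client", "Account holder"},
    },
    "Revenue": {
        "definition": "Income from sales before costs.",
        "synonyms": {"Turnover", "Sales income"},
    },
}


def build_synonym_index(glossary):
    """Map every term and synonym (lower-cased) to its agreed term."""
    index = {}
    for term, entry in glossary.items():
        index[term.lower()] = term
        for synonym in entry["synonyms"]:
            index[synonym.lower()] = term
    return index


def resolve(name, index):
    """Return the agreed glossary term for a name, or None if unknown."""
    return index.get(name.strip().lower())


index = build_synonym_index(GLOSSARY)
```

With this index, `resolve("client", index)` and `resolve("Account holder", index)` both come back as `"Customer"`, so reports that use different wording can still be tied to one definition.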
There are some default attributes that exist and can be enhanced by custom attributes. The default attributes are
Classification and labelling
Labelling of data is important to aid with communication. Consistency is important. Classifications can describe
Azure Purview provides a set of default classification rules which are automatically detected during scanning. Purview uses the same classifications, also known as sensitive information types, as Microsoft 365. Azure Purview integrates with Microsoft Information Protection sensitivity labels. There is automated scanning and labelling for files in Azure Blob storage and Azure Data Lake Storage Gen 1 and Gen 2, and automatic labelling for database columns in SQL Server, Azure SQL Database, Azure SQL Database Managed Instance, Azure Synapse and Azure Cosmos DB. The default classification rules are not editable, although it is possible to define your own custom classification rules using Regex or custom expressions. Classification looks at the data, e.g. select top 100 from customers, for data profiling, customer pattern matching and expression matching, and applies the classification tags afterwards.
This enables you to search your data, use workflows in the business glossary and view data lineage for sources in the data ingestion pipeline. It connects with tools such as Power BI, Azure Data Factory and Azure Synapse. It enables curation and collaboration.
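To make the idea of a custom Regex rule concrete, here is a minimal sketch of sampling column values, matching them against patterns and applying a tag afterwards. The rule names, the patterns and the `min_match_ratio` threshold are assumptions chosen for illustration; this is not Purview's own classification engine.

```python
import re

# Hypothetical custom rules: tag name -> pattern a whole value must match.
CUSTOM_RULES = {
    "UK_POSTCODE": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}"),
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def classify_column(sample_values, rules=CUSTOM_RULES, min_match_ratio=0.6):
    """Return the tags whose pattern matches enough of the sampled values.

    sample_values stands in for a small sample of a column, such as the
    first 100 rows; empty values are ignored.
    """
    values = [v for v in sample_values if v]
    if not values:
        return []

    tags = []
    for tag, pattern in rules.items():
        matches = sum(1 for v in values if pattern.fullmatch(v.strip()))
        # Only tag the column when enough of the sample matches the pattern.
        if matches / len(values) >= min_match_ratio:
            tags.append(tag)
    return tags
```

Calling `classify_column(["SW1A 1AA", "M1 1AE", "LS1 4AP"])` tags the column as `UK_POSTCODE`. The threshold is there so that a few stray values do not stop a mostly consistent column from being tagged.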
This is a key pillar in Purview to enable a single pane of glass in the catalog.
The insight reports that currently exist are
The answer lies in insights from data analytics, whether diagnostic, predictive or prescriptive analytics.
The Gartner model looks at 4 areas to gain competitive advantage.
While watching a presentation at Microsoft Build, I saw this slide shared, which adds an additional dimension of cognitive analytics. It is interesting to see the deeper insight shown in a diagram, with AI as the driving force through the advanced analytics areas.
So excited, such amazing news to receive my 4th MVP award. Thank you #Microsoft Just so humbled to receive this at this time. There is no better time to be a part of such an amazing community #SQLfamily #DataToboggan #AzureSynapse #MVPBuzz
So many exciting things going on that I'm involved in with the community after my PhD: Data Toboggan (its 3 conferences and user group), Data Relay, SQLBits, SQL Saturday and data research #data #ai #bigdata #analytics #datascience #dataanalytics #research #innovation #datastrategy #datagovernance #phd #artificialintelligence