Welcome

Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk . Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein



Wednesday, 28 July 2021

Data Governance: An Introduction

Initially published on the Coeo blog.  

Data Governance is a core area that businesses need to adopt in the data-driven world. Data has been around since the earliest of times, from the first libraries in the ancient world that started to collect and store information.

The collection of scientific research information, from census information about human populations, weather and spatial data to DNA genetic data, has contributed to the need to store data for analysis. The breadth of the information that is available for analysis covers our entire planet and beyond, and the population as well as different species. With our life and environment becoming documented to the finest degree, the need for categorisation, data labelling and data management has become ingrained in our society. Where research led the way in documenting and classifying data, business is now at a crucial time of growth and expansion to enable innovation.

All data carries a continual need for management, and a core starting place is data governance. The DAMA Dictionary of Data Management defines Data Governance as “The exercise of authority, control and shared decision making (planning, monitoring and enforcement) over the management of data assets”.

The goal of data governance is to help an organisation to manage data as an asset efficiently and effectively. It provides the principles, policy, processes, framework, metrics and oversight that are required to drive the most business value. Data governance programs aim to create sustainable data management, achieve good data quality that is measured, and define policies and practices. A much-needed consideration is culture, and embedding a culture of data management into the business.

We start with understanding what data assets a business has, from the core known data to dark data (data that is collected but not used). It is key to document the proliferation of duplicate data around a business. Given all the data breaches that keep occurring, often the first thing that comes to mind with data governance these days is compliance. The areas one thinks of here are:

  • Policies
  • Transparency
  • Governance
  • Regulations, such as GDPR
  • Standards
  • Rules
  • Law

These require data inventories and audits to understand what personal data your organisation collects, where it is stored, how it is protected and who may have access to it. This is part of the picture that needs to be considered.
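
As a minimal sketch of what one entry in such a data inventory could capture, the Python below models the four questions above: what personal data is collected, where it is stored, how it is protected and who may access it. The class name, field names and sample values are illustrative assumptions, not part of any standard or regulation.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryEntry:
    """One row of a hypothetical personal-data inventory."""
    dataset: str                      # name of the data asset
    personal_data: list[str]          # what personal data is collected
    location: str                     # where it is stored
    protection: str                   # how it is protected
    access: list[str] = field(default_factory=list)  # who may access it


def audit_gaps(entries: list[InventoryEntry]) -> list[str]:
    """Return datasets holding personal data with no protection or access recorded."""
    return [
        e.dataset
        for e in entries
        if e.personal_data and (not e.protection or not e.access)
    ]


inventory = [
    InventoryEntry("crm_customers", ["name", "email"], "Azure SQL Database",
                   "encrypted at rest", ["sales", "support"]),
    InventoryEntry("legacy_export", ["name", "address"], "file share", "", []),
]

gaps = audit_gaps(inventory)
print(gaps)
```

An audit along these lines only flags missing documentation; it says nothing about whether the recorded protection is adequate.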

DAMA-DMBOK is an international guiding framework for the management of data. The framework includes areas such as:

  • Data Strategy – defining, communicating and driving execution
  • Policy – metadata management, access, usage, security, quality
  • Standards and quality – data architecture and data quality standards
  • Oversight/audit/stewardship
  • Compliance
  • Data issue management – compliance, ownership, policy, terminology, data quality, data access
  • Data management improvement projects 
  • Data asset valuation – constantly defining the business value of data assets

Consideration for the allocation of roles and responsibilities within an operating model helps guide the adoption of best practices.

In conclusion, managing data assets within a business requires data management to be embedded in the culture of the organisation. Having high quality data leads to better business decisions. A core oversight function provided by a Chief Data Officer helps keep the day-to-day running of data at the forefront of everyone’s minds, and you never know where the next innovation will come from.

More Information

Thursday, 22 July 2021

SQLBits the greatest data show

How exciting to receive this. Thank you SQLBits for being the most amazing data conference. Looking forward to when face to face events return.



Saturday, 10 July 2021

Purview Readiness one-pager checklist

Starting with Azure Purview requires a few prerequisites. The checklist sets out 4 phases.

Identify where to start to establish a Governance baseline foundation for your organization for general cloud governance. 












Azure Purview Readiness Checklist

Purview-Deployment-Checklist will help prepare for data governance and data democratization in your environment.

There are four sections

  • Readiness
  • Build foundation
  • Register data sources
  • Curate and consume data

The Azure Purview deployment prerequisites can help with those proof of concept deployments


Azure Purview automated readiness checklist

A set of scripts has been written to help evaluate your existing environment for missing configuration that might prevent data sources being scanned. The PowerShell scripts are

  • Azure-Purview-automated-readiness-checklist.ps1
  • Azure-Purview-automated-readiness-checklist-csv-Input.ps1





















Tuesday, 6 July 2021

Azure Purview - A Data Catalog

Data Catalogs are becoming an essential component in the new data world. They are an inventory of an organisation's data assets. The metadata collected helps in finding the most appropriate data at speed and in knowing what data is held, its security levels and its lineage. Microsoft has a tool in preview that helps with data governance, called Azure Purview.

Azure Purview is a unified data governance service that helps you manage and govern your on-premises, multicloud and software-as-a-service (SaaS) data. Easily create a holistic, up-to-date map of your data landscape with automated data discovery, sensitive data classification and end-to-end data lineage. Empower data consumers to find valuable, trustworthy data.




Once installed it moves into the hands of data governors. There are predefined data plane roles dictating who can access what.

Purview Data Reader - access to the Purview portal and can read all content except for scan bindings

Purview Data Curator - access to the Purview portal and can read all content except for scan bindings, can edit information about assets, classification definitions and glossary terms, and can apply classifications and glossary terms to assets.

Purview Data Source Administrator - can manage all aspects of scanning data into Azure Purview but does not have read or write access to content beyond that related to scanning. The role does not have access to the Purview portal (the user also needs to be in the Data Reader or Data Curator role).
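
To make the interplay of the roles concrete, here is a small sketch in Python that models the three roles as sets of permissions. The permission labels (portal_access, manage_scanning and so on) are invented for illustration and are not Purview identifiers; the model is a simplification of the descriptions above.

```python
# Simplified model of the three data plane roles described above.
ROLE_PERMISSIONS = {
    "Purview Data Reader": {"portal_access", "read_content"},
    "Purview Data Curator": {"portal_access", "read_content", "edit_assets",
                             "edit_classifications", "edit_glossary"},
    "Purview Data Source Administrator": {"manage_scanning"},
}


def permissions_for(roles):
    """Union of the permissions granted by each assigned role."""
    granted = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS[role]
    return granted


def can_manage_scans_in_portal(roles):
    """Scanning rights alone do not open the portal; a reader or curator role is also needed."""
    return {"manage_scanning", "portal_access"} <= permissions_for(roles)


admin_only = ["Purview Data Source Administrator"]
admin_and_reader = admin_only + ["Purview Data Reader"]
print(can_manage_scans_in_portal(admin_only))
print(can_manage_scans_in_portal(admin_and_reader))
```

The point of the sketch is the combination: someone who sets up scans through the portal needs the Data Source Administrator role plus one of the other two.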

Access to Purview is through Purview Studio. On the home page there are 2 menus. The quick access menu in the centre of the page takes you to

  • Knowledge centre
  • Register sources
  • Browse assets
  • Manage glossary

The left-hand menu has 5 icons

Home – to see a summary of sources

Sources – to add and manage sources

Glossary – to add and manage glossary term collections

Insights – the visual dashboards of the assets

Management – a place to perform administrative tasks

 












 

There are 3 areas: Data Map, Data Catalog and Data Insights

Data Map

The data map shows your sources in collection groupings of your choice. Many data types are added by default, with more coming all the time. Azure Purview is built on top of Apache Atlas, and with its API you can gain extra functionality.
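
In the Apache Atlas v2 REST API, an asset is represented as an entity with a typeName and a set of attributes, of which qualifiedName acts as the unique key. The Python sketch below only builds such a payload and sends nothing. The type name and qualified name are hypothetical, and how a payload like this would be submitted to Purview (endpoint, authentication) is deliberately left out.

```python
import json


def build_atlas_entity(type_name, qualified_name, name, description=""):
    """Build an Apache Atlas v2 style entity payload (no request is sent)."""
    return {
        "entity": {
            "typeName": type_name,
            "attributes": {
                "qualifiedName": qualified_name,  # unique key for the asset
                "name": name,
                "description": description,
            },
        }
    }


payload = build_atlas_entity(
    type_name="example_custom_table",            # hypothetical type name
    qualified_name="example://sales/customers",  # hypothetical identifier
    name="customers",
    description="Customer master data",
)
print(json.dumps(payload, indent=2))
```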




Once a data source is added, with the related permissions granted on the asset, regular scans can be scheduled. Network isolation gives the ability to scan on-premises and Azure data sources behind a VNet using a self-hosted integration runtime (SHIR), ensuring end-to-end network isolation.

Governing your scan has these steps

  • Register your source
  • Apply and set up your credentials
  • Set up and run your scan
  • Discover your SQL Server data

Glossary

This contains a set of terms the business uses. There might be multiple terms in the business that mean the same thing.

  • synonyms - different terms with the same definition
  • related - different name with similar definition

There are some default attributes that exist and can be enhanced by custom attributes. The default attributes are

  • Name
  • Definition
  • Data stewards
  • Data experts
  • Acronym
  • Synonyms
  • Related terms
  • Resources
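
As a rough illustration of how a glossary term with these default attributes might be modelled, the Python sketch below uses a dataclass with one field per default attribute plus a dictionary for custom attributes. The class, the resolve helper and the sample term are assumptions made for the example; this is not how Purview stores terms internally.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """A business glossary term carrying the default attributes listed above."""
    name: str
    definition: str
    data_stewards: list[str] = field(default_factory=list)
    data_experts: list[str] = field(default_factory=list)
    acronym: str = ""
    synonyms: list[str] = field(default_factory=list)
    related_terms: list[str] = field(default_factory=list)
    resources: list[str] = field(default_factory=list)
    custom_attributes: dict[str, str] = field(default_factory=dict)


def resolve(term_name, glossary):
    """Find a term by its name or by one of its synonyms (case-insensitive)."""
    wanted = term_name.lower()
    for term in glossary:
        names = [term.name] + term.synonyms
        if wanted in (n.lower() for n in names):
            return term
    return None


glossary = [
    GlossaryTerm(name="Customer", definition="A party that buys goods or services.",
                 synonyms=["Client"], related_terms=["Prospect"],
                 custom_attributes={"business_unit": "Sales"}),
]
match = resolve("client", glossary)
print(match.name if match else "not found")
```

Synonyms resolve to the same term because they share a definition, whereas related terms are only pointers to other terms and are not treated as matches.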

Classification and labelling

Labelling of data is important to aid with communication. Consistency is important. Classifications can describe

  • The type of data that exists in a data asset or schema, to help identify the content of a data asset
  • A data preparation process
  • Information that helps with compliance

Azure Purview provides a set of default classification rules, which are applied automatically during scanning. Purview uses the same classifications, also known as sensitive information types, as Microsoft 365. Azure Purview integrates with Microsoft Information Protection sensitivity labels. There is automated scanning and labelling for files in Azure Blob Storage and Azure Data Lake Storage Gen 1 and Gen 2, and automatic labelling for database columns in SQL Server, Azure SQL Database, Azure SQL Database Managed Instance, Azure Synapse and Azure Cosmos DB. The default classification rules are not editable, although it is possible to define your own custom classification rules using Regex or custom expressions. Classification looks at a sample of the data (e.g. select top 100 from customers) for data profiling, customer pattern matching and expression matching, and applies the classification tags afterwards.
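
The idea of a custom regex classification rule applied to sampled data can be sketched in a few lines of Python. This is not Purview's implementation: the simplified UK postcode pattern, the tag name Example.UKPostcode and the 60% match threshold are all assumptions chosen for illustration, and the in-memory dictionary stands in for a sample of rows from a customers table.

```python
import re

# Hypothetical custom rule: a simplified UK postcode pattern, for illustration only.
POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")


def classify_column(values, pattern, min_match_ratio=0.6):
    """Return True when enough sampled, non-empty values fully match the pattern."""
    sample = [v.strip().upper() for v in values if v and v.strip()]
    if not sample:
        return False
    matches = sum(1 for v in sample if pattern.match(v))
    return matches / len(sample) >= min_match_ratio


def classify_table(sampled_columns, rules):
    """Map each column name to the classification tags whose rule matched."""
    return {
        column: [tag for tag, pattern in rules.items()
                 if classify_column(values, pattern)]
        for column, values in sampled_columns.items()
    }


# Stand-in for a sample such as the first rows of a customers table.
sample = {
    "postcode": ["SW1A 1AA", "M1 1AE", "unknown", "B33 8TH"],
    "name": ["Ada", "Grace", "Alan", "Edsger"],
}
tags = classify_table(sample, {"Example.UKPostcode": POSTCODE})
print(tags)
```

Using a threshold rather than requiring every value to match lets a column be tagged even when the sample contains a few blanks or placeholder values.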



 

 

















Data Catalog

This enables you to search your data, use workflows in the business glossary and view data lineage for sources in the data ingestion pipeline. It connects with tools such as Power BI, Azure Data Factory and Azure Synapse. It enables curation and collaboration.

Data Insights

This is a key pillar in Purview to enable a single pane of glass in the catalog.

The insight reports that currently exist are

  • Asset Insights
  • Scan Insights
  • Glossary Insights
  • Classification Insights
  • Sensitivity labelling insights
  • File Extension Insights

Sunday, 4 July 2021

Operational Intelligence

The answer lies in insights from data analytics, whether diagnostic, predictive or prescriptive.

The Gartner model looks at 4 areas to gain competitive advantage.



Watching a presentation at Microsoft Build, this slide was shared, adding an additional dimension of cognitive analytics. Interesting to see the deeper insight shown in a diagram, with AI as the driving force through the advanced analytics areas.


 

Thursday, 1 July 2021

2021-2022 Microsoft Most Valuable Professional

So excited, such amazing news to receive my 4th MVP award. Thank you #Microsoft Just so humbled to receive this at this time. There is no better time to be a part of such an amazing community #SQLfamily #DataToboggan #AzureSynapse #MVPBuzz







So many exciting things going on that I'm involved in with the community after my PhD: Data Toboggan (its 3 conferences and user group), Data Relay, SQLBits, SQL Saturday and data research #data #ai #bigdata #analytics #datascience #dataanalytics #research #innovation #datastrategy #datagovernance #phd #artificialintelligence