Welcome

Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk. Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein



Saturday, 10 July 2021

Purview Readiness one-pager checklist

Starting with Azure Purview requires a few prerequisites. The checklist sets out 4 phases.

First identify where to start, so that you can establish a baseline for general cloud governance in your organization.












Azure Purview Readiness Checklist

The Purview-Deployment-Checklist will help you prepare for data governance and data democratization in your environment.

There are four sections:

  • Readiness
  • Build foundation
  • Register data sources
  • Curate and consume data

The Azure Purview deployment prerequisites can help with proof of concept deployments.


Azure Purview automated readiness checklist

A set of scripts has been written to help evaluate your existing environment for missing configuration that might prevent data sources from being scanned. The PowerShell scripts are:

  • Azure-Purview-automated-readiness-checklist.ps1
  • Azure-Purview-automated-readiness-checklist-csv-Input.ps1





















Tuesday, 6 July 2021

Azure Purview - A Data Catalog

Data Catalogs are becoming an essential component in the new data world. They are an inventory of an organisation's data assets. The metadata collected helps you find the most appropriate data at speed and know what data is held, its security levels and its lineage. Microsoft has a tool in preview that helps with data governance, called Azure Purview.

Azure Purview is a unified data governance service that helps you manage and govern your on-premises, multicloud and software-as-a-service (SaaS) data. Easily create a holistic, up-to-date map of your data landscape with automated data discovery, sensitive data classification and end-to-end data lineage. Empower data consumers to find valuable, trustworthy data.




Once installed, it moves into the hands of data governors. There are predefined data plane roles dictating who can access what.

Purview Data Reader - access to the Purview portal and can read all content except for scan bindings

Purview Data Curator - access to the Purview portal and can read all content except for scan bindings, can edit information about assets, classification definitions and glossary terms, and can apply classifications and glossary terms to assets.

Purview Data Source Administrator - can manage all aspects of scanning data into Azure Purview but does not have read or write access to content beyond that related to scanning. The role does not have access to the Purview portal (the user also needs to be in the Data Reader or Data Curator role).
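
To make the interaction between the three roles concrete, here is a minimal sketch in Python. It is only an illustration of the descriptions above: the permission names (portal_access, read_content, edit_content, manage_scans) and both helper functions are hypothetical and are not part of any Azure Purview API.

```python
# Hypothetical permission names, summarising the role descriptions above.
ROLE_PERMISSIONS = {
    "Purview Data Reader": {"portal_access", "read_content"},
    "Purview Data Curator": {"portal_access", "read_content", "edit_content"},
    "Purview Data Source Administrator": {"manage_scans"},
}


def effective_permissions(roles):
    """Return the union of the permissions granted by each assigned role."""
    granted = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS[role]
    return granted


def can_manage_scans_in_portal(roles):
    """A user needs scan rights AND portal access to manage scans in the portal."""
    return {"portal_access", "manage_scans"} <= effective_permissions(roles)


if __name__ == "__main__":
    admin_only = ["Purview Data Source Administrator"]
    admin_and_reader = admin_only + ["Purview Data Reader"]
    print(can_manage_scans_in_portal(admin_only))
    print(can_manage_scans_in_portal(admin_and_reader))
```

The sketch shows why a Data Source Administrator is normally also given the Data Reader or Data Curator role: without one of them there is no portal access.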

Access to Purview is through Purview Studio. On the home page there are 2 menus. The quick access menu in the centre of the page takes you to:

  • Knowledge centre
  • Register sources
  • Browse assets
  • Manage glossary

The left-hand menu has 5 icons:

Home – to see a summary of sources

Sources – to add and manage sources

Glossary – to add and manage glossary term collections

Insights – the visual dashboards of the assets

Management – a place to perform administrative tasks

 












 

There are 3 areas: Data Map, Data Catalog and Data Insights.

Data Map

The data map shows your sources in collection groupings of your choice. Many data types are supported by default, with more being added all the time. Azure Purview is built on top of Apache Atlas, and its API gives you extra functionality.




Once a data source is added and the related permissions are granted on the asset, regular scans can be scheduled. Network isolation makes it possible to scan on-premises and Azure data sources behind a VNet using a self-hosted integration runtime (SHIR), keeping the network isolated end to end (E2E).

Governing your scan has these steps:

  • Register your source
  • Apply and set up your credentials
  • Set up and run your scan
  • Discover your SQL Server data

Glossary

This contains a set of terms the business uses. There might be multiple terms in the business that mean the same thing.

  • synonyms - different terms with the same definition
  • related - different names with similar definitions

Some default attributes exist and can be extended with custom attributes. The default attributes are:

  • Name
  • Definition
  • Data stewards
  • Data experts
  • Acronym
  • Synonyms
  • Related terms
  • Resources
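
As a rough illustration of how these attributes fit together, the sketch below models a glossary term as a Python dataclass. The class, its field names and the link_synonyms helper are assumptions made for this example; they mirror the default attributes listed above but are not the Purview or Apache Atlas schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GlossaryTerm:
    """Hypothetical model of a glossary term with the default attributes."""
    name: str
    definition: str
    data_stewards: List[str] = field(default_factory=list)
    data_experts: List[str] = field(default_factory=list)
    acronym: str = ""
    synonyms: List[str] = field(default_factory=list)
    related_terms: List[str] = field(default_factory=list)
    resources: List[str] = field(default_factory=list)
    # Custom attributes extend the defaults; keys are chosen by the business.
    custom_attributes: Dict[str, str] = field(default_factory=dict)


def link_synonyms(first: GlossaryTerm, second: GlossaryTerm) -> None:
    """Record two terms as synonyms of each other (same definition, different name)."""
    if second.name not in first.synonyms:
        first.synonyms.append(second.name)
    if first.name not in second.synonyms:
        second.synonyms.append(first.name)


if __name__ == "__main__":
    customer = GlossaryTerm("Customer", "A party that buys goods or services.")
    client = GlossaryTerm("Client", "A party that buys goods or services.")
    link_synonyms(customer, client)
    customer.custom_attributes["Review cycle"] = "Annual"  # example custom attribute
    print(customer.synonyms, client.synonyms)
```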

Classification and labelling

Labelling data aids communication, and the labels need to be applied consistently. A classification can:

  • describe a type of data that exists in a data asset or schema, to help identify the content of the asset
  • describe a data preparation process
  • help with compliance

Azure Purview provides a set of default classification rules, which are applied automatically during scanning. Purview uses the same classifications, also known as sensitive information types, as Microsoft 365. Azure Purview integrates with Microsoft Information Protection sensitivity labels.

There is automated scanning and labelling for files in Azure Blob storage and Azure Data Lake Storage Gen 1 and Gen 2. There is automatic labelling for database columns in SQL Server, Azure SQL Database, Azure SQL Database Managed Instance, Azure Synapse and Azure Cosmos DB.

The default classification rules are not editable, although it is possible to define your own custom classification rules using Regex or custom expressions. Classification looks at a sample of the data (e.g. select top 100 from customers) for data profiling, custom pattern matching and expression matching, and applies the classification tags afterwards.
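
The idea behind a regex-based custom classification rule can be sketched in a few lines of Python. This is not how Purview implements classification; the rule names, the patterns and the 60% match threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical custom rules: classification tag -> regular expression.
CUSTOM_RULES = {
    "UK_POSTCODE": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}"),
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def classify_column(sample_values, rules=CUSTOM_RULES, threshold=0.6):
    """Tag a column when enough sampled values fully match a rule's pattern.

    sample_values stands in for a small sample of rows taken from the column;
    threshold is an assumed minimum fraction of matching values.
    """
    values = [v.strip() for v in sample_values if v and v.strip()]
    if not values:
        return []
    tags = []
    for tag, pattern in rules.items():
        matches = sum(1 for v in values if pattern.fullmatch(v))
        if matches / len(values) >= threshold:
            tags.append(tag)
    return tags


if __name__ == "__main__":
    print(classify_column(["a@example.com", "b@example.org", "n/a"]))
    print(classify_column(["SW1A 1AA", "M1 1AE", "EC1A 1BB"]))
```

Matching against a sample rather than every row keeps the scan cheap, and the threshold tolerates a few dirty values in an otherwise consistent column.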



 

 

















Data Catalog

This enables you to search your data, use workflows in the business glossary and view data lineage for sources in the data ingestion pipeline. It connects with tools such as Power BI, Azure Data Factory and Azure Synapse. It enables curation and collaboration.

Data Insights

This is a key pillar in Purview to enable a single pane of glass in the catalog.

The insight reports that currently exist are:

  • Asset Insights
  • Scan Insights
  • Glossary Insights
  • Classification Insights
  • Sensitivity labelling insights
  • File Extension Insights

Sunday, 4 July 2021

Operational Intelligence

The answers lie in insights from data analytics, whether diagnostic, predictive or prescriptive.

The Gartner model looks at 4 areas to gain competitive advantage.



While watching a presentation at Microsoft Build, I saw this slide, which adds a further dimension: cognitive analytics. It was interesting to see the deeper insight shown in a diagram, with AI as the driving force through the advanced analytics areas.


 

Thursday, 1 July 2021

2021-2022 Microsoft Most Valuable Professional

So excited, such amazing news to receive my 4th MVP award. Thank you #Microsoft. Just so humbled to receive this at this time. There is no better time to be a part of such an amazing community #SQLfamily #DataToboggan #AzureSynapse #MVPBuzz







So many exciting things going on that I'm involved in with the community after my PhD: Data Toboggan (its 3 conferences and user group), Data Relay, SQLBits, SQL Saturday and data research #data #ai #bigdata #analytics #datascience #dataanalytics #research #innovation #datastrategy #datagovernance #phd #artificialintelligence



Tuesday, 29 June 2021

Data Culture

Data culture is a term that has been talked about for many years. It is about using data to drive an organization's decisions. McKinsey states there are seven principles that underpin a healthy data culture:

  • data culture is a decision culture
  • data culture is a C-Suite imperative, and that of the board
  • the democratization of data
  • data culture puts risk at its core
  • culture catalysts with people bridging data science and on the ground operations
  • sharing data beyond company walls is shifting from in-house competitive advantage to assembling the breadth of the best data assets in the market
  • marrying talent and culture 

Alation has started publishing quarterly State of Data Culture reports. The latest report is June 2021. In that report they share a Data Culture Index (DCI), a quantitative assessment of how well organisations are positioned to enable data-driven decision making. The index is based on data search and discovery, data literacy and data governance. I do think that data culture is also about data ethics.












The report states that the top initiative to foster data culture is managing data governance and improving data quality.

Monday, 28 June 2021

Distinguished Engineer and Research Fellows

I have always been fascinated by the job roles of distinguished engineer and research fellow. To me it seems these roles transcend research and industry. Distinguished engineer is a title applied to someone who is thought (by those conferring the title) to have achieved noteworthy technical and professional accomplishments while working as an engineer. A research fellow holds an academic research position at a university or research institution, usually as a member of academic staff or faculty.

I came across a slide show on behaviours and qualities of an IBM Distinguished Engineer. An interesting quote is on the diagram "I want to be distinguished from the rest; to tell the truth, a friend to all mankind is not a friend for me" 





















Distinguished Engineers and IBM Fellows are world-famous inventors and theorists. A Distinguished Engineer has a unique and fascinating job that transcends many boundaries. A few of the attributes they mention:
 
  • Eminence takes responsibility 
  • Be learned, erudite 
  • Integrity and trust in all things
  • Learn from your mistakes
  • Apply common sense
  • Make decisions
  • Have a point of view
  • Provide hope
  • Inspire others
  • Be collaborative
  • Be optimistic and cheerful
  • Adapt proactively
  • Be curious and fearless
  • Build a track record and keep notes!
  • Know yourself and be true to you
  • Enhance your communication skills and image
  • Listen actively
  • Be a mentor and coach
  • Lead diverse teams
  • Be a member of a professional body 

Tuesday, 22 June 2021

Combining research and industry learning

I am very privileged to have an article about my career published in the ACM journal: Computing enabled me to... obtain a PhD. and a Career in Data.

The DOI reference for my paper is https://doi.org/10.1145/3464919.

Communications of the ACM Volume 64 Issue 7 pp 7

About the journal

ACM, the world's largest educational and scientific computing society, delivers resources that advance computing as a science and a profession. ACM provides the computing field's premier Digital Library and serves its members and the computing profession with leading-edge publications, conferences, and career resources. They see a world where computing helps solve tomorrow’s problems – where we use our knowledge and skills to advance the profession and make a positive impact.

Research 

Being part of the research world is a huge part of who I am and it is very important to have research and industry working together to help shape the future of data innovation.