Dr Victoria Holt: life, the universe and everything: 2021

Wednesday, 22 December 2021

Open Data Campaign

Microsoft talk about closing the data divide and the need for open data, which helps remove barriers to data innovation. A progress report on the data divide is available.

Microsoft launched five data collaboration principles in connection with this.

Open: We will work to make data relevant to important social problems as open as possible, including by contributing open data ourselves.

Usable: We will invest in creating new technologies and tools, governance mechanisms, and policies to make data more usable for everyone.

Empowering: We will help organizations generate value from their data according to their choices and develop their AI talent to use data effectively and independently.

Secure: We will employ security controls to ensure data collaboration is operationally secure where it is desired.

Private: We will help organizations protect individuals’ privacy in data sharing collaborations that involve personally identifiable information.

Sunday, 19 December 2021

Data Toboggan - our year 2021

We have had a very busy first year and enjoyed it all. Thank you to all who made our conferences possible: organisers, speakers, attendees, our logo designer and those who helped share our events.

We are back on 29 January 2022.

Call for Speakers https://bit.ly/DT22CFS

Wednesday, 15 December 2021

Azure Purview Dataset provisioning by data owner for Azure Storage (preview)

Guide: how a data owner can enable access to data stored in Azure Storage from Azure Purview

To enable access policy enforcement for the Azure Storage account, the following PowerShell commands need to be run in the subscription where the Azure Storage account resides. The registration applies to all Azure Storage accounts in that subscription.

# Install the Az module

Install-Module -Name Az -Scope CurrentUser -Repository PSGallery -Force

# Log in to the subscription

Connect-AzAccount -Subscription <SubscriptionID>

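# Register the feature

Register-AzProviderFeature -FeatureName AllowPurviewPolicyEnforcement -ProviderNamespace Microsoft.Storage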

Note: Only new Storage accounts, created in the subscription after the feature AllowPurviewPolicyEnforcement is registered, will comply with access policies published from Purview.

Wednesday, 8 December 2021

Put Responsible AI into Practice

I attended a digital event on 7 December, where Microsoft launched the Ten Guidelines for Product Leaders to Implement AI Responsibly, following their own journey. This is a really useful document, collated from diverse perspectives and from lived and professional skill sets. It is where technology meets society, and business and research have been working together to enhance the output.

Microsoft shared their path to a responsible AI governance model.

AETHER - AI & Ethics in Engineering & Research
ORA - Office of Responsible AI
RAISE - Responsible AI Strategy in Engineering

The AI guidelines process has 3 stages:

Assess & prepare
Design, build, & document
Validate & support

The report explains the actionable steps.

There is a Responsible AI dashboard, which is helpful for actionable insights. It includes: Error Analysis, Model Statistics, Data Explorer, Aggregate feature importance, What-if counterfactuals and Causal analysis.

There is a Responsible AI Toolbox to get started with.

Friday, 3 December 2021

The Chief Data Officer seat at the table

Microsoft have written a white paper called Microsoft Azure: The Chief Data Officer (CDO) Seat at the Cloud Table. It is an interesting paper to read.

The white paper looks at four areas:

Journey
Framework
Product
Resources

The document mentions other related capabilities that need to be included in a data governance program:

Data Discoverability
Data Quality Management
Data Access Management
Data Compliance
Data Lifecycle Management
Data Health Scorecards

Microsoft’s data governance approach is listed as:

Set the scope of data governance for your organization
Set enterprise data governance requirements through policies and standards
Set ownership and accountability for data governance
Start with a unified, metadata-driven vision with automation
Iterate, not big bang
Educate and enable change
Monitor and revise

The core capabilities of the Data Management Capability Assessment Model (DCAM) closely align with the Microsoft framework. DCAM establishes the data strategy, positions the business case, implements the operating model, and ensures funding and supportive organizational collaboration.

The holistic list of capabilities highlighted in the CDMC from the EDM Council is:

Data Cataloguing and Discovery
Data Classification
Data Ownership
Data Security
Data Sovereignty and Cross-Border Data Sharing
Data Quality
Data Lifecycle Management
Data Entitlements and Access Tracking
Data Lineage
Data Privacy
Trusted Source Management and Data Contracts
Ethical Use and Purpose
Master Data Management

Monday, 29 November 2021

PASS Summit Keynote: Unified Data Governance with Azure Purview

Raghu Ramakrishnan, CTO for Data and Technical Fellow at Microsoft, spoke at PASS Community Summit in November and explained the next part of the vision for data governance: policy. Microsoft see data governance as the emerging data pillar, so that the pillars are operational databases, a unified analytics platform, and unified automated data governance. The unified part is the important element going forward: a single pane to extend governance across the entire data estate. Automated data classification removes the PII headache of missing personal data and pushes control up the stack to knowledge workers. Microsoft intend to have dynamic data provenance that is fully integrated with the 6 responsible AI principles. Azure Purview will operate a central RBAC control and is the future governing permission state for SQL Server, with full propagation. With AI integrated, the policy feature will be human readable. A recording of the session is available to watch.

Data governance is increasingly interdisciplinary and the discovery of data is core to a business. Questions often asked: what data do we have? where did the data originate? can I trust the data?

Compliance has been a major area of data governance. Questions often asked here:

what’s my exposure to risk? is my data usage compliant? how do I control access and use the data? what is required by regulation X?

Raghu talked about the data governance journey through the lens of GDPR compliance. Their approach was to create a 'Data Map' of all data across Microsoft and use that map to support GDPR compliance. Data discovery covers search and discover, the information supply chain, stewards/curators and the business glossary. Data use governance covers policy authoring and management, reporting, access and governance enforcement, and industry compliance. These two areas are built on an intelligent data inventory: a data map with automated structure and lineage collection, automated and custom classification, and publication/subscription APIs.

The Purview data catalog is a self-service tool filled with details from knowledge workers. Areas include:
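To make the idea of automated and custom classification a little more concrete, here is a minimal sketch of rule-based column classification feeding a toy data map. It is only an illustration under my own assumptions: the rule names, the patterns, the 80% threshold and the sample columns are all hypothetical, and this is not how Azure Purview implements its classifiers.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical classification rules (assumptions for illustration only).
# "Employee ID" stands in for a custom, organisation-specific rule.
RULES: Dict[str, Pattern[str]] = {
    "Email Address": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "IPv4 Address": re.compile(r"^\d{1,3}(\.\d{1,3}){3}$"),
    "Employee ID": re.compile(r"^EMP\d{6}$"),
}


def classify_column(samples: List[str], rules: Dict[str, Pattern[str]],
                    threshold: float = 0.8) -> List[str]:
    """Return each label whose pattern matches at least `threshold`
    of the non-empty sampled values."""
    values = [s.strip() for s in samples if s and s.strip()]
    if not values:
        return []
    labels = []
    for label, pattern in rules.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            labels.append(label)
    return labels


# Sampled values per column (made-up data).
columns = {
    "contact": ["ana@example.com", "bob@example.org", "n/a",
                "cy@example.net", "di@example.com"],
    "staff_ref": ["EMP000123", "EMP004567", "EMP000001"],
    "notes": ["called customer", "left voicemail"],
}

# The toy "data map": column name -> list of classification labels.
data_map = {name: classify_column(vals, RULES) for name, vals in columns.items()}
print(data_map)
```

Sampling with a threshold, rather than requiring every value to match, is a design choice that lets a column still be labelled when it contains a few placeholder values such as "n/a".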

self-service search and browse
curated and standardized business glossaries
interactive lineage visualization
simplified data curation and stewardship

The data estate insights currently show these:

data asset distribution
business glossary
data classification and labelling
data location and movement (in progress)

The Microsoft vision is: data in the Microsoft cloud is always governed and, beyond Azure, Purview offers a single pane to extend governance across the entire data estate.

Still looking through the lens of GDPR compliance, data classification is an important feature.

Dynamic lineage deep dive

He talked about the increased efficiency of extracting dynamic SQL provenance and the 6 responsible AI areas of fairness, inclusiveness, reliability and safety, transparency, privacy and security, and accountability. He linked responsible AI to provenance: machine learning (ML) training and audit need the provenance of ML models as a requirement. There are a number of challenges to address to enable this.
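As a rough illustration of why provenance matters for auditing an ML model, the sketch below walks a small lineage graph upstream from a model to the original data sources. The asset names and the dictionary representation are hypothetical assumptions of mine, not anything shown in the session or exposed by Purview.

```python
from typing import Dict, List, Set

# Hypothetical lineage edges: each asset maps to the assets it was derived from.
LINEAGE: Dict[str, List[str]] = {
    "churn_model_v2": ["training_set"],
    "training_set": ["customers_clean", "usage_daily"],
    "customers_clean": ["crm.customers"],
    "usage_daily": ["telemetry.events"],
}


def upstream_sources(asset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Walk the graph upstream and return the root assets,
    meaning those with no recorded parents."""
    roots: Set[str] = set()
    seen: Set[str] = set()
    stack = [asset]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guards against cycles and repeated visits
        seen.add(node)
        parents = lineage.get(node, [])
        if not parents and node != asset:
            roots.add(node)
        stack.extend(parents)
    return roots


# Which original sources fed the model?
sources = upstream_sources("churn_model_v2", LINEAGE)
print(sorted(sources))
```

An auditor could use a walk like this to answer "where did the data originate?" for any model or report in the estate.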

Centralized data access control

Proactive governance controls look at things like

Policy enforcement inside data services - access control was explained (in the early stages so may change)

For the future there was mention of an ABAC policy language, where ABAC = RBAC + Conditions: a human readable policy language for business users such as data officers or data owners. A policy statement can be represented as a tuple of {Effect, Action, Data resource, Subject, Condition}. The propagation of Purview policies to data repositories is asynchronous by design, with Purview as the single source of truth. SQL pulls updates asynchronously, so updates are not immediately visible locally, as with AAD logins.

The summary of the presentation was that Purview is creating a new data pillar of unified governance across the entire data estate. It is deeply integrated with SQL Server, extending its governance capabilities significantly.
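The policy tuple and the pull-based propagation described above can be sketched in a few lines of Python. This is only my own illustration of the idea: the class names, the "default deny, Deny wins" evaluation order and the prefix match on the data resource are all assumptions, and none of it is the Purview policy language or API, which was described as still being at an early stage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Attributes = Dict[str, str]


@dataclass(frozen=True)
class Policy:
    """One statement: {Effect, Action, Data resource, Subject, Condition}."""
    effect: str                               # "Allow" or "Deny"
    action: str                               # e.g. "Read"
    data_resource: str                        # resource path prefix covered
    subject: str                              # user or group
    condition: Callable[[Attributes], bool]   # the "+ Conditions" part of ABAC


def is_allowed(policies: List[Policy], subject: str, action: str,
               resource: str, attributes: Attributes) -> bool:
    """Assumed semantics: default deny, and any matching Deny wins."""
    allowed = False
    for p in policies:
        matches = (p.subject == subject and p.action == action
                   and resource.startswith(p.data_resource)
                   and p.condition(attributes))
        if not matches:
            continue
        if p.effect == "Deny":
            return False
        allowed = True
    return allowed


class CentralStore:
    """Stands in for the single source of truth (hypothetical)."""
    def __init__(self) -> None:
        self.policies: List[Policy] = []

    def publish(self, policy: Policy) -> None:
        self.policies.append(policy)


class LocalEnforcer:
    """Stands in for a data service that pulls policies on its own schedule."""
    def __init__(self, store: CentralStore) -> None:
        self._store = store
        self._cache: List[Policy] = []

    def pull(self) -> None:
        self._cache = list(self._store.policies)

    def check(self, subject: str, action: str, resource: str,
              attributes: Attributes) -> bool:
        return is_allowed(self._cache, subject, action, resource, attributes)


store = CentralStore()
sql = LocalEnforcer(store)
store.publish(Policy("Allow", "Read", "sales/", "analysts",
                     lambda a: a.get("region") == "EU"))

# Published centrally, but not yet visible locally until the next pull.
before_pull = sql.check("analysts", "Read", "sales/orders", {"region": "EU"})
sql.pull()
after_pull = sql.check("analysts", "Read", "sales/orders", {"region": "EU"})
wrong_region = sql.check("analysts", "Read", "sales/orders", {"region": "US"})
print(before_pull, after_pull, wrong_region)
```

The gap between `publish` and `pull` is the point of the sketch: the local service enforces whatever it last pulled, so a newly published policy only takes effect after the next pull.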

Sunday, 28 November 2021

Data Toboggan 2022

We are pleased to announce that Data Toboggan 2022 is back on 29 January 2022, 7:45 AM to 7:59 PM GMT.

Call for Speakers https://bit.ly/DT22CFS

Details

Join us for our THIRD all-day event specializing in Azure Synapse Analytics!

Azure Synapse Analytics is a practically limitless analytics service that brings together data integration, enterprise data warehousing and big data analytics. Let's spend a day exploring and showcasing these capabilities.

It is this analytical power that will help enable any organisation to transform from being reactive into being truly proactive, generating actionable insights that enable both business flow and timely decision-support.

This is a virtual event and free to attend.

As it's our third event, there will be 3 session types:

Standard Sessions : 45 minutes long.
Live Short Talks : 5 to 10 minutes.
Recorded Short Talks : 5 to 10 minutes.

We'll have the most awesome content we can find, with a wide range of speakers and experience. Check out our CFS page (https://sessionize.com/data-toboggan-2022/) if you're thinking of submitting!

Friday, 26 November 2021

Event Synopsis for Azure World Synapse Day

We were really excited to have run a new type of event, Azure World Synapse Day. The event crossed 3 time zones: APAC, EMEA and AMER.

The aim of the unconference was to have a lighter format that allowed people to share their personal stories, to share things about Azure Synapse technology, provide demos etc. Our interpretation of the unconference was

Our session planning run list looked like

Followed by an intermission

We had fun, we learnt a lot about running this type of event and wanted to thank our amazing speakers and attendees for sharing the event with us.

We will be running the same type of event next year. It will come further under the Data Toboggan brand as Data Toboggan - Alpine Coaster, so we will be coasting through those shorter sessions.

Tuesday, 16 November 2021

T-SQL Tuesday #144 – Data Governance reimagination - Wrap up

This month’s T-SQL Tuesday attracted some great responses! Thank you to everyone who participated!

My invitation for this month’s #tsql2sday was threefold, asking you to share your experiences of data governance:

The current cost of data governance versus its benefits
The amazing things data governance has enabled you to achieve or will enable you to achieve in the future
The potential uses for Azure Purview within your estates and the automated deployment options for that

Rob Farley published a post in reply

http://blogs.lobsterpot.com.au/2021/11/09/being-sure-of-your-data/

Rob raises some key points

But the checks that we do are more about things that the database can allow, but are business scenarios that should never happen.
You need to discover which situations cause people not to trust the data.
Data quality can lead to the trust, but only when it has been demonstrated repeatedly over time. Trust must be earned

Deborah Melkin published a post in reply

https://debthedba.wordpress.com/2021/11/09/t-sql-tuesday-144-data-governance/

Deborah Melkin talks about the switch to implement data governance.

It is about understanding your data from both the micro and macro level
It’s understanding where our data lives (data assets) and how data flows through data sources (data lineage) as well as how it’s consumed and used (data catalogs and data profiling). More importantly, this is knowledge that can be shared to make data even more valuable.
When you start expanding the number of databases and the complexity of how your systems work, the job of governance becomes a lot harder
Getting started with data governance seems like a very daunting task.

Data governance is a broad topic with many different areas, as can be seen from the replies. There is plenty for us to get started with and I'm looking forward to using Azure Purview to help with this.

Thank you for taking the time to write such insightful posts. That is the wrap up. If I’ve missed anyone please let me know and I’ll update the post.

Thursday, 11 November 2021

Drive a data culture to power a new class of data first applications

The PASS Data Community summit session keynote contained a section presented by Arun Ulag, Corporate Vice President of the Intelligence Platform at Microsoft.

The theme was: from data to intelligence, for everyone and for every decision, at any scale. He talked about data integration, analytics and business intelligence. The 3 messages were:

Empower every individual with AI capabilities such as the automatic report insights in Power BI with descriptive and diagnostic insights and insights on the move.

Empower every team with Power BI and Teams: your data is where you chat. Power BI has goals that include being driven by data, built for teams, AI powered and automated action.

The third message was empower every organisation with a complete analytics fabric, Power BI + Synapse.

The public preview of Hybrid Tables automatic aggregations was announced.

Observation data is the fastest growing data segment, with 175ZB of data expected by 2025 and 50 billion connected devices by 2030.

Azure Synapse Data Explorer was also announced as now in public preview.

Wednesday, 10 November 2021

Bridge to a new universe: the end-to-end Azure Data Platform

It was exciting to watch the Day 1 keynote, Bridge to a new universe: the end-to-end Azure Data Platform, delivered by Rohan Kumar and many other people.

A journey to a new universe is just waiting to inspire innovation, to tap into limitless possibilities and potential. It covered how to shape your data so you can harness its power to find a new galaxy of insights, answers, and predictions. Some amazing slides and discussion to set you on a new path.

Three universes were bridged together: unmatched analytics and insights, limitless cloud data services and unified data governance.

Rohan shared a great quote: "If you want to go fast, go alone. If you want to go far, go together." The SQL Server community has always worked together to achieve some amazing goals.

Three main data communities were discussed: SQLSaturday, Data Saturdays and the Azure Data Community.

SQL Server 2022 preview brings with it many new features.

There are to be two interlinking services: Azure Synapse Link and Azure Purview.

More details were discussed in the keynote but I will share those separately. As ever an inspiring future for us in the data community.

Read More

Microsoft Ignite book of news

Announcing SQL Server 2022 preview: Azure-enabled with continued performance and security innovation

Data Toboggan Azure World Synapse Day: Speakers

Data Toboggan have an amazing line up of speakers. We would love you to join us and support our amazing speakers who have given up their time to speak.

Speakers

Lakehouse in a nutshell: Serverless SQL pool + Aggs + PowerBI - Armando Lacerda

DW Automation for EDW in Synapse - Demo Only - Bob Duffy

Manage Packages on Synapse Spark - Dustin Vannoy

Migrating a Data Warehouse to Synapse Analytics - Andy Cutler

Patterns with Synapse Notebooks - Damien O'Connor

From Housekeeping to Data Engineer - My journey to find my passion - Jean Joseph

Secrets of SQL Dedicated Pool - Dennes Torres

Spreading the word about Azure Synapse Analytics - Sidney Cirqueira

Synapse and Power BI - Intro to a great data mix - Gaston Cruz

dbt & Synapse: have you seen SQL do this before? - Anders Swanson

Distributed Data in Dedicated SQL Pool - Rob Farley

Tuesday, 9 November 2021

Data Toboggan extravaganza

There is nothing so exciting as a surprise. Data Toboggan is trying something different. Take an international journey with us through 3 time zones:

APAC - 08:00 - 09:00 GMT

EMEA - 12:00 - 13:20 GMT

AMER - 17:00 - 18:20 GMT

Bite-size sessions to share how Azure Synapse Analytics inspires you, empowers you, and accelerates your business analytics. Register now: https://bit.ly/RegisterDTWSD21

The Session Titles

The Abstracts

The Speakers

Jean Joseph, Dennes Torres, Anders Swanson, Rob Farley, Bob Duffy, Andy Cutler, Damien O'Connor, Sidney Cirqueira, Armando Lacerda, Gaston Cruz, Dustin Vannoy

Tuesday, 2 November 2021

Ignite Innovate Anywhere From Multicloud to Edge

In Innovate Anywhere From Multicloud to Edge, Scott Guthrie shared a raft of technical updates. The themes:

Hybrid and multicloud
End-to-end data platform
Cloud native development
Developer velocity

Hybrid and multicloud updates announced

Deeper integration with VMware vSphere and Azure Stack HCI
Azure Virtual Desktop on Azure Stack HCI
Azure Arc enabled data services updates
Extension of Microsoft Defender to AWS

A data platform with end to end capability

SQL Server 2022 was announced

Azure Synapse Analytics announcements

Azure Data Explorer
Synapse Link SQL Server 2022
Synapse Link Dataverse (GA)

Amazing to see Azure Purview built into the new SQL Server platform.

Great to see Azure Purview and data governance mentioned everywhere.

Microsoft Ignite November 2021

Microsoft Ignite was opened by Satya Nadella on 2 November 2021. An inspiring session.

The headline for the opening was that our economy and society are undergoing a sea change of digitization. Satya talked about emerging technology trends and innovations across the Microsoft Cloud that will transform every business and industry going forward.

We are at a moment of real structural change. The case for digital transformation has never been so urgent. What will happen, and what we need to do to support our businesses, is core to the transition from the mobile and cloud era to one of ubiquitous computing and ambient intelligence.

There are four key trends that he mentioned:

Hybrid work - when and where we work
The trend for a hyper connected business with omnichannel reach with freely flowing data and intelligence
Every business is a digital business - multi cloud multi edge
The need to protect everything end to end, with security being the biggest risk

There is also a need for business to meet sustainability goals and track our own carbon footprint.

Microsoft Loop, a new collaborative canvas, was announced.

There were some other great transformational announcements.

Azure Arc-enabled machine learning – inferencing: Customers can now build, train and deploy machine learning models in on-premises, multi-cloud and edge computing environments.

There was a new Cognitive Service, 'Azure OpenAI Service', announced. Azure OpenAI Service is a new Azure Cognitive Service that provides customers access to OpenAI’s GPT-3 models with enterprise capabilities such as security, compliance and scale. He also talked about the breakthroughs in natural language processing.

Teams Connect is the central place for future collaboration.

A new platform layer, the metaverse, brings together many tools. The metaverse solutions:

Mesh for Microsoft Teams was announced, bringing immersive experiences to Microsoft Teams. It will enable new experiences with personalized avatars and immersive spaces where users can connect with presence and have shared immersive experiences.

Read the Microsoft Ignite Book of News: http://aka.ms/ignite-book-of-news for exciting news and updates. A major theme is about inclusivity and accessibility.
