Welcome

Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk. Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein



Monday, 18 November 2024

Data Toboggan Winter Edition 2025

We are excited to announce that the Call for Speakers is open for the winter edition, our main Data Toboggan conference. This will be our 5th year, so it is really exciting for us to reach that milestone.
This is our multi-track, 12-hour conference covering many time zones. Help us make this a really special event.

Event Date: Saturday 25 January 2025  

If you want to watch recordings from our summer conference or others, Data Toboggan Cool Runnings 2024 now has 68 videos on YouTube.




 

Thursday, 7 November 2024

PASS 24 Data Community Summit

This week is the PASS 24 Data Community Summit.







The Microsoft Keynote was entitled, Fuel AI innovation with Azure Databases.

The session abstract: Data is the fuel for innovation, driving the development of transformative AI applications. In this keynote, join Shireesh and the Microsoft engineering team as they explore harnessing the power of data across the Azure databases portfolio to drive ground breaking advancements. Take your first step on the journey with vector search, multi-agent apps and more and learn to unlock new patterns like retrieval augmented generation (RAG).  Explore the latest database innovations in SQL Server, Azure SQL, Azure Cosmos DB, Azure Database for PostgreSQL and Azure Database for MySQL that enhance operational efficiency, deliver personalized user experiences, and revolutionize the way we interact with technology. 

There were some great announcements 

Announcing

  • SSMS 21 and Copilot in SSMS
  • Azure SQL Database Hyperscale enhancements (storage up to 128 TB (GA), log throughput rate up to 150 MB per second (Preview), continuous priming (Preview))

Preview

  • Next-gen General Purpose on Azure SQL Managed Instance (32 TB of storage, 500 DBs, lower storage latency, improved storage performance, customisable I/O performance)
  • Migration Assessment from SQL Server enabled by Azure Arc
  • Mirroring in Fabric for Azure SQL Database and Azure Cosmos DB
  • In-database embedding generation
  • Vector Indexing and search in Azure Cosmos DB for NoSQL
  • Native vector type and functions in Azure SQL Database (Available now)
  • DiskANN indexing in Azure Database for PostgreSQL

Generally available

  • Bi-directional disaster recovery with link feature in Azure SQL Managed Instance
  • Flat and quantized flat vector indexes in Azure Cosmos DB for NoSQL
  • Serverless auto-pause delay to 15 min
  • Azure AI extension
  • Hyperscale elastic pools
  • Azure Cosmos DB for MongoDB enhancements

Tuesday, 5 November 2024

Data Toboggan Alpine Coaster 2024

Alpine Coaster is our shorter 3 time zone end of year unconference event. You know we like to mix things up with this version of Data Toboggan, so this time:

Sessions are around 20 minutes long.

There are three session editions to join: APAC 7:30 - 8:40 | EMEA 12:30 - 1:40 | AMER 5:30 - 6:10 GMT. Join one or all.

Please note, there will be no recordings taken on the day. We feel that kinda takes the pressure off a little for this shorter, more interactive type of event.

Register now for the Alpine Coaster event: Friday 8 November 2024

APAC Edition - https://www.meetup.com/data-toboggan/events/304285946/

EMEA Edition - https://www.meetup.com/data-toboggan/events/304285953/

AMER Edition - https://www.meetup.com/data-toboggan/events/304285956/





Wednesday, 30 October 2024

Microsoft Purview Data Catalog billing needs consent

There are changes to Microsoft Purview Data Catalog billing beyond 1 November 2024. 

NOTE: The new pricing model for Microsoft Purview Data Governance now goes into effect on January 6, 2025.

The article on Microsoft Purview Data Catalog billing consent explains the new pricing model for Microsoft Purview Data Governance. Users who have been using the data catalog for free will need to consent to the new pay-as-you-go billing model to continue using the service beyond this date. The consent process involves accepting the terms and conditions through a pop-up in the Microsoft Purview portal.

If the necessary consent is not provided by the Entra Global Admin, data governance concepts created in the unified portal will be retained for 120 days. The roles required to provide consent include the Microsoft Entra Global administrator or the Microsoft Entra Compliance administrator role, along with the DataMap curator role.


Read more about it here

https://learn.microsoft.com/en-us/purview/purview-portal-billing-consent

Monday, 21 October 2024

Microsoft Purview Data Loss Prevention policies have been extended to Fabric lakehouses

Microsoft Purview's Data Loss Prevention (DLP) policies have been extended to Fabric lakehouses and are now in public preview. This extension follows the success of DLP for Power BI and the general availability of Microsoft Fabric last year. DLP policies help automatically detect sensitive information as it is uploaded into lakehouses and take risk remediation actions to comply with regulations like GDPR and HIPAA. Compliance and Security administrators will receive audit logs for every detection and can set up alerts for sensitive information found in their Fabric lakehouses. There will be no charge for lakehouses scanned by DLP policies during the preview period.

You can read more here



Microsoft AI Tour London 2024

The Microsoft AI Tour in London on 21 October 2024 talked about solving the biggest challenges and uncovering new opportunities to empower your organization. The Opening Keynote was with Clare Barclay, Satya Nadella and Jared Spataro. There were discussions on building an agentic world where personal agents, organisational agents, business process agents and cross-organisation agents enable the use of AI in business to assist you with work. An example of this is Pages for AI artifacts, which enables you to work with the organisation and AI. The 3 areas of innovation are:

  • Copilot
  • Copilot & AI Stack
  • Copilot devices

These innovative areas will increase value and reduce waste. Copilot Studio for autonomous agents was also announced.

A very important announcement, for us in the UK, is that Microsoft and UK Government have signed a five-year agreement to support the new era of digital transformation. So a very exciting time for the UK to grow.

There is gravity with data. AI needs to bring all the data together in one platform like Microsoft Fabric, to simplify data and have App servers, AI apps and tools. Microsoft showed its unified data offering in IaaS, PaaS and SaaS to unify the data estate, at whatever stage a business is at, for databases and analytics.



Trust is the most important thing in #AI. You must have a set of principles considering Security, Privacy, Safety at the core.



AI is redefining Britain. It can take longer to realise value as change and adoption management takes time. It is cheaper to innovate than before so we need to think big, think bold and be ambitious.

Satya finished with 4 important points to note: expand opportunity, earn trust, protect fundamental rights and advance sustainability.

The announcement today can be read here

Autonomous agent capabilities across Copilot Studio and Dynamics 365 to help scale the impact of every individual, team, and business function

Monday, 30 September 2024

What's new for Data Toboggan

There are plenty of things coming from Data Toboggan.

Videos from the last conference are being published on YouTube 

Alpine Coaster call for speakers is open https://sessionize.com/data-toboggan-alpine-coaster-2024/

The Winter conference date to add to your diaries: 25 January 2025.



Wednesday, 25 September 2024

European Microsoft Fabric Community Conference Keynote

The Microsoft Fabric Community Conference is taking place this week in Stockholm, Sweden. There are 3300 attendees, which is amazing. The Keynote was entitled Microsoft Fabric Vision & Roadmap – Analytics in the Era of AI. The keynote speakers were Arun Ulag, Kim Manis, Amir Netz, Marco Casalaina, Wangui McKelvey and Rudra Mitra from Microsoft.

The announcements blog contains a vast number of innovative enhancements and capability updates. The blog is: European Fabric Community Conference 2024: Building an AI-powered data platform

Fabric is about providing a unified data platform for AI transformation with these 3 areas

  • AI-powered development: Fabric can give teams the AI-powered tools needed for any data project in a pre-integrated and optimized SaaS environment.
  • An AI-powered data estate: Fabric can help you access your entire multi-cloud data estate from a single, open data lake, work from the same copy of data across analytics engines, and use that data to power AI innovation 
  • AI-powered insights: Fabric can empower everyone to better understand their data with AI-powered visuals and Q&A experiences embedded in the Microsoft 365 apps they use every day. 

Microsoft Fabric is shown in the AI Copilot stack.

There is a new exam, DP-700, available to enhance skilling in data engineering. This certification measures Fabric data engineer skills:

  • Implementing and managing an analytics solution
  • Ingesting and transforming data
  • Monitoring and optimizing an analytics solution

Bringing all the skills together for data engineering, data science, and data analytics. You can read more here

There are numerous technical announcements.




You can see the list of GA features shipped for network security, data security and governance. There is an End-to-End Security Whitepaper to read at aka.ms/FabricSecurityWhitepaper

The public preview of the new copy job item provides
  • Built-in support for copying data between workspaces
  • Streamlined copying of data into and across OneLake
  • Set-up of automatic incremental copy

To help with deployment, the public preview was announced of the Terraform Provider for Microsoft Fabric. This will help empower deployment, management, and governance. This Microsoft Fabric Infrastructure as Code (IaC) new Terraform Provider lets you use the same tools and processes already employed for Microsoft Azure and other cloud resources.

Access Azure Databricks Unity Catalog tables directly from Fabric via the new Mirrored Azure Databricks Catalog feature, now in Public Preview https://blog.fabric.microsoft.com/en-US/blog/databricks-unity-catalog-tables-available-in-microsoft-fabric/




All the in-depth announcements are on the Fabric Blog

The unified data culture was mentioned between Fabric and Office connected through Power BI. 


There are lots more releases out there. This is such an exciting time in the Data and AI space. The exciting announcement is that the conference will be back next year. 


Microsoft Purview at the Fabric Conference

There are a number of Microsoft Purview sessions at the European Microsoft Fabric Conference this week in Sweden.

Check out the schedule 



Good Morning European Microsoft Fabric Community Conference

 Good Morning for day 2 of the European Microsoft Fabric Community Conference



Tuesday, 10 September 2024

Microsoft Security Community

Microsoft Purview helps in the fight against cyber attacks. There is a security community that you can join for updates. This blog lists some of the useful upcoming webinars in the Microsoft Purview space.



SEP 18 2024 Microsoft Purview | Microsoft Purview eDiscovery Modern UX

A discussion of the new Purview eDiscovery UI, with a walk-through of the new features, how the new UI has been optimized, and how the enhanced features that have been added help simplify and boost the user experience.

OCT 15 2024 Microsoft Purview | Navigating the Future of AI: Understanding the EU AI Act and Its Implications

The European Union Artificial Intelligence Act (EU AI Act) is a pioneering regulatory framework designed to oversee the development, deployment, and use of artificial intelligence (AI) technologies across the EU. This landmark legislation aims to ensure AI systems are safe, ethical, and transparent, prioritizing the protection of fundamental rights and user safety.

OCT 16 2024 Microsoft Purview | Introduction to Data Lifecycle Management (DLM) | Compliance Manager | Part 1

Data Lifecycle Management (DLM) is a comprehensive approach to managing data throughout its lifecycle, from creation and storage to archiving and deletion. DLM in Compliance Manager involves a structured approach to managing data through its entire lifecycle within the framework of regulatory and organizational compliance requirements.

Wednesday, 4 September 2024

Microsoft Purview Data Governance GA Rollout

The new Microsoft Purview solution is currently being rolled out. The details on when each region will be available are here: New Microsoft Purview Data Catalog deployment regions (Preview) | Microsoft Learn

This new updated tool is critical to make sure data is AI ready across the business. There is a good immersive tour of the Purview Data Governance solution: https://purviewdatagovernance.storylane.io/share/0fu0nmhvphk4





Monday, 2 September 2024

Microsoft Purview: Data Governance Solution is Generally Available

The new Microsoft Purview: Data Governance Solution became generally available on 1 September.

This release brings with it several new capabilities that have been added since the public preview launched in April 2024. This new release focuses on the practice of responsibly federated data governance built into the application. The release enables

  • modelling and use of durable business concepts to govern an evolving, multi-cloud physical data and AI estate
  • scaling the management of data quality and health in the data estate with the practice of federated data governance, with intelligent automation, and without displacing the people interaction
  • confidently activating responsible discovery, understanding, and access to data across organization-wide teams, enabling users to help innovate.


Wednesday, 7 August 2024

Microsoft Fabric Known Issues

There is a Power BI report that contains the list of known feature issues for the various areas of Microsoft Fabric. The report can be found here.



Tuesday, 6 August 2024

The AI Hub in Microsoft Purview

The Microsoft Purview AI hub is a preview feature that helps you govern and gain insight into generative AI apps, and helps protect them by mitigating and managing risks. With the explosion of generative AI this has become a crucial addition in the governance world.

The AI Hub helps with

  • strengthening information protection for Copilot
  • supporting compliance management for Copilot
  • fortifying data security for AI
  • insights and analytics that give a view of AI activity
  • policies to help protect and prevent data loss
  • compliance for optimal data handling

The use of Purview AI hub works well with other capabilities to improve data security.

  • Sensitivity labels and content encrypted by Microsoft Purview Information Protection
  • Data classification
  • Customer Key
  • Communication compliance
  • Auditing
  • Content search
  • eDiscovery
  • Retention and deletion
  • Customer Lockbox
The insights, policies and controls help secure data for Microsoft Copilot for Microsoft 365 and AI apps from third-party large language models (LLMs).







Thursday, 1 August 2024

The changing new Microsoft Purview Portal

Microsoft unveiled the new Purview portal, a game-changer for organizations navigating the complex landscape of data security and risk management. This change is ongoing within Microsoft Purview with the relocation of portal features and retirement of some features.


The portal's design facilitates easy navigation with the solution cards for quick access to the solutions within the Risk & Compliance, Data Governance, and Data Security sections.

The key feature areas are now

Data Security: It covers discovering and protecting sensitive information across your organization with a robust set of data security solutions, including Data Loss Prevention, Information Protection and Adaptive Protection.

Data Governance: It offers unified data governance solutions to manage services across your on-premises, multi-cloud, and SaaS environments. This includes creating an up-to-date map of your data estate with classification, data catalog, data quality (preview), lineage, data management, and data estate insights.

Risk and Compliance: The platform includes solutions to help minimize compliance risks and meet regulatory requirements, ensuring your organization stays ahead of the curve in data protection standards with eDiscovery and audit, communication compliance, data lifecycle management, and records management.


Wednesday, 17 July 2024

The Growth of Microsoft Purview

It has been an interesting few years watching the growth of a data governance solution into a product that truly helps govern business data. The growth of Purview has seen many changes over the last few years, to its current incarnation that becomes GA in September. The addition of in-product AI, the Responsible AI tooling, coming in the AI Hub, adds a new dimension, which will enable the next evolution for data governance. The integration of Microsoft Fabric and Microsoft Purview will help analytics solutions have better quality data out of the box.


#MicrosoftPurview


Microsoft Purview Data Governance generally available

The reimagined Microsoft Purview Data Governance will be generally available on 1 September 2024. This is really exciting news as the reimagined portal has so much amazing capability to help business data be better and more secure, and to help drive business growth.

The convergence of cyberattacks, increasing regulations, an ever-expanding data estate, and business demand for insights pressurizes business leaders to adopt a unified strategy to confidently ensure AI readiness. We see the unification of data security and governance capabilities creating a modern data governance solution that is easy for businesses to adopt, democratises data and addresses the challenges of AI.

The most important thing to understand is that Data Governance requires the business, people and processes, as well as a technological tool to automate data understanding. It is built into Microsoft Fabric with the capability of creating custom reports from Fabric data, with the data quality feature supporting any Fabric source.

As I shared before these are the collection of tools.



Sunday, 14 July 2024

That's a Wrap Data Toboggan Cool Runnings 2024

Data Toboggan Cool Runnings 2024 ran on 13 July 2024. We ran our largest event ever since inception in 2020. We were so pleased to be able to have 3 full tracks of sessions from speakers all round the world. The event ran for 12 hours, starting with presenters from Australia, through India, Europe, Scandinavia, UK and finishing in the USA. It was our largest event with over 400 attendees. It is our principle to never close registration until after the event finishes, so the event is open to register and join all day. It is the people who make the event: the speakers, the attendees, and helpers. The organising team had loads of fun putting the event together.

We hold 3 events a year.

  • Data Toboggan - Winter Edition (full conference | end of Jan)
  • Data Toboggan - Cool Runnings (full conference | beginning July)
  • Data Toboggan - Alpine Coaster  (unconference real world short talks | early November)

We also have a user group that meets from time to time when there are interesting areas to cover called Data Toboggan - Slide Preparation. 

The virtual event started covering Azure Synapse Analytics when it was launched and with the introduction of Microsoft Fabric we have grown with technology innovation and now cover Microsoft Fabric, Synapse, Data Engineering, Data warehousing, and Analysis. We have a track with Microsoft Purview and AI connected elements as well. 

We look forward to seeing you all at the next event.



Thursday, 11 July 2024

Seven time Microsoft MVP

I am pleased to share that I was renewed as a Microsoft Data Platform MVP yesterday. 

I am overjoyed and humbled to be a part of this amazing program for another year, my 7th year. It is such a privilege to be able to share knowledge and help others in the community grow. I am always excited about the amazing technology we have at our fingertips, such as Microsoft Fabric and Microsoft Purview. I am looking forward to another year contributing to, and growing, the amazing data community. Thank you Microsoft.

Award Category

Data Platform

Technology Area

Microsoft Purview, Azure Synapse Analytics


Tuesday, 9 July 2024

Data Toboggan Cool Runnings tracks

We have 3 full tracks at our next event on Saturday. 

Event Date: 13th July 2024 

Register Now: https://bit.ly/DTCR2024-Register

Agenda: https://bit.ly/DTCR2024-Agenda

This will be our biggest event yet.  The track names all have a story behind them.

Bramburg track



Pradaschier track



Rigi Kaltbad track



Saturday, 29 June 2024

Data Toboggan Cool Runnings 2024 - Microsoft Purview and Responsible AI

I am excited to announce that I will be speaking at Data Toboggan Cool Runnings on Saturday 13 July 2024.  

Register Now: https://bit.ly/DTCR2024-Register

Agenda: https://bit.ly/DTCR2024-Agenda


For those of you who don't know, Data Toboggan is a conference covering Microsoft Fabric, Synapse, Data Engineering, Data warehousing, and Analysis. Microsoft Purview is an integral part of Microsoft Fabric and the data sets the scene for AI and generative AI to proceed. This session covers where we are with legislation and tools, why good data quality is a must for AI, how responsible AI fits and how to get started.

Data Governance and Responsible AI, and the embellishment of AI within Microsoft Purview, aid and prepare business for using AI. Moving forward, I believe that combining the use of both Data Governance and Responsible AI into one actionable framework will bring immediate rewards to every business use case.

Hope you can join me on 13 July 2024  8:00 - 20:00 UK Time online

The Growth of Microsoft Purview

What is Microsoft Purview is a question I often get asked. The answer is never what people expect. Over the last couple of years the Microsoft Purview solution has been growing in capability with many different applications being brought together under one umbrella term, Microsoft Purview. It is now described as having three areas.

  • Data security for information and cybersecurity teams 
  • Data governance for data consumers, data engineers and data officers
  • Risk and compliance for risk, compliance and legal teams

When we talk about Microsoft Purview it helps to know what the business problem is to identify which areas of the product are required.

I put together a diagram to help map out all the applications to date. In the diagram it is clear to see the extent of the applications that Microsoft have brought together and added over the last year to the suite of tools. The documentation refers to the 3 high level areas: Risk and Compliance (shown in blue), Data Governance (shown in purple) and Security (shown in orange and green). I have depicted the AI hub in green rather than orange, because it covers a different conceptual area of Responsible AI and the compliance protection for generative AI apps.

When I talk about data governance I am looking at apps such as data estate health, roles and responsibilities for the data estate, the data catalogue, classification, lineage, all the new business domain areas such as a business glossary, and the really new data quality set of tools to control and monitor quality and the health of the data.

Thus, the new reimagined Microsoft Purview experience provides a holistic view of your data and enables better automated data management.



Sunday, 23 June 2024

Data Toboggan Cool Runnings

We are excited to have our next event coming up on 13th July 2024. This will be the biggest event yet with 3 full tracks of sessions and we are grateful to all the speakers who give up their time to help the community learn and grow. The event will have all the latest and greatest technical discussions. We hope you can join us for a fun-packed day of learning and community networking.

Register Now: https://bit.ly/DTCR2024-Register

Agenda: https://bit.ly/DTCR2024-Agenda





Wednesday, 22 May 2024

Microsoft Build Fabric: What's new and what's next

The Microsoft Fabric announcements were covered by Amir Netz, Arun Ulagaratchagan, Flavien Daussy, Adam Penhaul. The session is recorded and can be seen here Microsoft Fabric: What's new and what's next.

I live blogged this great main data session at Microsoft Build.

AI is changing the world. AI revolution is based on Data. Data is the fuel that powers AI. It is hard because of the amount of innovation and lots of diversity and complexity.




Purpose-built workloads. AI is built into Fabric. Governance is particularly important and is built in and driven through Microsoft Purview.

Aka.ms/try-fabric



There are weekly Fabric releases with 60-80 pages of blogs. The roadmap for these features can be found at

Aka.ms/FabricRoadmap


What is the point of having data in the lake if no one is using it? It is about immediate business access to the data.

A SaaS product that looks like Office.  No knobs to optimise Fabric. Results in hours.

  • Starts with built-in CI/CD
  • Creating deployment pipelines
  • And Taskflows (public Preview) to provide help to create things like the medallion architecture.


In Fabric you can now bring in partner workloads such as MDM and ESRI. The Microsoft Fabric Workload Development Kit was announced as Public Preview.


All your data, all your teams in one place. You can publish to the workload hub for a native Fabric workload experience. Aka.ms/FabDevKit

There are multiple methods to get data into Fabric for multi-clouds. Shortcuts to On-Premises Sources for OneLake was announced as Public Preview.


Not everything is stored in open formats (databases, for example), so Mirroring helps with this. There is free Mirroring storage for replicas.


Delta format is not the only open format. Iceberg is another major storage format. Transparent, simultaneous support of Delta Lake and Iceberg formats was just announced. It is now also possible to connect to Salesforce and not move the data. There is also an expanded partnership with Snowflake and Adobe.

There is now a unified API, with the public preview of the developer-friendly API for GraphQL giving access to all data in OneLake. (GraphQL uses JSON structures.)
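To illustrate the "GraphQL uses JSON structures" point, here is a minimal sketch of the usual GraphQL-over-HTTP convention: the query is sent as a JSON body with `query` and `variables` keys, and the result comes back as JSON under a `data` key. The `customers`, `items` and `name` fields and the sample response are made up for illustration; they are not taken from the Fabric API, and no real endpoint is called.

```python
import json


def build_graphql_request(query, variables=None):
    """Wrap a GraphQL query string in the JSON body a client would POST."""
    body = {"query": query}
    if variables:
        body["variables"] = variables
    return json.dumps(body)


# Hypothetical query: the schema (customers / items / name) is an assumption.
QUERY = """
query ($first: Int) {
  customers(first: $first) {
    items { name }
  }
}
"""

payload = build_graphql_request(QUERY, {"first": 5})

# Hypothetical response, shaped the way GraphQL servers reply: JSON under "data".
SAMPLE_RESPONSE = '{"data": {"customers": {"items": [{"name": "Contoso"}]}}}'

result = json.loads(SAMPLE_RESPONSE)
names = [item["name"] for item in result["data"]["customers"]["items"]]
```

In a real client the `payload` string would be posted to the GraphQL endpoint with an authorisation header; that part is left out here because the endpoint details are not covered in the session notes.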

Unified data culture requires real-time data. Microsoft announced Real-Time Intelligence. It uses the Real-Time hub powered by AI for data in motion. (OneLake data hub is for data at rest.)


So Real-Time Intelligence in the real world.

Copilot is integrated in every Microsoft Fabric experience. Copilot in Fabric is now Generally Available. This means AI-driven insights out of the box, along with custom generative AI for your data.


Announcing Public Preview of AI Skills in Fabric. It allows you to build your own Generative AI in Fabric.

Simple to get started

  • Create AI Skill
  • Add data – ground in data
  • Select tables to ground the data

Query in natural language

In conclusion, come and join the Microsoft Fabric Team in Stockholm, Sweden, 24-27 September 2024

Aka.ms/FabCon-Europe



Tuesday, 7 May 2024

Responsible AI Transparency Report

Microsoft have shared how they work with AI responsibly in this paper: Responsible AI Transparency Report: How we build, support our customers, and grow. The report outlines Microsoft’s approach to building generative AI applications responsibly, adhering to six core values of transparency, accountability, fairness, inclusiveness, reliability and safety, and privacy and security. The framework is all based around the govern, map, measure and manage cycle.

Govern 

Establishes the context for AI risk management, including adherence to policies and pre-deployment reviews.

  • Policies and principles
  • Procedures for pre-trained models
  • Stakeholder coordination
  • Documentation
  • Pre-deployment reviews

Map 

Involves identifying and prioritizing AI risks and conducting impact assessments to inform decisions.

  • Responsible AI Impact Assessments
  • Privacy and security review
  • Red teaming

Measure

Implements procedures to assess AI risks and the effectiveness of mitigations through established metrics.

  • Metrics for identified risks
  • Mitigations performance testing

Manage

Focuses on mitigating identified risks at both the platform and application levels, with ongoing monitoring and user feedback.

  • User agency
  • Transparency
  • Human review and oversight
  • Managing content risks
  • Ongoing monitoring
  • Defense in depth

These are all depicted in the diagram in the paper which is a very informative read.



References

Responsible AI Transparency Report How we build, support our customers, and grow

https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RW1l5BO

Thursday, 2 May 2024

Responsible AI – A Data Governance Approach

I am speaking at the Bath Azure User Group meeting about Responsible AI - a Data Governance approach. I see Responsible AI as a subset of Data Governance. This session covers where we are with legislation and tools, why good data quality is a must for AI and how to get started.

Data Governance and Responsible AI, and the embellishment of AI within Microsoft Purview, aid and prepare business for using AI. Moving forward, I believe that combining the use of both Data Governance and Responsible AI into one actionable framework will bring immediate rewards to every business use case.

Hope you can join us on 22 May 2024, 18:00 - 20:00, in Bath

https://lnkd.in/eRT8RijE 



Monday, 29 April 2024

Open Lakes



This is an insightful article entitled Open Lakes, Not Walled Gardens by Raghu Ramakrishnan and Josh Caplan.  

The Fabric design principles consider the 

Open Ecosystem

Ensuring there are no proprietary barriers to data in OneLake, allowing integration with other services.

Security and Governance 

Data in OneLake must be secure and governed, integrating with Microsoft Purview for global policies.

Creating accessible data with no Silos 

Making the entire data estate easily accessible in OneLake without unnecessary data duplication.

SaaS Simplicity

Providing a suite of analytic engines in a secure, governed environment with single sign-on.

The article discusses the concept of open lakes for analytics, emphasizing the need for a unified view of data across an enterprise’s data estate to draw true insights. It covers the advancements in big data tools, cloud storage, machine learning, and AI models, which offer opportunities to analyze core assets and processes through data in the Golden Age of Analytics.

The Microsoft implementation of the open lake vision with OneLake and Fabric focuses on data storage, analytics, sharing, and governance integrated with Microsoft Purview for data estate-wide governance. It outlines the importance of securing and governing enterprise data, detailing how OneLake and Fabric address these needs with built-in features and integration with Microsoft Purview for global data estate governance.

Governance for the organization, estate-level, and policy enforcement and sharing of data is a core tenet. Governance within Fabric and OneLake covers organizational governance and Estate-Level Governance, where Microsoft Purview provides a global view of the entire data estate, offering a central catalog for all assets across all sources, global policies to secure sensitive data, and support for managing critical data risks and regulatory compliance. Policy Enforcement and Data Sharing are also discussed.

Thursday, 25 April 2024

Data Governance, Data Ethics and Responsible AI video series

I wanted to be able to share some thoughts on 3 of my favourite topics: Data Governance, Data Ethics and Responsible AI. There are many tools that help frame the subject area from a data management perspective, and there are useful Microsoft tools to help you down the responsible AI and Governance route. There is a wealth of information available and I wanted, in under 5 minutes a video, to empower people with useful tips to move forward quickly in this important space. So it is an easily digestible series that is time efficient and has standalone content with an overall theme.
  • Data Governance to help govern and manage that data to improve trust and data quality
  • Data Ethics to help mitigate issues with data integrity and provenance
  • Responsible AI to look at bias, fairness and efficacy in decisions

Episode 1 Introduction

Episode 2 what is data governance

Episode 3 what is data ethics

Episode 4 What is Responsible AI

Episode 5 Responsible AI Tools Microsoft Standard v2

Episode 6 Responsible AI Tools Impact Assessment and guide

Episode 7 Responsible AI Tools HAX Toolkit

Episode 8 Responsible AI Tools Maturity Model

Episode 9 The EU Act

Episode 10 UK Government Assurance

Episode 11 Content Safety

Episode 12 Responsible AI Dashboard

Watch this space as the next set of videos will cover how this fits in with data quality and how Microsoft Purview can help with data preparation.

The Age of Data Governance

Microsoft Purview is rapidly changing in the data governance space. It is offering Data value creation with essential defense & response offense. This new addition helps businesses address the issue that AI outputs are only as good as the quality of the data that resides behind them.

Peter Aiken's new definition of data governance is 'Managing data decisions with guidance'.


Suma Manohar has written a great article talking about data quality in the era of AI. Microsoft Purview introduced domains and data products, adding clear business context and terminology mapping. Enhanced search capability, to provide more understanding using Copilot, is available. It can also help with suggesting Data Quality rules. These autogenerated rules are context specific.

Creating data quality rules manually in Purview should follow the 6 standard data quality metrics.

  • Freshness – confirms that all values are up to date.
  • Duplicate rows – checks rows to find repeated values across two or more columns.
  • Empty/blank fields – looks for blank and empty fields in a column where there should be values.
  • Unique values – confirms that values in a column are unique.
  • Data type match – confirms that values in a column match data type requirements.
  • String format match – confirms that text values in a column match a specific format or other requirements.
  • Table lookup – confirms that a value in one table can be found in a specific column of another table.
  • Custom – create a custom rule with the visual expression builder.
  • Regular expressions can be used for pattern matching in the above.
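To make a few of the rule types above concrete, here is a minimal sketch in plain Python. It is not the Purview rule engine or its API; the sample rows, column names, the email pattern and the 30-day freshness window are all assumptions made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Hypothetical sample rows; the column names are made up for this sketch.
ROWS = [
    {"id": 1, "email": "a@example.com", "updated": datetime(2024, 4, 24)},
    {"id": 2, "email": "", "updated": datetime(2024, 4, 20)},
    {"id": 2, "email": "not-an-email", "updated": datetime(2023, 1, 1)},
]


def empty_fields(rows, column):
    """Empty/blank fields: rows whose value in `column` is None or blank."""
    return [r for r in rows if r[column] is None or str(r[column]).strip() == ""]


def unique_values(rows, column):
    """Unique values: True when no value in `column` is repeated."""
    values = [r[column] for r in rows]
    return len(values) == len(set(values))


def duplicate_rows(rows, columns):
    """Duplicate rows: rows whose combination of `columns` was already seen."""
    seen, duplicates = set(), []
    for r in rows:
        key = tuple(r[c] for c in columns)
        if key in seen:
            duplicates.append(r)
        else:
            seen.add(key)
    return duplicates


def string_format_match(rows, column, pattern):
    """String format match: non-blank values that fail the regular expression."""
    regex = re.compile(pattern)
    return [r for r in rows if r[column] and not regex.fullmatch(r[column])]


def stale_rows(rows, column, as_of, max_age):
    """Freshness: rows whose timestamp is older than `max_age` at `as_of`."""
    return [r for r in rows if as_of - r[column] > max_age]


blank = empty_fields(ROWS, "email")
ids_unique = unique_values(ROWS, "id")
repeated = duplicate_rows(ROWS, ["id"])
bad_format = string_format_match(ROWS, "email", r"[^@\s]+@[^@\s]+\.[^@\s]+")
stale = stale_rows(ROWS, "updated", datetime(2024, 4, 25), timedelta(days=30))
```

Each function returns the failing rows (or a pass/fail flag) so that a score could be worked out as the share of rows that pass, which is roughly the idea behind a data quality score.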

When working on data quality there are standard guidelines that can help. The method I use draws firstly on the DAMA-DMBOK and then on the Data Management Capability Assessment Model (DCAM).

Scans take place to show the quality score and trends in the data quality dashboard, and scores are shown on the data product page.

The rollout of the new solution across the regions is shared here.