Chaos, complexity, curiosity and database systems. A place where research meets industry
Welcome
"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein
Monday, 18 November 2024
Data Toboggan Winter Edition 2025
Thursday, 7 November 2024
PASS 24 Data Community Summit
This week is the PASS 24 Data Community Summit.
The Microsoft keynote was entitled Fuel AI innovation with Azure Databases.
The session abstract: Data is the fuel for innovation, driving the development of transformative AI applications. In this keynote, join Shireesh and the Microsoft engineering team as they explore harnessing the power of data across the Azure databases portfolio to drive groundbreaking advancements. Take your first step on the journey with vector search, multi-agent apps and more and learn to unlock new patterns like retrieval augmented generation (RAG). Explore the latest database innovations in SQL Server, Azure SQL, Azure Cosmos DB, Azure Database for PostgreSQL and Azure Database for MySQL that enhance operational efficiency, deliver personalized user experiences, and revolutionize the way we interact with technology.
There were some great announcements
Announcing
- SSMS 21 and Copilot in SSMS
- Azure SQL Database Hyperscale enhancements (storage up to 128TB (GA), log throughput rate up to 150 MB per second (Preview), continuous priming (Preview))
Preview
- Next gen general purpose on Azure SQL Managed Instance (32TB of storage, 500 DBs, Lower storage latency, improved storage importance, customisable I/O performance)
- Migration Assessment from SQL Server enabled by Azure Arc
- Mirroring in Fabric for Azure SQL Database and Azure Cosmos DB
- In-database embedding generation
- Vector Indexing and search in Azure Cosmos DB for NoSQL
- Native vector type and functions in Azure SQL Database (Available now)
- DiskANN indexing in Azure Database for PostgreSQL
Generally available
- Bi-directional disaster recovery with link feature in Azure SQL Managed Instance
- Flat and quantized flat vector indexes in Azure Cosmos DB for NoSQL
- Serverless auto-pause delay to 15 min
- Azure AI extension
- Hyperscale elastic pools
- Azure Cosmos DB for MongoDB enhancements
Tuesday, 5 November 2024
Data Toboggan Alpine Coaster 2024
Alpine Coaster is our shorter 3 time zone end of year unconference event. You know we like to mix things up with this version of Data Toboggan, so this time:
Sessions are around 20 minutes long.
There are three session editions to join: APAC 7:30 - 8:40 | EMEA 12:30 - 1:40 | AMER 5:30 - 6:10 GMT. Join one or all.
Please note, there will be no recordings taken on the day. We feel that kinda takes the pressure off a little for this shorter, more interactive type of event.
Register now for the Alpine Coaster event: Friday 8 November 2024
APAC Edition - https://www.meetup.com/data-toboggan/events/304285946/
EMEA Edition - https://www.meetup.com/data-toboggan/events/304285953/
AMER Edition - https://www.meetup.com/data-toboggan/events/304285956/
Wednesday, 30 October 2024
Microsoft Purview Data Catalog billing needs consent
There are changes to Microsoft Purview Data Catalog billing beyond 1 November 2024.
NOTE: The new pricing model for Microsoft Purview Data Governance now goes into effect on January 6, 2025.
The article on Microsoft Purview Data Catalog billing consent explains the new pricing model for Microsoft Purview Data Governance. Users who have been using the data catalog for free will need to consent to the new pay-as-you-go billing model to continue using the service beyond this date. The consent process involves accepting the terms and conditions through a pop-up in the Microsoft Purview portal.
If the necessary consent is not provided by the Entra Global Admin, data governance concepts created in the unified portal will be retained for 120 days. The roles required to provide consent include the Microsoft Entra Global administrator or the Microsoft Entra Compliance administrator role, along with the DataMap curator role.
Read more about it here
https://learn.microsoft.com/en-us/purview/purview-portal-billing-consent
Monday, 21 October 2024
Microsoft Purview Data Loss Prevention policies have been extended to Fabric lakehouses
Microsoft Purview's Data Loss Prevention (DLP) policies have been extended to Fabric lakehouses and are now in public preview. This extension follows the success of DLP for Power BI and the general availability of Microsoft Fabric last year. DLP policies help automatically detect sensitive information as it is uploaded into lakehouses and take risk remediation actions to comply with regulations like GDPR and HIPAA. Compliance and Security administrators will receive audit logs for every detection and can set up alerts for sensitive information found in their Fabric lakehouses. There will be no charge for lakehouses scanned by DLP policies during the preview period.
You can read more here
Microsoft AI Tour London 2024
The Microsoft AI Tour in London on 21 October 2024 talked about solving the biggest challenges and uncovering new opportunities to empower your organization. The Opening Keynote was with Clare Barclay, Satya Nadella and Jared Spataro. There were discussions on building an agentic world where personal agents, organisational agents, business process agents and cross organisation agents enable the use of AI in business to assist you with work. An example of this is Pages for AI artifacts, which enables you to work with the organisation and AI. The 3 areas of innovation are:
- Copilot
- Copilot & AI Stack
- Copilot devices
These innovative areas will increase value and reduce waste. Copilot Studio for autonomous agents was also announced.
A very important announcement, for us in the UK, is that Microsoft and UK Government have signed a five-year agreement to support the new era of digital transformation. So a very exciting time for the UK to grow.
There is gravity with data. AI needs to bring all the data together in one platform like Microsoft Fabric, to simplify data and have App servers, AI apps and tools. Microsoft showed its unified data offering in IaaS, PaaS and SaaS to unify the data estate, at whatever stage a business is at, for databases and analytics.
Trust is the most important thing in #AI. You must have a set of principles considering Security, Privacy, Safety at the core.
AI is redefining Britain. It can take longer to realise value as change and adoption management takes time. It is cheaper to innovate than before so we need to think big, think bold and be ambitious.
Satya finished with 4 important points to note: expand opportunity, earn trust, protect fundamental rights and advance sustainability.
The announcement today can be read here
Monday, 30 September 2024
What's new for Data Toboggan
There are plenty of things coming from Data Toboggan.
Videos from the last conference are being published on YouTube
Alpine Coaster call for speakers is open https://sessionize.com/data-toboggan-alpine-coaster-2024/
Wednesday, 25 September 2024
European Microsoft Fabric Community Conference Keynote
The Microsoft Fabric Community Conference is taking place this week in Stockholm, Sweden. There are 3300 attendees which is amazing. The Keynote was entitled Microsoft Fabric Vision & Roadmap – Analytics in the Era of AI. The keynote speakers; Arun Ulag, Kim Manis, Amir Netz, Marco Casalaina, Wangui McKelvey and Rudra Mitra from Microsoft.
The announcements blog contains a vast number of innovative enhancements and capability updates. The blog is: European Fabric Community Conference 2024: Building an AI-powered data platform
Fabric is about providing a unified data platform for AI transformation with these 3 areas
- AI-powered development: Fabric can give teams the AI-powered tools needed for any data project in a pre-integrated and optimized SaaS environment.
- An AI-powered data estate: Fabric can help you access your entire multi-cloud data estate from a single, open data lake, work from the same copy of data across analytics engines, and use that data to power AI innovation
- AI-powered insights: Fabric can empower everyone to better understand their data with AI-powered visuals and Q&A experiences embedded in the Microsoft 365 apps they use every day.
Microsoft Fabric is shown in the AI Copilot stack.
There is a new exam available to enhance skilling in data engineering: DP-700. This certification measures Fabric data engineer skills:
- Implementing and managing an analytics solution
- Ingesting and transforming data
- Monitoring and optimizing an analytics solution
Bringing all the skills together for data engineering, data science, and data analytics. You can read more here
There are numerous technical announcements. You can see the list of GA features shipped for network security, data security and governance. There is an End-to-End Security Whitepaper to read at aka.ms/FabricSecurityWhitepaper
- Built-in support for copying data between workspaces
- Streamline copying data into and across OneLake
- Setup automatic incremental copy
To help with deployment, the public preview of the Terraform Provider for Microsoft Fabric was announced. This will help empower deployment, management, and governance. This new Terraform Provider brings Infrastructure as Code (IaC) to Microsoft Fabric and lets you use the same tools and processes already employed for Microsoft Azure and other cloud resources.
- Check out the provider on the Terraform Registry: https://aka.ms/FabricTF
- To help you get started, there is a handy Quick Starts repository with examples: GitHub Examples for using the Terraform Provider for Microsoft Fabric
There are lots more releases out there. This is such an exciting time in the Data and AI space. The exciting announcement is that the conference will be back next year.
Microsoft Purview at the Fabric Conference
There are a number of Microsoft Purview sessions at the European Microsoft Fabric Conference this week in Sweden.
Check out the schedule
Tuesday, 10 September 2024
Microsoft Security Community
Microsoft Purview helps in the fight against cyber attacks. There is a security community that you can join for updates. This blog lists some of the useful upcoming webinars in the Microsoft Purview space.
SEP 18 2024 Microsoft Purview | Microsoft Purview eDiscovery Modern UX
This session discusses the new Purview eDiscovery UI, walks through the new features, and covers how the new UI has been optimized and which enhanced features have been added to help simplify and boost the user experience.
OCT 15 2024 Microsoft Purview | Navigating the Future of AI: Understanding the EU AI Act and Its Implications
The European Union Artificial Intelligence Act (EU AI Act) is a pioneering regulatory framework designed to oversee the development, deployment, and use of artificial intelligence (AI) technologies across the EU. This landmark legislation aims to ensure AI systems are safe, ethical, and transparent, prioritizing the protection of fundamental rights and user safety.
OCT 16 2024 Microsoft Purview | Introduction to Data Lifecycle Management (DLM) | Compliance Manager | Part 1
Data Lifecycle Management (DLM) is a comprehensive approach to managing data throughout its lifecycle, from creation and storage to archiving and deletion. DLM in Compliance Manager involves a structured approach to managing data through its entire lifecycle within the framework of regulatory and organizational compliance requirements.
Wednesday, 4 September 2024
Microsoft Purview Data Governance GA Rollout
The new Microsoft Purview solution is currently being rolled out. Details of when each region will be available are here: New Microsoft Purview Data Catalog deployment regions (Preview) | Microsoft Learn.
This new updated tool is critical to make sure data is AI ready across the business. There is a good immersive tour of the Purview Data Governance solution: https://purviewdatagovernance.storylane.io/share/0fu0nmhvphk4
Monday, 2 September 2024
Microsoft Purview: Data Governance Solution is Generally Available
The new Microsoft Purview: Data Governance Solution became generally available on 1 September.
This release brings several new capabilities that have been added since the public preview launched in April 2024. It focuses on the practice of responsible, federated data governance built into the application. The release enables:
- modelling and use of durable business concepts to govern an evolving, multi-cloud physical data and AI estate
- scaling the management of data quality and health in the data estate through federated data governance and intelligent automation, without displacing human interaction
- confidently enabling teams across the organization to responsibly discover, understand, and access data, helping users to innovate.
Wednesday, 7 August 2024
Microsoft Fabric Known Issues
There is a Power BI report that contains the list of known feature issues for the various areas of Microsoft Fabric. The report can be found here.
Tuesday, 6 August 2024
The AI Hub in Microsoft Purview
The Microsoft Purview AI hub is a preview feature that helps you govern and gain insight into generative AI apps, and helps protect them by mitigating and managing risks. With the explosion of generative AI this has become a crucial addition in the governance world.
The AI Hub helps with
- strengthening information protection for Copilot
- supporting compliance management for Copilot
- fortifying data security for AI
- insights and analytics that enable that view on AI activity
- policies to help protect and prevent data loss
- compliance for optimal data handling
- Sensitivity labels and content encrypted by Microsoft Purview Information Protection
- Data classification
- Customer Key
- Communication compliance
- Auditing
- Content search
- eDiscovery
- Retention and deletion
- Customer Lockbox
Thursday, 1 August 2024
The changing new Microsoft Purview Portal
Microsoft unveiled the new Purview portal, a game-changer for organizations navigating the complex landscape of data security and risk management. This change is ongoing within Microsoft Purview, with the relocation of some portal features and the retirement of others. The new Microsoft Purview portal features the following changes.
The portal's design facilitates easy navigation with the solution cards for quick access to the solutions within Risk & Compliance, Data Governance, and Data Security sections.
The key feature areas are now:
Data Security: covers discovering and protecting sensitive information across your organization with a robust set of data security solutions, including Data Loss Prevention, Information Protection and adaptive protection.
Data Governance: It offers unified data governance solutions to manage services across your on-premises, multi cloud, and SaaS environments. This includes creating an up-to-date map of your data estate with classification, data catalog, data quality (preview), lineage, data management, and data estate insights.
Risk and Compliance: The platform includes solutions to help minimize compliance risks and meet regulatory requirements, ensuring your organization stays ahead of the curve in data protection standards with eDiscovery and audit, communication compliance, data lifecycle management, and records management.
Wednesday, 17 July 2024
The Growth of Microsoft Purview
Microsoft Purview Data Governance generally available
The reimagined Microsoft Purview Data Governance will be generally available 1 September 2024. This is really exciting news as the reimagined portal has so much amazing capability to help business data be better, more secure and help drive business growth.
The convergence of cyberattacks, increasing regulations, an ever-expanding data estate, and business demand for insights pressurizes business leaders to adopt a unified strategy to confidently ensure AI readiness. We see the unification of data security and governance capabilities creating a modern data governance solution that is easy for businesses to adopt, democratises data and addresses the challenges of AI.
The most important thing to understand is that Data Governance requires the business, people and processes as well as a technological tool to automate data understanding. It is built into Microsoft Fabric with the capability of creating custom reports from Fabric data with the data quality feature supporting any Fabric source.
As I shared before these are the collection of tools.
Sunday, 14 July 2024
That's a Wrap Data Toboggan Cool Runnings 2024
Data Toboggan Cool Runnings 2024 ran on 13 July 2024. We ran our largest event ever since inception in 2020. We were so pleased to be able to have 3 full tracks of sessions from speakers all round the world. The event ran for 12 hours, starting with presenters from Australia, through India, Europe, Scandinavia, UK and finishing in the USA. It was our largest event with over 400 attendees. It is our principle to never close registration until after the event finishes, so the event is open to register and join all day. It is the people who make the event: the speakers, the attendees, and helpers. The organising team had loads of fun putting the event together.
We hold 3 events a year.
- Data Toboggan - Winter Edition (full conference | end of Jan)
- Data Toboggan - Cool Runnings (full conference | beginning July)
- Data Toboggan - Alpine Coaster (unconference real world short talks | early November)
We also have a user group that meets from time to time when there are interesting areas to cover called Data Toboggan - Slide Preparation.
The virtual event started covering Azure Synapse Analytics when it was launched and with the introduction of Microsoft Fabric we have grown with technology innovation and now cover Microsoft Fabric, Synapse, Data Engineering, Data warehousing, and Analysis. We have a track with Microsoft Purview and AI connected elements as well.
We look forward to seeing you all at the next event.
Thursday, 11 July 2024
Seven time Microsoft MVP
I am pleased to share that I was renewed as a Microsoft Data Platform MVP yesterday.
I am overjoyed and humbled to be a part of this amazing program for another year, my 7th year. It is such a privilege to be able to share knowledge and help others in the community grow. I am always excited about the amazing technology we have at our fingertips, such as Microsoft Fabric and Microsoft Purview. I am looking forward to another year contributing to, and growing, the amazing data community. Thank you Microsoft.
Award Category
Data Platform
Technology Area
Microsoft Purview, Azure Synapse Analytics
Tuesday, 9 July 2024
Data Toboggan Cool Runnings tracks
We have 3 full tracks at our next event on Saturday.
Event Date: 13th July 2024
Register Now: https://bit.ly/DTCR2024-Register
Agenda: https://bit.ly/DTCR2024-Agenda
This will be our biggest event yet. The track names all have a story behind them.
Bramburg track
Pradaschier track
Rigi Kaltbad track
Saturday, 29 June 2024
Data Toboggan Cool Runnings 2024 - Microsoft Purview and Responsible AI
I am excited to announce that I will be speaking at Data Toboggan Cool Runnings on Saturday 13 July 2024.
Register Now: https://bit.ly/DTCR2024-Register
Agenda: https://bit.ly/DTCR2024-Agenda
Data Governance and Responsible AI, and the embellishment of AI within Microsoft Purview aid and prepare business for using AI. Moving forward I believe that combining the use of both Data Governance and Responsible AI into one actionable framework will bring immediate rewards to every business use case.
Hope you can join me on 13 July 2024 8:00 - 20:00 UK Time online
The Growth of Microsoft Purview
What is Microsoft Purview is a question I often get asked. The answer is never what people expect. Over the last couple of years the Microsoft Purview solution has been growing in capability with many different applications being brought together under one umbrella term, Microsoft Purview. It is now described as having three areas.
- Data security for information and cybersecurity teams
- Data governance for data consumers, data engineers and data officers
- Risk and compliance for risk, compliance and legal teams
When we talk about Microsoft Purview it helps to know what the business problem is to identify which areas of the product are required.
I put together a diagram to help map out all the applications to date. In the diagram it is clear to see the extent of the applications that Microsoft have brought together and added over the last year, to the suite of tools. The documentation refers to the 3 high level areas Risk and Compliance (shown in blue), Data Governance (shown in purple) and Security (shown in orange and green). I have depicted the AI hub in green rather than orange, because it covers a different conceptual area: Responsible AI and compliance protection for generative AI apps.
When I talk about data governance I am looking at apps such as data estate health, roles and responsibilities for the data estate, the data catalogue, classification, lineage, all the new business domain areas such as a business glossary and the really new data quality set of tools to control and monitor quality and the health of the data.
Thus, the new reimagined Microsoft Purview experience provides a holistic view of your data and enables better automated data management.
Sunday, 23 June 2024
Data Toboggan Cool Runnings
We are excited to have our next event coming up on 13th July 2024. This will be the biggest event yet with 3 full tracks of sessions and we are grateful to all the speakers who give up their time to help the community learn and grow. The event will have all the latest and greatest technical discussions. We hope you can join us for a fun-packed day of learning and community networking.
Register Now: https://bit.ly/DTCR2024-Register
Agenda: https://bit.ly/DTCR2024-Agenda
Wednesday, 22 May 2024
Microsoft Build Fabric: What's new and what's next
The Microsoft Fabric announcements were covered by Amir Netz, Arun Ulagaratchagan, Flavien Daussy, Adam Penhaul. The session is recorded and can be seen here Microsoft Fabric: What's new and what's next.
I live blogged this great main data session at Microsoft Build.
AI is changing the world. AI revolution is based on Data. Data is the fuel that powers AI. It is hard because of the amount of innovation and lots of diversity and complexity.
Purpose built workloads. AI is built into Fabric. Governance is particularly important and is built in and driven through Microsoft Purview.
Aka.ms/try-fabric
There are weekly Fabric releases with 60-80 pages of blogs. The roadmap for these features can be found at
Aka.ms/FabricRoadmap
What is the point of having data in the lake if no one is using it? It is about immediate business access to the data.
A SaaS product that looks like Office. No knobs to optimise Fabric. Results in hours.
- Starts with built-in CI/CD
- Creating deployment pipelines
- And Taskflows (public Preview) to provide help to create things like the medallion architecture.
In Fabric you can now bring in partner workloads such as MDM and ESRI. The Microsoft Fabric Workload Development Kit was announced as Public Preview.
All your data, all your teams in one place. You can publish to the workload hub for a native Fabric workload experience. Aka.ms/FabDevKit
There are multiple methods to get data into Fabric for multi-clouds. Shortcuts to On-Premises Sources for OneLake was announced as Public Preview.
Not everything is stored in open formats, databases for example, so Mirroring helps with this. There is free Mirroring storage for replicas.
Delta format is not the only open format. Iceberg is another major storage format. Transparent simultaneous support of Delta Lake and Iceberg formats was just announced. It is now also possible to connect to Salesforce and not move the data. There is also now an expanded partnership with Snowflake and Adobe.
There is a unified API, with the public preview of the developer-friendly API for GraphQL giving access to all data in OneLake. (GraphQL uses JSON structures).
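To make the "JSON structures" remark concrete, here is a minimal sketch of how a GraphQL request and response are typically carried as JSON over HTTP. It is not taken from the session: the `customers` field, its columns and the `items` wrapper are hypothetical names, and no real endpoint is called.

```python
import json

# Hypothetical query: the `customers` field and its columns are invented
# for illustration and do not come from a real Fabric schema.
QUERY = """
query ($first: Int!) {
  customers(first: $first) {
    items { customerId name }
  }
}
"""


def build_request_body(query, variables):
    """Serialize a GraphQL request into the JSON body sent with an HTTP POST."""
    return json.dumps({"query": query, "variables": variables})


def extract_items(response_text, field):
    """Return the rows under data.<field>.items, raising if errors were reported."""
    payload = json.loads(response_text)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"][0].get("message", "GraphQL error"))
    return payload["data"][field]["items"]


# A hand-written sample response, assumed to mirror the shape of the query.
SAMPLE_RESPONSE = json.dumps(
    {"data": {"customers": {"items": [
        {"customerId": 1, "name": "Contoso"},
        {"customerId": 2, "name": "Fabrikam"},
    ]}}}
)

body = build_request_body(QUERY, {"first": 2})
names = [row["name"] for row in extract_items(SAMPLE_RESPONSE, "customers")]
```

The query text itself is written in the GraphQL query language; it is the envelope around it (query, variables, data, errors) that is JSON.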
A unified data culture requires real-time data. Microsoft announced Real-Time Intelligence. It uses the Real-Time hub, powered by AI, for data in motion. (The OneLake data hub is for data at rest.)
So Real-Time Intelligence in the real world.
Copilot is integrated in every Microsoft Fabric experience. Copilot in Fabric is now Generally Available. This means AI-driven insights out of the box, along with custom generative AI for your data.
Announcing Public Preview of AI Skills in Fabric. It allows you to build your own Generative AI in Fabric
Simple to get started
- Create AI Skill
- Add data – ground in data
- Select tables to ground the data
Query in natural language
In conclusion come and Join the Microsoft Fabric Team in Stockholm, Sweden 24-27 September 2024
Aka.ms/FabCon-Europe
Tuesday, 7 May 2024
Responsible AI Transparency Report
Microsoft have shared how they work with AI responsibly in the paper Responsible AI Transparency Report: How we build, support our customers, and grow. The report outlines Microsoft’s approach to building generative AI applications responsibly, adhering to six core values of transparency, accountability, fairness, inclusiveness, reliability and safety, and privacy and security. The framework is all based around the govern, map, measure and manage cycle.
Govern
Establishes the context for AI risk management, including adherence to policies and pre-deployment reviews.
- Policies and principles
- Procedures for pre-trained models
- Stakeholder coordination
- Documentation
- Pre-deployment reviews
Map
Involves identifying and prioritizing AI risks and conducting impact assessments to inform decisions.
- Responsible AI Impact Assessments
- Privacy and security review
- Red teaming
Measure
Implements procedures to assess AI risks and the effectiveness of mitigations through established metrics.
- Metrics for identified risks
- Mitigations performance testing
Manage
Focuses on mitigating identified risks at both the platform and application levels, with ongoing monitoring and user feedback.
- User agency
- Transparency
- Human review and oversight
- Managing content risks
- Ongoing monitoring
- Defense in depth
These are all depicted in the diagram in the paper which is a very informative read.
Responsible AI Transparency Report How we build, support our customers, and grow
https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RW1l5BO
Thursday, 2 May 2024
Responsible AI – A Data Governance Approach
I am speaking at the Bath Azure User Group meeting about Responsible AI - a Data Governance approach. I see Responsible AI as a subset of Data Governance. This session covers where we are with legislation and tools, why good data quality is a must for AI and how to get started.
Data Governance and Responsible AI, and the embellishment of AI within Microsoft Purview aid and prepare business for using AI. Moving forward I believe that combining the use of both Data Governance and Responsible AI into one actionable framework will bring immediate rewards to every business use case.
Hope you can join us on 22 May 2024, 18-20, in Bath
Monday, 29 April 2024
Open Lakes
This is an insightful article entitled Open Lakes, Not Walled Gardens by Raghu Ramakrishnan and Josh Caplan.
The Fabric design principles consider the
Open Ecosystem
Ensuring there are no proprietary barriers to data in OneLake, allowing integration with other services.
Security and Governance
Data in OneLake must be secure and governed, integrating with Microsoft Purview for global policies.
Creating accessible data with no Silos
Making the entire data estate easily accessible in OneLake without unnecessary data duplication.
SaaS Simplicity
Providing a suite of analytic engines in a secure, governed environment with single sign-on.
The article discusses the concept of open lakes for analytics, emphasizing the need for a unified view of data across an enterprise’s data estate to draw true insights. It covers the advancements in big data tools, cloud storage, machine learning, and AI models, which offer opportunities to analyze core assets and processes through data in the Golden Age of Analytics.
The Microsoft implementation of the open lake vision with OneLake and Fabric focuses on data storage, analytics, sharing, and governance integrated with Microsoft Purview for data estate-wide governance. It outlines the importance of securing and governing enterprise data, detailing how OneLake and Fabric address these needs with built-in features and integration with Microsoft Purview for global data estate governance.
Governance for the organization, estate-level, and policy enforcement and sharing of data is a core tenet. Governance within Fabric and OneLake covers organizational governance and estate-level governance, where Microsoft Purview provides a global view of the entire data estate, offering a central catalog for all assets across all sources, global policies to secure sensitive data, and support for managing critical data risks and regulatory compliance. Policy enforcement and data sharing are also discussed.
Thursday, 25 April 2024
Data Governance, Data Ethics and Responsible AI video series
- Data Governance to help govern and manage that data to improve trust and data quality
- Data Ethics to help mitigate issues with data integrity and provenance
- Responsible AI to look at bias, fairness and efficacy in decisions
Episode 2 what is data governance
Episode 4 What is Responsible AI
Episode 5 Responsible AI Tools Microsoft Standard v2
Episode 6 Responsible AI Tools Impact Assessment and guide
Episode 7 Responsible AI Tools HAX Toolkit
Episode 8 Responsible AI Tools Maturity Model
Episode 10 UK Government Assurance
Episode 12 Responsible AI Dashboard
Watch this space as the next set of videos will cover how this fits in with data quality and how Microsoft Purview can help with data preparation.
The Age of Data Governance
Microsoft Purview is rapidly changing in the data governance space. It is offering data value creation with essential defense & response offense. This new addition helps businesses address the issue that AI outputs are only as good as the quality of the data that resides behind them.
Peter Aiken's new definition of data governance is 'Managing data decisions with guidance'.
Suma Manohar has written a great article talking about data quality in the era of AI. Microsoft Purview introduced domains and data products, adding clear business context and terminology mapping. Enhanced search capability using Copilot is available to provide more understanding. It can also help by suggesting data quality rules. These autogenerated rules are context specific.
Creating data quality rules manually in Purview should follow the 6 standard data quality metrics.
- Freshness – confirms that all values are up to date.
- Duplicate rows- checks rows to find repeated values across two or more columns.
- Empty/blank fields – looks for blank and empty fields in a column where there should be values.
- Unique values – confirms that values in a column are unique.
- Data type match – confirms that values in a column match data type requirements.
- String format match – confirms that text values in a column match a specific format or other requirements.
- Table lookup – confirms that a value in one table can be found in a specific column of another table
- Custom – create a custom rule with the visual expression builder.
- Regular expressions can be used for pattern matching in the above.
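To illustrate what a few of the rule types listed above check, here is a minimal plain-Python sketch. It is not Purview's API or rule engine: the sample rows, column names, regular expression and scoring formula are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"customer_id": "C001", "email": "ana@example.com"},
    {"customer_id": "C002", "email": ""},
    {"customer_id": "C002", "email": "bob@example"},
    {"customer_id": "C003", "email": "cy@example.com"},
]


def empty_fields(rows, column):
    """Empty/blank fields: indexes of rows whose value is missing or whitespace."""
    return [i for i, r in enumerate(rows) if not (r.get(column) or "").strip()]


def non_unique_values(rows, column):
    """Unique values: values that appear more than once in the column."""
    counts = Counter(r.get(column) for r in rows)
    return sorted(v for v, n in counts.items() if n > 1)


def format_mismatches(rows, column, pattern):
    """String format match: indexes of non-empty values that fail the regex."""
    regex = re.compile(pattern)
    return [
        i for i, r in enumerate(rows)
        if (r.get(column) or "").strip() and not regex.fullmatch(r[column])
    ]


def quality_score(rows, failing_indexes):
    """Percentage of rows that passed a rule (an assumed, simplified score)."""
    return 100.0 * (len(rows) - len(set(failing_indexes))) / len(rows)


# A deliberately simple email pattern, used only for this example.
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"

blank = empty_fields(rows, "email")
duplicates = non_unique_values(rows, "customer_id")
bad_format = format_mismatches(rows, "email", EMAIL_PATTERN)
score = quality_score(rows, bad_format)
```

Each rule returns the rows or values that fail, which keeps the pass rate calculation separate from the rule logic.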
When working on data quality there are standard guidelines that can help. A method I use is firstly from the DAMA-DMBOK and then the Data Management Capability Assessment Model (DCAM).
Scans take place to show the quality score and trends in the data quality dashboard, and scores are shown on the data product page.
The rollout of the new solution across the regions is shared here.