Chaos, complexity, curiosity and database systems. A place where research meets industry
Welcome
"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein
Wednesday 17 July 2024
The Growth of Microsoft Purview
Microsoft Purview Data Governance generally available
The reimagined Microsoft Purview Data Governance will be generally available on 1 September 2024. This is really exciting news, as the reimagined portal has so much capability to help make business data better and more secure, and to help drive business growth.
The convergence of cyberattacks, increasing regulations, an ever-expanding data estate, and business demand for insights puts pressure on business leaders to adopt a unified strategy that confidently ensures AI readiness. We see the unification of data security and governance capabilities as the way to create a modern data governance solution that is easy for businesses to adopt, democratises data, and addresses the challenges of AI.
The most important thing to understand is that Data Governance requires the business, its people and its processes, as well as a technological tool to automate data understanding. It is built into Microsoft Fabric, with the capability to create custom reports from Fabric data, and the data quality feature supports any Fabric source.
As I shared before, this is the collection of tools.
Sunday 14 July 2024
That's a Wrap Data Toboggan Cool Runnings 2024
Data Toboggan Cool Runnings 2024 ran on 13 July 2024, and it was our largest event since our inception in 2020, with over 400 attendees. We were so pleased to have 3 full tracks of sessions from speakers around the world. The event ran for 12 hours, starting with presenters from Australia, then India, Europe, Scandinavia and the UK, and finishing in the USA. It is our principle never to close registration until after the event finishes, so registration stays open and people can join all day. It is the people who make the event: the speakers, the attendees, and the helpers. The organising team had loads of fun putting the event together.
We hold 3 events a year.
- Data Toboggan - Winter Edition (full conference | end of Jan)
- Data Toboggan - Cool Runnings (full conference | beginning of July)
- Data Toboggan - Alpine Coaster (unconference real world short talks | early November)
We also have a user group, called Data Toboggan - Slide Preparation, that meets from time to time when there are interesting areas to cover.
The virtual event started out covering Azure Synapse Analytics when it launched, and with the introduction of Microsoft Fabric we have grown with the technology and now cover Microsoft Fabric, Synapse, Data Engineering, Data Warehousing, and Analysis. We also have a track covering Microsoft Purview and connected AI elements.
We look forward to seeing you all at the next event.
Thursday 11 July 2024
Seven-time Microsoft MVP
I am pleased to share that I was renewed as a Microsoft Data Platform MVP yesterday.
I am overjoyed and humbled to be a part of this amazing program for another year, my 7th year. It is such a privilege to be able to share knowledge and help others in the community grow. I am always excited about the amazing technology we have at our fingertips, such as Microsoft Fabric and Microsoft Purview. I am looking forward to another year contributing to, and growing, the amazing data community. Thank you Microsoft.
Award Category
Data Platform
Technology Area
Microsoft Purview, Azure Synapse Analytics
Tuesday 9 July 2024
Data Toboggan Cool Runnings tracks
We have 3 full tracks at our next event on Saturday.
Event Date: 13th July 2024
Register Now: https://bit.ly/DTCR2024-Register
Agenda: https://bit.ly/DTCR2024-Agenda
This will be our biggest event yet. The track names all have a story behind them.
Bramburg track
Pradaschier track
Rigi Kaltbad track
Saturday 29 June 2024
Data Toboggan Cool Runnings 2024 - Microsoft Purview and Responsible AI
I am excited to announce that I will be speaking at Data Toboggan Cool Runnings on Saturday 13 July 2024.
Register Now: https://bit.ly/DTCR2024-Register
Agenda: https://bit.ly/DTCR2024-Agenda
Data Governance, Responsible AI, and the enrichment of Microsoft Purview with AI help prepare businesses for using AI. Moving forward, I believe that combining Data Governance and Responsible AI into one actionable framework will bring immediate rewards to every business use case.
I hope you can join me online on 13 July 2024, 8:00 - 20:00 UK time.
The Growth of Microsoft Purview
"What is Microsoft Purview?" is a question I often get asked. The answer is never what people expect. Over the last couple of years the Microsoft Purview solution has been growing in capability, with many different applications being brought together under one umbrella term, Microsoft Purview. It is now described as having three areas:
- Data security for information and cybersecurity teams
- Data governance for data consumers, data engineers and data officers
- Risk and compliance for risk, compliance and legal teams
When we talk about Microsoft Purview it helps to know what the business problem is to identify which areas of the product are required.
I put together a diagram to help map out all the applications to date. The diagram shows the extent of the applications that Microsoft have brought together and added to the suite of tools over the last year. The documentation refers to three high-level areas: Risk and Compliance (shown in blue), Data Governance (shown in purple) and Security (shown in orange and green). I have depicted the AI hub in green rather than orange because it covers a different conceptual area: Responsible AI, and compliance protection for generative AI apps.
When I talk about data governance I am looking at apps such as data estate health, roles and responsibilities for the data estate, the data catalogue, classification and lineage, all the new business domain areas such as a business glossary, and the brand-new set of data quality tools to control and monitor the quality and health of the data.
Thus, the new reimagined Microsoft Purview experience provides a holistic view of your data and enables better automated data management.
Sunday 23 June 2024
Data Toboggan Cool Runnings
We are excited to have our next event coming up on 13th July 2024. This will be the biggest event yet, with 3 full tracks of sessions, and we are grateful to all the speakers who give up their time to help the community learn and grow. The event will have all the latest and greatest technical discussions. We hope you can join us for a fun-packed day of learning and community networking.
Register Now: https://bit.ly/DTCR2024-Register
Agenda: https://bit.ly/DTCR2024-Agenda
Wednesday 22 May 2024
Microsoft Build Fabric: What's new and what's next
The Microsoft Fabric announcements were covered by Amir Netz, Arun Ulagaratchagan, Flavien Daussy and Adam Penhaul. The session was recorded and can be watched here: Microsoft Fabric: What's new and what's next.
I live blogged this great main data session at Microsoft Build.
AI is changing the world. The AI revolution is based on data: data is the fuel that powers AI. It is hard because of the amount of innovation and the diversity and complexity involved.
Fabric has purpose-built workloads, and AI is built into Fabric. Governance is particularly important; it is built in and driven through Microsoft Purview.
Aka.ms/try-fabric
There are weekly Fabric releases, with 60-80 pages of blogs. The roadmap for these features can be found at
Aka.ms/FabricRoadmap
What is the point of having data in the lake if no one is using it? It is about immediate business access to the data.
Fabric is a SaaS product that looks like Office. There are no knobs to optimise, and you get results in hours.
- Starts with built-in CI/CD
- Creating deployment pipelines
- And task flows (public preview) to help create things like the medallion architecture.
In Fabric you can now bring in partner workloads such as MDM and Esri. The Microsoft Fabric Workload Development Kit was announced in public preview.
All your data, all your teams, in one place. You can publish to the Workload Hub for a native Fabric workload experience. Aka.ms/FabDevKit
There are multiple methods to get data into Fabric from multiple clouds. OneLake shortcuts to on-premises sources were announced in public preview.
Not everything is stored in open formats (databases, for example), so Mirroring helps with this. There is free Mirroring storage for replicas.
Delta is not the only open format; Iceberg is another major open table format. Transparent, simultaneous support for the Delta Lake and Iceberg formats was just announced. It is now also possible to connect to Salesforce without moving the data, and there is an expanded partnership with Snowflake and Adobe.
For a unified API, the developer-friendly API for GraphQL, now in public preview, gives access to all data in OneLake. (GraphQL requests and responses are exchanged as JSON.)
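To illustrate the JSON exchange, here is a minimal sketch of how a client might build a GraphQL-over-HTTP request in Python using only the standard library. The endpoint URL, the access token and the `customers` query fields are hypothetical placeholders, not details from the session; the real endpoint and schema come from the API for GraphQL item in your Fabric workspace. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Hypothetical placeholders: use the endpoint shown for your API for GraphQL item
# and a Microsoft Entra access token obtained separately.
ENDPOINT = "https://example.invalid/graphql"
TOKEN = "<access-token>"

# Hypothetical query; real field names depend on the schema generated for your data.
QUERY = """
query ($first: Int) {
  customers(first: $first) {
    items { id name city }
  }
}
"""


def build_graphql_request(query: str, variables: dict) -> urllib.request.Request:
    """Wrap a GraphQL query and its variables in a JSON POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {TOKEN}",
        },
        method="POST",
    )


request = build_graphql_request(QUERY, {"first": 5})
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
# A GraphQL server replies with JSON shaped like the query, e.g. {"data": {"customers": {...}}}
```

The query text itself is GraphQL, not JSON; it is the HTTP envelope around it and the response that are JSON, which is what makes the API easy to call from almost any language.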
A unified data culture requires real-time data. Microsoft announced Real-Time Intelligence. It uses the Real-Time hub, powered by AI, for data in motion (the OneLake data hub is for data at rest).
So Real-Time Intelligence in the real world.
Copilot is integrated into every Microsoft Fabric experience, and Copilot in Fabric is now generally available. This means AI-driven insights out of the box, as well as custom generative AI for your data.
AI Skills in Fabric was announced in public preview. It allows you to build your own generative AI experiences in Fabric.
Simple to get started
- Create AI Skill
- Add data – ground in data
- Select tables to ground the data
- Query in natural language
In conclusion, come and join the Microsoft Fabric team in Stockholm, Sweden, 24-27 September 2024.
Aka.ms/FabCon-Europe
Tuesday 7 May 2024
Responsible AI Transparency Report
Microsoft have shared how they work with AI responsibly in the paper Responsible AI Transparency Report: How we build, support our customers, and grow. The report outlines Microsoft's approach to building generative AI applications responsibly, adhering to six core values: transparency, accountability, fairness, inclusiveness, reliability and safety, and privacy and security. The framework is built around the govern, map, measure and manage cycle.
Govern
Establishes the context for AI risk management, including adherence to policies and pre-deployment reviews.
- Policies and principles
- Procedures for pre-trained models
- Stakeholder coordination
- Documentation
- Pre-deployment reviews
Map
Involves identifying and prioritizing AI risks and conducting impact assessments to inform decisions.
- Responsible AI Impact Assessments
- Privacy and security review
- Red teaming
Measure
Implements procedures to assess AI risks and the effectiveness of mitigations through established metrics.
- Metrics for identified risks
- Mitigations performance testing
Manage
Focuses on mitigating identified risks at both the platform and application levels, with ongoing monitoring and user feedback.
- User agency
- Transparency
- Human review and oversight
- Managing content risks
- Ongoing monitoring
- Defense in depth
These are all depicted in a diagram in the paper, which is a very informative read.
Responsible AI Transparency Report: How we build, support our customers, and grow
https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RW1l5BO
Thursday 2 May 2024
Responsible AI – A Data Governance Approach
I am speaking at the Bath Azure User Group meeting about Responsible AI - a Data Governance approach. I see Responsible AI as a subset of Data Governance. This session covers where we are with legislation and tools, why good data quality is a must for AI, and how to get started.
Data Governance, Responsible AI, and the enrichment of Microsoft Purview with AI help prepare businesses for using AI. Moving forward, I believe that combining Data Governance and Responsible AI into one actionable framework will bring immediate rewards to every business use case.
I hope you can join us on 22 May 2024, 18:00-20:00, in Bath.
Monday 29 April 2024
Open Lakes
This is an insightful article entitled Open Lakes, Not Walled Gardens by Raghu Ramakrishnan and Josh Caplan.
The Fabric design principles cover:
Open Ecosystem
Ensuring there are no proprietary barriers to data in OneLake, allowing integration with other services.
Security and Governance
Data in OneLake must be secure and governed, integrating with Microsoft Purview for global policies.
Creating accessible data with no silos
Making the entire data estate easily accessible in OneLake without unnecessary data duplication.
SaaS Simplicity
Providing a suite of analytic engines in a secure, governed environment with single sign-on.
The article discusses the concept of open lakes for analytics, emphasizing the need for a unified view of data across an enterprise's data estate to draw true insights. It also covers advances in big data tools, cloud storage, machine learning, and AI models, which offer opportunities to analyze core assets and processes through data in the Golden Age of Analytics.
The Microsoft implementation of the open lake vision, with OneLake and Fabric, focuses on data storage, analytics, sharing, and governance. The article outlines the importance of securing and governing enterprise data, and details how OneLake and Fabric address these needs with built-in features and integration with Microsoft Purview for estate-wide governance.
Governance of the organization and the estate, together with policy enforcement and data sharing, is a core tenet. Governance within Fabric and OneLake covers organizational governance and estate-level governance, where Microsoft Purview provides a global view of the entire data estate, offering a central catalog for all assets across all sources, global policies to secure sensitive data, and support for managing critical data risks and regulatory compliance. Policy enforcement and data sharing are also discussed.
Thursday 25 April 2024
Data Governance, Data Ethics and Responsible AI video series
- Data Governance to help govern and manage data to improve trust and data quality
- Data Ethics to help mitigate issues with data integrity and provenance
- Responsible AI to look at bias, fairness and efficacy in decisions
Episode 2 What is Data Governance
Episode 4 What is Responsible AI
Episode 5 Responsible AI Tools Microsoft Standard v2
Episode 6 Responsible AI Tools Impact Assessment and guide
Episode 7 Responsible AI Tools HAX Toolkit
Episode 8 Responsible AI Tools Maturity Model
Episode 10 UK Government Assurance
Episode 12 Responsible AI Dashboard
Watch this space as the next set of videos will cover how this fits in with data quality and how Microsoft Purview can help with data preparation.
The Age of Data Governance
Microsoft Purview is rapidly changing in the data governance space. It now offers data value creation (data offense) alongside essential data defense and response. This new addition helps businesses address the fact that AI outputs are only as good as the quality of the data behind them.
Peter Aiken's new definition of data governance is 'managing data decisions with guidance'.
Suma Manohar has written a great article about data quality in the era of AI. Microsoft Purview introduced business domains and data products, adding clear business context and terminology mapping. Enhanced search capability using Copilot is available to provide more understanding. It can also help by suggesting data quality rules, and these autogenerated rules are context specific.
Creating data quality rules manually in Purview should follow the six standard data quality dimensions, using rule types such as the following.
- Freshness – confirms that all values are up to date.
- Duplicate rows – checks rows to find repeated values across two or more columns.
- Empty/blank fields – looks for blank and empty fields in a column where there should be values.
- Unique values – confirms that values in a column are unique.
- Data type match – confirms that values in a column match data type requirements.
- String format match – confirms that text values in a column match a specific format or other requirements.
- Table lookup – confirms that a value in one table can be found in a specific column of another table.
- Custom – create a custom rule with the visual expression builder.
- Regular expressions can be used for pattern matching in the above.
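To make the rule types above more concrete, here is a minimal Python sketch of how each check might score a table as the fraction of rows that pass. It is an illustration under stated assumptions, not Purview's implementation: the `customers` and `countries` tables, their column names, the 30-day freshness window and the email pattern are all hypothetical, and the custom rule type is left out.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone


def pass_rate(flags):
    """Fraction of rows that pass; an empty table counts as fully passing."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 1.0


def freshness(rows, col, max_age, now):
    # Passes when the timestamp is no older than max_age.
    return pass_rate(now - r[col] <= max_age for r in rows)


def duplicate_rows(rows, cols):
    # Passes when the combination of values in `cols` occurs only once.
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    return pass_rate(counts[tuple(r[c] for c in cols)] == 1 for r in rows)


def empty_fields(rows, col):
    # None and whitespace-only strings count as empty.
    return pass_rate(r[col] is not None and str(r[col]).strip() != "" for r in rows)


def unique_values(rows, col):
    counts = Counter(r[col] for r in rows)
    return pass_rate(counts[r[col]] == 1 for r in rows)


def data_type_match(rows, col, expected_type):
    return pass_rate(isinstance(r[col], expected_type) for r in rows)


def string_format_match(rows, col, pattern):
    # The regular expression must match the whole value, not just part of it.
    rx = re.compile(pattern)
    return pass_rate(isinstance(r[col], str) and rx.fullmatch(r[col]) is not None for r in rows)


def table_lookup(rows, col, ref_rows, ref_col):
    # Passes when the value exists in the reference table's column.
    ref_values = {r[ref_col] for r in ref_rows}
    return pass_rate(r[col] in ref_values for r in rows)


# Hypothetical sample data.
now = datetime(2024, 7, 17, tzinfo=timezone.utc)
customers = [
    {"id": 1, "email": "ann@example.com", "country": "UK", "updated": now - timedelta(days=1)},
    {"id": 2, "email": "bob@example", "country": "SE", "updated": now - timedelta(days=40)},
    {"id": 3, "email": "", "country": "UK", "updated": now - timedelta(days=2)},
    {"id": 3, "email": "", "country": "UK", "updated": now - timedelta(days=3)},
    {"id": 4, "email": "cat@example.com", "country": "XX", "updated": now - timedelta(days=5)},
]
countries = [{"code": "UK"}, {"code": "SE"}, {"code": "US"}]

scores = {
    "freshness": freshness(customers, "updated", timedelta(days=30), now),
    "duplicate rows": duplicate_rows(customers, ["id", "country"]),
    "empty/blank fields": empty_fields(customers, "email"),
    "unique values": unique_values(customers, "id"),
    "data type match": data_type_match(customers, "id", int),
    "string format match": string_format_match(customers, "email", r"[^@\s]+@[^@\s]+\.[a-z]{2,}"),
    "table lookup": table_lookup(customers, "country", countries, "code"),
}
for rule, score in scores.items():
    print(f"{rule}: {score:.0%}")
```

Expressing every rule as a pass rate keeps the results comparable, which is what allows them to be combined into a single quality score later.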
When working on data quality, there are standard guidelines that can help. I start with the DAMA-DMBOK and then use the Data Management Capability Assessment Model (DCAM).
Scans produce quality scores and trends, which are shown in the data quality dashboard, and scores are also shown on the data product page.
The rollout of the new solution across the regions is shared here.
Tuesday 9 April 2024
Fabric Mirroring Overview
Mirroring in Fabric, a new feature announced last year, has continued to develop and entered public preview in March 2024. It enables you to bring your databases into Fabric.
Fabric mirroring is a feature within Microsoft Fabric that allows seamless, near real-time data replication from various databases into OneLake, the single data lake at the centre of Fabric. The process is designed to be frictionless, eliminating the need for complex Extract, Transform, Load (ETL) pipelines, which are traditionally used to move and transform data from one system to another.
The primary advantage of Fabric mirroring is its ability to provide near real-time insights by continuously updating the data in OneLake as changes occur in the source databases. It uses Change Data Capture (CDC) technology to capture and replicate data changes to OneLake, so the data is always current and synchronized.
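The CDC idea can be illustrated with a small conceptual sketch: an ordered stream of insert, update and delete events is replayed against a replica, and a log position makes the replay safe if a batch is delivered twice. This is a simplification for illustration only, not how Fabric mirroring is implemented; the `ChangeEvent` shape, the `lsn` field and the `Replica` class are hypothetical, and the replica is an in-memory dictionary rather than tables in OneLake.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Literal, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change from the source database (hypothetical shape)."""
    lsn: int                                  # position in the source change log
    op: Literal["insert", "update", "delete"]
    key: int                                  # primary key of the affected row
    row: Optional[Dict[str, Any]] = None      # new row values; None for deletes


class Replica:
    """In-memory stand-in for the mirrored copy of one source table."""

    def __init__(self) -> None:
        self.rows: Dict[int, Dict[str, Any]] = {}
        self.last_lsn = 0

    def apply(self, events: Iterable[ChangeEvent]) -> None:
        # Replay changes in log order so the replica converges on the source state.
        for event in sorted(events, key=lambda e: e.lsn):
            if event.lsn <= self.last_lsn:
                continue  # already applied, so a re-delivered event is harmless
            if event.op == "delete":
                self.rows.pop(event.key, None)
            else:  # inserts and updates both upsert the latest row values
                self.rows[event.key] = dict(event.row or {})
            self.last_lsn = event.lsn


replica = Replica()
replica.apply([
    ChangeEvent(1, "insert", 10, {"name": "Ann", "city": "Bath"}),
    ChangeEvent(2, "insert", 11, {"name": "Bob", "city": "Leeds"}),
])
replica.apply([
    ChangeEvent(3, "update", 10, {"name": "Ann", "city": "Bristol"}),
    ChangeEvent(4, "delete", 11),
    ChangeEvent(2, "insert", 11, {"name": "Bob", "city": "Leeds"}),  # duplicate delivery
])
print(replica.rows)      # {10: {'name': 'Ann', 'city': 'Bristol'}}
print(replica.last_lsn)  # 4
```

Only the changes travel after the initial load, rather than full copies of each table, which is why CDC-based replication can keep the copy close to real time without an ETL pipeline.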
By mirroring data into OneLake, organizations can break down data silos and unify their data estate, allowing more efficient data governance and analysis. Mirrored data can then be used easily for a wide range of analytical tasks.
Fabric mirroring simplifies data access by allowing databases to be securely accessed and managed within Fabric without the need to switch database clients or install additional software. A mirrored database can also be joined with other databases, warehouses or lakehouses in cross-database queries, whether the data comes from Azure Cosmos DB, Azure SQL Database, Snowflake or elsewhere.
In summary, Fabric mirroring is a transformative feature that streamlines data replication and analysis, providing businesses with a modern, fast, and safe way to access and ingest data, thereby accelerating the journey to valuable insights and informed decision-making.
Further Reading
https://learn.microsoft.com/en-us/fabric/database/mirrored-database/overview
https://aka.ms/MirrorSQLDBPublicPreviewBlog
https://devblogs.microsoft.com/cosmosdb/public-preview-mirroring-azure-cosmos-db-in-microsoft-fabric
Unify your data across domains, clouds, and engines in OneLake
Wednesday 3 April 2024
Microsoft Purview Fabric announcements
There were a number of announcements at the Microsoft Fabric Community Conference, including the new Microsoft Purview experience for modern data governance. With businesses moving towards federated governance models, managed by line of business to support more local understanding of increasing volumes of data, Microsoft have launched the capability for organizations to create subdomains to refine the way the data estate is structured in Fabric. Security has also become easier with the ability to set security groups for default domains.
Microsoft Fabric is now natively integrated with the Microsoft Purview Data Governance solution. There is a reimagined data governance experience for the data estate governance practice. The new experience includes data curation and, importantly, a new data quality feature with insights. It is available in preview from 8 April 2024, and aims to help accelerate measurable business value with key results, simplify governance, and improve efficiency through natural language recommendations.
Purview enables business terminology linkage to
- Data Products (a collection of data assets used for a business function)
- Business Domains (ownership of Data Products)
- Data Quality (assessment of quality)
- Data Access, Actions
- Data Estate Health (reports and insights)
A really exciting new feature we have all been waiting for is the data quality capabilities. There is now a data quality model to set rules top down across business domains, data products, and data assets. The model generates data quality scores at the asset, data product, or business domain level from the policies on terms or rules. The scores show on the dashboard as red/yellow/green indicators. The 2 capabilities in this data quality model are:
- Profiling – quick insights from a sample set
- Data quality scans – in-depth scans of full data sets
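The roll-up of scores from data assets to data products and business domains can be pictured with a short sketch. The plain averaging at each level and the 80/50 thresholds for the red/yellow/green indicators are assumptions for illustration only, since the post does not say how Purview weights or bands its scores; the domain, product and asset names are hypothetical.

```python
from statistics import mean

# Hypothetical estate: business domain -> data product -> data asset -> rule scores (0-100).
estate = {
    "Sales": {
        "Customer 360": {"customers_table": [100, 80, 60], "orders_table": [90, 70]},
        "Revenue Reporting": {"invoices_table": [40, 50]},
    },
}


def indicator(score, green_at=80, yellow_at=50):
    """Map a score to a red/yellow/green indicator (thresholds are assumptions)."""
    if score >= green_at:
        return "green"
    if score >= yellow_at:
        return "yellow"
    return "red"


def roll_up(estate):
    """Score each level as the plain average of the level beneath it."""
    report = {}
    for domain, products in estate.items():
        product_scores = []
        for product, assets in products.items():
            asset_scores = []
            for asset, rule_scores in assets.items():
                score = mean(rule_scores)
                report[asset] = (score, indicator(score))
                asset_scores.append(score)
            score = mean(asset_scores)
            report[product] = (score, indicator(score))
            product_scores.append(score)
        score = mean(product_scores)
        report[domain] = (score, indicator(score))
    return report


for name, (score, colour) in roll_up(estate).items():
    print(f"{name}: {score:g} ({colour})")
```

With these inputs the Customer 360 product stays green while Revenue Reporting is red, so the Sales domain lands on yellow: a weak product pulls the whole domain down, which is the kind of signal the dashboard is meant to surface.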
It is great to see that Microsoft Purview continues to align with the EDM Council's set of 14 rules.
There is now an actions centre that shows current health, summarising governance actions by role, data product or business domain. This actions centre aims to help improve the governance posture of the business.
There is a partnership with Ernst & Young LLP, who will share playbooks and reports for US financial services customers on Azure Marketplace throughout the preview.
References
Announcements from the Microsoft Fabric Community Conference
Easily implement data mesh architecture with domains in Fabric
Introducing modern data governance for the era of AI
The foundation for responsible analytics with Microsoft Purview
Crash Course in Microsoft Purview (azureedge.net)
Learning
- Sign up for the Microsoft Fabric free trial.
- Visit the Microsoft Fabric website.
- Join the Fabric community.
- Read technical blogs on the Microsoft Fabric Updates Blog.
- Explore the Fabric technical documentation.
- Join the Fabric Partner Community on Teams to engage with the Fabric partner and product teams and attend weekly Fabric Engineering Connection calls every Wednesday at 8 am PDT.
Monday 1 April 2024
Responsible AI dashboard training
There is a new Microsoft Learn course on how to debug an AI model using the Responsible AI dashboard in Azure Machine Learning studio, to ensure it performs responsibly and is less harmful. Understanding how to use the dashboard is important to set any project up for success.
Train a model and debug it with Responsible AI dashboard
The objectives are:
- Create a responsible AI dashboard.
- Identify where the model has errors.
- Discover data over or under representation to mitigate biases.
- Understand what drives a model outcome with explainability and interpretability.
- Mitigate issues to meet compliance regulation requirements.
You need a beginner-level understanding of Python.
Saturday 30 March 2024
The Fabric Conference 2024
The first Microsoft Fabric Community Conference took place from 26 to 28 March 2024 at the MGM Grand in Las Vegas, Nevada. It was an in-person-only conference, and no sessions were recorded or streamed. It was great to see so many people back at in-person conferences, although for those not able to attend it means limited learning.
The conference had more than 130 sessions covering various aspects of Microsoft Fabric from data warehousing to data movement, AI, real-time analytics, and business intelligence.
The Microsoft Intelligent Data Platform incorporates Microsoft Fabric, a suite of technologies that empowers organizations to harness the full power of their data. By natively integrating products across four critical workloads (AI, analytics, database, and security), organizations can innovate without limits. The great advantage of Fabric is that it brings together what would otherwise be disconnected services from multiple vendors, so organizations can focus on accelerating transformation.
The four core promises of Fabric:
- Fabric is a complete platform
- Fabric is lake-centric and open
- Fabric can empower every business user
- Fabric is AI powered
Announcing the Public Preview of Mirroring in Microsoft Fabric
Announcing Folder in Workspace in Public Preview
Wednesday 27 March 2024
Responsible AI Day at Microsoft
- The Responsible AI Maturity Model (RAI MM) - a framework to help organizations identify their current and desired levels of RAI maturity
- Responsible AI Standard v2 - Microsoft’s six AI principles operationalized
- Responsible AI Impact Assessment Template - To define a process for assessing the impact an AI system may have on people, organizations, and society.
- HAX Playbook - The HAX Playbook is a tool for proactively and systematically exploring common human-AI interaction failures
- Overview of Responsible AI practices for Azure OpenAI models
- Azure AI Content Safety
Monday 25 March 2024
The SQLBits 2024 Event
SQLBits 2024 was as amazing as ever, with the organisers creating another well-choreographed event. It was held in Farnborough, the birthplace of British aviation.
There was a huge number of tracks including all types of sessions.
The Sessions
The agenda covered various types of sessions
- Tuesday Training Day
- Wednesday 100 minute sessions to gain more depth into a variety of areas
- Thursday General Sessions Day One
- Friday General Sessions Day Two
- The Free Saturday
SQLBits Extra Events
There was a wide range of extracurricular things to get involved with to enable a different slant on networking:
- Meet the Trainer Monday 18th March, 6.30pm, The Aviator Hotel
- Welcome Drinks and Burgers & Board Games Night Wednesday 20th March, 6pm
- Ask the Experts Wednesday 20th March - Saturday 23rd March
- The SQLBits Run Wednesday 20th March, 6pm & Friday 22nd March, 6am
- User Group Bonus Sessions Thursday 21st March, 6pm
- The Pub Quiz Thursday 21st March, 7.30pm
- The Friday Night Party Friday 22nd March, 7.30pm
The Keynote
This was delivered by a number of speakers.
SQLBits announcements
Public Preview: Managed Instance General Purpose Next-Gen; Migration Assessment in Azure Arc; Database Watcher.
Private Preview: T-SQL Regex; Copilot in Azure SQL Database
Learn more:
Introducing Azure SQL Managed Instance Next-gen GP
Introducing database watcher for Azure SQL
Azure SQL migration assessment enabled by Azure Arc
Introducing Copilot in Azure SQL Database
Sunday 24 March 2024
SQLBits Buddies 2024
I was part of a team of helpers at SQLBits known as Bits Buddies. We are all experienced helpers who have attended lots of SQLBits events, and we are dedicated to helping attendees who might want a bit of extra company and support, whether it is their first time at the event or they are a regular attendee.
We ran pre-event meet-up opportunities in the run-up to SQLBits: weekly drop-ins for delegates and anyone interested in attending, for an informal chat about the experience of attending SQLBits and to make connections before the event. It was nice to meet a few people before the event. This year the Bits Buddies wore orange hats so we could be seen easily around the venue. It was really nice to speak to people at the event and help with questions. Till next year.
Saturday 23 March 2024
Data Toboggan Slide Preparation at SQLBits
This year, on Thursday 21 March 2024, SQLBits added User Group Bonus Sessions. They were announced as follows:
After the main sessions, two UK user groups are running sessions that you’re welcome to join:
- London Fabric User Group - SQL Bits Special - Ask Me Anything Panel (Gate 4)
- Data Toboggan - Ask the Fabric Experts (Gate 1)
Running from 18:00 - 19:00, each with a panel of experts ready to answer questions or discuss hot topics.
Data Toboggan Slide Preparation ran its FIRST in-person user group meeting, hosted by Richard Munn at SQLBits.
The panelists were Richard Munn, Dr Victoria Holt, Cathrine Wilhelmsen, Mark Pryce-Maher, Emilie Rønning and Andy Cutler, taking questions from the audience. James Reeves reported:
'Data Toboggan User Group Celebrates First In-Person Meetup
🎉 The Data Toboggan user community recently celebrated an exciting milestone: their first-ever in-person meetup. After connecting and collaborating online, members finally had the chance to gather face-to-face and connect with fellow data enthusiasts who share a passion for uncovering insights through data.
💡 A highlight of the gathering was a session focused on Microsoft Fabric, a comprehensive analytics and data platform. Attendees engaged in a lively discussion about how tools like Microsoft Fabric are revolutionizing the field of data analytics and shaping the future of the industry.
🙌 The organizers expressed their gratitude to everyone who made the meetup possible and to all who participated. The energy, enthusiasm, and sense of community at the event were truly remarkable. They look forward to more opportunities for the Data Toboggan user group to connect, both virtually and in person.'
Saturday 16 March 2024
MVP Global Summit 2024
This year MVP Summit took place from 12-14 March 2024, both in person in Seattle and virtually. I attended virtually because SQLBits was the following week. It was good to catch up with people, engage in learning and share thoughts on new technology. It is always an amazing privilege to be a part of this community, which continually shares knowledge to help everyone grow and learn. The image of me was created using the prompt below in Microsoft Designer.