Welcome

Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk . Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein



Saturday, 27 June 2026

The End of the Governance Silo: Building a Unified AI & Data Strategy

There’s a pattern emerging across organizations adopting AI. They stand up an “AI Governance” function. They build a new ethics board. They create new policies for models, prompts, and outputs. And yet, at the same time, they leave Data Governance exactly where it was: separate, disconnected, and often treated as a legacy concern. It feels progressive. It looks sensible. But in reality, it creates something far more dangerous: the Governance Silo. With it comes a hidden cost, the Silo Tax:

  • Slower deployment
  • Conflicting rules
  • And, most critically, gaps in accountability and control

In truth, AI governance is not a separate discipline. It never has been. AI is not a new domain to govern. It is an extension of the data ecosystem you already have, and when those two worlds are separated, governance doesn’t just weaken; it fractures.

The Dangerous Illusion of AI Governance as a Separate Discipline

The instinct to separate AI governance often comes from a good place. AI introduces new risks: bias, explainability, ethical use, automated decision-making. These feel different from traditional data concerns like quality, ownership, and classification. But this separation ignores a fundamental truth: AI is entirely dependent on data. Without strong data governance covering lineage, quality, ownership, and control, AI governance simply cannot function effectively. You cannot explain an AI decision if you cannot explain the data that shaped it. You cannot ensure fairness in outputs if you cannot trust the inputs. You cannot manage AI risk if the data pipeline itself is opaque. And yet, many organizations are trying to do exactly that.

The Transparency Gap: When AI Works… But No One Knows Why

Imagine an AI model making the “right” decision. It performs well. It delivers value. The business is happy. But then comes a challenge from a regulator, a customer, or an internal audit: why did the model make that decision? This is where the governance silo breaks down. AI governance demands explainability. But explainability depends on data lineage: knowing where data came from, how it was transformed, and how it was used. Without that lineage, the organization is left with a model that works but cannot be trusted, and in an AI-driven world, that is not a technical issue. It’s a business risk. The real question is no longer “Does the model perform?” It is “Can we prove why it behaves the way it does?”



The Feedback Loop: When AI Starts Creating Its Own Data

AI doesn’t just consume data. It creates it. Predictions, classifications, synthetic datasets, generated content: all of these become new data assets flowing back into the organization, and this is where the second major risk emerges. If that AI-generated data is not governed, catalogued, classified, and controlled, it begins to operate outside the governance perimeter.

Over time, this creates feedback loops:

  • Models trained on outputs from previous models
  • Synthetic data reinforcing hidden biases
  • Decisions based on increasingly distorted sources

Unchecked, these loops can degrade accuracy, amplify bias, and erode trust in AI systems. This is the point where governance stops being about compliance and becomes about control of reality itself, because if you lose control of your data, you lose control of your AI.
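
One way to keep AI-generated data inside the governance perimeter is to record its origin and check that origin before it is reused. The Python below is a minimal sketch of that idea, assuming each record carries an `origin` tag and an optional `approved_for_training` flag; both field names and the sample records are hypothetical.

```python
def select_training_records(records):
    """Keep human-sourced records, plus AI-generated ones explicitly approved."""
    selected = []
    for record in records:
        is_ai_output = record["origin"] == "ai_generated"
        if is_ai_output and not record.get("approved_for_training", False):
            # Ungoverned model output is held back to avoid a feedback loop.
            continue
        selected.append(record)
    return selected


# Hypothetical records: two source records and two model outputs.
records = [
    {"id": 1, "origin": "source_system"},
    {"id": 2, "origin": "ai_generated"},
    {"id": 3, "origin": "ai_generated", "approved_for_training": True},
    {"id": 4, "origin": "source_system"},
]

training_set = select_training_records(records)
print([r["id"] for r in training_set])
```

The point is the default: model output stays out of training data until someone accountable has reviewed and approved it.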

The Blueprint for a Unified Governance Model

So what does a better model look like? Not two parallel governance structures. Not another layer of oversight. But a single, joined-up governance system that treats data and AI as one continuous pipeline. In practice, that means three fundamental shifts.

1. A Shared Language Across Data and AI

The simplest problems are often the most damaging. If your Data team defines “sensitive data” differently to your AI team, or if “accuracy” means something different in a model than it does in a dataset, you don’t have governance. You have misalignment. A unified governance model starts with a shared taxonomy: common definitions, classifications, and standards that flow consistently from data creation through to AI output. This is what eliminates conflicting rules and the friction they create.

2. A Single Source of Truth for Data and AI Assets

Most organizations already have a data catalog. Few have one that extends into AI. A unified model requires a single, integrated metadata layer where:

  • Data is tagged, classified, and owned
  • AI datasets are labelled as “AI-ready” or “restricted”
  • Lineage connects data sources directly to model outputs

This creates visibility across the entire pipeline, from ingestion to decision, and that visibility is what enables trust. Governance is not about documentation. It is about knowing what is happening, in real time, across your data and AI ecosystem.
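
As a minimal sketch of what such a metadata layer might look like, the Python below holds data and AI assets in one catalog and walks lineage from a model output back to its sources. The `Asset` class, its fields, and the asset names are all hypothetical and chosen for illustration; they are not taken from any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One entry in a hypothetical unified catalog (data or AI asset)."""
    name: str
    kind: str            # e.g. "dataset", "model", "model_output"
    owner: str
    classification: str  # e.g. "AI-ready" or "restricted"
    derived_from: list = field(default_factory=list)  # upstream asset names


def upstream_lineage(catalog, name):
    """Return every asset that `name` depends on, directly or indirectly."""
    seen = []
    stack = list(catalog[name].derived_from)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(catalog[current].derived_from)
    return seen


catalog = {
    asset.name: asset
    for asset in [
        Asset("crm_customers", "dataset", "sales-data-owner", "restricted"),
        Asset("churn_features", "dataset", "analytics-owner", "AI-ready",
              derived_from=["crm_customers"]),
        Asset("churn_model_v1", "model", "ml-team", "AI-ready",
              derived_from=["churn_features"]),
        Asset("churn_scores", "model_output", "ml-team", "restricted",
              derived_from=["churn_model_v1"]),
    ]
}

lineage = upstream_lineage(catalog, "churn_scores")
print(lineage)
```

Because the model and its outputs live in the same catalog as the source data, the question "which data shaped this decision?" becomes a lookup rather than an investigation.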

3. One Governance Body, Not Two

The final and often most overlooked shift is organizational. Many organizations create separate AI ethics boards alongside existing data governance councils. This is a mistake. Effective governance requires joined-up decision making, where:

  • Data sources are assessed alongside model outputs
  • Ethical considerations are evaluated across the full lifecycle
  • Accountability is defined end-to-end

A cross-functional governance council, bringing together business, data, AI, risk, and compliance, is already the established model for governing enterprise data. The answer is not to create another council. It’s to evolve the one you already have.

From Silos to Systems: A Shift in Thinking

The organizations that struggle with AI governance are often those still thinking in layers:

  • Data layer
  • AI layer
  • Governance layer

But in reality, these are not separate stacks. They are one system.

Data flows into models.
Models generate outputs.
Outputs become new data.

And governance must sit across that entire loop. This is why leading organizations are moving toward a single governance umbrella, one that integrates data and AI governance to create consistency, transparency, and enforceable controls. In a world of continuous data and continuous automation, governance can no longer be fragmented. It has to be continuous too.

Conclusion: The Road to Scalable AI

There’s a tendency in AI discussions to focus on the models, the algorithms, the tools and the capabilities. But that’s not where success will be determined. The organizations that win the AI race will not be those with the most advanced models. They will be the ones with the most trusted, controlled, and governed data pipelines. Because ultimately AI is the car. Data Governance is the road. And no matter how powerful the car is, you cannot win a race on a road full of potholes.


Wednesday, 24 June 2026

Microsoft Purview Security Tooling Blog Series

The biggest data security risk in Microsoft 365 isn't external attackers. It's the controls you think you've already implemented. Most organisations believe their data is secure because they have Microsoft 365. The reality is often very different. Over the last few weeks, I've written a series exploring the Microsoft Purview data security capabilities that organisations regularly purchase but don't fully implement, configure, or operationalise.


The common assumption is that data security is a technology problem. In practice, it's a visibility, governance, and control problem. Knowing where your sensitive data is, who has access to it, how it moves, and how you respond when something goes wrong requires much more than switching on a licence.

The series explores:

🔹 Information Protection – classifying and protecting what matters
🔹 Data Loss Prevention – turning classifications into enforceable controls
🔹 Insider Risk Management – understanding risky behaviours before they become incidents
🔹 Information Barriers – controlling who can collaborate with whom
🔹 Data Security Investigations – turning alerts into evidence and action
🔹 DSPM for AI and Data – exposing hidden risks and overexposure across your estate

If you're working in data governance, security, compliance, or responsible AI, these capabilities are becoming increasingly important as organisations seek to balance productivity with protection. The challenge isn't buying the technology. It is implementing the controls that make the technology effective.



You can read the full series here:


References

The Reality of Data Security in M365 (Purview Protection)

Microsoft Purview Information Protection: The Control Most Organizations Think They Already Have 

Microsoft Purview Information Barriers: Controlling Who Can Work With What

Microsoft Purview Data Security Investigations: When Alerts Become Evidence

Microsoft Purview DSPM: Unmasking Your True Data Risks

Microsoft Purview Data Loss Prevention: Where Classification Becomes Control

Microsoft Purview Insider Risk Management: When Data Movement Becomes Behaviour


Tuesday, 23 June 2026

Scaling at Cloud Speed: Moving from Manual Checklists to CDMC Automation

For years, data governance has relied on a familiar model: committees, policies, spreadsheets, and periodic reviews. It worked when data moved slowly, systems were predictable, and change could be managed through human oversight. But that world no longer exists.

Today, data is created, transformed, and consumed continuously across cloud platforms. AI models are trained on that data in near real time. Decisions happen in milliseconds. And yet, in many organizations, governance is still anchored in manual controls and retrospective checks. There’s an uncomfortable truth emerging: human-in-the-loop governance cannot scale to cloud speed. The question is no longer whether governance is important. It’s whether governance can keep up, and this is where the industry has been quietly converging on a new answer.


The Missing Link: Why CDMC Exists

The EDM Council didn’t create the Cloud Data Management Capabilities (CDMC) framework to replace existing governance thinking. It created it because something was missing. Frameworks like DAMA-DMBOK remain foundational: they define what good governance looks like across domains such as data quality, metadata, and security. But they were never designed for an environment where:

  • Data is distributed across cloud services
  • Access decisions are made dynamically via APIs
  • Policies must be enforced continuously, not reviewed quarterly

CDMC fills that gap. It translates governance intent into 14 concrete, measurable controls, designed specifically for cloud environments, with a clear emphasis on automation and continuous enforcement.

In other words, it moves governance from principle to execution.

From Policy to Enforcement: What Automation Really Means

The power of CDMC is not just that it defines controls; it defines controls that can be automated, monitored, and evidenced. This is a fundamental shift. Traditional governance asks: Do we have a policy? CDMC asks: Is this control being executed automatically, right now, and can we prove it? Across its 14 controls, spanning governance, classification, privacy, lifecycle, and architecture, CDMC embeds governance directly into the data pipeline itself.

The impact of that shift becomes most visible when you look at a few critical controls.

Control #1: Governance Accountability in an AI World

One of the simplest, yet most powerful, requirements is this: every sensitive data asset must have a defined owner. This is not new in principle. DAMA has long emphasised stewardship and accountability, but CDMC enforces it through automation, ensuring that ownership fields are populated in data catalogs, monitored, and escalated when missing. In an AI-driven context, this becomes critical. If a model produces biased or incorrect outputs, the question is no longer abstract. It becomes operational:

Who owns the data that trained this model?

Without automated ownership tracking, accountability collapses. With it, organizations can trace responsibility back to the source.
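
To make the ownership check concrete, here is a small Python sketch of the kind of automated rule described above. It assumes catalog entries are available as dictionaries with `name`, `sensitive`, and `owner` keys; those keys, the sample entries, and the escalation message are illustrative assumptions, not part of the CDMC specification or any catalog API.

```python
def find_unowned_sensitive_assets(catalog_entries):
    """Return names of sensitive assets whose owner field is missing or blank."""
    missing = []
    for entry in catalog_entries:
        owner = (entry.get("owner") or "").strip()
        if entry.get("sensitive") and not owner:
            missing.append(entry["name"])
    return missing


# Hypothetical catalog extract.
entries = [
    {"name": "payroll_2026", "sensitive": True, "owner": "hr-data-owner"},
    {"name": "customer_pii", "sensitive": True, "owner": ""},
    {"name": "public_holidays", "sensitive": False, "owner": None},
    {"name": "training_set_v3", "sensitive": True},
]

to_escalate = find_unowned_sensitive_assets(entries)
for name in to_escalate:
    print(f"ESCALATE: sensitive asset '{name}' has no owner")
```

Run on a schedule against the catalog, a check like this turns "every sensitive asset must have an owner" from a policy statement into something that is monitored and evidenced.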

Control #11: Data Privacy that doesn’t rely on Humans

Privacy has always been a governance priority. But manual processes (reviews, sign-offs, compliance checklists) are no longer sufficient when data is constantly moving and being repurposed. CDMC embeds privacy into the flow of data itself. It requires automated triggers, such as data protection impact assessments for personal data, ensuring that privacy controls are activated consistently and at scale. This matters even more in AI scenarios, where training datasets can be assembled from multiple sources rapidly. You simply cannot rely on someone remembering to remove PII before it enters a pipeline. You need a system that ensures it never gets there in the first place.
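
A minimal sketch of such an automated trigger, in Python, might look like the following. The rule it encodes (personal data needs a completed impact assessment before it may enter a pipeline) is a simplified assumption for illustration, and the field names `contains_personal_data` and `dpia_completed` are hypothetical.

```python
def privacy_gate(dataset):
    """Decide whether a dataset may enter a pipeline.

    Hypothetical rule: datasets containing personal data need a completed
    data protection impact assessment (DPIA) on record. Without one, the
    gate blocks the dataset and signals that a DPIA must be started.
    """
    if not dataset["contains_personal_data"]:
        return "allow"
    if dataset.get("dpia_completed"):
        return "allow"
    return "block_and_trigger_dpia"


# Hypothetical datasets queued for a training pipeline.
datasets = [
    {"name": "weather_history", "contains_personal_data": False},
    {"name": "support_tickets", "contains_personal_data": True, "dpia_completed": True},
    {"name": "web_signups", "contains_personal_data": True},
]

decisions = {d["name"]: privacy_gate(d) for d in datasets}
print(decisions)
```

The gate does not depend on anyone remembering to ask the question: personal data without an assessment simply does not pass.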

Control #12: Stopping Data Swamps before they start

Data quality has always been a known challenge. What’s changed is the speed at which poor-quality data propagates. In traditional environments, issues might take weeks to surface. In AI pipelines, they surface instantly and at scale. CDMC addresses this by requiring data quality measurement as a built-in control, applied at ingestion and continuously monitored through metrics. This is a subtle but profound shift. Instead of discovering problems downstream, organizations prevent them upstream. Instead of cleaning data after the fact, they stop poor data from entering the ecosystem at all. This is how you avoid the modern equivalent of a data warehouse problem: the AI-era data swamp.
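
The "prevent upstream" idea can be sketched as a quality gate at ingestion. The Python below measures one simple metric, completeness of required fields, and rejects a batch that falls below a threshold. The metric choice, the 0.95 threshold, and the sample batches are assumptions for illustration; a real implementation would track several quality dimensions.

```python
def completeness(rows, required_fields):
    """Fraction of rows where every required field is present and non-empty."""
    if not rows:
        return 0.0
    complete = sum(
        all(row.get(name) not in (None, "") for name in required_fields)
        for row in rows
    )
    return complete / len(rows)


def ingest(rows, required_fields, threshold=0.95):
    """Admit a batch only if it meets the completeness threshold."""
    score = completeness(rows, required_fields)
    if score < threshold:
        raise ValueError(
            f"batch rejected: completeness {score:.2f} is below {threshold:.2f}"
        )
    return score


good_batch = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
]
bad_batch = good_batch + [{"id": 3, "email": ""}, {"id": 4}]

print(ingest(good_batch, ["id", "email"]))
try:
    ingest(bad_batch, ["id", "email"])
except ValueError as err:
    print(err)
```

The rejected batch never reaches the downstream models, and the recorded score doubles as the continuous metric the control asks for.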



The joined-up Framework: DAMA as Constitution, CDMC as Enforcement

It’s tempting to position CDMC as a replacement for traditional frameworks, but that misses the point. The real strength comes from how they work together.

  • DAMA-DMBOK defines the principles of governance, the constitution that outlines what good looks like
  • CDMC defines the execution, the enforcement layer that ensures those principles are actually applied

Where DAMA says:

Data must be secure.

CDMC operationalises it as:

Security controls must be enabled, monitored, and evidenced automatically for all sensitive data.

Where DAMA defines accountability, CDMC ensures accountability exists in the system. Where DAMA defines quality, CDMC ensures quality is measured continuously. This is the bridge many organizations have been missing.

From Governance Theatre to Operational Reality

There is a growing gap between organizations that talk about governance and those that have embedded it into their platforms.

Manual governance processes, however well designed, become governance theatre in cloud environments:

  • Policies exist, but are not enforced
  • Ownership is defined, but not maintained
  • Controls are documented, but not executed

CDMC changes the conversation. It forces organisations to move from:

  • Periodic assurance → continuous control
  • Documentation → instrumentation
  • Manual oversight → automated guardrails

And that’s what makes it so relevant in the age of AI.

AI doesn’t remove the need for governance; it sharply increases it. But it also exposes the limits of traditional approaches. You cannot govern at cloud speed with spreadsheets, committees, and retrospective checks. You need governance that is:

  • Embedded
  • Automated
  • Measurable
  • Continuous

That’s the shift CDMC represents. Not a new theory of governance but a new way of making governance real.


Saturday, 20 June 2026

Microsoft Purview Information Protection: The Control Most Organizations Think They Already Have

The Reality: Most organizations think they have data classification in place. Very few have it working as a system.

Step into almost any enterprise environment, and you will find a similar story: a data classification policy exists on paper, some sensitivity labels are published, and users have completed basic training. It looks complete.

But the live telemetry tells a different story. Labels are applied inconsistently, vast swaths of data remain entirely unclassified, and sensitive intellectual property moves freely across Exchange, Teams, and SharePoint with zero control attached to it.

The issue is not that Information Protection is missing; it is that it has never been treated as a foundational, systemic control. In a modern data estate, that distinction changes everything.

What It Is vs. What It Actually Does

The Context Layer

Microsoft Purview Information Protection (MPIP) is the architectural baseline that allows organizations to discover, classify, label, and protect sensitive data at the point of creation and throughout its entire lifecycle.

Its primary purpose isn't just to add visual stamps to documents; it is to embed persistent context directly into the file metadata, so that the classification travels with the content. Without this foundation, downstream security controls like Data Loss Prevention (DLP) and Insider Risk Management (IRM) are essentially operating blind, forced to guess the intent and value of the data they are monitoring.

The Core Technical Pillars

At an engineering level, Information Protection relies on three deeply integrated inspection and enforcement mechanisms:

Diagram – Information Protection as the Control Hub

1. Sensitive Information Types (SITs)

SITs are the pattern-matching engines used to detect highly structured data such as credit card numbers, government identifiers, or bank routing codes. They combine regular expressions (regex) with keyword proximity rules, confidence levels, and checksum validation (for example, the Luhn check on card numbers) to minimize false positives.
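
To show how those ingredients fit together, here is a simplified Python sketch of SIT-style detection for 16-digit card numbers: a regex finds candidates, the Luhn checksum discards random digit strings, and nearby keywords raise the confidence. This is an illustration of the technique only. Purview defines SITs through its own configuration rather than Python, and the pattern, keyword list, 50-character window, and two confidence levels used here are assumptions.

```python
import re

# Candidate pattern: 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){15}\d\b")
# Hypothetical supporting keywords that raise confidence when found nearby.
KEYWORDS = ("card", "visa", "mastercard", "credit")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def detect_card_numbers(text, window=50):
    """Return (digits, confidence) for each checksum-valid candidate."""
    findings = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if not luhn_valid(digits):
            continue  # looks like a card number but fails the checksum
        start = max(0, match.start() - window)
        nearby = text[start:match.end() + window].lower()
        confidence = "high" if any(word in nearby for word in KEYWORDS) else "low"
        findings.append((digits, confidence))
    return findings


sample = (
    "Please charge my Visa card 4111 1111 1111 1111 today. "
    "Order ref 1234 5678 9012 3456."
)
print(detect_card_numbers(sample))
```

The order reference matches the pattern but fails the checksum, so it is never reported. That filtering step is what keeps a pattern-based detector from flooding analysts with false positives.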

2. Trainable Classifiers

To tackle unstructured data (such as legal contracts, source code, or internal memos), Purview moves beyond basic pattern matching. Trainable Classifiers utilize machine learning to evaluate the overall semantic context and meaning of a document. By training the engine on specific organization-centric examples, it learns to classify content based on what the document is, rather than just the specific keywords it contains.

3. Sensitivity Labels (The Action Layer)

Labels are where passive classification transforms into active protection. When a sensitivity label is applied, either manually by an end-user or automatically via system policy, it writes clear-text metadata attributes into the file properties. Crucially, it can trigger native Azure Information Protection (AIP) actions, including:

  • Persistent, identity-driven encryption (AES-256) that stays with the file even when exfiltrated outside the corporate network.

  • Strict digital rights management (DRM) configurations (e.g., blocking printing, copying, or forwarding).

  • Dynamic visual markings, such as mandatory headers, footers, or watermarks.

The Root of the Security Ecosystem

Information Protection cannot be treated as an isolated standalone tool. It serves as the primary telemetry feeder for the entire Microsoft Purview and Defender security stack:

  • Data Loss Prevention (DLP): Uses sensitivity label metadata as its most reliable trigger to block external sharing, USB copies, or unauthorized cloud uploads.

  • Insider Risk Management (IRM): Leverages labels to immediately elevate a user's risk score if they begin downloading or staging highly classified data.

  • Data Security Posture Management (DSPM): Aggregates label distribution metrics to map the organization's overall vulnerability and exposure trends across multi-cloud estates.

  • Generative AI & Copilot Guardrails: Serves as the ultimate data safety valve. If an organizational file is labeled Highly Confidential, Microsoft 365 Copilot will natively respect that label's encryption and access policies, ensuring sensitive data is never synthesized into a response for an unauthorized user.

The Business Problem It Solves

When an enterprise lacks a unified classification system, it faces a fundamental crisis: it does not know what its data actually is. This visibility gap cascades into critical business risks:

  • Data Oversharing: Highly proprietary data is treated exactly like low-risk administrative data, leading to accidental public or tenant-wide exposure.

  • Policy Fatigue: Security teams deploy overly broad, generic DLP rules that block legitimate business workflows, frustrating users and driving them toward unmanaged Shadow IT workarounds.

  • Unsafe AI Adoption: Organizations delay deploying productivity tools like Copilot because they cannot guarantee that sensitive internal HR data or financial forecasts won't accidentally surface in peer-level prompts.

Information Protection solves this by injecting context directly into the data payload, allowing automated controls to act with surgical precision.

Strategic Implementation: Moving from Policy to System

The most common failure point for data labeling projects is over-engineering the technical taxonomy before aligning with the business. A successful, sustainable deployment requires a highly disciplined, iterative approach:

1. Simplify the Taxonomy

Avoid the trap of creating dozens of hyper-specific labels that confuse end-users. Start with a lean, universally understood baseline such as Public, General, and Confidential. Ensure each tier has an airtight business definition before attempting to configure them in the admin center.

2. Transition from Manual to Automated

Do not place the entire burden of data security on the end-user. Utilize service-side auto-labeling policies to automatically apply sensitivity classifications when data matches high-fidelity SITs or Trainable Classifiers at rest within SharePoint, OneDrive, and Exchange.

3. Match Classification with Downstream Enforcement

A label that only applies a visual watermark provides very little protection. Ensure that your classification tiers are explicitly mapped to corresponding DLP blocking policies and conditional access requirements so that classification directly dictates control.
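
One lightweight way to keep classification and enforcement in step is to hold the mapping in one place and test it. The Python below is a sketch under assumed values: the three labels come from the baseline taxonomy above, but the specific controls attached to each tier are hypothetical and would be decided by the business.

```python
TAXONOMY = ["Public", "General", "Confidential"]

# Hypothetical mapping from label to the controls it should drive.
ENFORCEMENT = {
    "Public": {"encrypt": False, "allow_external_sharing": True},
    "General": {"encrypt": False, "allow_external_sharing": True},
    "Confidential": {"encrypt": True, "allow_external_sharing": False},
}


def unmapped_labels(taxonomy, enforcement):
    """Labels that exist in the taxonomy but drive no downstream control."""
    return [label for label in taxonomy if label not in enforcement]


def may_share_externally(label):
    """Unknown or missing labels are denied by default."""
    controls = ENFORCEMENT.get(label)
    return bool(controls and controls["allow_external_sharing"])


print(unmapped_labels(TAXONOMY, ENFORCEMENT))
print(may_share_externally("Confidential"))
print(may_share_externally(None))
```

A label that appears in `unmapped_labels` is exactly the watermark-only case described above: it classifies the content but controls nothing.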

Conclusion

The primary roadblock to robust data security is rarely the underlying software; it is the architectural design.

Having a passive data protection policy means nothing if it is not operationalized across the entire digital estate. When configured as a unified, interconnected system, Microsoft Purview Information Protection turns data from an unmanaged compliance liability into a secure, searchable, and fully trusted business asset.

References and learning

https://learn.microsoft.com/en-us/purview/information-protection
https://learn.microsoft.com/en-us/purview/sensitivity-labels
https://learn.microsoft.com/en-us/purview/trainable-classifiers

Friday, 19 June 2026

Microsoft Purview Information Barriers: Controlling Who Can Work With What

The Reality: Most organizations rely on policy to dictate how people should collaborate. But collaboration tools are designed to break down barriers, not enforce them. Without structural technology controls, ethical walls remain a myth.

Data security is usually framed around protecting data from leaving the organization. But there is a secondary, structural risk that sits underneath data transfer: preventing unauthorized interactions entirely. Sometimes, the risk isn't just about a file being leaked; it is about the wrong two teams collaborating in the first place. Whether it is an individual having visibility into high-stakes corporate conversations they shouldn't be part of, or information flowing between internal groups that must remain separated for legal, ethical, or regulatory reasons, traditional DLP cannot fix this after the fact.

Ethical walls must be built natively into the collaboration layer itself.

What It Is vs. What It Actually Does

The Structural Guardrail

Microsoft Purview Information Barriers (IB) is an identity-driven capability that restricts communication and collaboration between defined segments of users across Microsoft 365.

Unlike other Purview components, Information Barriers does not inspect data classification labels or scan file contents. Instead, it enforces structural, organizational boundaries within the collaboration platform, preventing prohibited connections from ever occurring.

The Technical Mechanics

At an engineering level, Information Barriers shifts security from a reactive monitoring loop into a preventative design control across three technical steps:




1. Identity Segment Definition

The foundation of any barrier relies on the absolute accuracy of your identity data. Users are grouped into distinct organizational Segments using specific, directory-level attributes pulled directly from Microsoft Entra ID (such as Department, JobTitle, MemberOf, or UsageLocation).

2. Policy Logic Configuration

Once segments are defined, administrators configure barrier policies to establish communication permissions. These policies dictate three distinct operational modes:

  • Blocked Interactions: Segment A cannot communicate with Segment B (e.g., Investment Banking vs. Research).

  • Isolated Interactions: Segment C can only communicate with Segment C, completely cut off from the rest of the company.

  • Assisted Interactions: Segment D can only communicate with specific designated segments, but no one else.
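
The segment and policy logic above can be modelled in a few lines of Python. This is a conceptual sketch, not how Purview is configured: the `POLICIES` structure, the "block" and "allow" mode names, and the sample segments are assumptions made for illustration. Segments are derived from a single directory attribute, and communication is permitted only if both sides' policies allow it.

```python
def segment_of(user):
    """Assign a segment from a directory attribute (here, department)."""
    return user["department"]


# Hypothetical policies keyed by segment: ("block", segments) denies the
# listed segments; ("allow", segments) permits only the listed segments.
POLICIES = {
    "Investment Banking": ("block", {"Research"}),
    "Research": ("block", {"Investment Banking"}),
    "Deal Team": ("allow", {"Deal Team"}),            # isolated
    "Advisory": ("allow", {"Advisory", "Legal"}),     # designated segments only
}


def side_permits(policy, other_segment):
    """Check one side's policy; a segment with no policy is unrestricted."""
    if policy is None:
        return True
    mode, segments = policy
    if mode == "block":
        return other_segment not in segments
    return other_segment in segments


def can_communicate(user_a, user_b):
    """Both segments' policies must permit the interaction."""
    seg_a, seg_b = segment_of(user_a), segment_of(user_b)
    return (side_permits(POLICIES.get(seg_a), seg_b)
            and side_permits(POLICIES.get(seg_b), seg_a))


banker = {"upn": "ana@example.com", "department": "Investment Banking"}
analyst = {"upn": "raj@example.com", "department": "Research"}
dealer = {"upn": "li@example.com", "department": "Deal Team"}
lawyer = {"upn": "sam@example.com", "department": "Legal"}
advisor = {"upn": "kim@example.com", "department": "Advisory"}

print(can_communicate(banker, analyst))
print(can_communicate(dealer, lawyer))
print(can_communicate(advisor, lawyer))
```

Because the decision depends only on identity attributes, it can be made before any message is sent or any file is opened, which is what makes the control preventative rather than reactive.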

3. Deep Service-Level Interception

Information Barriers does not just block a file transfer; it completely alters the user experience natively within Microsoft Teams, SharePoint, and OneDrive:

  • Microsoft Teams: Restricts 1:1 chats, group chats, and channel invites between blocked segments. If a user tries to add a blocked colleague to a chat, the action is hard-blocked.

  • SharePoint & OneDrive: When a SharePoint site or OneDrive folder is provisioned, it inherits the segment properties of its owner or group. Users in unauthorized segments are explicitly blocked from accessing the site or viewing shared links.

  • Discovery & Presence: Blocked users cannot see each other’s active presence status, nor will they appear in the Microsoft 365 People Picker search results.

How It Fits Into the Security Ecosystem

While the rest of the Microsoft Purview suite monitors data and behavioral signals, Information Barriers defines the core architectural layout where those tools operate.

  • Data Loss Prevention (DLP): DLP policies operate within the strict boundaries already enforced by Information Barriers, providing double-layered defense-in-depth.

  • Insider Risk Management (IRM): Uses barrier segments to establish normal baseline behaviors, instantly flagging an anomaly if a user attempts to bypass an organizational boundary.

  • Data Security Posture Management (DSPM): Leverages these structural segments to evaluate overall data exposure maps across disparate corporate business units.

The Critical AI Frontier

As generative AI tools like Microsoft 365 Copilot and AI agents are introduced to the enterprise, Information Barriers serves as a vital safeguard.

If an AI system can instantly surface and summarize data from across the entire corporate estate, access control lists (ACLs) alone are no longer enough. Information Barriers ensures that your underlying communication boundaries remain intact. Because Copilot natively respects the identity segments defined by IB, it prevents an AI instance from accidentally surfacing or synthesizing information from a blocked segment to a user on the other side of an ethical wall.

Real-World Business Use Cases

Information Barriers converts theoretical ethical frameworks into technical realities for highly regulated sectors:

  • Financial Services: Enforcing strict segregation between trading desks and corporate advisory teams to comply with market abuse and conflict-of-interest regulations.

  • Legal Practices: Preventing conflicts of interest by blocking legal teams representing opposing clients from accidentally discovering case files or chatting in shared digital workspaces.

  • Mergers & Acquisitions (M&A): Establishing temporary, high-security data islands to ensure early-stage deal teams can collaborate confidentially without leaking pre-acquisition details to the broader enterprise.

Strategic Deployment: Getting Started Properly

Because Information Barriers fundamentally changes how users collaborate, successful implementation is an operational challenge rather than a technical one.

1. Audit Identity Cleanliness First

Before writing a single policy rule, validate that your Microsoft Entra ID attributes are clean, standardized, and synchronized with your HR management systems. If user attributes are out-of-date, you risk blocking legitimate workflows or leaving gaps in your ethical walls.
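
A quick pre-flight audit of the attribute you plan to segment on can be sketched as below. The Python assumes a directory export is already available as a list of dictionaries; the `upn` and `department` keys, the allowed values, and the sample users are hypothetical, and no directory API is called.

```python
# Hypothetical list of approved department values for segmentation.
ALLOWED_DEPARTMENTS = {"Investment Banking", "Research", "Legal"}


def audit_users(users):
    """Flag users whose department is missing or not a standard value."""
    issues = []
    for user in users:
        department = (user.get("department") or "").strip()
        if not department:
            issues.append((user["upn"], "missing department"))
        elif department not in ALLOWED_DEPARTMENTS:
            issues.append((user["upn"], f"non-standard department: {department}"))
    return issues


# Hypothetical directory export.
users = [
    {"upn": "ana@example.com", "department": "Investment Banking"},
    {"upn": "raj@example.com", "department": "research"},
    {"upn": "sam@example.com", "department": None},
]

issues = audit_users(users)
for upn, problem in issues:
    print(f"{upn}: {problem}")
```

Note that the lowercase "research" is flagged. A segment filter that matches on the exact attribute value would silently leave that user outside the barrier.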

2. Map Use Cases Prior to Code

Do not attempt a massive, company-wide rollout on day one. Sit down with legal, compliance, and business unit leaders to define exactly which groups require absolute isolation and why. Document these boundaries on paper before translating them into Purview rules.

3. Deploy and Validate Phase-by-Phase

Start by deploying a barrier policy between two small, highly specific pilot segments. Monitor operational workflows, verify that Teams and SharePoint sites adhere to the rules, and gather user feedback before expanding enforcement across full business units.

Conclusion

Traditional data protection relies heavily on tracking files and monitoring user actions. Information Barriers operates one step earlier: it designs out the risk entirely.

When your business model, compliance framework, or ethics demand clear separation between teams, Microsoft Purview Information Barriers embeds that separation directly into the daily workspace. It transitions compliance from an idealistic policy guide into an automated, unyielding technical reality.

References and learning

Microsoft Purview Information Barriers overview

Set up Information Barriers in Microsoft 365 

Sunday, 14 June 2026

Microsoft Purview Data Security Investigations: When Alerts Become Evidence

The Reality: An alert tells you something happened. It doesn’t tell you what it means, and very few organizations can actually prove the full extent of the impact.

When a policy triggers or behavior deviates, the immediate questions from leadership are always the same: What data was exposed? Who interacted with it? How far did it spread? In most security operations centers (SOCs), answering these questions triggers a chaotic, manual scramble. Analysts open multiple tool sets, export disjointed logs, and attempt to piece together fragments of data activity, hoping they haven't missed a critical pivot point.

Detection tells you a boundary was crossed. Data Security Investigations tells you the actual narrative behind the breach.

What It Is vs. What It Actually Does

The Definition

Data Security Investigations in Microsoft Purview is an integrated, AI-driven capability that allows organizations to identify, analyze, and forensically reconstruct data security incidents within a structured workspace. It acts as the central hub where raw telemetry from Data Loss Prevention (DLP), Insider Risk Management (IRM), and Endpoint activity is synthesized into concrete context and legally defensible evidence.

The Technical Lifecycle

Rather than forcing analysts to audit passive text-based log files, this capability allows teams to investigate the actual content involved across three distinct stages:



1. Targeted Identification (Scoping the Incident)

Investigations rarely start from scratch; they are initiated directly from high-fidelity triggers like a DLP incident, an IRM case, a Microsoft Defender alert, or a targeted search across the estate. Once a case is initialized, the engine automatically aggregates the relevant data footprint across the entire Microsoft 365 ecosystem including emails, SharePoint libraries, OneDrive content, Teams conversations, and conversational histories from Microsoft 365 Copilot.

2. Semantic Content Analysis (Deep Contextual Insights)

This is where the platform moves beyond legacy keyword matching. Data Security Investigations leverages built-in machine learning and semantic parsing to analyze the collected content itself:

  • Vector-Based Semantic Search: Locates conceptually relevant data even if exact keyword terms were omitted or obfuscated.

  • Risk Categorization: Automatically classifies content by subject matter, regulatory framework, and severity level.

  • Conceptual Grouping: Identifies structural and thematic relationships across disparate documents or communication threads.

Instead of merely asking, "Where did this file go?" investigators can answer, "What exact sensitive concepts exist within this extracted data, and what is our true liability footprint?"
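To make the contrast with keyword matching concrete, here is a minimal sketch of vector-based ranking. It is not how Purview implements semantic search. The three-dimensional vectors, file names, and the `semantic_search` helper are hand-made, hypothetical stand-ins for the embeddings a real model would produce.

```python
import math

# Hypothetical, hand-assigned "embeddings". The axes loosely mean
# (finance, personal data, engineering); a real system would get these from a model.
DOCS = {
    "q3_payroll_export.xlsx": [0.9, 0.8, 0.0],
    "salary_review_notes.docx": [0.8, 0.6, 0.1],
    "build_pipeline_readme.md": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_search(query_vec, docs, top_k=2):
    """Return the top_k document names ranked by similarity to the query vector."""
    ranked = sorted(docs, key=lambda name: cosine(query_vec, docs[name]), reverse=True)
    return ranked[:top_k]

# Assumed vector for the query "employee compensation", which shares
# no literal keyword with any of the file names above.
query = [0.85, 0.7, 0.0]
results = semantic_search(query, DOCS)
print(results)
```

The payroll and salary documents rank first because their vectors point in a similar direction to the query, not because any term matched.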

3. Forensic Remediation (Closing the Loop)

Within a unified, audited case view, investigators can correlate user behavioral timelines with direct data access, uncover hidden document relationships, and securely collaborate across internal silos (Security, Legal, HR, and Compliance).

From there, definitive mitigation actions can be executed natively, such as revoking file permissions, deleting exposed content from target locations, or escalating the findings directly into formal legal workflows or eDiscovery Premium.
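Correlating a behavioral timeline with data access is, at its simplest, a merge of two time-ordered event streams. The sketch below illustrates that idea only. The events, labels, and tuple layout are hypothetical and do not reflect Purview's data model.

```python
from datetime import datetime
from heapq import merge

# Hypothetical, already time-sorted event streams for a single user.
behaviour = [
    (datetime(2026, 6, 1, 9, 5), "behaviour", "resignation letter submitted"),
    (datetime(2026, 6, 1, 9, 40), "behaviour", "USB device connected"),
]
data_access = [
    (datetime(2026, 6, 1, 9, 20), "access", "opened pricing_model.xlsx"),
    (datetime(2026, 6, 1, 9, 45), "access", "copied pricing_model.xlsx to removable media"),
]

# Interleave both streams into one chronological timeline, keyed on the timestamp.
timeline = list(merge(behaviour, data_access, key=lambda event: event[0]))

for when, source, detail in timeline:
    print(when.strftime("%H:%M"), source, detail)
```

Reading the merged timeline top to bottom shows what the user did alongside what data they touched, which is the view an investigator needs before deciding on remediation.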

The Unified Security Control Loop

Data Security Investigations serves as the ultimate analytical core of the Microsoft Purview ecosystem. It is the mechanism that transitions your posture from simple detection to decisive interpretation.

| Connected System | The Mutual Telemetry Exchange |
| --- | --- |
| Data Loss Prevention (DLP) | Investigations ingest DLP alerts to analyze the raw data payload, using the findings to refine DLP detection rules and eliminate false positives. |
| Insider Risk Management (IRM) | Enriches behavioral risk cases by overlaying deep content-level intent onto user activity timelines. |
| Microsoft Sentinel & Defender | Extends traditional infrastructure/endpoint alerts into comprehensive, data-centric root-cause analyses. |
| Data Security Posture Management (DSPM) | Feeds incident outcomes back into visibility dashboards to update the organization's overarching data vulnerability maps. |
| Compliance & Legal Workflows | Packages verified digital evidence into structured, chain-of-custody-compliant formats for regulatory or judicial review. |

Solving the Enterprise Operational Crisis

The primary bottleneck for modern security teams isn't a lack of detection; it is scale. The overwhelming volume of data and alerts forces analysts into manual verification cycles that can stretch from hours into weeks. This lag introduces severe operational hazards:

  • Delayed containment windows during active data exfiltration.

  • Incomplete or inaccurate definitions of your data breach blast radius.

  • An inability to provide a defensible, audited timeline to regulatory authorities or insurance auditors.

Data Security Investigations mitigates this by replacing disjointed forensics with a scalable, structured workflow. It automates data collection, leverages AI to surface hidden risks, and dramatically compresses the mean time to resolve (MTTR) complex data incidents.
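For teams that want to track this improvement, MTTR is simply the average of resolution time minus detection time across incidents. The following is a small worked example. The timestamps are invented for illustration and are not measurements from any real environment.

```python
from datetime import datetime, timedelta

# Hypothetical incident records as (detected, resolved) pairs. Not real data.
incidents = [
    (datetime(2026, 6, 1, 9, 0), datetime(2026, 6, 1, 17, 0)),     # 8 hours
    (datetime(2026, 6, 3, 8, 0), datetime(2026, 6, 5, 8, 0)),      # 48 hours
    (datetime(2026, 6, 10, 12, 0), datetime(2026, 6, 10, 16, 0)),  # 4 hours
]

def mean_time_to_resolve(records):
    """Average elapsed time between detection and resolution."""
    if not records:
        raise ValueError("at least one incident is required")
    total = sum((resolved - detected for detected, resolved in records), timedelta())
    return total / len(records)

mttr = mean_time_to_resolve(incidents)
print(mttr)  # 20:00:00, i.e. (8 + 48 + 4) / 3 = 20 hours
```

Note how the single 48-hour incident dominates the average. Tracking the median alongside the mean is a reasonable design choice when a few long investigations skew the figure.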

Strategic Guidance: Getting Started Properly

To prevent an investigation workflow from becoming overwhelming or unstructured, organizations should implement the following deployment framework:

1. Maintain a Trigger-Led Workflow

Never use the investigation engine as a blind, open-ended search utility. Every case should possess a clear entry point tied directly to an active DLP infraction, an elevated Insider Risk threshold, or a specific, tightly scoped risk scenario.

2. Practice Iterative Scoping

Avoid pulling massive, unrestricted data sets into a single case on day one. Start with a highly focused, targeted dataset based on the immediate incident triggers, and iteratively expand the search scope only as semantic analysis reveals new conceptual leads.

3. Establish Cross-Functional Governance

Because data investigations inherently touch sensitive intellectual property and employee privacy, establish a clear, cross-functional operating model early. Define explicit Role-Based Access Controls (RBAC) separating the security analysts who triage alerts from the compliance or legal officers who hold Content Viewer permissions to review the actual underlying data.
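The three practices above can be sketched as a small operating model in code. This is an illustration under stated assumptions, not Purview's API: the trigger types, role names, permission names, and the `Case` class are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical trigger types and role-to-permission mapping, for illustration only.
ALLOWED_TRIGGERS = {"dlp_incident", "irm_case", "defender_alert", "scoped_search"}
ROLE_PERMISSIONS = {
    "security_analyst": {"triage_alert", "edit_scope"},
    "legal_reviewer": {"view_content"},
}

def can(role, action):
    """True if the role has been granted the action."""
    return action in ROLE_PERMISSIONS.get(role, set())

@dataclass
class Case:
    trigger: str
    scope: set = field(default_factory=set)
    scope_history: list = field(default_factory=list)

    def __post_init__(self):
        # Practice 1: every case must start from a concrete trigger.
        if self.trigger not in ALLOWED_TRIGGERS:
            raise ValueError(f"case needs a concrete trigger, got {self.trigger!r}")

    def expand_scope(self, role, locations, reason):
        # Practice 3: only roles allowed to edit scope may widen the case.
        if not can(role, "edit_scope"):
            raise PermissionError(f"{role} may not change case scope")
        # Practice 2: each expansion is incremental and must be justified.
        if not reason:
            raise ValueError("each scope expansion needs a documented reason")
        self.scope |= set(locations)
        self.scope_history.append((sorted(locations), reason))

case = Case("dlp_incident")
case.expand_scope("security_analyst", {"mailbox:user-a"}, "named in DLP alert")
case.expand_scope("security_analyst", {"onedrive:user-a"}, "analysis surfaced related files")
print(case.scope_history)
```

Keeping a reason against every expansion gives you an audit trail for why the case grew, and separating `edit_scope` from `view_content` mirrors the split between analysts who triage and reviewers who read the underlying data.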

Conclusion

Most organizations operate under the assumption that security investigations are merely about finding where a file went. Modern investigations are about understanding the systemic risk contained within that data. Without a centralized data investigation capability, enterprise defense relies on fragmented tools, manual correlation, and educated guesswork. Microsoft Purview Data Security Investigations closes this gap, providing a clear, defensible path from alert, to understanding, to definitive containment.

References and learning

Learn about Data Security Investigations (Microsoft Learn)

Microsoft Purview overview (Microsoft Learn) 

Friday, 12 June 2026

The Foundations of Intelligence: Why Your AI is Only as Good as Your DAMA Score

There is a quiet but critical misconception at the heart of today’s AI boom. Organizations believe they are investing in artificial intelligence. In reality, they are investing in data, and often that data isn’t ready. AI is a sophisticated engine. But it doesn’t run on innovation, hype, or vendor capability. It runs on data. And if that data is incomplete, inconsistent, poorly understood, or ethically questionable, the outcome isn’t just suboptimal; it’s dangerous.

We are starting to see this play out at scale. AI projects stall, models produce biased outputs, and trust erodes. The narrative often focuses on the technology, but the root cause is rarely the model itself. It is almost always the data. Or more precisely: the absence of effective data governance. The uncomfortable truth is this: most AI failures are not AI failures at all. They are data governance failures in disguise. Frameworks like DAMA-DMBOK2 have spent years defining what good looks like in data management. What has changed is not the principles, but the stakes. In a reporting world, weak data might produce a misleading dashboard. In an AI-driven world, it can drive automated decisions at scale. This is why the conversation needs to shift from AI readiness to something far more grounded: data maturity.


The Four DAMA Pillars That Actually Matter for AI

DAMA-DMBOK outlines eleven knowledge areas, but when it comes to AI, four stand out as foundational. These are not optional capabilities. They are prerequisites.

1. Data Quality: Where AI Success Begins (and Ends)

For decades, organizations have lived with the idea of good enough data.

Reports can tolerate missing fields. Dashboards can work around anomalies. Humans are remarkably good at compensating for imperfect information. AI is not. An AI model does not “interpret” data in context—it learns patterns from it. If those patterns are flawed, biased, or inconsistent, the model will embed those flaws into its outputs. Worse, once learned, these patterns are incredibly difficult to remove. Dimensions like accuracy, completeness, and consistency are no longer operational concerns; they are existential ones.  The principle of garbage in, garbage out has never been more relevant. Even the most advanced models will produce unreliable results if the data they are trained on is flawed. This is not theoretical. Organizations are already seeing AI initiatives fail due to poor data quality, with research indicating that only a small fraction of companies believe their data is sufficiently ready for AI. Data Quality is not just a pillar. It is the foundation.
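Two of these dimensions, completeness and consistency, are straightforward to measure. Here is a minimal sketch on invented records. The column names, the allowed country codes, and both helper functions are assumptions for illustration. Accuracy is left out because it needs a trusted reference source to compare against.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "email": "a@example.com", "country": "GB"},
    {"id": 2, "email": None, "country": "UK"},
    {"id": 3, "email": "c@example.com", "country": "GB"},
    {"id": 4, "email": "d@example.com", "country": None},
]

def completeness(rows, column):
    """Share of rows where the column has a value."""
    return sum(r[column] is not None for r in rows) / len(rows)

def consistency(rows, column, allowed):
    """Share of non-missing values that conform to the agreed code list."""
    present = [r[column] for r in rows if r[column] is not None]
    return sum(v in allowed for v in present) / len(present) if present else 1.0

# 3 of 4 rows have an email address.
email_completeness = completeness(records, "email")
# 2 of 3 non-missing country values use an agreed code ("UK" does not).
country_consistency = consistency(records, "country", {"GB", "US", "FR"})

print(f"email completeness: {email_completeness:.0%}")
print(f"country consistency: {country_consistency:.0%}")
```

A human reading a report would treat "UK" and "GB" as the same country without thinking. A model trained on this column would learn them as two different values.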

2. Metadata Management: The Missing Layer of Intelligence

If data quality determines whether AI works, metadata determines whether it makes sense. Metadata is often misunderstood as technical documentation: schemas, tables, field names. But for AI, it is far more than that. It is context. AI needs to understand:

  • What the data represents (business meaning)
  • Where it came from (lineage)
  • How it should be used (rules, classifications)
  • When it was last updated (timeliness)

Without this context, even the most advanced models become guesswork engines.
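As a rough illustration, the four kinds of context listed above can be captured in a small metadata record and checked before a dataset is handed to a model. The field names, the 30-day freshness threshold, and the "restricted" classification below are hypothetical choices, not a DAMA-defined schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class DatasetMetadata:
    name: str
    business_meaning: str   # what the data represents
    lineage: tuple          # where it came from, upstream source first
    classification: str     # how it should be used
    last_updated: date      # when it was last refreshed

def usable_for_ai(meta, today, max_age_days=30, blocked=("restricted",)):
    """Return (ok, reasons). Thresholds are illustrative assumptions."""
    reasons = []
    if not meta.business_meaning.strip():
        reasons.append("no business definition")
    if not meta.lineage:
        reasons.append("unknown lineage")
    if meta.classification in blocked:
        reasons.append(f"classification {meta.classification!r} not permitted")
    if (today - meta.last_updated).days > max_age_days:
        reasons.append("stale data")
    return (not reasons, reasons)

orders = DatasetMetadata(
    name="orders_daily",
    business_meaning="Confirmed customer orders, one row per order line",
    lineage=("erp.sales_orders", "staging.orders_clean"),
    classification="internal",
    last_updated=date(2026, 6, 1),
)

ok, reasons = usable_for_ai(orders, today=date(2026, 6, 12))
stale_ok, stale_reasons = usable_for_ai(orders, today=date(2026, 8, 1))
print(ok, reasons)
print(stale_ok, stale_reasons)
```

Returning the reasons alongside the verdict means a rejected dataset tells its owner exactly which piece of context is missing.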

This is particularly critical for large language models interacting with enterprise data. These models are powerful, but they struggle with ambiguity and organizational nuance. Without metadata, they cannot distinguish between similar concepts, interpret domain-specific language, or validate the “truth” of a data point. Metadata effectively becomes the translation layer between human intent and machine interpretation. And yet, it is one of the most neglected areas in AI initiatives. Many organizations rush into model development while overlooking metadata strategy, only to discover later that their AI cannot scale beyond experimentation. There is a growing recognition that metadata is not just supportive; it is determinative. Without it, AI initiatives falter, regardless of model sophistication.

3. Data Architecture: Designing for Machines, Not Just Reports

Traditional data architectures were designed for people.

Data warehouses centralised structured data for reporting and dashboards: slow, stable, and human-interpreted. But AI does not consume data in the same way. It requires real-time access, integration across sources, and the ability to handle both structured and unstructured information. This is where modern architectural patterns come into play. Concepts like Data Fabric and Data Mesh, both explored within DAMA, represent a shift from centralisation to connectivity. Instead of moving data into a single repository, these approaches focus on making data accessible, governed, and usable wherever it resides. A data fabric, for example, creates a unified layer across distributed systems, enabling real-time integration and governance without physically moving data. This matters because AI thrives on:

  • Diverse data sources
  • Real-time signals
  • Context-rich environments

Traditional warehouses, designed for retrospective analysis, struggle to meet these demands. Modern architectures are not just technical upgrades; they are enablers of AI capability. If data cannot flow, AI cannot function.

4. Data Security and Ethics: The Line You Cannot Cross

The final pillar is where data governance transitions into AI governance. AI models do not inherently understand privacy, consent, or regulatory boundaries. They will learn from whatever data they are given. If that data includes sensitive, restricted, or biased information, the consequences can be severe. DAMA has long emphasised data security, privacy, and stewardship. In the AI era, these are no longer compliance exercises—they are ethical imperatives. Regulations like GDPR are not just legal constraints; they define the boundaries of what is acceptable in data usage. If an organization does not have clarity over data ownership, access rights, and usage permissions, it cannot claim to be operating ethical AI. More broadly, this is about trust. Without governance, organizations risk:

  • Embedding bias into automated decisions
  • Exposing sensitive data through AI outputs
  • Losing control over how data is used and reused

Strong governance ensures that AI is not only effective, but also accountable, transparent, and fair. 

The Real Question: How AI-Ready Are You?

For the C-suite, the implication is clear.

AI readiness is not about how many models you have deployed. It is not about how advanced your platform is. It is not even about how much data you hold.

It is about how well that data is governed.

Frameworks like DAMA-DMBOK provide a structured way to assess this. They define maturity across areas like quality, metadata, architecture, and security. And that maturity directly correlates to AI risk. If your organization is:

  • Immature in data quality → expect unreliable AI outcomes
  • Weak in metadata → expect confusion and inconsistency
  • Fragmented in architecture → expect scalability issues
  • Unclear on governance → expect ethical and regulatory risk

In other words, your DAMA maturity is your AI readiness. This is not theoretical. Research consistently shows that organizations struggle to make AI work not because of technology limitations, but because they lack the data foundations to support it. 
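The mapping from maturity gaps to AI risk can be expressed as a simple lookup. The sketch below assumes a self-assessed score from 1 to 5 per area and a threshold of 3. Both the scale and the threshold are hypothetical choices for illustration, not values prescribed by DAMA-DMBOK.

```python
# Risk statements taken from the list above, keyed by assessed area.
RISK_BY_AREA = {
    "data_quality": "unreliable AI outcomes",
    "metadata": "confusion and inconsistency",
    "architecture": "scalability issues",
    "governance": "ethical and regulatory risk",
}

def ai_readiness_risks(scores, threshold=3):
    """Return the expected risk for every area scoring below the threshold."""
    missing = set(RISK_BY_AREA) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return {area: risk for area, risk in RISK_BY_AREA.items() if scores[area] < threshold}

# Hypothetical self-assessment on an assumed 1 to 5 scale.
scores = {"data_quality": 4, "metadata": 2, "architecture": 3, "governance": 1}
risks = ai_readiness_risks(scores)

for area, risk in risks.items():
    print(f"{area}: expect {risk}")
```

In this example the organization would be advised to address metadata and governance before scaling any AI initiative, however capable its platform.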

Final Thought: The Age of Data Governance Has Arrived

We are entering a phase where data governance is no longer a background function. It is becoming the defining capability of successful AI organizations. The companies that succeed with AI will not be those with the most advanced models. They will be those with the most disciplined data practices, those who understand that intelligence is not created by algorithms, but enabled by trust in data. AI is not a shortcut around governance. It is the ultimate test of it.