The Ontology Engineering Challenge is Not Building It, but Mirroring the Delta

Minimal Ontology Principle, Ontological Convergence, and What They Mean for Enterprise AI Platform Engineering

Jun 09, 2026

Every language model has built a map of your business concepts. The real engineering challenge isn’t constructing an ontology, but deciding where your enterprise’s meaning diverges from the one the model already has, and correcting only that.

TOC

Two ontologies walk into a bar
The source of truth problem
How the data platform corrects the model
The feedback loop that makes the system learn
The minimal ontology principle
What this implies for your architecture

Let’s start with the simplest possible idea. What does the word “revenue” mean?

You might say: the total income from goods sold or services rendered. That’s fine. But “revenue” in a model’s mind doesn’t float in a vacuum. It sits in geometric proximity to “cost,” “margin,” “booking,” “ARR,” “GAAP,” “recognised,” “deferred.”

The model has learned, from reading millions of financial documents, analyst reports, accounting textbooks, and earnings call transcripts, that these concepts exist near each other. That proximity is the meaning.

Meaning, in a language model’s world, is geometry.

Now ask yourself: what does “revenue” mean in your company?

If you’re a SaaS business, “revenue” might technically mean contracted ARR but operationally mean recognised MRR. If you’re a marketplace, it might mean take-rate applied to GMV. If you’re a professional services firm, it might be a function of hours billed against retainer.

The model has a reasonable prior while You have an idiosyncratic truth.

This gap between the model’s prior and your enterprise’s reality is where the interesting engineering happens.

Concept: Latent Ontology

The structured, geometric representation of how concepts relate to each other inside a large language model’s high-dimensional embedding space. It was not designed. It was inferred, distilled from the full weight of human text, and it is remarkably coherent.

A Latent Ontology is not a knowledge graph you can browse. It is a field of gravitational relationships between concepts, existing in dimensions you cannot visualise. But it is real, measurable via probing, and surprisingly aligned with how experts reason about domains.

Tune in to a conversation on the concepts discussed in this article 🎧
0:00
-11:44

Two Ontologies Walk into a Bar

Modern enterprise AI architectures have two distinct layers, and each layer develops or exposes its own understanding of what your business concepts mean.

The first is the agentic layer: the LLM-powered components that reason, plan, summarise, draft, and decide. This layer has a Latent Ontology. It came pre-loaded.

The model already knows what a “customer” is, what “churn” implies, what “pipeline” means in a sales context versus a data engineering context. It learned this before you hired it.

The second is the data platform layer: your data warehouse, semantic layer, metrics catalog, or data products. This layer has what we might call a Structural Ontology: a formally defined, explicitly curated mapping of what your specific concepts mean, how they’re calculated, and where they come from.

Concept: Structural Ontology

The explicit, versioned, human-curated definition of your enterprise’s semantic layer. It exists in your dbt models, metric store, data catalog, or OpenAPI specs. It says: “In this company, ‘active user’ means a user who completed at least one transaction in the last 28 days, excluding internal test accounts, denominated in the cohort’s activation timezone.”

A Structural Ontology is opinionated, auditable, and slow to change. It reflects institutional decisions: finance sign-offs, legal definitions, board-approved metrics. It is your source of truth.

Here’s what many architecture discussions would overlook: these two ontologies are almost entirely compatible. The vast majority of what your data platform says about your business aligns with what the AI layer already assumes. Where they diverge is a small but load-bearing set of idiosyncratic definitions.

The real project isn’t building an enterprise ontology for your AI. It’s finding the specific points where your meaning deviates from the model’s prior and correcting only those deviations.

The Source of Truth Problem

If you ask a naïve AI agent “what was our Q3 revenue?”, it will do something quite dangerous: it will use its Latent Ontology to interpret what “revenue” should mean, probably something like GAAP revenue, and then go looking for data that matches that interpretation.

If your company books revenue on contract signature date rather than delivery date, the agent has introduced an error which is challenging to detect, and not just at scale. Because it was wrong about what revenue means here. It is almosts always correct about what it generally means.

This is the source of truth problem in agentic AI. The latent ontology is confident, fluent, and as a result, will not warn you when it’s working from the wrong map.

The Structural Ontology, your data platform’s semantic layer, is the correction mechanism. But correction only works if the agentic layer can actually access and consume it.

How the Data Platform Corrects the Model

Correction happens through context injection. The unified data platform exposes its Structural Ontology through a semantic layer, a metric store, a system prompt, a tool schema, a retrieval system… and the agentic layer reads it into its working context.

The model’s Latent Ontology doesn’t change. But its active interpretation of your domain gets overridden at the points of divergence.

Think of it like a local coordinate system. The model has a global map. Your semantic layer installs a local coordinate system on top of it. “Revenue” in the global map might point north. Your local system says: in this jurisdiction, north is rotated seventeen degrees. Every downstream calculation adjusts accordingly.

The practical implementation looks like this.

Your semantic layer defines a metric: net_revenue. It provides a natural-language description: “Total billed revenue minus refunds and credits, recognised on cash receipt date, excluding intercompany transactions.”
The agent reads this definition as part of its tool context. When a user asks about revenue, the agent uses your definition along with its prior global map.

But this is not fine-tuning. Fine-tuning updates the model’s weights — it changes the Latent Ontology permanently, expensively, and often irreversibly. Context injection is cheaper, auditable, and reversible. When your CFO changes the revenue recognition policy, you update the semantic layer. The agent adapts at the next request. No retraining required.

The Feedback Loop that Makes the System Learn

The relationship between the Latent Ontology and the Structural Ontology is not static. It is a feedback loop, and the loop runs in both directions.The data platform corrects the model’s priors. But over time, the model’s behaviour also surfaces gaps in the data platform’s definitions. When an agent consistently misinterprets “customer,” that is a signal.

The Structural Ontology hasn’t disambiguated the term, but locally manifested the context of the query.

Concept: Ontological Convergence

The process by which the Latent Ontology of an agentic system and the Structural Ontology of a data platform progressively align through iterative correction and feedback. The model’s behaviour reveals gaps in the structural definitions; the structural definitions correct the model’s priors; repeat.

A system that has reached Ontological Convergence behaves as if the enterprise’s idiosyncratic meanings have been internalised because the correction surface is comprehensive, consistent, and machine-readable.

In mature implementations, this feedback loop gets formalised.

Agent queries and their semantic interpretations are logged.
Discrepancy patterns (places where the agent’s inferred meaning and the user’s intended meaning diverge) are surfaced to data platform teams as definition candidates.
The semantic layer grows not from top-down ontology engineering workshops, but from the bottom-up evidence of where interpretation fails.

This is a significant inversion of how enterprise data governance has historically worked. Instead of a committee deciding what concepts mean and hoping the rest of the organisation, including AI systems, adopts those definitions, the AI system’s confusion becomes the instrument by which the organisation discovers which concepts need formal definition in the first place.

Today, we have one of the most powerful tools at our side: AI. This was not the case before, which is when it made sense to toil and grind on every aspect of the engineering. But that has changed today, and smart work no longer just applies to the individual, but to entire systems. Let AI help the system and let the system help it.

The Minimal Ontology Principle

This reframing has a practical implication that should excite anyone who has suffered through a multi-year enterprise ontology project: you do not need to define everything. Language has a geometry, which is almost how it came to be. And artificial intelligence infers a lot of meaning through that very angle.

The Latent Ontology has already done most of the work. It understands what a customer is, what an invoice is, what a churn event implies. Its definitions are imperfect but serviceable for a large fraction of everyday queries. The Structural Ontology only needs to be as large as the set of deviations between your enterprise’s usage and the model’s prior.

Call this the Minimal Ontology Principle: define only the delta.

Invest in the Structural Ontology proportional to the divergence, not proportional to the total conceptual surface area of your business.

This means that before building any semantic layer artefact, the right first question is: does the model already have a good-enough definition for this concept in our context? If yes, document it and move on. If not, that’s where engineering effort is warranted.

The exercise of finding that delta (probing where the model’s prior breaks down in your specific domain) is itself enormously valuable. It forces explicit conversations about what concepts actually mean in your organisation, conversations that data governance programs have been trying and failing to provoke for decades. The AI’s confusion, in this framing, is not a bug, but a methodology.

Define only the delta. The Latent Ontology has done most of the work. Your Structural Ontology earns its complexity precisely where your enterprise’s meaning diverges from the world’s.

What this Implies for Your Architecture

If you accept this framing, several architectural choices follow.

Invest in your semantic layer as the primary interface between your data and your agentic layer.

The semantic layer is the correction surface for the Latent Ontology. Every metric, dimension, and entity definition in your semantic layer is a site at which you assert authority over the model’s interpretation. Treat it accordingly.

Make your Structural Ontology machine-readable and agent-consumable.

Today, natural language descriptions matter as much as SQL definitions. The agent doesn’t read your dbt SQL to understand what net_revenue means. It reads the description field. Write those description fields as if you’re writing them for the model, because you are.

Build the feedback loop intentionally.

Log semantic interpretations. Surface discrepancies. Let agent confusion inform semantic layer development. Having most capable AI systems in three years is not the result of building the most comprehensive static ontologies upfront. The approach is to build the tightest feedback loop between agent behaviour and semantic layer evolution. Constant dynamism.

Resist the temptation to fine-tune your way out of this problem.

Fine-tuning makes the Latent Ontology harder to update and harder to audit. Context injection keeps the correction surface in your Structural Ontology, where it is versioned, observable, and owned by humans. The model’s weights are not the right place to store your business definitions. Your semantic layer is.

The Map was Already Drawn

There is something philosophically interesting happening here, beyond the engineering. Language models have absorbed enough human knowledge to independently reconstruct something close to the ontological structure of commerce, technology, and organisations.

They didn’t need your data governance team. They read everything your data governance team has ever read, plus a great deal more.

The map was already drawn. The question is only where your territory differs from the general map, and how precisely you want to mark those differences.

Figuring this out will not require us to build AI systems that require massive ontological constructions before they can be useful. We build AI systems that arrive already conversant in most of what matters, corrected at the specific points where their business is idiosyncratic, and continuously sharpened by the evidence of their own failures.

The Latent Ontology is your starting point. The Structural Ontology is your correction layer, and not complete answer. Ontological Convergence is what happens when you run the loop long enough. Everything else is delta engineering.

Terms introduced here:

Latent Ontology: The implicit concept map inside a language model
Structural Ontology: The explicit semantic layer of a data platform
Ontological Convergence: The progressive alignment between the two through iterative correction.
Minimal Ontology Principle: Define only the delta between enterprise meaning and model prior. Follows directly from the relationship between them.

Tune in to a conversation on the concepts discussed in this article 🎧
0:00
-11:44

If you have any queries about the piece, feel free to connect with the author(s). Or connect with the MD101 team directly at community@moderndata101.com 🧡

Author Connect 💬

Connect with Animesh on LinkedIn 💬

From MD101 team 🧡

The Data Product Playbook

Here’s your own copy of the Actionable Data Product Playbook. With 4000+ downloads so far and quality feedback, we are thrilled with the response to this 6-week guide we’ve built with industry experts and practitioners. Stay tuned to moderndata101.com for more actionable resources from us!

DOWNLOAD!

Rory O’Gallagher

13h

Love this idea. It’s a great way of turning problems on their head in the agentic era, making us think about what’s truly already available to us. It’s about right sizing the problem of defining an ontology, not just turning back to old patterns. It’s such an elegant rotation of the angle with which we view the problem of ontologies, because it seems so obvious and shatters an assumption of the right way to do it so easily. Great piece.

Modern Data 101

Discussion about this post

Ready for more?