The Need for a Structured Approach, Elements of the Ontology Pipeline, the Pipeline as a Framework for Developing Knowledge Management Systems, and More!
Great feedback and thank you! In subsequent chapters of my book, I will dedicate time to each stage in the pipeline, diving deep into the tooling and technologies that support the work. Introducing a high-level view, or even just noting that supporting technologies exist, may be necessary here to set the stage for what's to come ;)
Jessica, this is an awesome read. I have two comments:
1. You provide a methodology for evolution. Unlike the idea of the semantic spectrum and the similar scale found in the Information Architecture book, you specifically advise on how to enable continued refinement from the simple stage towards the knowledge graph. Keep that up and go deep into the methodological explanations; it's really original.
2. You point to a tech stack. I remember your talk at Data Day Texas, where for each stage in the pipeline you had specific technologies that perform the task at that particular stage. Just as a data pipeline has tools for ETL, managed by data engineers, your pipeline has tooling too. Depict that in the diagrams of the Ontology Pipeline; it would explain my first comment visually.
(I'm on a Danish keyboard, apologies in advance for typos)
Let's not forget the science and discipline of Terminology (not the same as "Lexicology"), which is concept-oriented (handling synonyms), multilingual, and accounts for semantic relations in both taxonomies and ontologies. The essential principles of concept orientation, synonyms and variants, term (word/expression/microcontent) autonomy, and concept relations across multiple languages have been practiced for decades. This is very close to knowledge graphs.
I’ll be expanding on this in my book.
I was genuinely happy to read a text that addresses something I'm very interested in and that also inspired me to create a working framework based on Library and Information Science: "The Semantic-aware SEO Workflow" (search for "SEMANTIC SEO: The Semantic-aware SEO workflow" and you will find it). I'm a library science student here in Brazil, and I've worked with SEO since 2008. When I first learned about a librarian's work organizing information, I immediately saw the connection with information retrieval on the Web, which is what Google and other search engines do. It seems more than logical to me now, but at the time it was a mind-blowing discovery! Since I started university, I've been making connections between Library Science and SEO, which led to the creation of "The Semantic-aware SEO Workflow" described in the book I mentioned. It's truly invigorating to see that my work can also be useful to websites that want to perform well with LLM-powered agents.
Great post, Jessica! It clearly outlines the steps and deliverables involved in transforming implicit knowledge into explicit knowledge. The pipeline that emerges from sequencing these steps reflects the activities of an iteration within a process that is, by nature, iterative. Since an organization’s knowledge is constantly evolving, managing it isn’t a one-time event but an ongoing process. That’s why knowledge management should be approached with a product rather than a project mindset. What do you think?
Hi Andrea, thank you! The Ontology Pipeline is a repeatable workflow in which each phase of the process IS a product. The controlled vocabulary is a product, and therefore has value on its own and can be implemented in several ways. In total, there are six data products within the pipeline, with the last step and stage being a knowledge graph.
The workflow supports a repeatable, iterative process. Agile rarely works well with semantic engineering. I see this as closer to a Design Thinking approach combined with a product management style.
Curious to hear your thoughts on this!
I completely agree. Knowledge shouldn't just be treated as a first-class citizen; it should also be seen as a product, or better yet, a collection of interconnected products. When building a shared enterprise ontology, it's helpful to consider delivering multiple knowledge products at each pipeline stage. This means multiple vocabularies, taxonomies, thesauri, and, of course, sub-ontologies. Such modularization makes it easier to manage the shared enterprise ontology in a federated and scalable way across the organization.
https://www.linkedin.com/posts/andreagioia_knowledgemesh-thedatajoy-ontologies-activity-7302583621381480448-bMHo
In your opinion, what is the best operating model at the organizational level to support this iterative pipeline that turns implicit knowledge into explicit knowledge?
Given that The Ontology Pipeline is a workflow of iterative steps that incorporate and prioritize data quality paradigms, the pipeline demands a product-management-oriented operating model and framework. With multiple related and interconnected products within the Pipeline workflow, I believe the operational aspect is best served by Program Management methodologies that support change management and adoption. Product cannot go it alone, as organizational and domain adoption is critical to implementation and success. Knowledge management is both social and political, requiring education in information and data literacy, as well as in the specific practices and applications of knowledge management systems. In the end, a knowledge graph is a knowledge management tool and system, and it is dynamic, not static. Knowledge is living, breathing and evolving, expanding and contracting as data, information and knowledge require. These accommodations for system expansion and contraction must be part of a business operating framework.
The operating model is NOT Agile (contentious, for sure). Stuffing knowledge management work into tidy sprints is not realistic. Of that, I'm sure.
Ever since I discovered this, I have learned more here than from Gartner or other sources! Kudos for the great work.
I'm a bit confused by this phrasing: "A thesaurus handles ambiguity by forming associative relationships between terms beyond the relationships within a hierarchy: the parent-child relations." It sounds like it's implying that parent-child relationships are associative, but I don’t think that’s the intent. My understanding is that associative relationships exist in addition to hierarchical (parent-child) ones, rather than being part of them. Is that correct?
Read it as: beyond the parent-child relationship. Associative relationships exist among the terms represented within the hierarchy. So yes, this means that there are hierarchical relations defined by the taxonomy, and a thesaurus extends these relations beyond the parent-child constraint. Examples: related, is a.
So you are correct in your understanding. I will attempt to clarify here. Thanks, Ramona!
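To make the distinction concrete, here is a minimal Python sketch of a thesaurus that stores hierarchical broader/narrower links separately from symmetric associative "related" links. It is a toy model and not the Ontology Pipeline's own tooling. The `Thesaurus` class, its method names and the sample terms are all hypothetical.

The sketch also refuses a "related" link between two terms that already sit on the same hierarchical path. That rule is an assumption, similar in spirit to the SKOS rule that `skos:related` is disjoint with `skos:broaderTransitive`.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical toy thesaurus: hierarchical (BT/NT) plus associative (RT) links."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> parent terms (BT)
        self.narrower = defaultdict(set)  # term -> child terms (NT)
        self.related = defaultdict(set)   # term -> associated terms (RT)

    def add_hierarchy(self, child, parent):
        # Parent-child link: directional, so each direction is recorded separately.
        self.broader[child].add(parent)
        self.narrower[parent].add(child)

    def ancestors(self, term):
        # Follow broader links transitively up the hierarchy.
        seen, stack = set(), list(self.broader.get(term, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.broader.get(current, ()))
        return seen

    def add_related(self, a, b):
        # Assumption: refuse an associative link between terms that already
        # sit on the same hierarchical path (one is an ancestor of the other).
        if b in self.ancestors(a) or a in self.ancestors(b):
            raise ValueError(f"{a!r} and {b!r} are already hierarchically related")
        # Associative link: symmetric, so it is recorded in both directions.
        self.related[a].add(b)
        self.related[b].add(a)


if __name__ == "__main__":
    t = Thesaurus()
    t.add_hierarchy("espresso", "coffee")
    t.add_hierarchy("coffee", "beverage")
    t.add_related("espresso", "espresso machine")
    print(sorted(t.ancestors("espresso")))        # ['beverage', 'coffee']
    print(sorted(t.related["espresso machine"]))  # ['espresso']
```

The design choice is to keep the two link types in separate structures. The hierarchy is directional and is walked transitively. "Related" is symmetric and is never walked. As a result, an associated term sits alongside the parent-child tree rather than inside it.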
Thank you, Jessica!
My instinct is to think in mathematical terms: rel is associative iff (R rel S) rel T = R rel (S rel T). But I knew that "associative" means something else in this context. Which is fitting, given the context of this chapter :)