Understanding Knowledge Graphs: Encoding Knowledge at Scale in Open, Evolving, Decentralised Systems

Introduction

Knowledge Graphs (KGs) have changed the way we structure and query knowledge, enabling efficient information retrieval in large-scale systems. Popularized by Google's Knowledge Graph (introduced in 2012) and advanced by research initiatives such as the Alan Turing Institute’s Knowledge Graph research, KGs now serve as the foundation for various AI applications, including search engines, recommendation systems, and enterprise knowledge management.

The Evolution of Knowledge Graphs

The transition from keyword-based search to knowledge-based retrieval began when Google introduced its Knowledge Graph in 2012 under the slogan “things, not strings” (Google Blog). This shift was a fundamental step towards representing real-world entities and their relationships, enabling the search engine to return contextualized and semantically relevant information rather than pages that merely match the query text.

A notable feature derived from this evolution is “People also search for”, which enhances search results by linking related concepts based on user behavior and structured entity relationships.

Exploring Knowledge Graphs through Kaggle

To understand how knowledge graphs function, we explored a Kaggle project: Build Knowledge Graph using Python. The dataset used, wiki_sentences_v2.csv, contained textual data from Wikipedia, which we leveraged to extract meaningful relationships and represent them as a graph.

The project involved:

Extracting entities using spaCy NLP models.
Identifying relationships between entities by detecting verbs and linking subject-object pairs.
Building a knowledge graph using the NetworkX library to visualize connections.
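To make the last step concrete, here is a minimal sketch of how extracted triples can be turned into a directed graph structure. It is not the project's code: the triples are hypothetical stand-ins for the output of the extraction steps, and a plain dictionary is used in place of NetworkX so the example runs without extra dependencies. The function name build_graph is our own.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples, standing in for the
# output of the entity and relation extraction steps.
triples = [
    ("google", "introduced", "knowledge graph"),
    ("knowledge graph", "links", "entities"),
    ("google", "operates", "search engine"),
]


def build_graph(triples):
    """Return a directed adjacency map: subject -> list of (relation, object)."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
        graph.setdefault(obj, [])  # objects are nodes too, even with no outgoing edges
    return dict(graph)


graph = build_graph(triples)

# List every outgoing relationship of one entity.
for relation, obj in graph["google"]:
    print(f"google --{relation}--> {obj}")
```

With NetworkX, the same idea would typically be expressed by adding each triple as an edge of a directed graph, with the relation stored as an edge attribute.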

Enhancing Knowledge Graphs with Weights

One critical improvement to the original approach was introducing edge weights based on relationship frequency. Instead of treating all relationships equally, we assigned weights proportional to how frequently connections appeared in the dataset. This technique helps in:

Prioritizing strong associations over weak ones.
Visualizing more relevant connections by adjusting edge thickness.
Improving graph traversal for AI applications.
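The weighting idea can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: the triples are hypothetical, the weight is simply the raw occurrence count, and the 0.5 scaling factor for edge thickness is an arbitrary choice for plotting.

```python
from collections import Counter

# Hypothetical (source, relation, target) triples; a repeated row means the
# same relationship was extracted from more than one sentence.
triples = [
    ("film", "directed by", "director"),
    ("film", "directed by", "director"),
    ("film", "released in", "year"),
    ("film", "directed by", "director"),
    ("actor", "starred in", "film"),
]


def weight_edges(triples):
    """Map each distinct triple to the number of times it was observed."""
    return Counter(triples)


weights = weight_edges(triples)

# Edge thickness for plotting: an assumed linear scaling of the raw count.
widths = {edge: 0.5 * count for edge, count in weights.items()}

# The most frequent relationship is the strongest association.
strongest_edge, strongest_count = weights.most_common(1)[0]
print(strongest_edge, strongest_count)
```

Raw counts keep the weights easy to interpret. Normalizing by the total number of triples would work equally well if relative frequencies are preferred.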

Sample Graph Representation

Given that the original dataset was vast, visualising all entities resulted in an overly dense graph. To mitigate this, we:

Filtered relationships based on frequency.
Selected a sample of 100 nodes to generate a readable subgraph.
Incorporated edge weights to reflect connection strength.
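The filtering and sampling steps above could look roughly like the sketch below. The threshold min_weight, the use of seeded random sampling, and the function name sample_subgraph are all assumptions made for illustration; the original project may have selected its 100 nodes differently.

```python
import random


def sample_subgraph(weighted_edges, min_weight=2, max_nodes=100, seed=0):
    """Keep frequent edges, then restrict the graph to at most max_nodes nodes.

    weighted_edges maps (source, target) pairs to a numeric weight.
    """
    # Step 1: drop relationships seen fewer than min_weight times.
    strong = {edge: w for edge, w in weighted_edges.items() if w >= min_weight}

    # Step 2: pick the nodes to keep (all of them if there are few enough).
    nodes = sorted({node for edge in strong for node in edge})
    if len(nodes) > max_nodes:
        nodes = random.Random(seed).sample(nodes, max_nodes)
    kept = set(nodes)

    # Step 3: keep only edges whose endpoints both survived, with their weights.
    return {(s, t): w for (s, t), w in strong.items() if s in kept and t in kept}


# Small hypothetical input: the weak edge ("b", "c") is filtered out.
edges = {("a", "b"): 3, ("b", "c"): 1, ("a", "c"): 2}
subgraph = sample_subgraph(edges)
print(subgraph)
```

Fixing the seed makes the sampled subgraph reproducible between runs, which helps when comparing visualizations.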

The updated knowledge graph represents a structured knowledge base that AI systems can use for reasoning and inference. The code for this project is available in our GitHub repository.

Learnings

Knowledge Graphs enable semantic search, reducing reliance on keyword matching.
Google’s approach in 2012 remains relevant today, as modern AI systems rely on structured knowledge to improve NLP models and search relevance.
By incorporating relationship strengths (weights), we can enhance the quality of automated reasoning and graph-based AI applications.
The combination of NLP and graph theory provides a powerful framework for structuring and retrieving information efficiently.

Why Knowledge Graphs Still Matter

Google’s implementation of knowledge graphs over a decade ago marked a shift from information search to knowledge search. The modern web is increasingly focused on:

Semantic Search: AI models like GPT-4 can draw on knowledge structures to generate context-aware responses.
AI-Powered Assistants: Virtual assistants such as Google Assistant and Siri leverage entity-based knowledge.
Enterprise Knowledge Management: Businesses use KGs to improve decision-making, link data silos, and enhance recommendation engines.

Conclusion

Knowledge Graphs are instrumental in transforming unstructured data into structured knowledge. By leveraging open datasets, NLP models, and graph processing libraries, we can build scalable and evolving knowledge systems. With enhancements such as weighted relationships, KGs provide even more meaningful insights, making them crucial for AI-driven search, recommendation engines, and decision support systems.

Future improvements include integrating real-time knowledge updates, cross-domain knowledge fusion, and semantic reasoning algorithms to refine entity relationships further.