KGA: Inference-Time Knowledge Graph Fusion by Rewiring Attention for Adaptive Information Aggregation
📢 News
✨ 2025-12: KGA is officially released!
This repository contains the official implementation of the paper "KGA: Inference-Time Knowledge Graph Fusion by Rewiring Attention for Adaptive Information Aggregation".
We propose a novel, parameter-free framework for dynamically integrating Knowledge Graphs (KGs) into Large Language Models (LLMs) exclusively at inference time via cognitively inspired attention rewiring.
📌 Table of Contents
- Overview
- Methodology
- Experimental Results
- Usage
- Reproducing Experiments
- Repository Structure
- Quick Start
- License & Contact
🎯 Overview
Core Problem: How can Large Language Models (LLMs) integrate up-to-date, structured knowledge from Knowledge Graphs (KGs) without updating internal parameters, thus avoiding catastrophic forgetting and preserving generalization?
Limitation of Prior Work:
- Fine-tuning (SFT/LoRA): Parameter-invasive, causes catastrophic forgetting, and lacks adaptability to real-time updates.
- RAG/ICL: Suffers from "context stuffing," quadratic complexity, and attention dispersion.
Our Solution: We introduce Knowledge Graph-guided Attention (KGA), a test-time adaptation framework that enables dynamic knowledge fusion through non-invasive attention rewiring. KGA mimics the dual-pathway human cognitive process to establish a synergistic information flow between the input query and KG triples, requiring zero new parameters.
✨ Methodology
Inspired by cognitive neuroscience, KGA rewires the standard self-attention mechanism into two synergistic pathways:
1. Bottom-Up Pathway (Fusion: Input → KG)
- Cognitive Role: Stimulus-driven attention (Input query acts as stimulus).
- Function: Dynamically integrates external information: the input query ($Q^X$) attends to each triple's key and value ($K^Z$, $V^Z$) to aggregate knowledge.
- Mechanism: Projects KG triples using the frozen LLM's native $W^K$, $W^V$ matrices to ensure feature-space alignment.
2. Top-Down Pathway (Guidance: KG → Input)
- Cognitive Role: Goal-directed attention (Verification).
- Function: Assesses the contextual relevance of each triple to filter noise: the triple query ($Q^Z$) probes the input key ($K^X$) to find supporting evidence.
- Key Output: Computes an adaptive weight $\alpha_Z$ for each triple, which modulates the bottom-up fusion process to suppress irrelevant noise and amplify critical signals.
This bidirectional design allows KGA to achieve adaptive weighted fusion without training any adapters.
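To make the two pathways concrete, below is a minimal, single-head PyTorch sketch of the rewiring. It is an illustration only, not the repository's implementation (see `src/fusion/kga_llms/` for that): the function name `kga_attention`, the tensor shapes, and the mean-pooled, softmax-normalized form of $\alpha_Z$ are all simplifying assumptions.

```python
# Minimal single-head sketch of KGA's dual-pathway attention rewiring.
# Assumed (hypothetical) shapes: x = input hidden states [n_x, d],
# z = one pooled hidden state per triple [n_z, d]; W_q/W_k/W_v are the
# frozen LLM's native projection matrices [d, d].
import torch
import torch.nn.functional as F

def kga_attention(x, z, W_q, W_k, W_v, tau=1.0):
    scale = x.size(-1) ** -0.5

    # Project both streams with the frozen projections so KG triples
    # live in the same feature space as the input (no new parameters).
    Qx, Kx, Vx = x @ W_q, x @ W_k, x @ W_v
    Qz, Kz, Vz = z @ W_q, z @ W_k, z @ W_v

    # Top-down pathway (KG -> input): each triple's query probes the
    # input keys; pooling the scores gives one relevance value per
    # triple, normalized here (one simple choice) with a tau-scaled
    # softmax into the adaptive weights alpha_z.
    relevance = (Qz @ Kx.transpose(-1, -2)) * scale           # [n_z, n_x]
    alpha_z = F.softmax(relevance.mean(dim=-1) / tau, dim=0)  # [n_z]

    # Bottom-up pathway (input -> KG): input queries attend to triple
    # keys/values; alpha_z reweights the attention so noisy triples
    # are suppressed before their values are aggregated.
    attn = F.softmax((Qx @ Kz.transpose(-1, -2)) * scale, dim=-1)  # [n_x, n_z]
    fused = (attn * alpha_z.unsqueeze(0)) @ Vz                     # [n_x, d]

    # Standard self-attention over the input, plus the fused KG signal.
    self_attn = F.softmax((Qx @ Kx.transpose(-1, -2)) * scale, dim=-1) @ Vx
    return self_attn + fused

# Toy usage: 10 input tokens, 3 candidate triples, hidden size 64.
d, n_x, n_z = 64, 10, 3
x, z = torch.randn(n_x, d), torch.randn(n_z, d)
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))
print(kga_attention(x, z, W_q, W_k, W_v).shape)  # torch.Size([10, 64])
```

The essential design choice survives the simplification: the top-down relevance scores are computed first, and the bottom-up aggregation is modulated by them, so a triple contributes to the fused representation only in proportion to the evidence the input provides for it.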
📈 Experimental Results
KGA was evaluated across three distinct tasks (KGQA, KG Reasoning, Model Editing) using Llama3 and Qwen2.5 models.
| Task | Dataset | Key Result (KGA) | vs. ICL |
| :--- | :--- | :--- | :--- |
| KGQA | SimpleQuestions | 80.79% (Qwen2.5-0.5B-Instruct) | +30% |
| KG Reasoning | PathQuestion (2-hop) | 96.43% (Qwen2.5-0.5B-Instruct) | +70% |
| Consecutive Model Editing | ZsRE | 60.79% Efficacy (Llama3-8B) | +15% |
| Consecutive Model Editing | CounterFact | 67.64% Efficacy (Llama3-8B) | +60% |
🛠️ Usage
- Clone the repository

  ```bash
  git clone https://github.com/SonglinZhai/KGA
  cd KGA
  ```

- Set up a Python environment

  ```bash
  conda create -n kga python=3.9 -y
  conda activate kga
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```
🔬 Reproducing Experiments
To reproduce the main results from the paper (just two steps):

- Configure the experiment
  - Download the pretrained LLMs, then modify the JSON file `config/fusion.json` to set the correct model paths (a hedged sanity-check snippet follows these steps).
  - Download `FB2M.txt` and `FB2M.name.txt` and place both files in `data/kg/`.

- Run the script

  ```bash
  # Example: run the SimpleQuestions experiment
  bash scripts/kga_simplequestions_qwen05b.sh
  ```
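Since the exact schema of `config/fusion.json` is defined by the repository rather than documented here, the snippet below is only an illustrative pre-flight check; the `model_path` key is a hypothetical placeholder for whatever field the config actually uses.

```python
# Illustrative pre-flight check before launching a run. The key name
# "model_path" is a hypothetical placeholder, not the repo's actual
# schema; adapt it to the fields present in config/fusion.json.
import json
import os

with open("config/fusion.json") as f:
    cfg = json.load(f)

model_path = cfg.get("model_path", "")  # hypothetical key
if not os.path.isdir(model_path):
    raise FileNotFoundError(f"Model path not found: {model_path}")
```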
📂 Repository Structure
KGA/
├── README.md # This file
├── requirements.txt # Python dependencies
├── config/ # Configuration files for datasets
│ └── fusion.json
├── src/ # Core source code
│   ├── fusion/                 # KG-fusion source code
│   │   ├── dataset.py          # Dataset preprocessing
│   │   ├── dataloader.py       # Data loaders and preprocessing
│   │   ├── kga.py              # KGA usage
│   │   ├── main.py             # Main entry point
│   │   └── kga_llms/           # LLMs equipped with KGA
│   │       ├── modeling_llama.py   # Llama model equipped with KGA
│   │       └── modeling_qwen2.py   # Qwen model equipped with KGA
│   └── retrieve/               # Candidate retrieval code
├── data/ # Datasets
├── scripts/ # Running scripts for reproducing results
├── assets/
└── docs/
🚀 Quick Start
KGA operates at inference time only. Here is a minimal example:
```python
import torch
from transformers import AutoTokenizer
from src.fusion.kga_llms.modeling_llama import LlamaForCausalLM

# 1. Load the KGA-enhanced LLM.
# KGA automatically rewires the attention layers.
model = LlamaForCausalLM.from_pretrained('path/to/llama').to("cuda")
model.eval()
tokenizer = AutoTokenizer.from_pretrained('path/to/llama')

# 2. Prepare candidate triples.
knowledge_triples = [
    ("WWW_2026", "held_in", "Dubai"),
    ("Dubai", "located_in", "UAE"),
    ("WWW_2025", "held_in", "Sydney"),  # Noise triple
]
cache_kga = {
    'attn_mask': None,
    'attn_weights': list(),
    'attn_output': list(),
    'query': list(),
    'kv_cache': None,
    'adaptive_weights': list(),
    'hidden_states_from_triples': None,
}
# Flatten each (head, relation, tail) triple into text, then encode the
# triples once with the frozen LLM to cache their key/value states.
triple_texts = [" ".join(triple) for triple in knowledge_triples]
inputs_triples = tokenizer(triple_texts, padding=True, return_tensors="pt").to("cuda")
outputs = model(**inputs_triples, cache_kga=cache_kga)
cache_kga['kv_cache'] = outputs.past_key_values

# 3. Generation with KGA.
question = "In which country is WWW_2026 held?"
# KGA will automatically calculate relevance and filter the noise.
inputs = tokenizer(question, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        triples=knowledge_triples,
        tau=1.0,
        max_new_tokens=50,
    )
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Question: {question}")
print(f"Answer: {answer}")
```