KGA

KGA: Inference-Time Knowledge Graph Fusion by Rewiring Attention for Adaptive Information Aggregation

License: MIT · PyTorch

📢 News

✨ 2025-12: KGA is officially released!

This repository contains the official implementation of the paper "KGA: Inference-Time Knowledge Graph Fusion by Rewiring Attention for Adaptive Information Aggregation".

We propose a novel, parameter-free framework for dynamically integrating Knowledge Graphs (KGs) into Large Language Models (LLMs) exclusively at inference time via cognitively inspired attention rewiring.

📌 Table of Contents

  • 🎯 Overview
  • ✨ Methodology
  • 📈 Experimental Results
  • 🛠️ Usage
  • 🔬 Reproducing Experiments
  • 📂 Repository Structure
  • 🚀 Quick Start

🎯 Overview

Core Problem: How can Large Language Models (LLMs) integrate up-to-date, structured knowledge from Knowledge Graphs (KGs) without updating internal parameters, thus avoiding catastrophic forgetting and preserving generalization?

Limitation of Prior Work:

  • Fine-tuning (SFT/LoRA): Parameter-invasive, causes catastrophic forgetting, and lacks adaptability to real-time updates.

  • RAG/ICL: Suffers from "context stuffing," quadratic complexity, and attention dispersion.

Our Solution: We introduce Knowledge Graph-guided Attention (KGA), a test-time adaptation framework that enables dynamic knowledge fusion through non-invasive attention rewiring. KGA mimics the human cognitive process (Dual-Pathway) to establish a synergistic information flow between the input query and KG triples, requiring zero new parameters.

✨ Methodology

Inspired by cognitive neuroscience, KGA rewires the standard self-attention mechanism into two synergistic pathways:

1. Bottom-Up Pathway (Fusion: Input → KG)

  • Cognitive Role: Stimulus-driven attention (Input query acts as stimulus).
  • Function: Dynamically integrates external information. The Input Query ($Q^X$) attends to the Triple's Key and Value ($K^Z$, $V^Z$) to aggregate knowledge.
  • Mechanism: Projects KG triples using the frozen LLM's native $W^K, W^V$ matrices to ensure feature space alignment.

2. Top-Down Pathway (Guidance: KG → Input)

  • Cognitive Role: Goal-directed attention (Verification).
  • Function: Assesses the contextual relevance of each triple to filter noise. The Triple Query ($Q^Z$) probes the Input Key ($K^X$) to find supporting evidence.
  • Key Output: Computes an adaptive weight $\alpha_Z$ for each triple, which modulates the Bottom-Up fusion process to suppress irrelevant noise and amplify critical signals.

This bidirectional design allows KGA to achieve adaptive weighted fusion without training any adapters.
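
To make the two pathways concrete, below is a minimal, self-contained PyTorch sketch of the rewiring. It is not the repository implementation: the per-triple relevance pooling (a max over scores), the softmax-with-temperature normalization of $\alpha_Z$, and how the fused output re-enters the layer are illustrative assumptions, and `W_q`, `W_k`, `W_v` stand in for the frozen LLM's native projection matrices.

```python
import math
import torch
import torch.nn.functional as F

def kga_attention(x, triples, W_q, W_k, W_v, tau=1.0):
    """x: (n_x, d) input hidden states; triples: list of (n_z, d) triple hidden states."""
    # Project the input with the frozen LLM's native matrices.
    q_x, k_x, v_x = x @ W_q, x @ W_k, x @ W_v
    d = q_x.size(-1)

    # Standard self-attention over the input itself.
    self_attn = F.softmax(q_x @ k_x.T / math.sqrt(d), dim=-1) @ v_x

    fused, relevance = [], []
    for z in triples:
        # Triples are projected with the SAME frozen matrices (feature-space alignment).
        q_z, k_z, v_z = z @ W_q, z @ W_k, z @ W_v

        # Bottom-up pathway: the input query aggregates knowledge from this triple.
        fused.append(F.softmax(q_x @ k_z.T / math.sqrt(d), dim=-1) @ v_z)

        # Top-down pathway: the triple query probes the input keys for supporting
        # evidence; pooling the scores gives a per-triple relevance signal.
        relevance.append((q_z @ k_x.T / math.sqrt(d)).max())

    # Adaptive weights alpha_Z (softmax over triples with temperature tau is an
    # illustrative choice; the paper defines the exact normalization).
    alpha = F.softmax(torch.stack(relevance) / tau, dim=0)

    # Weighted fusion: irrelevant triples receive small alpha and are suppressed.
    knowledge = sum(a * f for a, f in zip(alpha, fused))
    return self_attn + knowledge, alpha


# Toy usage with random features (d = 16).
torch.manual_seed(0)
d = 16
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))
x = torch.randn(5, d)                             # 5 input tokens
triples = [torch.randn(3, d) for _ in range(3)]   # 3 candidate triples
out, alpha = kga_attention(x, triples, W_q, W_k, W_v, tau=1.0)
print(out.shape, alpha)                           # torch.Size([5, 16]) and the adaptive weights
```

A triple whose query finds little supporting evidence in the input keys receives a small $\alpha_Z$ and contributes almost nothing to the fused output, which is the noise-suppression behavior described above.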

📈 Experimental Results

KGA was evaluated across three distinct tasks (KGQA, KG Reasoning, Model Editing) using Llama3 and Qwen2.5 models.

| Task | Dataset | Key Result (KGA) | vs. ICL |
| :--- | :--- | :--- | :--- |
| KGQA | SimpleQuestions | 80.79% (Qwen2.5-0.5B-Instruct) | +30% |
| KG Reasoning | PathQuestion (2-hop) | 96.43% (Qwen2.5-0.5B-Instruct) | +70% |
| Consecutive Model Editing | ZsRE | 60.79% Efficacy (Llama3-8B) | +15% |
| Consecutive Model Editing | CounterFact | 67.64% Efficacy (Llama3-8B) | +60% |

🛠️ Usage

  1. Clone the repository

```bash
git clone https://github.com/SonglinZhai/KGA
cd KGA
```

  2. Set up a Python environment

```bash
conda create -n kga python=3.9 -y
conda activate kga
```

  3. Install dependencies

```bash
pip install -r requirements.txt
```

🔬 Reproducing Experiments

To reproduce the main results from the paper (just two steps):

  1. Configure the experiment

    • Download the pretrained LLMs, then edit config/fusion.json to set the correct model paths (a scripted sketch is shown after this list).
    • Download FB2M.txt and FB2M.name.txt and place the two files in data/kg/.
  2. Run the script

```bash
# Example: run the SimpleQuestions experiment
bash scripts/kga_simplequestions_qwen05b.sh
```
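
If you prefer to script step 1, the snippet below is a minimal sketch of pointing config/fusion.json at a locally downloaded model. The key name "model_path" is hypothetical and used only for illustration; use whatever keys the shipped config file actually defines.

```python
import json
from pathlib import Path

config_file = Path("config/fusion.json")
config = json.loads(config_file.read_text())

# "model_path" is a hypothetical key name; check the keys in the shipped config.
config["model_path"] = "/path/to/Qwen2.5-0.5B-Instruct"

config_file.write_text(json.dumps(config, indent=2, ensure_ascii=False))
```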

📂 Repository Structure

```
KGA/
├── README.md              # This file
├── requirements.txt       # Python dependencies
├── config/                # Configuration files for datasets
│   └── fusion.json
├── src/                   # Core source code
│   ├── fusion/            # KG-Fusion source code
│   │   ├── dataset.py         # Dataset preprocessing
│   │   ├── dataloader.py      # Data loaders and preprocessing
│   │   ├── kga.py             # KGA usage
│   │   ├── main.py            # Main entry point
│   │   └── kga_llms/          # LLMs equipped with KGA
│   │       ├── modeling_llama.py  # Llama model equipped with KGA
│   │       └── modeling_qwen2.py  # Qwen model equipped with KGA
│   └── retrieve/          # Candidate retrieval code
├── data/                  # Datasets
├── scripts/               # Running scripts for reproducing results
├── assets/
└── docs/
```

🚀 Quick Start

KGA operates at inference time only. Here is a minimal example:

```python
import torch
from transformers import AutoTokenizer
from src.fusion.kga_llms.modeling_llama import LlamaForCausalLM

# 1. Load the base KGA-enhanced LLM
# KGA automatically rewires the attention layers
model = LlamaForCausalLM.from_pretrained('path/to/llama')
model.eval()
tokenizer = AutoTokenizer.from_pretrained('path/to/llama')
# Move the model and all inputs to the same device (e.g. "cuda") when running on GPU.

# 2. Prepare candidate triples
knowledge_triples = [
    ("WWW_2026", "held_in", "Dubai"),
    ("Dubai", "located_in", "UAE"),
    ("WWW_2025", "held_in", "Sydney"),  # Noise triple
]
cache_kga = {
    'attn_mask': None,
    'attn_weights': list(),
    'attn_output': list(),
    'query': list(),
    'kv_cache': None,
    'adaptive_weights': list(),
    'hidden_states_from_triples': None,
}
# Verbalize the triples as plain text before tokenization
# (the repository may format them differently).
triple_texts = [" ".join(triple) for triple in knowledge_triples]
inputs_triples = tokenizer(triple_texts, padding=True, return_tensors="pt")
outputs = model(**inputs_triples, cache_kga=cache_kga)
cache_kga['kv_cache'] = outputs.past_key_values

# 3. Generation with KGA
question = "In which country is WWW_2026 held?"

# KGA automatically calculates relevance and filters the noise
inputs = tokenizer(question, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        triples=knowledge_triples,
        tau=1.0,
        max_new_tokens=50,
    )
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Question: {question}")
print(f"Answer: {answer}")
```
