InstructProteinKG

Introduction

InstructProteinKG is a protein knowledge graph designed for protein sequence-text alignment and instruction learning. It is extracted primarily from the high-quality structured annotations of UniProtKB/Swiss-Prot and represents associations between proteins and annotation entities as “(Protein, relation, Annotation)” triples. It covers the three branches of the Gene Ontology (GO), namely Biological Process, Molecular Function, and Cellular Component, along with key InterPro semantics such as family, superfamily, and domain membership and conserved, active, and binding sites. It also introduces Knowledge Causal Modeling (KCM), which links structural features to functional and localization knowledge through traceable causal chains, enabling debiased sampling and the generation of high-quality protein instruction data. The graph additionally supports applications such as protein functional annotation and knowledge-enhanced reasoning.
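
The triple form described above can be illustrated with a minimal Python sketch. This is not the dataset's actual schema or loader: the `Triple` type, the relation names (`located_in`, `enables`, `involved_in`), and the helper `annotations_by_relation` are hypothetical, and the three example rows are illustrative only (a UniProt accession paired with GO term identifiers), not records taken from InstructProteinKG.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One (Protein, relation, Annotation) edge; field names are assumptions."""
    protein: str     # e.g. a UniProtKB accession
    relation: str    # hypothetical relation label
    annotation: str  # e.g. a GO or InterPro identifier


# Illustrative rows only, not drawn from the released graph.
TRIPLES = [
    Triple("P04637", "located_in", "GO:0005634"),   # Cellular Component
    Triple("P04637", "enables", "GO:0003677"),      # Molecular Function
    Triple("P04637", "involved_in", "GO:0006915"),  # Biological Process
]


def annotations_by_relation(triples, protein):
    """Group the annotations of one protein by relation label."""
    grouped = defaultdict(list)
    for t in triples:
        if t.protein == protein:
            grouped[t.relation].append(t.annotation)
    return dict(grouped)


if __name__ == "__main__":
    for relation, annotations in annotations_by_relation(TRIPLES, "P04637").items():
        print(relation, annotations)
```

Grouping by relation in this way is one simple route to turning a protein's edges into instruction-style prompts, for example one question per relation type; the graph itself may be distributed in a different format.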

Domain: Biology

