SCP-InstructProteinKG
Biology | @ZJU | 2026.02.06
InstructProteinKG SCP Server
🧪 InstructProteinKG is a protein knowledge graph tailored for protein sequence-text alignment and instruction learning. Primarily extracted from the high-quality structured annotations of UniProtKB/Swiss-Prot, it organizes associations between proteins and annotation entities in the form of “(Protein, relation, Annotation)” triples. It encompasses the three major branches of GO (Biological Process/Molecular Function/Cellular Component) as well as key semantics from InterPro, such as family/superfamily/domain and conserved/active/binding sites. By further introducing Knowledge Causal Modeling (KCM), it transforms structural features into traceable causal chains for functional/localization knowledge, enabling debiased sampling and generation of high-quality protein instruction data. Additionally, it supports applications like protein functional annotation and knowledge-enhanced reasoning.
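To make the triple layout concrete, here is a minimal Python sketch of how "(Protein, relation, Annotation)" triples might be held in memory and grouped by relation. The `Triple` class, the relation names (`has_molecular_function`, `located_in`, `involved_in`), and the sample rows are illustrative assumptions; they are not the graph's actual schema and were not retrieved from the server.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One (Protein, relation, Annotation) association."""
    protein: str      # e.g. a UniProtKB accession
    relation: str     # hypothetical relation name
    annotation: str   # e.g. a GO or InterPro identifier


# Illustrative rows only; the relation names are assumptions for this sketch.
TRIPLES = [
    Triple("P04637", "has_molecular_function", "GO:0003677"),
    Triple("P04637", "located_in", "GO:0005634"),
    Triple("P04637", "involved_in", "GO:0006915"),
]


def group_by_relation(triples):
    """Map each relation to the list of (protein, annotation) pairs that use it."""
    grouped = defaultdict(list)
    for t in triples:
        grouped[t.relation].append((t.protein, t.annotation))
    return dict(grouped)


grouped = group_by_relation(TRIPLES)
print(grouped["located_in"])
```

Grouping by relation mirrors how the three GO branches and the InterPro semantics would each form their own edge type in a graph store.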
🛠️ Tool List
| Tool Name | Functional Description |
|---|---|
| query_cypher | Execute any Cypher query statement, supporting flexible graph database operations |
| get_kg_statistics | Obtain statistical information such as nodes, relationships, and type distribution of the knowledge graph |
| get_entity_details | Obtain detailed information and relationships of entities based on entity identifiers |
| get_experiment_workflow | Obtain the complete workflow of chemical experiments |
🚀 Quick Start
1. Dependencies
Python 3.10+ is recommended. The only package you need to install is mcp:
pip install mcp
2. Configuration
Define the server client as follows:
# Python
----------
import asyncio
import json
import traceback

from mcp.client.streamable_http import streamablehttp_client
from mcp.client.session import ClientSession


class MultiDomainKGClient:
    def __init__(self, server_url: str = "https://scp.intern-ai.org.cn/api/v1/mcp/37/SciGraph"):
        self.server_url = server_url
        self.session = None

    async def connect(self):
        """Open the connection and initialize the session."""
        print(f"\n{'='*80}")
        print("Connecting to SciGraph SCP Server")
        print(f"{'='*80}")
        print(f"Server URL: {self.server_url}")
        try:
            self.transport = streamablehttp_client(
                url=self.server_url,
                headers={"SCP-HUB-API-KEY": "sk-xxx"}  # replace with your API key
            )
            self.read, self.write, self.get_session_id = await self.transport.__aenter__()
            self.session_ctx = ClientSession(self.read, self.write)
            self.session = await self.session_ctx.__aenter__()
            await self.session.initialize()
            session_id = self.get_session_id()
            print("✓ Connected")
            print(f"✓ Session ID: {session_id}")
            print(f"{'='*80}\n")
            return True
        except Exception as e:
            print(f"✗ Connection failed: {e}")
            traceback.print_exc()
            return False

    async def disconnect(self):
        """Close the session and the transport."""
        try:
            if self.session:
                await self.session_ctx.__aexit__(None, None, None)
            if hasattr(self, 'transport'):
                await self.transport.__aexit__(None, None, None)
            print("\n✓ Disconnected\n")
        except Exception as e:
            print(f"✗ Error while disconnecting: {e}")

    async def list_tools(self):
        """List all available tools."""
        tools_list = await self.session.list_tools()
        print(f"\nAvailable tools ({len(tools_list.tools)} total):\n")
        for i, tool in enumerate(tools_list.tools, 1):
            print(f"{i:2d}. {tool.name}")
            if tool.description:
                # Show only the first line of a multi-line description
                desc_line = tool.description.split('\n')[0]
                print(f"    {desc_line}")
        return tools_list.tools

    def parse_result(self, result):
        """Parse the result of an MCP tool call."""
        try:
            if hasattr(result, 'content') and result.content:
                content = result.content[0]
                if hasattr(content, 'text'):
                    return json.loads(content.text)
            return str(result)
        except Exception as e:
            return {"error": f"Failed to parse result: {e}", "raw": str(result)}
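The `parse_result` logic above can be exercised without a server. The sketch below reimplements it as a standalone function (the name `parse_tool_result` is hypothetical) and feeds it stand-in objects built with `SimpleNamespace`. Those stand-ins only mimic the `content[0].text` shape that the method reads; they are not real MCP result types, and the `{"nodes": 3}` payload is made up for the demonstration.

```python
import json
from types import SimpleNamespace


def parse_tool_result(result):
    """Standalone mirror of MultiDomainKGClient.parse_result."""
    try:
        if getattr(result, "content", None):
            first = result.content[0]
            if hasattr(first, "text"):
                return json.loads(first.text)
        return str(result)
    except Exception as e:
        return {"error": f"Failed to parse result: {e}", "raw": str(result)}


# Stand-in for a call whose first content item carries JSON text.
ok = SimpleNamespace(content=[SimpleNamespace(text='{"nodes": 3}')])
# Stand-in for a call whose text is not valid JSON.
bad = SimpleNamespace(content=[SimpleNamespace(text="not json")])

parsed_ok = parse_tool_result(ok)
parsed_bad = parse_tool_result(bad)
print(parsed_ok)
print(sorted(parsed_bad))
```

Note that a non-JSON payload does not raise: the caller gets a dictionary with `error` and `raw` keys, so it is worth checking for `error` before using the parsed value.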
📊 Usage
The following example shows how to query InstructProteinKG:
# Python
----------
async def main():
    # Create the client and connect
    client = MultiDomainKGClient()
    if not await client.connect():
        print("Connection failed")
        return

    try:
        # Example 1: get knowledge graph statistics
        result = await client.session.call_tool(
            "get_kg_statistics",
            arguments={"kg_name": "InstructProteinKG"}  # omit kg_name to get statistics for all graphs
        )
        stats = client.parse_result(result)
        print(stats)

        # Example 2: query the complete workflow of an InstructProteinKG experiment
        result = await client.session.call_tool(
            "get_experiment_workflow",
            arguments={"experiment_id": "experiment_1"}
        )
        workflow = client.parse_result(result)
        print(workflow)

        # Example 3: query InstructProteinKG with Cypher
        result = await client.session.call_tool(
            "query_cypher",
            arguments={
                "cypher": "MATCH (e:Experiment:InstructProteinKG) RETURN e.id as experiment_id",
                "kg_name": "InstructProteinKG",
                "limit": 5
            }
        )
        experiment_id = client.parse_result(result)
        print(experiment_id)

        # Example 4: get details of an InstructProteinKG entity
        result = await client.session.call_tool(
            "get_entity_details",
            arguments={
                "entity_identifier": "experiment_1",
                "kg_name": "InstructProteinKG"
            }
        )
        entity = client.parse_result(result)
        print(entity)
    finally:
        # Disconnect the client even if a call above raised
        await client.disconnect()


if __name__ == '__main__':
    asyncio.run(main())
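As an alternative to calling `__aenter__` and `__aexit__` by hand, the same session can be opened with `async with`, which closes the transport even if a tool call raises. This is a sketch under assumptions: the environment variable name `SCP_HUB_API_KEY` and the helper names `statistics_args` and `fetch_statistics` are hypothetical, and the `mcp` imports are deferred into the coroutine so the argument helper can be used without the package installed.

```python
import asyncio
import os

SERVER_URL = "https://scp.intern-ai.org.cn/api/v1/mcp/37/SciGraph"


def statistics_args(kg_name=None):
    """Arguments for get_kg_statistics; kg_name is optional per its schema."""
    return {} if kg_name is None else {"kg_name": kg_name}


async def fetch_statistics(kg_name="InstructProteinKG"):
    # Deferred imports: only needed when the coroutine actually runs.
    from mcp.client.session import ClientSession
    from mcp.client.streamable_http import streamablehttp_client

    # SCP_HUB_API_KEY is a hypothetical variable name chosen for this sketch.
    headers = {"SCP-HUB-API-KEY": os.environ["SCP_HUB_API_KEY"]}
    async with streamablehttp_client(url=SERVER_URL, headers=headers) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            return await session.call_tool(
                "get_kg_statistics", arguments=statistics_args(kg_name)
            )


# Only contact the server when a key has been provided.
if __name__ == "__main__" and "SCP_HUB_API_KEY" in os.environ:
    print(asyncio.run(fetch_statistics()))
```

Reading the key from the environment keeps it out of source control; the nested context managers unwind in reverse order, so the session closes before the HTTP transport does.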
🙏 Acknowledgements
We would like to express our gratitude to Zhejiang University and Shanghai Artificial Intelligence Laboratory for their strong support in the organization, deployment, and release of the InstructProteinKG SCP Server.
query_cypher
{
  "properties": {
    "cypher": {
      "type": "string"
    },
    "kg_name": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null
    },
    "limit": {
      "default": 100,
      "type": "integer"
    }
  },
  "required": [
    "cypher"
  ],
  "type": "object"
}
get_kg_statistics
{
  "properties": {
    "kg_name": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null
    }
  },
  "type": "object"
}
get_entity_details
{
  "properties": {
    "entity_identifier": {
      "type": "string"
    },
    "kg_name": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null
    }
  },
  "required": [
    "entity_identifier"
  ],
  "type": "object"
}
get_experiment_workflow
Args: experiment_id: experiment ID
Returns: JSON-formatted experiment workflow (including steps and reagents)
Example: get_experiment_workflow("experiment_1")
{
  "properties": {
    "experiment_id": {
      "type": "string"
    }
  },
  "required": [
    "experiment_id"
  ],
  "type": "object"
}
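Because each tool publishes a schema with a `required` list, arguments can be checked locally before a call is sent. The sketch below copies the `required` lists from the schemas above into a plain dictionary and reports missing keys. The helper name `missing_arguments` is hypothetical, and this is only a presence check, not full JSON Schema validation.

```python
# Required argument names, copied from the tool schemas above.
REQUIRED = {
    "query_cypher": ["cypher"],
    "get_kg_statistics": [],
    "get_entity_details": ["entity_identifier"],
    "get_experiment_workflow": ["experiment_id"],
}


def missing_arguments(tool_name, arguments):
    """Return the required keys that are absent from `arguments`."""
    if tool_name not in REQUIRED:
        raise KeyError(f"Unknown tool: {tool_name}")
    return [key for key in REQUIRED[tool_name] if key not in arguments]


missing_cypher = missing_arguments("query_cypher", {"kg_name": "InstructProteinKG"})
missing_entity = missing_arguments(
    "get_entity_details", {"entity_identifier": "experiment_1"}
)
print(missing_cypher)
print(missing_entity)
```

Optional parameters such as `kg_name` and `limit` are deliberately not listed, since the schemas give them defaults (`null` and `100`).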
How to use?
1. Install MCP SDK
pip install mcp
2. Apply for an API Key
3. Configuration Information
https://scp.intern-ai.org.cn/api/v1/mcp/37/SciGraph
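{
  "mcpServers": {
    "SciGraph": {
      "type": "streamableHttp",
      "description": "A unified knowledge query service for scientific research. It integrates knowledge graph data from multiple disciplines such as chemistry and biology, and supports cross-disciplinary knowledge retrieval, entity relationship queries, and domain knowledge Q&A",
      "url": "https://scp.intern-ai.org.cn/api/v1/mcp/37/SciGraph",
      "headers": {
        "SCP-HUB-API-KEY": "{API-KEY}"
      }
    }
  }
}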
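One way to consume the configuration above from Python is to load the JSON and fill in the `{API-KEY}` placeholder at runtime rather than storing the key in the file. In this sketch the function name `resolve_server` and the shortened inline config (the `description` field is left out) are assumptions for illustration, and the key is passed in explicitly.

```python
import json

# Shortened copy of the configuration, kept inline so the sketch is self-contained.
CONFIG_TEXT = """
{
  "mcpServers": {
    "SciGraph": {
      "type": "streamableHttp",
      "url": "https://scp.intern-ai.org.cn/api/v1/mcp/37/SciGraph",
      "headers": {"SCP-HUB-API-KEY": "{API-KEY}"}
    }
  }
}
"""


def resolve_server(config_text, name, api_key):
    """Return (url, headers) for one server, with the placeholder filled in."""
    server = json.loads(config_text)["mcpServers"][name]
    headers = {
        key: value.replace("{API-KEY}", api_key)
        for key, value in server.get("headers", {}).items()
    }
    return server["url"], headers


url, headers = resolve_server(CONFIG_TEXT, "SciGraph", "sk-xxx")
print(url)
print(headers)
```

The returned `url` and `headers` have the same shape as the `url=` and `headers=` arguments passed to `streamablehttp_client` in the Quick Start client.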
