Evaluation

Evaluation Datasets

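| Scientific Capability | Subject | Benchmark | Modality |
| --- | --- | --- | --- |
| Scientific Multimodal Perception | Life Science | SLAKE | Image |
| Scientific Multimodal Reasoning | Earth Science | MSEarth | Image |
| Scientific Multimodal Understanding | Multidisciplinary | SFE | Image |
| Scientific Multimodal Understanding | Earth Science | OmniEarth | Image |
| Scientific Multimodal Understanding | Life Science | OmniMedVQA | Image |
| Scientific Multimodal Understanding | Physics | PhyX | Image |
| Scientific Knowledge Understanding | Chemistry | ChemBench | Text |
| Scientific Knowledge Understanding | Chemistry | ChemBench4K | Text |
| Scientific Knowledge Understanding | Chemistry | LLM4Chem | Image |
| Scientific Knowledge Understanding | Earth Science | ClimaQA | Text |
| Scientific Knowledge Understanding | Earth Science | EarthSE | Text |
| Scientific Knowledge Understanding | Life Science | ProteinLMBench | Text |
| Scientific Knowledge Understanding | Life Science | BioProbench | Text |
| Scientific Knowledge Understanding | Materials Science | MaScQA | Text |
| Scientific Knowledge Understanding | Life Science | TRQA | Text |
| Scientific Knowledge Understanding | Life Science | Biology-Instructions | Text |
| Scientific Knowledge Understanding | Life Science | Mol-Instructions | Text |
| Scientific Knowledge Understanding | Life Science | PEER | Text |
| Scientific Code Generation | Multidisciplinary | SciCode | Text |
| Scientific Code Generation | Astronomy | AstroVisBench | Image |
| Scientific Symbolic Reasoning | Physics | CMPhysBench | Text |
| Scientific Symbolic Reasoning | Physics | PHYSICS | Text |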

Evaluation Results

Benchmarks (each reported in two settings, Vanilla and With SciGraph): SLAKE, MSEarth, SFE, OmniEarth, OmniMedVQA, PhyX, ChemBench, ChemBench4K, LLM4Chem, ClimaQA, EarthSE, ProteinLMBench, BioProbench, MaScQA, TRQA, Biology-Instructions, Mol-Instructions, PEER, SciCode, AstroVisBench, CMPhysBench, PHYSICS, ResearchBench

Models:
Intern-S1
DeepSeek-R1
deepseek-v3.2
GPT-5.2
Claude-sonnet-4
Gemini-2.5-Pro
Grok-4
GLM-4.7
Qwen3-32B
Qwen3-8B