Week 41, 2025

2510.08354v1

Mephisto: Self-Improving Large Language Model-Based Agents for Automated Interpretation of Multi-band Galaxy Observations

Theme match 3/5

Zechang Sun, Yuan-Sen Ting, Yaobo Liang, Nan Duan, Song Huang, Zheng Cai

First listed 2025-10-09 | Last updated 2025-10-09

Abstract

Astronomical research has long relied on human expertise to interpret complex data and formulate scientific hypotheses. In this study, we introduce Mephisto -- a multi-agent collaboration framework powered by large language models (LLMs) that emulates human-like reasoning for analyzing multi-band galaxy observations. Mephisto interfaces with the CIGALE codebase (a library of spectral energy distribution, SED, models) to iteratively refine physical models against observational data. It conducts deliberate reasoning via tree search, accumulates knowledge through self-play, and dynamically updates its knowledge base. Validated across diverse galaxy populations -- including the James Webb Space Telescope's recently discovered "Little Red Dot" galaxies -- we show that Mephisto demonstrates proficiency in inferring the physical properties of galaxies from multi-band photometry, positioning it as a promising research copilot for astronomers. Unlike prior black-box machine learning approaches in astronomy, Mephisto offers a transparent, human-aligned reasoning process that integrates seamlessly with existing research practices. This work underscores the possibility of LLM-driven agent-based research for astronomy, establishes a foundation for fully automated, end-to-end artificial intelligence (AI)-powered scientific workflows, and unlocks new avenues for AI-augmented discoveries in astronomy.

Short digest

Introduces Mephisto, an LLM‑driven multi‑agent framework that interfaces with CIGALE to iteratively propose and test SED models for multi‑band photometry, using tree search, temporal memory, and self‑play knowledge distillation. Validated on COSMOS2020 galaxies and a JWST Little Red Dot case (JADES 90354), it recovers physical properties while exposing step‑by‑step reasoning and a synthesized report. Compared with a 360‑million‑point exhaustive grid, Mephisto consistently reaches solutions within ~20% of the exhaustive‑grid result while exploring grids ~100× smaller, and it sometimes outperforms the baseline fits. This positions it as a scalable, human‑aligned copilot for triaging rare sources and interpreting diverse galaxy populations.
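The propose-and-test loop described above can be pictured with a minimal sketch. This is not the authors' implementation and does not call CIGALE: the component names, the `toy_misfit` score, and the evaluation budget are hypothetical stand-ins chosen only to show how a best-first tree search with a memory of visited configurations avoids re-testing branches.

```python
import heapq
from itertools import count

# Hypothetical search space: each move adds or drops one model component.
COMPONENTS = ("dust", "nebular", "agn")


def toy_misfit(config):
    """Stand-in for a fit statistic (lower is better).

    Assumption: a real run would call an SED fitter here instead.
    """
    target = frozenset({"dust", "agn"})  # pretend the data need dust + AGN
    return len(config ^ target) + 0.1 * len(config)


def best_first_search(score, max_evals=8):
    """Expand the most promising configuration first, within a budget."""
    start = frozenset()
    tie = count()  # tie-breaker so the heap never compares frozensets
    memory = {start: score(start)}  # every configuration already evaluated
    frontier = [(memory[start], next(tie), start)]
    while frontier and len(memory) < max_evals:
        _, _, config = heapq.heappop(frontier)
        for comp in COMPONENTS:
            if len(memory) >= max_evals:
                break  # evaluation budget exhausted
            child = config ^ {comp}  # toggle one component
            if child in memory:
                continue  # memory prunes branches that were already tried
            memory[child] = score(child)
            heapq.heappush(frontier, (memory[child], next(tie), child))
    best = min(memory, key=memory.get)
    return best, memory[best], len(memory)


if __name__ == "__main__":
    best, misfit, n_evals = best_first_search(toy_misfit)
    print(sorted(best), round(misfit, 3), n_evals)
```

In this toy setting the search settles on the dust-plus-AGN configuration. The memory dictionary plays the role that the digest attributes to temporal memory only in the loosest sense: it records what has been tried so the search does not revisit it.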

Key figures to inspect

  • Figure 2 (LRD case study): Follow how iterative model refinements (dust, nebular, and potential AGN components) shift the SED to reproduce the extreme red colors of JADES 90354, and read the reasoning chain that weighs competing interpretations for Little Red Dots.
  • Figure 4 (performance vs exhaustive grid): Inspect the fractional‑difference scatter to confirm the ~20% agreement band, identify points below zero where Mephisto beats the exhaustive search, and gauge computational savings versus fit quality.
  • Figure 3 (COSMOS2020 exemplars): Use the component‑decomposed SEDs to see when an AGN contribution is required (panel c) and compare derived M*, attenuation, and SFR across dusty star‑forming, dwarf, and massive systems.
  • Figure 1 (architecture): Trace the tree‑based workflow, noting the prompt templates tied to CIGALE docs, the knowledge‑learning example that adjusts AGN parameters, and how temporal memory steers and prunes branches before the final summarized report.
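When reading the Figure 4 scatter, it may help to have the arithmetic written out. The sketch below assumes the fractional difference is (agent statistic − grid statistic) / grid statistic for a fit statistic where lower is better, so negative values mean the agent's fit beats the exhaustive grid. That definition, the function names, and the example numbers are illustrative assumptions, not values or code from the paper.

```python
def fractional_difference(agent_stat, grid_stat):
    """Relative gap between two fit statistics (assumed: lower is better)."""
    if grid_stat <= 0:
        raise ValueError("grid_stat must be positive")
    return (agent_stat - grid_stat) / grid_stat


def summarize(pairs, band=0.20):
    """Count points inside the agreement band and points below zero."""
    diffs = [fractional_difference(a, g) for a, g in pairs]
    return {
        "within_band": sum(abs(d) <= band for d in diffs),
        "agent_better": sum(d < 0 for d in diffs),
        "total": len(diffs),
    }


if __name__ == "__main__":
    # Made-up (agent, grid) pairs purely to exercise the functions.
    example = [(1.10, 1.00), (0.95, 1.00), (1.50, 1.00)]
    print(summarize(example))
```

With these made-up pairs, two of three points fall inside the 20% band and one lies below zero.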
