2506.03130v1
AGNBoost: A Machine Learning Approach to AGN Identification with JWST/NIRCam+MIRI Colors and Photometry
First listed 2025-06-03 | Last updated 2026-03-02
Abstract
We present AGNBoost, a machine learning framework utilizing XGBoostLSS to identify AGN and estimate redshifts from JWST NIRCam and MIRI photometry. AGNBoost constructs 66 input features from 7 NIRCam and 4 MIRI bands to predict the fraction of mid-IR $3$--$30\,μ$m emission attributable to an AGN power law ($\text{frac}_{\text{AGN}}$) and photometric redshift. Each model is trained on $10^6$ simulated galaxies from CIGALE. Models are tested on mock CIGALE galaxies, an independent set of empirically-derived templates, and 748 observations from the JWST MIRI EGS Galaxy and AGN (MEGA) survey. On idealized noise-free mock CIGALE galaxies, AGNBoost achieves $15\%$ outlier fractions of $1.63\%$ ($\text{frac}_{\text{AGN}}$) and $0.15\%$ (redshift), with $σ_{\text{RMSE}} = 0.045$ for $\text{frac}_{\text{AGN}}$ and $σ_{\text{NMAD}} = 0.004$ for redshift. When realistic photometric uncertainties are introduced, performance remains robust with median predictions on the 1:1 relation, though outlier fractions increase to $4.38\%$ and $3.35\%$, respectively. On the independent template set, AGNBoost identifies $92.6\%$ of AGN candidates with $\text{frac}_{\text{AGN}} > 0.3$ and $100\%$ with $\text{frac}_{\text{AGN}} > 0.5$, demonstrating generalization beyond the training distribution. On MEGA galaxies with spectroscopic redshifts, AGNBoost achieves $σ_{\text{NMAD}} = 0.056$ and $19.79\%$ outliers. AGNBoost $\text{frac}_{\text{AGN}}$ estimates broadly agree with CIGALE fitting ($σ_{\text{RMSE}} = 0.178$, $11.96\%$ outliers). The flexible framework allows straightforward incorporation of additional photometric bands and re-training for other variables. AGNBoost's computational efficiency makes it well-suited for wide-sky surveys requiring rapid AGN identification and redshift estimation.
Short digest
Introduces AGNBoost, an XGBoostLSS framework that takes NIRCam+MIRI magnitudes, colors, and color-squared terms as input and predicts both the mid-IR AGN power-law fraction (frac_AGN) and the photometric redshift, with each model trained on 10^6 CIGALE-simulated galaxies. On noise-free mocks it attains low scatter (RMSE 0.045 in frac_AGN; NMAD 0.004 in z), with 1.63% and 0.15% of sources counted as outliers at the 15% threshold. With realistic photometric uncertainties the median predictions stay on the 1:1 relation, while the outlier fractions rise to 4.38% and 3.35%. It generalizes to an independent set of empirically derived templates (recovering 92.6% of AGN candidates with frac_AGN > 0.3 and 100% with frac_AGN > 0.5) and, on MEGA sources with spectroscopic redshifts, achieves σ_NMAD = 0.056 with 19.79% outliers; its frac_AGN estimates broadly agree with CIGALE fitting (RMSE 0.178; 11.96% outliers). Straightforward retraining, easy addition of photometric bands, and computational efficiency make it practical for wide-area surveys that need rapid AGN identification and photometric redshifts.
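The scatter and outlier statistics quoted above can be illustrated with a minimal sketch. It assumes the conventions commonly used in the photometric-redshift literature: residuals normalized by (1 + z_true), σ_NMAD computed as 1.4826 times the median absolute deviation of those residuals, and an outlier defined as a source whose normalized residual exceeds 0.15 in absolute value. The abstract gives the 15% threshold but not the exact formulas, so the definitions below are assumptions, and all function names are hypothetical rather than taken from the AGNBoost code.

```python
from math import sqrt
from statistics import median


def normalized_residuals(z_pred, z_true):
    """Return (z_pred - z_true) / (1 + z_true) for each source (assumed convention)."""
    return [(p - t) / (1.0 + t) for p, t in zip(z_pred, z_true)]


def sigma_nmad(z_pred, z_true):
    """Normalized median absolute deviation of the redshift residuals.

    Assumed definition: 1.4826 * median(|dz - median(dz)|),
    where dz are the normalized residuals.
    """
    dz = normalized_residuals(z_pred, z_true)
    center = median(dz)
    return 1.4826 * median([abs(d - center) for d in dz])


def outlier_fraction(z_pred, z_true, threshold=0.15):
    """Fraction of sources with |dz| above the threshold (0.15 assumed here)."""
    dz = normalized_residuals(z_pred, z_true)
    return sum(1 for d in dz if abs(d) > threshold) / len(dz)


def rmse(pred, true):
    """Root-mean-square error, the scatter statistic quoted for frac_AGN."""
    squared = [(p - t) ** 2 for p, t in zip(pred, true)]
    return sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Made-up illustrative values, not data from the paper.
    z_true = [1.0, 2.0, 3.0, 0.5]
    z_pred = [1.0, 2.0, 3.0, 1.5]
    print("sigma_NMAD:", sigma_nmad(z_pred, z_true))
    print("outlier fraction:", outlier_fraction(z_pred, z_true))
    print("RMSE:", rmse([0.5, 0.5], [0.0, 1.0]))
```

Because σ_NMAD is built from medians, a handful of catastrophic failures barely moves it, which is why it is usually reported alongside a separate outlier fraction. How outliers are defined for frac_AGN (a quantity bounded between 0 and 1) is not stated in the abstract, so this sketch does not attempt to reproduce that figure.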
Key figures to inspect
- Figure 1: Use the color–color tracks to see why pure mid-IR color cuts fail without redshift information: PAH features and AGN power laws overlap in color space as they move through the bands with redshift, which motivates predicting photo‑z alongside frac_AGN.
- Figure 2: Compare MEGA points against the CIGALE bagplots to verify that the training set spans the observed MIRI color space and to spot where MEGA sources fall outside (checked later for performance impacts).
- Figure 3: Inspect F770W distributions to confirm simulations cover MEGA fluxes and note the few very low‑z (z≲0.2) outliers beyond the simulated range that could challenge the model.
- Figure 4: Follow the two‑stage tuning/early‑stopping workflow to understand how hyperparameters and boosting rounds were optimized for speed and stability before final training.