2507.00129v1
Lya2pcf: an efficient pipeline to estimate two- and three-point correlation functions of the Lyman-$α$ forest
First listed 2025-06-30 | Last updated 2025-06-30
Abstract
Studying the matter distribution in the universe through the Lyman-$α$ forest allows us to constrain small-scale physics in the high-redshift regime. Spectroscopic quasar surveys are generating increasingly large datasets that require efficient algorithms to compute correlation functions. Moreover, cosmological analyses based on Lyman-$α$ forests can significantly benefit from incorporating higher-order statistics alongside traditional two-point correlations. In this work, we present Lya2pcf, a pipeline designed to compute three-dimensional two-point and three-point correlation functions using Lyman-$α$ forest data. The code implements standard algorithms widely used in current spectroscopic surveys for computing the two-point correlation function together with its distortion matrix and covariance matrices, and it naturally extends the two-point estimator to three-point correlations. Thanks to GPU optimization, Lya2pcf achieves a substantial reduction in computational time for both the two-point correlation function and its distortion matrix when compared to the widely used PICCA code. We apply Lya2pcf to data from the Sloan Digital Sky Survey (SDSS) sixteenth data release (DR16) and to a Dark Energy Spectroscopic Instrument Year-5 (DESI Y5) mock dataset, demonstrating overall performance gains over PICCA, especially on GPUs. We show the first measurement of the anisotropic three-point correlation function on a large spectroscopic sample for all possible triangles with scales up to 80 Mpc/h. The estimator's fast computation and the resulting signal-to-noise ratio, which is above one for many triangle configurations, demonstrate the viability of incorporating three-point statistics into future cosmological inference analyses, particularly with the larger datasets expected from Stage IV spectroscopic surveys.
Short digest
Lya2pcf is a GPU-optimized pipeline for three-dimensional two- and three-point correlation function (2PCF, 3PCF) measurements of the Lyman-α forest, including the standard distortion matrix and covariance estimation. Applied to SDSS DR16 and a DESI Year-5 mock, it achieves substantial computational speedups over PICCA, most notably on GPUs for the 2PCF and its distortion matrix. The code delivers the first large-sample anisotropic 3PCF measurement for all triangle configurations with scales up to 80 Mpc/h, with signal-to-noise above one for many triangle configurations. Together with the fast computation, this supports the viability of including higher-order Lyα statistics in cosmological inference with Stage IV surveys.
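For orientation, the estimators can be written as weighted averages of products of flux-transmission fluctuations. The forms below are a sketch, assuming the weighted two-point estimator commonly used for Lyα forests and taking the abstract's statement that the three-point estimator extends it naturally at face value. The symbols ($w_i$ for pixel weights, $\delta_i$ for fluctuations, $A$ for a separation bin, $T$ for a triangle bin) are illustrative and are not taken from the paper.

```latex
\hat{\xi}_A = \frac{\sum_{(i,j)\in A} w_i w_j \,\delta_i \delta_j}{\sum_{(i,j)\in A} w_i w_j},
\qquad
\hat{\zeta}_T = \frac{\sum_{(i,j,k)\in T} w_i w_j w_k \,\delta_i \delta_j \delta_k}{\sum_{(i,j,k)\in T} w_i w_j w_k}
```

In both sums the pixels are assumed to come from different forests, consistent with the omission of same-forest pairs in the 2PCF.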
Key figures to inspect
- Geometry and binning of the 2PCF estimator: inspect the sky-patch visualization and the (r⊥, r∥) histogram grid (50×50 up to 80 Mpc/h) to see how inter-forest pairs are accumulated and why same-forest pairs are omitted.
- Timing benchmarks versus PICCA: look for CPU/GPU wall-clock comparisons for the 2PCF and the distortion matrix, and scaling trends with number of quasars and deltas per skewer, to quantify the reported speedups.
- Distortion matrix M_ij: examine its structure and comparison to the PICCA-based model to understand continuum-fitting–induced mixing and its impact on the recovered 2PCF.
- Anisotropic 3PCF maps: check triangle-bin or wedge plots up to 80 Mpc/h and the corresponding S/N heatmaps to identify which triangle shapes drive S/N > 1.
- Validation on SDSS DR16 and DESI Y5 mock: compare measured 2PCF/3PCF and covariances against expectations to assess robustness before applying to Stage IV data.
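The pair accumulation described in the first item above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Lya2pcf implementation: the function name `xi_2pcf`, its arguments, the brute-force pair enumeration, and the choice of the pair midpoint as the line of sight are all assumptions made for the example. The 50×50 grid up to 80 Mpc/h follows the binning quoted above.

```python
import numpy as np

def xi_2pcf(pos, delta, weight, forest_id, r_max=80.0, n_bins=50):
    """Weighted 2PCF on an (r_perp, r_par) grid from inter-forest pixel pairs.

    Hypothetical helper for illustration only.
    pos       : (N, 3) comoving pixel positions in Mpc/h, observer at the origin
    delta     : (N,) flux-transmission fluctuations
    weight    : (N,) pixel weights
    forest_id : (N,) integer label of the forest each pixel belongs to
    """
    # Enumerate every unordered pixel pair (O(N^2) memory, fine for a toy case).
    i, j = np.triu_indices(len(delta), k=1)
    # Omit pairs whose two pixels lie in the same forest.
    keep = forest_id[i] != forest_id[j]
    i, j = i[keep], j[keep]

    # Assumed convention: line of sight points to the pair midpoint.
    sep = pos[i] - pos[j]
    mid = 0.5 * (pos[i] + pos[j])
    los = mid / np.linalg.norm(mid, axis=1, keepdims=True)
    r_par = np.abs(np.einsum("ij,ij->i", sep, los))
    r2 = np.einsum("ij,ij->i", sep, sep)
    r_perp = np.sqrt(np.clip(r2 - r_par**2, 0.0, None))

    # Accumulate weighted products and weights on the same 2D grid.
    edges = np.linspace(0.0, r_max, n_bins + 1)
    ww = weight[i] * weight[j]
    num, _, _ = np.histogram2d(r_perp, r_par, bins=(edges, edges),
                               weights=ww * delta[i] * delta[j])
    den, _, _ = np.histogram2d(r_perp, r_par, bins=(edges, edges), weights=ww)
    # Bins with no pairs are left at zero.
    xi = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return xi, den

# Toy input: two forests with two pixels each, unit weights.
pos = np.array([[0.0, 0.0, 100.0], [0.0, 0.0, 110.0],
                [10.0, 0.0, 100.0], [10.0, 0.0, 110.0]])
delta = np.array([1.0, 2.0, 3.0, 4.0])
weight = np.ones(4)
forest_id = np.array([0, 0, 1, 1])
xi, den = xi_2pcf(pos, delta, weight, forest_id)
```

Of the six possible pairs in the toy input, only the four that connect different forests contribute to `den`. A production code would replace the all-pairs enumeration with a neighbour search over forests and, as the digest notes for Lya2pcf, move the accumulation to the GPU.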