Reproduces the data-generating process from Ewald et al. (2024) for benchmarking feature importance methods. The process includes correlated features and an interaction effect.
Value
A regression task (mlr3::TaskRegr) with a data.table backend.
Details
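Mathematical Model:
$$X_1, X_3, X_5 \sim \text{Uniform}(0,1)$$
$$X_2 = X_1 + \varepsilon_2, \quad \varepsilon_2 \sim N(0, 0.001)$$
$$X_4 = X_3 + \varepsilon_4, \quad \varepsilon_4 \sim N(0, 0.1)$$
$$Y = X_4 + X_5 + X_4 \cdot X_5 + \varepsilon, \quad \varepsilon \sim N(0, 0.1)$$
Reading the second argument of N as a variance (not a standard deviation) is consistent with the correlations listed under Feature Properties.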
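The model above can be sketched in base R as follows. This is an illustration only, not the package's implementation: the function name `simulate_ewald_sketch` and its arguments are hypothetical, it returns a plain data.frame instead of an mlr3 task, and it assumes the second argument of N is a variance.

```r
# Hypothetical helper for illustration; not part of the package.
simulate_ewald_sketch <- function(n = 500L, seed = 1L) {
  set.seed(seed)

  # Three independent Uniform(0, 1) features
  x1 <- runif(n)
  x3 <- runif(n)
  x5 <- runif(n)

  # Assumption: N(0, v) is parameterised by variance v,
  # so rnorm() receives sd = sqrt(v).
  x2 <- x1 + rnorm(n, mean = 0, sd = sqrt(0.001))  # near copy of x1
  x4 <- x3 + rnorm(n, mean = 0, sd = sqrt(0.1))    # noisy copy of x3

  # Target: main effects of x4 and x5 plus their interaction, plus noise
  y <- x4 + x5 + x4 * x5 + rnorm(n, mean = 0, sd = sqrt(0.1))

  data.frame(x1 = x1, x2 = x2, x3 = x3, x4 = x4, x5 = x5, y = y)
}

dat <- simulate_ewald_sketch(n = 10000L)

# Empirical correlations between each feature and its noisy copy
cor(dat$x1, dat$x2)
cor(dat$x3, dat$x4)
```

With a large `n`, the two empirical correlations should land close to the values given for X2 and X4 under Feature Properties, provided the variance reading holds.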
Feature Properties:
X1, X3, X5: Independent uniform(0,1) distributions
X2: Nearly perfect copy of X1 (correlation ≈ 0.99)
X4: Noisy copy of X3 (correlation ≈ 0.67)
Y depends on X4, X5, and their interaction; X1, X2, and X3 do not enter the equation for Y directly (X3 affects Y only through X4)
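The approximate correlations can be checked by hand. As a sketch, assume that the second argument of N is a variance \(\sigma^2\) and that the noise is independent of the feature it is added to. For \(X \sim \text{Uniform}(0,1)\), \(\operatorname{Var}(X) = 1/12\), so

$$\operatorname{Cor}(X, X + \varepsilon) = \frac{\operatorname{Var}(X)}{\sqrt{\operatorname{Var}(X)\,\bigl(\operatorname{Var}(X) + \sigma^2\bigr)}} = \sqrt{\frac{1/12}{1/12 + \sigma^2}}$$

$$\operatorname{Cor}(X_1, X_2) = \sqrt{\frac{1/12}{1/12 + 0.001}} \approx 0.994, \qquad \operatorname{Cor}(X_3, X_4) = \sqrt{\frac{1/12}{1/12 + 0.1}} \approx 0.674$$

Both values agree with the rounded figures of 0.99 and 0.67 given above.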
References
Ewald F, Bothmann L, Wright M, Bischl B, Casalicchio G, König G (2024). “A Guide to Feature Importance Methods for Scientific Inference.” In Longo L, Lapuschkin S, Seifert C (eds.), Explainable Artificial Intelligence, 440–464. ISBN 978-3-031-63797-1, doi:10.1007/978-3-031-63797-1_22.