Changelog
Source: NEWS.md
xplainfi 0.2.0
User-facing API improvements
Importance aggregation and confidence intervals
- $importance() gains ci_method parameter for variance estimation (#40):
  - "none" (default): Simple aggregation without confidence intervals
  - "raw": Uncorrected variance estimates (informative only, CIs too narrow)
  - "nadeau_bengio": Variance correction by Nadeau & Bengio (2003) as recommended by Molnar et al. (2023)
  - "quantile": Empirical quantile-based confidence intervals
  - "cpi": Conditional Predictive Impact for perturbation methods (PFI/CFI/RFI), supporting t-, Wilcoxon-, Fisher-, and binomial tests
- CPI is now properly scoped to PerturbationImportance methods only (not available for WVIM/LOCO or SAGE)
- $importance() gains standardize parameter to normalize scores to [-1, 1] range
- $importance() and $scores() gain relation parameter (default: "difference") to compute importances as difference or ratio of baseline and post-modification loss
  - Moved from $compute() to avoid recomputing predictions/refits when changing aggregation method
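The aggregation options above can be illustrated with a minimal base R sketch. It does not call xplainfi; the loss values, the number of iterations, and the train/test sizes are made-up assumptions, and the direction of the difference (post-modification minus baseline) is assumed as well. It contrasts the "difference" and "ratio" relations and shows why the uncorrected ("raw") interval is narrower than the one using the Nadeau & Bengio (2003) correction factor (1/n + n_test/n_train).

```r
# Hypothetical per-iteration losses (illustrative numbers, not package output)
loss_baseline  <- c(0.50, 0.55, 0.48, 0.52, 0.51)
loss_perturbed <- c(0.80, 0.90, 0.75, 0.85, 0.78)

# Two ways to relate baseline and post-modification loss
scores_difference <- loss_perturbed - loss_baseline
scores_ratio      <- loss_perturbed / loss_baseline

# Assumed resampling setup: 5 iterations, 800 training and 200 test rows each
n_iters <- length(scores_difference)
n_train <- 800
n_test  <- 200

est <- mean(scores_difference)

# Uncorrected variance of the mean: ignores overlap between training sets
var_raw <- var(scores_difference) / n_iters

# Nadeau & Bengio (2003) correction: inflate by the test/train size ratio
var_corrected <- (1 / n_iters + n_test / n_train) * var(scores_difference)

# 95% t-based confidence intervals under both variance estimates
t_crit <- qt(0.975, df = n_iters - 1)
ci_raw       <- est + c(-1, 1) * t_crit * sqrt(var_raw)
ci_corrected <- est + c(-1, 1) * t_crit * sqrt(var_corrected)
```

Because the correction only changes the variance term, the point estimate is identical under both options; only the interval width differs.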
Data simulation helpers
- Add focused simulation DGPs for testing importance methods:
  - sim_dgp_independent(): Baseline with additive independent effects
  - sim_dgp_correlated(): Highly correlated features (PFI fails, CFI succeeds)
  - sim_dgp_mediated(): Mediation structure (total vs direct effects)
  - sim_dgp_confounded(): Confounding structure
  - sim_dgp_interactions(): Interaction effects between features
- Each DGP illustrates specific methodological challenges for importance methods
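To make the correlated-features challenge concrete, here is a small base R sketch. It is not the package's sim_dgp_correlated(); the sample size and noise level are assumptions chosen for illustration. It shows that marginally permuting one of two highly correlated features destroys their dependence, which is the situation where marginal permutation (PFI) evaluates the model on unrealistic feature combinations.

```r
set.seed(1)
n <- 1000

# Hypothetical setup: x2 is nearly a copy of x1
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)

# Dependence in the original data
cor_original <- cor(x1, x2)

# Dependence after marginally permuting x2, as a permutation sampler would
cor_permuted <- cor(x1, sample(x2))
```

A conditional sampler aims to preserve the dependence between x1 and x2 when replacing one of them, which is why conditional methods behave differently in this setting.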
Observation-wise losses and predictions
- $obs_loss() computes observation-wise importance scores when measure has a Measure$obs_loss() method
- $predictions field stores prediction objects for further analysis
Grouped feature importance
- PerturbationImportance and WVIM methods support groups parameter for grouped feature importance:
  - Example: groups = list(effects = c("x1", "x2", "x3"), noise = c("noise1", "noise2"))
  - In output, feature column contains group names instead of individual features
  - Allows measuring importance of feature sets rather than individual features
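The idea behind grouped importance can be sketched in base R without the package. The data-generating process, the linear model, and the helper mse() below are assumptions for illustration only; the groups list has the same shape as the example above. All features in a group are permuted jointly (with the same row shuffle), and the result is one score per group name.

```r
set.seed(1)
n <- 500

# Hypothetical data: three informative features and two pure noise features
dat <- data.frame(
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
  noise1 = rnorm(n), noise2 = rnorm(n)
)
dat$y <- dat$x1 + 2 * dat$x2 - dat$x3 + rnorm(n, sd = 0.5)

train <- dat[1:350, ]
test  <- dat[351:500, ]
fit   <- lm(y ~ ., data = train)

# Helper (hypothetical): mean squared error of the fitted model on a data set
mse <- function(data) mean((data$y - predict(fit, newdata = data))^2)

groups <- list(effects = c("x1", "x2", "x3"), noise = c("noise1", "noise2"))
baseline <- mse(test)

# Permute each group's columns jointly and record the increase in loss
group_importance <- vapply(groups, function(features) {
  perturbed <- test
  shuffled_rows <- sample(nrow(test))
  perturbed[features] <- test[shuffled_rows, features, drop = FALSE]
  mse(perturbed) - baseline
}, numeric(1))
```

The names of group_importance are the group names ("effects", "noise"), mirroring how the feature column holds group names in grouped output.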
Method-specific improvements
WVIM (Williamson’s Variable Importance Measure)
- Generalizes LOCO (Leave-One-Covariate-Out) and LOCI (Leave-One-Covariate-In)
- Implemented using mlr3fselect for cleaner internals
- Parameter renamed: iters_refit → n_repeats for consistency
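For readers unfamiliar with LOCO, the following base R sketch shows the refit-based idea on its own terms. It uses lm() and a made-up data set rather than xplainfi or mlr3fselect, so every name in it is an assumption for illustration: the model is refit without one feature at a time, and the importance is the increase in test loss relative to the full model.

```r
set.seed(1)
n <- 400

# Hypothetical data: only x1 affects the target
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 * dat$x1 + rnorm(n, sd = 0.5)

train <- dat[1:300, ]
test  <- dat[301:400, ]
features <- c("x1", "x2")

# Helper (hypothetical): test-set mean squared error of a fitted model
mse <- function(fit) mean((test$y - predict(fit, newdata = test))^2)

full <- lm(y ~ x1 + x2, data = train)

# Leave one covariate out, refit, and compare against the full model
loco <- vapply(features, function(feature) {
  remaining <- setdiff(features, feature)
  reduced <- lm(reformulate(remaining, response = "y"), data = train)
  mse(reduced) - mse(full)
}, numeric(1))
```

LOCI reverses the comparison: a model using only the feature of interest is compared against a featureless baseline.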
PerturbationImportance (PFI, CFI, RFI)
- Performance improvements:
  - Uses learner$predict_newdata_fast() for faster predictions (requires mlr3 >= 1.1.0)
  - Batches permutation iterations internally to reduce sampler$sample() calls
  - New batch_size parameter to control memory usage with large datasets
- Parallelization support:
  - Parallel execution via mirai or future backends
  - Set up with mirai::daemons() or future::plan()
  - Parallelizes across features within each resampling iteration
- Parameter renamed: iters_perm → n_repeats for consistency
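A backend setup along the lines described above might look like the following sketch. The worker count of 2 is an arbitrary assumption, and the importance computation itself is left as a placeholder comment because its exact call is not shown in this changelog. Only one of the two backends needs to be configured.

```r
# Option A: mirai backend (assumes the mirai package is installed)
if (requireNamespace("mirai", quietly = TRUE)) {
  mirai::daemons(2)   # start 2 background worker processes
  # ... compute importance here ...
  mirai::daemons(0)   # shut the workers down again
}

# Option B: future backend (assumes the future package is installed)
if (requireNamespace("future", quietly = TRUE)) {
  future::plan(future::multisession, workers = 2)
  # ... compute importance here ...
  future::plan(future::sequential)   # restore sequential execution
}
```

Since parallelization happens across features within each resampling iteration, more workers than features is unlikely to help.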
Feature Samplers
- Breaking changes:
  - Refactored API separates task-based vs external data sampling (#49):
    - $sample(feature, row_ids): Samples from stored task using row IDs
    - $sample_newdata(feature, newdata): Samples from external data
  - Renamed sampler classes for hierarchical consistency:
    - PermutationSampler → MarginalPermutationSampler
    - ARFSampler → ConditionalARFSampler
    - GaussianConditionalSampler → ConditionalGaussianSampler
    - KNNConditionalSampler → ConditionalKNNSampler
    - CtreeConditionalSampler → ConditionalCtreeSampler
  - Standardized parameter name: conditioning_set for features to condition on
- New samplers:
  - MarginalSampler: Base class for marginal sampling methods
  - MarginalReferenceSampler: Samples complete rows from reference data (for SAGE)
  - KnockoffSampler: Knockoff-based sampling (#16 via @mnwright)
    - Convenience wrappers: KnockoffGaussianSampler, KnockoffSequentialSampler
    - Supports row_ids-based sampling
    - iters parameter for multiple knockoff iterations
    - Compatible with CFI (not RFI/SAGE)
SAGE (Shapley Additive Global Importance)
- Bug fix: ConditionalSAGE now properly uses conditional sampling (was accidentally using marginal sampling)
- Performance improvements:
  - Uses learner$predict_newdata_fast() for faster predictions
  - batch_size parameter controls memory usage for large coalitions
- Convergence tracking (#29, #33):
  - Enable with early_stopping = TRUE
  - Stops when relative standard error falls below se_threshold (default: 0.01)
  - Requires at least min_permutations (default: 3)
  - Checks convergence every check_interval permutations (default: 1)
  - New fields:
    - $converged: Boolean indicating whether convergence was reached
    - $n_permutations_used: Actual permutations used (may be less than requested)
    - $convergence_history: Per-feature importance and SE over permutations
  - $plot_convergence(): Visualize convergence curves
  - Convergence tracked for first resampling iteration only
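The stopping rule described above can be sketched as a simple loop in base R. This is not the package's implementation: the per-permutation values are simulated stand-ins, max_permutations is a hypothetical cap, and the exact definition of relative standard error used by the package is assumed here to be the standard error of the running mean divided by its absolute value. The variable names mirror the parameters and fields listed above.

```r
set.seed(42)

# Parameters named after the options above; max_permutations is a hypothetical cap
se_threshold     <- 0.01
min_permutations <- 3
check_interval   <- 1
max_permutations <- 200

estimates <- numeric(0)
converged <- FALSE

for (i in seq_len(max_permutations)) {
  # Stand-in for the importance contribution of one sampled permutation
  estimates[i] <- rnorm(1, mean = 1, sd = 0.05)

  # Only check after the minimum number of permutations, at the chosen interval
  if (i >= min_permutations && i %% check_interval == 0) {
    se <- sd(estimates) / sqrt(i)
    relative_se <- se / abs(mean(estimates))
    if (relative_se < se_threshold) {
      converged <- TRUE
      break
    }
  }
}

n_permutations_used <- length(estimates)
```

A larger check_interval reduces the overhead of checking at the cost of possibly running a few more permutations than strictly needed.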