Samples complete observations from reference data to replace feature values. This approach samples from the marginal distribution while preserving within-row feature dependencies.
Details
This sampler implements what is called "marginal imputation" in the SAGE literature (Covert et al. 2020). For each observation, it samples a complete row from reference data and takes the specified feature values from that row. This approach:
Samples from the marginal distribution \(P(X_S)\), where \(S\) is the set of features being sampled
Preserves dependencies within the sampled reference row
Breaks dependencies between test and reference data
Terminology note: In SAGE literature, this is called "marginal imputation" because
features outside the coalition are "imputed" by sampling from their marginal distribution.
We use MarginalReferenceSampler to avoid confusion with missing data imputation and to
clarify that it samples from reference data.
Comparison with other samplers:
MarginalPermutationSampler: Shuffles each feature independently, breaking all row structure
MarginalReferenceSampler: Samples complete rows, preserving within-row dependencies
ConditionalSampler: Samples from \(P(X_S | X_{-S})\), conditioning on other features
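The difference between independent shuffling and complete-row sampling can be illustrated with a minimal base R sketch. It does not call any xplainfi sampler; the toy data, the variable names, and the choice to sample rows with replacement are assumptions made for illustration only.

```r
set.seed(1)
n = 1000

# Toy data (assumption): two strongly correlated features
x1 = rnorm(n)
x2 = x1 + rnorm(n, sd = 0.1)
dat = data.frame(x1 = x1, x2 = x2)

# Permutation-style: shuffle each feature independently,
# so values in one row no longer come from the same original row
permuted = data.frame(
  x1 = sample(dat$x1),
  x2 = sample(dat$x2)
)

# Reference-style: draw complete rows, then take both features
# from the same sampled row (sampling with replacement is an assumption)
idx = sample(nrow(dat), size = n, replace = TRUE)
referenced = dat[idx, c("x1", "x2")]

# Independent shuffling removes the correlation; row sampling keeps it
cor_permuted = cor(permuted$x1, permuted$x2)
cor_referenced = cor(referenced$x1, referenced$x2)
```

In this sketch, cor_permuted should be close to zero while cor_referenced stays close to the correlation in the original data, which is the practical meaning of "preserving within-row dependencies".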
Use in SAGE:
This is the default approach for MarginalSAGE. For a test observation x and features
to marginalize S, it samples a reference row x_ref and creates a "hybrid" observation
combining x's coalition features with x_ref's marginalized features.
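The construction of such hybrid observations can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's implementation: the data frames test_data and reference, the feature names, and sampling reference rows with replacement are all hypothetical.

```r
set.seed(42)

# Hypothetical test observations and reference data
test_data = data.frame(a = 1:3, b = c(10, 20, 30), c = c(100, 200, 300))
reference = data.frame(
  a = 11:15,
  b = (1:5) + 0.5,
  c = -(1:5)
)

# Features to marginalize (S); feature "a" stays in the coalition
marginalized = c("b", "c")

# Draw one reference row per test observation (with replacement: an assumption)
ref_idx = sample(nrow(reference), size = nrow(test_data), replace = TRUE)

# Hybrid observations: coalition features from the test rows,
# marginalized features taken jointly from the sampled reference rows
hybrid = test_data
hybrid[, marginalized] = reference[ref_idx, marginalized]
```

Because b and c are copied from the same reference row, their joint relationship in the reference data carries over to hybrid, while the link between a and the pair (b, c) is broken.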
References
Covert I, Lundberg S, Lee S (2020). “Understanding Global Feature Contributions With Additive Importance Measures.” In Advances in Neural Information Processing Systems, volume 33, 17212–17223. https://proceedings.neurips.cc/paper/2020/hash/c7bf0b7c1a86d5eb3be2c722cf2cf746-Abstract.html.
Super classes
xplainfi::FeatureSampler -> xplainfi::MarginalSampler -> MarginalReferenceSampler
Public fields
reference_data (data.table) Reference data to sample from for marginalization.
Methods
Method new()
Creates a new instance of the MarginalReferenceSampler class.
Usage
MarginalReferenceSampler$new(task, n_samples = NULL)
Arguments
task (mlr3::Task) Task to sample from.
n_samples (integer(1) | NULL) Number of reference samples to use. If NULL, uses all task data as reference.
Examples
library(mlr3)
library(xplainfi)
task = tgen("friedman1")$generate(n = 100)
# Default: uses all task data as reference
sampler = MarginalReferenceSampler$new(task)
sampled = sampler$sample("important1", row_ids = 1:10)
# Subsample reference data to 50 rows
sampler_subsampled = MarginalReferenceSampler$new(task, n_samples = 50L)
sampled2 = sampler_subsampled$sample("important1", row_ids = 1:10)