beexai.evaluate.metrics package
Submodules
beexai.evaluate.metrics.auc_tp module
- class beexai.evaluate.metrics.auc_tp.AucTp(model: Callable, task: str, device: str = 'cpu')[source]
Bases: CustomMetric
Implementation of the AUC-TP metric.
Computes the AUC of the curve with x-axis being the number of features removed and y-axis being the performance metric between the prediction with all features and the prediction with the x most important features removed.
References
`A Diagnostic Study of Explainability Techniques for
Text Classification <https://arxiv.org/abs/2009.13295>`
- get_auctp(x_in: Tensor, feature_by_importance: Tensor, metric: Callable, baseline: str = 'zero') Tuple[Tensor, Tensor][source]
Computes the AUC-TP metric.
- Parameters:
x_in (torch.Tensor) – x_in data
feature_by_importance (torch.Tensor) – indexes of most important features in descending order
metric (callable) – metric to use to compute the difference between the prediction with all features and the prediction with the most important features removed
baseline (str, optional) – baseline to use. Defaults to “zero”
- Returns:
array of AUC-TP scores for each feature, and the overall AUC-TP score
- Return type:
Tuple[torch.Tensor, torch.Tensor]
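The curve-and-area idea described for AucTp can be illustrated without the library. The following is only a minimal sketch for a single instance, assuming a zero baseline, a toy linear model and squared error as the performance metric. The names `toy_model`, `perf_curve` and `trapezoid_auc` are hypothetical and not part of beexai, and plain Python lists stand in for tensors.

```python
def toy_model(x):
    # Hypothetical stand-in for the model: a fixed linear scorer.
    weights = [3.0, 2.0, 1.0]
    return sum(w * v for w, v in zip(weights, x))


def perf_curve(model, x, feature_by_importance):
    """Squared error between the full prediction and the prediction
    after zeroing out the k most important features, for k = 0..n."""
    full_pred = model(x)
    masked = list(x)
    curve = [0.0]  # k = 0: nothing removed, so the error is zero
    for idx in feature_by_importance:
        masked[idx] = 0.0  # assumed "zero" baseline
        curve.append((model(masked) - full_pred) ** 2)
    return curve


def trapezoid_auc(curve):
    # Area under the curve with unit spacing on the x-axis.
    return sum((a + b) / 2.0 for a, b in zip(curve, curve[1:]))


x = [1.0, 1.0, 1.0]
curve = perf_curve(toy_model, x, feature_by_importance=[0, 1, 2])
auc = trapezoid_auc(curve)
print(curve, auc)
```

A ranking that removes truly important features first makes the error grow quickly, which yields a larger area than a random ranking on the same model.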
- beexai.evaluate.metrics.auc_tp.compute_auc(model: Trainer, rand_model: Trainer, task: str, x_test: Tensor, ord_feat: Tensor, rand_ord_feat: Tensor, randmodel_ord_feat: Tensor, metrics: dict, baseline: str = 'zero', auc_metric: str = 'mse', print_plot: bool = False, device: str = 'cpu') dict[source]
Computes the AUC-TP metric for the base model, the random model and the random baseline. Returns the dict of metrics with the computed AUC-TP scores appended.
- Parameters:
model (Trainer) – model to explain
rand_model (Trainer) – reference model to compare to (random model)
task (str) – task to perform
x_test (torch.Tensor) – test set
ord_feat (torch.Tensor) – indexes of most important features in descending order for the base model
rand_ord_feat (torch.Tensor) – indexes of most important features in descending order for the random explanation method
randmodel_ord_feat (torch.Tensor) – indexes of most important features in descending order for the random model
metrics (dict) – dictionary of metrics
baseline (str, optional) – baseline to use. Defaults to “zero”
auc_metric (str, optional) – performance metric to use. Defaults to “mse”
print_plot (bool, optional) – whether to print the plot. Defaults to False
device (str, optional) – device to use. Defaults to “cpu”
- Raises:
ValueError – if auc_metric is not one of [“mse”, “accuracy”]
- Returns:
dict of metrics with the computed AUC-TP scores appended
- Return type:
dict
- beexai.evaluate.metrics.auc_tp.plot_metric(p_curve: Tensor, rand_p_curve: Tensor, randmodel_p_curve: Tensor, same_fig: bool = False) None[source]
Plots the performance curve for the original model, the random model and the random baseline.
- Parameters:
p_curve (torch.Tensor) – performance curve for the base model and base explanation method
rand_p_curve (torch.Tensor) – performance curve for the base model and random explanation method
randmodel_p_curve (torch.Tensor) – performance curve for the random model and base explanation method
same_fig (bool, optional) – whether to plot all curves on the same figure. Defaults to False
beexai.evaluate.metrics.complexity module
- class beexai.evaluate.metrics.complexity.Complexity(model: Callable, task: str, device: str = 'cpu')[source]
Bases: CustomMetric
Implementation of the complexity metric.
Computes the complexity of the model by taking the entropy of the fractional contribution of each feature.
References
`Evaluating and Aggregating Feature-based Model Explanations
<https://arxiv.org/abs/2005.00631>`
- model
model to explain
- Type:
callable
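The entropy of fractional contributions mentioned in the Complexity description can be sketched as follows. This is an illustration only, assuming absolute attribution values and the natural logarithm; `complexity` is a hypothetical helper, not the library's method, and a plain list stands in for the attribution tensor of one instance.

```python
import math


def complexity(attributions):
    """Entropy of the fractional contribution of each feature."""
    total = sum(abs(a) for a in attributions)
    if total == 0:
        return 0.0  # assumed convention for all-zero attributions
    fractions = [abs(a) / total for a in attributions]
    # Terms with p == 0 contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in fractions if p > 0)


concentrated = complexity([1.0, 0.0, 0.0, 0.0])  # one feature carries everything
uniform = complexity([1.0, -1.0, 1.0, -1.0])     # evenly spread attributions
print(concentrated, uniform)
```

Under these assumptions an explanation concentrated on one feature has zero entropy, while an evenly spread one reaches the maximum, log(n_features).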
- beexai.evaluate.metrics.complexity.compute_complex(model: Callable, rand_model: Callable, task: str, attributions: Tensor, rand_attrib: Tensor, randmodel_attributions: Tensor, metrics: dict, device: str = 'cpu') dict[source]
Computes the complexity of the base model, the random explainer and the random model.
- Parameters:
model (callable) – base model
rand_model (callable) – reference model (random model)
task (str) – task to perform
attributions (torch.Tensor) – feature attributions
rand_attrib (torch.Tensor) – random attributions
randmodel_attributions (torch.Tensor) – attributions of the random model
metrics (dict) – dictionary of metrics
device (str, optional) – device to use. Defaults to “cpu”
- Returns:
dict of metrics
- Return type:
dict
beexai.evaluate.metrics.comprehensiveness module
- class beexai.evaluate.metrics.comprehensiveness.Comprehensiveness(model: Callable, task: str, device: str = 'cpu')[source]
Bases: CustomMetric
Implementation of the comprehensiveness metric.
Computes the comprehensiveness of the model by removing the most important features one by one and computing the difference in prediction with the original input.
References
`ERASER: A Benchmark to Evaluate Rationalized NLP Models
<https://arxiv.org/abs/1911.03429>`
- model
model to explain
- Type:
Callable
- get_comp(x_in: Tensor, feature_by_importance: Tensor, removal_ratio: float | list = 0.3, label: int | list | ndarray | Tensor | None = None, baseline: str = 'zero') float[source]
Computes the comprehensiveness of the model.
- Parameters:
x_in (torch.Tensor) – input data
feature_by_importance (torch.Tensor) – indexes of most important features in descending order
removal_ratio (float, list, optional) – ratio of features to remove. Defaults to 0.3. If a list is provided, the function will compute the average comprehensiveness over the list of ratios.
label (int, list, np.ndarray, torch.Tensor, optional) – label(s) of interest. Defaults to None. A list of labels for each instance can be provided.
baseline (str, optional) – baseline to use. Defaults to “zero”
- Returns:
comprehensiveness score
- Return type:
float
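The removal procedure behind get_comp can be sketched for a single instance. This is a simplified illustration, assuming a zero baseline, a toy linear model and truncation when converting a ratio to a feature count; `comprehensiveness` is a hypothetical helper, not the library's implementation.

```python
def toy_model(x):
    # Hypothetical model: a fixed linear scorer.
    weights = [4.0, 3.0, 2.0, 1.0]
    return sum(w * v for w, v in zip(weights, x))


def comprehensiveness(model, x, feature_by_importance, removal_ratio=0.3):
    """Prediction drop after replacing the top features with zeros."""
    ratios = removal_ratio if isinstance(removal_ratio, list) else [removal_ratio]
    scores = []
    for ratio in ratios:
        n_remove = int(ratio * len(x))  # assumed rounding rule
        masked = list(x)
        for idx in feature_by_importance[:n_remove]:
            masked[idx] = 0.0  # assumed "zero" baseline
        scores.append(model(x) - model(masked))
    return sum(scores) / len(scores)  # average over the ratios


x = [1.0, 1.0, 1.0, 1.0]
order = [0, 1, 2, 3]  # most important feature first
single = comprehensiveness(toy_model, x, order, removal_ratio=0.5)
averaged = comprehensiveness(toy_model, x, order, removal_ratio=[0.25, 0.5])
print(single, averaged)
```

A larger drop suggests that the features ranked as important really do drive the prediction.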
- get_mr_list(n_features: int, x_test: Tensor, orders: Tensor, n_plot: int, baseline: str = 'zero', label: int | list | ndarray | Tensor | None = None) List[float][source]
Compute the comprehensiveness of the model for different ratios of features removed.
- Parameters:
n_features (int) – number of features
x_test (torch.Tensor) – test data
orders (torch.Tensor) – indexes of most important features in descending order
n_plot (int) – number of points to plot
baseline (str, optional) – baseline to use. Defaults to “zero”.
label (int, list, np.ndarray, torch.Tensor, optional) – label(s) of interest.
- Returns:
list of comprehensiveness scores
- Return type:
List[float]
- beexai.evaluate.metrics.comprehensiveness.compute_comp(model: Callable, rand_model: Callable, task: str, x_test: Tensor, ord_feat: Tensor, rand_ord_feat: Tensor, randmodel_ord_feat: Tensor, n_plot: int, removal_ratio: float | list, label: int | list | ndarray | Tensor, metrics: dict, baseline: str = 'zero', print_plot: bool = False, device: str = 'cpu') dict[source]
Computes the comprehensiveness of the base model, the random explainer and the random model.
- Parameters:
model (Callable) – model to explain
rand_model (Callable) – reference model
task (str) – task to perform
x_test (torch.Tensor) – test data
ord_feat (torch.Tensor) – indexes of most important features in descending order for the base model
rand_ord_feat (torch.Tensor) – indexes of most important features in descending order for the random explainer
randmodel_ord_feat (torch.Tensor) – indexes of most important features in descending order for the random model
n_plot (int) – number of points to plot
removal_ratio (float, list) – ratio of features to remove. If a list is provided, the function will compute the average comprehensiveness over the list of ratios.
label (Union[int, list, np.ndarray, torch.Tensor]) – label(s) of interest
metrics (dict) – dictionary of metrics
baseline (str, optional) – baseline to use. Defaults to “zero”.
print_plot (bool, optional) – whether to display the plot. Defaults to False.
device (str, optional) – device to use. Defaults to “cpu”.
- Returns:
dict of metrics
- Return type:
dict
- beexai.evaluate.metrics.comprehensiveness.plot_comp(n_features: int, comp_list: List[float], rand_comp_list: List[float], randmodel_comp_list: List[float], n_plot: int, same_fig: bool = False, save_path: str | None = None) None[source]
Plot the comprehensiveness of the base model, the random explainer and the random model for different ratios of features removed.
- Parameters:
n_features (int) – number of features
comp_list (list) – comprehensiveness list for the base model
rand_comp_list (list) – comprehensiveness list for the random explainer
randmodel_comp_list (list) – comprehensiveness list for the random model
n_plot (int) – number of points to plot
same_fig (bool, optional) – whether to plot on the same figure. Defaults to False.
save_path (str, optional) – path to save the plot. Defaults to None.
beexai.evaluate.metrics.faithfulnesscorr module
- class beexai.evaluate.metrics.faithfulnesscorr.FaithfulnessCorrelation(model: Callable, task: str, device: str = 'cpu')[source]
Bases: CustomMetric
Implementation of the faithfulness correlation metric.
Computes the faithfulness of the model by removing a fixed number of features and computing the Pearson correlation between the summed attributions of the removed features and the difference in prediction with the original input.
References
`Synthetic Benchmarks for Scientific Research in Explainable
Machine Learning <https://arxiv.org/abs/2106.12543>`
- model
model to explain
- Type:
callable
- get_faithfulness(x_in: Tensor, attributions: Tensor, n_features_subset: int, label: int | list | ndarray | Tensor | None = None, n_repeats: int = 20, baseline: str = 'zero') float[source]
Computes the faithfulness of the model.
- Parameters:
x_in (torch.Tensor) – input data
attributions (torch.Tensor) – attributions for each instance
n_features_subset (int) – number of features to remove
label (int, list, np.ndarray, torch.Tensor, optional) – label(s) of interest. Defaults to None. A list of labels for each instance can be provided.
n_repeats (int, optional) – number of times to repeat the sampling. Defaults to 20.
baseline (str, optional) – baseline to use. Defaults to “zero”.
- Returns:
faithfulness score
- Return type:
float
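The sampling-and-correlation procedure behind get_faithfulness can be sketched for one instance. This is a rough illustration, assuming a zero baseline, uniformly sampled feature subsets and a fixed seed; `pearson` and `faithfulness_corr` are hypothetical helpers, not beexai functions.

```python
import random


def pearson(a, b):
    # Plain Pearson correlation; assumes both sequences actually vary.
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def faithfulness_corr(model, x, attributions, n_features_subset,
                      n_repeats=20, seed=0):
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    full_pred = model(x)
    attr_sums, pred_drops = [], []
    for _ in range(n_repeats):
        subset = rng.sample(range(len(x)), n_features_subset)
        masked = list(x)
        for idx in subset:
            masked[idx] = 0.0  # assumed "zero" baseline
        attr_sums.append(sum(attributions[idx] for idx in subset))
        pred_drops.append(full_pred - model(masked))
    return pearson(attr_sums, pred_drops)


weights = [4.0, 3.0, 2.0, 1.0]
model = lambda x: sum(w * v for w, v in zip(weights, x))
x = [1.0, 1.0, 1.0, 1.0]
exact_attr = [w * v for w, v in zip(weights, x)]  # weight * input
score = faithfulness_corr(model, x, exact_attr, n_features_subset=2)
print(score)
```

For a linear model with weight-times-input attributions, the summed attributions equal the prediction drop, so the correlation in this toy case is essentially 1.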
- beexai.evaluate.metrics.faithfulnesscorr.compute_faith_corr(model: Callable, rand_model: Callable, task: str, subset_size_faithfulness: int, x_test: Tensor, attributions: Tensor, rand_attrib: Tensor, randmodel_attributions: Tensor, label: int | list | ndarray | Tensor, metrics: dict, device: str = 'cpu') dict[source]
Compute the faithfulness correlation metric.
- Parameters:
model (callable) – base model
rand_model (callable) – reference model (random model)
task (str) – task of the model
subset_size_faithfulness (int) – number of features to remove
x_test (torch.Tensor) – test data
attributions (torch.Tensor) – attributions for each instance for the base model
rand_attrib (torch.Tensor) – attributions for each instance for the random explainer
randmodel_attributions (torch.Tensor) – attributions for each instance for the random model
label (int, list, np.ndarray, torch.Tensor) – label(s) of interest
metrics (dict) – dictionary of metrics
device (str, optional) – device to use. Defaults to “cpu”.
- Returns:
dict of metrics
- Return type:
dict
beexai.evaluate.metrics.get_results module
- beexai.evaluate.metrics.get_results.get_all_metrics(x_test: Tensor, label: int | list | ndarray | Tensor | None, model: Callable, exp: GeneralExplainer, ref_model: Callable | None = None, refmodel_exp: GeneralExplainer | None = None, baseline: str = 'zero', auc_metric: str = 'mse', subratio_faith: float = 0.2, comp_ratio: float | list = 0.3, suff_ratio: float | list = 0.3, inf_std: float | Tensor | ndarray | None = None, save_path: str | None = None, metrics_to_get: List[str] = ['FaithCorr', 'Infidelity', 'Sensitivity', 'Comprehensiveness', 'Sufficiency', 'Monotonicity', 'AUC_TP', 'Complexity', 'Sparseness'], print_plot: bool = False, attributions: Tensor | None = None, attributions_ref: Tensor | None = None, device: str = 'cpu', use_ref: bool = False, use_random: bool = False, radius: float | None = None) DataFrame[source]
Compute all metrics for a given label.
- Parameters:
x_test (torch.Tensor) – test data
label (int, list, np.ndarray, torch.Tensor, optional) – label(s) of interest. Defaults to None. A list of labels can be provided, one for each instance.
model (Callable) – model to explain
exp (GeneralExplainer) – explainer for the model to explain
ref_model (Callable, optional) – reference model (random model). Defaults to None.
refmodel_exp (GeneralExplainer, optional) – explainer for the reference model. Defaults to None.
baseline (str, optional) – baseline to use for the metrics. Defaults to “zero”. Must be one of [“mean”, “median”, “zero”, “multiple”, “normal”, “uniform”].
auc_metric (str, optional) – performance metric to use for the AUC_TP metric. Defaults to “mse”. Must be one of [“mse”, “accuracy”].
subratio_faith (float, optional) – ratio of features to use for the faithfulness metric. Defaults to 0.2.
comp_ratio (float, list, optional) – ratio of features to remove for the comprehensiveness metric. Defaults to 0.3.
suff_ratio (float, list, optional) – ratio of features to keep for the sufficiency metric. Defaults to 0.3.
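inf_std (float, torch.Tensor, np.ndarray, optional) – standard deviation of the noise to add for the infidelity metric. Defaults to None in this signature; Infidelity and compute_inf default to 0.003.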
save_path (str, optional) – path to save the metrics. Defaults to None.
metrics_to_get (list, optional) – list of metrics to compute. Defaults to [“FaithCorr”, “Infidelity”, “Sensitivity”, “Comprehensiveness”, “Sufficiency”, “Monotonicity”, “AUC_TP”, “Complexity”, “Sparseness”].
print_plot (bool, optional) – whether to plot the figures and print the metrics. Defaults to False.
attributions (torch.Tensor, optional) – precomputed attributions for the model to explain. Defaults to None.
attributions_ref (torch.Tensor, optional) – precomputed attributions for the reference model. Defaults to None.
device (str, optional) – device to use. Defaults to “cpu”.
use_ref (bool, optional) – whether to use the reference model for the metrics. Defaults to False.
use_random (bool, optional) – whether to use random attributions for the metrics. Defaults to False.
radius (float, optional) – radius for the sensitivity metric. Defaults to None.
- Returns:
dataframe containing the metrics
- Return type:
pd.DataFrame
beexai.evaluate.metrics.infidelity module
- class beexai.evaluate.metrics.infidelity.Infidelity(model: Callable, task: str, std: float = 0.003, device: str = 'cpu')[source]
Bases: CustomMetric
Implementation of the infidelity metric.
Computes the infidelity of the model by adding significant noise to the input and computing the mean squared error between the perturbation applied to the attributions and the difference in prediction between the original input and the perturbed input.
References
`On the (In)fidelity and Sensitivity for Explanations
<https://arxiv.org/abs/1901.09392>`
- model
model to explain
- Type:
callable
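The infidelity computation described above can be sketched for one instance. This is an illustration under stated assumptions: Gaussian noise with the documented default std of 0.003, subtraction of the noise from the input as in the cited paper, and a sample count of 50 chosen arbitrarily. `infidelity` here is a hypothetical free function, not the class's method.

```python
import random


def infidelity(model, x, attributions, std=0.003, n_samples=50, seed=0):
    """Mean squared gap between the attribution-weighted perturbation
    and the actual change in prediction under that perturbation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        noise = [rng.gauss(0.0, std) for _ in x]
        perturbed = [v - n for v, n in zip(x, noise)]
        predicted_change = sum(n * a for n, a in zip(noise, attributions))
        actual_change = model(x) - model(perturbed)
        total += (predicted_change - actual_change) ** 2
    return total / n_samples


weights = [2.0, -1.0, 0.5]
model = lambda x: sum(w * v for w, v in zip(weights, x))
x = [1.0, 2.0, 3.0]
good = infidelity(model, x, attributions=weights)  # gradient of a linear model
bad = infidelity(model, x, attributions=[0.0, 0.0, 0.0])
print(good, bad)
```

Lower values are better: attributions equal to the gradient of a linear model give an infidelity of essentially zero, while all-zero attributions do not.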
- beexai.evaluate.metrics.infidelity.compute_inf(model: Callable, rand_model: Callable, task: str, x_test: Tensor, attributions: Tensor, rand_attrib: Tensor, randmodel_attributions: Tensor, label: int | list | Tensor | ndarray, metrics: dict, device: str = 'cpu', inf_std: float = 0.003) dict[source]
Compute the infidelity metric.
- Parameters:
model (callable) – base model
rand_model (callable) – reference model (random model)
task (str) – task of the model
x_test (torch.Tensor) – test data
attributions (torch.Tensor) – attributions for base model
rand_attrib (torch.Tensor) – random attributions
randmodel_attributions (torch.Tensor) – attributions for reference model
label (Union[int, list, np.ndarray, torch.Tensor]) – label(s) of interest
metrics (dict) – dictionary of metrics
device (str, optional) – device to use. Defaults to “cpu”
inf_std (float, optional) – std of the noise. Defaults to 0.003.
- Returns:
dict of metrics
- Return type:
dict
beexai.evaluate.metrics.metrics module
- class beexai.evaluate.metrics.metrics.CustomMetric(model: Callable, task: str, device: str = 'cpu')[source]
Bases: object
Base class for all metrics.
- model
model to explain
- Type:
Callable
- check_shape(x_in: Tensor, attributions: Tensor) None[source]
Check the shape of the attributions.
- Parameters:
x_in (torch.Tensor) – input data
attributions (torch.Tensor) – attributions
- choose_baseline(x_in: Tensor, baseline: str, n_samples: int = 100, device: str = 'cpu') Tensor[source]
Choose a baseline for removal based metrics.
- get_predlb(x_in: Tensor, label: int | list | ndarray | Tensor | None = None) Tensor[source]
Get the prediction of the model for a given label. If label is None, return the prediction of the model for the max probability (for classification).
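The label-selection behaviour described for get_predlb can be illustrated on precomputed class probabilities. This sketch is not the library's code: it assumes the model output is already available as one list of probabilities per instance, and `select_prediction` is a hypothetical name.

```python
def select_prediction(probabilities, label=None):
    """Pick one score per instance from class probabilities.

    probabilities: list of per-instance lists of class probabilities.
    label: None, a single class index, or one class index per instance.
    """
    if label is None:
        # No label given: fall back to the most probable class.
        return [max(row) for row in probabilities]
    if isinstance(label, int):
        label = [label] * len(probabilities)
    return [row[lb] for row, lb in zip(probabilities, label)]


probs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
top = select_prediction(probs)
fixed = select_prediction(probs, label=2)
per_instance = select_prediction(probs, label=[0, 1])
print(top, fixed, per_instance)
```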
beexai.evaluate.metrics.monotonicity module
- class beexai.evaluate.metrics.monotonicity.Monotonicity(model: Callable, task: str, device: str = 'cpu')[source]
Bases: CustomMetric
Implementation of the monotonicity metric.
Computes the monotonicity of the model by adding features one by one, starting with the least important. For each i in range(0, n_features-2), a score of 1 is assigned if pred(i+1) - pred(i) <= pred(i+2) - pred(i+1), and 0 otherwise; the metric is the mean of these scores.
References
`Synthetic Benchmarks for Scientific Research in Explainable
Machine Learning <https://arxiv.org/abs/2106.12543>`
- model
model to explain
- Type:
callable
- get_mono(x_in: Tensor, feature_by_importance: Tensor, label: int | list | Tensor | None = None, baseline: str = 'zero') float[source]
Computes the monotonicity of the model.
- Parameters:
x_in (torch.Tensor) – input data
feature_by_importance (torch.Tensor) – indexes of most important features in descending order
label (int, list, np.ndarray, torch.Tensor, optional) – label(s) of interest. Defaults to None. A list of labels for each instance can be provided.
baseline (str, optional) – baseline to use. Defaults to “zero”.
- Returns:
monotonicity score
- Return type:
float
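The comparison of successive prediction gains described for Monotonicity can be sketched for one instance. This is a simplified reading, assuming a zero baseline and a toy linear model; the exact index range used by the library may differ, and `monotonicity` is a hypothetical helper.

```python
def monotonicity(model, x, feature_by_importance):
    # Start from the all-baseline input and restore features one at a
    # time, least important first.
    current = [0.0] * len(x)  # assumed "zero" baseline
    preds = [model(current)]
    for idx in reversed(feature_by_importance):
        current[idx] = x[idx]
        preds.append(model(current))
    # Gain in prediction contributed by each restored feature.
    gains = [b - a for a, b in zip(preds, preds[1:])]
    # 1 when the next, more important feature adds at least as much.
    flags = [1.0 if g1 <= g2 else 0.0 for g1, g2 in zip(gains, gains[1:])]
    return sum(flags) / len(flags)


weights = [4.0, 3.0, 2.0, 1.0]
model = lambda x: sum(w * v for w, v in zip(weights, x))
x = [1.0, 1.0, 1.0, 1.0]
correct_order = monotonicity(model, x, [0, 1, 2, 3])
reversed_order = monotonicity(model, x, [3, 2, 1, 0])
print(correct_order, reversed_order)
```

In this toy case a ranking that matches the true weights scores 1, and the fully reversed ranking scores 0.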
- beexai.evaluate.metrics.monotonicity.compute_mono(model: Callable, rand_model: Callable, task: str, x_test: Tensor, ord_feat: Tensor, rand_ord_feat: Tensor, randmodel_ord_feat: Tensor, label: int | list | Tensor, metrics: dict, baseline: str = 'zero', device: str = 'cpu') dict[source]
Compute the monotonicity metric.
- Parameters:
model (callable) – base model
rand_model (callable) – reference model (random model)
task (str) – task of the model
x_test (torch.Tensor) – test data
ord_feat (torch.Tensor) – indexes of most important features in descending order for the base model
rand_ord_feat (torch.Tensor) – indexes of most important features in descending order for the random explainer
randmodel_ord_feat (torch.Tensor) – indexes of most important features in descending order for the random model
label (int, list, np.ndarray, torch.Tensor) – label(s) of interest
metrics (dict) – dictionary of metrics
baseline (str, optional) – baseline to use. Defaults to “zero”.
device (str, optional) – device to use. Defaults to “cpu”.
- Returns:
dict of metrics
- Return type:
dict
beexai.evaluate.metrics.ood_method module
beexai.evaluate.metrics.roar module
beexai.evaluate.metrics.sensitivity module
- class beexai.evaluate.metrics.sensitivity.Sensitivity(model: Callable, task: str, device: str, explainer: GeneralExplainer, radius=0.5)[source]
Bases: CustomMetric
Implementation of the sensitivity metric.
Computes the sensitivity of the model by adding a small perturbation to the input and computing the difference in attributions between the original input and the perturbed input.
References
`On the (In)fidelity and Sensitivity for Explanations
<https://arxiv.org/abs/1901.09392>`
- model
model to explain
- Type:
callable
- get_sens(x_in: Tensor, label: int | list | Tensor | None = None, attributions: Tensor | None = None) float[source]
Computes the sensitivity of the model.
- Parameters:
x_in (torch.Tensor) – input to compute the sensitivity score
label (int, list, np.ndarray, torch.Tensor, optional) – label(s) of interest. Defaults to None. A list of labels for each instance can be provided.
attributions (torch.Tensor, optional) – attributions for each instance. Defaults to None. If None, the attributions are computed using the explainer.
- Returns:
sensitivity score
- Return type:
float
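The perturb-and-compare idea behind get_sens can be sketched as follows. Several choices here are assumptions rather than documented behaviour: noise is drawn uniformly within the radius, the change in attributions is measured with the L2 norm, and the largest change over 10 samples is reported (in the spirit of the max-sensitivity of the cited paper). `sensitivity` and both explainers are hypothetical.

```python
import random


def sensitivity(explain, x, radius=0.5, n_samples=10, seed=0):
    """Largest L2 change in attributions under uniform input noise."""
    rng = random.Random(seed)
    base = explain(x)
    worst = 0.0
    for _ in range(n_samples):
        perturbed = [v + rng.uniform(-radius, radius) for v in x]
        diff = [a - b for a, b in zip(explain(perturbed), base)]
        worst = max(worst, sum(d * d for d in diff) ** 0.5)
    return worst


# Hypothetical explainers: one ignores the input, one scales with it.
constant_explainer = lambda x: [1.0, 2.0, 3.0]
input_scaled_explainer = lambda x: [2.0 * v for v in x]

x = [0.5, -1.0, 2.0]
stable = sensitivity(constant_explainer, x)
unstable = sensitivity(input_scaled_explainer, x)
print(stable, unstable)
```

Lower values indicate explanations that change little when the input is slightly perturbed.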
- beexai.evaluate.metrics.sensitivity.compute_sens(model: Callable, rand_model: Callable, task: str, x_test: Tensor, label: int | list | Tensor, metrics: dict, exp: GeneralExplainer, randmodel_exp: GeneralExplainer, device: str = 'cpu', use_rand: bool = True, attributions=None, rand_attributions=None, randmodel_attributions=None, radius=0.5) dict[source]
Computes the sensitivity score of the model.
- Parameters:
model (callable) – base model
rand_model (callable) – reference model (random model)
task (str) – task to perform
x_test (torch.Tensor) – test data
label (int, list, np.ndarray, torch.Tensor) – label(s) of interest
metrics (dict) – dictionary of metrics
exp (GeneralExplainer) – base explainer
randmodel_exp (GeneralExplainer) – explainer for the random model
device (str, optional) – device to use. Defaults to “cpu”.
use_rand (bool, optional) – whether to use the random explainer. Defaults to True.
attributions (torch.Tensor, optional) – attributions for each instance. Defaults to None.
rand_attributions (torch.Tensor, optional) – attributions for each instance for the random explainer. Defaults to None.
randmodel_attributions (torch.Tensor, optional) – attributions for each instance for the random model. Defaults to None.
radius (float, optional) – radius of the uniform distribution to generate the noise. Defaults to 0.5.
- Returns:
dict of metrics
- Return type:
dict
beexai.evaluate.metrics.sparseness module
- class beexai.evaluate.metrics.sparseness.Sparseness(model: Callable, task: str, device: str = 'cpu')[source]
Bases: CustomMetric
Implementation of the sparseness metric.
Computes the sparseness of the model based on the Gini index.
References
`Synthetic Benchmarks for Scientific Research in
Explainable Machine Learning <https://arxiv.org/abs/2106.12543>`
- model
model to explain
- Type:
callable
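A Gini-index computation of the kind the Sparseness description refers to can be sketched as below. This uses one common closed form of the Gini index on sorted absolute attributions; the normalisation used by the library may differ, and `gini_sparseness` is a hypothetical helper.

```python
def gini_sparseness(attributions):
    """Gini index of the absolute attributions (0 = evenly spread)."""
    values = sorted(abs(a) for a in attributions)
    n, total = len(values), sum(values)
    if total == 0:
        return 0.0  # assumed convention for all-zero attributions
    # Rank-weighted sum over the ascending values, ranks starting at 1.
    weighted = sum((2 * k - n - 1) * v for k, v in enumerate(values, start=1))
    return weighted / (n * total)


uniform = gini_sparseness([1.0, -1.0, 1.0, -1.0])
sparse = gini_sparseness([0.0, 0.0, 0.0, 1.0])
print(uniform, sparse)
```

With this form, evenly spread attributions give 0 and a single non-zero attribution among n features gives (n - 1) / n.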
- beexai.evaluate.metrics.sparseness.compute_spar(model: Callable, rand_model: Callable, task: str, attributions: Tensor, rand_attrib: Tensor, rand_model_attributions: Tensor, metrics: dict, device='cpu') dict[source]
Compute the sparseness metric.
- Parameters:
model (callable) – base model
rand_model (callable) – reference model (random model)
task (str) – task of the model
attributions (torch.Tensor) – attributions for base model and base explainer
rand_attrib (torch.Tensor) – random attributions
rand_model_attributions (torch.Tensor) – attributions for reference model and base explainer
metrics (dict) – dictionary of metrics
device (str, optional) – device to use. Defaults to “cpu”.
- Returns:
dict of metrics
- Return type:
dict
beexai.evaluate.metrics.sufficiency module
- class beexai.evaluate.metrics.sufficiency.Sufficiency(model: Callable, task: str, device: str = 'cpu')[source]
Bases: CustomMetric
Implementation of the sufficiency metric.
Computes the sufficiency of the model by adding the most important features one by one and computing the difference in prediction with the original input.
References
`ERASER: A Benchmark to Evaluate Rationalized NLP Models
<https://arxiv.org/abs/1911.03429>`
- model
model to explain
- Type:
callable
- get_mr_list(n_features: int, x_test: Tensor, orders: Tensor, n_plot: int, baseline: str = 'zero', label: int | list | Tensor | None = None) List[float][source]
Compute the sufficiency of the model for different ratios of features added.
- Parameters:
n_features (int) – number of features
x_test (torch.Tensor) – test data
orders (torch.Tensor) – indexes of most important features in descending order
n_plot (int) – number of points to plot
baseline (str, optional) – baseline to use. Defaults to “zero”.
label (int, list, np.ndarray, torch.Tensor, optional) – label(s) of interest.
- Returns:
list of sufficiency scores
- Return type:
List[float]
- get_sufficiency(x_in: Tensor, feature_by_importance: Tensor, keep_ratio: float | list = 0.3, label: int | list | Tensor | None = None, baseline: str = 'zero') float[source]
Computes the sufficiency of the model.
- Parameters:
x_in (torch.Tensor) – input data
feature_by_importance (torch.Tensor) – indexes of most important features in descending order
keep_ratio (float, list, optional) – ratio of features to keep. Defaults to 0.3.
label (int, list, np.ndarray, torch.Tensor, optional) – label(s) of interest. Defaults to None. A list of labels for each instance can be provided.
baseline (str, optional) – baseline to use. Defaults to “zero”.
- Returns:
sufficiency score
- Return type:
float
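The keep-only-the-top-features procedure behind get_sufficiency can be sketched for one instance. This is an illustration assuming a zero baseline, a toy linear model, truncation when converting a ratio to a feature count, and averaging when a list of ratios is given; `sufficiency` is a hypothetical helper, not the library's method.

```python
def sufficiency(model, x, feature_by_importance, keep_ratio=0.3):
    """Prediction gap when only the top features are kept."""
    ratios = keep_ratio if isinstance(keep_ratio, list) else [keep_ratio]
    scores = []
    for ratio in ratios:
        n_keep = int(ratio * len(x))  # assumed rounding rule
        kept = [0.0] * len(x)  # assumed "zero" baseline
        for idx in feature_by_importance[:n_keep]:
            kept[idx] = x[idx]
        scores.append(model(x) - model(kept))
    return sum(scores) / len(scores)  # assumed averaging over ratios


weights = [4.0, 3.0, 2.0, 1.0]
model = lambda x: sum(w * v for w, v in zip(weights, x))
x = [1.0, 1.0, 1.0, 1.0]
good_ranking = sufficiency(model, x, [0, 1, 2, 3], keep_ratio=0.5)
poor_ranking = sufficiency(model, x, [3, 2, 1, 0], keep_ratio=0.5)
print(good_ranking, poor_ranking)
```

A smaller gap suggests that the features ranked as most important are enough on their own to recover most of the original prediction.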
- beexai.evaluate.metrics.sufficiency.compute_suff(model: Callable, ref_model: Callable, task: str, x_test: Tensor, orders: Tensor, rand_orders: Tensor, randmodel_orders: Tensor, n_plot: int, feature_ratio: float | list, label: int | list | Tensor, metrics: dict, baseline: str = 'zero', print_plot: bool = False, device: str = 'cpu') dict[source]
Computes the sufficiency of the base model, the random explainer and the random model.
- Parameters:
model (callable) – base model
ref_model (callable) – reference model (random model)
task (str) – task to perform
x_test (torch.Tensor) – test data
orders (torch.Tensor) – indexes of most important features in descending order for the base model
rand_orders (torch.Tensor) – indexes of most important features in descending order for the random explainer
randmodel_orders (torch.Tensor) – indexes of most important features in descending order for the random model
n_plot (int) – number of points to plot
feature_ratio (float, list) – ratio of features to keep. A list of ratios can be provided.
label (int, list, np.ndarray, torch.Tensor) – label(s) of interest
metrics (dict) – dictionary of metrics
baseline (str, optional) – baseline to use. Defaults to “zero”.
print_plot (bool, optional) – whether to print the plot. Defaults to False.
device (str, optional) – device to use. Defaults to “cpu”.
- Returns:
dict of metrics
- Return type:
dict
- beexai.evaluate.metrics.sufficiency.plot_suff(n_features: int, suff_list: List[float], rand_suff_list: List[float], randmodel_suff_list: List[float], n_plot: int, same_fig: bool = False, save_path: str | None = None) None[source]
Plot the sufficiency of the base model, the random explainer and the random model for different ratios of features added.
- Parameters:
n_features (int) – number of features
suff_list (list) – sufficiency list for the base model
rand_suff_list (list) – sufficiency list for the random explainer
randmodel_suff_list (list) – sufficiency list for the random model
n_plot (int) – number of points to plot
same_fig (bool, optional) – whether to plot on the same figure. Defaults to False.
save_path (str, optional) – path to save the plot. Defaults to None.