# Advanced Techniques: Additive Counterfactual Fairness

## Overview
Additive Counterfactual Fairness (ACF) is an in-processing technique: it is applied during model training, handles multiple protected features simultaneously, and supports both classification and regression.
## Key Differences from Reweighting
| Feature | Reweighting | ACF |
|---|---|---|
| When applied | Before training | During training |
| Protected features | One at a time (or composite) | Multiple simultaneously |
| Algorithm support | Classification only | Classification & regression |
| Data modification | None (weights only) | Features replaced by residuals |
| Accuracy impact | Minimal | Moderate |
## How ACF Works
The core idea: remove the influence of protected features by working with residuals — the part of a feature that can't be predicted from the protected attributes.
```mermaid
graph TD
    A[Original Features X] --> B[Regress each X_i on S]
    B --> C[Compute Residuals]
    C --> D[Train Model on Residuals]
    D --> E[Fair Predictions]
    S[Protected Features S] --> B
```
### Step-by-Step
1. For each independent feature \(X_i\), fit a regression model on the protected features:

    \[X_i = f(S_1, S_2, \ldots, S_k) + \epsilon_i\]

2. Compute residuals, the part of \(X_i\) unexplained by the protected features:

    \[\tilde{X}_i = X_i - f(S_1, S_2, \ldots, S_k)\]

3. Train your final model on the residuals \(\tilde{X}\) instead of the original features.
## Python Implementation
```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingClassifier

def additive_counterfactual_fairness(X, S):
    """Apply ACF: remove protected feature influence from independent features."""
    X_residuals = pd.DataFrame(index=X.index)
    residual_models = {}
    for col in X.columns:
        # Regress each feature on all protected features
        reg = LinearRegression()
        reg.fit(S, X[col])
        predicted = reg.predict(S)
        # Residual = original - predicted from protected features
        X_residuals[col] = X[col] - predicted
        residual_models[col] = reg
    return X_residuals, residual_models

# Usage: df is assumed to hold numerically encoded features and a 'default' label
protected_features = ['gender', 'age_group', 'marital_status']
independent_features = [c for c in df.columns
                        if c not in protected_features + ['default']]

S = df[protected_features]
X = df[independent_features]
y = df['default']

# Get fair features
X_fair, models = additive_counterfactual_fairness(X, S)

# Train on fair features
clf = GradientBoostingClassifier()
clf.fit(X_fair, y)
```
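The fitted residual models must also be applied at prediction time, so that unseen data is residualized with the same regressions learned on the training set. A minimal sketch, assuming `residual_models` is the dict returned by `additive_counterfactual_fairness` (the helper name `transform_to_residuals` is illustrative):

```python
import pandas as pd

def transform_to_residuals(X_new, S_new, residual_models):
    """Residualize unseen data using the regressions fitted on training data."""
    X_res = pd.DataFrame(index=X_new.index)
    for col, reg in residual_models.items():
        # Subtract the part of each feature explained by the protected features
        X_res[col] = X_new[col] - reg.predict(S_new)
    return X_res
```

Reusing the training-time regressions (rather than refitting on new data) keeps the transformation consistent between training and serving.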
**Verify Fairness:** After ACF, check that fairness metrics have improved.
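For example, a quick sanity check on statistical parity; the helper name and choice of metric are illustrative, as the text does not prescribe a specific one:

```python
import numpy as np

def statistical_parity_difference(y_pred, s):
    """Difference in positive-prediction rates between groups s == 1 and s == 0."""
    y_pred = np.asarray(y_pred, dtype=float)
    s = np.asarray(s)
    return y_pred[s == 1].mean() - y_pred[s == 0].mean()
```

A value near 0 indicates similar positive rates across groups; compare the score for a model trained on the original features against one trained on the residuals.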
**Trade-off:** ACF removes all information correlated with protected features, including legitimate predictive signal. Monitor accuracy alongside fairness metrics.
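One way to quantify that trade-off is to compare hold-out AUC on the original features versus the residualized ones. A sketch using scikit-learn (the function name is illustrative):

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def holdout_auc(X, y, seed=0):
    """Hold-out AUC for a gradient-boosted classifier on a given feature set."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    clf = GradientBoostingClassifier(random_state=seed).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

# Compare holdout_auc(X, y) with holdout_auc(X_fair, y):
# a moderate drop after ACF is expected.
```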
## Combining Techniques
For maximum bias reduction, you can combine approaches:
- ACF to remove protected feature influence from independent features
- Reweighting on the residuals for additional fairness improvement
- ROC (Chapter 6) on the output as a final safety net
```mermaid
graph LR
    A[Raw Data] -->|ACF| B[Fair Features]
    B -->|Reweighting| C[Weighted Fair Features]
    C -->|Train Model| D[Predictions]
    D -->|ROC| E[Fair Predictions]
```
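The pipeline above can be sketched end to end. This is a hedged sketch, not a definitive implementation: the reweighting step uses Kamiran-Calders-style weights on a single protected column, ROC post-processing is left as a comment, and all function names are illustrative:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingClassifier

def kamiran_calders_weights(s, y):
    """w(s, y) = P(s) * P(y) / P(s, y), from empirical frequencies."""
    df = pd.DataFrame({"s": s, "y": y})
    p_s = df["s"].value_counts(normalize=True)
    p_y = df["y"].value_counts(normalize=True)
    p_sy = df.value_counts(normalize=True)
    return df.apply(
        lambda r: p_s[r["s"]] * p_y[r["y"]] / p_sy[(r["s"], r["y"])], axis=1)

def fair_pipeline(X, S, s_col, y):
    # 1) ACF: replace each feature by its residual after regressing on S
    X_res = pd.DataFrame(index=X.index)
    for col in X.columns:
        X_res[col] = X[col] - LinearRegression().fit(S, X[col]).predict(S)
    # 2) Reweighting on one protected column for an extra fairness nudge
    w = kamiran_calders_weights(S[s_col], y)
    # 3) Train with sample weights; ROC post-processing (Chapter 6) would
    #    then adjust the classifier's output scores as a final safety net
    clf = GradientBoostingClassifier().fit(X_res, y, sample_weight=w)
    return clf, X_res
```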