Concept Drift

What Is Concept Drift?

Concept drift occurs when the relationship between the features and the target changes over time, even if the feature distribution \(P(X)\) itself stays the same.

\[P(Y \mid X) \text{ changes over time}\]

Example

In 2019, "employed full-time" strongly predicted loan repayment. After 2020, many full-time employees in certain industries (hospitality, travel) defaulted at higher rates. The features didn't change, but their meaning did.
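To make the distinction concrete, here is a minimal simulation sketch of the loan example. The probabilities are made up for illustration (they are not real default statistics), and `simulate` and `summarize` are hypothetical helper names. The share of full-time employees stays fixed while the repayment rate among them drops, so \(P(X)\) is stable and \(P(Y \mid X)\) shifts.

```python
import random


def simulate(n, p_employed, p_repay_employed, p_repay_other, seed):
    """Draw n (employed, repaid) pairs from fixed, illustrative probabilities."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        employed = rng.random() < p_employed
        p_repay = p_repay_employed if employed else p_repay_other
        rows.append((employed, rng.random() < p_repay))
    return rows


def summarize(rows):
    """Return (share employed, repayment rate among the employed)."""
    employed_rows = [row for row in rows if row[0]]
    p_x = len(employed_rows) / len(rows)
    p_y_given_x = sum(1 for _, repaid in employed_rows if repaid) / len(employed_rows)
    return p_x, p_y_given_x


# Same feature distribution in both periods; only the conditional changes.
before = simulate(20000, 0.7, 0.95, 0.70, seed=1)
after = simulate(20000, 0.7, 0.80, 0.70, seed=2)

px_before, py_before = summarize(before)
px_after, py_after = summarize(after)
print(f"P(employed):          {px_before:.3f} -> {px_after:.3f}")
print(f"P(repaid | employed): {py_before:.3f} -> {py_after:.3f}")
```

A monitor that only watches the feature distribution would see nothing unusual here, which is why concept drift is usually detected through model error or labelled outcomes.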

Types of Concept Drift

graph LR
    A[Concept Drift] --> B[Sudden]
    A --> C[Gradual]
    A --> D[Incremental]
    A --> E[Recurring]

    B --> B1["e.g., Policy change"]
    C --> C1["e.g., Market evolution"]
    D --> D1["e.g., Demographic shifts"]
    E --> E1["e.g., Seasonal patterns"]

| Type | Speed | Example |
| --- | --- | --- |
| Sudden | Instant | Regulatory change, pandemic |
| Gradual | Over weeks/months | Market trends, cultural shifts |
| Incremental | Very slow | Demographic evolution |
| Recurring | Periodic | Seasonal spending patterns |
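The four types can also be told apart by how the new concept takes over, not only by speed. The sketch below assumes one common convention: sudden drift is a step, incremental drift slides through intermediate states, gradual drift alternates between old and new with the new concept appearing more often over time, and recurring drift switches back and forth periodically. The function name `concept_weight` and its parameters are hypothetical, chosen only for this illustration.

```python
import random


def concept_weight(kind, t, start=100, length=100, period=50, rng=None):
    """Return how much of the new concept is in effect at step t (0.0 to 1.0)."""
    if kind == "sudden":
        # Step change at `start`
        return 1.0 if t >= start else 0.0
    if kind == "recurring":
        # Old and new concepts alternate every `period` steps
        return 1.0 if (t // period) % 2 == 1 else 0.0

    # Linear ramp from 0 to 1 over `length` steps, starting at `start`
    ramp = min(1.0, max(0.0, (t - start) / length))
    if kind == "incremental":
        # The concept itself moves smoothly through intermediate states
        return ramp
    if kind == "gradual":
        # Each step comes wholly from the old or the new concept;
        # the new one is drawn with increasing probability
        rng = rng or random
        return 1.0 if rng.random() < ramp else 0.0
    raise ValueError(f"unknown drift kind: {kind}")


rng = random.Random(0)
for kind in ("sudden", "gradual", "incremental", "recurring"):
    trace = [concept_weight(kind, t, rng=rng) for t in range(0, 300, 25)]
    print(kind, trace)
```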

Detection Methods

Page-Hinkley Test

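The Page-Hinkley test is a sequential detector: it processes one observation at a time and flags a change in the mean of a stream (here, the model's error) once the accumulated deviation from the running mean exceeds a threshold: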

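import numpy as np

class PageHinkleyTest:
    """Online concept drift detector using the Page-Hinkley test (detects an increase in the mean)."""

    def __init__(self, delta=0.005, threshold=50, alpha=0.9999):
        self.delta = delta          # magnitude of change that is tolerated
        self.threshold = threshold  # alarm level for the test statistic
        self.alpha = alpha          # forgetting factor applied to the cumulative sum
        self.reset()

    def reset(self):
        self.n = 0         # observations seen since the last reset
        self.sum = 0.0     # cumulative deviation from the running mean
        self.x_mean = 0.0  # running mean of the observations
        self.m_t = 0.0     # minimum of the cumulative sum so far
        self.M_t = 0.0     # test statistic: cumulative sum minus its minimum

    def update(self, x):
        """Add one observation and return True if drift is detected."""
        self.n += 1
        self.x_mean = self.x_mean + (x - self.x_mean) / self.n
        # Older deviations are down-weighted by alpha before adding the new one
        self.sum = self.alpha * self.sum + (x - self.x_mean - self.delta)

        self.m_t = min(self.m_t, self.sum)
        self.M_t = self.sum - self.m_t

        if self.M_t > self.threshold:
            self.reset()
            return True  # Drift detected!
        return False

# Usage: monitor model error over time.
# daily_errors is a synthetic stand-in here (an assumption for illustration):
# the mean error jumps from 0.1 to 0.6 halfway through the stream.
rng = np.random.default_rng(0)
daily_errors = np.concatenate([rng.normal(0.1, 0.02, 200), rng.normal(0.6, 0.02, 200)])

detector = PageHinkleyTest(threshold=50)
for i, error in enumerate(daily_errors):
    if detector.update(error):
        print(f"⚠️ Concept drift detected at step {i}!")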
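For reference, the quantities the class above tracks can be written in the usual textbook form of the test, shown here without a forgetting factor. The symbol \(\lambda\) for the threshold is a notational assumption; \(\delta\) matches the `delta` argument.

\[
\bar{x}_T = \frac{1}{T}\sum_{t=1}^{T} x_t, \qquad
m_T = \sum_{t=1}^{T} \left(x_t - \bar{x}_t - \delta\right), \qquad
M_T = \min_{t \le T} m_t, \qquad
PH_T = m_T - M_T
\]

Drift is signalled when \(PH_T > \lambda\). Note that the attribute names in the code do not line up one-to-one with these symbols: `self.sum` holds \(m_T\), `self.m_t` holds the running minimum \(M_T\) (initialized at 0), and `self.M_t` holds the test statistic \(PH_T\). When a forgetting factor \(\alpha\) is used, the sum is typically updated as \(m_T = \alpha\, m_{T-1} + (x_T - \bar{x}_T - \delta)\).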

ADWIN (Adaptive Windowing)

ADWIN maintains a variable-length window and detects drift when the means of two sub-windows differ significantly:

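# Using the river library.
# Assumes a recent river release, where update() does not return an
# (in_drift, in_warning) tuple and the result is read from drift_detected.
from river.drift import ADWIN

adwin = ADWIN(delta=0.002)

# metric_stream: any iterable of numeric values, e.g. per-batch error rates
for i, val in enumerate(metric_stream):
    adwin.update(val)
    if adwin.drift_detected:
        print(f"Drift detected at index {i}, window size: {adwin.width}")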
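To show what comparing two sub-windows means in practice, the following is a deliberately simplified sketch of the idea. It is not the bucket-based algorithm that river implements: it checks every split point directly, assumes values bounded in [0, 1], and uses the Hoeffding-style cut threshold from the ADWIN paper. The class name `NaiveAdaptiveWindow` and the `max_len` and `min_len` parameters are hypothetical.

```python
import math
from collections import deque


class NaiveAdaptiveWindow:
    """Simplified sub-window mean comparison in the spirit of ADWIN."""

    def __init__(self, delta=0.002, max_len=2000, min_len=10):
        self.delta = delta
        self.min_len = min_len
        self.window = deque(maxlen=max_len)

    def _find_cut(self):
        """Return the size of the older sub-window to drop, or None."""
        n = len(self.window)
        if n < self.min_len:
            return None
        total = sum(self.window)
        log_term = math.log(4.0 * n / self.delta)  # ln(4 / delta'), delta' = delta / n
        head_sum = 0.0
        for n0, value in enumerate(self.window, start=1):
            if n0 == n:
                break
            head_sum += value
            n1 = n - n0
            m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic-mean term of the two sizes
            eps_cut = math.sqrt(log_term / (2.0 * m))
            if abs(head_sum / n0 - (total - head_sum) / n1) > eps_cut:
                return n0
        return None

    def update(self, x):
        """Add one value; return True if older data was dropped (drift)."""
        self.window.append(x)
        drift = False
        cut = self._find_cut()
        while cut is not None:
            for _ in range(cut):
                self.window.popleft()
            drift = True
            cut = self._find_cut()
        return drift

    @property
    def width(self):
        return len(self.window)


# Synthetic stream (illustrative): the value jumps from 0.2 to 0.8 at index 200
stream = [0.2] * 200 + [0.8] * 200
detector = NaiveAdaptiveWindow()
alarms = [i for i, v in enumerate(stream) if detector.update(v)]
print("alarm indices:", alarms, "final window size:", detector.width)
```

The window grows while the data looks stationary and shrinks when the older part no longer agrees with the newer part, which is why the reported window size is itself a useful diagnostic.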

Monitoring Feature-Target Correlation

import pandas as pd

def monitor_concept_drift(data_windows, feature, target, threshold=0.1):
    """Track the Pearson correlation between a feature and the target over time windows.

    data_windows maps a window label to a DataFrame, in chronological order.
    Pearson correlation only captures linear association, so a stable value
    does not rule out drift.
    """
    correlations = []
    for window_name, window_data in data_windows.items():
        corr = window_data[feature].corr(window_data[target])
        correlations.append({'Window': window_name, 'Correlation': corr})

    result = pd.DataFrame(correlations)

    # Flag the largest change between consecutive windows
    if len(result) > 1:
        max_change = result['Correlation'].diff().abs().max()
        if max_change > threshold:
            print(f"⚠️ Concept drift: {feature} correlation shifted by {max_change:.3f}")

    return result
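A fixed cut-off on the change in correlation ignores how many rows each window contains, so small windows can trigger alarms from noise alone. One way to account for sample size, sketched here as an optional complement and not as part of the function above, is the Fisher z-test for the difference between two independent correlations. The function name `correlation_shift_pvalue` is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_shift_pvalue(r1, n1, r2, n2):
    """Two-sided p-value for H0: both windows share the same correlation.

    r1, r2 are Pearson correlations from independent windows of size n1, n2
    (each size must exceed 3, and each |r| must be below 1).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher z-transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# The same shift in correlation, judged with large and small windows
print(correlation_shift_pvalue(0.5, 1000, 0.3, 1000))
print(correlation_shift_pvalue(0.5, 30, 0.3, 30))
```

With 1,000 rows per window a drop from 0.5 to 0.3 is very unlikely under the null hypothesis, while with 30 rows per window the same drop is well within sampling noise.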

Response Strategies

| Drift Type | Response |
| --- | --- |
| Sudden | Immediate retrain on recent data |
| Gradual | Scheduled retrain with expanding window |
| Incremental | Sliding window retrain |
| Recurring | Season-aware models or model ensemble |
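The first three responses in the table differ mainly in which slice of history is used for retraining. The sketch below shows one way to express that choice; the function name `training_window`, the strategy labels, and the parameters are hypothetical, and `history` is assumed to be a chronologically ordered list of training rows.

```python
def training_window(history, strategy, *, change_point=0, size=1000):
    """Pick the rows to retrain on, given a chronologically ordered list."""
    if strategy == "recent":
        # Sudden drift: keep only data observed after the detected change
        return history[change_point:]
    if strategy == "expanding":
        # Gradual drift: retrain on everything so far; the window grows each run
        return history[:]
    if strategy == "sliding":
        # Incremental drift: keep only the most recent `size` rows
        if size <= 0:
            raise ValueError("size must be positive")
        return history[-size:]
    raise ValueError(f"unknown strategy: {strategy}")


history = list(range(10))  # stand-in for ten chronologically ordered rows
print(training_window(history, "recent", change_point=7))
print(training_window(history, "expanding"))
print(training_window(history, "sliding", size=4))
```

Recurring drift is left out on purpose: it usually calls for keeping several models or season labels around, which does not reduce to a single slice of history.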
