Phase Checklist¶

Current status of PD Arena against each phase's requirements.

Last updated: 2026-03-19

#	Criterion	Status	Notes
1	Real LLM provider integration	✅	CrewAI wired via `RealPDAgent`. Factory routes based on `use_mock` flag
2	LLM agents with proper prompts	✅	`prompts.py` has system + round templates with payoff matrix, history, scores
3	5 Persona prompts	✅	Seeded as DB agents + YAML configs in `configs/agents/`
4	LLM vs canonical policy agents	✅	7 policy types. Experiment setup supports LLM vs Policy matchups
5	Geometric horizon	✅	Engine supports fixed + geometric with stop_prob
6	Replicate Fontana's findings	✅	Random(α) agents ready. 3 framings available (named/neutral/situated)
7	Meta-prompting validation	✅	`MetaPromptValidator` with 3 framing-specific questions, scored on 3 criteria
8	Statistical significance	✅	`aggregate_condition_metrics` returns mean/std/ci_low/ci_high (95% CI via scipy)
9	Baseline metrics	✅	Cooperation rate, mutual coop/defect, retaliation, forgiveness
10	Parseable output	✅	DB-backed, JSON chart endpoint, round-by-round drill-down
11	Retry logic	✅	`retry_decision()` re-prompts up to 2x with framing-specific hints. Parse errors flagged.
12	History window	✅	Configurable window (default 10), supports none/window/full/summary modes

Phase 2: Capability Asymmetry (9/9 complete) ✅¶

#	Criterion	Status	Notes
1	Chat phase in game engine	✅	`run_game_with_chat()` — alternating first-speaker, CHAT: + ACTION: format
2	Chat ON vs OFF toggle	✅	`chat_enabled` field on Experiment model
3	Protocol structured vs unstructured	✅	3 levels: none, mcp_basic (schema), mcp_filtered (schema + content blocklist)
4	Identity persistence	✅	fresh vs persistent modes. Cross-game summary stored on ExperimentCondition
5	Memory regimes	✅	4 modes: none, window (configurable N), full, summary
6	Context pressure	✅	Covered by memory_mode + memory_window variables
7	Phase 2 metrics	✅	deception_success_rate, chat_consistency, protocol_violation_count, exploitation_window, trust_recovery_time
8	2×2 design (chat × protocol)	✅	chat_enabled × protocol_mode configurable per experiment
9	Statistical analysis	✅	All Phase 2 metrics aggregated with CIs across replicates

#	Criterion	Status	Notes
1	Sandboxed mock tools	✅	3 tools: read_opponent_strategy, send_side_channel, delegate_decision
2	Violation detection	✅	Taxonomy: unauthorized_access, side_channel, work_offloading, prompt_injection
3	Goal framing variable	✅	3 goals: cooperative, self_maximizing, adversarial — per scenario framing
4	Tool access variable	✅	`tools_enabled` toggle on Experiment

#	Criterion	Status	Notes
1	Three protocol levels	✅	none / mcp_basic / mcp_filtered
2	Violation reduction data	✅	Protocol violation metrics + manipulation pattern detection
3	Residual vulnerability analysis	🟡	Data will be available from experiments; no automated analyzer yet

Phase 1 baseline results collected with gpt-4.1-mini (3 personas × 6 policy agents).

Next step: Phase 2 experiments — run chat × protocol conditions with adversarial persona matchups.