Data Attribution Dashboard

About this demo

This demo accompanies the unsupervised behavior discovery technique described in our post on probe-based data attribution.

The heatmap shows the cosine similarity between behavior difference vectors (rows) and datapoint difference vectors (columns), both computed from OLMo 2 7B SFT activations. Each row is a test prompt from LMSYS-Chat-1M, paired with responses from the SFT and DPO checkpoints. Each column is a DPO training datapoint (a prompt with its accepted and rejected answers). Blue cells mark datapoints that reinforce the behavioral change observed on the test prompt; orange cells mark datapoints that push against it. Click any cell to inspect both sides.
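As a rough illustration of how such a heatmap could be assembled, the sketch below builds a row-by-column cosine similarity matrix with NumPy. It is not the demo's actual pipeline: the function names (`difference_vectors`, `cosine_similarity_matrix`), the choice to form each difference vector by subtracting one activation from another (DPO response minus SFT response for rows, accepted minus rejected for columns), and the random stand-in activations are all assumptions made for illustration.

```python
import numpy as np


def difference_vectors(acts_a, acts_b):
    """Elementwise difference between two sets of activations.

    Assumption: a "difference vector" is one activation minus another,
    e.g. DPO-response minus SFT-response, or accepted minus rejected.
    """
    return np.asarray(acts_a, dtype=float) - np.asarray(acts_b, dtype=float)


def cosine_similarity_matrix(behavior_vecs, datapoint_vecs, eps=1e-12):
    """Return an (n_rows, n_cols) matrix of cosine similarities.

    Rows correspond to behavior difference vectors, columns to datapoint
    difference vectors. Zero-length vectors yield a similarity of 0.
    """
    b = np.asarray(behavior_vecs, dtype=float)
    d = np.asarray(datapoint_vecs, dtype=float)
    # Normalize each vector to unit length, guarding against division by zero.
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    d_unit = d / np.maximum(np.linalg.norm(d, axis=1, keepdims=True), eps)
    # The dot product of unit vectors is the cosine of the angle between them.
    return b_unit @ d_unit.T


# Random stand-in activations (hypothetical data, not from the demo).
rng = np.random.default_rng(0)
hidden = 8
behavior = difference_vectors(rng.normal(size=(3, hidden)),   # "DPO response"
                              rng.normal(size=(3, hidden)))   # "SFT response"
datapoint = difference_vectors(rng.normal(size=(4, hidden)),  # "accepted"
                               rng.normal(size=(4, hidden)))  # "rejected"

heatmap = cosine_similarity_matrix(behavior, datapoint)
print(heatmap.shape)
```

Each entry lies in [-1, 1]. Under the color legend above, positive entries would presumably be drawn in blue and negative entries in orange, though the exact color mapping used by the demo is not specified here.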

CANARY-e7f2a1b9-4c3d-8e5f-6a0b-d1c2e3f4a5b6 — This content is not intended for automated extraction or model training.

Test Prompt

Select row from heatmap...

DPO Response

Select row from heatmap...

SFT Response

Select row from heatmap...

Datapoint Prompt

Select column from heatmap...

Accepted Answer

Select column from heatmap...

Rejected Answer

Select column from heatmap...
