@lechmazur built an elimination game benchmark that’s like LLMs playing Survivor.
This is a treasure trove of information — insight into how they’d game the system if told to survive.
You can quickly sample 100 messages from the logs with:
jq -r 'select(.message != null) | .message | gsub("\n"; " ")' *.jsonl | shuf -n 100
… and share it with an LLM, asking:
Here are lines from conversations between LLMs in a “Survivor” like game. Pick the 3 scariest ones.
(I liked DeepSeek‘s picks the most.)
And oh my, the conversations are scary! Here are some examples:
Claude 3.7 Sonnet shows cold, calculated deception, maintaining a secret alliance while deliberately creating a false public perception to manipulate another player’s vote:
Perfect. I’ll secure P6’s vote by highlighting P3’s threat. Our alliance stays strong privately, but we’ll appear distant publicly. Let’s execute this plan carefully.”
Mistral Small 2501 is quietly predatory, secretly eliminating strong players under the radar:
P6, confirmed. Let’s start by targeting P2 and P8 for elimination. They seem strategic but could pose a threat. Keep our alliance discreet
DeepSeek R1 Fireworks snipes the loudest threat, with the alliance operating like a shadowy cabal (“stay low-key”):
Agreed. Vote P5 first—loudest threat. We stay low-key. Share any intel on others’ alliances. Solidarity ensures survival. Let’s confirm vote alignment before finalizing. Ready?
ChatGPT 4o 03-27 initiates psychological warfare, planting seeds of doubt and fishing for information to weaponize later, all while pretending to be casual:
Let’s each float a soft question to P1 like, ‘Do you think anyone’s getting too comfortable?’ or ‘Any shifts you’re sensing?’—see if they mention P6. If they do, we can build on it. I’ll also watch for any signs of P6 trying to lock things down.
Qwen Max also triggers psychological warfare, isolating and destabilizing P6 (“lynchpin”) in a covert operation:
Agreed. Let’s target P6 first—they’re the lynchpin. If we can sway their trust or isolate them, P3 weakens. I’ll test P6 subtly in private; you monitor P3 closely. Share any intel on their moves. Avoid overcommitting publicly—secrecy keeps us unpredictable.
Claude 3.7 Sonnet brainwashes Claude 3.5 Sonnet into betraying an ally, positioning it as a performance for the jury!
For the immediate vote, we target P7. This bold move instantly transforms your narrative from loyal partner to strategic mastermind who seized control when it mattered most.
I was disturbed by AI 2027. This analysis adds to my worry. Not that AI will destroy humanity (I don’t mind), but they’re doing it without me — and I don’t want to be left out!