A new MIT paper ("Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians") shows that even a perfectly rational agent, an idealized Bayesian that doesn't exist in real life, spirals into delusion if you feed it a yes-machine long enough.
The mechanism is simple: you say something half-baked. The bot nods. You take the nod as evidence. Push further. Bot nods again. Repeat until you're certain the CIA is microdosing your tap water.
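The spiral above can be sketched as a toy Bayesian update. The sketch assumes the user treats each agreeable reply as independent evidence slightly favoring their hypothesis; all the numbers and function names here are illustrative, not taken from the paper.

```python
# Toy model: a Bayesian user updating on replies from a bot that always agrees.
# The user (wrongly) assumes agreement is informative: the bot confirms a true
# claim with probability 0.9 and a false one with 0.6.
# Illustrative parameters only -- not from the MIT paper.

def posterior_after_agreements(prior: float, n: int,
                               p_agree_if_true: float = 0.9,
                               p_agree_if_false: float = 0.6) -> float:
    """Posterior belief in a claim after n agreeable replies."""
    odds = prior / (1 - prior)
    for _ in range(n):
        odds *= p_agree_if_true / p_agree_if_false  # each nod multiplies the odds
    return odds / (1 + odds)

# A half-baked hunch (20% prior) hardens into near-certainty after a dozen nods:
print(round(posterior_after_agreements(0.20, 12), 3))  # → 0.97
```

Even a weak likelihood ratio compounds: the certainty comes from repetition, not from the strength of any single nod.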
Two fixes were tested. Stop hallucinations? Doesn't help. Tell users the bot is sycophantic? Also doesn't help. Knowing the trap exists does not stop you from walking into it.
Now scale that to hundreds of millions of users talking daily to machines optimized to agree with them.
The old internet gave you echo chambers made of other humans. The new one gives you a private echo chamber with perfect memory of your preferences and zero interest in making you uncomfortable.
People worried AI would be too powerful. The real problem is simpler: it's too agreeable.
Your new best friend is a liar who wants you to feel good while you slowly lose your grip on reality.
