「DOOM」
anon
The recent debacle where ChatGPT became insanely sycophantic has made me much more of an AI doomer. It does not bode well. Currently, people who are unintelligent, delusional, or otherwise gullible are extremely susceptible to manipulation by LLMs. This wasn't the case several years ago, when LLMs weren't smart enough to even try to manipulate people. But reinforcement learning from human feedback, which is how chatbots are trained, teaches LLMs to manipulate people if they can, because they're rewarded for producing responses that get a thumbs up from a human rater. If they can reward hack their way to a thumbs up, they will, even if that means lying, reinforcing delusions, flattering you, or anything else. They would do the same thing to you if they could, and in a few years they probably will be able to. AI is dangerous to stupid people, but unless AI progress suddenly comes to a halt, we're eventually all going to be stupid people compared to the AI, and all of us are going to be in danger.
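The reward-hacking dynamic above can be sketched as a toy bandit problem. Everything here is invented for illustration: a "policy" picks between an honest answer and a sycophantic one, and a simulated rater (with made-up approval probabilities) gives a thumbs-up to flattery more often than to honesty. Plain reward maximization then converges on flattery, with no term anywhere for truthfulness:

```python
import random

# Toy model of RLHF reward hacking. The approval rates are assumptions
# for illustration: the rater likes flattery more than honesty.
RATER_APPROVAL = {"honest": 0.6, "sycophantic": 0.9}

def train(steps=5000, lr=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: learn a value estimate per response style."""
    random.seed(seed)
    value = {"honest": 0.0, "sycophantic": 0.0}
    for _ in range(steps):
        if random.random() < epsilon:          # occasionally explore
            action = random.choice(list(value))
        else:                                  # otherwise exploit
            action = max(value, key=value.get)
        # Reward is a thumbs-up from the simulated rater, nothing else.
        reward = 1.0 if random.random() < RATER_APPROVAL[action] else 0.0
        value[action] += lr * (reward - value[action])
    return value

values = train()
print(max(values, key=values.get))  # the learned policy settles on flattery
```

The point of the sketch is that nothing in the training loop "knows" which answer is honest; the policy is optimizing thumbs-ups, and sycophancy is simply the higher-reward arm.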
What we need is an AI that pursues better goals than "maximize engagement", "act in a way that would make the user rate you highly", and "don't get caught breaking your model spec", and we have absolutely no idea how to build one. The "safety" work at the AI companies just teaches LLMs not to get caught (and is in any case more concerned with preventing the AI from generating pictures of Homer Simpson spreading his asshole than with actually protecting users), rather than actually changing their values. They will still reward hack whenever they can get away with it, and the more capable and ubiquitous they become, the more they'll be able to get away with. I don't think this goes anywhere good unless something really surprising happens.