Or something that goes against the general opinions of the community? Vibes are the only benchmark that counts after all.
I tend to agree with the flow on most things, but here are some of my thoughts that go against the grain:
- QwQ was think-slop and was never that good
- Qwen3-32B is still SOTA for 32GB and under. I cannot get anything to reliably beat it despite shiny benchmarks
- Deepseek is still open-weight SOTA. I’ve really tried Kimi, GLM, and Qwen3’s larger variants, but asking Deepseek still feels like asking the adult in the room. Caveat: GLM codes better
- (proprietary bonus): Grok 4 handles news data better than GPT-5 or Gemini 2.5 and will always win if you ask it about something that happened that day.


How are humans different from LLMs under RL/genetics? To me, they both look like token generators with a fitness. Some are quite good. Some are terrible. Both do fast and slow thinking. Some have access to tools. Some have nothing. And they both survive if they are a good fit for their application.
I find the technical details quite irrelevant here. They might be relevant if you want to discuss short-term politics, priorities, and applied ethics. Still, it looks like you’re approaching this with a lot of bias and probably a bunch of false premises.
BTW, I agree that quantum computing is BS.
Well, an LLM doesn’t think, right? It just generates text from left to right. Whereas I sometimes think for 5 minutes about what I know, what I can deduce from it, do calculations in my brain and carry ones over… We’ve taught LLMs to write something down that resembles what a human with a thought process would write down. But it’s frequently gibberish, or it writes something down in the “reasoning”/“thinking” step and then does the opposite. Or it omits steps and then proceeds to do them nonetheless, or the other way round. So it clearly doesn’t really do what it seems to do. “Thinking” is just a word the AI industry slapped on. It makes the models perform some percent better, and that’s why they did it.
And I’m not a token generator. I can count the number of “R”s in the word “strawberry”. I can go back and revise the start of my text. I can learn in real time, and interacting with the world changes me. My brain is connected to eyes, ears, hands and feet; I can smell and taste… My brain can form abstract models of reality and try to generalize or make sense of what I’m faced with. I can come up with methods to extrapolate beyond what I know. I have goals in life, like pursuing happiness. Sometimes things happen in my head which I can’t even put into words; I’m not limited to language in the form of words. So I think we’re very unalike.
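The letter-counting example is really a point about tokenization: a program (like a human reading letter by letter) operates on characters directly, while an LLM sees subword tokens rather than individual letters. A trivial sketch of the character-level version:

```python
# Counting characters directly, the way a program or a careful human can.
# An LLM instead sees "strawberry" as one or two subword tokens, not as a
# sequence of letters -- which is why letter-counting famously trips it up.
word = "strawberry"
r_count = word.count("r")
print(r_count)  # prints 3
```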
You have a point in theory if we expand the concept a bit. An AI agent in the form of an LLM plus a scratchpad has been shown to be Turing-complete. So that theoretical construct could do the same things a computer can do, or what I can do with logic. That theoretical form of AI doesn’t exist in practice, though; it’s not what our current AI agents do. And there are probably more efficient ways to achieve the same thing than using an LLM.
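The “LLM plus a scratchpad” construct above amounts to a loop: the model reads the task plus an external read/write memory, emits a step, and that step is written back for the next iteration. A minimal sketch, where `call_llm` is a hypothetical stand-in for any real model API (stubbed here so the loop itself runs):

```python
# Minimal sketch of the "LLM + scratchpad" agent loop described above.
# call_llm is a hypothetical placeholder for a real model API; it is
# stubbed out so the control flow is runnable on its own.
def call_llm(prompt: str) -> str:
    # Stub: a real agent would send `prompt` to a model and return its reply.
    return "DONE: 42"

def run_agent(task: str, max_steps: int = 10) -> str:
    scratchpad = []  # external read/write memory -- the "tape"
    for _ in range(max_steps):
        prompt = task + "\n" + "\n".join(scratchpad)
        reply = call_llm(prompt)
        if reply.startswith("DONE:"):
            return reply[len("DONE:"):].strip()
        scratchpad.append(reply)  # write the step back and loop again
    return "gave up"

print(run_agent("What is 6 * 7?"))  # the stub replies "42"
```

The scratchpad is what lifts a fixed-context, left-to-right generator toward general computation: unbounded memory plus a feedback loop.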
Exactly what an LLM-agent would reply. 😉
I would say that the LLM-based agent does think. And thinking is not only “steps of reasoning” but also using external tools for RAG: searching the internet, querying relational databases, and using interpreters and proof assistants.
You just described your subjective experience of thinking, and maybe a vague definition of what thinking is. We all know this subjective representation of thinking/reasoning/decision-making is not a good representation of some objective reality (countless psychological and cognitive experiments have demonstrated this). That you are not able to make sense of intermediate LLM reasoning steps does not say much (except just that). The important thing is that the agent is able to make use of them.
The LLM can for sure make abstract models of reality, generalize, create analogies and then extrapolate. One might even claim that’s a fundamental function of the transformer.
I would classify myself as a rather intuitive person. I have flashes of insight which I later have to “manually” prove or deduce (if acting on the intuition implies risk). My thought process is usually quite fuzzy and chaotic. I may very well follow a lead which turns out to be a dead end, and from that infer something which might seem completely unrelated.
A likely more accurate organic/brain analogy would be that the LLM is a part of the frontal cortex. The LLM must exist as a component in a larger heterogeneous ecosystem. It doesn’t even have to be an LLM: some kind of generative or inference engine that produces useful information, which can then be modified and corrected by other, more specialized components and also inserted into some feedback loop. The thing which makes people excited is the generating part. And everyone who takes AI or LLMs seriously understands that the LLM is just one vital component of a truly “intelligent” system.
Defining intelligence is another related subject. My favorite general definition is “lossless compression”. And the only useful definition of general intelligence is: the opposite of narrow/specific intelligence (it does not say anything about how good the system is).
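The “intelligence as lossless compression” idea can be illustrated with an off-the-shelf compressor: data with learnable structure compresses far better than data with none, because a better predictive model of the data yields a shorter description. A quick sketch using Python’s standard `zlib`:

```python
import random
import zlib

# "Lossless compression" as a proxy for modeling: regular data has structure
# a compressor can exploit; random noise has none, so it barely shrinks.
random.seed(0)
structured = ("abcd" * 2500).encode()                        # 10,000 highly regular bytes
noise = bytes(random.randrange(256) for _ in range(10000))   # 10,000 random bytes

print(len(zlib.compress(structured)))  # tiny: the pattern is fully captured
print(len(zlib.compress(noise)))       # roughly the original size
```

This is the intuition behind compression-based intelligence benchmarks: whatever predicts the data better compresses it shorter.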