Or something that goes against the general opinions of the community? Vibes are the only benchmark that counts after all.

I tend to agree with the general flow on most things, but here are my thoughts that I’d consider going against the grain:

  • QwQ was think-slop and was never that good
  • Qwen3-32B is still SOTA for 32GB and under. I cannot get anything to reliably beat it despite shiny benchmarks
  • Deepseek is still open-weight SOTA. I’ve really tried Kimi, GLM, and Qwen3’s larger variants, but asking Deepseek still feels like asking the adult in the room. The caveat is that GLM codes better
  • (proprietary bonus): Grok 4 handles news data better than GPT-5 or Gemini 2.5 and will always win if you ask it about something that happened that day.
  • hendrik@palaver.p3x.de

    Uh, I’m really unsure about it being an engineering task of a few years if the solution is quantum computers. As of today, they’re fairly small, and scaling them to a usable size is the next science-fiction task. The groundwork hasn’t been done yet, and to my knowledge it’s still totally unclear whether quantum computers can even be built at that scale. But sure, if humanity develops vastly superior computers, a lot of tasks are going to get easier and more approachable.

    The stochastic parrot argument is nonsense IMO. Maths is just a method. Our brains and all of physics abide by maths. And sure, AI is maths as well, with the difference that we invented it. But I don’t think that tells us anything.

    And with the goal, I mean something like how AlphaGo has the goal of winning Go matches, or how the hypothetical paperclip maximizer has the goal of maximizing paperclip production… An LLM doesn’t really have any real-world goal. It just generates the next token so that the output looks like legible text. Then we embed it into some pipeline, but it was never trained to achieve the thing we use it for, whatever that might be. It’s just a happy accident if a task can be achieved by clever mimicry and a prompt that simply tells it to pretend it’s good at XY.
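
    To make the “next token” point concrete, here’s a toy sketch (pure Python, a character-level bigram sampler, nothing like a real transformer): the only objective is to produce a plausible next character, and any appearance of purpose comes from the pipeline we wrap around it.

    ```python
    import random
    from collections import Counter, defaultdict

    # Toy "language model": count which character tends to follow which.
    # A real LLM is vastly more capable, but the training objective is the
    # same idea: predict the next token, nothing more.
    corpus = "the cat sat on the mat. the dog sat on the rug."
    follows = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        follows[a][b] += 1

    def next_char(c):
        # Sample the next character in proportion to how often it followed c.
        options = follows[c]
        return random.choices(list(options), weights=list(options.values()))[0]

    text = "t"
    for _ in range(40):
        text += next_char(text[-1])
    print(text)  # locally plausible gibberish; no goal beyond "looks like text"
    ```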

    I think it’d probably be better if a customer service bot were trained to want to provide good support, or if a chatbot like ChatGPT were trained to give factual answers. But that’s not what we do. It’s not designed to do that.

    I guess you’re right. Many aspects of AI boil down to how much compute we have available. And generalization and extrapolating past their training datasets has always been an issue with AI. They’re mainly good at interpolating, but we want them to do both. I need to learn a bit more about neural networks; I’m not sure where the limitations are. You said it’s a practical constraint, but is that really true for all neural networks? It sure is for LLMs and transformer models, because they need terabytes of text fed in during training, and that’s prohibitively expensive. But I suppose that’s mainly due to their architecture? I mean, backpropagation and all the maths required to modify the model weights is some extra work. But does it have to be so much that we just can’t do it while deployed, for any neural network?
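
    On the “learning while deployed” question, here’s a minimal sketch (numpy, a made-up two-layer toy network, not an LLM) of a single online training step. The backward pass is only a small constant factor more work than the forward pass; what makes this impractical for LLMs is the scale of the models and the data, not the backpropagation maths itself.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    # Tiny two-layer network: 4 inputs -> 8 hidden units -> 1 output.
    W1, b1 = rng.normal(size=(4, 8)) * 0.1, np.zeros(8)
    W2, b2 = rng.normal(size=(8, 1)) * 0.1, np.zeros(1)

    def online_step(x, y, lr=0.01):
        """One forward + backward pass on a single example (online SGD)."""
        global W1, b1, W2, b2
        # Forward pass (this is what inference already does).
        h = np.tanh(x @ W1 + b1)
        pred = h @ W2 + b2
        err = pred - y                    # gradient of squared-error loss
        # Backward pass: roughly the same amount of matrix maths again.
        dW2 = np.outer(h, err)
        db2 = err
        dh = (err @ W2.T) * (1 - h ** 2)  # backprop through tanh
        dW1 = np.outer(x, dh)
        db1 = dh
        # Weight update.
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
        return float(err[0] ** 2)

    # The toy "model" keeps learning from every new example it sees.
    for _ in range(5):
        x = rng.normal(size=4)
        y = np.array([x.sum()])           # toy target
        print(online_step(x, y))
    ```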

    • snikta@programming.dev

      How are humans different from LLMs under RL/genetics? To me, they both look like token generators with a fitness. Some are quite good. Some are terrible. Both do fast and slow thinking. Some have access to tools. Some have nothing. And they both survive if they are a good fit for their application.

      I find the technical details quite irrelevant here. They might be relevant if you want to discuss short-term politics, priorities and applied ethics. Still, it looks like you’re approaching this with a lot of bias and probably a bunch of false premises.

      BTW, I agree that quantum computing is BS.

      • hendrik@palaver.p3x.de

        Well, an LLM doesn’t think, right? It just generates text from left to right. Whereas I sometimes think for 5 minutes about what I know, what I can deduce from it, do calculations in my brain and carry the one over… We’ve taught LLMs to write something down that resembles what a human with a thought process would write down. But it’s frequently gibberish, or it writes something down in the “reasoning”/“thinking” step and then does the opposite, or it omits steps and then proceeds to do them nonetheless, or the other way round. So it clearly doesn’t really do what it seems to do. “Thinking” is just a word the AI industry slapped on. It makes the models perform some percent better, and that’s why they did it.

        And I’m not a token generator. I can count the number of "R"s in the word “strawberry”. I can go back and revise the start of my text. I can learn in real time, and interacting with the world changes me. My brain is connected to eyes, ears, hands and feet; I can smell and taste… My brain can form abstract models of reality, try to generalize or make sense of what I’m faced with. I can come up with methods to extrapolate beyond what I know. I have goals in life, like pursuing happiness. Sometimes things happen in my head which I can’t even put into words; I’m not even limited to language in the form of words. So I think we’re very unalike.
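
        (For what it’s worth, the strawberry failure is largely a tokenization artifact: the model sees opaque token IDs rather than letters. A rough illustration below; the “straw” + “berry” split is just an assumed example, the real split depends on the tokenizer.)

        ```python
        # Counting letters is trivial when you can actually see the letters:
        print("strawberry".count("r"))  # 3

        # An LLM typically sees something like this instead (the split is
        # tokenizer-dependent; this one is only for illustration):
        tokens = ["straw", "berry"]     # mapped to opaque integer IDs
        # The model has to "know" how many r's hide inside each token ID,
        # which is why letter-counting trips it up despite being trivial code.
        ```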

        You have a point in theory if we expand the concept a bit. An AI agent in the form of an LLM plus a scratchpad has been shown to be Turing-complete. So that theoretical concept could do the same things a computer can do, or what I can do with logic. That theoretical form of AI doesn’t exist, though. It’s not what our current AI agents do. And there are probably more efficient ways to achieve the same thing than using an LLM.

        • snikta@programming.dev

          Exactly what an LLM-agent would reply. 😉

          I would say that the LLM-based agent does think. And thinking is not only “steps of reasoning”, but also using external tools for RAG: searching the internet, querying relational databases, and using interpreters and proof assistants.
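
          In loop form that’s roughly the sketch below, where call_llm, web_search, query_database and run_python are hypothetical stand-ins rather than any particular framework’s API:

          ```python
          # Sketch of an agent loop: the LLM proposes a step, external tools execute it.
          def agent(task, call_llm, tools, max_steps=10):
              scratchpad = [f"Task: {task}"]
              for _ in range(max_steps):
                  reply = call_llm("\n".join(scratchpad))
                  if reply.startswith("FINAL:"):
                      return reply[len("FINAL:"):].strip()
                  if reply.startswith("TOOL:"):
                      name, _, arg = reply[len("TOOL:"):].strip().partition(" ")
                      result = tools[name](arg) if name in tools else "unknown tool"
                      scratchpad.append(f"{reply}\nRESULT: {result}")
                  else:
                      scratchpad.append(reply)  # a plain "reasoning" step
              return "gave up"

          # tools = {"search": web_search, "sql": query_database, "python": run_python}
          # print(agent("How many moons does Mars have?", call_llm, tools))
          ```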

          You just described your subjective experience of thinking, and maybe a vague definition of what thinking is. We all know this subjective representation of thinking/reasoning/decision-making is not a good representation of some objective reality (countless psychological and cognitive experiments have demonstrated this). That you are not able to make sense of intermediate LLM reasoning steps does not say much (except just that). The important thing is that the agent is able to make use of them.

          The LLM can for sure make abstract models of reality, generalize, create analogies and then extrapolate. One might even claim that’s a fundamental function of the transformer.

          I would classify myself as a rather intuitive person. I have flashes of insight which I later have to “manually” prove/deduce (if acting on the intuition implies risk). My thought process is usually quite fuzzy and chaotic. I may very well follow a lead which turns out to be a dead end, and from that infer something which might seem completely unrelated.

          A likely more accurate organic/brain analogy would be that the LLM is a part of the frontal cortex. The LLM must exist as a component in a larger heterogeneous ecosystem. It doesn’t even have to be an LLM: some kind of generative or inference engine that produces useful information, which can then be modified and corrected by other, more specialized components and also inserted into some feedback loop. The thing which makes people excited is the generating part. And everyone who takes AI or LLMs seriously understands that the LLM is just one, albeit vital, component of a truly “intelligent” system.

          Defining intelligence is another related subject. My favorite general definition is “lossless compression”. And the only useful definition of general intelligence is: the opposite of narrow/specific intelligence (it does not say anything about how good the system is).
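
          The compression definition is easy to make concrete: a model that predicts the next symbol with probability p can, via arithmetic coding, store it in about -log2(p) bits, so better prediction literally means better compression. A toy comparison (made-up probabilities, purely for illustration):

          ```python
          import math

          text = "abababababababab"  # 16 symbols with obvious structure

          def code_length(predict, data):
              """Ideal compressed size in bits: sum of -log2 p(next symbol)."""
              bits, context = 0.0, None
              for c in data:
                  bits += -math.log2(predict(context, c))
                  context = c
              return bits

          # Model 1: knows nothing, assigns 1/2 to 'a' and 'b' regardless of context.
          uniform = lambda ctx, c: 0.5

          # Model 2: has "understood" the alternation (hedged at 0.9, not 1.0).
          def alternating(ctx, c):
              if ctx is None:
                  return 0.5
              return 0.9 if c != ctx else 0.1

          print(code_length(uniform, text))      # 16.0 bits
          print(code_length(alternating, text))  # ~3.3 bits: better model, better compression
          ```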