Car Wash Test on 53 leading AI models: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

fubarx@lemmy.world · 3 days ago

Car Wash Test on 53 leading AI models: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

Saterz@lemmy.world · 1 day ago

Well, it is a 9B model after all. Self-hosted models become minimally “intelligent” at 16B parameters. For context, the models run on Google's servers are close to 300B-parameter models.

SuspciousCarrot78@lemmy.world · edit-2 39 minutes ago

Not sure how we’re quantifying intelligence here. Benchmarks?

Qwen3-4B 2507 Instruct (4B) outperforms GPT-4.1 nano (7B) on all stated benchmarks. It outperforms GPT-4.1 mini (~27B according to scuttlebutt) on mathematical and logical reasoning benchmarks, but loses (barely) on instruction-following and knowledge benchmarks. It outperforms GPT-4o (~200B) on a few specific domains (math, creative writing), but loses overall (because of course it would). The abliterated cooks of it are stronger yet in a few specific areas too.

https://huggingface.co/unsloth/Qwen3-4B-Instruct-2507-GGUF

So, in that instance, a 4B > 7B (globally), 27B (significantly) and 200-500B(?) situationally. I’m pretty sure there are other SLMs that achieve this too, now (IBM Granite series, Nanbeige, Nemotron etc).

It's sort of wild to think that 2024 SOTA is ~ ‘strong’ 4-12B these days.

I think (believe) that we’re sort of getting to the point where the next step forward is going to be “densification” and/or architecture shift (maybe M$ can finally pull their finger out and release the promised 1.58 bit next step architectures).
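
Back-of-envelope sketch of why the bit width matters for self-hosting. Assumptions on my part: weights only (no KV cache, activations or runtime overhead), and "1.58 bit" read as log2(3), i.e. ternary weights. The helper name `weight_memory_gb` is made up for illustration.

```python
import math

def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Hypothetical helper: counts weights only and ignores KV cache,
    activations and any runtime overhead.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9

# Bits needed to encode one of three values {-1, 0, +1}: log2(3), about 1.58.
TERNARY_BITS = math.log2(3)

if __name__ == "__main__":
    # Compare a 4B model stored at 16-bit, 4-bit and ternary precision.
    for bits in (16, 4, TERNARY_BITS):
        size = weight_memory_gb(4, bits)
        print(f"4B model at {bits:.2f} bits/weight: {size:.2f} GB")
```

Same parameter count, roughly a tenth of the weight storage going from 16-bit to ternary, which is the appeal. Whether quality holds up at that precision is a separate question this arithmetic doesn't answer.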

ICBW / IANAE

Appoxo@lemmy.dbzer0.com · 20 hours ago

Any source for that info? Seems important to know and assess the quality, no?

Saterz@lemmy.world · 9 hours ago

Here:

https://www.sitepoint.com/local-llms-complete-guide/

https://www.hardware-corner.net/running-llms-locally-introduction/

https://travis.media/blog/ai-model-parameters-explained/

https://claude.ai/public/artifacts/0ecdfb83-807b-4481-8456-8605d48a356c

https://labelyourdata.com/articles/llm-fine-tuning/llm-model-size

https://medium.com/@prashantramnyc/understanding-parameters-context-size-tokens-temperature-shots-cot-prompts-gsm8k-mmlu-4bafa9566652

Finding them only took a DuckDuckGo web search for local llm parameters and number of params of cloud models.

Edit: formatting

Appoxo@lemmy.dbzer0.com · 6 hours ago

Appreciated. Very much appreciated!

Opper