shockingly enough, corporations always act in their own self-interest
it's additional rules for the subreddit on top of the site-wide rules for all of Reddit
Q4 will give you like 98% of the quality of Q8 at roughly twice the speed, plus much longer context lengths.
If you don't need the full context length, you can try loading the model with a shorter context window; that frees up VRAM, so you can fit more layers on the GPU, which makes it faster.
And you can usually configure your inference engine to keep the model loaded at all times, so you're not losing so much time when you first start the model up.
Ollama attempts to dynamically pick the right context length for your request, but in my experience that just results in really inconsistent and long times to first token.
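For reference, here's a minimal sketch of what that looks like against Ollama's HTTP API (assuming the default localhost:11434 endpoint; the model tag, context size, and layer count are placeholders you'd tune for your own card):

```python
import requests

# Sketch only: request a completion with a reduced context window, as many
# layers as possible offloaded to the GPU, and the model kept in memory
# afterwards. All values here are illustrative assumptions, not recommendations.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5:32b-instruct-q4_K_M",  # hypothetical tag; use whatever you pulled
        "prompt": "Say hi in one sentence.",
        "stream": False,
        "keep_alive": "30m",   # keep the model loaded between requests (-1 keeps it forever)
        "options": {
            "num_ctx": 8192,   # shorter context, less VRAM, more layers fit on the GPU
            "num_gpu": 99,     # offload as many layers as possible to the GPU
        },
    },
    timeout=600,
)
print(resp.json()["response"])
```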
The nice thing about vLLM is that your model is always loaded, so you don’t have to worry about that. But then again, it needs much more VRAM.
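A rough sketch of the vLLM side, assuming the offline Python API and a model that actually fits in VRAM; the model id, context cap, and memory fraction are placeholders:

```python
from vllm import LLM, SamplingParams

# Sketch only: vLLM loads the weights once at startup and keeps them resident,
# so there is no per-request load time, at the cost of holding the VRAM permanently.
llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct-AWQ",  # placeholder model id
    max_model_len=8192,                     # cap the context to shrink the KV-cache footprint
    gpu_memory_utilization=0.90,            # fraction of VRAM vLLM is allowed to claim
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Say hi in one sentence."], params)
print(outputs[0].outputs[0].text)
```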
In my experience, anything similar to qwen-2.5:32B comes closest to gpt-4o. I think it should run on your setup. The 14B model is alright too, but definitely inferior. Mistral Small 3 also seems really good. Anything smaller is usually really dumb, and I doubt it would work for you.
You could probably run some larger 70B models at a snail's pace too.
Try the DeepSeek R1 Qwen-32B distill, something like deepseek-r1:32b-qwen-distill-q4_K_M (the name on Ollama), or some fine-tune of it. It'll be by far the smartest model you can run.
There are various fine-tunes that remove some of the censorship (ablated/abliterated) or are optimized for RP, which might do better for your use case. But I personally haven't used them, so I can't promise anything.
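If you want to try the distill from a script, a minimal sketch with the official `ollama` Python client (the model tag is the one named above; the prompt is just an illustrative example):

```python
import ollama  # pip install ollama; assumes a local Ollama server is already running

# Sketch: chat with the R1 Qwen-32B distill mentioned above. Reasoning models
# emit their chain of thought before the final answer, so expect long responses
# unless you prompt them to be brief.
reply = ollama.chat(
    model="deepseek-r1:32b-qwen-distill-q4_K_M",
    messages=[{"role": "user", "content": "In one paragraph, explain what a quantized model is."}],
)
print(reply["message"]["content"])
```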
to Fediverse@lemmy.world • "Bad UX is keeping the majority of people away from Lemmy" • 2 months ago
I don't know, feddit.nl is pretty chill. I always see everything and barely anything objectionable.
to LocalLLaMA@sh.itjust.works • "DeepSeek gives Europe's tech firms a chance to catch up in global AI race" • 2 months ago
True, but the newest Mistral model is already pretty great.
to Fediverse@lemmy.world • "Let's discuss how to efficiently promote Lemmy to potential new joiners" • 3 months ago
let's just rebrand instances to superleddits and communities to subleddits 🤣
to Fediverse@lemmy.world • "Let's discuss how to efficiently promote Lemmy to potential new joiners" • 3 months ago
"Reddit but you can block the part that annoys you"
to Fediverse@lemmy.world • "Pixelfed user count has gone vertical." • 3 months ago
does anyone know why this sudden uptick?
to Selfhosted@lemmy.world • "Serve local IP stream to internet via local webserver/website" • 3 months ago
why would I want to stream myself peeing??
to Linux@lemmy.ml • "I just distro hopped after using a distro almost a year. Is it normal?" • 3 months ago
Every Linux user has to go through a period of compulsive distro hopping. Don't worry, eventually you'll grow tired of it and just settle on one workhorse distro.
to Linux@lemmy.ml • "Perf Support For 2,048 CPU Cores Is Becoming Not Enough - Patches Bump Kernel Limit" • 4 months ago
How long until CPUs and GPUs just merge into one thing
to PC Gaming@lemmy.ca • "A Valve engineer fixed 3D lighting so hard he had to tell all the graphics card manufacturers their math was wrong, and the reaction was: 'I hate you'" • 4 months ago
Fixing anything in industry is like fighting a very big, very lazy elephant seal bull
to PC Gaming@lemmy.ca • "AI PCs aren't driving sales — The need to upgrade from Windows 10 will drive 2025 laptop sales" • 4 months ago
The only real advantage of local AI is privacy and that it's much cheaper if you use it a lot.
The only consumer use case I see in the wild with some real momentum behind it is role play.
All the local AI communities I browse are 50% people trying to find use cases for it at their job (like me; unsuccessfully, I might add) and 50% people interested in role play.
People will apparently spend thousands to jerk off to a soulless machine demon simulacrum shell of a human.
To be fair, I can see the appeal of local AI for video games, like RPGs. There is this really fun game called “Suck Up”, where you are a vampire trying to convince AI to let you inside their house. That is the one real “killer” application I see atm.
I personally see a lot of other useful use cases for local AI, but from my experience at work, I would estimate it will take another 5 years until any of it is anywhere near consumer-ready.
to PC Gaming@lemmy.ca • "AI PCs aren't driving sales — The need to upgrade from Windows 10 will drive 2025 laptop sales" • 4 months ago
local AI is cool and all, but neither the hardware nor the models are really ready for your average consumer
to Selfhosted@lemmy.world • "Building my own log aggregation and search server" • 5 months ago
why are you so interested in logs? are you like a lumberjack?
to PC Gaming@lemmy.ca • "Sony says it should've gotten more feedback before launching Concord, but it isn't done with live service games despite 'a certain amount of risk'" • 5 months ago
I'm so glad to see X as a service fail miserably everywhere
to Linux@lemmy.ml • "FSF is working on freedom in machine learning applications" • 6 months ago
It is kind of interesting how open machine learning already is without much explicit advocacy for it.
It's the only field in all of IT I can think of where the open version is just a few months behind SOTA.
Open training pipelines and open data are the only aspects that could still use improvements in ML, but there are plenty of projects that are near-SOTA and fully open.
ML is extremely open compared to consumer mobile or desktop apps, which are always ~10 years behind SOTA.
Congratulations 🤠🥾er
bad chart