Welcome! That's really cool to hear that the post inspired you to get an LLM going on your laptop. What size Gemma 3 are you able to run, like an 8B?
SmokeyDope@lemmy.world (OP, mod) to LocalLLaMA@sh.itjust.works • Latest release of kobold.cpp adds tts voice cloning support via OuteTTS, updates multimodal vision mmproj projectors for Qwen2.5 VL • 6 days ago

I don't know how old the last version of kobold you used was or what it looked like. The newest version has a couple of different web UI themes to pick from in settings: the basic one has more buttons for easier editing, and the corpo one is pretty sleek for phones and tablets. They finally have a terminal mode if you are running on headless servers.
I love this answer! What a wonderful context for a generative model to bring people together and add positivity to daily life. Thank you very much for sharing.
Not the person you asked, but I use kobold.cpp to generate images with SD models. It works as an okay introduction to image gen. Their wiki has everything you need to get it working.
The most useful thing my LLM has done is help me with hobbyist computer coding projects and answer advanced STEM questions. I try to use my LLM to parse code that I'm unfamiliar with and to understand how the functions translate to actual things happening. I give it an example of functioning code and ask it to adapt the logic a certain way to see how it goes about it. I have to parse a large, very old legacy codebase written in many parts by different people of different skill levels, so just being able to understand what block does what is a big win some days. Even if its solutions aren't copy/paste ready, I usually learn quite a lot just seeing what insights it can glean from the problem. Actually, I prefer when I have to clean it up, because it feels like I still did something to refine and sculpt the logic in a way the LLM can't.
I don't want to be a stereotypical 'vibe coder' who copies and pastes without being able to bug-fix or understand the code they're putting in. So I ask plenty of questions and read through its reasoning for thousands of words to understand the thought processes that lead to functioning changes. I try my best to understand the code and clean it up. It is nice to have a second brain help with initial boilerplating and piecing together the general flow of logic.
I treat it like a teacher and an editor. But it's got limits like any tool, and needs a sweet spot of context, examples, and circumstance for it to work out okay.
SmokeyDope@lemmy.world (mod) to LocalLLaMA@sh.itjust.works • Tips for getting Ollama to be useful with home assistant? • 10 days ago

Hi! So here's the rundown.
You are going to need to be willing to learn how programs send text to each other over open ports, how to call an API from a programming script, and slowly piece together how to work with Ollama's external API and tool-calling functions. Here's the documentation.
Essentially you need to:

- Learn how the Ollama external API works: how to send it text data from a basic Python program on an open port and receive data back to put into a text file.

- Learn how to make that Python program pull weather and time data from OpenWeather.

- Learn how to feed that weather and time data into Ollama on an open port as part of a tool-calling function. A tool call is a fancy system prompt that tells the model how to interface with the data in a well-defined, parameterized way: you say a keyword like "get weather", it sends a request to your Python program to fetch data from OpenWeather, and the result comes back in a way the LLM is instructed to process. (There's a rough sketch of how these pieces fit together after the next paragraph.)
Unless you are already a programmer who works with sending and receiving data over the internet, this is a non-trivial task that requires a lot of experimentation and getting your hands dirty with ports and coding languages. I'm currently getting ready to delve into this myself, so I know it can all feel overwhelming. Hope this helps.
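To make those steps a bit more concrete, here's a minimal sketch of what the glue script might look like. It assumes Ollama is running locally on its default port (11434), a model that supports tool calling (the model name below is just a placeholder for whatever you've pulled), and an OpenWeather API key; treat it as a starting point to experiment with, not a finished integration.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"   # Ollama's default API port
OPENWEATHER_KEY = "your-api-key-here"            # placeholder, get one from openweathermap.org
MODEL = "llama3.1:8b"                            # placeholder: any tool-capable model you have pulled

def get_weather(city: str) -> str:
    """Fetch current weather for a city from the OpenWeather API."""
    resp = requests.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params={"q": city, "appid": OPENWEATHER_KEY, "units": "metric"},
        timeout=10,
    )
    data = resp.json()
    return f"{data['weather'][0]['description']}, {data['main']['temp']} C"

# The tool definition the model sees: a name, description, and parameters.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather like in Toronto right now?"}]

# First pass: the model decides whether it needs to call the tool.
reply = requests.post(OLLAMA_URL, json={
    "model": MODEL, "messages": messages, "tools": tools, "stream": False,
}, timeout=120).json()["message"]
messages.append(reply)

# If it asked for the tool, run it and feed the result back for a final answer.
for call in reply.get("tool_calls", []):
    args = call["function"]["arguments"]
    messages.append({"role": "tool", "name": "get_weather",
                     "content": get_weather(args["city"])})

final = requests.post(OLLAMA_URL, json={
    "model": MODEL, "messages": messages, "stream": False,
}, timeout=120).json()["message"]["content"]
print(final)
```

The same loop generalizes to any other data source: swap `get_weather` for whatever your Python program fetches and describe it in the tool definition.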
SmokeyDope@lemmy.world (OP, mod) to LocalLLaMA@sh.itjust.works • Some updates on community changes and future goals (03-28-2025) • 13 days ago

I really appreciate the kind words, swelter_spark, that's very cool of you to say!
Keep being awesome :)
SmokeyDope@lemmy.world (OP, mod) to LocalLLaMA@sh.itjust.works • Latest release of kobold.cpp adds tts voice cloning support via OuteTTS, updates multimodal vision mmproj projectors for Qwen2.5 VL • 13 days ago

What?? Nooo, of course not. I mean, who would even have things to awaken in the first place, that would be crazy, am I right? Not you or me, that's for sure. Anyways, all I'm saying is that whatever happens between you and your local kobold stays between you two. Everyone is welcome to come join LocalLLaMA today!
Oops, looks like someone accidentally dropped all these unrelated pictures under my comment just now. Oh well, better leave them all here in case they come back for them.
Would something like this interest you? Gemtext formatted to HTML is about as lightweight as it gets. There's lots of automatic gemtext blog software on GitHub that also formats and mirrors an HTML copy. Whenever a news article gets rendered to gemtext through Newswaffle it shrinks about 95-99% of the page size while keeping the text intact. Let me know if you want some more information on Gemini stuff.
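For a sense of why it stays so small: gemtext only has a handful of line types, so a whole article is basically plain text plus a few prefixes, roughly like this:

```
# A heading
Plain paragraph text, one line per paragraph.
=> gemini://example.org/article.gmi A link (one per line)
* A list item
> A quote
```

No CSS, no scripts, no inline markup, which is where most of that 95-99% size reduction comes from.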
SmokeyDope@lemmy.world (OP, mod) to LocalLLaMA@sh.itjust.works • Some updates on community changes and future goals (03-28-2025) • 18 days ago

Thank you for the feedback, I'll look into a digital llama aesthetic. I updated the picture just now with a lot of unneeded linework removed to get it cleaner for the small view. I think it's a little better now.
SmokeyDope@lemmy.world (mod) to LocalLLaMA@sh.itjust.works • M4 Max 128GB vs M1 Ultra 128GB • 19 days ago

The community consensus is that memory bandwidth is key. An extra 300 GB/s is a big bonus.
You're taking a small amount of risk with second-hand, but generally laptops last a good long while, and an extra thousand or so off for a better-performing product is a decent deal IMO. I've been using my second-hand laptop for over a decade now, so I'm probably biased toward taking the dice roll on used after testing it out in person.
I hope it works out for you! Man, I wish I could get my rig upgraded on the cheap, but the GPU market is insane. I wonder how the Nvidia DIGITS or AMD Strix will compare to a Mac?
When I had my AMD GPU going, the best way to get models running was kobold.cpp using Vulkan. The flag is --usevulkan, or something like that. It's way easier than getting a ROCm fork working from source.
SmokeyDope@lemmy.world (mod) to LocalLLaMA@sh.itjust.works • Best model for programming? • 19 days ago

In your range, 32B models work well; give Qwen Coder a try.
SmokeyDope@lemmy.world to Fediverse@lemmy.world • I reckon this is the usage distribution of Lemmy servers that we'll end up with. • 21 days ago

I'm not sure the smaller Lemmy servers running on a Pi or residential internet upload could handle an equally sizable fraction of .world's load. It would be interesting to calculate the total number of users on Lemmy divided by the number of publicly joinable instances, and see the averages or which servers are forced to scale the most.
This would take away users' complete freedom to choose the server they want, which is controversial.
Looks promising, I hope this ends up in an open-source process that improves RAG-type tasks.
This is so exciting! Glad to see Mistral at it with more bangers.
SmokeyDope@lemmy.world (OP, mod) to LocalLLaMA@sh.itjust.works • Returning back to where it started with llama 3 8B. DeepHermes is a great for 8gb VRAM cards • 29 days ago

I've tried the official DeepSeek R1 distill of Qwen 2.5 14B and a few unofficial Mistral finetunes trained on R1 CoT. They are indeed pretty amazing, and I found myself switching between a general-purpose model and a thinking model regularly before this released.
DeepHermes is a thinking-model family with R1-distilled CoT that you can toggle between standard short output and spending a few thousand tokens thinking about a solution.
I found that pure thinking models are fantastic for asking certain kinds of problem solving questions, but awful at following system prompt changes for roleplay scenarios or adopting complex personality archetypes.
This lets you have your cake and eat it too, by making CoT optional while keeping regular system-prompt capabilities.
The thousands of tokens spent thinking can get time-consuming when you're only getting 3 t/s on the larger 24B models, so it's important to choose between a direct answer or spending five minutes letting it really think. Its abilities are impressive, even if it takes 300 seconds to fully think out a problem at 2.5 t/s.
That's why I am so happy the 8B model is pretty intelligent with CoT enabled, so I can fit a thinking model entirely in VRAM, and its knowledge base isn't dumb as rocks either. I'm getting 15-20 t/s with the 8B instead of 2.5-3 t/s partially offloading a larger model. A roughly 6.4x speed increase on the CoT is a huge W for the real-life human time I spend waiting for a complete output.
SmokeyDope@lemmy.world (OP, mod) to LocalLLaMA@sh.itjust.works • DeepHermes Preview features swappable standard output to R1 distill CoT reasoning. Its kind of blowing my mind. • 30 days ago

I think the idea of standardizing multiple different ways for LLMs to 'process' a given input is promising.
I feel that after reasoning we will train models to think emotionally in a more intricate way. By combining reasoning with a more advanced sense of individuality and better emotion simulation, we may get a little closer to a breakthrough.
If you are asking questions, try out the DeepHermes finetune of Llama 3.1 8B and turn on CoT reasoning with the special system prompt.
The system prompt:

> You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.
It really helps the smaller models come up with nicer answers, but it takes them a little more time to bake an answer with the thinking part. It's unreal how good models have become in a year thanks to leveraging reasoning in context space.
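If you're calling the model from a script rather than a chat UI, here's a minimal sketch of passing that system prompt through an OpenAI-compatible endpoint (kobold.cpp and many other local servers expose one; the URL, port, and model name below are placeholders for whatever your setup uses):

```python
import requests

# Placeholder endpoint: kobold.cpp defaults to port 5001 with an
# OpenAI-compatible API; adjust for your own server.
API_URL = "http://localhost:5001/v1/chat/completions"

REASONING_PROMPT = (
    "You are a deep thinking AI, you may use extremely long chains of thought "
    "to deeply consider the problem and deliberate with yourself via systematic "
    "reasoning processes to help come to a correct solution prior to answering. "
    "You should enclose your thoughts and internal monologue inside <think> "
    "</think> tags, and then provide your solution or response to the problem."
)

response = requests.post(API_URL, json={
    "model": "deephermes-3-llama-3-8b",   # placeholder model name
    "messages": [
        {"role": "system", "content": REASONING_PROMPT},
        {"role": "user", "content": "A train leaves at 3pm going 60 km/h..."},
    ],
    "max_tokens": 4096,   # leave room for the <think> block plus the answer
}, timeout=600)

text = response.json()["choices"][0]["message"]["content"]

# Optionally split the hidden reasoning from the final answer.
if "</think>" in text:
    thinking, answer = text.split("</think>", 1)
    print(answer.strip())
else:
    print(text)
```

Dropping the system prompt (or swapping it for a normal persona prompt) is what toggles the model back to standard short answers.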