• 13 Posts
  • 58 Comments
Joined 2 years ago
Cake day: July 1st, 2023

  • If you are asking questions, try out the DeepHermes finetune of Llama 3.1 8B and turn on CoT reasoning with the special system prompt.


    You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.

    It really helps the smaller models come up with nicer answers, but it takes them a little more time to bake an answer with the thinking part. It's unreal how far models have come in a year thanks to leveraging reasoning in context space.
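
    A minimal sketch of what that toggle looks like in practice, assuming a local Ollama server on its default port and a hypothetical "deephermes" model tag (swap in whatever runner and tag you actually use):

    ```python
    # Minimal sketch: send the special CoT system prompt to a local Ollama server.
    # The "deephermes" model tag is an assumption -- substitute your own tag.
    import requests

    COT_PROMPT = (
        "You are a deep thinking AI, you may use extremely long chains of thought "
        "to deeply consider the problem and deliberate with yourself via systematic "
        "reasoning processes to help come to a correct solution prior to answering. "
        "You should enclose your thoughts and internal monologue inside <think> "
        "</think> tags, and then provide your solution or response to the problem."
    )

    def ask(question: str, thinking: bool = True) -> str:
        messages = []
        if thinking:
            # With the system prompt the model emits <think>...</think> before
            # its final answer; without it you get a normal short reply.
            messages.append({"role": "system", "content": COT_PROMPT})
        messages.append({"role": "user", "content": question})
        resp = requests.post(
            "http://localhost:11434/api/chat",
            json={"model": "deephermes", "messages": messages, "stream": False},
            timeout=600,
        )
        resp.raise_for_status()
        return resp.json()["message"]["content"]

    print(ask("Why is the sky blue?", thinking=True))
    ```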







  • The most useful thing my LLM has done is help me with hobbyist computer coding projects and ask advanced STEM questions. I try to use my LLM to parse code that I'm unfamiliar with and to understand how the functions translate to actual things happening. I give it an example of functioning code and ask it to adapt the logic a certain way to see how it goes about it. I have to parse a large, very old legacy codebase written in many parts by different people of different skill levels, so just being able to understand what block does what is a big win some days. Even if its solutions aren't copy/paste ready, I usually learn quite a lot just seeing what insights it can glean from the problem. Actually, I prefer when I have to clean it up, because it feels like I still did something to refine and sculpt the logic in a way the LLM can't.

    I don't want to be a stereotypical 'vibe coder' who copies and pastes without being able to bug-fix or understand the code they're putting in. So I ask plenty of questions and read through its reasoning for thousands of words to understand the thought processes that led to functioning changes. I try my best to understand the code and clean it up. It is nice to have a second brain help with the initial boilerplating and piecing together the general flow of logic.

    I treat it like a teacher and an editor. But it's got limits like any tool and needs a sweet spot of context, example, and circumstance for it to work out okay.



  • Hi! So here's the rundown.

    You are going to need to be willing to learn how computer programs send text messages to each other over open ports, how to call an API from a programming script, and slowly piece together how to work with Ollama's external API and tool-calling functions. Here's the documentation.

    Essentially you need to:

    1. Learn how the Ollama external API works: how to send it text data using a basic Python program over an open port and receive data back to put into a text file.

    2. Learn how to make that Python program pull weather and time data from OpenWeather.

    3. Learn how to feed that weather and time data into Ollama on an open port as part of a tool-calling function. A tool call is a fancy system prompt that tells the model how to interface with the data in a well-defined, parameterized way: you say a keyword like 'get weather', it sends a request to your Python program, which gets data from OpenWeather and sends it back in a way the LLM is instructed to process. (A rough sketch of the whole flow is below, after the example link.)

    example: https://medium.com/@ismalinggazein/openai-function-calling-integrating-external-weather-api-6935e5d701d3

    Unless you are already a programmer who works with sending and receiving data over the internet to be processed, this is a non-trivial task that requires a lot of experimentation and getting your hands dirty with ports and coding languages. I'm currently getting ready to delve into this myself, so I know it can all feel overwhelming. Hope this helps.
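
    Here is a rough, untested sketch of how those three steps could fit together in one Python script. The model tag ("llama3.1"), the API key placeholder, and the example city are illustrative assumptions, not part of any official example:

    ```python
    # Rough sketch: expose a "get_weather" tool to a local Ollama server and
    # answer a question using live OpenWeather data. Adjust the model tag and
    # API key to your own setup.
    import requests

    OLLAMA_URL = "http://localhost:11434/api/chat"   # default Ollama port
    OPENWEATHER_KEY = "YOUR_API_KEY_HERE"            # from openweathermap.org

    def get_weather(city: str) -> dict:
        """Step 2: pull current weather for a city from OpenWeather."""
        r = requests.get(
            "https://api.openweathermap.org/data/2.5/weather",
            params={"q": city, "appid": OPENWEATHER_KEY, "units": "metric"},
            timeout=30,
        )
        r.raise_for_status()
        data = r.json()
        return {"city": city,
                "temp_c": data["main"]["temp"],
                "conditions": data["weather"][0]["description"]}

    # Step 3: describe the tool so the model knows when and how to call it.
    TOOLS = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    def chat(messages):
        """Step 1: send messages to Ollama over its HTTP API and get a reply."""
        r = requests.post(OLLAMA_URL, json={
            "model": "llama3.1", "messages": messages,
            "tools": TOOLS, "stream": False,
        }, timeout=300)
        r.raise_for_status()
        return r.json()["message"]

    messages = [{"role": "user", "content": "What's the weather like in Toronto?"}]
    reply = chat(messages)

    # If the model decided to call the tool, run it and hand the result back
    # (Ollama returns the tool arguments as an already-parsed object).
    for call in reply.get("tool_calls", []):
        if call["function"]["name"] == "get_weather":
            result = get_weather(call["function"]["arguments"]["city"])
            messages.append(reply)
            messages.append({"role": "tool", "content": str(result)})
            reply = chat(messages)

    print(reply["content"])
    ```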






  • SmokeyDope@lemmy.world to Selfhosted@lemmy.world: lightweight blog?

    Would something like this interest you? Gemtext formatted to HTML is about as lightweight as it gets. There is lots of automatic gemtext blog software on GitHub that also formats and mirrors an HTML copy. Whenever a news page article gets rendered to gemtext through newswaffle, it shrinks about 95-99% of the page size while keeping the text intact. Let me know if you want some more information on Gemini stuff.
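
    To give a sense of why the conversion stays so small, here is a toy sketch of a gemtext-to-HTML pass (not any particular blog tool's code, just an illustration that each gemtext line maps to at most one HTML tag):

    ```python
    # Toy gemtext -> HTML converter. List items are left as bare <li> for brevity.
    import html

    def gemtext_to_html(gemtext: str) -> str:
        out, in_pre = [], False
        for line in gemtext.splitlines():
            if line.startswith("```"):
                # Preformatted block toggle.
                in_pre = not in_pre
                out.append("<pre>" if in_pre else "</pre>")
            elif in_pre:
                out.append(html.escape(line))
            elif line.startswith("=>"):
                # Link line: "=> URL optional label"
                parts = line[2:].strip().split(maxsplit=1)
                url = parts[0] if parts else ""
                label = parts[1] if len(parts) > 1 else url
                out.append(f'<p><a href="{html.escape(url)}">{html.escape(label)}</a></p>')
            elif line.startswith("#"):
                level = len(line) - len(line.lstrip("#"))
                text = html.escape(line.lstrip("#").strip())
                out.append(f"<h{level}>{text}</h{level}>")
            elif line.startswith("* "):
                out.append(f"<li>{html.escape(line[2:])}</li>")
            else:
                out.append(f"<p>{html.escape(line)}</p>" if line.strip() else "")
        return "\n".join(out)
    ```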





  • The community consensus is that bandwidth is key. An extra 300 GB/s is a big bonus.

    You're taking a small amount of risk with second-hand, but laptops generally last a good long while, and an extra thousand or so off for a better-performing product is a decent deal IMO. I've been using my second-hand laptop for over a decade now, so I would probably be biased toward taking the dice roll on used after testing it out in person.

    I hope it works out for you! Man, I wish I could get my rig upgraded on the cheap, but the GPU market is insane. I wonder how the Nvidia DIGITS or AMD Strix will compare to a Mac?







  • I've tried the official DeepSeek R1 distill of Qwen 2.5 14B and a few unofficial Mistrals trained on R1 CoT. They are indeed pretty amazing, and I found myself switching between a general-purpose model and a thinking model regularly before this released.

    DeepHermes is a thinking-model family with R1-distilled CoT that you can toggle between standard short output and spending a few thousand tokens thinking about a solution.

    I found that pure thinking models are fantastic for asking certain kinds of problem solving questions, but awful at following system prompt changes for roleplay scenarios or adopting complex personality archetypes.

    This lets you have your cake and eat it too by making CoT optional while keeping regular system-prompt capabilities.

    The thousands of tokens spent thinking can get time-consuming when you're only getting 3 t/s on the larger 24B models, so it's important to choose between a direct answer or spending five minutes letting it really think. Its abilities are impressive even if it takes 300 seconds to fully think out a problem at 2.5 t/s.

    That's why I am so happy the 8B model is pretty intelligent with CoT enabled: I can fit a thinking model entirely in VRAM, and it's not dumb as rocks in its knowledge base either. I'm getting 15-20 t/s with the 8B instead of 2.5-3 t/s partially offloading a larger model. A roughly 6.4x speed increase on the CoT is a huge W for my real-life human time spent waiting for a complete output.
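
    For a rough sense of what that means in wall-clock terms, here is a back-of-the-envelope check using the midpoints of the rates quoted above (the token count is illustrative, not a measurement):

    ```python
    # Estimate waiting time for the same thinking budget at two token rates.
    thinking_tokens = 750   # roughly 300 s of CoT at 2.5 t/s, per the numbers above
    for label, rate in [("24B partially offloaded", 2.75), ("8B fully in VRAM", 17.5)]:
        print(f"{label}: ~{thinking_tokens / rate:.0f} s for the same thinking budget")
    # ~273 s vs ~43 s -- roughly the 6.4x difference in waiting time.
    ```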