• 13 Posts
  • 19 Comments
Joined 2 years ago
Cake day: July 1st, 2023

  • Thanks for sharing your nice project ThreeJawedChuck!

    I feel like a little bit of prompt engineering would go a long way.

    To explain: a model's base personality tends to be aligned into the "AI chat assistant" archetype. Models are encouraged to be positive yes-men whose goal is assisting the user and pleasing them with pleasantries along the way.

    They do not need to be this way, though. With a system prompt you can directly instruct the model to alter its personality, or tell it exactly how to structure things. In this context, tell it something like

    "You are a dungeon master with the primary goal of weaving an interesting and coherent story in the ‘dungeons and dragons’ universe. Your secondary goal is ensuring game rules are generally followed correctly.

    You are not a yes-man. You are dominant and in control of the situation. You may argue and challenge users as needed when negotiating game actions.

    Your players want a believable and grounded setting without falling into the tropes of main character syndrome or becoming Mary Sues. Make sure that their adventures remain grounded and the world their characters live in remains largely indifferent to their existence."

    This eats into a little bit of context but should change things up a little.
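    To make the system-prompt idea concrete, here is a minimal sketch of building such a request. It assumes kobold.cpp's OpenAI-compatible chat endpoint (default port 5001); the port and exact endpoint depend on your setup, so treat the URL as an example, not gospel.

```python
import json

# A condensed version of the dungeon-master system prompt above.
DM_PROMPT = (
    "You are a dungeon master with the primary goal of weaving an "
    "interesting and coherent story in the 'Dungeons and Dragons' universe. "
    "You are not a yes-man; you may argue and challenge users as needed."
)

def build_chat_request(user_message: str, system_prompt: str = DM_PROMPT) -> str:
    """Build the JSON body for a chat completion with a custom system prompt."""
    payload = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.8,   # sampler settings ride along in the same request
        "max_tokens": 512,
    }
    return json.dumps(payload)

# To actually send it (untested sketch, adjust the URL to your engine):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:5001/v1/chat/completions",
#     data=build_chat_request("I attack the dragon!").encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read())
```

    The nice part is that the system message persists across turns, so the personality steering does not depend on the user restating it.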

    You may make the model more creative and outlandish or more rigid and predictable by adjusting sampler settings.

    Consider finding a PDF or an EPUB of an old D&D manual, converting it to text, and putting it into your engine's RAG system so it can directly reference the D&D rules.
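    If you are curious what the retrieval half of RAG looks like under the hood, here is a toy sketch: chunk the converted manual and pull the most relevant chunks into the prompt. Real engines use embedding similarity; the keyword-overlap scoring here is purely illustrative and not any particular engine's API.

```python
def chunk_text(text: str, size: int = 400) -> list[str]:
    """Split a document into roughly fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the question."""
    q = set(question.lower().split())
    return sorted(chunks,
                  key=lambda c: len(q & set(c.lower().split())),
                  reverse=True)[:k]

# Whatever comes back is what you'd prepend to the model's prompt
# as reference material, e.g.:
# context = "\n".join(top_chunks("how does grappling work", chunk_text(manual)))
```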

    Be wary of context limits. No matter what model makers tell you, 16-32k tokens is a reasonable limit to expect when it comes to models keeping coherent track of things. A good idea is to keep important information you don't want the model to forget in a text file and give it a refresher on relevant context when it starts getting confused about who did what.
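    The refresher trick can even be automated. This is a rough sketch under stated assumptions: the filename is hypothetical, the 24k-token budget matches the coherence limit mentioned above, and the 4-characters-per-token ratio is a common rule of thumb for English, not an exact count.

```python
# Keep key campaign facts in a plain text file (e.g. "campaign_notes.txt")
# and re-inject them once the chat history grows near the budget.
TOKEN_BUDGET = 24_000

def rough_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English."""
    return len(text) // 4

def maybe_refresh(history: str, notes: str) -> str:
    """Prepend the notes as a refresher once history nears the budget."""
    if rough_tokens(history) > TOKEN_BUDGET * 3 // 4:
        # Keep only the recent tail of the history plus the notes.
        return f"[Campaign notes refresher]\n{notes}\n\n{history[-40_000:]}"
    return history
```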

    Chain-of-thought reasoning models may also give an edge when it comes to thinking more deeply about the story and how its interactions fit together. The downside is that they take extra time and compute to think things through.

    I never tried SillyTavern but know it's meant for roleplaying with character cards. I always recommend kobold since it's what I know best, but there's more than one way to do things.


    I'm not really into politics enough to say much about the accuracy of labels. I've been on Lemmy long enough to see many arguments and debates between political people about what being 'anarchist' or 'leftist' or 'socialist' really means, writing heated five-paragraph essays back and forth over which labels properly define which concepts, and so on.

    It seems political bias is one of those things nobody can really agree on, because it boils down to semantic/linguistic arguments over redefining ultimately arbitrarily drawn labels. Arguers also tend to get emotional about it, since it largely deals with subjective beliefs and their own identity politics over which groups they want to be seen as part of, which can produce some level of mental gymnastics.

    It's a whole can of worms that is a nightmare to navigate semantically in a traditional sense, let alone to analyze mathematically by plotting data in matrices and extracting range values. I can't imagine the headache of producing a numerical 0-100 rating of 'political bias' in a chart like UGI. Politically minded people can't really agree on what the terms mean, so the data scientists trying to objectively analyze this stuff for LLM benchmarking get shafted when designing a concrete measurement system. The 12Axes test they use is kind of interesting to read through in itself.


    From what I've seen in arguments about this, Plex is generally more accessible, with QoL features and an easier-to-understand interface for non-techie people to share with family and friends. Something that's hard for nerdy people to understand is that average people are perfectly fine paying for digital goods and services. An older, well-off normie has far more money than sense and will happily pay a premium just to avoid rubbing two braincells together during setup, or for a nicer quality of experience. If you figure out how to make a genuinely useful plug-and-play service that works without an end user of average intelligence or domain knowledge stressing about setup, maintenance, or confusing layouts, you've created digital gold.

    This isn't the fault of open-source services; you can only expect so much polish from non-profit volunteers. It's just the nature of consumer laziness, the expectation of professional product standards, and the path of least resistance.






  • they don’t really write the model is having an Aha moment due to some insight it had.

    Well, they really can't write it that way, because it would imply the model is capable of insight, which is a function of higher cognition. That path leads to questioning whether machine-learning neural networks are capable of any real sparks of sapience or sentience. That's a 'UGI' conversation most people absolutely don't want to have at this point, for various practical, philosophical, and religious/spiritual reasons.

    So you can't just outright say it, especially not in an academic STEM paper. Science academia has a hard bias against the implication of anything metaphysical or overly abstract; at best they will say it 'simulates some cognitive aspects of intelligence'.

    In my own experience, the model at least says 'ah! aha! Right, right, right, so…' when it thinks it has had an insight of some kind. Whether models are truly capable of such a thing, or it is merely a statistical text-prediction artifact, is a subjective philosophical discussion; kind of a computer science nerd's version of the deterministic philosophical-zombie argument.

    Thanks for sharing the video! I haven't watched Computerphile in a while; I'll take a look, especially with that title. Gotta learn about dat forbidden computation :)


    Good engineers are figuring out more energy- and compute-efficient ways to train models all the time. Part of the original DeepSeek hype was that they not only cooked a competitive model but did it with a fraction of the energy and compute needed by their competition. On the local-hosting side, computer hardware is also getting more energy efficient over time: graphics cards not only improve in speed but also slowly reduce the power needed for the same compute.

    AI is a waste of energy

    It depends on where that energy comes from, how it is used, and the bias of the person judging its usage. When the energy comes from renewable sources without burning more emissions into the air, and the computation results in useful work that improves people's daily lives, I argue it's worth the watt-hours. Especially in a local context, with devices that take less power than a kitchen appliance for inference.

    Greedy programmer-type tech bros without a shred of respect for human creativity, bragging about models taking away artists' jobs, couldn't create something with the purpose of helping anyone but themselves if their lives depended on it. But society does run on the software stacks and databases they create, so it can be argued that LLMs spitting out functioning code and acting as a local Stack Exchange are useful enough. That also gives birth to vibe coders who over-rely on them without being able to think for themselves.

    Besides the loudmouth Silicon Valley inhabitants, though, there's real work being done in private sectors you and I probably don't know about.

    My local college is researching the use of vision/image-based models to examine billions of cancer cells, potentially identifying new, subtle patterns for screening. Is cancer research a waste of energy?

    I would one day like to prototype a way to make smart glasses useful for blind people by having a multimodal model look through the camera for them and transmit a description of what it sees through braille vibration pulses. Is prototyping accessibility tools for the disabled a waste of energy?

    trying to downplay this cancer on society is dangerous

    "Cancer on society" is hyperbole that reveals you're coming at us from a place of emotional antagonism. It's a tool, one with great potential if it's used right, and that responsibility is on us. Right now it's an expensive tool to create, which is the biggest problem, but:

    1. Once it's trained, it can be copied and shared indefinitely, potentially for many thousands of years on the right mediums or with tradition.

    2. Training methods will improve efficiency-wise through better computational strategies or better materials.

    3. Using and hosting the tool at the local level takes the same power draw as whatever device you already use, from a phone to a gaming desktop.

    In a slightly better timeline, where people cared more about helping each other than growing their own wealth, and American megacorporations were held at least a little accountable by real government oversight, companies like Meta and OpenAI would have gotten more than a slap on the wrist for training the original models on copyright-infringing data, and the tech bros would be interested in making real tools to help people in an energy-efficient way.

    ai hit a wall

    Yes and no. Increasing parameter size past the current biggest models seems not to bring big benchmark improvements, though there may be subtler improvements in abilities the tests don't capture.

    The only one really guilty of throwing energy and parameters at the wall hoping something would stick is Meta, with the latest Llama 4 release. Everyone else has sidestepped this by improving models with better fine-tuning datasets, baking in chain-of-thought reasoning, and multimodality (vision, hearing, and text all in one). There are still so many improvements being made in other ways, even if just throwing parameters at it eventually peters out like Moore's law.

    The world burned long before AI and even computers, and it will continue to burn long after. Most people are excessive, selfish, and wasteful by nature. Islands of trash in the ocean, the ozone layer nearly destroyed for refrigerants and hair sprays, the icecaps melting, god knows how many tons of oil burned in cars or spilled in the oceans.

    Political environmentalists have done the math on just how much carbon, water, and material has been spent on every process born since the industrial revolution. Spoilers: none of the numbers are good. Model training is just the latest thing for these kinds of people to grasp onto for blame games.


  • Thanks for the suggestion and sharing your take :)

    The 'okay…', 'hold on,', 'wait…', 'let me read over it again…' is part of DeepSeek's particular reasoning pattern. I think the aha moments are more of an emergent expression it sometimes produces than an intended step of the reasoning pattern, but I could be mistaken. I find that lower-parameter models suffer from not being able to follow their own reasoning patterns and quickly confuse themselves into wrong answers and hallucinations. It's a shame, because 8B models are fast and fit on a lot of GPUs, but they just can't keep track of all the reasoning across many thousands of tokens.

    The best luck I've had was with bigger models trained on the reasoning patterns that can also switch them on and off by toggling the <think> tag. My go-to model the past two months has been DeepHermes 24B, which is toggleable DeepSeek-style reasoning baked into Mistral Small 3. It's able to work out answers to tough logical questions, putting its reasoning together properly after thinking for many, many tokens. It's not perfect, but it gets things right more often than it gets them wrong, which is a huge step up.
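    A small practical detail when working with these toggleable reasoning models: the chain of thought arrives wrapped in <think>…</think> tags, and most front-ends strip it before showing the final answer. A minimal sketch of that post-processing step, assuming the tag format described above:

```python
import re

# Reasoning models in the DeepSeek/DeepHermes style emit their chain of
# thought inside <think>...</think>; remove it to keep only the answer.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_reasoning(output: str) -> str:
    """Remove <think> blocks so only the final answer remains."""
    return THINK_RE.sub("", output).strip()

raw = "<think>Let me check: 12 * 12 = 144...</think>The answer is 144."
print(strip_reasoning(raw))  # -> The answer is 144.
```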




    I volunteer as a developer for a decade-old open-source project. A sizable amount of my contribution is just cooking up decent documentation, or rewriting old docs from the original module authors, written close to a decade ago, because they failed me information-wise when I needed them. Programmers, as it turns out, are very 'eh, the code should explain itself to anyone with enough brains to look at it' type people. They're so lost in the sauce of being hyper-fluent tech nerds, instantly understanding every variable, function, parameter, and bit of syntax at first glance at source code, that they forget the need for re-translation into regular human speak for people of varying skill levels who can barely navigate the command line.


    There's a very vocal subset of the AI-hater Lemmy population that thinks:

    1. the only machine-learning models are ones made by megacorporations like Facebook and OpenAI using stolen internet data
    2. model creators in 2025 are still using stolen, scraped, unfiltered internet data for training datasets

    There are plenty of models trained on completely open public-domain information and released under a permissive license. This isn't the era of Tay-style Twitter-garbage-fed slop models anymore. All the newest models are trained on 90% synthetic data and 10% RLHF done by contracted-out educators with degrees making a quick buck through easy remote work.

    But that doesn't matter to the emotionally and politically charged Lemmy leftists with liberal-arts degrees who don't care to understand the realities behind machine learning.

    No, the modern AI bubble begins and ends for them with their art being stolen by Facebook/Meta without so much as a slap on the wrist from the government, then having Stable Diffusion rubbed in their faces, automation threatening their livelihoods, by smug, greedy tech bros without a shred of respect for human creativity.

    So in retaliation, the Lemmings throw tantrums in the comments of every AI-gen post, babbling about how the newest batch of digital tools to cut down manual work is destroying everything, and clutch onto the vengeance fantasy that they can still 'poison the AI that stole my work!' by saying the magic words, like an SCP cognitohazard.

    The reality is that the only ones still scraping your slop are ad sellers and big brother, while the only human data being fed into modern ChatGPT comes from someone with an associate's degree in an academic field.

    I've chosen to allow the comment to stay in this scenario, as I don't believe in censorship, especially if the post isn't against stated guidelines. I am against fostering echo chambers.

    However, c/localllama was always intended to be a small island of safe space for ML enthusiasts to talk about and share the hobby in a positive, constructive way, without fear of being attacked or shit on by the general Lemmy population, who just don't get what we do here except that we support 'AI'. Haters who don't understand can go to literally any other community to circlejerk without pushback; I think a few fuckAI communities exist just for that purpose. So if these kinds of cloak-and-dagger, wink-wink-nudge-nudge antagonistic comments about 'poisoning teh AI!' become more common, I'll update the guidelines and start enforcing them appropriately.



    No worries, I have my achthually… moments as well. Though here's a counter-perspective: the bytes have to be pulled out of abstraction space and actually mapped to a physical medium capable of storing huge amounts of informational state, like a hard drive. It takes genius STEM-engineer-level human cognition and lots of compute power to create a dataset like WA. This makes the physical devices housing the database unique, almost one-of-a-kind objects with immense potential value for business and consumer use. How much would a wealthy competing business owner pay for a drive containing such lucrative trade secrets, assuming it's not leaked? Probably more than a comparatively weighed brick of gold, but that's just fun speculation.



    Models running from GGUF files should all work with your GPU, assuming it's set up correctly and the model is properly loaded into VRAM. It shouldn't matter if it's Qwen or Mistral or Gemma or Llama or LLaVA or Stable Diffusion. Maybe the engine you are using isn't properly configured to use your Arc card, so it's all just running on your regular RAM, which limits things? Idk.

    An Intel Arc GPU might work with kobold and Vulkan without any extra technical setup. It's not as deep in the rabbit hole as you may think; a lot of work was put into making one-click executables with nice GUIs that the average person can work with…

    Models

    Find a bartowski-made quantized GGUF of the model you want to use. Q4_K_M is the recommended average quant to try first. Try to make sure it all fits within your card, size-wise, for speed. That shouldn't be a big problem for you with 20 GB of VRAM to play with. Hugging Face gives the size in GB next to each quant.

    Start small, with a high quant of Qwen 3 8B. Then a Gemma 12B, then work your way up to a medium quant of DeepHermes 24B.

    Thinking models are better at math and logical problem solving. But you need to know how to communicate and work with LLMs to get good results no matter what. Ask one to break down a problem you already solved and test it for comprehension.

    kobold engine

    Download kobold.cpp, execute it like a regular program, and adjust settings in the graphical interface that pops up. Or make a startup script with flags.

    For the processing backend, see if Vulkan works with Intel Arc. Make sure flash attention is enabled too. Offload all layers of the model. I make note of exactly how many layers each model has during startup and specify it, but it should figure it out smartly even if you don't.
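    A startup-script version of the settings above might look like this. The flag names are from memory of koboldcpp's CLI and may differ by version, so check `koboldcpp --help` for the real spelling before relying on them.

```python
import shlex

def kobold_command(model_path: str, gpu_layers: int = 999) -> str:
    """Assemble a kobold.cpp launch command for Vulkan on an Arc card."""
    args = [
        "python", "koboldcpp.py", model_path,
        "--usevulkan",                   # Vulkan backend, works on Intel Arc
        "--flashattention",              # enable flash attention
        "--gpulayers", str(gpu_layers),  # large number = offload every layer
        "--contextsize", "16384",
    ]
    return shlex.join(args)

print(kobold_command("deephermes-24b.Q4_K_M.gguf"))
```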


  • SmokeyDope@lemmy.world to LocalLLaMA@sh.itjust.works: Specialize LLM

    I would recommend you read over the work of the person who fine-tuned a Mistral model on many US Army field guides, to understand what fine-tuning on a lot of books to bake in knowledge looks like.

    If you are a newbie just learning how this technology works, I would suggest trying to get RAG working with a small model and one or two books converted to a big text file, just to see how it works. It's cheap/free to just do some tool calling and fill up a model's context.

    Once you have a little more experience, and if you are financially well off enough that one or two thousand dollars to train a model is who-cares play money to you, then go for fine-tuning.


    It is indeed possible! The nerd speak for what you want to do is 'fine-tune training with a dataset', the dataset being your books. It's a non-trivial task that takes setup and money to pay a training provider for their compute. There are no guarantees it will come out the way you want on the first bake, either.

    A softer version of this that's the big talk right now is RAG, which is essentially a way for your LLM to call and reference an external dataset, recalling information into its active context. It's a useful tool worth looking into, much easier and cheaper than model training, but while your model can recall information with RAG, it won't really build an internal understanding of that information within its abstraction space. It's the difference between being able to recall a piece of information and internally understanding the concepts it conveys. RAG is for rote memorization; training is for deeper abstraction-space mapping.
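    To make the contrast concrete, here is a sketch of what each approach consumes. The chat-format JSONL shown for fine-tuning is a common convention, but exact field names vary by training framework, so treat the schema as illustrative.

```python
import json

def finetune_row(question: str, answer: str) -> str:
    """One JSONL line of a supervised fine-tuning dataset (built from
    your books): the model is trained to produce the answer itself."""
    return json.dumps({
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    })

def rag_prompt(question: str, retrieved: str) -> str:
    """The RAG alternative: no training at all, just paste the retrieved
    book excerpt into the prompt at inference time."""
    return f"Using this excerpt:\n{retrieved}\n\nAnswer: {question}"
```

    Fine-tuning needs thousands of rows like the first; RAG needs only a retrieval step and the second.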





    It's all about RAM and VRAM. You can buy some cheap RAM sticks, get your system to something like 128 GB of RAM, and run a low quant of the full DeepSeek. It won't be fast, but it will work. If you want fast, you need to get the model into graphics-card VRAM, ideally all of it. That's where the high-end Nvidia stuff comes in: 24 GB of VRAM all on the same card at maximum bandwidth. Some people prefer Macs or data-center cards. You can use AMD cards too; they're just not as well supported.

    LocalLlama users tend to use smaller models than the full DeepSeek R1, ones that fit on older cards. A 32B partially offloaded between an older graphics card and RAM sticks is around the limit of what a non-dedicated hobbyist can achieve with their existing home hardware. Most are really happy with the performance of Mistral Small, Qwen QwQ, and the DeepSeek distills. Those who want more have the money to burn on multiple Nvidia GPUs and a server rack.

    LLM-wise, your phone can run 1-4B models, your laptop 4-8B, and an older gaming desktop with a 4-8 GB VRAM card around 8-32B. Beyond that needs the big, expensive 24 GB cards, and further beyond needs multiples of them.
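    You can ballpark these size brackets yourself: a GGUF file is roughly parameters times bits-per-weight divided by 8, plus a couple of GB of headroom for the KV cache. The bit widths below are approximate averages for each quant type, so treat the numbers as rules of thumb, not exact figures.

```python
# Approximate effective bits per weight for common GGUF quant types.
QUANT_BITS = {"Q8_0": 8.5, "Q6_K": 6.6, "Q4_K_M": 4.8, "Q3_K_M": 3.9}

def model_gb(params_billions: float, quant: str) -> float:
    """Approximate GGUF file size in GB for a given quant."""
    return params_billions * QUANT_BITS[quant] / 8

def fits(params_billions: float, quant: str, vram_gb: float,
         headroom_gb: float = 2.0) -> bool:
    """Does the model plus KV-cache headroom fit in this much VRAM?"""
    return model_gb(params_billions, quant) + headroom_gb <= vram_gb

print(round(model_gb(24, "Q4_K_M"), 1))  # ~14.4 GB for a 24B model
print(fits(24, "Q4_K_M", 20))            # fits on a 20 GB card
```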

    Stable Diffusion models in my experience are very compute-intensive. Quantization degradation is much more apparent, so you should have VRAM, a high-quant model, and a canvas size kept as low as tolerable.

    Hopefully we will get cheaper devices meant for AI hosting, like cheaper versions of Strix Halo and DIGITS.