Hey everybody. I’m just getting into LLMs. Total noob. I started using llama-server’s web interface, but I’m experimenting with a frontend called SillyTavern. It looks much more powerful, but there’s still a lot I don’t understand about it, and some design choices I found confusing.

I’m trying the Harbinger-24B model to act as a D&D-style DM, and to run one party character while I control another. I tried several general-purpose models too, but I felt that Harbinger, a purpose-built adventure model, was noticeably superior for this.

I’ll write a little about my experience with it, and then some thoughts about LLMs and D&D. (Or D&D-ish; I’m not fussy about the exact thing, I just want that flavour of experience.)

General Experience

I’ve run two scenarios. My first try was a 4/10 for personal satisfaction, and the second was an 8/10. I made no changes to the prompts or settings between runs, so the difference is all down to the story the model settled into. I’m trying not to give the model any story details, so it makes everything up and I don’t know about it in advance. The first story the model invented was so-so. The second was surprisingly fun: it had historical intrigue, a tie-in to a dark family secret from the ancestors of the AI-controlled character, and dungeon-diving that mattered to the overarching story. Solid marks.

My suggestion for others trying this: if you don’t get a story you like out of the model, try a few more times. You might land something much better.

The Good

Harbinger provided a nice mixture of combat and non-combat. I enjoy combat, but I also like solving mysteries and advancing the plot by talking to NPCs or finding a book in the town library, as long as it feels meaningful.

It writes fairly nice descriptions of the areas you encounter, and of the AI-run character’s thoughts.

It seems to know D&D spells and abilities, and lets you use them in creative but very reasonable ways that you could in a pen-and-paper game but can’t in a standard CRPG engine. It might let you get away with too much, though, so you have to keep yourself honest.

The Bad

You may have to try multiple times until the RNG gives you a nice story. You could also inject a story into the base prompt, but I want the LLM to act as a DM for me, where I’m going in completely blind. Also, in my first (4/10) game, the LLM forced really bad “main character syndrome” on me. The whole thing was about me, me, me, I’m special! I found that off-putting, but the second (8/10) attempt wasn’t like that at all.

As an LLM, it’s loosey-goosey about things like inventory, spells, rules, and character progression.

I had a difficult time giving the model OOC (out-of-character) instructions; OOC comments tended to be “heard” by the other characters.

Thoughts about fantasy-adventure RP and LLMs

I feel like the LLM is very good at providing descriptions, situations, and locations. It’s also very good at understanding how you’re trying to be creative with abilities and items, and it lets you solve problems in creative ways. It’s more satisfying than a normal CRPG engine in this way.

As an LLM though, it lets you steer things in ways you shouldn’t be able to in an RPG with fixed rules: it won’t reliably disallow a spell you don’t know, or remember how many feet of rope you’re carrying. I enjoy the character leveling and crunchy-stats part of pen-and-paper or CRPGs, and I haven’t found a good way to get the LLM to do that without just handling everything manually and whacking it into the context.

That leads me to think that using an LLM for creativity inside a non-LLM framework that enforces rules, stats, spells, inventory, and abilities might be phenomenal. Maybe AI Dungeon does that? I’ve never tried it, and anyway I want to stay local. A hybrid system like that might be scriptable somehow, but I’m too much of a noob to know.
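To make the idea concrete, here’s a minimal sketch of what such a hybrid could look like: a plain rules layer that owns the hard state (inventory, known spells, dice) and refuses impossible actions before any text ever reaches the LLM, which would only handle narration. All class and method names here are invented for illustration, and the actual LLM call is left out entirely.

```python
import random

# Hypothetical sketch of a non-LLM rules layer. The LLM would only narrate
# actions this engine has already validated; it never owns the game state.
class RulesEngine:
    def __init__(self, inventory, known_spells):
        self.inventory = dict(inventory)      # item -> quantity
        self.known_spells = set(known_spells)

    def cast(self, spell):
        """Refuse spells the character hasn't learned."""
        if spell not in self.known_spells:
            return f"[RULES] You don't know {spell}. Action refused."
        return f"[OK] {spell} is cast."       # validated action goes into the LLM prompt

    def use_item(self, item, qty=1):
        """Decrement inventory, refusing impossible uses."""
        if self.inventory.get(item, 0) < qty:
            return f"[RULES] You don't have {qty}x {item}."
        self.inventory[item] -= qty
        return f"[OK] Used {qty}x {item} ({self.inventory[item]} left)."

    def roll_check(self, dc, modifier=0):
        """A classic d20 check the LLM can't fudge."""
        return random.randint(1, 20) + modifier >= dc

engine = RulesEngine({"rope (ft)": 50}, {"Magic Missile"})
print(engine.cast("Fireball"))          # refused: spell not known
print(engine.use_item("rope (ft)", 30)) # 20 ft of rope remain
```

The point of the design is that the engine’s refusals and results get appended to the context, so the LLM narrates around hard facts instead of inventing them.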

  • brucethemoose@lemmy.world · 21 hours ago

    Also, another suggestion would be to be careful with your sampling. Use a low temperature and high MinP for queries involving rules, and a higher temperature (plus samplers like DRY) when you’re trying to tease out interesting ideas.
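As a sketch of that two-mode approach against a local llama.cpp server: the `/completion` endpoint accepts sampling fields per request, so you can keep two parameter sets and pick one per query. The specific values below are assumptions, not recommendations.

```python
# Hypothetical sketch: per-request sampler presets for llama.cpp's /completion
# API. Field names (temperature, min_p, dry_multiplier) follow llama.cpp;
# the values themselves are illustrative guesses.
RULES_SAMPLING = {            # near-deterministic: rules, inventory queries
    "temperature": 0.2,
    "min_p": 0.1,
}
CREATIVE_SAMPLING = {         # looser: plot hooks, descriptions
    "temperature": 1.0,
    "min_p": 0.05,
    "dry_multiplier": 0.8,    # enable the DRY anti-repetition sampler
}

def build_request(prompt, mode):
    """Assemble a /completion payload with the preset for this query type."""
    params = RULES_SAMPLING if mode == "rules" else CREATIVE_SAMPLING
    return {"prompt": prompt, "n_predict": 256, **params}

req = build_request("What is my AC with mage armor?", "rules")
```

You’d then POST that dict as JSON to the running server; per-request fields override the server’s launch-time defaults.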

    I would even suggest an alt frontend like mikupad that exposes token probabilities, so you can go to any point in the reply and look through every “idea” the LLM had internally (and regenerate from that point if you wish). It’s also good for debugging sampling issues when you get an incorrect answer (sometimes the LLM gets it right internally, but bad sampling parameters pick a bad token).

    • ThreeJawedChuck@sh.itjust.works (OP) · 20 hours ago

      Ah, great idea about the low temp for rules and high for creativity. I guess I can easily change it in the frontend, although I also set the temperature when I start the server, and I’m not sure which one takes priority. Hopefully the frontend does, so I can tweak it easily.

      Also, your post just got me thinking about the DRY sampler, which I’m using but which might be causing trouble in cases where the model legitimately should repeat itself, like an !inventory or !spells command. I might try either disabling it or adding a custom sequence breaker, like the ! character.
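For what it’s worth, llama.cpp’s `/completion` API exposes that knob as `dry_sequence_breakers`; a sketch of extending the defaults with `!` (the default breaker list shown here is my understanding of llama.cpp’s, so treat it as an assumption):

```python
# Hypothetical sketch: tell the DRY sampler to reset at "!" so repeated
# "!inventory"-style commands aren't penalized as repetition.
DEFAULT_BREAKERS = ["\n", ":", "\"", "*"]   # assumed llama.cpp defaults

payload = {
    "prompt": "!inventory",
    "dry_multiplier": 0.8,                  # DRY enabled
    "dry_sequence_breakers": DEFAULT_BREAKERS + ["!"],
}
```

Sequences after a breaker character start a fresh repetition window, which is exactly what a command prefix needs.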

      I think ST can show token probabilities, so I’ll try that too, thanks. I have so much to learn! I really should try other frontends, though. ST is powerful in a lot of ways, like dynamic management of the context, but there are other things I don’t like as much. It attaches a lot of info to a character that I don’t feel should be a property of a character. And all my D&D scenarios so far have been just me plus one AI character, because even though ST has a “group chat” feature, I find it cumbersome and kind of annoying. It feels like the frontend was designed around a single AI character first, and group chat got glued on to work around that limitation.

      • brucethemoose@lemmy.world · 18 hours ago

        One more thing: I saw you mention context management.

        Mistral (24B) models are really bad at long context, but that’s not true of every model. I find that Qwen 32B and Gemma 27B are solid at 32K (which is a huge body of text), and with the right backend settings you can easily run either at 64K with very minimal VRAM overhead.

        Specifically, run Gemma with the latest llama.cpp server (which automatically uses sliding-window attention as of, like, yesterday), or run Qwen (and most other models) with exllamav2 or exllamav3, which can quantize the KV cache down to Q4 very efficiently.
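For reference, llama.cpp can also quantize its KV cache via `--cache-type-k`/`--cache-type-v` (short `-ctk`/`-ctv`). A sketch of a long-context launch along those lines; the model filename and exact values are placeholders:

```shell
# Sketch (assumed values): 64K context with a Q4-quantized KV cache on
# llama-server. Quantized KV cache in llama.cpp requires flash attention.
llama-server \
  -m gemma-3-27b-it-Q4_K_M.gguf \
  -c 65536 \
  -ctk q4_0 -ctv q4_0 \
  --flash-attn
```

The Q4 cache trades a little quality for a large VRAM saving, which is what makes 64K contexts cheap.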

        This way you don’t need to manage context: you can feed the LLM the whole adventure so it never forgets anything, and streaming responses will be instant since the prompt is always cached.

        • ThreeJawedChuck@sh.itjust.works (OP) · 8 minutes ago

          Mistral (24B) models are really bad at long context, but this is not always the case. I find that Qwen 32B and Gemma 27B are solid at 32K

          It looks like the Harbinger RPG model I’m using (from Latitude Games) is based on Mistral 24B, so maybe it inherits that limitation? I like it in other ways: it was trained on RPG games, which seems to help for my use case. I did try some general-purpose / vanilla models and felt they were not as good at D&D-type scenarios.

          It looks like Latitude also has a 70B Wayfarer model. Maybe it would do better at bigger contexts. I have several networked machines with 40GB of VRAM between them, and I can just squeeze an IQ4_XS quant of a 70B model into that unholy assembly if I run a 24000-token context (measured before the SWA patch, so maybe more now). I will try it! The drawback is speed: 70B models are slow on my setup, about 8 t/s with an empty context.
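Back-of-envelope arithmetic supports that tight fit, assuming IQ4_XS averages roughly 4.25 bits per weight (an approximation, not an exact figure):

```python
# Rough estimate of 70B IQ4_XS weight size; 4.25 bits/weight is an
# assumed average for this quant format.
params = 70e9
bits_per_weight = 4.25
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weights_gb:.1f} GB of weights")  # leaves only a few GB of 40 GB for KV cache
```

That leaves only a few GB for the KV cache and activations, which is why the usable context is so constrained.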

      • brucethemoose@lemmy.world · 19 hours ago

        Oh, one thing about ST specifically: its default sampling presets were catastrophic last I checked. They’re designed for ancient models, and while I have nothing against the UI, it is kinda from a different era.

        For Gemma and Qwen, I’ve been using roughly 0.2–0.7 temperature, at least 0.05 MinP, 1.01 repetition penalty (not something insane like 1.1), and maybe 0.3-ish DRY, though like you said, DRY/XTC can really mess up some tasks.