How much GPU do I need to run a 90B model?
🇦🇺𝕄𝕦𝕟𝕥𝕖𝕕𝕔𝕣𝕠𝕔𝕕𝕚𝕝𝕖@lemm.ee to LocalLLaMA@sh.itjust.works · 11 days ago · 16 comments
Sylovik@lemmy.world · 11 days ago
For LLMs you should look at AirLLM. I don't think there are convenient integrations with local chat tools yet, but an issue has already been opened on Ollama's tracker.
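For anyone curious what AirLLM looks like in practice, here's a minimal sketch roughly following the project's README (the model name is illustrative, and the exact API may differ between versions):

```python
from airllm import AutoModel

# AirLLM loads one transformer layer at a time, so a 70B+ model can run
# on a few GB of VRAM (at the cost of speed).
model = AutoModel.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")

input_tokens = model.tokenizer(
    ["What is the capital of the United States?"],
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=128,
)

generation_output = model.generate(
    input_tokens["input_ids"].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True,
)

print(model.tokenizer.decode(generation_output.sequences[0]))
```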
🇦🇺𝕄𝕦𝕟𝕥𝕖𝕕𝕔𝕣𝕠𝕔𝕕𝕚𝕝𝕖@lemm.ee (OP) · 11 days ago
That looks like exactly the sort of thing I want. Is there an existing solution to get it to behave like an Ollama instance? (I have a bunch of services pointed at an Ollama instance running in Docker.)
Sylovik@lemmy.world · 10 days ago
You could try Harbor. The description claims it provides an OpenAI-compatible API.
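If whatever you end up running exposes an OpenAI-compatible endpoint, repointing your services is mostly a matter of swapping the base URL. A sketch with the official openai Python client (the port, model name, and URL are assumptions about your setup; Ollama itself also serves an OpenAI-compatible /v1 endpoint):

```python
from openai import OpenAI

# base_url is an assumption: point it at whichever local server exposes
# the OpenAI-compatible API. The api_key is unused locally but required.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="local")

resp = client.chat.completions.create(
    model="llama3",  # hypothetical model name; use whatever you have loaded
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```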
red@lemmy.zip · 10 days ago
This is useless; llama.cpp already does what AirLLM does (offloading to CPU), and it's actually faster. So just use Ollama.
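For reference, partial CPU/GPU split in llama.cpp is a single parameter. A minimal sketch with the llama-cpp-python bindings (the model path and layer count are placeholders; tune n_gpu_layers to your VRAM):

```python
from llama_cpp import Llama

# n_gpu_layers controls how many transformer layers are offloaded to VRAM;
# the rest stay in system RAM on the CPU. -1 would offload everything.
llm = Llama(
    model_path="./models/llama-90b-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=40,  # placeholder; raise until you run out of VRAM
)

out = llm("Q: How much GPU do I need for a 90B model? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

Ollama does the same split automatically, which is why for most setups it's simpler than a layer-streaming approach like AirLLM.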