
Fisch (@Fisch@discuss.tchncs.de)

What I'm using is Text Generation WebUI with an 11B GGUF model from Hugging Face. I offloaded all layers to the GPU, which uses about 9GB of VRAM. With GGUF models, you can choose how many layers to offload to the GPU, so offloading fewer layers uses less VRAM. Layers that aren't offloaded run on the CPU from system RAM, which is slower.
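If you have less VRAM than that, here's a rough sketch of how you could estimate how many layers to offload. It assumes every layer takes about the same amount of VRAM, which is only an approximation, and the 48-layer count is a made-up number for illustration (check your model's actual layer count). The function name `layers_that_fit` and its parameters are hypothetical too, not part of any tool.

```python
import math


def layers_that_fit(total_layers: int, full_offload_gb: float,
                    vram_budget_gb: float, reserved_gb: float = 0.0) -> int:
    """Estimate how many layers fit in a VRAM budget.

    Assumption: all layers are roughly the same size, so the per-layer
    cost is the full-offload VRAM usage divided by the layer count.
    reserved_gb is VRAM you want to keep free for other things
    (context cache, desktop, etc.).
    """
    if total_layers <= 0 or full_offload_gb <= 0:
        raise ValueError("total_layers and full_offload_gb must be positive")
    per_layer_gb = full_offload_gb / total_layers
    usable_gb = max(vram_budget_gb - reserved_gb, 0.0)
    # Never report more layers than the model actually has.
    return min(total_layers, math.floor(usable_gb / per_layer_gb))


if __name__ == "__main__":
    # 9 GB for a full offload is the figure from my setup above;
    # 48 layers and the 6 GB budget are hypothetical example values.
    print(layers_that_fit(total_layers=48, full_offload_gb=9.0, vram_budget_gb=6.0))
```

The number it gives is just a starting point for the GPU layers setting (called `n_gpu_layers` in llama.cpp-based loaders). In practice you'd probably start a bit lower and raise it until you run out of VRAM.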
