
Zworf,

Hmmm, weird. I have a 4090 / Ryzen 5800X3D and 64GB of RAM, and it runs really well. Admittedly it's the 8B model, because the intermediate sizes aren't out yet and 70B simply won't fly on a single GPU.

But it really screams. Much faster than I can read. PS: Ollama is just llama.cpp under the hood.

Edit: Ah, wait, I know what's going wrong here. The 22B-parameter model is probably too big for your VRAM. Once the weights don't fit on the GPU, layers get offloaded to the CPU and it becomes extremely slow.
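As a back-of-the-envelope check on whether a model fits in VRAM: weight memory is roughly parameter count times bits per weight, plus some fixed overhead for the KV cache and runtime buffers. The function below is my own rough sketch (the 4-bit default and the 1.5 GB overhead are assumptions, not Ollama's actual numbers):

```python
def est_vram_gb(params_b: float, bits: int = 4, overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate in GiB for loading a quantized model.

    params_b    -- parameter count in billions (e.g. 8, 22, 70)
    bits        -- quantization width per weight (4-bit is a common default)
    overhead_gb -- assumed flat allowance for KV cache / runtime buffers
    """
    weight_bytes = params_b * 1e9 * bits / 8
    return weight_bytes / 1024**3 + overhead_gb

for size in (8, 22, 70):
    print(f"{size}B @ 4-bit: ~{est_vram_gb(size):.1f} GiB")
```

By this estimate an 8B model at 4-bit needs only ~5 GiB, a 22B model needs ~12 GiB (too much for many consumer cards), and 70B needs ~34 GiB, which is why it won't fit even on a 24 GB 4090.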
