Fisch , 4 months ago (edited 4 months ago)

Unfortunately, LLMs need a lot of VRAM. You could try using koboldcpp: it runs on the CPU but lets you offload layers onto the GPU. That way you might be able to stay within those 4 GB even with larger models.
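
To give a rough idea of how many layers might fit, here's a back-of-the-envelope sketch. It assumes every layer is about the same size (model file size divided by layer count) and that some VRAM has to stay free for context and scratch buffers. The function name, the 1 GB reserve and the example numbers are all made up for illustration, not something koboldcpp reports or uses.

```python
def layers_that_fit(model_size_gb: float, n_layers: int,
                    vram_gb: float, reserve_gb: float = 1.0) -> int:
    """Rough estimate of how many layers can be offloaded to the GPU.

    Assumptions (hypothetical, for illustration only):
    - all layers are the same size: model_size_gb / n_layers
    - reserve_gb of VRAM is kept free for context and other buffers
    """
    if model_size_gb <= 0 or n_layers <= 0:
        raise ValueError("model_size_gb and n_layers must be positive")

    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)

    # Never report more layers than the model actually has.
    return min(n_layers, int(usable_gb // per_layer_gb))


if __name__ == "__main__":
    # Hypothetical example: a 7 GB model file with 32 layers on a 4 GB card.
    print(layers_that_fit(model_size_gb=7.0, n_layers=32, vram_gb=4.0))
```

Treat the result as a starting point for koboldcpp's GPU layers setting and lower it if you run out of memory; real usage depends on quantization and context size.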

Edit: I forgot to mention there's a fork of koboldcpp with ROCm support for AMD cards, which is about twice as fast if I remember correctly. Only relevant if you have an AMD card tho.

Edit 2: This is the model I use btw

Thread

hexual

@hexual@lemmy.world

Added: 4 months ago