chasing0entropy a day ago

I also use self-hosted LLMs. You can make three GTX 1080s run a 7B model competently at a limited context length through ollama. Get a little bolder with LM Studio and you can actually get a coherent and fairly reliable setup.
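For the limited-context setup, ollama lets you cap the context window per model with a Modelfile. A minimal sketch; the base model tag and context size here are assumptions, pick whatever fits your VRAM:

```
# Modelfile: derive a small-context variant to fit older GPUs
FROM mistral:7b
PARAMETER num_ctx 2048
```

Then `ollama create small-ctx -f Modelfile` followed by `ollama run small-ctx` runs the capped variant.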

keyle a day ago

On macOS, if you opted for 32GB of RAM, you can run a gpt-oss model with LM Studio really easily.

It's "good enough" for a lot of questions and doesn't go up and down like a yoyo (OpenAI dashboard lies)