• 4 posts
  • 100 comments
Joined 1 month ago
Cake day: May 14th, 2026
  • I dunno what’s happened in the last week but there’s a Lemmy-wide infestation of bots and spammers.

    I think what you suggested is the right way forward. If things keep apace, every other post will be slop. We need a (very) low bar for them to trip over.

    Dunno what can be done about false reporting and brigading. Probably nothing. The saving grace of Lemmy is you can just hide / switch off vote visibility in your client app.

    I remember when karma and vote buttons became a thing and I remember thinking “oh, great idea”. No, no it was not.

  • Ha. You were doing inference on a Haswell-era CPU. Been there, done that.

    OTOH…whisper.cpp is heavily optimised for it.

    Plus, you’re doing batch transcription, not real-time, so slow doesn’t actually matter.

    Fire Whisper small or medium overnight and wake up to searchable text.
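
    A minimal sketch of that overnight batch, assuming a local whisper.cpp build. The binary name, model path and folder names are placeholders, not anything from the thread, and whisper.cpp traditionally wants 16 kHz WAV input, so convert first if your recordings are something else.

```python
import subprocess
from pathlib import Path

# Assumptions: adjust these to your own build and layout.
WHISPER_BIN = "./whisper-cli"        # older whisper.cpp builds name this ./main
MODEL = "models/ggml-small.bin"      # or ggml-medium.bin if you can wait longer


def build_jobs(audio_dir, out_dir):
    """Return one whisper.cpp command per .wav file that has no transcript yet."""
    jobs = []
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        out_base = Path(out_dir) / wav.stem
        # whisper.cpp appends ".txt" to whatever is passed to -of
        if Path(str(out_base) + ".txt").exists():
            continue  # already transcribed on an earlier run
        jobs.append([WHISPER_BIN, "-m", MODEL, "-f", str(wav),
                     "-otxt", "-of", str(out_base)])
    return jobs


def run_all(jobs):
    """Run the jobs one after another; slow is fine for a batch."""
    for cmd in jobs:
        subprocess.run(cmd, check=True)
```

    Kick off `run_all(build_jobs("recordings", "transcripts"))` from cron or a systemd timer before bed. Skipping files that already have a `.txt` means a crashed run can just be restarted.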

    PS: if you want a good fast little LLM, something like Qwen 3.6 2B will work well on the Xeon.

  • Probably that plus a higher quant solves it. Thing is most of us default to Q4_K_M as “precise enough”… and that seems to be kryptonite for the new Qwens.

    That’s another thing with hosting AI that’s not often discussed. Sure, you can maybe run that 27B model…but if it’s at Q3_XS it’s going to be … “mentally challenged”.
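
    Back-of-envelope numbers for that trade-off, as a rough sketch. The bits-per-weight figures are ballpark assumptions (real GGUF files vary by model and quant mix), and the Q3_XS above is assumed to mean something like llama.cpp’s IQ3_XS.

```python
# Approximate bits per weight. Ballpark assumptions, not measured values.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8, "IQ3_XS": 3.3}


def weight_gib(params_billion, quant):
    """Rough size of the weights alone in GiB (no KV cache, no runtime overhead)."""
    total_bits = params_billion * 1e9 * BITS_PER_WEIGHT[quant]
    return total_bits / 8 / 2**30


for quant in BITS_PER_WEIGHT:
    print(f"27B @ {quant}: ~{weight_gib(27, quant):.1f} GiB")
```

    The point is only the shape of the curve: each step down buys a few GiB of headroom, and the quality cost of the last step is the one that hurts.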

    I’ve heard the Gemma models with QAT are meant to be near full precision at Q4 size. Haven’t tried em yet.

    Actually, on that topic - I’ve heard there’s a different architecture (RWKV), that’s supposed to be much more efficient for long context because it’s recurrent: it carries a fixed-size state instead of a KV cache that grows with every token.
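
    To put a number on why the KV cache matters for long context, here is a small sketch of how it grows in a standard transformer. The model shape used below (32 layers, 8 KV heads, head size 128, fp16 cache) is a made-up example, not any particular model.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    """Size of a transformer KV cache: one K and one V vector per layer, head and token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem


# Hypothetical model shape, fp16 cache (2 bytes per element).
per_token = kv_cache_bytes(32, 8, 128, n_tokens=1)
at_32k = kv_cache_bytes(32, 8, 128, n_tokens=32768)

print(per_token // 1024, "KiB per token")       # 128 KiB per token
print(at_32k / 2**30, "GiB at 32k context")     # 4.0 GiB at 32k context
```

    The cache scales linearly with `n_tokens`. A recurrent design like RWKV has no `n_tokens` term in its state size at all, which is where the long-context efficiency claim comes from.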

    Sadly, there are few RWKV native models and retraining a standard transformer to RWKV seems like a pain in the ass. I’d need to hire a cloud GPU, distill into a different architecture, mess with datasets … honestly ICBF.

  • Yeah, I’ve heard the B70 is good bang for buck. My kids love using ChatGPT to generate images and I’m aware that there are some really capable local models that can do that as well now - B70 should make short work of it.

    That may be something for me to look at later on if I decide to keep self hosting.

    OTOH, I’m also aware that I may end up building something that they don’t actually use. Been there, done that, and I don’t want to do it again.

    Actually, on that topic, one interesting use case for me is my youngest one wants to have a YouTube channel.

    So obviously, I’m not going to let her become a YouTuber, but what I’m thinking of doing is providing her my old phone (properly locked down) so that she can video record clips of what she wants.

    Then - have those clips sent automatically to our Jellyfin server so it appears like a channel. Code a fake YT plugin so that AI can do likes, positive comments etc.
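
    The “appears like a channel” half could be as simple as dropping clips into a Shows library with episode-style names. A minimal sketch, assuming the clips have already been synced off the phone into some incoming folder and that the library follows Jellyfin’s usual `Show/Season 01/Show S01E01.ext` layout. The function name, show name and paths are all hypothetical.

```python
from pathlib import Path


def plan_episodes(clips, library_root, show="My Channel", season=1, already_published=0):
    """Map each incoming clip to a 'Show/Season NN/Show SNNEMM.ext' path.

    Clips are ordered by filename, which works when the phone camera
    puts a timestamp in the name. Numbering continues after the
    episodes that are already in the library.
    """
    season_dir = Path(library_root) / show / f"Season {season:02d}"
    plan = []
    for offset, clip in enumerate(sorted(Path(c) for c in clips), start=1):
        episode = already_published + offset
        name = f"{show} S{season:02d}E{episode:02d}{clip.suffix}"
        plan.append((clip, season_dir / name))
    return plan
```

    Feeding each `(source, destination)` pair to `shutil.move` and letting the next library scan pick the files up would cover the upload side. The fake likes and comments are a separate job.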

    It’s… work. I dunno…maybe a good enough AI can vibe code the entire project for me.

  • Pretty simple. People keep going on about how useful these local models are for coding. So what I wanted to do was to create a standardized test for myself to see if that was true before committing to anything.

    ( I think the various benchmarks out there are a bit fluffy, so I wanted to try it against a real workload.)

    What I did was throw a bunch of money at OpenRouter and then used Roo to call in different models, one at a time.

    I gave each the same task - that is, here is a piece of code, here is my ticket, here is my repo. Investigate what you want and then do what my ticket says.

    I already knew what was wrong with the code, but I wanted to see how obedient the models are at sticking to a scoped ticket and what they would find.
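
    One way to put a number on “obedient”, as a sketch rather than the actual method used here: compare the files each model’s patch touched against the files the ticket allows. The model names and file lists below are invented for illustration; in practice the changed-file list could come from `git diff --name-only` on each model’s branch.

```python
def scope_report(changed_files, allowed_files):
    """Split a model's changed files into in-scope and out-of-scope lists."""
    changed, allowed = set(changed_files), set(allowed_files)
    return {
        "in_scope": sorted(changed & allowed),
        "out_of_scope": sorted(changed - allowed),
        "obedient": changed <= allowed,  # True only if nothing outside the ticket was touched
    }


# Hypothetical results: model name -> files its patch touched.
runs = {
    "model-a": ["src/parser.py"],
    "model-b": ["src/parser.py", "src/utils.py", "README.md"],
}
ticket_scope = ["src/parser.py", "tests/test_parser.py"]

for model, files in runs.items():
    print(model, scope_report(files, ticket_scope))
```

    This only catches file-level wandering. A model can still stay inside the right file and rewrite half of it, so the diff itself needs a human read.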

    By far the best bang for buck was GPT 5.4 mini. It is exceptionally obedient at doing exactly what you tell it as long as you tell it exactly what to do.

    It won’t go off piste if properly constrained.

    I think for light to medium workloads, $20 on ChatGPT is a criminal steal. Chat and Codex have a separate usage pool.

    I’m also aware that this is OpenAI’s lock-in phase where they provide the samples of crack for free to get you hooked. And, yes, they are crack dealers in every sense of the word.

    Anyway, it’s good to know that with a little bit of elbow grease and some smarts, the smaller models, which could reasonably be self-hosted, could do a decent enough job if they are narrowly scoped.

    You’re probably not going to be able to yeet an entire code base at them and go “figure out what’s wrong and fix it” while you snooze tho, but I think that’s probably a good thing from a human-in-the-loop perspective.