@SuspiciousCarrot78

SuspiciousCarrot78@aussie.zone · 3 hours

Yeah. Though I think theres a new strix out soon (Medusa? Gorgon? Something like that).

Its a bit like my P40. On paper, it has 24GB. But that 24gb is capped at 400GB/s and the ai compute is what…Pascal era?

AI = Good, fast, cheap - pick 2

SuspiciousCarrot78@aussie.zone · 3 hours

Do you mean Sonnet 4.5?

I don’t have the rig to run it at real speeds but I’ve played with it over API. Seems pretty good.

SuspiciousCarrot78@aussie.zone · 3 hours

I dunno what’s happened in the last week but there’s a Lemmy wide infestation of bots and spammers.

I think what you suggested is the right way forward. If things keep a pace, every other post will be slop. We need a (very) low bar for them to trip over.

Dunno what can be done about false reporting and brigading. Probably nothing. The saving grace of Lemmy is you can just hide / switch off vote visibility on client app.

I remember when karma and vote buttons became a thing and I remember thinking “oh, great idea”. No, no it was not.

SuspiciousCarrot78@aussie.zone · 4 hours

LOL.

https://en.wikipedia.org/wiki/Flowers_for_Algernon

Looks like someone got big mad over a harmless, good natured and on topic joke. You love to see it.

Sorry they wasted your time.

SuspiciousCarrot78@aussie.zone · 6 hours

Damn - I thought strix would do a bit better than that, for how much it costs.

SuspiciousCarrot78@aussie.zone · 8 hours

Ha. You were doing inference on CPU on a haswell era. Been there, done that.

OTOH…whisper.cpp is heavily optimised for it.

Plus, you’re doing batch transcription, not real-time, so slow doesn’t actually matter.

Fire Whisper small or medium overnight and wake up to searchable text.

PS: if you want a good fast little llm, something like Qwen 3.6 2B will work well on the Xeon.

SuspiciousCarrot78@aussie.zone · 9 hours

How ancient is ancient? TTS and STT are much lighter than llm. (eg: Whisper, Piper, Kokoro, Coqui etc)…you might have more capability than you think, especially if you’re doing batch processing like that.

SuspiciousCarrot78@aussie.zone · 9 hours

What sort of tok/s are you getting on the strix?

SuspiciousCarrot78@aussie.zone · 11 hours

You need to read the room, dude. Look at the current most active discussions.

SuspiciousCarrot78@aussie.zone · 12 hours

That’s impressive and probably within reach of most serious home labs.

I quite like MiMo and I agree with your assessment of its capability.

SuspiciousCarrot78@aussie.zone · 12 hours

Oh…i recognise this sickness :)

SuspiciousCarrot78@aussie.zone · 13 hours

Llama.cpp or death!

SuspiciousCarrot78@aussie.zone · 12 hours

Probably that plus a higher quant solves it. Thing is most of us default to Q4_K_M as “precise enough”… and that seems to be kryptonite for the new Qwen’s.

That’s another thing with hosting AI that’s not often discussed. Sure, you can maybe run that 27B model…but if it’s at Q3_XS it’s going to be … “mentally challenged”.

I’ve heard the Gemma models with QAT are meant to be near full precision at Q4 size. Haven’t tried em yet.

Actually, on that topic - I’ve heard there’s a different architecture (RWKV), that’s supposed to be much more efficient for long context because it uses an entirely different KV system.

Sadly, there are few RWKV native models and retraining a standard transformer to RWKV seems like a pain in the ass. I’d need to hire a cloud GPU, distill into a different architecture, mess with datasets … honestly ICBF.

SuspiciousCarrot78@aussie.zone · 13 hours

Yeah, I’ve heard the B70 is good bang for buck. My kids love using chat GPT to generate images and I’m aware that there are some really capable local models that can do that as well now - B70 should make short work of it.

That may be something for me to look at later on if I decide to keep self hosting.

OTOH, I’m also aware that I may end up building something that they don’t actually use. Been there, done that, and I don’t want to do it again.

Actually, on that topic, one interesting use case for me is my youngest one wants to have a YouTube channel.

So obviously, I’m not going to let her become a YouTuber, but what I’m thinking of doing is providing her my old phone (properly locked down) so that she can video record clips of what she wants.

Then - have those clips sent automatically to our jellyfin server so it appears like a channel. Code a fake YT plugin so that AI can do likes, positive comments etc.

It’s… work. I dunno…maybe a good enough AI can vibe code the entire project for me.

SuspiciousCarrot78@aussie.zone · 13 hours

Which models did you find particularly useful for those tasks?

SuspiciousCarrot78@aussie.zone · 11 hours

Pretty simple. People keep going on about how useful these local models are for coding. So what I wanted to do was to create a standardized test for myself to see if that was true before committing to anything.

( I think the various benchmarks out there are a bit fluffy, so I wanted to try it against a real workload.)

What I did was throw a bunch of money up at OpenRouter and then used Roo to call in diff models, one at a time.

I gave each the same task - that is, here is a piece of code, here is my ticket, here is my repo. Investigate what you want and then do what my ticket says.

I already knew what was wrong with the code, but I wanted to see how obedient the models are at sticking to a scoped ticket and what they would find.

By far the best bang for buck was GPT 5.4 mini. It is exceptionally obedient at doing exactly what you tell it as long as you tell it exactly what to do.

It won’t go off piste if properly constrained.

I think for light - med workloads, $20 on ChatGPT is a crimal steal. Chat and Codex have a separate usage pool.

I’m also aware that this is open AI’s lock in phase where they provide the samples of crack for free to get you hooked. And, yes, they are crack dealers in every sense of the word.

Anyway, it’s good to know that with a little bit of elbow grease and some smarts, the smaller models, which could reasonably be self-hosted, could do a decent enough job if they are narrowly scoped.

You’re probably not going to be able to yeet an entire code base at them and go “figure out what’s wrong and fix it” while you snooze tho, but I think that’s probably a good thing from a human in the middle perspective.

SuspiciousCarrot78@aussie.zone · 19 hours

Yeah :(

Were not there yet on consumer rigs.

SuspiciousCarrot78@aussie.zone · 19 hours

Probably. I wish Lemmy would remove up / down votes entirely. I might ask our lovely new mod if that’s possible.

The like / dislike button has been a curse since invented.

https://www.theguardian.com/technology/2017/oct/05/smartphone-addiction-silicon-valley-dystopia

EDIT: ah - I can hide em on my end. Not as good but it will do

SuspiciousCarrot78@aussie.zone · 20 hours

Brilliant. Thanks for doing that. Appreciated.

SuspiciousCarrot78@aussie.zone · 20 hours

This sounds like an excellent use case and I don’t know why you were downvoted.