Failure Modes of Large Language Models on Research-Level Mathematics: A Taxonomy and an Empirical Characterisation

supersquirrel@sopuli.xyz · 2 hours

DuckDuckStop

supersquirrel@sopuli.xyz · 3 hours

Well said!

supersquirrel@sopuli.xyz · 8 hours

This is precisely why LLMs and AI are 99% a scam.

AI is the minimum value of a well-structured dataset, obtained by a destructive, hallucinatory compressor that is MAXIMALLY inefficient, with the inefficiency increasing nonlinearly as a model gets bigger… While it does demonstrate power, the observation that a well-structured, QC’d, properly curated dataset reveals a latent intelligence in good data clearly points to functional programming, relational programming and the profession of the librarian and archivist as the directions where the genesis point of intelligence can be pursued, not these bullshitting AIs, which demonstrate a degraded truth in a stupendously ham-fisted, wasteful way that sends amateurs looking hopelessly in the wrong direction.

In other words, the algorithm and model are worthless, costly junk; it is the well-structured, large, high-quality dataset and the humans who maintain and contextualize it that are precious.

Scientists could have told computer people this was true a long time ago if they had listened.

supersquirrel@sopuli.xyz · 8 hours

The failure analysis in First Proof’s Appendix A describes something qualitatively different from the hallucination patterns studied in factual QA: models producing proofs that are fluently wrong, where the wrongness is concentrated in a small number of unjustified load-bearing claims rather than spread across obviously false individual facts. I have tried in this paper to give that pattern a precise enough description to be studied systematically. The taxonomy has four modes (F1: citation fabrication, F2: premise smuggling, F3: silent reformulation, F4: local-to-global gap), and my empirical audit of eight Flash proofs finds that F2 accounts for the failure in every case—even though it is the mode least targeted by existing mitigation proposals.

The obvious question this raises is whether it is possible to build a system that doesn’t produce these failures in the first place, as opposed to detecting them after the proof has been written. A prevention-oriented system would need to enforce, during generation, that every load-bearing claim in the proof is either derived from stated premises, grounded in a retrieved and verified source, or explicitly flagged as unverified before the output is returned. The failure modes described here are, I think, a reasonable specification of what such a system would need to prevent.
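As a rough illustration of the gating rule described above, here is a minimal Python sketch. Everything in it is hypothetical: the `Claim` and `Support` types, the `gate` function, and the toy proof are invented for illustration, and deciding which claims are load-bearing (the hard part) is simply assumed to be given.

```python
from dataclasses import dataclass
from enum import Enum


class Support(Enum):
    DERIVED = "derived from stated premises"
    SOURCED = "grounded in a retrieved and verified source"
    UNVERIFIED = "explicitly flagged as unverified"
    NONE = "no support recorded"


@dataclass
class Claim:
    text: str
    load_bearing: bool            # assumed to be known; not computed here
    support: Support = Support.NONE


def gate(claims):
    """Return (ok, blocked).

    ok is False when at least one load-bearing claim has no recorded
    support; blocked lists those claims so they can be derived, sourced,
    or flagged before the proof is returned.
    """
    blocked = [c for c in claims if c.load_bearing and c.support is Support.NONE]
    return (not blocked, blocked)


# Toy proof with one smuggled premise (mode F2): the second claim is
# load-bearing but carries no derivation, source, or flag.
proof = [
    Claim("f is continuous on [0, 1]", True, Support.DERIVED),
    Claim("hence f is uniformly bounded on R", True),
    Claim("notation: write g for f restricted to [0, 1]", False),
]
ok, blocked = gate(proof)
```

Here `ok` comes back `False` and `blocked` holds the single unsupported claim. A real system would need to run a check like this during generation, not over a finished list.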

supersquirrel@sopuli.xyz · 22 hours

Why do people seem to especially like fish?

supersquirrel@sopuli.xyz · 1 day

Remember in Reaper you can change the theme and menus around all you want, so you can make it look and behave however you want if certain things annoy you.

FL Studio is great, it isn’t fair to compare other audio software to Reaper lol.

Sytrus is an amazing design for a synthesizer and Harmor is fascinating too.

supersquirrel@sopuli.xyz · 2 days

Beyond All Reason or Xonotic.

Xonotic is so fast and smooth.

https://inv.nadeko.net/watch?v=K2FoeKWm5a0

https://www.youtube.com/watch?v=K2FoeKWm5a0

supersquirrel@sopuli.xyz · 2 days

Ok sorry I got frustrated, this AI stuff is very frustrating and I am pretty damn worried it is going to send my country into a depression because of how much fake money is being tossed around… but you are right, I shouldn’t have used the phrase “mentally incapable” like that. This is complicated stuff, life is hard, and it is a valid point that being ableist about it isn’t productive for anyone or anything, so fair enough, I am sorry about that.

I don’t apologize for being a dick though, just because my tone is harsh doesn’t mean I am the one doing the damage here, the people lost in the sauce about AI are going to ruin real human lives with this when it comes crashing down, so to be clear I apologize specifically for being ableist because you are right that was shitty.

supersquirrel@sopuli.xyz · 3 days

Reaper works great on Linux.

supersquirrel@sopuli.xyz · 3 days

annoying note - for the record I quoted this from the article, I didn’t write this.

I agree with you though, that is the exact reason I quoted that section because it clarified things for me!

supersquirrel@sopuli.xyz · 4 days

emacs with vim keybindings is all I need

supersquirrel@sopuli.xyz · 4 days

Arch is not the only distribution that has a service for providing “use at your own risk”, unreviewed, user-submitted content; Fedora has Copr, the openSUSE project has the Open Build Service (OBS), and Ubuntu has Personal Package Archives (PPAs). Each of those services allows a person to sign up without any review process and build packages for download by other users of the distributions.

However, there are important differences between those services and the AUR. They provide a build environment that is similar to the ones used for the official distribution packages, and do not allow pre-built binaries or proprietary software. The model for Copr, OBS, and PPAs is that a user creates a project under their own user namespace; users have to add each repository from one of those services separately.

For example, niri creator Ivan Molodetskikh maintains a Copr repository for Fedora users who want to run the tiling Wayland compositor. To install niri from Copr, a user has to enable that repository specifically. It is possible for other Copr users to create a similar project under their own namespace, but it is not possible for another user to take over Molodetskikh’s repository unless they compromise his credentials. A would-be attacker could create a malicious fork on Copr and try to lure Fedora users to add that package repository to their system instead, but the attacker cannot simply pick up an orphaned Copr repository to compromise users who have already added it.

The AUR, on the other hand, is much more relaxed about ownership; the PKGBUILD files are all maintained under the AUR namespace. The rules state that, when a new maintainer takes over an AUR package, they are supposed to add their own information as maintainer and then list the prior maintainers as contributors. That, however, is taken on trust and (as seen with the current attack) can be easily abused.
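The ownership difference described above can be shown with a toy Python model. This is only a sketch: the two classes, the user names `alice` and `mallory`, and the package names are all hypothetical, and none of this is how Copr or the AUR is actually implemented. It only captures the point that a per-user repository is keyed by its owner, while a shared-namespace package keeps its name when the maintainer changes.

```python
class PerUserRepos:
    """Copr/OBS/PPA style: a repository is identified by (owner, name)."""

    def __init__(self):
        self.repos = {}  # (owner, name) -> publishing user

    def create(self, owner, name):
        key = (owner, name)
        if key in self.repos:
            raise PermissionError(f"{owner}/{name} already exists")
        self.repos[key] = owner
        return key

    def publisher(self, key):
        return self.repos[key]


class SharedNamespace:
    """AUR style: one global name per package; the maintainer can change."""

    def __init__(self):
        self.maintainer = {}  # name -> current maintainer, or None if orphaned

    def submit(self, user, name):
        if name in self.maintainer:
            raise PermissionError(f"{name} already exists")
        self.maintainer[name] = user

    def orphan(self, name):
        self.maintainer[name] = None

    def adopt(self, user, name):
        if name not in self.maintainer:
            raise KeyError(name)
        if self.maintainer[name] is not None:
            raise PermissionError(f"{name} is not orphaned")
        self.maintainer[name] = user

    def publisher(self, name):
        return self.maintainer[name]


# A user enables alice's repository; mallory's same-named fork is a different key.
copr = PerUserRepos()
enabled = copr.create("alice", "compositor")
copr.create("mallory", "compositor")
copr_publisher = copr.publisher(enabled)

# A user installs "somepkg"; after orphaning, the same name has a new maintainer.
aur = SharedNamespace()
aur.submit("alice", "somepkg")
aur.orphan("somepkg")
aur.adopt("mallory", "somepkg")
aur_publisher = aur.publisher("somepkg")
```

In the per-user model the enabled repository still resolves to `alice`, while in the shared model the unchanged package name now resolves to `mallory`.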

supersquirrel@sopuli.xyz · 4 days

I am going to make a post soon, especially if people are interested, about how to use Boxbuddy with Distrobox to create mutable, atomized Linux distributions that you can add packages to and do whatever you want with while retaining the vanilla immutable SteamOS settings. There are a lot of advantages to this method, one of them being that you can easily set up many different atomized Linux environments with different versions or combinations of packages for testing or other use cases.

I am still encountering some limitations, but I have Debian with R and Python, along with multiple geospatial packages for both languages and Pandoc, installed in an atomized Linux distribution and it works great. I just installed Emacs in the atomized Linux distribution itself and have been using Org mode to execute R and Python code blocks and set up a workflow that way.

https://flathub.org/en/apps/io.github.dvlv.boxbuddyrs

https://wiki.archlinux.org/title/Distrobox

supersquirrel@sopuli.xyz · 5 days

I expect people not to spread disinformation about technology that sells a scifi fantasy as an inevitability just around the corner without providing any proof that it will happen.

You are bullshitting, stop bullshitting.

supersquirrel@sopuli.xyz · 5 days

Yeah I just use my Steam Deck as my main computer, everyone always seems to be surprised but like… the reason I didn’t buy a Nintendo Switch was so I wouldn’t have a piece of essentially bricked hardware outside of very narrow controls…

supersquirrel@sopuli.xyz · 5 days

You are pointing at a remote potential you consider inevitable and saying condescendingly “Well of course this will happen!?”.

The burden of proof is on you.

supersquirrel@sopuli.xyz · 5 days

You are speculating about something with no evidence.

supersquirrel@sopuli.xyz · 5 days

cross-posted from: https://sopuli.xyz/post/47552333

Edit: there is a list of the games in the video description.

This is a great channel!

https://www.youtube.com/watch?v=TpVGInA0IGk

supersquirrel@sopuli.xyz · 8 days

What pisses me off so much about the AI bubble, as someone who was trained in science, is the basic obvious fact that if you have a magic tool that can “solve any problem” but that takes a nearly impossible amount of energy to do so, taking so much energy that everything else has to be sidelined in order to power this magic problem-solving tool… then you are just restating that the tool you have is incapable of problem solving.

Suppose there are two problem-solving machines, A and B. Problem-solving machine A does not work very well, but it takes a small amount of energy and material to function; problem-solving machine B is like AI in that it can tackle “any problem” if given nearly all the electrical power and computer chips that humans can produce.

Both problem solving machine A and B are equally useless even though the owners of problem solving machine B can make grandiose claims about the power of their problem solving machine if they conveniently exclude the “energy required” part, which as someone trained in science ends up seeming pretty damn similar to someone arguing that their perpetual motion machine is capable of perpetual motion so long as we feed all of the electrical power humanity can generate into it…

If your Oracle requires all the investment, electrical energy and computer chips earth can produce and more to tell you the future, it is not an Oracle but rather your own demise twisted into a promise of absolute power and knowledge.

Anybody who understands computer science should be able to grasp the basic idea that it is trivially easy to describe infinitely powerful computer programs that take essentially infinite time and energy to compute. That doesn’t mean anything. In fact, most of computer science has been the proactive study of how to rigorously talk about programs that are impossible to execute because of outsized material demands, and how to develop an overarching theory of computer science that nonetheless accounts for that theoretical axis. Computer science people thus should be well equipped to understand that AI is just bullshit being sold as genius by disguising the cost per unit of intelligence… but they don’t really seem to grasp that basic aspect of physics and how it limits computation, and that doesn’t say a lot of good about the whole industry.

supersquirrel@sopuli.xyz · 9 days

There was never any other realistic option here, Microslop is too dysfunctional and lost in the fake AI “revolution” to ever hold on to game developer talent.

100 Indie Games to Try | Steam Next Fest June 2026 | Best Indie Games - Youtube (Invidious link)