Even alternatives like GrapheneOS rely on AOSP. I wonder if it's possible for regulators in certain countries to pressure Google to kill it in the future.
Even if that's not the case, I'd imagine apps that depend on attestation, like banking apps, would require some kind of identity verification in exchange for trusting Graphene's keys.
In principle it doesn't make sense to leave any escape hatch, but I guess, as always, it boils down to economics.
In theory, it's possible for a third party (other than Google or Apple) to provide attestation on third-party hardware.
You can have a separate core and kernel to run such code. They don't have to be powerful, but they need to be small enough to be verified by said provider. Most of the code doesn't need attestation and can run on normal hardware.
The provider also has to convince the regulators or banks to trust them. However, if that's solved, the user should feel no difference between stock Android and an alternative platform plus attestation.
Among the six patterns identified, it's interesting that "Iterative AI Debugging" takes more time (and possibly tokens) but results in worse scores than letting AI do everything. So this part really should be handed over to agent loops.
The three high-scoring patterns are interesting as well. "Conceptual Inquiry" actually takes less time and doesn't improve the score compared to the other two, which is quite surprising to me.
Some valid points, but I wish the authors had developed them more.
On the semantic gap between the original software and its representation in the ITP, program extraction as in Rocq probably deserves some discussion: the software is written natively in the ITP, and you then have to prove the extraction itself sound. For example, MetaRocq did this for Rocq.
For the "how far down the stack" problem, there are some efforts from https://deepspec.org/, but it's inherently a difficult problem and often gets less love than the lab environment projects.
This specific example seems to me less a consequence of model collapse and more a result of the "personality" adjustment governing how aggressively the model reads into the user's intention.
From time to time, I enjoy the model guessing what I meant rather than what I wrote. For example, "Find the backend.py" can be auto-corrected into "find the app.py".
> But let's hit the random button on wikipedia and pick a sentence, see if you can draw a picture to convey it, mm?
The inverse is also difficult. Pick a random 15-second movie clip: how would you describe it in text without losing much of its essence? Can one really port a random game into a text version? Can a pilot fly a plane with a text-based instrument panel?
Text is not a superset of all communication media. They are just different.
Commercial aviation involves mostly textual interaction[1] to determine what the aircraft does, for most of the time. Aviation is rife with plain text, usually upper case for better legibility[2].
The program used to check the validity of a proof is called a kernel. It just needs to check one step at a time, and the steps that can be taken are just basic logic rules. People can gain more confidence in its validity by:
- Reading it very carefully (doable since it's very small)
- Having multiple independent implementations and comparing the results
- Proving it in some meta-theory. Here the result is not correctness per se, but relative consistency. (Although it can be argued all other points are about relative consistency as well.)
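To give a feel for how small such a kernel can be, here is a toy sketch in Python. It is not how any real ITP kernel is written: the formula encoding, the two rules ("premise" and modus ponens), and the names `imp` and `check_proof` are all assumptions made up for illustration.

```python
# Toy proof-checking kernel (illustrative assumption, not a real ITP kernel).
# A formula is an atom (str) or an implication ("->", antecedent, consequent).

def imp(a, b):
    """Build the implication a -> b."""
    return ("->", a, b)

def check_proof(premises, steps, goal):
    """Return True iff every step is justified and the last line is the goal.

    Each step is one of:
      ("premise", formula)   formula must be one of the premises
      ("mp", i, j, formula)  modus ponens: line i is A, line j is A -> formula
    """
    proven = []
    for step in steps:
        rule = step[0]
        if rule == "premise":
            _, formula = step
            if formula not in premises:
                return False
        elif rule == "mp":
            _, i, j, formula = step
            # Both cited lines must already have been checked.
            if not (0 <= i < len(proven) and 0 <= j < len(proven)):
                return False
            if proven[j] != imp(proven[i], formula):
                return False
        else:
            return False  # unknown rule: reject
        proven.append(formula)
    return bool(proven) and proven[-1] == goal

premises = ["p", imp("p", "q"), imp("q", "r")]
good = [
    ("premise", "p"),
    ("premise", imp("p", "q")),
    ("mp", 0, 1, "q"),
    ("premise", imp("q", "r")),
    ("mp", 2, 3, "r"),
]
bad = good[:4] + [("mp", 0, 3, "r")]  # cites the wrong line for A

print(check_proof(premises, good, "r"))  # True
print(check_proof(premises, bad, "r"))   # False
```

The checker never has to be clever: it only confirms that each line follows from earlier lines by one of a fixed set of rules, which is what makes careful reading or independent reimplementation realistic.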
Checking the validity of a given proof is deterministic, but filling in the proof in the first place is hard.
It's like chess: checking who wins for a given board state is easy, but coming up with the next move is hard.
Of course, one can try all possible moves and see what happens. Similar to chess AI based on search methods (e.g. minimax), there are proof search methods. See the related work section of the paper.
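The gap between checking and searching can be seen in a toy setting. The sketch below assumes a made-up logic with only premises and modus ponens; `check` is a single linear pass, while `search` blindly tries every pair of known formulas. Real proof search is far more sophisticated, and every name here is hypothetical.

```python
# Toy contrast between proof checking and proof search (illustrative only).
# A formula is an atom (str) or an implication ("->", antecedent, consequent).

def imp(a, b):
    return ("->", a, b)

def check(premises, steps, goal):
    """Checking: one pass; each step is a premise or a modus ponens."""
    proven = []
    for rule, *args in steps:
        if rule == "premise":
            (formula,) = args
            ok = formula in premises
        elif rule == "mp":
            i, j, formula = args
            ok = (0 <= i < len(proven) and 0 <= j < len(proven)
                  and proven[j] == imp(proven[i], formula))
        else:
            ok = False
        if not ok:
            return False
        proven.append(formula)
    return bool(proven) and proven[-1] == goal

def search(premises, goal):
    """Searching: naive forward chaining over all pairs of known formulas.

    Returns a list of steps accepted by check(), or None if the goal is
    not derivable with modus ponens alone.
    """
    known = list(premises)                      # known[k] is proven by steps[k]
    steps = [("premise", f) for f in premises]
    while True:
        if goal in known:
            return steps[: known.index(goal) + 1]
        added = False
        snapshot = list(known)
        for i, a in enumerate(snapshot):
            for j, b in enumerate(snapshot):
                if (isinstance(b, tuple) and b[0] == "->"
                        and b[1] == a and b[2] not in known):
                    known.append(b[2])
                    steps.append(("mp", i, j, b[2]))
                    added = True
        if not added:
            return None                         # nothing new: give up

premises = ["p", imp("p", "q"), imp("q", "r")]
proof = search(premises, "r")
print(proof is not None and check(premises, proof, "r"))
print(search(premises, "s"))
```

Even here the search is quadratic per round in the number of known formulas, while the check stays linear in the length of the proof; with richer rules the search space blows up much faster.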