(Then, there's my contribution: NISC -- the Null Instruction Set Computer! It's a revolutionary architecture, in that die sizes and TDP can be scaled down, seemingly arbitrarily, with no loss in functionality. In fact, NISC can transcend the solid state entirely, and be implemented directly on the quantum foam of the vacuum!
Another benefit: NISC machines are entirely immune to all buffer overflow and use after free exploits! In fact, they are free of all exploits, whatsoever!)
Well, we start running one instruction, it just never quite finishes. The other OISC systems run way more than one instruction, they’re just always the same one with different parameters.
Nope, it's trivial to DoS them. Attackers don't even have to do anything.
That's just the "wrong" spin! NISC computers exhibit the same performance characteristics under DDoS attacks of immense scale. Increase the scale of the DDoS attack, and the performance graph remains flat! And by flat, I don't mean with a small slope or slight wobbles. I mean completely flat!
If the attackers can't do anything, how will they DoS them? Wait, I think I understand now how it can be implemented directly on the quantum foam of the vacuum.
Without taking anything away from the author, calling MOV a "single instruction" stretches things quite a bit. It's more of a big family of instructions that happen to share an assembly mnemonic.
But if you look at the serious attempts to implement OISC, you'll find that the "single instruction" has so many parameters, some might think it might as well be a big family of instructions.
That was like one of my professors' versions of Object-Oriented programming. He would simply message-pass to his undergrad assistant, Mark. "Mark, code up an X."
Reverse Subtract and Skip if Borrow (RSSB) uses only one parameter and is supposed to be Turing complete. I haven't understood how it works, to be honest.
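RSSB still confuses me too, but what made the one-instruction idea click for me was its better-documented cousin, subleq ("subtract and branch if less than or equal to zero") -- three operands instead of RSSB's one, but the same flavor of trick. A toy interpreter is only a few lines (this is my own sketch, not from any particular implementation):

```python
def subleq(mem, pc=0, max_steps=10_000):
    """The whole machine: mem[b] -= mem[a]; jump to c if the result <= 0.
    A negative jump target halts. Everything else is programming convention."""
    steps = 0
    while 0 <= pc and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return mem

# Addition built from three subtractions, using a scratch zero cell Z:
#   Z -= A  (Z becomes -A),  B -= Z  (B becomes B + A),  Z -= Z  (clear, halt)
# Data lives at addresses 9 (A=7), 10 (B=35), 11 (Z=0).
mem = subleq([9, 11, 3,   11, 10, 6,   11, 11, -1,   7, 35, 0])
print(mem[10])   # 42
```

Addition, copying, and eventually conditionals all get bootstrapped out of that one subtract-and-branch, which is the sense in which these machines are Turing complete.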
"No wireless. Less space than a nomad. Lame." -- CmdrTaco
Yeah, this isn't a true OISC setup, but it's very close and still completely impressive that this technique even works. That you can influence all necessary registers like this and accomplish arithmetic isn't immediately obvious.
I'd recommend the video presentation linked off the page, where he walks through how he actually turned mov into a machine. It's very weird and interesting!
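The core trick from the talk, as I understood it, is doing comparisons and conditionals with memory addressing instead of branches. Here's a toy model in Python (function names are mine, and a real mov-only program does this with scattered loads/stores rather than dicts and lists, obviously):

```python
def eq_flag(a, b):
    # mov-style equality test: write 0 at "address" a, then 1 at
    # "address" b, then read back address a. If a == b the second
    # write clobbered the first, so we read 1; otherwise we read 0.
    table = {}
    table[a] = 0
    table[b] = 1
    return table[a]

def select(flag, if_zero, if_one):
    # mov-style conditional: no branch, just an indexed load.
    return [if_zero, if_one][flag]

print(select(eq_flag(3, 3), "no", "yes"))   # yes
print(select(eq_flag(3, 4), "no", "yes"))   # no
```

Every "if" becomes: compute a 0/1 flag purely with stores and loads, then use the flag as an index. No comparison or jump instruction ever executes.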
> This is thought to be entirely secure against the Meltdown and Spectre CPU vulnerabilities, which require speculative execution on branch instructions.
> The mov-only DOOM renders approximately one frame every 7 hours, so playing this version requires somewhat increased patience.
I can imagine some timing-specific attacks for memory accesses, but they're not likely as robust as attacks against the branch-predictor:
1. This is the simplest one - if the memory being accessed is in a cache (L1/L2, or page in TLB), the function will take a significantly shorter time to execute. If movfuscator achieves conditional execution by manipulating index registers to perform idempotent operations, this will be very easy to detect.
2. Prefetching - if movfuscator reads memory sequentially with a detectable stride, prefetching will shorten the execution time.
3. Write combining - if the code writes to nearby addresses (same cache line), the CPU will combine them to a single write. This will cause a measurable timing difference.
EDIT: One more: Store forwarding - if the code writes to a memory address and reads it soon, the CPU may bypass the memory access (and even cache access) completely.
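To make attack #1 concrete, here's a toy cache model (the cycle numbers are invented, and real attacks must contend with noise, but the principle is the same): even if every executed instruction is identical, a secret-dependent address leaks through timing.

```python
HIT, MISS = 1, 100   # made-up cycle costs for illustration

def access(cache, addr):
    line = addr // 64            # 64-byte cache lines
    if line in cache:
        return HIT
    cache.add(line)
    return MISS

def victim(secret_addr):
    cache = set()
    access(cache, 0x1000)        # attacker primes one line
    # The victim executes the same single mov either way;
    # only the address it touches depends on the secret.
    return access(cache, secret_addr)

print(victim(0x1000))   # 1   -- secret hit the primed line: fast
print(victim(0x2000))   # 100 -- different line: slow
```

So mov-only code removes the branch predictor as an oracle, but data-dependent addressing is itself a timing channel.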
This is a defense against Spectre-type attacks, but it cannot be a "good" defense because it sacrifices too much. The programs written this way are assuredly quite slow.
The fact that the floating-point library is big enough to be disabled by default tells me you might run out of disk space compiling a single application.
That being said, if this can increase difficulty of sidechannel attacks and branch prediction, maybe it'd actually make sense for very isolated parts of some services.
>That being said, if this can increase difficulty of sidechannel attacks and branch prediction, maybe it'd actually make sense for very isolated parts of some services.
I agree; I can see how in certain special circumstances/services this could be very useful.
However, I still expect someone to compile Quake with it by the end of the year.
From the example images of original and obfuscated assembly, there seems to be about a sixfold increase in the number of instructions (for that example, at least), so I imagine you'll see a similar binary size increase.
Basically, you're going to try to emulate other instructions that you don't have with this one instruction, and that's not going to perform very well because now, instead of many optimized instructions, you have strings of this one instruction in its place. And I don't see any way to parallelize this: you're doing the same thing you always were, just with a bunch more code.
The first number is zero, right? Then the first letter of the alphabet is, well, it's hard to show because it doesn't print. It's just an empty set, nothing, a stop bit. We actually do use only 1 bit in digital CPUs, but a weird mix of analogue in broadband transmission. I wonder why CPUs don't use ternary or whatever. But then I also wonder why asynchronous CPUs didn't take off, so don't mind me, just being bored.
No, modern GPUs run many of the same instructions as CPUs. They have branches and everything. Their main limitation is that groups of threads are bundled together (called warps) and share a program counter, so lots of branching can result in a lot of wasted work if the threads disagree on which branch to take. That, and there's a huge penalty you pay for moving data across the bus to GPU RAM.
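A toy model of that divergence cost (numbers and the simplified masking rule are mine; real SIMT hardware can reconverge mid-path): when lanes in a warp disagree on a branch, the warp serially executes both paths with inactive lanes masked off, so the warp pays for the sum of the paths instead of just one.

```python
def warp_cost(take_branch, then_len, else_len):
    # take_branch: one bool per thread in the warp.
    # If any lane takes a path, the whole warp spends those cycles.
    taken = set(take_branch)
    cost = 0
    if True in taken:
        cost += then_len
    if False in taken:
        cost += else_len
    return cost

print(warp_cost([True] * 32, 10, 50))          # 10 -- uniform warp, one path
print(warp_cost([True, False] * 16, 10, 50))   # 60 -- divergent, pays for both
```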
Few problems are easily parallelizable. That said, that's not even the issue here. Specialized instructions may be emulated by movs, but the speed loss could never be recouped even by massive parallelization.
The problem is probably the address space that movs use, instead of specialized registers with optimized pipelining. But internally, many instructions might actually come down to conditional moves. I guess that's either after the microcode is decoded, or, if I guessed wrong about that, then Register Transfer Logic still pretty much sounds like it was based on, well, transfers.
You can perform multiplication by repeated addition, but that is a very inefficient way to multiply. It's the same thing here, where you can replace other instructions with MOV, but the replacement is much slower than the original.
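To put a number on that analogy (a deliberately naive cost model, just counting operations):

```python
def mul_by_addition(a, b):
    """Multiply by repeated addition, counting how many adds it takes."""
    acc, adds = 0, 0
    for _ in range(b):
        acc += a
        adds += 1
    return acc, adds

print(mul_by_addition(7, 6))        # (42, 6): six dependent adds for one multiply
print(mul_by_addition(7, 1000)[1])  # 1000 adds, where hardware MUL is one op
```

And the adds form a dependency chain, so they can't even overlap with each other. The mov replacement has the same shape: many slow, dependent steps standing in for one fast instruction.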
What makes you think this would be easier to parallelize than a traditional application? Just because there is only one kind of instruction used doesn't mean they don't still have to come in the right order!
I don't see any material online indicating that programs written for one-instruction-set-computers are more parallelizable than programs written for traditional computers. In fact, here is someone claiming the opposite:
> The disadvantage of an MISC is that instructions tend to have more sequential dependencies, reducing overall instruction-level parallelism.
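That quote is easy to check with a toy critical-path calculation (my own sketch of the standard ILP argument): with unlimited execution units, run time is bounded by the longest dependency chain, not the instruction count, and mov-chains where each mov feeds the next gain nothing from extra width.

```python
def critical_path(deps):
    """deps[i] = list of earlier instructions that instruction i waits on.
    Returns the length of the longest dependency chain."""
    depth = {}
    for i in range(len(deps)):
        depth[i] = 1 + max((depth[j] for j in deps[i]), default=0)
    return max(depth.values())

independent = [[] for _ in range(8)]                     # 8 unrelated movs
chained = [[i - 1] if i else [] for i in range(8)]       # each waits on prior

print(critical_path(independent))   # 1: all 8 could issue at once
print(critical_path(chained))       # 8: fully sequential, zero ILP
```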
At first look, the code produced by Movfuscator is mind-boggling. But despite its perceived entanglement, there exist some deobfuscators that bring the code back to a Harvard (i.e. human-readable) model.
TL;DR: obfuscation by translating a program to "mov" instructions is susceptible to static analysis and thus fully reversible.
> there exist some deobfuscators that bring the code back
Care to elaborate? The author of Movfuscator is very experienced and capable at reverse engineering. In one of his video talks he hits some Movfuscated code with some tools and says he doesn't know of anything that can deobfuscate it. That may have changed since then -- I'd be curious to know.
https://en.wikipedia.org/wiki/One_instruction_set_computer