(Then, there's my contribution: NISC -- the Null Instruction Set Computer! It's a revolutionary architecture, in that die sizes and TDP can be scaled down, seemingly arbitrarily, with no loss in functionality. In fact, NISC can transcend the solid state entirely, and be implemented directly on the quantum foam of the vacuum!
Another benefit: NISC machines are entirely immune to all buffer overflow and use after free exploits! In fact, they are free of all exploits, whatsoever!)
Well, we start running one instruction, it just never quite finishes. The other OISC systems run way more than one instruction, they’re just always the same one with different parameters.
Nope, it's trivial to DoS them. Attackers don't even have to do anything.
That's just the "wrong" spin! NISC computers exhibit the same performance characteristics under DDoS attacks of immense scale. Increase the scale of the DDoS attack, and the performance graph remains flat! And by flat, I don't mean with a small slope or slight wobbles. I mean completely flat!
If the attackers can't do anything, how will they DoS them? Wait, I think I understand now how it can be implemented directly on the quantum foam of the vacuum.
Without taking anything away from the author, calling MOV a "single instruction" stretches things quite a bit. It's more of a big family of instructions that happen to share an assembly mnemonic.
But if you look at the serious attempts to implement OISC, you'll find that the "single instruction" has so many parameters, some might think it might as well be a big family of instructions.
That was like one of my professors' versions of Object-Oriented programming. He would simply message-pass to his undergrad assistant, Mark. "Mark, code up an X."
Reverse Subtract and Skip if Borrow (RSSB) uses only one parameter and is supposed to be Turing complete. I haven't understood how it works, to be honest.
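RSSB still confuses me too, but what made the one-instruction idea click for me was its better-documented cousin, subleq ("subtract and branch if less than or equal to zero") -- three operands instead of RSSB's one, but the same flavor of trick. A toy interpreter is only a few lines (this is my own sketch, not from any particular implementation):

```python
def subleq(mem, pc=0, max_steps=10_000):
    """The whole machine: mem[b] -= mem[a]; jump to c if the result <= 0.
    A negative jump target halts. Everything else is programming convention."""
    steps = 0
    while 0 <= pc and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return mem

# Addition built from three subtractions, using a scratch zero cell Z:
#   Z -= A  (Z becomes -A),  B -= Z  (B becomes B + A),  Z -= Z  (clear, halt)
# Data lives at addresses 9 (A=7), 10 (B=35), 11 (Z=0).
mem = subleq([9, 11, 3,   11, 10, 6,   11, 11, -1,   7, 35, 0])
print(mem[10])   # 42
```

Addition, copying, and eventually conditionals all get bootstrapped out of that one subtract-and-branch, which is the sense in which these machines are Turing complete.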
"No wireless. Less space than a nomad. Lame." -- CmdrTaco
Yeah, this isn't a true OISC setup, but it's very close and still completely impressive that this technique even works. That you can influence all necessary registers like this and accomplish arithmetic isn't immediately obvious.
I'd recommend the video presentation linked off the page, where he walks through how he actually turned mov into a machine. It's very weird and interesting!
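The core trick from the talk, as I understood it, is doing comparisons and conditionals with memory addressing instead of branches. Here's a toy model in Python (function names are mine, and a real mov-only program does this with scattered loads/stores rather than dicts and lists, obviously):

```python
def eq_flag(a, b):
    # mov-style equality test: write 0 at "address" a, then 1 at
    # "address" b, then read back address a. If a == b the second
    # write clobbered the first, so we read 1; otherwise we read 0.
    table = {}
    table[a] = 0
    table[b] = 1
    return table[a]

def select(flag, if_zero, if_one):
    # mov-style conditional: no branch, just an indexed load.
    return [if_zero, if_one][flag]

print(select(eq_flag(3, 3), "no", "yes"))   # yes
print(select(eq_flag(3, 4), "no", "yes"))   # no
```

Every "if" becomes: compute a 0/1 flag purely with stores and loads, then use the flag as an index. No comparison or jump instruction ever executes.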
> This is thought to be entirely secure against the Meltdown and Spectre CPU vulnerabilities, which require speculative execution on branch instructions.
> The mov-only DOOM renders approximately one frame every 7 hours, so playing this version requires somewhat increased patience.
I can imagine some timing-specific attacks for memory accesses, but they're not likely as robust as attacks against the branch-predictor:
1. This is the simplest one - if the memory being accessed is in a cache (L1/L2, or page in TLB), the function will take a significantly shorter time to execute. If movfuscator achieves conditional execution by manipulating index registers to perform idempotent operations, this will be very easy to detect.
2. Prefetching - if movfuscator reads memory sequentially with a detectable stride, prefetching will shorten the execution time.
3. Write combining - if the code writes to nearby addresses (same cache line), the CPU will combine them to a single write. This will cause a measurable timing difference.
EDIT: One more: Store forwarding - if the code writes to a memory address and reads it soon, the CPU may bypass the memory access (and even cache access) completely.
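To make attack #1 concrete, here's a toy cache model (the cycle numbers are invented, and real attacks must contend with noise, but the principle is the same): even if every executed instruction is identical, a secret-dependent address leaks through timing.

```python
HIT, MISS = 1, 100   # made-up cycle costs for illustration

def access(cache, addr):
    line = addr // 64            # 64-byte cache lines
    if line in cache:
        return HIT
    cache.add(line)
    return MISS

def victim(secret_addr):
    cache = set()
    access(cache, 0x1000)        # attacker primes one line
    # The victim executes the same single mov either way;
    # only the address it touches depends on the secret.
    return access(cache, secret_addr)

print(victim(0x1000))   # 1   -- secret hit the primed line: fast
print(victim(0x2000))   # 100 -- different line: slow
```

So mov-only code removes the branch predictor as an oracle, but data-dependent addressing is itself a timing channel.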
This is a defense against Spectre-type attacks, but it cannot be a "good" defense because it sacrifices too much. The programs written this way are assuredly quite slow.
The fact that the floating-point library is big enough to be disabled by default tells me you might run out of disk space compiling a single application.
That being said, if this can increase difficulty of sidechannel attacks and branch prediction, maybe it'd actually make sense for very isolated parts of some services.
>That being said, if this can increase difficulty of sidechannel attacks and branch prediction, maybe it'd actually make sense for very isolated parts of some services.
I agree; I can see how in certain special circumstances/services this could be very useful.
However, I still expect someone to compile Quake with it by the end of the year.
From the example images of original and obfuscated assembly, there seems to be about a sixfold increase in the number of instructions (for that example, at least), so I imagine you'll see a similar binary size increase.
Basically, you're going to try to emulate other instructions that you don't have with this one instruction, and that's not going to perform very well because now, instead of many optimized instructions, you have strings of this one instruction in its place. And I don't see any way to parallelize this: you're doing the same thing you always were, just with a bunch more code.
The first number is zero, right? Then the first letter of the alphabet is, well, it's hard to show because it doesn't print. It's just an empty set, nothing, a stop bit. We actually do use only 1 bit in digital CPUs, but a weird mix of analogue in broadband transmission. I wonder why CPUs don't use ternary or whatever. But then I also wonder why asynchronous CPUs didn't take off, so don't mind me, just being bored.
No, modern GPUs run many of the same instructions as CPUs. They have branches and everything. Their main limitation is that groups of threads are bundled together (called warps) and share a program counter, so lots of branching can result in a lot of wasted work if the threads disagree on which branch to take. That, and there's a huge penalty you pay for moving data across the bus to GPU RAM.
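A toy model of that divergence cost (numbers and the simplified masking rule are mine; real SIMT hardware can reconverge mid-path): when lanes in a warp disagree on a branch, the warp serially executes both paths with inactive lanes masked off, so the warp pays for the sum of the paths instead of just one.

```python
def warp_cost(take_branch, then_len, else_len):
    # take_branch: one bool per thread in the warp.
    # If any lane takes a path, the whole warp spends those cycles.
    taken = set(take_branch)
    cost = 0
    if True in taken:
        cost += then_len
    if False in taken:
        cost += else_len
    return cost

print(warp_cost([True] * 32, 10, 50))          # 10 -- uniform warp, one path
print(warp_cost([True, False] * 16, 10, 50))   # 60 -- divergent, pays for both
```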
Few problems are easily parallelizable. That said, that's not even the issue here. Specialized instructions may be emulated by movs, but the speed loss could never be recouped even by massive parallelization.
The problem is probably the address space that movs use, instead of specialized registers with optimized pipelining. But internally, many instructions might actually come down to conditional moves. I guess that's either after the microcode is decoded, or, if I guessed wrong about that, then Register Transfer Logic still pretty much sounds like it was based on, well, transfers.
You can perform multiplication by repeated addition, but that is a very inefficient way to multiply. It's the same thing here, where you can replace other instructions with MOV, but the replacement is much slower than the original.
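To put a number on that analogy (a deliberately naive cost model, just counting operations):

```python
def mul_by_addition(a, b):
    """Multiply by repeated addition, counting how many adds it takes."""
    acc, adds = 0, 0
    for _ in range(b):
        acc += a
        adds += 1
    return acc, adds

print(mul_by_addition(7, 6))        # (42, 6): six dependent adds for one multiply
print(mul_by_addition(7, 1000)[1])  # 1000 adds, where hardware MUL is one op
```

And the adds form a dependency chain, so they can't even overlap with each other. The mov replacement has the same shape: many slow, dependent steps standing in for one fast instruction.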
What makes you think this would be easier to parallelize than a traditional application? Just because there is only one kind of instruction used doesn't mean they don't still have to come in the right order!
I don't see any material online indicating that programs written for one-instruction-set-computers are more parallelizable than programs written for traditional computers. In fact, here is someone claiming the opposite:
> The disadvantage of an MISC is that instructions tend to have more sequential dependencies, reducing overall instruction-level parallelism.
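That quote is easy to check with a toy critical-path calculation (my own sketch of the standard ILP argument): with unlimited execution units, run time is bounded by the longest dependency chain, not the instruction count, and mov-chains where each mov feeds the next gain nothing from extra width.

```python
def critical_path(deps):
    """deps[i] = list of earlier instructions that instruction i waits on.
    Returns the length of the longest dependency chain."""
    depth = {}
    for i in range(len(deps)):
        depth[i] = 1 + max((depth[j] for j in deps[i]), default=0)
    return max(depth.values())

independent = [[] for _ in range(8)]                     # 8 unrelated movs
chained = [[i - 1] if i else [] for i in range(8)]       # each waits on prior

print(critical_path(independent))   # 1: all 8 could issue at once
print(critical_path(chained))       # 8: fully sequential, zero ILP
```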
At first look, the code produced by Movfuscator is mind-boggling. But despite its perceived entanglement, there exist some deobfuscators that bring the code back to a Harvard (i.e. human-readable) model.
TL;DR: obfuscation by translating a program to "mov" instructions is susceptible to static analysis and thus fully reversible.
> there exist some deobfuscators that bring the code back
Care to elaborate? The author of Movfuscator is very experienced and capable at reverse engineering. In one of his video talks he hits some Movfuscated code with some tools and says he doesn't know of anything that can deobfuscate it. That may have changed since then -- I'd be curious to know.
https://en.wikipedia.org/wiki/One_instruction_set_computer