Someone on Reddit pointed out to me afterwards that DDS is DirectX-specific and the DVD was for Win/Mac, so it may have been more of a multiplatform issue than a space one. In an ideal world you'd use DDS on Windows because it's fastest there, and whatever the OpenGL equivalent is on Mac, but that would've been well, well outside of the space constraints at that point.
1. There are probably better tools to do it, but I just used Google Drawings.
2. Not for Myst IV currently. All of the prior games are supported by ScummVM, which would work on an Android device, but Myst IV is not in there yet. Maybe someday though
No, actually, I did see this. The problem with using STBI_MALLOC in that way is that STB uses it for _all_ allocations, not just the main image buffer. The image buffer I need to put the data into is passed in as an argument to Gfx Tools: it already exists, and I couldn't touch the code where it was created, because that code lived in yet another DLL, the one calling into Gfx Tools. So if I overrode STBI_MALLOC to just return that same buffer every time, every malloc call STB made would return the same buffer, and that buffer would be reused for different allocations with entirely different purposes. So it's close, but it wouldn't work. I would have needed some hack like checking that the requested allocation size matches the expected size of the image buffer and only returning it then, but that's of course quite fragile and error-prone. Or, you know, just go into the STB source and find the one malloc I need to replace, but that's kind of dirty.
For me at least, with the optimized loader library, yes, the impact of the DDS conversion is almost unnoticeable. However, I have a pretty fast CPU so I don't want to assume that it'd be the case for everyone, and the DDS conversion was done first, so even if it is overkill it costs me nothing to leave in and will better serve anyone with a slow CPU where mango and Pixman aren't enough.
Shortly after I released my tool, someone reported that it was crashing for them because they were using a third-gen Intel CPU that didn't have a SIMD instruction set the x64 command-line portion used (specifically, the BMI instruction set). It was actually a bug in the mango image library, because I had disabled that particular instruction set when I built it, but it goes to show that when you're doing a retro game hacking project, a lot of gamers are keen to run it on older hardware, and I'm quite aware of this fact.
I was genuinely concerned that everything I was doing with mango and Pixman was going to turn out to be pointless. It wasn't, thankfully; there was a noticeable difference after introducing them. But it was a gamble for sure, because there was no smaller test I could really do to know in advance that it was worth it. If I wanted to replace that DLL, I was going to have to replace the whole DLL: because it was C++, the DLL exports were all mangled names for classes that all interacted with each other, so I couldn't just cleanly replace one call and see whether it was a good idea. I try to gather as much evidence as I can to back the idea that it'll work before I make the leap, but I've learned that if you really want to get stuff done, sometimes you just have to go for it and assume there's a way to salvage it if it fails.
To be completely honest, it's surprising to me as well. I would expect it to be bad, but not as bad as it was. I entirely expected that the slow part would be decoding, not copying. In fact, my initial plan was to convert the remaining images that couldn't be DDS to Targa, on the assumption they would decode faster. However, when I investigated the slow functions and found they were only copying, I changed tactics, because then in theory the format would not make a difference.
There is no fixed amount of per-frame work. After the 550ms hardcoded timer is up, it blocks during the loading of those images, and during this phase all animations on screen are completely still. I thought to check for this, because it did occur to me that if it tried to render a frame in between loading each image to keep the app responsive, that would push the load to be significantly longer, and that would be a pretty normal thing to want to do! But I found no evidence of this happening. Furthermore, I never changed anything but the actual image-loading code: if it tried to push out a frame after every image load, or every x number of image loads, those frames wouldn't go away just by making the images load faster, so it could never have become as instant as it did without even more changes.
The only explanation I can really fathom is the one I provided. The L_GetBitmapRow function has a bunch of branches at the start of it, it's a DLL export so the actual loop happens in a different DLL, and that happens row by row for 500+ images per node... I can only guess it must be down to poor CPU cache behaviour; it's the only thing that makes sense given the data I got. It probably doesn't help that the images are loaded in a single-threaded fashion, either.
That said, there have been plenty of criticisms of my profiling methodology here in these comments, so it would be nice to perhaps have someone more experienced in low level optimizations back me up. At the end of the day, I'm pretty sure I'm close enough to right, at least close enough to have created a satisfactory solution :)
I absolutely did not mean to imply that you did a bad job at any point, or to discourage you. The mere fact that you reached that far into the game’s internals, achieved the speedup you were aiming for, and left it completely functional is extremely impressive to me.
And that’s part of why I’m confused. If you’d screwed up the profiling in some obvious way, I’d have chalked it up to bad profiling and been perfectly unconfused. But your methods are good as far as I can see, and with the detail you’ve gone into I feel I see sufficiently far. Also, well, whatever you did, it evidently did help. So the question of what the hell is happening is all the more poignant.
(I agree with the other commenter that you may have dismissed WaitForSingleObject too quickly—can your tools give you flame graphs?.. In general, though, if machine code produced by an optimizing compiler takes a minute on a modern machine—i.e. hundreds of billions of issued instructions—to process data not measured in gigabytes, then something has gone so wrong that even the most screwed-up of profiling methodologies shouldn’t miss the culprit that much. A minute of work is bound to be a very, very target-rich environment, enough so that I’d expect even ol’ GDB & Ctrl-C to be helpful. Thus my discounting the possibility that your profiling is wrong.)
This is something I'd love to get right. Pixman does appear to support sRGB input, in the form of the PIXMAN_a8r8g8b8_sRGB format, which might work well enough for the premultiply step. It's the unpremultiply that I'm struggling to wrap my head around - I'm guessing I'd need Pixman to output 16-bit channels in the destination, otherwise I wouldn't be able to convert back to sRGB? That's kind of a massive pain, though: I'd have to allocate a whole other temporary buffer at double the size, for something imperceptible enough that I never noticed it with my test images or in my playthrough. So I'm unsure what the cheapest way to do it would be. This is all well outside my area of expertise, which is primarily hacking and reverse engineering, but I'm always open to learning.
I tried my hardest to create something that was as "technically correct" as I could approximate given my lack of graphics experience and the performance constraints I was under, but I kind of knew it was likely I could mess up some small detail. Maybe since it's open source someone will eventually come along to correct it? One can dream :P
right, looking at it again, I think I get it now. You'd need to go from the 8-bit sRGB source to a premultiplied image in floating point (Pixman can't output 16-bit channels, it seems), then to the resized image in floating point, then unpremultiply that, and then go back to 8-bit sRGB. It makes sense in my head, I just don't know if the tradeoff is really worth it, it's a lot of extra steps... I don't even know that the original resize algorithm would've done it either, given its age, and my goal is to replicate that. But maybe I'll test it and see how it goes eventually.
Followup: I've now implemented this, and I determined it doesn't take enough longer to have a noticeable impact. It so happens mango has an sRGB-to-linear function that is much faster than using Pixman for the conversion, so that's what I used. I kept it 32-bit all the way through, which will introduce some colour banding, but it's not really noticeable with the photorealistic images being resized. So I expect this will be ready for whenever I release my next version.
Probably because Director was originally created under a different name, VideoWorks, which WAS released by MacroMind before they eventually became Macromedia.
Note that Shockwave installers will also stop working after that point, as even the "full" installer downloads components from the web during the install.
Here's the thing. A lot of the games being saved here are games that the original publishers/developers no longer care about - that or the original creators don't even exist anymore. That throws a large portion of Flash games into the gray area of abandonware, although that's a different discussion entirely.
This is actually a place where I differ with BlueMaxima (Flashpoint's creator and the writer of the article linked here). Personally, if I were running the project, I would remove a game if the original creator asked, simply out of respect for them.
However, I think that most of the staff involved with Flashpoint would disagree with me here. The attitude is more "for now, just go go go, and we'll worry about organizing everything and dealing with the repercussions later." There are a lot of Flash games out there, to say the least, and Flash being discontinued in 2020 leaves us limited time if we hope to save as much as possible.