Denormals are not needed if you are willing to handle underflow exceptions.
Before Intel 8087 and the IEEE 754 standard, any decent floating-point unit generated overflow exceptions and underflow exceptions, which had to be handled by the programmer, unless the default behavior of crashing the program was acceptable.
Intel 8087 and the standard based on it have offered to the lazy programmers the option to not handle the exceptions, in which case overflow exceptions generate infinities and the underflow exceptions generate denormals.
When the exceptions are not handled, it is supposed that the programmer will check the final results of a long computation, and if infinities and denormals are not desired, but they exist nonetheless in the results, the programmer will investigate the reason and then the bug will be fixed.
So anyone is free to ensure that no denormals will ever appear in an application, by enabling the underflow exception. If the program must not crash, then it must be written carefully, so that underflows are impossible.
There is no correct way of eliminating denormals, except throwing exceptions on underflows.
The "flush denormals to zero on output" and "interpret denormals as zero on input" behaviors are not permissible in any program that must produce correct results. Anyone who uses -ffast-math or similar options to compile a program that is not intended for graphics or ML/AI, where errors supposedly do not matter, makes an unforgivable mistake.
Unfortunately, "-ffast-math" enables a very large number of compilation options. Some of them are safe and can give a great increase in performance, like using fused multiply-add instructions. Others are not only dangerous, like flushing denormals to zero, but also provide a negligible performance increase on many processors.
Therefore, instead of aggregate options like "-ffast-math", one must enable only those component options that give maximum performance without affecting result accuracy. For example, in gcc and clang one must use "-ffp-contract=fast" to enable FMA instructions.
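The "check the final results" step mentioned above, for programs that leave the exceptions masked, could look roughly like the following. This is only a minimal sketch in standard C99; the names fp_audit and audit_results are made up for illustration, and the only library facility used is the fpclassify macro from <math.h>.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Counts of suspicious values found in a result array (hypothetical helper type). */
struct fp_audit {
    size_t infinities;   /* what masked overflows leave behind */
    size_t subnormals;   /* what masked underflows leave behind ("denormals") */
    size_t nans;         /* what masked invalid operations leave behind */
};

/* Scan the results of a computation whose FP exceptions were left masked. */
static struct fp_audit audit_results(const double *v, size_t n)
{
    struct fp_audit a = {0, 0, 0};
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (fpclassify(v[i])) {
        case FP_INFINITE:  a.infinities++; break;
        case FP_SUBNORMAL: a.subnormals++; break;
        case FP_NAN:       a.nans++;       break;
        default:           break;  /* FP_NORMAL or FP_ZERO: nothing to report */
        }
    }
    return a;
}
```

If any of the counters is nonzero where such values are not expected, that is the signal to go and investigate the computation. Note that this only works when the program is not compiled with options like "-ffast-math", which allow the compiler to assume that such values never occur.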
It contains a bunch of configurable parameters, including the 64-byte size of those buffers.
Based on those configurable parameters, which also impose limits on the depth of procedure calls, it is possible to compute the maximum space that will ever be needed on the stack. Thus in an embedded system it would be possible to guarantee that the stack size is not exceeded.
By changing the values of those configuration parameters, it should be possible to tune the size of the stack, at the price that with less space available on the stack it may become impossible to decode certain more complex messages.
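To illustrate the kind of computation meant here, a rough sketch follows. Everything in it is an assumption made for the example: the struct, the field names and the cost model (every nesting level pays a fixed frame overhead plus its on-stack buffers) are hypothetical and are not taken from the library's header. Only the 64-byte buffer size comes from the discussion above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical configuration knobs; names and meanings are illustrative only. */
struct stack_params {
    size_t buf_size;        /* bytes per on-stack buffer, e.g. 64 */
    size_t buf_count;       /* on-stack buffers per nesting level (assumed) */
    size_t max_depth;       /* configured limit on nested calls (assumed) */
    size_t frame_overhead;  /* assumed per-call cost: return address, saved registers, locals */
};

/* Upper bound on stack usage under the simple cost model described above. */
static size_t worst_case_stack_bytes(const struct stack_params *p)
{
    assert(p != NULL);
    return p->max_depth * (p->frame_overhead + p->buf_count * p->buf_size);
}

/* Nonzero if the bound fits in the stack reserved for the task. */
static int fits_in_stack(const struct stack_params *p, size_t stack_bytes)
{
    return worst_case_stack_bytes(p) <= stack_bytes;
}
```

With 2 buffers of 64 bytes, an assumed 48 bytes of frame overhead and a depth limit of 8, the bound is 8 * (48 + 128) = 1408 bytes. Lowering the depth limit to 4 halves it to 704 bytes, which is the tradeoff against message complexity mentioned above. A real bound would have to use per-function frame sizes reported by the compiler (e.g. with gcc's -fstack-usage).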
The simulators for running old operating systems must simulate correctly the entire IBM PC, with all its peripherals, not only the CPU.
QEMU simulates some peripherals, e.g. a certain video card, for which it hopes that any operating system that you install includes device drivers. This assumption is no longer true for very old operating systems, which may either lack device drivers or their device drivers may rely on some hardware behavior of the peripherals that is not implemented in QEMU.
Simulators like 86Box simulate an IBM PC clone in much greater detail, but at the cost of being much slower, so they are not suitable for recent operating systems, which need faster CPUs.
Because this is a library, it presumably allocates nothing in the heap or in static storage.
All data must be allocated in the program invoking procedures from the library, and passed as actual parameters.
You are right about the .text size being dependent on architecture, flags and compiler, but these dependencies may at most double or triple the size. They will certainly not make the size ten times greater. So with a maximum size of 25 kB, I expect that the maximum size will be under 100 kB on any combinations of architecture, flags and compilers.
I do not understand exactly what you mean about "unless it's all rodata". Depending on the architecture, flags and compiler, the constants may be allocated in separate sections, like ".rodata", or they may also be allocated in the same ".text" section with the executable code. The latter choice is typically superior on the CPUs that have relative addressing, like x86_64.
Is this what you meant, that it is not clear whether the quoted ".text" size also includes the constants or not?
I do not think that such a library includes a great amount of constants, so it is likely that including them or not does not change the size much.
At the line you pointed to, the size of each of the 2 allocated buffers is 64 bytes. The buffer sizes and other parameters that determine the maximum amount of stack usage are defined in "wolfcose.h". It appears that it is possible to tune the amount of stack needed, as a tradeoff with the complexity of the messages that can be decoded.
I agree that "Zero dynamic allocation" in the README is not really correct, because they meant "zero allocation in the heap".
Nevertheless, this cannot cause confusion, because any programmer should be aware that a claim of no dynamic allocation of any kind is typically impossible: almost all functions or procedures must allocate some variables on the stack, with very rare exceptions where there are so few local variables that they can be kept entirely in registers. On x86_64, zero allocation on the stack is completely impossible, because at least the return address must be stored on the stack.
The myth that the original Mackintosh was not intended to have a graphic display with a pointing device is debunked in TFA.
The only technical detail that we know about the original Mackintosh that was a mistake was the choice of the MC6809 CPU, for reduced costs.
MC6809 was a very nice CPU and for many programs the Intel 8088 CPU from the IBM PC was slower than a 2 MHz MC6809, but MC6809 was limited to 64 kB of memory, so the original Mackintosh would have become obsolete very quickly.
This is not enough to allow anyone to claim that it would have been a commercial disaster, but it certainly would have had a short lifetime and then there would have been significant costs to port any software to a replacement CPU with a greater address space.
Jef wanted a trackball instead of a mouse. The claim that he did not want a pointing device is false.
That would have made little difference for a Mackintosh user. The hardware of a trackball is exactly equivalent to that of a mouse; neither is simpler than the other. (Optical mice without rolling balls appeared only decades later.)
I have actually used trackballs instead of mice for a few years, and I have greatly preferred them to mice or touchpads.
Trackballs tend to be slower than mice, because you normally move them with the thumb or with the fingers, instead of moving the entire hand, but they are usually more comfortable than mice.
For several years now, I have used as pointing devices small graphic tablets configured in the relative mode instead of their default absolute mode. These are greatly superior from all points of view (speed, accuracy, comfort) to both mice and trackballs and to any other kinds of pointing devices, like trackpoints or touchpads.
So Jef Raskin had good reasons to question which is the best graphic pointing device, instead of just accepting the mouse because that happened to be the choice made at Xerox.
Based on my experience on how much better a stylus is than any kind of mouse, I consider the use of mice for pointing devices as a great historical mistake in the use of computers. I deeply regret that I have used mice for decades, instead of trying to find something better since the beginning.
The Apple Mackintosh is a significant culprit for the undeserved popularity of mice.
Is your deliberate misspelling of "Macintosh" spell-check or the same sort of intransigence that compels some people to misspell "Micro$oft" thinking they're clever?
I use Wacom Intuos S graphic tablets, on Linux. These are cheap tablets without screens, with a USB or Bluetooth interface.
These tablets have the same size as a traditional mouse pad, so they do not take more size on the desk than a mouse.
The stylus is extremely light, so I can keep it between my fingers while touch typing with all fingers on the keyboard. Thus I can transition between typing and pointing much faster than with a mouse, where I have to grip the mouse or release it.
I configure the tablet in "Relative" mode, where the operating system sees it as a mouse and there is no difference in behavior between it and a mouse.
I configure the stylus to emit a "left click" when I touch the tablet with its tip. The stylus has 2 buttons that can be pressed with your index. I configure one to be "right click" and the other to be "double left click". You can choose any other behaviors.
Holding the stylus is much, much more comfortable than holding a mouse, because of the natural position of the hand.
Moving the cursor with the stylus is much faster than with a mouse. I can move it across the screen from corner to corner instantaneously, because the stylus is much lighter than a mouse, and there is no friction, since it does not touch the tablet, unless you use it to select an area.
The positioning accuracy is much better than with a mouse. If you desire so, you can do handwriting, e.g. signatures, or make drawings with it, which of course is not surprising as this is supposed to be a graphic tablet.
I tried this initially only hoping for a better comfort, because my hand was hurting from the excessive use of mouse all day long. But then I discovered that not only it is very comfortable, but also much faster and more accurate than a mouse can be. Therefore I do not intend to ever use a mouse again.
Trackpads are more comfortable than mice, but they are much slower, so they are not competitive for things like drawing electronics schematics, which I do from time to time. While trackpads, like touchscreens, do not have the inertia problem of a mouse, it is absolutely impossible to move your fingertips with the speed with which you can move the tip of a stylus, which amplifies the amplitude of your finger movements.
There are too few programs which know to use "mouse gestures" for the user interface, e.g. some expensive CAD/EDA programs or the Web browser Vivaldi.
Mouse gestures are already better than any trackpad gestures, but when you use a stylus they become even better, as they are pretty much the same movements as in handwriting.
See another comment that quotes TFA, where what Jef says shows that Canon Cat was not at all "along the lines of his original vision" for the Mackintosh project.
He says that his Mackintosh was also intended to have a graphic display, not a text display, but with a trackball instead of a mouse, therefore it was completely unlike Canon Cat.
In TFA, Jef Raskin claims that the story about the mouse is not entirely correct:
JR: No. I designed it to be graphical from the ground up. But the text portions of the interface, which I also cared about, would have been cleaner. People have put together my dislike of the mouse (confusing dislike for a particular input device with dislike for graphic input devices in general; I personally prefer trackballs and tablets) and my careful attention to text handling to a false legend of my wanting a text-based machine. Andy [Hertzfeld, a major developer on the early Mac team], unfortunately, has not generally gone back to the original documents, and he’s interviewed lots of people about the history of the Mac, but not me. His website is, as a result, full of errors.
History is written by the victors. In this case it’s completely fine, as Raskin’s “corrections” don’t really amount to much, and certainly would have led to a path where Macintosh was just another abandoned experiment like the Apple III.
Perhaps in this alternate universe, a substantially reworked “Lisa II” might have been Apple’s long-lived computing platform.
The corrections may not amount to much, but there is no reason to believe that his version would have been a failed experiment like the Apple III, or that the Lisa would have taken its place.
Part of the magic of the Macintosh was the simplicity of the hardware. In that respect, it was much closer to the Apple II than the Apple III or Lisa. Consumers may not think much about what's inside the case, but it matters when it comes to manufacturing costs and that translates into the cost for consumers. While the original Macintosh was by no means cheap, it was about half the cost of the Apple III and a quarter of the cost of the Lisa. Heck, even the adoption of the Macintosh was slow because of its price. Maybe a less expensive 6809 based Macintosh would have had more success in the market, at least early on. It's also too easy to read too much into the failure of the Canon Cat. The Canon Cat was introduced years later. User expectations were starting to solidify around the GUI at that point. (Then again, success was not guaranteed. Lacking compatibility with the Apple II would have held it back. Especially so after the introduction of the IBM PC since the IBM PC had IBM backing it.)
I also think the adoption of the GUI for consumer computers would have been delayed considerably without the Macintosh 128k. Early machines that supported a GUI tended to be expensive. Early versions of Windows were crude. The only real outliers in that respect were the Atari and the Amiga. Would they have supported a GUI without Apple taking that first step? It's hard to tell.
The defining aspect of the Macintosh for me will always be the mandatory GUI - most everything else had it as either an entire afterthought, or at least as a “program started later”.
The mandatory graphic GUI - and MacPaint - made the point that the Mac was primarily a visual design tool that happened to handle text.
That was absolutely revolutionary.
S-100 systems and the early PCs were primarily text systems that sometimes happened to do crude graphics.
The original Apple II tried to do graphics but the tech to do it properly just didn't exist. And the underlying UI was still text based.
Raskin's Mac vision didn't make that leap. It wasn't just about the mouse, it was about the philosophy of the product. Raskin wanted text-but-cheaper-and-better, Jobs wanted pictures and art.
Early on, sure. I seem to recall Apple having their Human Interface Guidelines early on, which helped, yet there were developers who were either unaware of them or experimenting with different ideas. Other platforms tried to improve consistency later on though. For example: there was CUA for IBM. Of course, most of that went out the window in the late 1990s and early 2000s when companies figured out that the easiest way to differentiate their products to consumers was visually, rather than technically.
Lisa 2 was cheaper than many later Macs, but the Mac folks seemed to have little interest in convergent evolution for the platforms or in integrating Lisa features like memory protection into the Mac. The result was that Lisa died as the Macintosh XL (ex-Lisa), with a Mac compatibility environment (MacWorks, which looked terrible with the stock Lisa rectangular pixels but better with a "Screen Kit" square pixel upgrade) as a consolation prize, while Mac users had to wait until Mac OS X for memory protection. Ultimately the Lisa hardware was able to run 68K versions of Mac OS through 7.6.1 in 1997.
Assuming the Mac folks had no interest in converging the platform in favour of the Lisa is somewhat unfair. While it sounds like some code was shared between the two platforms, the Lisa's operating system was quite different. It would have been difficult to make Lisa software operate under the Macintosh System Software. To my knowledge, there was virtually no software for the Lisa anyhow. Breaking software compatibility on the Macintosh to get the benefits of Lisa would have been a terrible business decision.
Aside from that, the MMU in the Lisa would have been a custom solution which Apple would have to support. When Motorola introduced an MMU, it was for 68020 generation machines. Apple should have been able to introduce memory protection at that point, but didn't. One of the reasons was that Apple struggled to make that next generation operating system while retaining compatibility with existing software (albeit, memory protection may have been only one of many problems). This was by no means a problem exclusive to Apple. Other platforms ran into similar issues.
Apple doesn't seem to have leveraged or combined work on (Lisa, Lisa Smalltalk, Lisa Xenix, Mac OS, A/UX, ...) as successfully as they might have. As you note, protected memory was deferred to multiple failed Mac OS successor projects (Pink/Taligent, Copland/NuKernel, etc.)
Ultimately Apple gave up, acquired Steve Jobs and NeXT, and eventually successfully migrated the Mac platform to an OS with memory protection.
Since then however Apple's OS and hardware strategy has been much more coherent, with macOS, iOS, iPadOS, tvOS, watchOS etc. sharing code, and sharing SoC technology as well. Ironically this is similar to Microsoft's "Windows [NT] everywhere" strategy.
That delay in shipping a memory-protected Mac was probably originally at least as much the result of upper-management politics as anything else. After Jobs left Apple, Gassée cancelled Jobs’ pet project, the Big Mac, which was intended to run Mac applications on a Unix base. Big Mac project leader Rich Page (and IIRC some other project members) rang Steve Jobs begging him to do something, and the rest is history.
I think it was in one of the On the Metal interviews where one of the guests mentions MPW was a submarine project, from UNIX background engineers, to eventually replace Pascal with C++.
Why is writing inline Assembly considered an advantage of C, when it is a language extension that is not even part of ISO C, and yet always used to point out issues when other languages make use of it?
Naturally there had to be a balance; until the mid-90s, what we now consider AAA games were mostly written in Assembly.
I'm saying that being designed around the singular task of word processing would have made it a platform/ecosystem failure, even if it was a nominally successful one-off product.
The Macintosh (specifically the original 128k version) was a dismal market failure too. What succeeded (relatively speaking) was the platform/ecosystem.
This is exactly the theory of the RISC and VLIW processors, which replaced, respectively, the vertical microprograms and the horizontal microprograms stored in ROMs, which were used in the processors of the seventies, with normal programs with simple instructions, which were normally executed from fast cache memories, thus achieving the same speed as microprograms.
However, when the 8087 was designed, RISC and VLIW processors were still in the future, because a fast cache memory allowing the execution of an instruction per clock cycle was still far too expensive in comparison with a microprogram ROM.
Most earlier floating-point accelerators were microprogrammed like 8087, with the microprograms stored in a ROM. However, there existed FPS AP-120B, introduced by the company Floating Point Systems in 1976. This was a floating-point accelerator for minicomputers, like DEC PDP-11 or VAX, which was marketed as a "supercomputer for the poor".
FPS AP-120B was a VLIW processor launched 7 years before the term "VLIW" was coined. This means that it was a horizontally microprogrammed processor (i.e. with multiple concurrent operations specified by each microinstruction), where the microprogram was not stored in a ROM, but it was fed into the accelerator by the host computer. Therefore the user could write directly such microprograms for it, to implement optimized computational algorithms.
Nevertheless, while FPS AP-120B was said to be a "supercomputer for the poor", "poor" was meant only in comparison with those who could afford to buy a Cray-1. Such a "cheap" array processor still had a price more than 100 times greater than an Intel 8087.
By the time when RISC and VLIW CPUs became fashionable, using microinstructions as simple as those of Intel 8087 for implementing floating-point operations was no longer acceptable, because having to execute tens or hundreds of simple instructions for each FP operation was deemed too slow. Therefore the instruction sets of RISC and VLIW CPUs were eventually extended to include FP operations as single instructions, which had to be implemented in complex hardware in order to achieve an execution throughput of one instruction per clock cycle.
Denormal processing is slow only on certain CPUs, where the designers have been lazy, so that denormals, when encountered, are handled by a microprogrammed sequence.
During the last half century there have been plenty of CPUs where denormals have been handled in hardware, so that any slowdown caused by them is negligible.
Except for generating graphic images seen by humans or in ML/AI applications, neither flushing results to zero nor treating denormal inputs as zero is acceptable, because they can lead to huge errors.
Whoever fears that denormals can slow down an application must enable the underflow exception. In that case denormals are never generated, but the underflow exceptions must be handled, because when denormals are not desired but underflows happen, that means that there are bugs in the program, which must be fixed.
Denormals have been created so that people can mask the underflow exception and avoid handling it, without dire consequences.
However this habit of no longer handling the floating-point exceptions, like before the IEEE 754 standard, has created younger developers who are no longer aware of how FP arithmetic must be handled to avoid errors, so now there are too many who believe that the use of "-ffast-math" is permitted in general-purpose programs, not only in special applications where result accuracy does not matter.
For correct results, you must use either denormals or underflow exception handling. There is no correct third choice. The third option, flushing to zero like in GPUs, is only for when correctness is irrelevant.
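A small sketch of how flushing to zero can produce a huge error. Since switching the hardware flush-to-zero mode is platform specific, the flush is only imitated here in software by a made-up helper, flush_to_zero; the other function names are also just for illustration. The example assumes IEEE 754 binary64 doubles and a compiler that is not run with "-ffast-math".

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Software stand-in for a hardware flush-to-zero mode: subnormal -> signed zero. */
static double flush_to_zero(double x)
{
    if (fpclassify(x) == FP_SUBNORMAL)
        return x < 0.0 ? -0.0 : 0.0;
    return x;
}

/* DBL_MIN / (a - b) with gradual underflow, the IEEE 754 default. */
static double ratio_gradual(double a, double b)
{
    return DBL_MIN / (a - b);
}

/* The same expression when the subnormal difference is flushed to zero. */
static double ratio_flushed(double a, double b)
{
    return DBL_MIN / flush_to_zero(a - b);
}

static void ftz_example(void)
{
    double a = 1.5 * DBL_MIN, b = DBL_MIN;   /* two distinct normal numbers */
    assert(a != b);
    assert(ratio_gradual(a, b) == 2.0);      /* a - b is the subnormal DBL_MIN / 2 */
    assert(isinf(ratio_flushed(a, b)));      /* a - b became 0: division by zero */
}
```

With denormals, a != b guarantees a - b != 0, so the usual "if (a != b)" guard before the division is sufficient. With flushing, the same guard passes and the result is an infinity instead of 2.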