6 years ago, I rewrote 30 000 lines of C++ code in 1000 lines of Ruby.
A year ago, I rewrote 424 lines of Java+Spring+Hibernate in 18 lines of bash. This is less glorious, but if you compare the size of the deliverables, it's 39 MB for the J2EE webapp against… 772 bytes for the shell script.
At my current job, I maintain an app with 500,000 lines of code that could easily fit in 20,000.
Java is quite a bit more verbose than some other languages, but I think the culture is a significant factor; C# is similar. I've noticed the tendency to overabstract and overgeneralise, apply design patterns for the sake of using them, etc., is particularly strong. It's inexplicably hard to trace out the train of thought that makes things like this appear:
I've written small Java apps --- one class only, i.e. one source file --- which needn't be any more complex, yet coworkers have said there's something uncomfortable about my code, though for some reason they can't explain exactly why. I'd get comments like "wouldn't it be better if you made this (only used once and very trivial) line of code a separate function?" or "could you use more classes?" (for a <100 LoC script-ish thing with almost no duplicated code nor much in the way of loops). It's almost as if they can't get their heads around how simple something can be, so it somehow feels very wrong to them.
Then there's the extremist "premature optimisation is evil" attitude, which I think is completely misguided because more code and complexity is bad not only in terms of computer time but also programmer time --- it takes more time for programmers to design, write, read, and debug more complex code.
> I'd get comments like "wouldn't it be better if you made this (only used once and very trivial) line of code a separate function?" "could you use more classes?"
The trouble is that there is a cargo-cult belief in unit testing in the Java (and .NET) worlds. Everything must have a full suite of unit tests, even if it's a one-off script. Therefore, in order to enable the mock objects needed for unit testing, you can't have "just" a class. You have to have an interface, which that class then implements. If you have any non-trivial piece of functionality, it must be encapsulated into its own method; otherwise, how are you going to test it in isolation?
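A minimal sketch of the shape this produces (all names here are invented for illustration):

```java
// The ritual: an interface that exists only so the one real
// implementation can be swapped for a mock in unit tests.
interface GreetingService {
    String greet(String name);
}

// The single "real" implementation.
class DefaultGreetingService implements GreetingService {
    public String greet(String name) {
        return "Hello, " + name;
    }
}

// The hand-rolled mock the interface exists to enable.
class MockGreetingService implements GreetingService {
    public String greet(String name) {
        return "mock greeting";
    }
}
```

Three types where a dynamic language (or a Java shop less wedded to mocks) would have one function.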
My belief is that it all stems from the fact that Java doesn't have a REPL, so programmers use "test-driven development" to get the same level of interactivity from their code.
If you look at the history of object orientation, it starts as slots for function pointers in structs in C, with some preprocessor statements to make it easier to use. Literally the object is just a struct which holds data and function pointers.
Pretty interesting when you think about it: Java hasn't moved that far, conceptually, from C preprocessor statements.
In my early days, I once found a Tuple library for Java and added it to the codebase. Senior devs soon instructed me to remove it. Back to a multitude of container objects...
Hard to say without knowing more context, but they could have been right. Personally I'd prefer clearly named, domain-specific container objects with adequately named properties to having to deal with "anonymous" Tuples all over the codebase and being forced to remember what the first element was, what the second one was, etc.
> My belief is that it all stems from the fact that Java doesn't have a REPL, so programmers use "test-driven development" to get the same level of interactivity from their code.
It's not the same thing though. A REPL gives you fast feedback once. Unit tests built by following TDD are there to stay.
I've gotten many code review comments to the effect of "can you break this out into an interface?" so I 100% understand what you're saying.
Many of the instances were things we were only ever likely to make one implementation of. I could see that argument being valid if there were a clear second implementation in mind, but an interface with only one implementing class is often just wasted time.
Also, extracting an interface is often 100% automated using the refactoring capabilities of your IDE, so doing it up front has little benefit (except in certain cases like being a public API in a library).
The verbosity in Java is soon going to be reduced considerably by three new features included in `Java 8`, namely:
1. Lambdas: Lambdas are brief and to the point; the expressions reflect the terseness of `C/C++`. In many situations, you can do away with interface creation entirely by just using a lambda instead. Consider all the lines of code saved in a legacy codebase by using lambdas!
2. Default methods: Again, `interface patching` is a curse that a lot of Java projects suffer from. Requirements change slightly, and what your typical enterprise Java dev does is add patches upon patches and layers upon layers in an interface. With default methods, you can just add default methods to your existing interfaces without breaking binary compatibility for existing implementors. Again, tons of redundant code saved here.
3. Type annotations: Again, your typical Java devs have to resort to ugly hacks to place any constraints on the object properties. With type annotations, all you have to do is:
@NonNull String str;
And lo and behold! The variable `str` cannot be assigned a `null` value (enforced by a pluggable type checker such as the Checker Framework; the annotation itself carries no runtime behaviour). So, lots of validation code being saved here again.
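To make points 1 and 2 concrete with a toy example (the interface and all names are invented): a default method added to an interface, implemented once in the pre-Java-8 anonymous-class style and once with a lambda:

```java
// A functional interface: one abstract method, plus a Java 8
// default method that existing implementors inherit for free.
interface Greeter {
    String name();
    default String greet() { return "Hello, " + name(); }
}

class GreeterDemo {
    // Pre-Java-8 style: anonymous inner class boilerplate.
    static String oldStyle() {
        Greeter g = new Greeter() {
            @Override
            public String name() { return "world"; }
        };
        return g.greet();
    }

    // Java 8 style: the lambda replaces the whole anonymous class.
    static String newStyle() {
        Greeter g = () -> "world";
        return g.greet();
    }
}
```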
To the best of my knowledge, none of the above features exist in the `C#` language (yet), so I take it as an edge Java has over C#.
I coded in C# for a few years, used plenty of lambdas and a handful of annotations in class definitions. Every time I had to go back to Java I missed them to death.
I never felt like interface patching was a problem for me, maybe because I'm more of an FP than an OOP programmer. Type annotations do sound interesting.
I talked about C#'s lambda syntax, which Java later borrowed, below. For the other points:
I have seen a little of the interface vs function conversions in modern Java, and it seems a decent enough small convenient feature in the context of Java, but it doesn't feel like a big win that I would cry out for in C# code.
Defaulting to not-null and immutable values is a trend in several recent languages, e.g. from Swift and Rust to F#. This IMHO looks like a good step forward and I welcome it; however, it's hard to see how this could be comprehensively added to existing languages like Java or C# with an existing legacy of not working that way. Opting into not-null in C# has been talked about many times (1) in various forms, and even implemented using attributes (similar enough to type annotations). (2)
I don't know what major project you are talking about which uses these features. I'm not talking about small projects by John Does; I'm talking about LARGE infrastructure projects. For example, the Android SDK is extensively used in the Java world, and it has tons of interfaces as either arguments or class helpers which could be substantially overhauled by using these features.
Please don't move the goalposts; you stated that C# does not have lambdas, but it has had them for 9 years now.
You also asked "I don't know what major project you are talking about which uses these features". As far as I have seen, and I've seen a fair amount, any and every non-trivial C# program (and many of the trivial ones too) uses lambdas, even if it's just in the small-scoped, rote use of e.g.
var matches = someList.Where(item => item.x > y);
Technically that counts as "using a lambda". Don't knock it for being trivial, it's a gateway drug ;)
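For what it's worth, the Java 8 stream equivalent of that C# line (assuming an `Item` class with an int field `x`, both invented here) is only slightly noisier:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class Item {
    final int x;
    Item(int x) { this.x = x; }
}

class LambdaDemo {
    // The borrowed shape: filter a list with a lambda predicate.
    static List<Item> matches(List<Item> someList, int y) {
        return someList.stream()
                       .filter(item -> item.x > y)
                       .collect(Collectors.toList());
    }
}
```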
Hindsight is 20-20. The iterative bloat during development can happen for so many reasons. (Numerous changes in requirements late in the project, many developers working on numerous modules, design for flexibility and expansion that are never used, going too far down a path that turns out to be more work than an alternative).
Once a program has been in production for a while and its use cases are more well defined, it becomes easier for a single person to swoop in and rewrite it in a far more concise way. In most of these cases I think the language is pretty irrelevant in terms of how much can be culled.
> The iterative bloat during development can happen for so many reasons. (Numerous changes in requirements late in the project,
... and people not cleaning their mess up afterwards;
> many developers working on numerous modules,
... without proper coordination, or, without the drive to keep things clean (i.e. proper manners);
> design for flexibility and expansion that are never used,
... and never cleaned up, i.e. left there to rot by the people who put them there;
> going too far down a path that turns out to be more work than an alternative)
... and not cleaning up.
> Once a program has been in production for a while and its use cases are more well defined it becomes easier for a single person to swoop in and rewrite it in a far more concise way.
I think it's always easier for the guy/gal who wrote something to clean it up than for the next guy/gal, who doesn't know the code or its history as well.
Really, this is all just people not cleaning up (or being allowed to clean up) after themselves.
Now if we could all just be nice to each other and hold hands..
Coming from a 10-year veteran in consulting for orgs ranging in size from 3 people to federal departments:
Most of the time there isn't any budget left for cleaning things up after they have a functional app. Despite impassioned hand-wringing, the client will place value on things they can see over things they can't. So any budget left over will almost invariably be directed toward additional features and not toward proper engineering.
Sad but true.
Speaking to your points, there's also the problem that developers who have been with a project from the beginning can't see the forest for the trees. So large refactoring endeavours can be less obvious to them. They may also have fatigue and be less willing to rewrite everything. This, combined with budget pressure, is all it takes for the bloat to stick.
After my own decade of enterprise programming, I can see only two ways to obtain clean code. Either you build it right from the start, or you make cleanup a blocking requirement before a high-value feature can be added (which often requires a liberal interpretation of the word 'blocking').
I'm a fan of the first approach. Building it right is the most efficient way to build, with the overall lowest amount of effort required. It doesn't mean gold-plating, rather the opposite: keeping things light, elegant and minimal. Anything that can't be built cleanly isn't built at all (yet), or at the very worst it is mocked with a fixed data implementation, which suffices for demos but not for shipping.
However, the problem with building it right is that you have to learn all the wrong ways to build before you learn how to stick to the narrow path of clean code. I wish we could do brain dumps from old hands to young wolves and not see the same mistakes repeated by every generation of programmers.
Here's an anecdote of how a typical webapp goes to shit with the best intentions from everyone.
"building it right from the start" sounds great. I've never worked on a project where people did not honestly try to do it this way. Lets say that for once you have a clear set of unambiguous requirements (a rarity) when starting in. I like to start with the database first. I'll design a nice normalized database and start building the website from there. If the requirements are fleshed out enough you'll already have a map of what pages ought to do what.
Once you get to the point where a customer can try out a few pages, you get some "feedback", which is actually a change in the requirements. You can't hit them with a change request for every little thing at first, so you comply. Maybe it's just a small change, turning a one-to-many into a many-to-many relation for example, or moving a few fields from one table to another. Soon you have 30 or so similar "small tweaks". Even if you hit them with a change request every time, they don't all come in at once, so you never get a good opportunity to re-evaluate the schema/application as a whole; each is viewed as a singular "small" change, so refactoring the whole thing isn't really an option. The more changes come through, the more your elegant implementation becomes a series of hacks.
Soon they may start complaining, asking why their "small" changes are taking so long to fix. This adds further budget pressure.
The problem is, it's not very clear to the people down in the weeds implementing the small changes whether something is a hack or not. I think this is something that takes experience. So even when cleanup is a blocking requirement, each change taken one at a time might seem fine, and no large refactor is obvious until it becomes too large a task to lump into a typical change request.
This is an especially easy trap to fall into for people who have been toiling away at the current design. They will have a blind spot/affinity for keeping as much of their implementation as possible and resist an overhaul.
If you have an iron will and a steely, ice-cold 1000-yard stare from 10 years of war in enterprise app development, it's easier to kill code, because you don't get attached to it as easily.
Be careful not to fall to the dark side, though: the flip side of those 10 years can instil a cold iron heart that stares right through the hacks and stops trying to argue with the client about doing things properly. In that scenario you just do as you are told and stop caring about the overall well-being of the codebase.
The real art to web dev consulting is being able to build a flexible enough design to withstand these sorts of changes and still remain elegant and well thought out. It also takes the experience to know the difference between a hack and a proper change, and the willpower to do the right thing. It's really hard to do, and it takes a good "gut" feeling for how a particular thing will actually be used, what a customer really means when they say something, etc. I'm not sure that it's something that can be taught. It leads right back to your brain-dump wish.
> The real art to web dev consulting is being able to build a flexible enough design to withstand these sorts of changes and still remain elegant and well thought out.
This is where it gets tough. I happen to have http://www.colorforth.com/POL.htm [1] open, a portion of which illustrates the other side of this fence (emphasis mine):
> Do not put code in your program that *might* be used. Do not leave hooks on which you can hang extensions. The things you might want to do are infinite; that means that each one has 0 probability of realization. If you need an extension later, you can code it later - and probably do a better job than if you did it now. And if someone else adds the extension, will they notice the hooks you left? Will you document that aspect of your program?
Obviously all of this must be taken in moderation, of course. But it hints at the problem I'm trying to describe: the design needs to be flexible, but scoped to the problem space you're working within. Bamboo is flexible, and usefully so; try and generalize that flexibility to the fullest extent and you'll end up with goo which will expand to fill the problem space entirely and then subsequently accomplish nothing.
So, it's about figuring out where flexibility will be most critical (eg, for a part of the architecture that will necessarily be "locked in" with a lot of other components or mental models depending on it) and where you should be able to get away with rigidity, either because it'll be easy to redesign (few dependencies / easy to conceptualize and subsequently retool) or because even Revision #31781 won't need you to change that bit.
...I think I just found myself at the brain-dump problem too. Haha
I'm also curious... I've known about the "kill code" thing for a while. I haven't really gotten into coding especially seriously as yet (lots of analysis, but nil implementation), and it's hard to contemplate the idea of doing this, even for situations where I know I definitely need to, like iterating on properly implementing something I've never explored before. How does one acquire the "iron will" mindset you talked about?
(I ask this as I've recently discovered my lack of get-up-and-go is due to a wonky/underpowered thyroid (giving me chronically low motivation); now that I know what's wrong I'm working on fixing it, but I suspect the anxiety that's developed over time will not go away without some mental effort as well. You could say I'm a bit of a worst-case scenario when it comes to juggling multi-domain mental task sequences generally speaking.)
[1]: Search http://www.colorforth.com/blog.htm for "Problem-Oriented" to find the hyperlink to the mentioned URL. There are many interesting links to other places on this page, which I don't think are mentioned anywhere else on the site. (Translation: there's no sitemap, category system, or menu; you have to dig.)
>How does one acquire the "iron will" mindset you talked about?
First of all, you need to be comfortable using source control to the point where you can kill a bunch of code then find it and merge it back in months and hundreds of revisions later.
Experience and education will give you the ability to recognize when you have bloat.
You may need lots of bad experiences and a memory long enough to remember the pain of dealing with bloat and cruft.
The willpower portion is what makes you do the right thing when you do recognize it; if you have a strong will you won't need to learn the hard way repeatedly. Coffee helps with this: if you feel yourself slacking, drink more coffee, drink stronger coffee. If you still can't do it, look into getting an ADHD prescription (I've got one).
The point and purpose of commercial programming is profit, not beautiful code.
Yes, I know, tech debt is a thing and those that don't pay attention to the engineers will soon find themselves with a shitty code base that they can't sell.
It can be really, really hard to persuade a management board that you need to take a bunch of their expensive techs and, instead of having them write new features that would improve saleability, have them rewrite the code-base (which they're still depreciating), for no net gain (and considerable risk) except maybe reducing the support overhead and making future development easier and cheaper.
>> Once a program has been in production for a while and its use cases are more well defined it becomes easier for a single person to swoop in and rewrite it in a far more concise way.
>I think it's always easier for the guy/gal who wrote something to clean it up than for the next guy/gal, who doesn't know the code or its history as well.
Once a piece of functionality gets established in production, over time the knowledge of the original requirements and the implementation is lost. The consequence is that nobody knows why the code does what it does (and usually it does something important) and due to it working fine in production there is strong pushback to making changes.
If there is one thing I really hate it's deploying new code that has cruft/legacy shit already baked in. It just makes me feel awful. But sometimes, after so many months (or years, in at least one of my cases) on a project, I just say fuck it, and ship it (be it my code or a team member's).
I definitely agree with your ethos though. I think there's a lot to be said for having some buffer between functionally complete code and a hard ship date so that you can leave things in a state that don't make you or someone else feel bad about things later.
>Hindsight is 20-20. The iterative bloat during development can happen for so many reasons.
Indeed, I once had the pleasure of replacing a BizTalk server with 50 lines of C#. It wasn't that the BizTalk server was a bad idea. The project was for a big harbour, which has documents, cargo manifests, import papers and a ton of other stuff flying back and forth. The idea of having a central hub for data exchange and data conversion wasn't actually that far-fetched. It's just that the only part that was ever implemented was the upload of a small 5-10 line XML file to the maritime administration, regarding the departure of ships.
After three years of running a BizTalk server for that one purpose, with no one fully understanding BizTalk, we just replaced it with a custom Windows service.
If anything, the takeaway is that projects should be revisited once in a while for cleanup.
This is a perfect example of what I'm talking about. People can get caught up in the process including all sorts of possible use cases and lose sight of what really matters.
After the dust settles you can see what it's actually used for. And sure, it's not like the other use cases were bad or wrong; it's just not how it ended up.
It's possible that the other use cases never got traction because of the extra effort required on the client's end to implement a workflow. But even that low a bar ended up being too high.
Rewriting something like C++ into a higher level language with a heavier runtime wouldn't necessarily be reducing code bloat in the author's eyes. He's talking about "bloat" in the sense of the whole stack, not just how many lines you write on top of the stack.
In my experience there are two main causes of code bloat.
One is that people use abstractions that are more verbose than necessary. That this is bad is pretty uncontroversial; very few people go out of their way to write bad code, it's just that everyone's view of the right amount of abstraction is different. What's fine to you might look like gibberish to me. I've radically simplified code by converting complex regexes to multiple steps, for example.
The other cause of bloat in mature code is the inevitable bug fixes, feature requests, and edge cases that accumulate over time. Having the attitude that a system is unnecessarily complex is dangerous, because you don't know which of these edge cases could doom your rewrite.
Even if you do manage to ship a simplified version of the system, it's incredibly difficult to maintain its pristine state. Unless you devote all your time to playing code police, cruft will creep back in when others touch the code. This is where having more abstractions can actually help, because good abstractions limit the impact a bad piece of code can have.
I've also seen code bloat due to insufficient abstraction: the programmer fails to realize that a lot of things in the code are instances of the same thing that can be factored out into an abstraction.
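A toy example of that failure mode (all names invented): the same validate-and-report steps copy-pasted per field, versus noticing they are instances of one thing:

```java
class Validation {
    // Under-abstracted: the same pattern duplicated for every field.
    static String checkName(String name) {
        if (name == null || name.isEmpty()) return "name is required";
        return null;
    }
    static String checkEmail(String email) {
        if (email == null || email.isEmpty()) return "email is required";
        return null;
    }

    // Factored out: both checks were instances of this one abstraction.
    static String required(String field, String value) {
        if (value == null || value.isEmpty()) return field + " is required";
        return null;
    }
}
```

Two copies are tolerable; twenty copies across a codebase are the bloat being described.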
'The Unix way' would be an example of too little abstraction.
Sure it's great to have lots of little single purpose applications that can be mixed/matched in a variety of different ways.
But...
If everything in a system is built based on that premise, eventually the OS just becomes a lousy abstraction for a global namespace.
I think it's important to strike a middle ground. Build tools that work well for solving a 'class' of problems, then include public interfaces so they can be re-used for the higher level composition of systems.
I tend to find that putatively simpler bash scripts are often shorter because they don't do adequate (or, at times, any) checking of error conditions. It is deceptively easy to write a terse shell script when you assume that every command, or pipeline of commands, will work exactly as you expect every time.
In practice, even simple operations like "rm", or "mkdir", or even "cd" can fail in new and exciting ways. The blanket use of "errexit" and "pipefail" is really only the bare minimum of error handling; scripts that simply _stop_ as soon as a command exits non-zero are not generally an operable product that somebody unfamiliar with the innards of the system can be expected to work with.
It is also unfortunately easy to miss out on putting quotes around expansions, or to get caught out by the somewhat peculiar behaviour of variable scoping in functions. Some improved constructs are available in more recent versions of bash, but if the goal is to be _portable_ (without also shipping your target bash binary) you may not be able to use some or any of them.
The bash "language" (such as it is) is antiquated and clunky, because it seeks to be backwards compatible with the rich tapestry of history that is the Bourne shell. A truly robust, operable program written in this environment often does not come out a whole lot shorter (if it is shorter at all) than a program written in another language with better facilities.
My experience mirrors yours. I'd like to also stress that it's not just 30:1 ratios that are worthwhile.
Most recently, I just rewrote a decently written ~8 kloc C program in to a ~4 kloc Go program that does 5 times as much, 10 times faster. If I had more time, I could probably shrink the new Go program to about half its size. This new program includes the old, as well as a new simplified interface, both HTTP.
Moving clients to the new interface has had a cascading C#/Scala code reduction of ~1 kloc each. This is the most important point! Simplicity breeds more simplicity.
Exactly. I have recently had a few cases where I've rewritten bloated overly-complex js into 10% to 20% of the LOC.
The big benefit is the drastic complexity reduction. Suddenly the junior developers look like superstars because they can understand it and their productivity soars.
I'm continually surprised at how otherwise smart engineers end up writing bloated spaghetti that is 'correct' at the micro level but ridiculous at the macro level. A few poorly chosen abstractions can easily increase the cost of development by 10x.
Not to argue that code bloat is a problem, but the concept of writing things in any dialect of Forth makes me cringe.
I played with Forth. Sure, you can do complex things without parentheses, but if you squeeze something complicated into a single-line equation, it will be harder for a human to follow than if you use parentheses.
The original article also says "It's never clear how efficiently source will be translated into machine language," which is complete garbage: Most modern processors were designed precisely to optimize for the kinds of control and data structures that C creates, and C (though NOT C++) has close to a one-to-one mapping from code you write to assembly language output.
A key exception is those infix equations, which C can reorder in ways that are more optimal than a naive developer writing Forth would create, so this hardly seems like a recommendation for Forth.
This is all orthogonal to whether bloat exists or is a bad thing. I think that bloat exists and is a bad thing primarily because developers are some combination of lazy, unskilled, or pressured to complete features on constant deadline. And honestly, in most cases it's probably all three, though I would guess anyone reading HN is mostly afflicted with the last.
> it will be harder for a human to follow than if you use parentheses
Not any harder than following Master Yoda's speech.
OTOH, following something (say, an English text) (which is heavily parenthesised) may be (I'm sure there is a lot of references to back this notion) exceptionally hard (even if there is just one nested level (and more levels certainly add more confusion)). Choosing flat over structured often makes a lot sense.
Most people find uppercase text harder to read than lowercase. Research has shown this is because people are more familiar with lowercase (people exposed to more uppercase will eventually find it easier to read).
Because of this effect you cannot trust your own sense of what is inherently easier to read, it will be biased by experience.
>you cannot trust your own sense of what is inherently easier to read, it will be biased by experience.
Funny you should say that, when basically every person in the world going through a basic mathematics curriculum will be experienced with infix notation, and completely unfamiliar with RPN.
True, maybe you for some reason have more experience with RPN. So use Forth.
But there is a clearly superior notation when you're talking about "which one will most people understand." HP calculators notwithstanding.
I find it amusing that arithmetic is always invoked in any discussion about nested expression syntax, despite being largely unimportant in most typical code.
Honestly, how often do you need any deeply nested arithmetic expressions in your code? I very rarely use anything beyond an increment by one. Any time you are invoking nested logic you are already summoning a complexity problem. Luckily, it can be avoided in most of the real world cases.
How often? Quite. "Take these two points and extrapolate." "Solve this quadratic equation."
Whether you need more complicated arithmetic expressions depends entirely on what field you are working in. I would not call `8+6-(4+5)*7` complicated by any means.
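For comparison, that same expression in Forth's postfix notation, evaluated left to right on a stack:

```
8 6 + 4 5 + 7 * -    \ 8+6 = 14;  (4+5)*7 = 63;  14-63 = -49
```

Whether this is harder to read than the infix form is exactly the familiarity question being argued above.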
If 10% is your threshold of "often", then no. Arithmetic expressions do show up often enough in code that I work on that they should be immediately readable.
Nobody is stopping you from keeping them as readable as you like. The question here is in readability of the remaining 90% of the code, and this is exactly where nested expressions are evil and must be flattened.
In literary English, parenthetical clauses are almost always delimited using commas, and the literate reader has no problems because she's spent years practicing reading that style. She'd have no problem reading your example if she'd spent years reading that style instead.
I think your, convoluted, example would be better if you hadn't merely added parentheses in randomly. Parentheticals and asides are sub-statements, and, typically, can be omitted without losing the overall meaning of the expression.
Take my above asides ", convoluted," and ", typically,". Neither is necessary for understanding the sentence, but neither makes the sentence exceptionally difficult to understand for a literate reader.
The method in which parentheses (and other structuring notations) are used is more akin to grouping from algebra and mathematics.
EDIT:
Original:
OTOH, following something (say, an English text) (which is heavily parenthesised) may be (I'm sure there is a lot of references to back this notion) exceptionally hard (even if there is just one nested level (and more levels certainly add more confusion)). Choosing flat over structured often makes a lot sense.
But it doesn't make sense when we remove the parenthetical statements:
OTOH, following something may be exceptionally hard.
Choosing flat over structured often makes a lot [of]
sense.
(one edit for clarity). That first sentence has lost all meaning. Following what? Following a car on foot can, indeed, be exceptionally hard. But following a slug is often exceptionally easy.
A more reasonable version of your sentence:
OTOH, following something--say, an English text--which
is heavily parenthesised may be, I'm sure there is a
lot of references to back this notion, exceptionally
hard (even if there is just one nested level (and more
levels certainly add more confusion)). Choosing flat
over structured often makes a lot sense.
Now the remaining asides (demarcated using the various English notations of em-dash, comma or parenthesis) can be removed without losing the general sense of your sentence.
I admit my hastily written example is clumsy, but the general idea still holds in your rendering of it.
Nesting with commas is just as confusing as nesting with parentheses. It forces your reader to keep track of more than one context at a time, and this is exactly what pushes up the complexity level.
Even in Lisp, I realised that I always tend to build flat DSLs instead of nested ones, using non-parenthesised punctuation wherever possible. And I am not alone; take a look at the LOOP macro, for example.
> 6 years ago, I rewrote 30 000 lines of C++ code in 1000 lines of Ruby.
(...)
> Code bloat is indeed a recurrent problem.
But in your example, you're still depending on thousands (millions?) of lines of BIOS, OS, library, runtime code.
It's so many orders of magnitude different from the point Moore is making, and also more subtly different: it's not that Forth systems don't have library/OS code, but fitting a rich editor/IDE and OS on a floppy is quite different from, e.g., Ruby. It's a completely different mindset.
Even if something like Ruby does a lot of things out of the box, if you were to strip it down to a subset that could only be used as a calculator REPL, it'd still be quite complex compared to a Forth system (just think about all the code needed to handle keyboard input, talk to the monitor over VGA/HDMI, etc.).
None of that invalidates your point, that code bloat is a problem, I just think it highlights that there are different kinds of bloat.
Somewhere between colorForth and developing in a "traditional" language with a "traditional" IDE on a "traditional" OS, we find things like Project Oberon and Smalltalk-80. Or VPRI's recent effort to create a full drivers-through-GUI-up-to-and-including-applications system in 50 kloc total, making aggressive use of DSLs to manage the complexity.
That's not runtime, though. Apart from some functions which may be supplied and inlined by GCC (and even that isn't necessary), the result does not require GCC code.
Most of this is probably due to better libraries and forfeiting the static type system. There's very little that Ruby can offer that imho should allow a program to become shorter than the equivalent C++ by more than, say, a factor of 2.
Bash and Perl might be the only exceptions to this due to their incredibly terse syntax for a certain subset of tasks.
Better libraries and losing static boilerplate are 90% of what dynamic languages offer. Kind of like saying that, other than the higher horsepower and better aerodynamics, a Porsche Boxster has very little in the way of speed over a Ford Taurus.
Umm...Ruby inherited most of Perl's terse syntax, or at least it did originally? I haven't followed recent versions of Ruby very closely. (And by "recent" I mean post-Rails.)
The Java code itself probably took less than a few days to write. Java/Spring/Hibernate has no licensing fees (in most cases) and is an industry-standard stack.
I'm guessing the original author didn't write it in Bash because either (1) they didn't know Bash, (2) they didn't know if/when requirements would expand beyond what Bash can easily do, (3) they wanted platform independence, or (4) their team wrote everything else on the Java/Spring/Hibernate stack.
You can argue the merits of all those reasons, but I don't see bloat being profitable as a factor.
The difference in lines and executable size is likely due to an apples-to-oranges comparison. Bash scripts seem really lightweight if you're not counting all of their system dependencies and if you're writing a program that Bash excels at (such as moving data around).
J2EE would imply Oracle. But there are no fees implied. Additionally, most of the Java libraries have a linking exception to prevent any GPL licensing from affecting other components. Granted, there could be fees if you're redistributing Java or embedding it on mobile hardware. It's all like this because that's how Sun did it for many years and Oracle can't/won't upset that momentum.
Proprietary vendors have had a large part in writing these open standards, but they're mostly interested in creating a standardized, free commons.
GP's story is not about programmers at a hardware company, and it's probably not directly applicable to colorForth.
When they say they "wrote it in Bash," more than likely they used Bash to script together existing Unix tools, rather than using Bash to implement some algorithm that couldn't be done as code-efficiently in Java.
You write bloated code for job security? Shame on you.
Typically bloated code comes from legacy and years of organic growth. I'd imagine almost any 30k LOC project in any language that has been developed more than just a few years could be rewritten in a small fraction of the original size, now that you actually know what the end result must do, a luxury you don't have on the first go.
>>You write bloated code for job security? Shame on you.
Try to give him the benefit of the doubt. There's no indication that that's what he meant. Instead, he probably meant that one possible reason people write bloated code -- or at the very least, do not attempt to refactor it after writing it -- is that it provides them with job security. Which is a fair point.
>But I'm game. Give me a problem with 1,000,000 lines of C. But don't expect me to read the C, I couldn't. And don't think I'll have to write 10,000 lines of Forth. Just give me the specs of the problem, and documentation of the interface.
I could be wrong, but I'd wager that a lot of projects that are 1,000,000 lines of C don't have well defined interfaces or documentation.
However, if the author wants to give it a shot, they could try replacing V8 with pure Forth. That's a fairly well-defined problem; they just have to maintain compatibility with the exposed V8 API.
I'm not sure of the stats on this one, but it seems to me that something with 1,000,000 LOC would have to have decent documentation, or it would be totally unmaintainable and wouldn't grow to that size without dying a horrible death.
You would think that, but it can grow without understanding by adding code at the edges, which expands and then starts the cycle over. A code archaeologist[1] would see a central island with bridges to developed islands, with more bridges and various ships running between interfaces. It's particularly fun with multiple subsystem teams. Each one builds its own island of code, and the bridges are scary.
Vernor Vinge's novel A Deepness in the Sky has "programmer archaeologists" working on their starship fleets that have thousands of years worth of software going back to "the 0-second of one of Humankind's first operating systems": https://en.wikipedia.org/wiki/A_Deepness_in_the_Sky#Interste...
I have been just such a consultant/employee doing this; I even started calling it software archaeology some time in the '90s. It's a very painful way of earning money, but there's a whole lot of "poorly documented or undocumented legacy software implementations" out there. Generally effectively undocumented, for many projects start out with some documentation and don't keep it up to date, and by the time you get your hands on the mess, all of it, including the comments, is more a statement of intent at one point in time than something directly and immediately useful.
But.. strangely fun, in a twisted way?
I love a blank canvas or a well-written codebase as much as the next guy, but I find that fixing "legacy"[1] codebases can be pretty enjoyable.
Slowly figuring out and fixing a "legacy" codebase, to me, is basically solving a giant, complicated puzzle that was left for you by your predecessors.
Granted, you're mired in evil gunk from the past, but every time you figure out a small piece of the garbage and refactor it into something nice, you get to feel awesome. Of course, this assumes that you've convinced management that the codebase must be tamed[2] and that this will take time; otherwise you're just fighting with your hands tied.
Then again, maybe I'm just a masochist.
[1] "Legacy", because people don't like it when you call it "evil gunk from the past that must be destroyed".
[2] And it must indeed be tamed, because otherwise it will just grow more and more evil until development grinds to a halt. If management understands this, they'd be crazy to choose the "let the beast grow" option.
It certainly has its moments. Hell, just looking through legacy code and seeing what other people did can be entertaining in a 'holy shit, would you look at that!' way. I will never forget working on a codebase in which the author apparently did not know how to loop or how to hold user interface controls in an array. There were 10 textboxes on the screen. Instead of looping through the textboxes and calling a stored procedure with the content of each one (it was PL/SQL inside an Oracle Form... a dead structure never meant to be used to create a full application, which is what they'd done with it), they first handled the case where textbox 1 had content but the others did not, with one procedure call. In the else, they handled the case where textboxes 1 and 2 had content but the others did not, with 2 procedure calls... and on and on for all 10 textboxes. Pages and pages of nearly identical code. I would have expected any self-respecting programmer to either say 'there HAS to be a better way' and stubbornly search for one, refusing to proceed along the terrible path I saw before me, or else leave the profession entirely. But someone somewhere went through and constructed the whole repugnant edifice...
I would think that anyone taking on maintenance of a code base that's been under active development for more than 5 years, where the original developers are no longer around, will necessarily be practicing code archaeology.
The book title made me think of why I still wake up at night reliving a conversation I had, while building an interface, with a developer who was so proud that I couldn't recalculate, in a stored procedure, the data he could compute in [image-based language], and who didn't think it was a problem that he didn't store his results, thus denying an interface to another system that would cut the customer's check. It almost turned into a real crime scene.
That book looks quite cool, might have to pick it up. Thanks for the link.
Never underestimate what sorts of systems are possible to create and force to continue running by current organization management practices. I personally worked on an extremely (and unnecessarily) large system that had no documentation. When I started, and asked for something that at least explained what the different end executables did, I found that no such document existed. When looking at a problem report which said "system X has problem Y", finding out what code was responsible for system X required talking to large numbers of different team members until you could find someone who knew where system X lived.
This system survived because no one in management could bear to pull the bandaid on doing significant rewrites and instead preferred to pay much larger amounts spread over longer time periods to continue accumulating technical debt. It basically made it take 6 months to a year before a new hire could be useful, but once someone was familiar with some corner of the system they mostly just stayed there. Keep in mind that most software engineers don't approach their job with passion. Most of the people I worked with didn't even want to touch a computer when they got home. They didn't spend their weekends contributing to open source projects, or keeping up with the latest techniques. They kept their head down and collected their paycheck. That is what the majority of companies actually want, and it's what they get.
That system has since (after a billion-dollar-plus contract to create a replacement system) moved to a 'devops' atmosphere. No automated testing or continuous integration, no, that would threaten the large test team. DevOps to them means what I suspect it means at many non-startup places. It means developers developing on the production environment. It means 10x as many problems cropping up, but management is thrilled because problems are fixed in hours instead of weeks. Development no longer consists of understanding anything, or designing anything, it's just an endless series of adrenaline fueled hasty patches while everyone keeps their fingers crossed that the data doesn't get so corrupted that everything grinds to a halt.
The most important idea in Forth is factoring. Breaking complex tasks into small pieces. In an ideal Forth codebase, most definitions are about a line long, contain at most one control structure and reference no named variables.
Forth's stack-oriented semantics make function calls very cheap: a function call and return is literally two instructions, whereas languages that use activation records turn a function call into dozens of instructions (copying arguments around, backing up registers, etc.). Since calls are computationally inexpensive, you don't need to sweat over whether your optimizing compiler will manage to inline a function.
Short, simple, and mostly pure functions are easy to test in isolation; there aren't deeply nested code paths to exercise.
Lots of short functions let your codebase reuse more of them: any redundancy can be collapsed together, both shrinking the codebase and improving test coverage. This effect is magnified in more complex programs.
You can do this sort of thing in other languages, but Forth's syntax and semantics for function definition and invocation are both very lightweight, which makes it much easier to apply.
It sounds like what Clean Code[1] really wanted. I still think it's a great shame that all his examples are in Java (although, in fairness, they are very pseudocode-ish).
I think the author of this article is mistaking LOC for readability. Despite colorForth being wonderfully concise and independent of any operating system, that doesn't make it convenient or readable for future programmers. Just because one piece of code is a number of lines longer than another doesn't mean it's less readable.
Take Perl, for example. I often write Perl scripts, and they're very few lines, but nobody can read them, which is really what matters. So I'll expand them to be more C-like (calling functions with parentheses, using fewer implicit arguments and scalar/list contexts, etc.), thus making them easily readable even by people who don't know Perl. I regard this revised version as much better than the more Perl-ish and obfuscated (although shorter!) version.
I do think readability is key, and found the article intriguing so I took a very brief look at some colorForth sample code: http://colorforth.com/ide.html . It may be that I'm just extremely new to a very foreign programming language, but I did not understand it at all.
I thought I was reading a satire article. That was the first source I found when looking through the site. I still wasn't sure if it was satire or not.
From reading k/q code (or even C code written like the J Incunabulum http://www.jsoftware.com/jwiki/Essays/Incunabulum), I generally find that it may take me 10x as long to fully understand it due to density and my lack of proficiency, but that still makes me an order of magnitude more efficient than reading (and scrolling!) through the 100x longer mainstream code, often spread out over multiple files and directories. Better yet, referring back to a piece of code I have already read is even faster - I am not hunting or scrolling for anything.
I recently experimented with writing a web application in q while limiting myself to a single screenful of code; the efficiency is amazing! I am still trying to figure out what factors contribute, but it reminds me of solving math problems. Write down what you know and stare at it: no distractions.
>Code is scattered in a vast hierarchy of files. You can't find a definition unless you already know where it is.
Not if you're using a proper IDE (or the right emacs/vim plugins, which are effectively an IDE).
>Code is indented to indicate nesting. As code is edited and processed, this cue is often lost or incorrect.
Not if you're using a proper IDE configured for the project's chosen style. My engineering org also has linters in pre-commit hooks and automatically commenting on code reviews.
>Sometimes a line of code contains only a parenthesis, or semicolon. This reduces the density of the code, and the difficulty of reading it.
Depends on the style guide you choose. Also, some people find this more readable.
>There's no documentation. Except for the ubiquitous comments. These interrupt the code, further reducing density, but rarely conveying useful insight.
The only documentation I've ever found useful (aside from comments) was automatically generated from comments, explaining the signatures of methods and what they do.
Not commenting on any language in particular, but you can make any language's issues easier with a combination of a better IDE, commit hooks, linters etc. However, it would require less effort if the language itself enforced these constraints so everyone using it is writing code the same way.
I think you meant mixed up. It certainly does seem, though, that thinking your weak and static type system is strongly typed can lead to errors that could kill a program, and, depending on what the program does, harm people and other systems.
Well... yes and no. Some values are multiple machine words, pushed onto the stack.
Some of these (such as double-words) have language-defined representations. (They're always stored with the low word at the lower address, regardless of your machine endianness or, as far as I can tell, whether your stack is growing up or down. Which means that on some platforms the low word is pushed first, and on others the high word first. This shouldn't matter, because you should be using the dword words to manipulate them, except that it's common practice to assemble and disassemble dwords manually.)
Some have implementation-defined representations. For example, the various bits of state that the control-flow words push onto the return stack. You don't know how big any of these are, so accessing the return stack from inside loops is basically impossible.
And some are just weird. Floating-point numbers live on their own stack. Except not necessarily; the implementation is allowed to store them on the data stack. So you can't use NIP TUCK OVER PICK etc because you don't know the layout of your stack.
Basically, at every point, you have to know the type of the values on your stack so you can pick the right word (DROP vs 2DROP vs FDROP) to operate on the stack... and if you get it wrong, you get stack corruption and horrible, hard-to-debug crashes.
tl;dr: Forth is typed. Forth is very typed. Forth just doesn't check types.
That's one definition of "typed", but not a very useful one. By that definition _everything_ is typed. That's an interesting metaphysics discussion worth having, but not relevant to explaining Forth to a C programmer.
Vanilla Forth values do not have runtime type information (such as class pointers or discriminator tags) nor does a traditional forth compiler have compile time type information (such as an abstracted understanding of the state of the stack at any given point of execution).
Meanwhile, Factor, a modern stack language, has both: Smalltalk-style object-oriented class pointers on every value, and a compile-time stack-effect checker. Factor values follow a Lisp-like dynamic typing discipline; that is, "dynamic" in the "uni-typed" sense, effectively a discriminated union.
Wouldn't that make it terrible for working on problems that deal with data that maps to machine words very poorly (i.e. practically every problem non-OS developers face)?
I remember Forth only from the time I was looking to replace `dc` with a more powerful calculator. I figured Forth could fit the bill because it was a stack language.
Whoops, we didn't get along immediately. I realized that Forth was not meant to be my calculator; if anything, I could have made it my calculator.
When you look at code you wrote in the past, whether it is 1,000 lines, 3,000, or 100,000, you can always think of a way to remove the redundancies and turn it into a smaller number of lines.
But the first time you write it, the first time you are solving the problem, it is much harder to focus on that part.
This is not to say that it's OK to have an unmaintainable million-line application, but rather that it is not something we willingly do to make ourselves look more important in a company.
Proclaiming that the syntax only causes problems stands in complete defiance of most people's preferences in programming syntax.
We like syntax errors because they mold thoughts more precisely. Going against that is also going against a lot of "productivity enhancers" - we build them because we expect to have a lot of code, and we want to reduce the kinds of errors that it may contain. That doesn't mean that we should abuse them to write as much code as we can, but it does turn out that way in practice.
So in an odd way Moore and co are right - we would be more free if we also constrained ourselves more on this point.
But the broader thought is that you can apply such constraints at any time, in tandem with what you already like. And that is more likely to produce an innovation than "just Forth".
I think that needless compiler errors are a point that isn't discussed often, so that at least is interesting.
We have those productivity enhancers because the language makes it easy to make a mistake there. Ideally we would want to eliminate those mistakes without the syntactic headache.
Although obviously how to do that is a complex exercise, I do think that saying "code bloat is bad" even against "bugs are worse" can provide interesting insights into what we really want from languages.
It's discussed fairly often with static analyzers. Static analysis is a powerful tool, but false positives erode user confidence in the tool.
I've worked with code bases where thousands of warnings (and even several errors) were left in place, which meant that you had to basically ignore all diagnostic messages every time you compiled. At that point, you might as well be 2>/dev/null. Someone even wrote a wrapper script that would diff the stderr against the "known good" stderr.
My general impression of that project was that it was dying a slow death unless someone went in and restored developer confidence in the build process, and the most likely person was me :-/
If some C programmers have that scoff-at-higher-level-programmers attitude, Chuck Moore is the equivalent, but aimed at C programmers (and higher, I guess).
I have a feeling he's not the kind of person a C programmer could say things like "C is high-level (obvious one-to-one mapping) assembly" and "C is (blanket) efficient" around, and get away with it.
Chuck Moore came out and spoke to us at work not long ago, and I have to admit I had a total fanboy moment. There are very few people in the world with as clear a view of how to make an entire system, hardware and software, directly reflect the problem to be solved with no cruft. There's a lot to learn just from the way he decomposes a problem.
As things have developed, first with virtualization becoming widespread, and then with things like Docker continuing the trend of packing things off into 'separated' containers, I have wondered at what point we will get a tool that provides a highly modular kernel along with the means to quickly spin up a whole system with exactly and only the resources a particular subsystem needs. And, of course, a basic system that spins these things up on demand. Not just microservices, but atto-scale services. It sounds like this is the sort of thing Forth promoters could help us realize.
1% the code, 0.01% the reader comprehension. I'll stick with a good ALGOL or LISP language anyday. Especially with macros plus support for real HW rather than 18-bit etc.
I agree with how verbose some of these languages are, but this is a bogus argument. You can shorten almost any design on a rewrite, regardless of language. Not to mention the "density" metric is ridiculous. Minified javascript is certainly more dense than regular javascript, but nobody would argue it's simpler.
I just started to learn the Factor programming language. I think of it as a modern implementation of a stack-based language like Forth (similar to what Clojure is for Lisp).
I admit, the first steps were a bit brain-twisting, but after getting my hands dirty, programming feels relieving. Like a rain shower on a steamy hot day. No worrying about syntax and punctuation; elegantly shuffling the stack is fun. For me, a perfect demonstration of KISS.
As someone who's still doing a lot of Perl: Can't we just ease up on the line count fetish? Some languages are really bloated, but I'd still say that more often than not, the sheer amount of lines isn't the most major factor of a program's cognitive load.
You have got to be kidding me. I mean come on, you're joking, right? If you gave me some qualification, maybe. I know that FORTH is more compact than C, and the code decrease in size might be reasonable in some cases, but the reasons you give are insane! come on!
Elaborate Syntax: Your syntax is simple; C's is less simple. If you want elaborate, look at Perl or Haskell.
Redundancy and Confusing Types: Fair enough, but the types have to be the ones they are because C's supposed to be close to the metal. Therefore, you need specific bit sized types. It could be better than it is, though.
Strong typing causes errors: Um. Take that up with the angry Haskellers lining up behind you. I'll just duck off this way... Honestly, C's type system is pretty awful, but that's not because it's strongly typed.
Infix: ...Yeah, pretty much, but the mainstream won't accept anything else.
Parens: Tell that to the lispers. Parens shine for syntax parsing, and the incredibly common editor support makes it even nicer than Python for editing.
Unclear how source will be translated: Well, I guess, but more so than anything higher up the stack, like, just to pick a random example, most modern FORTH implementations.
Subroutine calls are expensive: Compared to what? One mov or push instruction for each arg, and then a call instruction, at least on x86. Most other processor architectures seem to have similar mechanisms. Those are all pretty fast.
Elaborate Compiler, Object Libraries: Yes, compared to FORTH, the compiler is elaborate, but that isn't why object libraries are distributed, seeing as there's a C compiler for every system under the sun. And distributing object libraries isn't exactly hard.
Lots and lots of files: Yeah, this sucks. But every other system is going to have the same problem, and a proper module system that associates functions with files is really the only way to fix it, aside from oddly specific introspection utilities. Most other languages, say, for example, FORTH, don't have this, AFAIK.
Indentation as an indication of nesting: And you suggest counting braces as an alternative? or just not having nesting? Because most languages, say, for instance, FORTH, have nesting.
No Docs: In BAD C code, this is true, as it is in bad code in any language, like, for instance, FORTH; any language designer knows that the language cannot make bad programmers good.
Names are Hyphenated: "I think hyphens suck" isn't a problem with the language, it's just a personal preference.
Constants are named: With my passing familiarity with FORTH, I don't know what he's on about here. Could somebody explain, so I can see if I agree?
Preoccupation with Contingencies: Because everybody loves leaky abstractions that fail at critical moments for ill-defined reasons.
Conditional Compilation: This feature sucks. It really does. The problem is, I don't see what else they could have done. So my response here is, Do you have any better way to do cross platform compatibility? Because I would honestly love to see it. I'm serious.
Hooks: Ahhh, I see. You're from the beautiful-diamond camp of software design. I can recognize you guys by your catchy slogan, "Don't Design With The Future In Mind."
Programmers' best interest to exaggerate complexity: No language can fix this. It's impossible, because this is a social pressure.
Portability: Yeah, basically.
Maybe I haven't seen enough C. Maybe I haven't seen enough FORTH. Maybe I just can't get into Chuck's mindset here. But his claims just seem insane.
EDIT: I read some more off of Chuck's pages, he seems to be thinking more in terms of embedded architectures and single purpose code. Fair enough, in those situations, some of his reasoning makes sense. But the presentation of this page still makes him look like he's off his rocker.
> Maybe I just can't get into Chuck's mindset here.
I think this is the crux of it. As far as I can tell, Moore's attitude is basically that "software architecture" in the conventional sense is mostly a lot of churn solving problems of its own creation instead of solving user/customer problems. Looking at it from another angle, the main function of operating systems, virtualization monitors, container frameworks, etc. is to let multiple applications share a single computer. Moore's approach is to make lots of computers so that you can easily afford to have multiple computers per application. [1]
>Conditional Compilation: This feature sucks. It really does. The problem is, I don't see what else they could have done. So my response here is, Do you have any better way to do cross platform compatibility? Because I would honestly love to see it. I'm serious.
A simple answer would be conditional compilation for entire functions, not individual lines of code.
That actually sounds really neat. Kind of like compile time polymorphism, but based on platform. I like it. Although there are some things it doesn't solve, it seems to work for the most part.
A year ago, I rewrote 424 lines of Java+Spring+Hibernate in 18 lines of bash. This is less glorious, but if you compare the size of the deliverable, it's 39Mb for the J2EE webapp against… 772 bytes for the shell script.
At my current job, I maintain an app with 500,000 lines of code that could easily fit in 20,000.
Code bloat is indeed a recurrent problem.