
I run into this a lot and I find it hard to justify to reviewers why I decided to keep seemingly similar logic/functions separate rather than combining them. I usually don't have anything better than a "gut feeling" or "professional experience" that although they might seem to be the same they're either actually very slightly conceptually different, or that I feel that they're likely to diverge in the future even if they might be similar now.

Junior devs rush usually rush to combine things like that together, especially because combining like code is such a knee-jerk thing to point out in reviews, and they don't have the experience to push back against it (or to even see where to push back). In the future the combined code then gets ugly when requirements slightly shift and the code that was combined should no longer be combined, but it's never split back up; it's usually just made more complex.



This is an old debate and yet one that is difficult to think through. I share your experience of having trouble convincing someone else not to combine things too hastily. Partly it's a "my gut feel is different to yours" situation, but I often don't have the confidence that I could articulate my reasoning without indulging in a wordy lecture on minutiae.

Lately I have been fixating on the following line of thinking: the unit of deduplication--usually a function, but sometimes even bigger--is the same thing as the unit of abstraction. When you dedupe, you've also given birth to a new abstraction, and those don't come for free. Now there's a new thing in the world that you had to give a name to, and that somebody else might come along and re-use as well, perhaps not in a context where you originally intended. The new thing is now bearing the load of different concerns, and without anyone intending it, it now connects those concerns. The cost of deduplication isn't just the work of the initial refactor; it's the risk that those future connections will break something or make your system harder to understand.
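A tiny sketch of that effect (all names here are hypothetical): two routines that happen to format totals identically get merged, and the shared function later grows a flag when the two concerns diverge, coupling every caller to both.

```python
# Before deduplication: two superficially identical formatters,
# one per concern (invoices vs. reports).
def format_invoice_total(amount):
    return f"${amount:,.2f}"

def format_report_total(amount):
    return f"${amount:,.2f}"

# After deduplication: one shared abstraction. When reports later
# need accounting-style negatives, the shared function grows a flag,
# and invoice callers are now coupled to a report-only concern.
def format_total(amount, accounting_style=False):
    if accounting_style and amount < 0:
        return f"(${abs(amount):,.2f})"
    return f"${amount:,.2f}"
```

The flag is the "load of different concerns" made visible: every future reader of `format_total` has to understand both use cases to change either.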

This reminds me of another famous Carmack pronouncement about the value of longer functions [1], which I think has some parallels here. In the same way we're taught to DRY up our code, we're taught to break up long functions. I sort of think of these two things as the same problem, because I view their costs as essentially the same: they risk proliferating new and imperfect abstractions where there weren't any before.

[1] http://number-none.com/blow/blog/programming/2014/09/26/carm...


I understand your sentiment, but this line:

    When you dedupe, you've also given birth to a new abstraction, and those don't come for free.
It feels like you can write the inverse with equivalent impact, e.g., code duplication doesn't come for free.


Sure, but the costs of code duplication are well known. We know it increases maintenance, and can sometimes lead to issues if you forget to update one copy of something, and so on.

So there can be an assumption that deduplicating only removes costs, when it may also create new ones. That removing something can make some things more difficult isn't intuitive for everyone.

Like all things programming, there is a balance, pros and cons, to each approach. Knowing when to use which approach is part of the profession, and everybody can get it wrong sometimes. And the environment can change, and the choice may become invalidated - and then you're stuck choosing between changing the abstraction or keeping it. That's a hard choice as well.

Nothing in coding comes for free, but sometimes it can look like it does.


What's the cost? A slightly bigger binary & codebase? It seems like it's close to free to me. Am I missing a cost? Or are these costs bigger than I'm assigning them?


> What's the cost?

The cost is exactly what is pointed out in the original tweet:

> a requirement to keep two separate things aligned through future changes is an “invisible constraint” that is quite likely to cause problems eventually

Code changes, and if those two identical or similar pieces of code are likely to change together, then whenever you change one you carry the cognitive load of also changing the other, or risk having them go out of sync.

Of course, when the two pieces of similar code aren't likely to be changed together, they should be kept separate.


For sure. Most of the time when I copy-paste code from one place to another, a change in one place doesn't imply a change in the other. I certainly have seen that happen though.


I dunno, call me skeptical about Carmack's text.

I agree very much with using pure functions wherever you can. In fact, I would argue writing a pure function should be the default approach. (See the Functional Core, Imperative Shell talk on DestroyAllSoftware.com.) Let the compiler handle the inlining and memory optimizations and use const everywhere.

OTOH, Carmack doesn't even consider testing. Breaking up your code into multiple functions facilitates that a lot. On top of that, if your functions are pure, it is even easier to test them.
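A sketch of that testability gap, using a made-up tax example:

```python
# Impure: depends on hidden global state and prints instead of
# returning; testing it means patching globals and capturing stdout.
TAX_RATE = 0.2

def print_price_with_tax(price):
    print(price * (1 + TAX_RATE))

# Pure: all inputs explicit, result returned; trivially testable
# with a one-line assertion and no setup.
def price_with_tax(price, rate):
    return price * (1 + rate)
```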

He also doesn't consider the cost of reading & maintaining a piece of code that lives somewhere inside a big (multi-page) function. You have to keep track of all that function-global state. Side effects sprinkled all over the function are common. Ugghh.

> Besides awareness of the actual code being executed, inlining functions also has the benefit of not making it possible to call the function from other places. That sounds ridiculous, but there is a point to it. As a codebase grows over years of use, there will be lots of opportunities to take a shortcut and just call a function that does only the work you think needs to be done.

This, too, sounds a bit ridiculous today. Languages usually have access modifiers (public/private/…) or conventions to declare something "internal" to the class or module (e.g. `__foo` in Python). On top of that, you can always use something like ArchUnit to enforce your architectural rules and prevent usage of function X in module Y.
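For instance, Python's double-underscore convention makes the "shortcut call" at least awkward (class and method names here are made up for illustration):

```python
class ReportBuilder:
    def build(self, rows):
        return [self.__normalize(r) for r in rows]

    # Name-mangled to _ReportBuilder__normalize outside the class:
    # a gentle fence rather than a wall, but enough to signal
    # "internal, don't call this from elsewhere".
    def __normalize(self, row):
        return row.strip().lower()
```

Calling `builder.__normalize(...)` from outside the class raises AttributeError, so the "just call the inner helper" shortcut doesn't happen by accident.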

Yes, correctly cutting your modules & scopes is never easy. But this is simply at the heart of the game of software development.

The arguments regarding latency & performance are certainly valid but it feels like a very, very specific case he discusses. It's difficult to generalize the conclusions he draws from it.


This is a good insight. One quibble I’d make is to point to the “into a loop or function” in Carmack’s post. When you’re consolidating some repeated code into a loop, the weight of the abstraction is lower than a function. Also, the problem of “future maintainer pulls the abstraction out of necessary context” is less likely.


Code reviews on refactors are so frustrating because you know what the next PR is going to look like, and it's going to change half the things people are complaining about in this one.

You could literally be a day away from making those two code paths very different.


There are some rules of thumb that I've come across in the literature and from personal experience on how one can tell when A and B are actually different and only superficially similar.

- If a hypothetical competitor implements A differently, will they by necessity also implement B differently in the same way? If not, it's likely A and B are actually different things.

- People in the organisation likely disagree (even if just gently) on some aspects of the design or requirements of A and B. If you sort people into camps first for A and then for B, and the camps don't overlap strongly, then it's likely A and B are actually different things.

- Can people in the organisation imagine a useful change to A that doesn't require a corresponding change to B? Then it's likely A and B are actually different things.

And the opposite:

- Does A support two entirely disparate sets of operations, where each consumer only ever uses operations from one set and never the other? Then maybe A is actually two different things, A and B.

In arguing for keeping things separate in peer reviews, I have found test 3 to be the most easily convincing, i.e. giving a concrete example of a useful change to one that wouldn't necessarily result in a change to the other.
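As a concrete form of test 3 (hypothetical checks): two validations that are character-for-character identical today, where an entirely plausible requirement changes only one of them.

```python
# Identical today...
def valid_user_age(age):
    return age > 0

def valid_item_count(n):
    return n > 0

# ...but "minimum signup age becomes 13" is a useful change to one
# that implies no corresponding change at all to the other.
def valid_user_age_v2(age):
    return age >= 13
```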

----

It's also worth considering that sometimes it's not a matter of A and B being entirely different or the same thing, but rather that they should both be built on a common abstraction C that we just haven't thought of yet (and may not for another few months until we see another example that would need it, that happens to trigger the right neural pathway to conjure C up in the mind).

----

That said, when in doubt, I still think it's better to err on the side of combining into one. It's far easier to split up a thing that has been combined too far than it is to ensure that things which have been accidentally split up too far always co-evolve. (And then deal with the consequences when they drift apart despite nobody meaning them to.)


> Junior devs rush usually rush to combine things like that together, ….

I know low-effort comments don't do well on HN, and this is one, but I just can't let this pass without sharing my intense happiness in reading this typo in a comment against DRY zealotry.

There must be something really subconsciously tempting about this kind of typo (https://twitter.com/BillHeyman/status/1646653124864606210):

> This is correct. It's often a very bad idea to create an abstraction without out having at least three instances that can use it.


I would go even further. Only create the abstraction when you have no other choice. If copying the code isn't causing you pain, don't touch it; it's working code. In the rare case you need to update more than one copy, you can grep a little bit for a while.

Taking a bunch of duplicated but "flat" code and creating an abstraction around it gets easier the more copies of the code you have because you have a more complete picture of what the right abstraction actually is.
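A sketch of why the third copy helps, with made-up operations: once three near-duplicate retry loops exist, the right parameterisation ("retry any operation") becomes obvious in a way it wouldn't from one copy alone.

```python
import time

# The abstraction that falls out of three near-identical copies
# (retry fetching a user, retry posting an event, retry deleting
# a file): the operation varies, the retry skeleton does not.
def retry(op, attempts=3, delay=0.0):
    for i in range(attempts):
        try:
            return op()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay)
```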

Code is like clay, once it DRYs it stops being malleable.


It requires good familiarity with the code base, or domain knowledge, to know that one has to grep, and for how long. The abstraction triggers you to look in the right places.

That said, I am firmly in the camp that requires at least three instances.


Totally. A few copy/pastes describes the author. An abstraction describes the project, and you have to be so careful here to not break (or dramatically widen) the mental models.


How about something like this:

Is there a specific benefit you see in combining the code here? Combining these together introduces technical risk through increased coupling. Splitting the functionality has very little cost.


Combining helps when you later have to make changes to the common part, by making it impossible to forget that there is another place in the code you also need to change. Splitting also introduces technical risk.


Let the duplication live for a while. You may still make changes to the new code due to new insights and bugs.

Make a note or an issue to revisit after a few weeks.

Always refactor when a third or fourth identical case shows up.


A good rule of thumb is "if one of these were to change in the future, would it be expected that the other one would change in the same way?" (where change can mean a bugfix or can mean a change in required behaviour).


I've been on calls where non-technical (or just technical enough to be dangerous) people advocate combining things. Like you said, it can be hard to justify not doing it so it inevitably becomes the path forward.


When in that situation I tend to go ahead and factor out the similar code anyway. I almost never end up committing it, but it usually helps turn the "gut feeling" into a better understanding of the differences.


Whether or not to split is more a measure of whether these two concepts are likely to split down the road than of whether they share similarity today.


Yes, the way I phrase this when defending it in review is "these things have different reasons why they would change, despite being the same right now." Thanks Sandi Metz.


'Idiomatic' is another good word.

Some people are so addicted to DRY that they want to write helper functions for every 3 lines of code that appear together. Nobody else can figure out what the fuck their code does, but only 3 of them will tell them to their faces.
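A caricature of that style, with made-up helpers: three lines of obvious code get scattered across single-use functions, and the reader now has to chase every name.

```python
# The over-DRY'd version: every trivial fragment gets a helper.
def _unpack_pair(pair):
    first, second = pair
    return first, second

def _add(a, b):
    return a + b

def sum_pairs_dry(pairs):
    return [_add(*_unpack_pair(p)) for p in pairs]

# The inline version: the whole idea fits in one glance.
def sum_pairs(pairs):
    return [a + b for a, b in pairs]
```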


Sandi Metz is a great resource for working through some of the dogmas out there. She's fully bought into OO and design patterns so it's not a situation where an outsider is assailing your whole worldview, and she puts into words what my intuition is about these things super well.



