It seems like when people talk about inheriting a bad codebase people often mention: spaghetti code, not making use of language patterns like inheritance/interfaces/modules/etc, no or bad documentation.
To be honest, besides the documentation thing idgaf about stuff like this. If the code works it works and I can still fix or improve it slowly over time incrementally. What I care about instead:
  • Obscure libraries or dependencies. I have seen this so often that old code uses obscure dependencies without updates since 5y when it could have used apache or pandas or stuff. When there are bugs or security problems here refactoring
  • No or few unit tests. Refactoring code is hard when I can't be sure about all cases and exceptions. If you work on something with inexperienced or bad programmers - please tell them to write more effing tests instead of "pretty" code
  • Non-migratable custom stuff. If your program absolutely needs a custom build pipeline (I hate this concept in general but idk not here to judge, maybe you need it) at least choose something that can be migrated to a new technology in 10 years. e.g. Javascript build script or bash yes - some weird building.xml file that only works in eclipse no no no
What are your thoughts on this subject? Ever inherited a 10yo codebase that seemed like a black hole? Ever had to refactor spaghetti code? Disagree with me which of these is the worst or doesn't matter?
Your list so far is good and I agree with those. But here are some additional things that particularly irk me.
  1. Code that's too clever. Yea, you can munge some data structure to do something in one line, but it's going to take me 20 minutes to figure out what that line does and it often has subtle bugs in there because it doesn't do any error checking. Bonus points if there's no documentation saying what it does.
  2. Bad variable names. This is really an art and good code makes obvious what the variables are doing and how to change them. Bad code makes this a complete guess because they're named something like "data" or "element" or even worse "x". There are exceptions, of course, but it really hampers readability or changes down the line when variable names are vague, or even worse, deceiving.
  3. Too much abstraction. This is particularly egregious with Java programmers, who seem to want classes for everything. Now you have to open 13 different files to find out where the actual implemented function is and it's not at all obvious where you have to make changes to fix the bug.
  4. Uberfunctions. A method or function that's 3 pages long and does 30 different things is just asking for trouble. I've seen so many of these that the programmer was too lazy to separate. The problem is that a function that's really long is very hard to conceptually keep in your mind when you're fixing something. And it's really easy to break. I can almost guarantee that these functions also lack unit tests, making changes to them very difficult.
These are some of the things that really annoy me, though there's probably a lot more. It just smells bad.
reply
  • Stuff that is implemented twice. (Copy & Paste) You fix a bug an find out that the bug is also some where else.
  • Non runnable code after you got if from git or no documentation what to do to run the solution. (Install Database, Dependenies, whatever...)
  • Code documentation about WHAT the code is doing instead of WHY it is doing it. I think thats one thing beginners often do because they are not used to read code.
To be honest, if you look at code you wrote 10 years ago you'll probably find stuff like that ;)
reply
Stuff that is implemented twice
I agree, this is annoying most of the time.
But the thing I run into a lot and really can't stand is premature abstraction: "These 2 things look the same. Let's build a package with some base class so that everything is DRY"
Some time later, it turns out that the 2 things aren't remotely the same. Meanwhile, layers upon layers of crap have been built with this package as a dependency and I've got to trawl through it, tease the 2 things apart, refactor anything that uses the package, and/or convince other teams/users that they need to do the same in their code. Not fun
reply
Stuff that is implemented twice. (Copy & Paste) You fix a bug an find out that the bug is also some where else.
I forgot about that but as soon as you referenced bugs twice it sparked memories lmfao
reply
Humans are good at suspecting ;)
reply
I once saw a PowerPoint created to explain how shit an inherited codebase was. Lots of stuff you mentioned, duplicated methods, breaking abstraction layers all over the place, stuff like that. I felt bad for the original owners being called out like that, but they also kinda deserved it lol
reply
I'm not a software engineer but a data scientist.
But I can tell you that in the data world, unit tests are not yet that common practice. But it really should be. People string together SQL queries that give insensible output and no one is checking, at least not as part of regular workflow.
I'm part-timer so not really able to push for culture change on this at my org, but something i've noticed and it causes problems for sure
reply
But I can tell you that in the data world, unit tests are not yet that common practice.
Are you sure about that? Or is it just the industry or company you're in?
reply
I personally like code to be easy to understand, but I also agree with all your other points.
not making use of language patterns like inheritance/interfaces/modules/etc, no or bad documentation.
I think a lot of this overhyped. The easiest to understand code has minimally viable abstraction.
reply
Here are some things that are not necessarily always bad but that I think smell:
  1. No unit tests: The first thing I do in a new codebase is open the tests folder. A good set of tests should basically be API documentation for your code, showing me what it typically does, how it could go wrong, and what mitigations have been put in place.
  2. Excessive use of mocks in tests: It's fine, and helpful, to mock out a few http requests, etc. but when your mocking code becomes as complex as the code under test, you have to wonder what is even being tested. Also, the source code becomes difficult to change: want to change one line? Now all of the tests fail because the mocks don't make sense any more and you have to spend much longer fixing up the mocks than it took to make the original code change.
  3. No readme (or some useless, generic one that has nothing to do with this particular bit of code): I don't have time to analyze all of your code. Tell me what it does and how to test/package/run it.
  4. Magic numbers: e.g. retry(5). What is that? Retry in 5 seconds? Retry 5 times? If the latter, then why not MAX_RETRIES = 5 or something then retry(MAX_RETRIES)?
  5. Main/handler functions that basically do everything with no smaller functions. Impossible to test properly and difficult to add functionality.
  6. Functions that take too many arguments: This is normally a sign that your functions have too many responsibilities or that you are passing data around in your code unnecessarily.
  7. Singleton classes that exist solely to serve as a namespace. Especially annoying in something like Python where the file is a namespace.
  8. Files or directories named "utils": these normally end up containing a bunch of misc. stuff because devs were too lazy to figure out where things should actually go because that might involve a little bit of refactoring and tidying up
  9. Lack of type annotations (if the language supports them). Running a static type checker can eliminate a whole class of bugs. Annotations also help with code readability.
reply
I consider shit code, assuming it works, to be hard to read and hard to build. If I have to read every line of a function/method to figure out what it does, it probably means the name of the function/method is inadequate at explaining it. If I have to build something that only works on a certain operating system, or has dependencies that only work on computers with specific local files, it's shitty code.
Premature optimization also makes for shitty code. One code snippet I remember used custom bitwise logic for multiplication. It was both a wasted effort (because of compiler optimizations), and made the code unnecessarily complex for newer junior devs on the team to understand.
Cleverness is fine imo, as long as the clever implementation is behind a good interface that can be swapped out easily if problems arise. I would prefer to start off with a well-abstracted dumb implementation and swap it for a clever one afterwards so that the commit history can be called upon for easy file reverts.
reply