One of the reasons that open-source software is so solid is that we use some of the most cutting-edge open-source code-analysis tools available to ensure our software does what we intend, with as few bugs as possible. In this post, I will talk about one tool we use, Sonar, and two specific metrics I've found useful for focusing the resolution of our technical debt.
Technical debt is all the stuff you should have done but didn't have time to do. For example, you may have left off a couple of unit tests. Or perhaps you decided not to refactor that 6,000-line Java class because it worked as a prototype and "if it ain't broke...". As your applications grow, the amount of technical debt will also grow. In many commercial and consulting settings, it may not be realistic to take a few months off from implementing new features to resolve the technical debt. In that same spirit of realism, the only time a team will focus on resolving technical debt is when there is absolutely nothing else to do. You know, after all tasks are completed, the Kanban board's WIP column is clear, and your team is tired of playing Call of Duty.
This lack of priority and time to resolve technical debt creates a problem. Too much technical debt, and your codebase becomes unmaintainable. Your team is literally one production outage away from working 18-hour days, seven days a week, until the bug is found. You need a way to focus the resolution of your technical debt in order to reduce the risk of a production outage.
Sonar is a source-code analysis tool that has proven very useful in doing this. Specifically, there are two metrics that are very useful for targeting the work: Cyclomatic Complexity and LCOM4. Together, these two metrics give you a very easy and inexpensive way to target your technical debt resolution.
The measure of the number of independent pathways through a class, method, or application is called "cyclomatic complexity". This metric was originally proposed by Thomas McCabe in 1976. It already has a write-up on Wikipedia that goes into great technical detail about what it is, how it is calculated, and even has some pretty pictures. Instead of completely describing it here, I'll hope you skim the Wikipedia article before reading further.
Another way to think about cyclomatic complexity is as a measure of the difficulty a new developer will have understanding the source code. Usually, a cyclomatic complexity of 5 or less is considered good. Anything between 6 and 10 is considered moderately risky. And any source code with a complexity over 10 is considered poor.
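To make the counting and the rule of thumb concrete, here is a minimal Java sketch. The class and method names are hypothetical and are not part of Sonar; the cut-offs are the ones quoted above, with anything over 10 treated as poor. The `classify` method is only there to show how decision points add up.

```java
// Hypothetical helper for illustration only; not a Sonar API.
final class ComplexityRating {

    // Rule-of-thumb buckets from the text: 5 or less is good,
    // 6 through 10 is moderately risky, anything over 10 is poor.
    static String rate(int complexity) {
        if (complexity <= 5) {
            return "good";
        }
        if (complexity <= 10) {
            return "moderately risky";
        }
        return "poor";
    }

    // A sample method with a cyclomatic complexity of 4:
    // one base path plus three decision points.
    static String classify(int[] readings) {
        int negatives = 0;
        for (int r : readings) {   // +1 for the loop
            if (r < 0) {           // +1 for the branch
                negatives++;
            }
        }
        if (negatives > 0) {       // +1 for the branch
            return "has negatives";
        }
        return "all non-negative";
    }

    public static void main(String[] args) {
        System.out.println(rate(4));   // good
        System.out.println(rate(8));   // moderately risky
        System.out.println(rate(23));  // poor
    }
}
```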
In my experience, you should also take into consideration the difference between classes encapsulating your business algorithms and those containing a large number of utility methods. A utility class may have 100 methods, each doing something very small. Your complexity for the utility class may be over 100, but when you look at the average complexity per method, it will be close to 1 because your methods will be very discrete. Now, compare that with a class containing methods which implement business logic. In the prototype phase of development, these will likely consist of multiple if statements, each with for loops and switches. This kind of class will have a complexity which grows with each "if" statement in your methods. While you can usually ignore large utility classes, you should absolutely refactor classes containing business logic with high complexity.
When using Cyclomatic Complexity to target technical debt resolution, identify those classes with the highest complexity that are not utility classes. Sonar provides this information in an easy format, and is fairly easy to set up and use!
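The total-versus-average distinction is just arithmetic, sketched below in Java. The class name and the per-method numbers are made up for illustration; they are not measurements from a real codebase.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; the sample numbers are invented.
final class ComplexityAverages {

    // Class-level complexity, taken here as the sum of the per-method values.
    static int total(int[] perMethod) {
        return Arrays.stream(perMethod).sum();
    }

    // Average complexity per method; an empty class is reported as 0.
    static double averagePerMethod(int[] perMethod) {
        if (perMethod.length == 0) {
            return 0.0;
        }
        return (double) total(perMethod) / perMethod.length;
    }

    public static void main(String[] args) {
        // A utility class: 100 tiny methods with no branching.
        int[] utility = new int[100];
        Arrays.fill(utility, 1);

        // A business-logic class: three branch-heavy methods.
        int[] business = {18, 25, 12};

        System.out.println(total(utility) + " total, " + averagePerMethod(utility) + " per method");
        System.out.println(total(business) + " total, " + averagePerMethod(business) + " per method");
    }
}
```

The utility class has the larger total, but the business class has the much larger average, and the average is the number that tells you where the hard-to-read methods are.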
LCOM stands for "Lack of Cohesion of Methods" and generically is a set of metrics that measure how the methods in a class interact with each other. This metric was updated a number of times until LCOM4 was introduced by Hitz & Montazeri. LCOM4 measures the number of connected components within a class. The term "connected components" refers to groups of related methods and class-scope attributes. LCOM4 suggests that only methods and attributes that rely on each other should be in a class.
If you think about it, from a maintainability standpoint, it is a lot easier to understand a class if all of the components of the class refer to each other. Think of this as a single unit of algorithmic activity. Consider a hello-world class that contains methods and attributes that convert between Celsius and Fahrenheit in addition to methods that print out the words "Hello World". How much easier would it be for a new developer to maintain that code if the Celsius-to-Fahrenheit conversion code were in a different class than the hello-world code?
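As a rough sketch of the idea, the Java below counts connected components for that hello-world class. It is not how Sonar computes the metric; the class name, the input format (each method mapped to the fields and methods it uses), and the member names are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not Sonar's implementation.
final class Lcom4Sketch {

    // usage maps each method name to the fields it touches and the methods it calls.
    // The result is the number of connected groups of methods and attributes.
    static int lcom4(Map<String, Set<String>> usage) {
        // Build an undirected graph linking every method to each member it uses.
        Map<String, Set<String>> graph = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : usage.entrySet()) {
            String method = entry.getKey();
            graph.computeIfAbsent(method, k -> new HashSet<>());
            for (String member : entry.getValue()) {
                graph.get(method).add(member);
                graph.computeIfAbsent(member, k -> new HashSet<>()).add(method);
            }
        }

        // Count components with a depth-first walk from every unvisited node.
        Set<String> seen = new HashSet<>();
        int components = 0;
        for (String start : graph.keySet()) {
            if (!seen.add(start)) {
                continue; // already reached from an earlier start node
            }
            components++;
            Deque<String> stack = new ArrayDeque<>();
            stack.push(start);
            while (!stack.isEmpty()) {
                for (String next : graph.get(stack.pop())) {
                    if (seen.add(next)) {
                        stack.push(next);
                    }
                }
            }
        }
        return components;
    }

    // The hello-world class from the text, with invented member names.
    static Map<String, Set<String>> helloWorldUsage() {
        Map<String, Set<String>> usage = new HashMap<>();
        usage.put("printGreeting", Set.of("greeting"));
        usage.put("setGreeting", Set.of("greeting"));
        usage.put("toFahrenheit", Set.of("scaleFactor", "offset"));
        usage.put("toCelsius", Set.of("scaleFactor", "offset"));
        return usage;
    }

    public static void main(String[] args) {
        System.out.println(lcom4(helloWorldUsage())); // 2
    }
}
```

The greeting methods and the conversion methods share nothing, so the class falls into two groups. Moving the conversion code into its own class would leave each class with a score of 1.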
This may seem like a pretty simple example, but imagine a prototype composed of 500 classes, each with 1,000 lines or more, and with an average LCOM4 score over 5. Can you imagine being handed this codebase to maintain? Better yet, can you imagine being asked, 4 years after you wrote the code, to come back and "upgrade" it to use the latest-and-greatest architecture?
Just as with complexity, you should also consider the difference between utility classes and classes containing business logic. The LCOM4 score of a utility class may be in the 100s. While this is completely unacceptable for classes containing the implementation of business algorithms, it is completely acceptable for utility classes. When you use LCOM4 to identify classes to refactor, make sure that you don't focus on your utility classes.
Using Them Together
LCOM4 and Cyclomatic Complexity are related to each other, and by taking them both into account, you will be able to determine where to focus your technical debt resolution. The list below should help when you compare a given class' LCOM4 against its Cyclomatic Complexity. I am rating the refactoring on a scale of one to four, where one is the highest priority for refactoring and four is the lowest:
- Complexity is high and LCOM4 is high: This class has a refactor priority of one. This class implements a number of business algorithms and the methods are very complex. Refactor each algorithm into its own class, then simplify the methods.
- Complexity is high and LCOM4 is low: This class has a refactor priority of two. This class contains a low number of distinct business algorithms, but its methods are very complex. First simplify the methods. Then, refactor the algorithms into their own classes. This order works here because the problem isn't the mixture of the algorithms but rather the complexity of the methods. By simplifying the methods, you will see that some of the methods were written to apply to more than one algorithm. Refactoring the methods first will result in an easier refactoring of algorithms.
- Complexity is low and LCOM4 is high: This class has a refactor priority of three. This is a utility class. There are a large number of simple methods, each implementing something different. There is no need to bother with this class.
- Complexity is low and LCOM4 is low: This class has a refactor priority of four. This is a well-written class. Don't touch it. Consider giving the developer accolades, bonuses, or perhaps not stealing their lunch from the fridge on brown-bag Fridays, Marvin!!
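The four cases above can be sketched as a small Java lookup. The class name and the cut-offs for "high" are assumptions: complexity over 10 follows the rule of thumb from earlier in this post, and LCOM4 over 1 is simply one plausible choice, so tune both to your own codebase.

```java
// Hypothetical helper for illustration only; the thresholds are assumptions.
final class RefactorPriority {

    // Assumed cut-off: complexity over 10 counts as "high".
    static final int HIGH_COMPLEXITY = 10;

    // Assumed cut-off: more than one connected component counts as "high".
    static final int HIGH_LCOM4 = 1;

    // Returns 1 (refactor first) through 4 (leave it alone).
    static int priority(int complexity, int lcom4) {
        boolean complex = complexity > HIGH_COMPLEXITY;
        boolean scattered = lcom4 > HIGH_LCOM4;
        if (complex && scattered) {
            return 1; // split the algorithms apart, then simplify the methods
        }
        if (complex) {
            return 2; // simplify the methods first, then split the algorithms
        }
        if (scattered) {
            return 3; // most likely a utility class
        }
        return 4;     // well-written class
    }

    public static void main(String[] args) {
        System.out.println(priority(42, 6)); // 1
        System.out.println(priority(42, 1)); // 2
        System.out.println(priority(3, 80)); // 3
        System.out.println(priority(3, 1));  // 4
    }
}
```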
Technical debt represents the skeletons in a software development project's closet. Tools like Sonar give you access to metrics like LCOM4 and cyclomatic complexity that let you see them. Don't be afraid, though: the good thing about being able to see your skeletons is that you can fix them. In this case, you can fix potential problems as they arise.