Finding Java classes that are only used by tests
Background/Why you want to do this
Modern development practice is to write unit tests for code. This is obviously a net win for maintainability and code quality. But as the code base evolves, it can easily lead to a situation where you have code that’s only ever used by the unit tests, not by your production code. This slows down your build, slows down your tests, slows down the onboarding of new hires, and requires more maintenance on every JVM upgrade. Anecdotally, this has become a larger issue over the last few years.
Example of how this happens
Imagine a service whose main class is Application.java, which depends on BusinessLogic.java, which in turn depends on SubBusinessLogic.java. All of them have tests: ApplicationTest.java, BusinessLogicTest.java and SubBusinessLogicTest.java. Then the requirements change: Application.java now needs to call BusinessLogicService instead and stops using BusinessLogic.java. This will likely be done in two or three separate changes/pull requests [1]:
- Feature flag calling the BusinessLogicService
- Compare the results to make sure there aren’t any differences
- Stop calling the BusinessLogic.java class
The third pull request probably feels like a formality, so the developer could easily forget to delete BusinessLogic.java and BusinessLogicTest.java, much less SubBusinessLogic.java and SubBusinessLogicTest.java.
A dev won’t notice this over the course of normal development. While most modern IDEs will indicate if a method or class is completely unused, SubBusinessLogicTest.java still references SubBusinessLogic.java, so the IDE won’t show it as unused. You can run an inspection in IntelliJ called ‘Find unused declarations’ and limit it to production code only, but it can have issues handling Lombok, some annotations and certain build setups.
Why this is becoming more common
Java turns 30 next year. Anecdotally, more development time is spent on Java projects that are older than 5 years than on ones younger than 5 years [2]. Between attrition and reorgs, a 5 year old project probably doesn’t have anyone from the original team and might not have anyone who has even worked with anyone on the original team. This means that you’re less likely to have a developer who will be in the codebase, see SubBusinessLogic.java and think “We don’t need this anymore”. The post-2022 hiring binge hangover means there are a lot fewer rewrites from scratch and a lot more junior developers, which exacerbates this trend.
How to find code only used by tests
1. Generate separate jars for both prod and test code
First generate both of your jars. How you do this will vary with your build system (Ant, Maven, Gradle, etc.), but you need to generate your prod_jar, which has just the classes that you ship to production, and your test_jar, which includes all of your test code. If test_jar also includes your production code, that’s expected, as the comparison in step 5 will handle it.
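If your build already compiles production and test classes into separate directories, one build-system-agnostic option is to package those directories yourself with the JDK's jar tool. This is only a sketch: the function name and jar file names are made up, and the directory names in the example call assume a default Maven layout.

```shell
# build_jars PROD_CLASSES_DIR TEST_CLASSES_DIR OUT_DIR
# Packages each directory of compiled classes into its own jar.
# The function name and the jar file names are hypothetical.
build_jars() {
  jar cf "$3/prod.jar" -C "$1" .
  jar cf "$3/test.jar" -C "$2" .
}

# Example call for a default Maven layout, after `mvn test-compile`:
#   build_jars target/classes target/test-classes .
```

With Gradle's default Java layout the two directories would typically be build/classes/java/main and build/classes/java/test.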
2. Generate class dependencies
You want a list of which classes each class depends on. I initially used jdeps, but it wouldn’t include SubBusinessLogic.java as a dependency of BusinessLogic.java if they are in the same package. So I ended up using classycle, as it does output that type of dependency. To generate the lists, run:
java -jar classycle.jar -raw test_jar > test_jar_raw.txt
java -jar classycle.jar -raw prod_jar > prod_jar_raw.txt
If you have multiple jars for prod or test, put all of their dependencies in the same file, for example by appending with >> instead of overwriting with >.
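For the multi-jar case, a small loop keeps everything in one file. This is a sketch that assumes one analyser invocation per jar, exactly as in the commands above. The function name collect_raw and the ANALYZER variable are made up for illustration.

```shell
# Command used to analyse one jar. Defaults to the invocation shown above.
# ANALYZER is a hypothetical variable so the command can be swapped out.
ANALYZER=${ANALYZER:-java -jar classycle.jar -raw}

# collect_raw OUTPUT_FILE JAR...
# Runs the analyser once per jar and appends every report to OUTPUT_FILE.
collect_raw() {
  out=$1
  shift
  : > "$out"                 # start from an empty file
  for jar in "$@"; do
    # $ANALYZER is deliberately unquoted so it splits into command and arguments.
    $ANALYZER "$jar" >> "$out"
  done
}

# Example call (jar names are placeholders):
#   collect_raw prod_jar_raw.txt service.jar shared-lib.jar
```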
3. Filter output to make them comparable
The output produced is meant to be human readable, with each class in your project in its own section that looks like:
class com.yourcorp.BusinessLogic.java
    known internal class com.yourcorp.SubBusinessLogic.java
    unknown external class java.util.List
You only want the dependencies (the indented lines), not the class that has them (the section header). You can filter for these by first grepping for the lines with leading spaces. Then you most likely only want the classes that start with com.yourcorp, as those are the ones you are most likely to own.
# '^ ' keeps only the indented dependency lines, not the section headers
grep '^ ' test_jar_raw.txt | grep -o 'com\.yourcorp\S*' > ref_by_test
grep '^ ' prod_jar_raw.txt | grep -o 'com\.yourcorp\S*' > ref_by_prod
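To make the filtering concrete, here is a small self-contained run on a made-up report in the layout described above. The class names are invented, and the sketch spells \S as [^[:space:]] so it does not rely on a GNU grep extension.

```shell
# Sample report: unindented section headers, indented dependency lines.
raw=$(mktemp)
cat > "$raw" <<'EOF'
class com.yourcorp.Application
    known internal class com.yourcorp.BusinessLogicService
    unknown external class java.util.List
class com.yourcorp.BusinessLogic
    known internal class com.yourcorp.SubBusinessLogic
EOF

# Keep the indented lines only, then pull out the com.yourcorp class names.
referenced=$(grep '^[[:space:]]' "$raw" | grep -o 'com\.yourcorp[^[:space:]]*')
printf '%s\n' "$referenced"
```

Only BusinessLogicService and SubBusinessLogic come out. Application and BusinessLogic appear in the report as section headers, but nothing in this sample references them, so they are not in the list.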
You can also use this output to count how many places reference a class. If a class is only referenced in one place, and that isn’t a deliberate design decision, it’s often a good candidate for merging/inlining.
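One way to get those counts is with sort and uniq. The sketch below uses a made-up stand-in for ref_by_prod, where each line is one reference and duplicates are kept.

```shell
# Hypothetical sample of ref_by_prod: one line per reference.
refs=$(mktemp)
printf '%s\n' \
  com.yourcorp.SubBusinessLogic \
  com.yourcorp.Config \
  com.yourcorp.Config \
  com.yourcorp.Config > "$refs"

# Reference count per class, least-referenced first.
counts=$(sort "$refs" | uniq -c | sort -n)
printf '%s\n' "$counts"

# Classes referenced exactly once: the merge/inline candidates.
single_use=$(sort "$refs" | uniq -u)
printf '%s\n' "$single_use"
```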
4. Get a single sorted, de-duplicated list of each jar’s dependent classes
sort -u ref_by_test > sorted_test_deps
sort -u ref_by_prod > sorted_prod_deps
5. Compare them and examine the results
comm -23 sorted_test_deps sorted_prod_deps
These are classes that are only referenced by tests and are candidates for deletion. Using IntelliJ, my process is to go through them case by case:
- Verify that they are unneeded (see Limitations below)
- Safe delete the referencing test class
- Safe delete the class
Then run locally, run integration tests and send for review. By doing this you can easily delete a few thousand lines of code in an afternoon.
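To see what the comparison in step 5 does, here is a tiny worked example with made-up lists that mirror the example service. The sketch pins the locale because comm expects both inputs to be sorted in the same collation order.

```shell
work=$(mktemp -d)

# What the tests reference (made-up data, duplicates and all).
printf '%s\n' \
  com.yourcorp.SubBusinessLogic \
  com.yourcorp.Application \
  com.yourcorp.BusinessLogic \
  com.yourcorp.BusinessLogicService \
  com.yourcorp.BusinessLogic > "$work/ref_by_test"

# What production code references.
printf '%s\n' \
  com.yourcorp.BusinessLogicService \
  com.yourcorp.SubBusinessLogic > "$work/ref_by_prod"

# Use one collation order for both sort and comm.
export LC_ALL=C
sort -u "$work/ref_by_test" > "$work/sorted_test_deps"
sort -u "$work/ref_by_prod" > "$work/sorted_prod_deps"

# -2 hides lines found only in the prod list, -3 hides lines found in both,
# so only the lines unique to the test list remain.
test_only=$(comm -23 "$work/sorted_test_deps" "$work/sorted_prod_deps")
printf '%s\n' "$test_only"
```

With this data the result is Application and BusinessLogic. SubBusinessLogic is missing because BusinessLogic still references it from production code, and Application shows up even though it is the entry point. Both effects are covered under Limitations.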
Limitations
Non-recursive
After the example code change above, Application.java no longer references BusinessLogic.java, but BusinessLogic.java still references SubBusinessLogic.java. So on the first pass, BusinessLogic.java would come up but SubBusinessLogic.java would not. You therefore want to run this more than once, deleting what you find after each pass, to get all the unneeded classes.
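If you want a preview of the next pass before actually deleting anything, one option is to strip the sections of the first-pass candidates out of the production report and recompute the production references. This is a sketch, not part of the process above: drop_sections is a made-up helper, and it assumes unindented section headers with the class name as the last word, as in the sample in step 3.

```shell
# drop_sections CANDIDATES RAW
# Prints RAW without the sections whose header class is listed in CANDIDATES.
drop_sections() {
  awk '
    FILENAME == ARGV[1] { drop[$1] = 1; next }   # candidate list: one class per line
    /^[^[:space:]]/     { skip = ($NF in drop) } # a header decides for its whole section
    !skip                                        # print lines of the sections we keep
  ' "$1" "$2"
}

work=$(mktemp -d)
cat > "$work/prod_jar_raw.txt" <<'EOF'
class com.yourcorp.Application
    known internal class com.yourcorp.BusinessLogicService
class com.yourcorp.BusinessLogic
    known internal class com.yourcorp.SubBusinessLogic
EOF
echo com.yourcorp.BusinessLogic > "$work/first_pass"

# Production references as they would look once BusinessLogic is gone.
second_pass_refs=$(drop_sections "$work/first_pass" "$work/prod_jar_raw.txt" |
  grep '^[[:space:]]' | grep -o 'com\.yourcorp[^[:space:]]*' | sort -u)
printf '%s\n' "$second_pass_refs"
```

In this made-up data SubBusinessLogic drops out of the production references, so comparing against the test list again would now report it as a candidate.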
Test helpers
There will also be some false positives. Some of these will be test helper classes that are used for common test setup or requirements. You can easily filter out most of the test helpers with grep -iv test. There might still be some test helpers that don’t have test in the name, though. For example, ApplicationTest.java might inject a subclass of BusinessLogic.java called SimpleBusinessLogic.java, which for testing purposes doesn’t call a database.
Entry Points
Some of the false positives will be your package’s entry points: API hooks, HTTP endpoints or main, depending on the type of Java application. So in the example above, Application.java would appear because it is not referenced by any other prod class but is tested. Usually these will be in the same subpackage, so they can be filtered with grep -v com.yourcorp.api or whatever package the entry points are in.
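The two filters from this section can be chained on the candidate list. The sketch below uses made-up candidates and assumes the entry points live in com.yourcorp.api, which is only a placeholder package.

```shell
# Hypothetical output of the comparison step.
candidates=$(printf '%s\n' \
  com.yourcorp.BusinessLogic \
  com.yourcorp.TestFixtures \
  com.yourcorp.api.Application \
  com.yourcorp.SimpleBusinessLogic)

# First drop anything with "test" in the name (case-insensitive),
# then drop everything in the assumed entry-point package.
filtered=$(printf '%s\n' "$candidates" |
  grep -iv test |
  grep -v '^com\.yourcorp\.api\.')
printf '%s\n' "$filtered"
```

SimpleBusinessLogic survives both filters even though it is a test helper in the example above, which is why each remaining candidate still needs a manual check.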
[1] I once had to do this as step 6 of a larger process of safely changing an API.
[2] This has been an issue since at least 2012.