Finding Java classes that are only used by tests

Background/Why you want to do this

Modern development practice is to write unit tests for code. This is obviously a net win for maintainability and code quality. But as the code base evolves, it can easily lead to a bad situation where you have code that’s only ever used by the unit tests, not by your production code. This slows down your build, slows down your tests, slows down onboarding of new hires and requires more maintenance on every JVM upgrade. Anecdotally, this has become a larger issue in the last few years.

Example of how this happens

Imagine a service whose main class, Application.java, depends on BusinessLogic.java, which in turn depends on SubBusinessLogic.java. All three have tests: ApplicationTest.java, BusinessLogicTest.java and SubBusinessLogicTest.java. Then the requirements change: Application.java now needs to call BusinessLogicService instead and stops using BusinessLogic.java. This will likely be done in two or three separate changes/pull requests [1]:

  1. Call BusinessLogicService behind a feature flag
  2. Compare the results to make sure there aren’t any differences
  3. Stop calling BusinessLogic.java

The third pull request probably feels like a formality, so the developer could easily forget to delete BusinessLogic.java and BusinessLogicTest.java, much less SubBusinessLogic.java and SubBusinessLogicTest.java.

A developer won’t notice this over the course of normal development. Most modern IDEs will flag a method or class that is completely unused, but because SubBusinessLogicTest.java still references SubBusinessLogic.java, the IDE won’t show SubBusinessLogic.java as unused. You can run an inspection in IntelliJ called ‘Find unused declarations’ and limit it to production code only, but it can have issues handling Lombok, some annotations and certain build setups.

Why this is becoming more common

Java turns 30 next year. Anecdotally, more development time is spent on Java projects older than 5 years than on younger ones [2]. Between attrition and reorgs, a 5-year-old project probably doesn’t have anyone from the original team, and might not have anyone who has even worked with someone from the original team. This makes it less likely that a developer will be in the codebase, see SubBusinessLogic.java and think “We don’t need this anymore”. The post-2022 hiring-binge hangover means there are far fewer rewrites from scratch and a lot more junior developers, which exacerbates the trend.

How to find code only used by tests

1. Generate separate jars for both prod and test code

First, generate both jars. How you do this varies with your build system (Ant, Maven, Gradle, etc.), but you need a prod_jar containing only the classes you ship to production and a test_jar containing all of your test code. It is fine if test_jar also includes your production code, as the comparison in step 5 handles that.

2. Generate class dependencies

You want a list of which classes each class depends on. I initially used jdeps, but it wouldn’t report SubBusinessLogic.java as a dependency of BusinessLogic.java when they are in the same package. So I ended up using Classycle, which does output that kind of dependency. To generate the lists, run:

java -jar classycle.jar -raw test_jar > test_jar_raw.txt

java -jar classycle.jar -raw prod_jar > prod_jar_raw.txt

If you have multiple jars for prod or test, append the output for each jar to the same file, so that you end up with one raw file for prod and one for test.
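For several jars, the appending could be scripted along these lines. This is only a minimal sketch: the function name collect_raw_deps, the CLASSYCLE_JAR variable and the jar names in the usage comment are hypothetical, and only the java -jar classycle.jar -raw invocation comes from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run Classycle once per jar and append all raw output
# to a single file. CLASSYCLE_JAR is an assumed variable pointing at your
# copy of classycle.jar.
CLASSYCLE_JAR="${CLASSYCLE_JAR:-classycle.jar}"

collect_raw_deps() {
    local out="$1"
    shift
    : > "$out"                       # start from an empty output file
    local jar
    for jar in "$@"; do
        java -jar "$CLASSYCLE_JAR" -raw "$jar" >> "$out"
    done
}

# Usage (jar names are placeholders):
#   collect_raw_deps prod_jar_raw.txt service.jar common.jar
#   collect_raw_deps test_jar_raw.txt service-tests.jar common-tests.jar
```

Truncating the output file first means a rerun does not pile new output on top of stale results.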

3. Filter output to make them comparable

The output produced is meant to be human readable, with each class in your project in its own section that looks like:

class com.yourcorp.BusinessLogic.java
    known internal class com.yourcorp.SubBusinessLogic.java 
    unknown external class java.util.List

You only want the dependency lines, not the line naming the class itself. Filter for them by first grepping for lines with leading spaces. Then you most likely only want the classes whose names start with com.yourcorp, as those are the ones you are most likely to own.

grep '^    ' test_jar_raw.txt | grep -o 'com\.yourcorp\S*' > ref_by_test
grep '^    ' prod_jar_raw.txt | grep -o 'com\.yourcorp\S*' > ref_by_prod

You can also use this output to count how many places reference each class. If a class is referenced in only one place, and that isn’t a deliberate design decision, it is often a good candidate for merging/inlining.
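As an illustration of the filtering and the reference counts, here is a small sketch run against made-up raw output in the layout shown above. The class names (BusinessLogicClient, Config), the function names and the temporary directory are all hypothetical. It uses [^[:space:]]* in place of \S* only because that form also works with greps that lack the \S shorthand.

```shell
#!/usr/bin/env bash
# Made-up raw output in the layout shown above.
workdir=$(mktemp -d)
cat > "$workdir/prod_jar_raw.txt" <<'EOF'
class com.yourcorp.Application.java
    known internal class com.yourcorp.BusinessLogicClient.java
    known internal class com.yourcorp.Config.java
    unknown external class java.util.List
class com.yourcorp.BusinessLogicClient.java
    known internal class com.yourcorp.Config.java
EOF

# Keep only the indented dependency lines, then only the class names you own.
extract_refs() {
    grep '^    ' "$1" | grep -o 'com\.yourcorp[^[:space:]]*'
}

# Print "<count> <class>" lines, least-referenced classes first.
count_refs() {
    extract_refs "$1" | sort | uniq -c | sort -n | awk '{ print $1, $2 }'
}

count_refs "$workdir/prod_jar_raw.txt"
# prints:
# 1 com.yourcorp.BusinessLogicClient.java
# 2 com.yourcorp.Config.java
```

Classes with a count of 1 are the ones worth a look as merge/inline candidates.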

4. Get a single sorted, deduplicated list of each jar’s dependencies

sort -u ref_by_test > sorted_test_deps

sort -u ref_by_prod > sorted_prod_deps

5. Compare them and examine results

comm -23 sorted_test_deps sorted_prod_deps

comm -23 prints only the lines unique to the first file, so these are classes that are referenced only by tests and are candidates for deletion. Using IntelliJ, my process is to go through them case by case:

  1. Verify that they are unneeded (see Limitations below)
  2. Safe delete the referencing test class
  3. Safe delete the class

Then run locally, run the integration tests and send for review. By doing this you can easily delete a few thousand lines of code in an afternoon.
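To make steps 4 and 5 concrete, here is a small worked sketch using the classes from the earlier example. The reference lists are made up for illustration, and the function name only_in_tests and the temporary directory are hypothetical.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# References seen from test code (duplicates are expected before step 4).
printf '%s\n' \
    com.yourcorp.BusinessLogic.java \
    com.yourcorp.Application.java \
    com.yourcorp.SubBusinessLogic.java \
    com.yourcorp.BusinessLogic.java > "$workdir/ref_by_test"

# References seen from prod code: BusinessLogic.java is still in the prod jar
# and still references SubBusinessLogic.java.
printf '%s\n' com.yourcorp.SubBusinessLogic.java > "$workdir/ref_by_prod"

only_in_tests() {
    local test_refs="$1" prod_refs="$2"
    sort -u "$test_refs" > "$workdir/sorted_test_deps"
    sort -u "$prod_refs" > "$workdir/sorted_prod_deps"
    # -2 drops lines unique to the prod list, -3 drops lines present in both.
    comm -23 "$workdir/sorted_test_deps" "$workdir/sorted_prod_deps"
}

only_in_tests "$workdir/ref_by_test" "$workdir/ref_by_prod"
# prints:
# com.yourcorp.Application.java
# com.yourcorp.BusinessLogic.java
```

In this sample, BusinessLogic.java is the genuine candidate, while Application.java shows up only because nothing in prod references the entry point (see Limitations below).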

Limitations

Non-recursive

After the example code change above, Application.java no longer references BusinessLogic.java, but BusinessLogic.java still references SubBusinessLogic.java. So on the first pass BusinessLogic.java would come up but SubBusinessLogic.java would not. Rerun the process after each round of deletions until no new classes turn up.
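A possible alternative to repeated passes, which is not the process described above, is to walk the prod dependency graph from entry points you already know and list every class that is never reached. The sketch below assumes the raw output has the layout shown in step 3 and that you supply the entry points yourself. The function name unreachable_classes and the sample data are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: "class X" header lines followed by indented dependency lines.
workdir=$(mktemp -d)
cat > "$workdir/prod_jar_raw.txt" <<'EOF'
class com.yourcorp.Application.java
    known internal class com.yourcorp.Config.java
class com.yourcorp.BusinessLogic.java
    known internal class com.yourcorp.SubBusinessLogic.java
class com.yourcorp.SubBusinessLogic.java
    unknown external class java.util.List
class com.yourcorp.Config.java
EOF

# Print every class declared in the raw file that cannot be reached from the
# entry points given as the remaining arguments.
unreachable_classes() {
    local raw="$1"
    shift
    awk -v roots="$*" '
        /^class / { current = $2; all[current] = 1; next }
        /^    /   { deps[current] = deps[current] " " $NF }
        END {
            n = split(roots, queue, " ")
            for (i = 1; i <= n; i++) seen[queue[i]] = 1
            for (i = 1; i <= n; i++) {       # n grows as classes are discovered
                m = split(deps[queue[i]], d, " ")
                for (j = 1; j <= m; j++)
                    if (!(d[j] in seen)) { seen[d[j]] = 1; queue[++n] = d[j] }
            }
            for (c in all) if (!(c in seen)) print c
        }
    ' "$raw" | sort
}

unreachable_classes "$workdir/prod_jar_raw.txt" com.yourcorp.Application.java
# prints:
# com.yourcorp.BusinessLogic.java
# com.yourcorp.SubBusinessLogic.java
```

Here both BusinessLogic.java and SubBusinessLogic.java surface in a single pass. The result is only as good as the entry-point list, so each class still needs the same manual verification before deletion.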

Test helpers

There will also be some false positives. Some of these will be test helper classes used for common test setup or requirements. You can easily filter out most of them with grep -iv test. There might still be some test helpers that don’t have test in the name, though. For example, ApplicationTest.java might inject a subclass of BusinessLogic.java called SimpleBusinessLogic.java which, for testing purposes, doesn’t call a database.

Entry Points

Some of the false positives will be your package’s entry points: API hooks, HTTP endpoints or main, depending on the type of Java application. In the example above, Application.java would appear because it is not referenced by any other prod class but is tested. Usually these are in the same subpackage, so they can be filtered with grep -v com.yourcorp.api or whatever package the entry points are in.
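Both filters can be chained over the candidate list. The following is a minimal sketch: the function name filter_candidates, the sample class names and the com.yourcorp.api. prefix are placeholders for whatever applies in your codebase.

```shell
#!/usr/bin/env bash
# Drop likely false positives from the candidate list read on stdin.
# The entry-point package is matched as a fixed string (-F), so its dots are
# literal rather than regex wildcards.
filter_candidates() {
    local entry_package="$1"
    grep -iv 'test' | grep -vF -- "$entry_package"
}

printf '%s\n' \
    com.yourcorp.api.Application.java \
    com.yourcorp.BusinessLogic.java \
    com.yourcorp.testing.TestFixtures.java |
    filter_candidates com.yourcorp.api.
# prints:
# com.yourcorp.BusinessLogic.java
```

The trailing dot in the package argument keeps a sibling package with a longer name from being filtered by accident. A helper such as SimpleBusinessLogic.java would still get through and needs a manual check.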


  1. I once had to do this as step 6 of a larger process of safely changing an API.

  2. This has been an issue since at least 2012.