Notes/Primer on Clang Compiler Frontend (4): Creating our Clang Plugin Project and Clang-tidy Linter Framework

Posted on Aug 8, 2024


These are my notes on chapter 5 of the Clang Compiler Frontend by Ivan Murashko. (I’ve referenced this book extensively, and a lot of the snippets here are from this book. I’d highly recommend buying it for a deeper dive: https://www.amazon.com/Clang-Compiler-Frontend-Understand-internals/dp/1837630984)

Today we’re going to be doing two exciting things:

We are going to be using everything we’ve learned so far to build a Clang plugin.
We are going to dive deep into the Clang-tidy linter framework!

Let’s begin!

Clang Plugin Project

We will create a test project that estimates class complexity: a class is deemed complex if the number of its methods exceeds a certain threshold. We will make use of a recursive visitor and Clang diagnostics, and we will create a LIT test for our project.

First step: create a dedicated build configuration for LLVM:

cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_INSTALL_PREFIX=../install \
-DLLVM_TARGETS_TO_BUILD="AArch64" \
-DLLVM_ENABLE_PROJECTS="clang" \
-DLLVM_USE_SPLIT_DWARF=ON \
-DBUILD_SHARED_LIBS=ON ../llvm

(I’m using “AArch64” because I’m building it for my M1 MacBook)

Then we run:

ninja install

After installation, we can create the build configuration for our plugin.

(IMPORTANT NOTE: Make sure the installed Clang binaries are in your PATH. We discussed how in the first chapter, and it should be easy to find online.)

Build Configuration:

We will name our project ‘classchecker’ and create a CMakeLists.txt file as follows:

mkdir classchecker
cd classchecker
touch CMakeLists.txt

Then open it with ‘nano’ or your editor of choice and add the following configuration (the same one we used for the RecursiveVisitor project in Chapter 3, with a few changes):

cmake_minimum_required(VERSION 3.16)

project(classchecker)

if ( NOT DEFINED ENV{LLVM_HOME})
  message(FATAL_ERROR "$LLVM_HOME is not defined")
else()
  message(STATUS "$LLVM_HOME found: $ENV{LLVM_HOME}")
  set(LLVM_HOME $ENV{LLVM_HOME} CACHE PATH "Root of LLVM installation")
  set(LLVM_LIB ${LLVM_HOME}/lib)
  set(LLVM_DIR ${LLVM_LIB}/cmake/llvm)
  set(LLVM_BUILD $ENV{LLVM_BUILD} CACHE PATH "Root of LLVM build")
  find_package(LLVM REQUIRED CONFIG)
  include_directories(${LLVM_INCLUDE_DIRS})
  link_directories(${LLVM_LIBRARY_DIRS})
  # Add the plugin's shared library target
  add_library(classchecker MODULE
    ClassChecker.cpp
  )
  set_target_properties(classchecker PROPERTIES COMPILE_FLAGS "-fno-rtti")
  target_link_libraries(classchecker
    LLVMSupport
    clangAST
    clangBasic
    clangFrontend
    clangTooling
  )
endif()

As you can see, we will construct a shared library instead of an executable:

# Add the plugin's shared library target
add_library(classchecker MODULE
  ClassChecker.cpp
)

We also set up a config parameter for the LLVM build folder. This parameter is necessary to locate the LIT executable, which is not included in the standard installation process:

set(LLVM_BUILD $ENV{LLVM_BUILD} CACHE PATH "Root of LLVM build")

After completing the build configuration, let’s write the source files, starting with the first component: a recursive visitor class, ‘ClassVisitor’:

Recursive visitor class:

Our visitor class will live in the header file ‘ClassVisitor.hpp’. It is a recursive visitor that handles ‘clang::CXXRecordDecl’, the AST nodes for C++ class declarations:

#include "clang/AST/ASTContext.h"
#include "clang/AST/RecursiveASTVisitor.h"

namespace clangbook {
namespace classchecker {
class ClassVisitor : public clang::RecursiveASTVisitor<ClassVisitor> {
public:
  explicit ClassVisitor(clang::ASTContext *C, int T)
      : Context(C), Threshold(T) {}

  bool VisitCXXRecordDecl(clang::CXXRecordDecl *Declaration) {
    if (Declaration->isThisDeclarationADefinition()) {
      int MethodCount = 0;
      for (const auto *M : Declaration->methods()) {
        MethodCount++;
      }

      if (MethodCount > Threshold) {
        clang::DiagnosticsEngine &D = Context->getDiagnostics();
        unsigned DiagID =
            D.getCustomDiagID(clang::DiagnosticsEngine::Warning,
                              "class %0 is too complex: method count = %1");
        clang::DiagnosticBuilder DiagBuilder =
            D.Report(Declaration->getLocation(), DiagID);
        DiagBuilder << Declaration->getName() << MethodCount;
      }
    }
    return true;
  }

private:
  clang::ASTContext *Context;
  int Threshold;
};
} // namespace classchecker
} // namespace clangbook

As we can see, we count the methods and emit a diagnostic if the threshold is exceeded. Here is the relevant part of the code:

bool VisitCXXRecordDecl(clang::CXXRecordDecl *Declaration) {
  if (Declaration->isThisDeclarationADefinition()) {
    int MethodCount = 0;
    for (const auto *M : Declaration->methods()) {
      MethodCount++;
    }

    if (MethodCount > Threshold) {
      clang::DiagnosticsEngine &D = Context->getDiagnostics();
      unsigned DiagID =
          D.getCustomDiagID(clang::DiagnosticsEngine::Warning,
                            "class %0 is too complex: method count = %1");
      clang::DiagnosticBuilder DiagBuilder =
          D.Report(Declaration->getLocation(), DiagID);
      DiagBuilder << Declaration->getName() << MethodCount;
    }
  }
  return true;
}

You can see that our diagnostic message accepts two parameters: the class name and the number of methods in the class. These parameters are encoded with the ‘%0’ and ‘%1’ placeholders, and the actual values are passed in the line ‘DiagBuilder << Declaration->getName() << MethodCount’, where the diagnostic message is constructed using the DiagBuilder object. This object is an instance of the clang::DiagnosticBuilder class, which implements the RAII (Resource Acquisition Is Initialization) pattern: it emits the actual diagnostic upon its destruction.

(Note: The RAII principle is a common idiom used to manage resource lifetimes by tying them to the lifetime of an object. When an object goes out of scope, its destructor is automatically called, ensuring that the resources are freed)
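To make the placeholder and RAII ideas concrete, here is a small standalone sketch that does not use Clang at all. ‘MiniDiagBuilder’ is a hypothetical toy class, not Clang’s API. It collects arguments with operator<< and only formats and emits the message, replacing %0, %1 and so on, when it is destroyed. Real Clang diagnostics support much richer formatting (for example %select and %plural), which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for clang::DiagnosticBuilder (hypothetical, not Clang's API).
class MiniDiagBuilder {
public:
  MiniDiagBuilder(std::vector<std::string> &Out, std::string Fmt)
      : Sink(Out), Format(std::move(Fmt)) {}
  MiniDiagBuilder(const MiniDiagBuilder &) = delete;
  MiniDiagBuilder &operator=(const MiniDiagBuilder &) = delete;

  // Each streamed value becomes the next argument: %0, %1, ...
  template <typename T> MiniDiagBuilder &operator<<(const T &Arg) {
    std::ostringstream OS;
    OS << Arg;
    Args.push_back(OS.str());
    return *this;
  }

  // RAII: the destructor does the real work. It substitutes the
  // single-digit placeholders and emits the finished message exactly once.
  ~MiniDiagBuilder() {
    std::string Message;
    for (std::size_t I = 0; I < Format.size(); ++I) {
      if (Format[I] == '%' && I + 1 < Format.size() &&
          Format[I + 1] >= '0' && Format[I + 1] <= '9') {
        std::size_t Idx = static_cast<std::size_t>(Format[I + 1] - '0');
        Message += Idx < Args.size() ? Args[Idx] : "<missing>";
        ++I; // skip the placeholder digit
      } else {
        Message += Format[I];
      }
    }
    Sink.push_back(Message);
  }

private:
  std::vector<std::string> &Sink;
  std::string Format;
  std::vector<std::string> Args;
};
```

In the book’s visitor, ‘DiagBuilder’ is a named variable, so the warning is emitted when it goes out of scope at the end of the if block. When the builder is used as a temporary in a single statement (report, then stream the arguments), it emits at the end of that statement instead.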

Now, ClassVisitor is going to be created within an AST consumer class.

AST consumer class

The AST consumer class is implemented in ‘ClassConsumer.hpp’ and represents the standard AST consumer:

#include <clang/AST/ASTConsumer.h>
#include "ClassVisitor.hpp"

namespace clangbook {
namespace classchecker {
class ClassConsumer : public clang::ASTConsumer {
public:
  explicit ClassConsumer(clang::ASTContext *Context, int Threshold)
      : Visitor(Context, Threshold) {}

  void HandleTranslationUnit(clang::ASTContext &Context) override {
    Visitor.TraverseDecl(Context.getTranslationUnitDecl());
  }

private:
  ClassVisitor Visitor;
};
} // namespace classchecker
} // namespace clangbook

The consumer initializes ‘Visitor’ in its constructor and, in HandleTranslationUnit, uses it to traverse the declarations starting from the top one (the translation unit declaration). The consumer itself must be created from a special AST action class.
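‘TraverseDecl’ reaches our ‘VisitCXXRecordDecl’ because RecursiveASTVisitor uses the Curiously Recurring Template Pattern (CRTP). The base class template drives the traversal and calls the derived class’s Visit* methods without any virtual functions. The sketch below models that idea on a made-up ‘Node’ type. ‘MiniRecursiveVisitor’, ‘Node’ and ‘RecordNameCollector’ are illustrative names only, and the real visitor has many more hooks (Traverse*, WalkUpFrom*, Visit*).

```cpp
#include <cassert>
#include <string>
#include <vector>

// A made-up AST node: a kind, a name, and child nodes.
struct Node {
  enum Kind { TranslationUnit, Record, Method };
  Kind K;
  std::string Name;
  std::vector<Node> Children; // vector of an incomplete type is OK since C++17
};

// Toy version of the CRTP pattern behind clang::RecursiveASTVisitor:
// the base template walks the tree and calls Derived's Visit* hooks
// through a static_cast, so no virtual functions are involved.
template <typename Derived> class MiniRecursiveVisitor {
public:
  bool TraverseNode(const Node &N) {
    if (N.K == Node::Record && !derived().VisitRecord(N))
      return false; // a hook returning false aborts the whole traversal
    for (const Node &Child : N.Children)
      if (!TraverseNode(Child))
        return false;
    return true;
  }

  // Default hook: do nothing and keep traversing. A derived class
  // "overrides" it simply by declaring a member with the same name.
  bool VisitRecord(const Node &) { return true; }

private:
  Derived &derived() { return *static_cast<Derived *>(this); }
};

// Example visitor that records the names of all Record nodes it sees.
class RecordNameCollector
    : public MiniRecursiveVisitor<RecordNameCollector> {
public:
  bool VisitRecord(const Node &N) {
    Names.push_back(N.Name);
    return true;
  }
  std::vector<std::string> Names;
};
```

Just like in ClassVisitor, returning true from the hook continues the traversal and returning false stops it.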

AST action Class

The code for the AST action is implemented in ClassAction.hpp:

#include "ClassConsumer.hpp"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Frontend/FrontendAction.h"

#include <memory>
#include <string>
#include <vector>

namespace clangbook {
namespace classchecker {
class ClassAction : public clang::PluginASTAction {
protected:
  std::unique_ptr<clang::ASTConsumer>
  CreateASTConsumer(clang::CompilerInstance &CI, llvm::StringRef) override {
    return std::make_unique<ClassConsumer>(&CI.getASTContext(),
                                           MethodCountThreshold);
  }

  bool ParseArgs(const clang::CompilerInstance &CI,
                 const std::vector<std::string> &args) override {
    for (const auto &arg : args) {
      // -fplugin-arg-classchecker-threshold=N arrives here as "threshold=N".
      // Checking the full "threshold=" prefix avoids an out-of-range
      // substr() call when the argument is just "threshold".
      if (arg.rfind("threshold=", 0) == 0) {
        auto valueStr = arg.substr(10); // Get the substring after "threshold="
        MethodCountThreshold = std::stoi(valueStr); // throws on non-numeric input
        return true;
      }
    }
    return true;
  }
  ActionType getActionType() override { return AddAfterMainAction; }

private:
  int MethodCountThreshold = 5; // default value
};
} // namespace classchecker
} // namespace clangbook

We notice a few things:

We inherit our ‘ClassAction’ from ‘clang::PluginASTAction’
We instantiate ‘ClassConsumer’ and utilize ‘MethodCountThreshold’, which is derived from an optional plugin argument
We process the optional ’threshold’ argument for our plugin
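Plugin arguments arrive as plain strings, so it is worth validating them before converting. Below is a sketch of a stricter parser. ‘parseThreshold’ is a hypothetical helper, not part of the book’s code. It only accepts ‘threshold=<digits>’ and returns std::nullopt otherwise, so the caller can keep the default of 5 instead of letting std::stoi throw.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: look for an argument of the form "threshold=<N>".
// Returns std::nullopt if there is none or if <N> is not a plain
// non-negative number, so std::stoi below can never throw.
std::optional<int> parseThreshold(const std::vector<std::string> &Args) {
  const std::string Prefix = "threshold=";
  for (const std::string &Arg : Args) {
    if (Arg.compare(0, Prefix.size(), Prefix) != 0)
      continue; // some other plugin argument
    const std::string Value = Arg.substr(Prefix.size());
    // Reject "", "abc", "-1", and values too long to fit in an int.
    if (Value.empty() || Value.size() > 9 ||
        Value.find_first_not_of("0123456789") != std::string::npos)
      return std::nullopt;
    return std::stoi(Value);
  }
  return std::nullopt;
}
```

Inside ParseArgs, one could then write ‘if (auto T = parseThreshold(args)) MethodCountThreshold = *T;’ and otherwise keep the default.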

We are almost done and ready to register our plugin!

Plugin Code:

This is going to be our ClassChecker.cpp:

#include "clang/Frontend/FrontendPluginRegistry.h"

#include "ClassAction.hpp"

static clang::FrontendPluginRegistry::Add<clangbook::classchecker::ClassAction>
    X("classchecker", "Checks the complexity of C++ classes");

As we can see, the majority of the code is hidden by the helper classes, and we only need to pass our implementation to ‘clang::FrontendPluginRegistry::Add’.

Now let’s build and test our Clang plugin!

Building and running plugin code

Build it using the standard procedure we’re used to:

export LLVM_HOME=~/Desktop/llvm-project/install
mkdir build
cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug ..
ninja classchecker

The build artifacts will be in the ‘build’ folder. From the ‘classchecker’ directory, we can then run our plugin on a test file as follows:

$ <...>/llvm-project/install/bin/clang -fsyntax-only \
            -fplugin=./build/libclasschecker.so \
            <filepath>

For example, a test file named ‘test.cpp’ that defines a class with three methods produces no warnings with the default threshold of 5. However, if we pass a smaller threshold through a plugin argument, we receive a warning:

$ <...>/llvm-project/install/bin/clang \
            -fsyntax-only \
            -fplugin-arg-classchecker-threshold=2 \
            -fplugin=./build/libclasschecker.so \
            test.cpp

Now let’s create a LIT test for our plugin!

LIT Tests for plugin

Let’s begin by describing the project organization. We’ll adopt the common pattern used in the clang source code, and place our tests in the ’test’ folder. This folder will contain the following files:

lit.site.cfg.py.in: A CMake template for the site-specific config. During configuration, CMake replaces patterns marked as @..@ with the corresponding values, and the resulting file loads lit.cfg.py.
lit.cfg.py: This serves as the primary configuration file for LIT tests.
simple_test.cpp: Our LIT test file.

The basic workflow is as follows: CMake takes ‘lit.site.cfg.py.in’ as a template and generates the corresponding ‘lit.cfg.py’ in the ‘build/test’ folder. LIT uses this generated file as its entry point, and the generated file in turn loads the main lit.cfg.py from the source tree.

LIT config files

There are two config files for LIT tests. This is lit.site.cfg.py.in:

config.ClassComplexityChecker_obj_root = "@CMAKE_CURRENT_BINARY_DIR@"
config.ClassComplexityChecker_src_root = "@CMAKE_CURRENT_SOURCE_DIR@"
config.ClangBinary = "@LLVM_HOME@/bin/clang"
config.FileCheck = "@FILECHECK_COMMAND@"

lit_config.load_config(
        config, os.path.join(config.ClassComplexityChecker_src_root, "test/lit.cfg.py"))

This file is a CMake template that will be converted into a Python script. The most crucial part is the last statement, which loads the main LIT config. That config is sourced from the main source tree and is not copied to the ‘build’ folder. Here it is:

# lit.cfg.py
import lit.formats

config.name = 'classchecker'
config.test_format = lit.formats.ShTest(True)
config.suffixes = ['.cpp']
config.test_source_root = os.path.dirname(__file__)

config.substitutions.append(('%clang-binary', config.ClangBinary))
config.substitutions.append(('%path-to-plugin',
    os.path.join(config.ClassComplexityChecker_obj_root, 'libclasschecker.so')))
config.substitutions.append(('%file-check-binary', config.FileCheck))

Here we determine which files are treated as tests (everything with the ‘.cpp’ suffix). We also define the substitutions used in the LIT tests: the path to the clang binary, the path to the shared library containing the plugin, and the path to the ‘FileCheck’ utility.
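Conceptually, a substitution is text replacement applied to each RUN line before it is executed. The simplified model below shows how the RUN line of our test expands. ‘applySubstitutions’ is a hypothetical name, this is not lit’s implementation, and all paths in the example are made up. %s is lit’s built-in substitution for the test file’s own path.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified model of lit substitutions (not lit's real implementation):
// each (pattern, replacement) pair is applied in order, replacing every
// occurrence of the pattern in the RUN line.
std::string applySubstitutions(
    std::string Line,
    const std::vector<std::pair<std::string, std::string>> &Subs) {
  for (const auto &[Pattern, Replacement] : Subs) {
    std::size_t Pos = 0;
    while ((Pos = Line.find(Pattern, Pos)) != std::string::npos) {
      Line.replace(Pos, Pattern.size(), Replacement);
      Pos += Replacement.size(); // continue after the inserted text
    }
  }
  return Line;
}
```

In this model, order matters when one pattern is a prefix of another (for example a hypothetical %clang next to %clang-binary). The longer pattern has to be applied first.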

We have defined only one basic LIT test, simple_test.cpp:

// RUN: %clang-binary -fplugin=%path-to-plugin -fsyntax-only %s 2>&1 | %file-check-binary %s

class Simple {
public:
  void func1() {}
  void func2() {}
};

// CHECK: :[[@LINE+1]]:{{[0-9]+}}: warning: class Complex is too complex: method count = 6
class Complex {
public:
  void func1() {}
  void func2() {}
  void func3() {}
  void func4() {}
  void func5() {}
  void func6() {}
};

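We can see the substitutions in the first (RUN) line, which references the clang binary, the plugin shared library, and the FileCheck utility. %s is a built-in LIT substitution for the path of the test file itself. The CHECK line uses special patterns recognized by FileCheck: [[@LINE+1]] expands to the number of the line after the CHECK comment (where class Complex is declared), and {{[0-9]+}} is a regular expression that matches the column number: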

// CHECK: :[[@LINE+1]]:{{[0-9]+}}: warning: class Complex is too complex: method count = 6
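To see what this CHECK line actually demands, here is a rough translation into std::regex. It only approximates FileCheck’s behavior, and ‘matchesCheck’ is a hypothetical helper. [[@LINE+1]] becomes a fixed line number, {{[0-9]+}} stays a regex, and the rest is literal text. Assuming simple_test.cpp starts exactly as listed, the CHECK comment is on line 9. The warning must therefore point at line 10, where class Complex is declared, so a diagnostic line like the one in the example below should match.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Rough std::regex equivalent of the FileCheck directive
//   // CHECK: :[[@LINE+1]]:{{[0-9]+}}: warning: class Complex is too complex: method count = 6
// [[@LINE+1]] -> the CHECK comment's line number plus one (fixed text)
// {{[0-9]+}}  -> kept as a regular expression (any column number)
// the rest    -> literal text (it contains no regex metacharacters here)
bool matchesCheck(const std::string &DiagLine, int CheckCommentLine) {
  const std::string Pattern =
      ":" + std::to_string(CheckCommentLine + 1) + ":[0-9]+" +
      ": warning: class Complex is too complex: method count = 6";
  // FileCheck looks for the pattern inside the output rather than
  // requiring a whole-line match, so regex_search is the closer analogue.
  return std::regex_search(DiagLine, std::regex(Pattern));
}
```

Because the line number is computed relative to the CHECK comment, the test keeps working when lines are added above it.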

The final piece of the puzzle is the CMake configuration!

CMake configuration for LIT tests

We need to adjust our CMakeLists.txt to support LIT tests. The necessary changes are as follows:

# Locate the 'lit' tool and FileCheck utility, both of which are required
find_program(LIT_COMMAND llvm-lit PATHS ${LLVM_BUILD}/bin)
find_program(FILECHECK_COMMAND FileCheck PATHS ${LLVM_BUILD}/bin)
if(LIT_COMMAND AND FILECHECK_COMMAND)
  message(STATUS "$LIT_COMMAND found: ${LIT_COMMAND}")
  message(STATUS "$FILECHECK_COMMAND found: ${FILECHECK_COMMAND}")

  # Point to our custom lit.cfg.py
  set(LIT_CONFIG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/test/lit.cfg.py")

  # Configure lit.site.cfg.py using current settings
  configure_file("${CMAKE_CURRENT_SOURCE_DIR}/test/lit.site.cfg.py.in"
                 "${CMAKE_CURRENT_BINARY_DIR}/test/lit.cfg.py"
                 @ONLY)

  # Add a custom target to run tests with lit
  add_custom_target(check-classchecker
                    COMMAND ${LIT_COMMAND} -v ${CMAKE_CURRENT_BINARY_DIR}/test
                    COMMENT "Running lit tests for classchecker clang plugin"
                    USES_TERMINAL)
else()
  message(FATAL_ERROR "It was not possible to find the LIT executables at ${LLVM_BUILD}/bin")
endif()

We add this to the end of the CMakeLists.txt file, right before the final endif(). We search for the necessary utilities, ‘llvm-lit’ and ‘FileCheck’, and then generate ‘build/test/lit.cfg.py’ from the template file ‘lit.site.cfg.py.in’. Finally, we establish a custom target, ‘check-classchecker’, that executes the LIT tests. Now let’s run them!

Running LIT tests

We must set an environment variable that points to the ‘build’ folder, compile the project, and then execute the custom target, ‘check-classchecker’. Here’s how this can be done:

export LLVM_BUILD=~/Desktop/llvm-project/build
export LLVM_HOME=~/Desktop/llvm-project/install
rm -rf build; mkdir build; cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug ..
ninja classchecker
ninja check-classchecker

After running it you should see the following output:

[2/2] Linking CXX shared module libclasschecker.so
[0/1] Running lit tests for classchecker clang plugin
-- Testing: 1 tests, 1 workers --
PASS: classchecker :: simple_test.cpp (1 of 1)

Testing Time: 4.27s
Total Discovered Tests: 1
  Passed: 1 (100.00%)

Congratulations, we built our first plugin! Now let’s move on to Clang-tidy!

Clang-Tidy Linter Framework

Now we are going to discuss Clang-Tidy, the clang-based linter framework that utilizes the AST to identify anti-patterns in our code. We will cover the following topics:

Overview of Clang-Tidy
The internal design of Clang-Tidy
How to create a custom Clang-Tidy check

Overview of Clang-Tidy

Clang-Tidy is a linter and static analysis tool for C and C++ code that is built on top of (you guessed it) the Clang frontend. Here are some key terms associated with Clang-Tidy that are useful to understand:

Checks: Clang-Tidy contains a series of “checks” that identify various issues and possible enhancements. These checks range from performance improvements and potential bugs to coding style and modern C++ best practices. For example, it might suggest using “emplace_back” instead of “push_back” in certain cases (modernize-use-emplace), or flag code where an integer might accidentally overflow.
Extensibility: New checks can be added to Clang-Tidy, making it a highly extensible tool. If you have specific coding guidelines or practices you want to enforce you can write a check for it (which we will be doing soon!)
Integration: Clang-Tidy is often used within CI/CD pipelines or integrated with development environments. Many IDEs support Clang-Tidy directly or via plugins, so you can get real-time feedback on your code as you write it.
Automatic fixes: Clang-Tidy can automatically fix a lot of issues using the ‘-fix’ option. However, it’s important to review the changes as they might not always be appropriate.
Configuration: You can configure which checks Clang-Tidy performs using a configuration file or CLI options. This allows teams to enforce specific coding standards or prioritize certain types of issues. For example, the -checks='-*,modernize-*' CLI option disables all checks except those in the modernize set.
Modern C++ best practices: One of Clang-Tidy’s most appreciated features is its emphasis on modern C++ idioms and best practices, which can guide developers toward writing safer, more performant, and more readable code overall!

Now let’s examine how Clang-Tidy can be built:

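We will reuse our basic build configuration and build Clang-Tidy with the following Ninja command, which installs the clang-tidy binary under the llvm-project/install/bin folder. Clang-Tidy lives in clang-tools-extra, so that project must be enabled as well, e.g. with -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra".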

ninja install-clang-tidy

Clang-Tidy is part of clang-tools-extra, and its tests belong to the check-clang-tools target, so we can run them with the following command:

ninja check-clang-tools

This runs the LIT tests for all of clang-tools-extra, including every Clang-Tidy check. If you want to run a specific LIT test on its own (for example, the one for modernize-loop-convert), invoke llvm-lit directly:

$ cd <...>/llvm-project
$ build/bin/llvm-lit -v \
    clang-tools-extra/test/clang-tidy/checkers/modernize/loop-convert-basic.cpp

This produces the following output:

-- Testing: 1 tests, 1 workers --
PASS: Clang Tools :: clang-tidy/checkers/modernize/loop-convert-basic.cpp (1 of 1)

Testing Time: 1.38s
  Passed: 1

Now let’s run it on some code examples (again, provided by Ivan, so go buy his book!):

We are going to use the following test program, written in an older C++ style (an explicit iterator loop, as was typical before C++11):

#include <iostream>
#include <vector>

int main() {
  std::vector<int> numbers = {1, 2, 3, 4, 5};
  for (std::vector<int>::iterator it = numbers.begin(); it != numbers.end();
       ++it) {
    std::cout << *it << std::endl;
  }
  return 0;
}

Clang-Tidy has a set of modernize checks that encourage adopting modern C++ styles and idioms.

Here is how to run them on our file:

/path/to/llvm-project/install/bin/clang-tidy \
    -checks='-*,modernize-*' \
    loop-convert.cpp \
    -- -std=c++17

We specify the path to the Clang-Tidy binary, disable all checks with the '-*' glob, and then enable all modernization checks with 'modernize-*' (I recommend playing with the -checks flag). Next comes the file to analyze. Everything after the '--' separator is passed to the compiler as flags; here, -std=c++17 selects the C++17 standard.
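The ‘--’ separator follows the usual LibTooling convention. Arguments before it belong to the tool, and arguments after it are compiler flags, which are used in place of looking up a compilation database. Here is a minimal sketch of that split; ‘splitAtDoubleDash’ and ‘SplitArgs’ are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct SplitArgs {
  std::vector<std::string> ToolArgs;     // e.g. -checks=..., source files
  std::vector<std::string> CompilerArgs; // e.g. -std=c++17
};

// Everything after the first standalone "--" is treated as compiler flags.
SplitArgs splitAtDoubleDash(const std::vector<std::string> &Argv) {
  SplitArgs Out;
  bool AfterSeparator = false;
  for (const std::string &A : Argv) {
    if (!AfterSeparator && A == "--") {
      AfterSeparator = true; // the separator itself is dropped
      continue;
    }
    (AfterSeparator ? Out.CompilerArgs : Out.ToolArgs).push_back(A);
  }
  return Out;
}
```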

The output will look like this:

loop-convert.cpp:4:5: warning: use a trailing return type for this function
    [modernize-use-trailing-return-type]
   4 | int main() {
     | ~~~ ^
     | auto      -> int

loop-convert.cpp:6:3: warning: use range-based for loop instead
    [modernize-loop-convert]
   6 |   for (std::vector<int>::iterator it = numbers.begin();
              it != numbers.end();
     |   ^   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     |       (int & number : numbers)
   7 |         ++it) {
     |         ~~~~~
   8 |      std::cout << *it << std::endl;
     |                   ~~~
     |                   number

loop-convert.cpp:6:8: warning: use auto when declaring iterators
    [modernize-use-auto]
   6 |   for (std::vector<int>::iterator it = numbers.begin();
              it != numbers.end();
     |        ^
note: this fix will not be applied because it overlaps with another fix
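The note about overlapping fixes comes down to source ranges. Each fix rewrites a span of the file, and two edits to overlapping spans cannot both be applied safely. Here, the loop-convert fix rewrites the whole loop header, which contains the iterator type that modernize-use-auto wants to replace. The toy model below treats a fix as a half-open byte range. ‘Fix’ and ‘selectNonOverlapping’ are hypothetical, and the first-come policy is an assumption rather than Clang-Tidy’s actual conflict-resolution algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A fix replaces the source bytes [Offset, Offset + Length).
struct Fix {
  std::size_t Offset;
  std::size_t Length;
};

// Two half-open ranges overlap iff each one starts before the other ends.
bool overlaps(const Fix &A, const Fix &B) {
  return A.Offset < B.Offset + B.Length && B.Offset < A.Offset + A.Length;
}

// Toy conflict policy: accept fixes in order and skip any fix that
// overlaps one that was already accepted.
std::vector<Fix> selectNonOverlapping(const std::vector<Fix> &Fixes) {
  std::vector<Fix> Accepted;
  for (const Fix &F : Fixes) {
    const bool Conflicts =
        std::any_of(Accepted.begin(), Accepted.end(),
                    [&](const Fix &A) { return overlaps(A, F); });
    if (!Conflicts)
      Accepted.push_back(F);
  }
  return Accepted;
}
```

With made-up offsets, a loop-header fix at {10, 60} blocks an iterator-type fix at {11, 26}, while an unrelated fix at {100, 5} is kept.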

Several fixes were suggested, but some of them conflict with each other (they can’t be applied at the same time)! To avoid the conflict, we can run a single, specific check:

/path/to/llvm-project/install/bin/clang-tidy \
  -checks='-*,modernize-loop-convert' \
  -fix \
  loop-convert.cpp \
  -- -std=c++17

The ‘-fix’ flag instructs Clang-Tidy to apply the suggested fixes in place, and loop-convert.cpp will look something like this:

#include <iostream>
#include <vector>

int main() {
  std::vector<int> numbers = {1, 2, 3, 4, 5};
  for (int & number : numbers) {
    std::cout << number << std::endl;
  }
  return 0;
}

As we can see, lines 6 and 7 have changed compared to the original code: the iterator-based loop became a range-based for loop, and *it became number.

This is what makes Clang-Tidy so powerful — it doesn’t just detect issues, it can actually fix them for you.

Clang-Tidy Checks

Clang-Tidy has a ton of checks grouped into different categories. Here’s a quick rundown of the main categories with an example from each:

boost-*: Boost library stuff. Example: boost-use-to-string suggests replacing boost::lexical_cast<std::string> with std::to_string.
bugprone-*: Common bug patterns. Example: bugprone-integer-division warns when integer division in a floating-point context might silently lose precision.
cert-*: CERT Secure Coding Standards. Example: cert-dcl03-c (an alias of misc-static-assert) suggests static_assert instead of assert when the condition can be evaluated at compile time.
cppcoreguidelines-*: C++ Core Guidelines. Example: cppcoreguidelines-slicing catches object slicing where a derived object gets assigned to a base, chopping off the derived parts.
google-*: Google’s coding conventions. Example: google-build-using-namespace flags using-directives.
llvm-*: LLVM coding conventions. Example: llvm-namespace-comment ensures namespaces have closing comments.
misc-*: Miscellaneous. Example: misc-unused-parameters flags unused parameters.
modernize-*: Modern C++ adoption. Example: modernize-use-auto recommends auto for variable declarations when appropriate.
performance-*: Performance. Example: performance-faster-string-find suggests passing a single character instead of a one-character string literal to std::string::find() and similar functions.
readability-*: Readability. Example: readability-identifier-naming enforces consistent naming conventions.

This is just a subset! Each category has many more checks, and there are additional categories too. You can always check the official Clang-Tidy docs or run clang-tidy -list-checks -checks='*' on your system for the full list. Without -checks, -list-checks only shows the checks that are currently enabled.

Clang-Tidy’s Internal Design

Now let’s look under the hood at how Clang-Tidy actually works internally.

At its core, Clang-Tidy leverages Clang’s ability to parse source code into an AST. Each check defines patterns or conditions to match against this AST — when a match is found, a diagnostic gets raised and sometimes an automatic fix is suggested. Checks are implemented as plugins which makes the whole thing extensible. The ASTMatchers library is what makes writing these checks ergonomic — it gives you a domain-specific language for querying the AST (we covered this in Section 3.5). Clang-Tidy also supports compilation databases for context like compile flags (Chapter 9 covers this in depth).
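The “define a pattern, react to matches” split can be modeled in a few lines. This is a toy: ‘ToyNode’ and ‘ToyMatchFinder’ are made-up names, and the real MatchFinder works on Clang’s AST with composable matchers and bound node names. It does show the shape, though. Checks register (matcher, callback) pairs up front, and the finder invokes the callbacks during a single walk over the nodes.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy "AST" node for illustration only.
struct ToyNode {
  std::string Kind; // e.g. "CXXRecordDecl", "FunctionDecl"
  std::string Name;
  int MethodCount = 0;
};

// Hypothetical miniature of the MatchFinder idea: predicates decide
// which nodes are interesting, callbacks decide what to report.
class ToyMatchFinder {
public:
  using Matcher = std::function<bool(const ToyNode &)>;
  using Callback = std::function<void(const ToyNode &)>;

  void addMatcher(Matcher M, Callback C) {
    Entries.emplace_back(std::move(M), std::move(C));
  }

  // One pass over all nodes, running every registered matcher.
  void run(const std::vector<ToyNode> &Nodes) const {
    for (const ToyNode &N : Nodes)
      for (const auto &[M, C] : Entries)
        if (M(N))
          C(N);
  }

private:
  std::vector<std::pair<Matcher, Callback>> Entries;
};
```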

Internal Organization

The code is organized within the clang-tools-extra repository, and at a high level it breaks down like this:

Source and headers: Main code lives in the clang-tidy directory inside clang-tools-extra.
Main driver: ClangTidyMain.cpp in the tool subfolder is the entry point.
Core infrastructure: ClangTidy.cpp and ClangTidy.h manage core functionalities and options.
Checks: Organized into subdirectories by category (e.g., bugprone, modernize).
Utilities: The utils directory has utility classes and functions.
AST Matchers: The ASTMatchers library (Section 3.5) is integral for querying the AST.
Clang diagnostics: Uses the Clang diagnostics subsystem (Section 4.4.2) for printing messages and suggesting fixes.
Tests: Located in the test directory, uses LLVM’s LIT framework. The test folder is shared with other projects in clang-tools-extra.
Documentation: The docs directory, also shared with other clang-tools-extra projects.

Configuration and Integration

Clang-Tidy plays nicely with a bunch of external tools. It integrates with IDEs like VS Code, CLion, and Eclipse for real-time feedback. It can plug into build systems like CMake and Bazel to run checks during builds. CI platforms such as Jenkins and GitHub Actions commonly use it to gate pull requests on code quality. Code review platforms like Phabricator can use it for automated reviews too.

Clang-Tidy Configuration

Clang-Tidy uses a .clang-tidy file (YAML format) to specify which checks to run and how to configure them. It has two main keys: Checks and CheckOptions.

The Checks key lets you enable/disable checks. Use - to disable, * as a wildcard, and separate them with commas:

Checks: '-*,modernize-*'
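Later entries can override earlier ones, so the order of globs matters. Below is a sketch of the evaluation rule as I understand it from Clang-Tidy’s GlobList. Globs are read left to right, a leading ‘-’ turns a glob into an exclusion, and the last glob that matches a check name decides whether the check runs. ‘globMatch’ and ‘isCheckEnabled’ are hypothetical helpers written for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// '*' matches any (possibly empty) run of characters; all else is literal.
bool globMatch(const std::string &Glob, const std::string &Name) {
  if (Glob.empty())
    return Name.empty();
  if (Glob[0] == '*')
    return globMatch(Glob.substr(1), Name) ||
           (!Name.empty() && globMatch(Glob, Name.substr(1)));
  return !Name.empty() && Glob[0] == Name[0] &&
         globMatch(Glob.substr(1), Name.substr(1));
}

// Evaluates a Checks string such as "-*,modernize-*" for one check name.
// Globs are read left to right; a leading '-' marks an exclusion; the
// last glob that matches decides. If nothing matches, the check is off.
bool isCheckEnabled(const std::string &Checks, const std::string &CheckName) {
  bool Enabled = false;
  std::size_t Start = 0;
  while (Start <= Checks.size()) {
    std::size_t End = Checks.find(',', Start);
    if (End == std::string::npos)
      End = Checks.size();
    std::string Glob = Checks.substr(Start, End - Start);
    // This sketch trims spaces, so "-*, modernize-*" also works.
    const std::size_t First = Glob.find_first_not_of(' ');
    const std::size_t Last = Glob.find_last_not_of(' ');
    Glob = (First == std::string::npos)
               ? std::string()
               : Glob.substr(First, Last - First + 1);
    const bool Negative = !Glob.empty() && Glob[0] == '-';
    if (Negative)
      Glob.erase(0, 1);
    if (!Glob.empty() && globMatch(Glob, CheckName))
      Enabled = !Negative; // later matches overwrite earlier ones
    Start = End + 1;
  }
  return Enabled;
}
```

For example, 'modernize-*,-modernize-use-trailing-return-type' enables the modernize set except the trailing-return-type check, because the exclusion comes last.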

The CheckOptions key sets options per check as key-value pairs:

CheckOptions:
  - key: readability-identifier-naming.NamespaceCase
    value: CamelCase
  - key: readability-identifier-naming.ClassCase
    value: CamelCase

When you run Clang-Tidy, it searches for the .clang-tidy file starting from the directory of the file being processed, walking up through parent directories until it finds one.
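The lookup rule can be sketched with std::filesystem. ‘findClangTidyConfig’ is a hypothetical helper. The existence check is passed in as a function so the logic can be tested without real files; in practice you would pass a lambda that calls std::filesystem::exists. The sketch assumes absolute paths and ignores other configuration sources such as the -config option.

```cpp
#include <cassert>
#include <filesystem>
#include <functional>
#include <optional>

namespace fs = std::filesystem;

// Start in the source file's directory and walk up through the parent
// directories until a .clang-tidy file is found or the root is reached.
std::optional<fs::path>
findClangTidyConfig(const fs::path &SourceFile,
                    const std::function<bool(const fs::path &)> &Exists) {
  fs::path Dir = SourceFile.parent_path();
  while (true) {
    const fs::path Candidate = Dir / ".clang-tidy";
    if (Exists(Candidate))
      return Candidate;
    const fs::path Parent = Dir.parent_path();
    if (Parent == Dir || Parent.empty())
      return std::nullopt; // reached the root without finding one
    Dir = Parent;
  }
}
```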

Custom Clang-Tidy Check

Now for the fun part — we’re going to transform our plugin from earlier into a proper Clang-Tidy check! Same idea as before: estimate class complexity based on method count, with a configurable threshold.

Clang-Tidy has a handy Python script to help scaffold new checks. Let’s use it.

Creating a Skeleton for the Check

The script is called add_new_check.py and lives in the clang-tools-extra/clang-tidy directory. It takes two arguments: the module (we’ll use misc) and the check name (we’ll call it classchecker).

Running it from the llvm-project directory:

$ ./clang-tools-extra/clang-tidy/add_new_check.py misc classchecker
...
Updating ./clang-tools-extra/clang-tidy/misc/CMakeLists.txt...
Creating ./clang-tools-extra/clang-tidy/misc/ClasscheckerCheck.h...
Creating ./clang-tools-extra/clang-tidy/misc/ClasscheckerCheck.cpp...
Updating ./clang-tools-extra/clang-tidy/misc/MiscTidyModule.cpp...
...
Done. Now it's your turn!

The script generates a bunch of files for us. The important ones we need to modify are:

misc/ClasscheckerCheck.h: Header file for our check
misc/ClasscheckerCheck.cpp: Where our implementation goes

It also generates a LIT test file (classchecker.cpp) in clang-tools-extra/test/clang-tidy/checkers/misc, and updates some doc files like ReleaseNotes.rst and list.rst.

Clang-Tidy Check Implementation

Now we replace the generated stub in ClasscheckerCheck.cpp with our actual implementation:

#include "ClasscheckerCheck.h"
#include "clang/AST/ASTContext.h"
#include "clang/ASTMatchers/ASTMatchFinder.h"
using namespace clang::ast_matchers;

namespace clang::tidy::misc {

void ClasscheckerCheck::registerMatchers(MatchFinder *Finder) {
  // Match every C++ class.
  Finder->addMatcher(cxxRecordDecl().bind("class"), this);
}

void ClasscheckerCheck::check(const MatchFinder::MatchResult &Result) {
  const auto *ClassDecl =
      Result.Nodes.getNodeAs<CXXRecordDecl>("class");

  if (!ClassDecl || !ClassDecl->isThisDeclarationADefinition())
    return;

  unsigned MethodCount = 0;
  for (const auto *D : ClassDecl->decls()) {
    if (isa<CXXMethodDecl>(D))
      MethodCount++;
  }

  unsigned Threshold = Options.get("Threshold", 5);
  if (MethodCount > Threshold) {
    diag(ClassDecl->getLocation(),
         "class %0 is too complex: method count = %1",
         DiagnosticIDs::Warning)
        << ClassDecl->getName() << MethodCount;
  }
}

} // namespace clang::tidy::misc

The big difference from our plugin is that instead of a RecursiveASTVisitor, we’re now using AST matchers. In registerMatchers(), we register a matcher for cxxRecordDecl() and bind it to the name “class”. Then in check(), we retrieve the matched node, count the methods, and fire a diagnostic if the count exceeds the threshold. The threshold is configurable via Options.get("Threshold", 5), defaulting to 5.
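On the configuration side, CheckOptions keys combine the check name and the option name, so Options.get("Threshold", 5) inside misc-classchecker reads the key misc-classchecker.Threshold. The sketch below models that lookup with a plain std::map. ‘getUnsignedOption’ is hypothetical, and this sketch simply falls back to the default when a value cannot be parsed.

```cpp
#include <cassert>
#include <exception>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model of per-check option lookup: CheckOptions keys have
// the form "<check-name>.<option-name>", and a check asks for its local
// option name plus a default value.
unsigned getUnsignedOption(const std::map<std::string, std::string> &CheckOptions,
                           const std::string &CheckName,
                           const std::string &LocalName, unsigned Default) {
  const auto It = CheckOptions.find(CheckName + "." + LocalName);
  if (It == CheckOptions.end())
    return Default; // option not configured
  try {
    return static_cast<unsigned>(std::stoul(It->second));
  } catch (const std::exception &) {
    return Default; // unparsable value such as "two": keep the default
  }
}
```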

The check gets registered under the name “misc-classchecker” in the modified MiscTidyModule.cpp file. After making our changes, we recompile with:

$ ninja install

We can verify it got added by running:

$ <...>/llvm-project/install/bin/clang-tidy -checks '*' -list-checks
...
    misc-classchecker
...

(Note that we enabled all checks using -checks '*' here)

To test it, we can use a simple test file with a class that has three methods:

class Simple {
public:
  void func1() {}
  void func2() {}
  void func3() {}
};

To trigger a warning, we set the threshold to 2:

$ <...>/llvm-project/install/bin/clang-tidy \
    -checks='-*,misc-classchecker' \
    -config="{CheckOptions: [{key: misc-classchecker.Threshold, value: '2'}]}" \
    test.cpp \
    -- -std=c++17

And we get:

test.cpp:1:7: warning: class Simple is too complex: method count = 3
    [misc-classchecker]
class Simple {
      ^

LIT Test

Now let’s write a proper LIT test. We modify the generated classchecker.cpp in clang-tools-extra/test/clang-tidy/checkers/misc:

// RUN: %check_clang_tidy %s misc-classchecker %t

class Simple {
public:
  void func1() {}
  void func2() {}
};

// CHECK-MESSAGES: :[[@LINE+1]]:{{[0-9]+}}: warning: class Complex is too complex: method count = 6 [misc-classchecker]
class Complex {
public:
  void func1() {}
  void func2() {}
  void func3() {}
  void func4() {}
  void func5() {}
  void func6() {}
};

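The key differences from our plugin’s LIT test are in line 1, where the %check_clang_tidy substitution runs clang-tidy with our check instead of invoking clang and FileCheck directly, and in line 9, where the expected diagnostic uses the CHECK-MESSAGES prefix.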

Run the test:

$ cd <...>/llvm-project
$ build/bin/llvm-lit -v \
    clang-tools-extra/test/clang-tidy/checkers/misc/classchecker.cpp

And we get:

-- Testing: 1 tests, 1 workers --
PASS: Clang Tools :: clang-tidy/checkers/misc/classchecker.cpp (1 of 1)

Testing Time: 0.12s
  Passed: 1

Our check passes!

Dealing with Compilation Errors

When running our check on real code (as opposed to our toy tests), we might run into something weird. Remember how we discussed in Section 3.7 that compilation errors can mess with the AST? That affects Clang-Tidy too.

Consider this file (error.cpp) where we intentionally misspelled the out-of-line method definition:

class MyClass {
public:
  void doSomething();
};

void MyClass::doSometing() {}  // typo!

If we run our check on this file without any parameters, we get:

error.cpp:1:7: warning: class MyClass is too complex:
    method count = 7 [misc-classchecker]
...
error.cpp:6:15: error: out-of-line definition of 'doSometing'...
Found compiler error(s).

Wait — 7 methods? The class only has one! What’s going on?

Compilation Errors as Edge Cases

Let’s use clang-query to investigate. First we create a corrected version (noerror.cpp) and dump its AST:

$ <...>/llvm-project/install/bin/clang-query noerror.cpp -- --std=c++17
clang-query> set output dump
clang-query> match cxxRecordDecl()

The corrected file shows a clean AST with just the class definition, access specifier, one method, and the usual implicit compiler-generated stuff (default constructor, copy constructor, move constructor, etc.).

Now run the same thing on the broken error.cpp:

$ <...>/llvm-project/install/bin/clang-query error.cpp -- --std=c++17
clang-query> set output dump
clang-query> match cxxRecordDecl()

And there it is — the AST is full of implicitly added methods (constructors, destructors, assignment operators). That’s where the inflated count of 7 comes from! Our check was blindly counting ALL methods, including implicit ones.

The fix is simple — we add !D->isImplicit() to our loop condition:

for (const auto *D : ClassDecl->decls()) {
  if (isa<CXXMethodDecl>(D) && !D->isImplicit())
    MethodCount++;
}

Now when we run the modified check on error.cpp, we get:

error.cpp:6:15: error: out-of-line definition of 'doSometing'...
    [clang-diagnostic-error]
Found compiler error(s).

The compiler error is still reported, but our check no longer fires a false warning. Fixed!
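The effect of the isImplicit() filter can be reproduced with plain data. In this toy model, ‘ToyDecl’ stands in for the declarations returned by decls(), and ‘countUserMethods’ mirrors the fixed loop. The numbers are chosen to match the output above: one written method plus six implicit ones gives the inflated count of 7. Which implicit members Clang actually adds depends on the code being compiled.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for the members returned by CXXRecordDecl::decls().
struct ToyDecl {
  enum Kind { Method, Field, AccessSpec };
  Kind K;
  bool Implicit; // true for compiler-generated members
};

// Mirrors the fixed loop: count methods, but skip implicit ones.
unsigned countUserMethods(const std::vector<ToyDecl> &Decls) {
  unsigned Count = 0;
  for (const ToyDecl &D : Decls)
    if (D.K == ToyDecl::Method && !D.Implicit)
      ++Count;
  return Count;
}
```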

One important thing to keep in mind though: not every compilation error can be handled this cleanly. The Clang compiler tries to produce an AST even when there are errors (so that IDEs can still give you useful info), but this “error-recovery” AST can contain structures that Clang-Tidy doesn’t expect. So here’s a good rule of thumb:

TIP: Always make sure your code compiles without errors before running Clang-Tidy. This guarantees that the AST is both accurate and complete.

Summary

And that wraps up Chapter 5! We dove into Clang-Tidy (how it works, how it’s configured, and how it’s organized internally). We then built our own custom Clang-Tidy check using AST matchers (basically regex for the AST) to flag classes with too many methods. We also learned how compilation errors can trip up our checks and how to handle that with !D->isImplicit(). For more sophisticated complexity metrics like cyclomatic complexity, you’d need Control Flow Graphs (CFGs), which is exactly what the next chapter covers!