TSANCHEZ'S BLOG

CMake: Generating Sources

Why generate source code?

Programmers write code, but it's often useful to let the computer generate code for you. During the build, information becomes available that you want to incorporate into the executable but don't want to keep up to date by hand.

Version Stamps

One of the most obvious pieces of information to include in a build, which is only known at the time you perform the build, is a Version Stamp. A Version Stamp is any metadata you want to include in the executable which the program itself can use to identify when, where, and how it was built.

Knowing how a program was built is important for real-world use cases like crash reporting. In this scenario, you build the code as two pieces. First, you build an executable that contains no debug symbols. This makes the executable much smaller, and slightly faster, than one that includes the debugging information. Second, you still build all the debug symbols, but you have the compiler store them separately (often on a "symbol server" for later retrieval). When a crash report comes back, you need a way to tie the two together. This is where the Version Stamp comes into play: it tells you exactly which symbol file you need to retrieve.

Being able to see a Version Stamp is also useful in other scenarios, such as metrics reporting and monitoring. If the executable exposes the version information in a human-readable form, you can manually check which version you're running (live? latest? someone's custom build?). If it exposes it in a machine-readable form, you can monitor rollouts to see which remote machines have received various updates.

Lastly, in a scenario similar to crash reporting, Version Stamps are useful for tying various external resources to a given version. When your program requires files from, or network connections to, remote servers, the server can use the version information to ensure correct behavior (including rejecting the request).

Packed Resources

Often an executable isn't just source code. You'll have icons, images, and text files which you pack into the final executable. This is simple when the file already exists in the correct form when you start a build. However, there are plenty of cross-platform scenarios where the resource has to change per platform. Instead of maintaining a dozen variations of a file, you often really just want one file.

For instance, you can have a "splash screen" style image resource that needs to be scaled and compressed differently per platform. Such operations are easy to script with a CLI tool (like ImageMagick), so why not add that step to the build process directly instead of running the tool separately?

Also it's sometimes useful to convert a resource into source code as a static const char *iconData[] = { ... }; style data block. Thus you have to perform such a conversion before building the executable.
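As a rough illustration of that conversion, here is a minimal sketch of a standalone CMake script that dumps a file as a byte array. The script name embed_resource.cmake and the INPUT, OUTPUT, and SYMBOL variables are hypothetical, and the sketch emits an unsigned char array rather than the char * form mentioned above. It also assumes the input file is not empty.

```cmake
# embed_resource.cmake (hypothetical helper script, not part of the original post)
# Usage:
#   cmake -DINPUT=icon.png -DOUTPUT=icon_data.h -DSYMBOL=iconData -P embed_resource.cmake

# Read the whole file as one long string of hexadecimal digits.
file(READ "${INPUT}" hex_bytes HEX)

# Turn "89504e47..." into "0x89,0x50,0x4e,0x47,...".
string(REGEX REPLACE "([0-9a-fA-F][0-9a-fA-F])" "0x\\1," byte_list "${hex_bytes}")

# Write the array plus its size; a trailing comma is legal in a C initializer.
file(WRITE "${OUTPUT}"
  "static const unsigned char ${SYMBOL}[] = { ${byte_list} };\n"
  "static const unsigned long ${SYMBOL}Size = sizeof(${SYMBOL});\n")
```

A script like this can be run by hand or from a build rule, so the header is recreated whenever the resource changes.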

Generated Sources

There are plenty of tools out there, like Google's Protocol Buffers, which take a definition file in language X and spit out source code in language Y. This avoids manually writing tediously repetitive serialization code or framework boilerplate. Again, the generator has to run before the build can start.

CMake Build Rules

CMake actually makes it easy to perform all of these types of actions.

Version Stamps

Firstly, CMake's configure_file command can be used to convert a "template" file into a new output file. The "template" lets you substitute any variable available in the CMake script itself into the generated file.
# Configure a version.h containing the above collected version
# information.
configure_file (
  "${PROJECT_SOURCE_DIR}/version.h.in"
  "${PROJECT_SOURCE_DIR}/.generated/version.h"
)
This call could process a template like this:
#define BUILD_BRANCH_ID "@GIT_BRANCH@"
#define BUILD_VERSION_HASH "@GIT_COMMIT_HASH@"
#define BUILD_TIMESTAMP __DATE__ " " __TIME__
into a header file like:
#define BUILD_BRANCH_ID "master"
#define BUILD_VERSION_HASH "f5b987b"
#define BUILD_TIMESTAMP __DATE__ " " __TIME__
Since it's substituting variables like GIT_BRANCH, you also need to ensure those are set in the CMakeLists.txt file before the configure_file call. You can use CMake's execute_process command to run external tools like git, p4, or svn in order to extract the relevant source control information.
# Get the current working branch
execute_process(
  COMMAND git rev-parse --abbrev-ref HEAD
  WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
  OUTPUT_VARIABLE GIT_BRANCH
  OUTPUT_STRIP_TRAILING_WHITESPACE
)

# Get the latest abbreviated commit hash of the working branch
execute_process(
  COMMAND git log -1 --format=%h
  WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
  OUTPUT_VARIABLE GIT_COMMIT_HASH
  OUTPUT_STRIP_TRAILING_WHITESPACE
)
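The calls above assume that git is on the PATH and that the source tree is a Git checkout. A minimal sketch of a more defensive variant follows. The "unknown" fallback value and the git_hash_output and git_result variable names are assumptions of this sketch.

```cmake
# Locate git instead of assuming it is on the PATH.
find_package(Git QUIET)

# Fallback used when git is missing or the tree is not a checkout.
# (The "unknown" value is an assumption of this sketch.)
set(GIT_COMMIT_HASH "unknown")

if(GIT_FOUND)
  execute_process(
    COMMAND "${GIT_EXECUTABLE}" log -1 --format=%h
    WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}"
    OUTPUT_VARIABLE git_hash_output
    RESULT_VARIABLE git_result
    OUTPUT_STRIP_TRAILING_WHITESPACE
    ERROR_QUIET
  )
  # Only trust the output when git exited successfully.
  if(git_result EQUAL 0)
    set(GIT_COMMIT_HASH "${git_hash_output}")
  endif()
endif()
```

The same pattern would apply to GIT_BRANCH, so a build from a source tarball still configures cleanly.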
At this point you're done. You can safely add .generated/ to your .gitignore (or the equivalent configuration for your source control) to prevent checking in the generated data. Any source file can then #include ".generated/version.h" knowing it's going to exist.
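One detail worth sketching is how a source file in a subdirectory finds that header. Assuming a target named foo built from foo_main.cpp (the names used later in this post), adding the project root to the include path is one way to do it:

```cmake
add_executable(foo foo_main.cpp)

# Lets any source file of foo write #include ".generated/version.h",
# regardless of which subdirectory the source file lives in.
target_include_directories(foo PRIVATE "${PROJECT_SOURCE_DIR}")
```

Keep in mind that execute_process and configure_file both run when CMake configures the project, so the branch and hash are refreshed only when CMake re-runs, not on every build.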

Source/Resource Generators

This is where things get complicated. Generating source code is a multi-step process in CMake, which gets even more complicated if the code you're building is the same code that performs the generation (e.g. you've written the conversion tool yourself and need to build it, then process some files, then build the rest of the project).

To start out with, you have to determine where your generator tool is. If it's a pre-installed program, you just reference it.

set(generator_location "/usr/bin/frobulator")
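A hard-coded path only works on machines laid out the same way. As a hedged alternative, find_program can search the usual locations for you; this sketch reuses the frobulator name from the line above:

```cmake
# Search the PATH and standard install locations for the tool.
find_program(generator_location
  NAMES frobulator
  DOC "Path to the frobulator code generator"
)

# find_program leaves a value ending in -NOTFOUND when the search fails,
# which if() treats as false.
if(NOT generator_location)
  message(FATAL_ERROR "frobulator not found; install it or set generator_location")
endif()
```

Because the result is stored in the cache, a user can still override it with -Dgenerator_location=/some/path on the command line.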
However, if the code is part of the project itself, you need to tell CMake about the dependency. The TARGET_FILE generator expression can be used to refer to the executable produced by a previous add_executable.
add_executable(my_generator ...)
  
...
  
set(generator_location $<TARGET_FILE:my_generator>)
Once you have the command you want to run, you can loop over the input files and call add_custom_command to actually run the generator on each input file.
foreach(input_file ${inputs})
  
  ...
  
  add_custom_command(
    OUTPUT ${output_name}
    COMMAND ${generator_location}
    ARGS --whatever
         --args
         --like
         --input_file ${input_file}
         --output_file ${output_name}
    DEPENDS my_generator ${input_file}
    WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}
    COMMENT "Generating output for my_generator ${input_file}"
    USES_TERMINAL
  )
  
  ...
  
endforeach()
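The loop above leaves open how output_name is chosen. One possible way, shown purely as an illustration, is to derive it from the input file's base name and place it in the build tree. The .def inputs and the generated/ directory here are hypothetical.

```cmake
# Hypothetical inputs; replace with the project's real definition files.
set(inputs
  "${CMAKE_CURRENT_SOURCE_DIR}/defs/alpha.def"
  "${CMAKE_CURRENT_SOURCE_DIR}/defs/beta.def"
)

# Make sure the output directory exists before any generator runs.
set(generated_dir "${CMAKE_CURRENT_BINARY_DIR}/generated")
file(MAKE_DIRECTORY "${generated_dir}")

foreach(input_file ${inputs})
  # "defs/alpha.def" -> "alpha"
  get_filename_component(base_name "${input_file}" NAME_WE)

  # Keep generated files in the build tree, away from checked-in sources.
  set(output_name "${generated_dir}/${base_name}.cpp")

  message(STATUS "Will generate ${output_name} from ${input_file}")
endforeach()
```

Writing into the build tree means there is nothing extra to add to .gitignore, at the cost of having to add that directory to the include path.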
Running such a custom command, however, doesn't actually tell CMake anything useful about the files that get generated. You have to do your own bookkeeping, and you have to be sure to tell CMake that the files won't exist until the command is run.

Thus, for each file you generate you need to keep your own list, and mark everything in the list as GENERATED.

foreach(...)
  ...
  list(APPEND outputs ${output_name})
  ...
endforeach()
set_source_files_properties(${outputs} PROPERTIES GENERATED TRUE)
Finally you can include that output file listing in your other build rules.
add_executable(foo foo_main.cpp ${outputs})
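This works when the target is created in the same directory as the add_custom_command calls, because the generation rules are only attached to targets defined in that directory. For a target elsewhere that merely includes a generated header, a common workaround is a custom target plus add_dependencies. In this sketch, generate_sources, bar, and bar_main.cpp are hypothetical names.

```cmake
# In the CMakeLists.txt that defines the add_custom_command rules:
# a named target that is up to date once every generated file exists.
add_custom_target(generate_sources DEPENDS ${outputs})

# In another directory's CMakeLists.txt:
add_executable(bar bar_main.cpp)

# Build generate_sources before bar, so the generated headers exist
# by the time bar's sources are compiled.
add_dependencies(bar generate_sources)
```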
Since that's a lot of code to write, it's useful to wrap up the whole process into a function which you can call, which might look something like this:
# Generates C++ sources from fsbuff schema files.
# SOURCES names the variable that receives the list of generated files;
# every remaining argument is a schema file to process.
# SOURCE_DIR and GENERATED_ROOT are expected to be set by the caller.
function(fsbuff_generate SOURCES)
  if(NOT ARGN)
    message(SEND_ERROR "fsbuff_generate() called without schema files")
  endif()

  set(${SOURCES})
  foreach(schema_file ${ARGN})
    # Mirror the schema's path (relative to SOURCE_DIR) under GENERATED_ROOT.
    get_filename_component(file_path "${schema_file}" ABSOLUTE)
    file(RELATIVE_PATH genfile_rel_dir "${SOURCE_DIR}" "${file_path}")
    set(output_base "${GENERATED_ROOT}/${genfile_rel_dir}")
    get_filename_component(file_dir "${output_base}" PATH)
    file(MAKE_DIRECTORY "${file_dir}")

    # Run the fsbuffc target; rerun when the tool or the schema changes.
    set(fsbuffc_location $<TARGET_FILE:fsbuffc>)
    add_custom_command(
      OUTPUT  "${output_base}.cpp" "${output_base}.inl" "${output_base}.h"
      COMMAND "${fsbuffc_location}"
      ARGS --input "${file_path}"
           --outdir "${file_dir}/"
           --allow_overwrite "true"
      DEPENDS fsbuffc "${file_path}"
      WORKING_DIRECTORY "${SOURCE_DIR}"
      COMMENT "Generating fsbuff for: ${schema_file}"
      USES_TERMINAL
    )

    list(APPEND ${SOURCES} "${output_base}.cpp")
    list(APPEND ${SOURCES} "${output_base}.inl")
    list(APPEND ${SOURCES} "${output_base}.h")
  endforeach()

  # Mark the files as generated and hand the list back to the caller.
  set_source_files_properties(${${SOURCES}} PROPERTIES GENERATED TRUE)
  set(${SOURCES} ${${SOURCES}} PARENT_SCOPE)
endfunction()
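A call site for this function might look roughly like the following. The schema paths, the game target, and the source file names are hypothetical. The sketch also assumes SOURCE_DIR and GENERATED_ROOT are ordinary variables set before the call, since the function reads them without defining them.

```cmake
# Assumed to be defined before the call; the function reads both.
set(SOURCE_DIR "${PROJECT_SOURCE_DIR}")
set(GENERATED_ROOT "${PROJECT_BINARY_DIR}/generated")

# The generator target the function depends on (source file name is hypothetical).
add_executable(fsbuffc fsbuffc_main.cpp)

# Collect the generated file names into FSBUFF_SOURCES.
fsbuff_generate(FSBUFF_SOURCES
  schemas/player.fsb
  schemas/world.fsb
)

# Compile the generated files alongside the hand-written ones.
add_executable(game game_main.cpp ${FSBUFF_SOURCES})
target_include_directories(game PRIVATE "${GENERATED_ROOT}")
```

Passing the name of the result variable, rather than its value, is what lets the function hand the list back through PARENT_SCOPE.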

Links

The full example code can be found on my GitHub.

Copyright © 2002-2019 Travis Sanchez. All rights reserved.