Profiling template metaprograms with C++ Build Insights

This post has been republished via RSS; it originally appeared at: Microsoft Developer Blogs.

The use of templates in C++ programs can sometimes lead to longer builds. C++ Build Insights puts tools at your disposal for analyzing template usage patterns and their associated impact on build time. In this article, we show you how to use the vcperf analysis tool and the C++ Build Insights SDK to understand and fix problematic template patterns. We demonstrate the practical use of these tools through a case study in which we reduced build time for the Sprout open-source metaprogramming library by 25%. We hope these methods will help you approach your template endeavors with confidence! How to obtain and use vcperf The examples in this article make use of vcperf, a tool that allows you to capture a trace of your build and to view it in the Windows Performance Analyzer (WPA). The latest version is available in Visual Studio 2019. 1. Follow these steps to obtain and configure vcperf and WPA: Download and install the latest Visual Studio 2019. Obtain WPA by downloading and installing the latest Windows ADK. Copy the perf_msvcbuildinsights.dll file from your Visual Studio 2019’s MSVC installation directory to your newly installed WPA directory. This file is the C++ Build Insights WPA add-in, which must be available to WPA for correctly displaying the C++ Build Insights events. MSVC’s installation directory is typically: C:\Program Files (x86)\Microsoft Visual Studio\2019\{Edition}\VC\Tools\MSVC\{Version}\bin\Hostx64\x64. WPA’s installation directory is typically: C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit. Open the perfcore.ini file in your WPA installation directory and add an entry for the perf_msvcbuildinsights.dll file. This tells WPA to load the C++ Build Insights add-in on startup. You can also obtain the latest vcperf and WPA add-in by cloning and building the vcperf GitHub repository. Feel free to use your built copy in conjunction with Visual Studio 2019! 2. Follow these steps to collect a trace of your build: Open an elevated x64 Native Tools Command Prompt for VS 2019. Obtain a trace of your build: Run the following command: vcperf /start /level3 MySessionName. The /level3 option enables the collection of template events. Build your C++ project from anywhere, even from within Visual Studio (vcperf collects events system-wide). Run the following command: vcperf /stop /templates MySessionName outputFile.etl. This command will stop the trace, analyze all events, including template events, and save everything in the outputFile.etl trace file. Open the trace you just collected in WPA. Viewing template information in WPA The most time-consuming activity related to templates for the compiler is to instantiate them. C++ Build Insights provides a WPA view called Template Instantiations that allows you to see the instantiation time of the top templates in your program. After opening your trace in WPA, you can open this view by dragging it from the Graph Explorer pane to the Analysis window, as shown below. A note on the presence of template events in a vcperf trace and WPA If you do not see a Template Instantiations view in the Graph Explorer pane, make sure that you have completed the WPA configuration steps from the How to obtain and use vcperf section correctly, and that you have passed the right arguments to vcperf when starting and stopping your trace. For scalability reasons, vcperf only writes in the output trace the templates that take the most time to instantiate in your build. If template instantiations were not a significant factor in your build times, vcperf will omit template information, and the Template Instantiations view will not appear. Case study: speeding up the Sprout template metaprogramming library In this case study, we use a real open source project from GitHub to show how you can use vcperf and WPA to diagnose and treat long build times in template metaprogramming code. Specifically, we demonstrate how we've used these tools to reduce the build time for the Sprout library by around 25%. Use these steps if you would like to follow along: Clone the Sprout C++ Libraries repository from GitHub. Checkout the following commit: 6b5addba9face0a. Obtain a trace for a full rebuild of the Sprout tests: Open an elevated x64 Native Tools Command Prompt for VS 2019. Run the following command: vcperf /start /level3 Sprout Build the Sprout tests by running the following command: cl /std:c++latest /D_HAS_DEPRECATED_IS_LITERAL_TYPE=1 /D_SILENCE_CXX17_IS_LITERAL_TYPE_DEPRECATION_WARNING /EHsc /I. /constexpr:steps100000000 .\testspr\sprout.cpp. Run the following command: vcperf /stop /templates Sprout sprout.etl. This will save a trace of the build in sprout.etl. The trace will contain template information. Open the trace in WPA. We open the Build Explorer and Template Instantiations views one on top of the other. The Build Explorer view indicates that the build lasted around 13.5 seconds. This can be seen by looking at the time axis at the bottom of the view (labeled A). The Template Instantiations view shows a locus of condensed template instantiation activity somewhere between time 8 and 10.5 (labeled B). By default, all template specializations are grouped by the primary template’s name. For example, the std::vector<int> and std::vector<double> specializations would both be grouped under the std::vector primary template name. In our case, we would like to know if there is one specialization in particular that causes an issue, so we reorganize the view’s columns such that the entries are grouped by Specialization Name instead. This operation is shown below. We notice that there is one large instantiation of the sprout::tpp::all_of template that is responsible for around 2.15 seconds. There are also 511 instantiations of sprout::tpp::detail::all_of_impl. We surmise that sprout::tpp::all_of is a root template that calls into a highly recursive sprout::tpp::detail::all_of_impl template. This discovery is shown below. Investigating the codebase We find that the culprit sprout::tpp::all_of template instantiation is triggered by a call to operator() of the following type in sprout\random\shuffle_order.hpp: This type internally contains a compile-time array of 256 elements, which ends up being passed to the default_remake_container function in the sprout\container\container_construct_traits.hpp header. This function has the following three template definitions. Function bodies have been replaced with comments for simplicity. These definitions are all enabled or disabled based on certain conditions using the std::enable_if standard type traits class. Can you spot the sprout::tpp::all_of call in the second definition’s std::enable_if condition? We replicate it below: Looking at the condition as a whole, we can see that sprout::tpp::all_of does not need to be evaluated if default_remake_container is called with a number of arguments different from 2. In our case we have 256 arguments and we know that the condition will be false regardless of what sprout::tpp::all_of returns. In the eyes of the compiler, this does not matter. It will still evaluate sprout::tpp::all_of on our 256 arguments when trying to resolve the call to default_remake_container, resulting in a large, time-consuming recursive template instantiation. Finding a new solution We fix this situation by adding a level of indirection between default_remake_container and the sprout::tpp::all_of call. We first branch on the number of arguments: Only when an argument count of 2 is confirmed do we evaluate sprout::tpp::all_of through a new function called default_remake_container_two_args: You can see the full details of this code change in our fork on GitHub. Evaluating the final result After fixing the code, we collect a second trace of the compilation using the same steps that we covered at the beginning of the case study. We notice that the build time has decreased by around 25% to a total of approximately 9.7 seconds. The Template Instantiations view is also gone, meaning that template instantiations are no longer an issue in this build. Victory! Identifying problematic template instantiations using the C++ Build Insights SDK Recursive, time-consuming template instantiations are not an uncommon problem in codebases that use template metaprogramming, so we want to identify these problems faster in the future without having to go through the hassle of launching WPA and inspecting a trace manually. Luckily, most analysis tasks performed manually with vcperf and WPA can also be performed programmatically using the C++ Build Insights SDK. To illustrate this point, we’ve prepared the RecursiveTemplateInspector SDK sample. It prints out the most time-consuming template instantiation hierarchies in a build, along with statistics about them such as recursion tree depth, total number of instantiations and the root specialization name that started it all. Let’s repeat the Sprout case study from the previous section, but this time by using the RecursiveTemplateInspector to see what it finds. Use these steps if you want to follow along: Clone the C++ Build Insights SDK samples GitHub repository on your machine. Build the Samples.sln solution, targeting the desired architecture (x86 or x64), and using the desired configuration (debug or release). The sample’s executable will be placed in the out/{architecture}/{configuration}/RecursiveTemplateInspector folder, starting from the root of the repository. Follow the steps from the Case study: speeding up a template metaprogramming library section to collect a trace of the Sprout compilation. Use the vcperf /stopnoanalyze Sprout sprout-raw.etl command instead of the /stop command when stopping your trace. Pass the sprout-raw.etl trace as the first argument to the RecursiveTemplateInspector executable. As shown below, RecursiveTemplateInspector correctly identifies the sprout::tpp::all_of template instantiation that has been causing us problems, which triggers other instantiations recursively for a total of 4043 instantiations. Rerunning RecursiveTemplateInspector on our fixed codebase shows that all problematic template instantiations are gone. The ones that remain have a low duration and can be ignored. Understanding the sample code We first filter all stop activity and simple events by asking the C++ Build Insights SDK to forward what we need to the OnTemplateRecursionTreeBranch, and OnSymbolName functions. The name of the functions has no effect on how the C++ Build Insights SDK will filter the events; only their parameters matter. Continue reading to understand what is actually being filtered. We use the OnTemplateRecursionTreeBranch function to capture each branch of the template instantiation recursion tree one by one. Because C++ Build Insights events are represented as stacks, capturing recursive relationships between events is something that can be done with little effort. The TemplateInstantiationGroup capture class automatically unwinds the event stack, and presents all template instantiations that occurred in the parsing thread as a vector-like C++ container ordered from root to leaf. Because we tied the OnTemplateRecursionTreeBranch function to the stop activity event, we will always receive the branches in a given recursion tree at the point where the parsing thread is coming back from the deepest level. We take advantage of this fact to compute the maximum depth of the recursion tree as we process all its branches. Once the root instantiation itself reaches its stop event, we wrap up the instantiation tree by storing the total instantiation time for the tree, along with the translation unit in which it occurred. Template instantiation events don’t contain the name of the symbol that was instantiated. Deriving the name is a costly operation, and doing so while measuring an instantiation would skew the time measurement. Instead, a numeric key is issued, which we can later use to match with the appropriate name by listening to the SymboName event. The OnSymbolName function does just that, and stores the name of every root template instantiation. At the end of our analysis, we traverse all root template instantiations, sort them by longest duration, and dump the top ones. Tell us what you think! We hope the information in this article has helped you understand how you can use the tools provided by C++ Build Insights to profile your usage of templates. Give vcperf a try today by downloading the latest version of Visual Studio 2019, or by cloning the tool directly from the vcperf Github repository. Try out the RecursiveTemplateInspector sample from this article by cloning the C++ Build Insights samples repository from GitHub, or refer to the official C++ Build Insights SDK documentation to build your own analysis tools. Have you found problematic template instantiations using vcperf or the C++ Build Insights SDK? Let us know in the comments below, on Twitter (@VisualC), or via email at visualcpp@microsoft.com.   This article contains code snippets from Sprout, Copyright (c) 2011-2019 Bolero MURAKAMI, distributed under the Boost Software License, Version 1.0.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.