= Valgrind & debuginfo

Mark J. Wielaard

I am Mark Wielaard. I hack on Valgrind, I live in the Netherlands and I work for Red Hat, where I help maintain Valgrind for Fedora, RHEL and the Red Hat Developer Tools products.

For the last Valgrind 3.18[.1] release I helped integrate debuginfod support, written by Aaron Merey, which is a new way to find debuginfo files. I also improved the DWARF reading speed, so that it takes only a few hundred milliseconds at startup instead of seconds.

So I would like to talk a bit about why and how Valgrind uses debuginfo, which issues I faced and which other improvements we could make.

= Why debuginfo?

Valgrind can mostly do its work on binary code and addresses. But the user will likely appreciate symbolic function names, source files and line numbers. Specifically, the user will likely provide Valgrind with symbolic function names (for example in suppression files), and Valgrind will want to report addresses as symbolic function names and, when possible, as source lines.

When Valgrind reports an issue, the backtrace is the most useful piece of information, because it tells not only where an issue was encountered, but also how we got to that location. But these days compilers often (partially) inline functions if that is cheaper than calling them. Especially with LTO (link time optimization) this is done a lot, sometimes multiple levels deep. In that case it is really helpful to show the "virtual backtrace" (this function was called from another, but instead of calling it, the compiler inlined the code into the caller). This is so useful that we enable it by default.

Finally, it would be nice if we could report the variables corresponding to the addresses (or registers) that are being manipulated by the program. But that is fairly expensive, and it isn't really accurate enough, which is why --read-var-info is off by default.
We used to have an experimental tool, sgcheck, which was supposed to be a stack and global array overrun detector based on the var-info. But it didn't really work that well, so it has been removed. It would be really nice to figure out a better way to use the variable location information, to make it less expensive and more accurate.

For this talk we don't regard unwind tables as debuginfo. They are described in the DWARF standard, but in practice they are implemented slightly differently as .eh_frame tables, which are always available in the executable and work on the address level. We also ignore that symbol names might be mangled and need to be demangled to make sense to the user. That would be interesting to discuss in another talk, especially since we also enabled Rust symbol demangling in the last release. And we also ignore non-DWARF debuginfo like PDB as used by Wine programs, since I simply know nothing about it, although Valgrind does have a reader for it.

= What debuginfo?

There are three main sources of "debuginfo".

First we have the symbol tables. We always have the dynamic symbols (since Valgrind only works on dynamically linked executables). These are the exported functions that can be called between libraries and executables. Then there is the .symtab, which holds the symbols needed for linking, like internal/static function names. The .symtab is often moved into a separate .debug file because it isn't really needed at runtime. The symbol tables give us a simple mapping from address (range) to function names.

Then there is the .debug_line table. This is also often moved into a separate .debug file. Technically it is a directory and file table, plus a simple program producing a matrix from address to file, line and other properties. Like the symbol table, this provides us with a simple direct mapping from address to source file and line number.
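To make that "simple direct mapping" concrete, here is a minimal sketch of an address lookup over a sorted table using binary search. The rows, the file names and the `lookup` helper are all invented for illustration. This is not how Valgrind stores its tables, and a real line table is produced by running the .debug_line program rather than being written out by hand.

```python
import bisect

# Hypothetical rows: (start address, source file, line), sorted by address.
# Each row covers the addresses up to the start of the next row.
LINE_TABLE = [
    (0x401000, "main.c", 10),
    (0x401008, "main.c", 11),
    (0x401020, "util.c", 42),
    (0x401040, None, 0),  # end marker: no code from this address on
]

# Separate list of start addresses so bisect can search it directly.
STARTS = [row[0] for row in LINE_TABLE]


def lookup(addr):
    """Return (file, line) for addr, or None if no row covers it."""
    # Index of the last row whose start address is <= addr.
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return None  # addr lies before the first row
    _, path, line = LINE_TABLE[i]
    if path is None:
        return None  # addr lies at or past the end marker
    return (path, line)
```

A lookup such as `lookup(0x40100c)` lands in the second row, because that row starts at 0x401008 and the next one only starts at 0x401020. The same shape of lookup works for a symbol table, with function names in place of file and line.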
Until DWARF5 the line table isn't fully independent: you need the CU (Compile Unit) from the DIE (Debug Information Entry) tree to know the main file and directory.

Which brings us to the last major debuginfo part: the Debug Information Entry tree, which describes the program scope, function arguments, variable locations, types and more. Where all of the above were fairly simple tables, the DIEs are described as trees, one per Compile Unit (CU). The format was originally designed to describe one Compilation Unit at a time, which follows the classical compilation model of compiling one source file at a time and then linking the object files together. But that does produce a lot of duplicate information. So these days the units can also be pure type units or shared partial units. When using LTO (link time optimization) the produced units might also contain Debug Information Entries from completely separate source files.

What makes things even more complicated (for a DWARF reader) is that the description and encoding of the DIEs (the abbrevs) are separate from the actual data (in the .debug_info section). That makes sense, because you often have the same kind of DIEs with the same attributes. And there are lots of cross-references to other debug sections: which line table is associated with a particular DIE tree, where to find the ranges describing the program scope (function) entries, and the location lists describing where to find variables. For those variables we also like to know the types, if only to know the sizes.

When we don't want any inline information or variable information, we use a simple DWARF reader that only reads the top of the DIE tree to extract the compile unit source file and the reference to the .debug_line table. But if --read-inline-info or --read-var-info is enabled, we use a full DWARF reader that reads the whole DIE tree looking for the interesting DIEs.
In practice we would always use the full DWARF reader, because we want to read the program scope entries, which tell us which function is inlined into another. Much of the DWARF reader speedup came from skipping all the non-program-scope entries, which would only be interesting for the variable location information.

= Where debuginfo?

Sometimes we can find all the debuginfo in the binary itself (the dynamic symbol table is always there, sometimes the whole symtab). But especially for distro binaries and system libraries, the symtab and .debug sections are all moved into separate .debug files. If you install the debug (sub)packages, they can be found through a standard build-id based path name, or through the .gnu_debuglink section in the original binary, which provides a file name that is looked up along a standard search path.

Valgrind comes with the (experimental) valgrind-di-server, which serves (compressed) debuginfo files (in chunks) from the current working directory. It is completely manual: you have to find and store the .debug files by hand. It was implemented to run Valgrind on a remote mobile device with limited storage, so you can serve the debug files from your workstation. I am not sure how many people are actually using it. The documentation is fairly minimal.

Finally there is the new debuginfod support, which works "automagically" if the debuginfod-find utility is installed and the DEBUGINFOD_URLS environment variable is set to a debuginfod server.

= debuginfod-find support

- Since valgrind 3.18.1, patch by Aaron Merey
- Spawns debuginfod-find (*)
- Caches downloads in $XDG_CACHE_HOME/debuginfod_client/
  Also caches negative lookups. The cache is shared with other utilities: gdb, the elfutils and binutils utilities, systemtap, etc.
- Some distros now set DEBUGINFOD_URLS by default
  Specifically Fedora 35 does. Debian also has an official debuginfod server, as do various other distros.
- Federating server https://debuginfod.elfutils.org/
  That URL also gives more background information on debuginfod, which client programs support it and which distros have an official or experimental debuginfod server.

(*) https://bugs.kde.org/show_bug.cgi?id=445011
SIGCHLD from debuginfod-find exiting might be delivered to the inferior process. We know how to fix it.

= valgrind DWARF reader (was) slow

- Exposed by debuginfod support
  Suddenly there was always debuginfo for everything.
- C++ hello world (linked against libstdc++, Fedora, built with LTO then post-processed by dwz)
  Default valgrind (memcheck) with debuginfo (100MB total!). That is libstdc++, glibc, libgcc, etc.
- Before ~12 seconds
- Now ~0.45 seconds
  Without debuginfo ~0.25 seconds. Stopped optimizing since it was in range of "normal" overhead.

= Why is/was reading debuginfo so slow?

https://bugs.kde.org/show_bug.cgi?id=442061

That bug has all the patches if you are interested in the details. Basically Valgrind was too eager to read all DWARF information, even when some data could be skipped, and it would not reuse some data that was shared.

- Fully skip CUs and children of DIEs without addresses
  For inline-info we are only interested in those DIE subtrees that contain function descriptions, the program scope entries. So we can skip any subtrees, or even a whole CU, if they aren't associated with any addresses. We partially did that already, but when we didn't know the size of the subtree we would fall back on reading and interpreting all data anyway. We now really skip these subtrees. If we don't know the size we might still need to read the data, but we don't try to interpret it or store any references to it anymore.
- Don't read line tables for CUs without addresses
  The same was true for reading the line tables. Some CUs only have source paths (to show where types are defined), but no associated (function) addresses. We used to read those line tables and store those source files anyway. Now we don't.
- Reuse of line tables and abbrevs (dwz)
  DWARF constructs can easily be shared between units, but Valgrind would not notice sharing of line tables or abbrevs between CUs. For both line tables and abbrevs we now reuse them when the previous CU used the same tables.
- Lazy reading of abbrevs
  Often we only need to read the top of the DIE tree to determine whether we even need to read the rest of the tree. But we would first create all the abbrevs for the CU upfront so we could read all the DIEs. Now we lazily read the abbrev we need for each DIE. We do cache the abbrevs because they are shared between DIEs. But if we don't actually read all DIEs for a CU (and with the other changes, we often don't) we might not read all abbrevs. This was the main difference between the two DWARF readers. The simple one only read the top of the CU tree to determine where the .debug_line table was.

= What more can be done?

- There are still two DWARF readers
  But using the full-fledged one really should not be slower now. So to clean up the code we really should get rid of the simple/minimal one.
- DWARF6 might introduce multi-level line tables
  Might save us having to read the DIE trees. But we have to wait for a concrete proposal (and for producers to create it).
- Even more lazy reading (read CU on first use of address)
- Use .debug_aranges
  This would be a somewhat bigger rewrite. But we should know when an address range (block) is used. We could read the associated CU only on first contact. If there is a .debug_aranges section we can even find that particular CU quickly (but if there isn't one we have to build an address range to CU table upfront).
- Only do --read-var-info reading on error reporting
  Kind of the same as the above.
- But DWARF info is the wrong way around
  It goes from variable (name) to location (per program scope). And the locations can be an address, a constant value, a register value or even an expression combining those.
  I don't know how to efficiently construct an address to variable mapping (because it is per program scope, and a variable might not actually be at a particular address). Maybe we should only try to do this for global variables?
- Make debuginfod-find only read "chunks" like valgrind-di-server (or get rid of chunks?)
- Long running debuginfod-find (keep connection)
  This is what, for example, gdb does. It reuses the connection to the debuginfod server to download any new debug files it needs more quickly. Might save some time on first run (and is a workaround for the debuginfod-find SIGCHLD issue).

Questions?
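
As a postscript to the "even more lazy reading" item above, here is a rough sketch of what reading a CU only on first contact could look like. It assumes an address range to CU offset table is already available, whether taken from .debug_aranges or built upfront. The `LazyCUIndex` class, the `parse_cu` callback and the tuple layout are hypothetical names for illustration only; nothing like this exists in Valgrind today.

```python
import bisect


class LazyCUIndex:
    """Map address ranges to CUs and parse a CU only on first lookup."""

    def __init__(self, aranges, parse_cu):
        # aranges: iterable of (start, end, cu_offset) with end exclusive,
        # assumed non-overlapping. A stand-in for .debug_aranges contents.
        self._ranges = sorted(aranges)
        self._starts = [r[0] for r in self._ranges]
        self._parse_cu = parse_cu  # hypothetical (expensive) CU reader
        self._cache = {}           # cu_offset -> parsed CU

    def cu_for(self, addr):
        """Return the parsed CU covering addr, or None if there is none."""
        # Index of the last range whose start address is <= addr.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        _, end, cu_offset = self._ranges[i]
        if addr >= end:
            return None  # addr falls in a gap between ranges
        if cu_offset not in self._cache:
            # First contact with this CU: pay the parsing cost now.
            self._cache[cu_offset] = self._parse_cu(cu_offset)
        return self._cache[cu_offset]
```

The point of the design is that `parse_cu` never runs for CUs whose addresses are never touched, and runs at most once for the others, even when one CU covers several separate address ranges.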