Overview

HPerf reads a Linux perf trace (perf.data) and annotates the corresponding disassembly. This is similar to perf-report / perf-annotate, but with a GUI, a different layout, and additional features.

Download hperf 6.14.5. Live demo.

Features

Hotspot pinpointing

Hotspots are groups of contiguous instructions with high trace sample counts (as opposed to single instructions).
Hotspot detection works even in the absence of symbol information.
Per-symbol sample counts are also available.
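The hotspot grouping described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not hperf's actual implementation. It assumes that the per-instruction sample counts of one DSO are already available in address order. The names find_hotspots and struct hotspot are hypothetical, and so is the mapping of the -s, -t and -c options (see Usage) onto min_insn, min_total and max_gap. Only absolute counts are handled, not the count% forms.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hotspot record: instructions [begin, end) in address order. */
struct hotspot {
    size_t begin;
    size_t end;
    unsigned long total; /* sum of samples over [begin, end) */
};

/*
 * counts[i] is the sample count of the i-th instruction, in address order.
 * An instruction is hot if counts[i] >= min_insn (cf. -s). Hot instructions
 * separated by at most max_gap colder ones are merged into one group (cf. -c),
 * and groups whose total is below min_total are dropped (cf. -t).
 * Stores up to max_out hotspots in out and returns how many were stored.
 */
size_t find_hotspots(const unsigned long *counts, size_t n,
                     unsigned long min_insn, unsigned long min_total,
                     size_t max_gap, struct hotspot *out, size_t max_out)
{
    size_t found = 0;
    size_t i = 0;

    assert(counts != NULL || n == 0);
    while (i < n && found < max_out) {
        /* Skip to the next hot instruction. */
        while (i < n && counts[i] < min_insn)
            i++;
        if (i == n)
            break;

        /* Grow the group while another hot instruction is close enough. */
        size_t begin = i, last_hot = i;
        for (size_t j = i + 1; j < n && j - last_hot - 1 <= max_gap; j++)
            if (counts[j] >= min_insn)
                last_hot = j;

        /* The total includes the colder instructions inside the group. */
        unsigned long total = 0;
        for (size_t k = begin; k <= last_hot; k++)
            total += counts[k];

        if (total >= min_total) {
            out[found].begin = begin;
            out[found].end = last_hot + 1;
            out[found].total = total;
            found++;
        }
        i = last_hot + 1;
    }
    return found;
}
```

With the defaults listed under Usage, a call would look like find_hotspots(counts, n, 1, 2, 5, out, max_out). The -d context could then be obtained by widening each [begin, end) range by n instructions on each side.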

Branch stack / branch sampling

Requires recording with perf record -b on Intel, or with perf record -b -e cycles,branches on AMD.

Branch taken and branch mispredict statistics.
Jump landings (count, jump source).
Cycle count per branchless span.
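The branch statistics listed above could be aggregated from branch stacks roughly as follows. This is a hypothetical sketch. The fields of struct branch_entry mirror what perf branch records carry (source, target, mispredict flag, cycle count), but the layout, the names, and the fixed-size lookup table are assumptions. The sketch also assumes, as with Intel LBR, that stack entries are ordered most recent first. It further assumes that each entry's cycle count covers the code executed since the previous (older) taken branch, which is a branchless span.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One branch record (hypothetical layout); stacks are most recent first. */
struct branch_entry {
    uint64_t from;      /* address of the taken branch */
    uint64_t to;        /* branch target (jump landing) */
    int mispredicted;   /* nonzero if the branch was mispredicted */
    unsigned cycles;    /* cycles since the previous record, 0 if unknown */
};

/* Hypothetical per-address statistics. */
struct addr_stats {
    uint64_t addr;
    unsigned long taken;       /* times a branch at addr was taken */
    unsigned long mispredicts; /* of which mispredicted */
    unsigned long landings;    /* times addr was a jump target */
};

/* Find or create the slot for addr (linear search, fixed capacity). */
static struct addr_stats *stats_slot(struct addr_stats *tab, size_t *len,
                                     size_t cap, uint64_t addr)
{
    for (size_t i = 0; i < *len; i++)
        if (tab[i].addr == addr)
            return &tab[i];
    assert(*len < cap);
    tab[*len] = (struct addr_stats){ .addr = addr };
    return &tab[(*len)++];
}

/* Fold one sample's branch stack into the statistics table. */
void add_branch_stack(const struct branch_entry *bs, size_t n,
                      struct addr_stats *tab, size_t *len, size_t cap)
{
    for (size_t i = 0; i < n; i++) {
        struct addr_stats *src = stats_slot(tab, len, cap, bs[i].from);
        src->taken++;
        if (bs[i].mispredicted)
            src->mispredicts++;
        stats_slot(tab, len, cap, bs[i].to)->landings++;
    }
}

/* Code from start to end ran without a taken branch, in `cycles` cycles. */
struct span {
    uint64_t start;
    uint64_t end;
    unsigned cycles;
};

/* bs[i + 1] is older than bs[i]: the span runs from bs[i + 1].to up to
 * bs[i].from. Records without cycle information are skipped. */
size_t branchless_spans(const struct branch_entry *bs, size_t n,
                        struct span *out)
{
    size_t k = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        if (bs[i].cycles == 0)
            continue;
        out[k].start = bs[i + 1].to;
        out[k].end = bs[i].from;
        out[k].cycles = bs[i].cycles;
        k++;
    }
    return k;
}
```

Per-address totals like these are enough to derive taken and mispredict rates, and landing counts with their jump sources.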

Assembly and source visualization

Side-by-side assembly and source code (as opposed to interleaved, like the output of objdump).
Assembly-source highlighting (through hover and selection).
Syntax highlighting for source code.
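Assembly-source highlighting in both directions only needs a source location per instruction. The sketch below is a hypothetical illustration, written in C for consistency even though the actual UI is JavaScript. It assumes each instruction carries a (file, line) pair, for example derived from objdump's --line-numbers annotations, with line 0 meaning no source information. The names insns_for_line and lines_for_insns are invented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical source location of one disassembled instruction. */
struct insn_loc {
    int file;   /* index into a table of source file names */
    int line;   /* 1-based line number, 0 if unknown */
};

/* Source -> assembly: indices of the instructions generated from
 * file:line, e.g. to highlight them when that source line is hovered.
 * Stores up to max_out indices in out and returns how many were stored. */
size_t insns_for_line(const struct insn_loc *locs, size_t n, int file,
                      int line, size_t *out, size_t max_out)
{
    size_t k = 0;

    assert(line > 0);
    for (size_t i = 0; i < n && k < max_out; i++)
        if (locs[i].file == file && locs[i].line == line)
            out[k++] = i;
    return k;
}

/* Assembly -> source: smallest and largest known line of file among the
 * selected instructions [begin, end); both are 0 if there is none. */
void lines_for_insns(const struct insn_loc *locs, size_t begin, size_t end,
                     int file, int *first, int *last)
{
    *first = *last = 0;
    for (size_t i = begin; i < end; i++) {
        if (locs[i].file != file || locs[i].line == 0)
            continue;
        if (*first == 0 || locs[i].line < *first)
            *first = locs[i].line;
        if (locs[i].line > *last)
            *last = locs[i].line;
    }
}
```

Keeping the mapping per instruction, rather than per block, lets a single source line light up instructions scattered across the function, which is common in optimized code.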

How it works

The hperf command reads a perf.data trace file and outputs a single self-contained HTML file (containing both the data and a JavaScript UI). The UI can be customized with user-provided CSS.
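Producing a single self-contained file amounts to inlining everything the page needs. The following C sketch is an assumption about how such output could be assembled, not hperf's actual writer. The names write_report and put_escaped, as well as the "data" element id, are hypothetical. One detail matters when inlining arbitrary text: any "</" sequence is written as "<\/" so that embedded data or CSS cannot close its <script> or <style> element prematurely.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy s to f, writing "</" as "<\/" so the text cannot terminate the
 * enclosing <script> or <style> element early. */
static void put_escaped(FILE *f, const char *s)
{
    while (*s) {
        if (strncmp(s, "</", 2) == 0) {
            fputs("<\\/", f);
            s += 2;
        } else {
            fputc(*s++, f);
        }
    }
}

/* Write one self-contained report: CSS, trace data (as JSON) and the UI
 * script are all inlined, so the page loads no external resources.
 * Returns 0 on success, -1 if a write error occurred. */
int write_report(FILE *f, const char *css, const char *json, const char *ui_js)
{
    assert(f && css && json && ui_js);
    fputs("<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n"
          "<style>\n", f);
    put_escaped(f, css);
    fputs("\n</style>\n</head>\n<body>\n"
          "<script type=\"application/json\" id=\"data\">\n", f);
    put_escaped(f, json);
    fputs("\n</script>\n<script>\n", f);
    put_escaped(f, ui_js);
    fputs("\n</script>\n</body>\n</html>\n", f);
    return ferror(f) ? -1 : 0;
}
```

Under this design, a replacement stylesheet (-S) would simply be passed as css, and an additional one (-A) appended to it.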

Limitations

HPerf is well suited for long perf traces, but report generation may be slow with large binaries. This is because it obtains from objdump the full disassembly of every DSO encountered in the trace, and all of it must fit in memory. Trace samples are then counted against their corresponding instructions, which allows for arbitrarily long traces. Note that the output contains the disassembly of all hotspots (plus some context) and the content of all corresponding source files.
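The counting step described above can be sketched as a binary search over the sorted disassembly. In this minimal sketch, struct insn, find_insn and count_sample are hypothetical. Sample addresses are assumed to be already translated into the DSO's address space. Memory use is proportional to the number of instructions, while each sample costs one O(log n) lookup and one counter increment, so the trace length does not affect memory use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One disassembled instruction; the array is sorted by address. */
struct insn {
    uint64_t addr;
    unsigned size;          /* length in bytes */
    unsigned long samples;  /* samples attributed so far */
};

/* Binary search for the instruction whose bytes contain ip. */
static struct insn *find_insn(struct insn *v, size_t n, uint64_t ip)
{
    size_t lo = 0, hi = n;  /* search within v[lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ip < v[mid].addr)
            hi = mid;
        else if (ip >= v[mid].addr + v[mid].size)
            lo = mid + 1;
        else
            return &v[mid];
    }
    return NULL;
}

/* Attribute one sample to its instruction. Returns 1 if counted, or 0 if
 * ip falls outside the disassembly (e.g. in a gap between functions). */
int count_sample(struct insn *v, size_t n, uint64_t ip)
{
    assert(v != NULL || n == 0);
    struct insn *in = find_insn(v, n, ip);
    if (in == NULL)
        return 0;
    in->samples++;
    return 1;
}
```

Because only per-instruction counters are kept, the resulting counts can be fed directly to a hotspot pass without retaining any individual sample.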

Dependencies

Build dependencies: gcc or clang, make

Runtime dependencies: perf, objdump, highlight (optional), a browser with JavaScript enabled.

Building

make

Usage

Usage: hperf [options]

Options:

  -i   file         input file, produced by perf-record (default: perf.data)
  -o   file         output file (default: report.html)
  -s   count[%]     minimum number of samples per insn (default: 1)
  -t   count[%]     minimum total number of samples per hotspot (default: 2)
  -c   n            merge hotspots separated by up to n insn (default: 5)
  -d   n            output n insn before and after hotspots (default: 100)
  -S   file         replacement css file path (default: none)
  -A   file         additional css file path (default: none)
  -T   theme        initial theme: 'dark' or 'light' (default: light)
  -v   level        verbosity level (default: 1)
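Thresholds of the form count[%], as accepted by -s and -t, could be parsed along these lines. The usage text does not say what a percentage is relative to. This sketch assumes it is a percentage of the total number of samples, and parse_threshold is a hypothetical name. The result is rounded up, so that a small nonzero percentage never turns into a threshold of 0.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "count" or "count%" (e.g. "5" or "0.5%"). Assumption: a trailing
 * '%' means a percentage of total_samples, rounded up.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
int parse_threshold(const char *arg, unsigned long total_samples,
                    unsigned long *out)
{
    char *end;
    size_t len;

    assert(arg != NULL && out != NULL);
    len = strlen(arg);
    if (len == 0 || !isdigit((unsigned char)arg[0]))
        return -1;                      /* rejects "", "-3", " 3", "inf" */

    errno = 0;
    if (arg[len - 1] == '%') {
        double pct = strtod(arg, &end);
        if (errno != 0 || end != arg + len - 1 || pct > 100.0)
            return -1;
        /* Multiply first, then divide, to limit rounding error. */
        double t = pct * (double)total_samples / 100.0;
        unsigned long r = (unsigned long)t;
        *out = ((double)r < t) ? r + 1 : r;  /* round up */
        return 0;
    }

    unsigned long v = strtoul(arg, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;
    *out = v;
    return 0;
}
```

For example, under that assumption, -t 0.5% on a trace of 1000 samples would yield a minimum of 5 samples per hotspot.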

Change log

2025-05-21 hperf 6.14.5 Fixed handling of code blocks with no source information in objdump output.
2024-05-15 hperf 6.8.8 Added support for C++ symbol information, which may contain spaces.
2023-12-19 hperf 6.6.3 Gracefully skip syntax highlighting when highlight is unavailable. Fixed handling of unavailable source files.
2023-11-07 hperf 6.5.4 Adjusted for changes in perf-script output. Made the hperf version number follow the supported perf version in lockstep.
2023-03-14 hperf 1.4 Added disassembly caching.
2023-03-01 hperf 1.2 Adjusted for changes in perf-script output.
2022-10-25 hperf 1.1 Support branch samples in perf-script output.
2022-01-27 hperf 1.0 First release.

Author

Laurent Poirrier

HPerf

Linux perf trace visualizer