This is a cache of https://www.linaro.org/blog/expanding-profiling-capabilities-with-windowsperf-372-release/. It is a snapshot of the page at 2024-12-13T00:53:57.490+0000.
Expanding profiling capabilities with WindowsPerf 3.7.2 release | Blog | Linaro Expanding profiling capabilities with WindowsPerf 3.7.2 release | Blog | Linaro
Linaro Logo

Expanding profiling capabilities with WindowsPerf 3.7.2 release

 Przemyslaw Wirkus

Przemyslaw Wirkus

Tuesday, September 10, 20248 min read

Submit

We are happy to announce the latest WindowsPerf bug-fix release ver. 3.7.2. This release based on ver. 3.5.0-beta is a continuation of WindowsPerf development effort. Current release focuses on extending profiling capabilities and user-space functionality. For previous releases check our GitLab release page.

These updates aim to improve the performance and stability of the kernel Driver. Please update to the latest version to benefit from these improvements!

Highlights from 3.7.2 Release

This release includes several important updates

  • New wperf man <name> command which allows users to print manual style help about Telemetry Solution data (events, metrics and group of metrics).
  • Update Telemetry Solution data with Neoverse-V2.
  • We’ve enabled Event Tracing for Windows (ETW) output on the wperf application and wperf-driver by default for Release project configuration.
  • Various command line options updates:
    • Improvements to how we define core(s), for compatibility with Linux perf CLI.
      • CPU ranges for -c are now available. Now you can use notation like -c 1-3 which corresponds to -c 1,2,3. Use wherever applicable.
      • New --cpu alias for -c.
    • Add time units to --timeout/sleep and -i command line options. Now you can use suffixes:
      • ms for milliseconds, s for seconds, m for minutes, h for hours, d for days. For example --timeout 10s for 10 seconds timeout.
      • Note: However, --timeout 2h30m is not allowed.
  • Other minor improvements include:
    • New FeatureString is added to wperf --version command to show users which ENABLE_* macros were enabled during driver and application build.
    • We now correctly escape \n and \t when we output JSON with wperf ... --json commands.
  • Preview support for Arm Statistical Profiling Extension.

Arm Statistical Profiling Extension (SPE) support in WindowsPerf

In addition to our roadmap improvements, we’ve also started working on support for Arm Statistical Profiling Extension (SPE). WindowsPerf added in sample and record command support for the SPE. SPE is an optional feature in ARMv8.2 hardware that allows CPU instructions to be sampled and associated with the source code location where that instruction occurred.

Note: Currently SPE is available on Windows On Arm in Test Mode only, due to OS SPE support limitations. WindowsPerf SPE implementation was tested on Windows 11 Pro 24H2 (OS build 26100.1150).

Users can use the same --annotate and --disassemble interface of WindowsPerf with one exception. To reach out to SPE resources please use -e command with arm_spe_0// options. For example:

> wperf record -e arm_spe_0//` -c 0 --timeout 10 -- workload.exe

SPE detection

Note: SPE support in WindowsPerf is in development! You can enable beta code of SPE support with the ENABLE_SPE macro or just rebuild the project with Debug+SPE configuration.

You can check if your hardware supports SPE or if WindowsPerf can detect it with wperf test command. See below an example of spe_device.version_name property value on a system with feature SPE. FEAT_SPE indicates your hardware has the necessary SPE extension baked in.

>wperf test
        Test Name                                           Result
        =========                                           ======
...
        spe_device.version_name                             FEAT_SPE

How do I know if WindowsPerf binaries support optional SPE?

You can check the feature string (FeatureString) of both wperf and wperf-driver with the wperf --version command. If FeatureString for both components (wperf and wperf-driver) contains +spe (and spe_device.version_name contains FEAT_SPE) you are good to go!

> wperf --version
        Component     Version  GitVer          FeatureString
        =========     =======  ======          =============
        wperf         3.7.2    4338371a        +etw-app+spe
        wperf-driver  3.7.2    4338371a        +trace+spe

How to use SPE from command line

Users can specify SPE filters with arm_spe_0//. We added CLI parser function for -e arm_spe_0/*/ notation for record command. Where * is a comma separated list of supported filters. Currently we support filters. Users can define filters such as store_filter=, load_filter=, branch_filter= or short equivalents like st=, ld= and b=. Use 0 or 1 to disable or enable a given filter. For example:

> wperf record -e arm_spe_0/branch_filter=1/
> wperf record -e arm_spe_0/load_filter=1,branch_filter=0/
> wperf record -e arm_spe_0/st=0,ld=0,b=1/

SPE register PMSFCR_EL1.FT enables filtering by operation type. When enabled PMSFCR_EL1.{ST, LD, B} define the collected types:

  • ST enables collection of store sampled operations, including all atomic operations.
  • LD enables collection of load sampled operations, including atomic operations that return a value to a register.
  • B enables collection of branch sampled operations, including direct and indirect branches and exception returns.

Filters store_filter=, load_filter=, branch_filter= enable these filters.

Example

> wperf record -e arm_spe_0/ld=1/ -c 8 -- cpython\PCbuild\arm64\python_d.exe -c 10**10**100
base address of 'cpython\PCbuild\arm64\python_d.exe': 0x7ff69e251288, runtime delta: 0x7ff55e250000
sampling ...e..........e... done!
======================== sample source: LOAD_STORE_ATOMIC-LOAD-GP/retired+level1-data-cache-access+tlb_access, top 50 hot functions ========================
        overhead  count  symbol
        ========  =====  ======
           93.75     15  x_mul:python312_d.dll
            6.25      1  _Py_ThreadCanHandleSignals:python312_d.dll
100.00%        16  top 2 in total

Annotate example with ld=1 filter enabled: enables collection of load sampled operations, including atomic operations that return a value to a register.

WindowsPerf: what’s next

The WindowsPerf ecosystem has recently expanded its capabilities and now supports a Visual Studio extension that is readily available in the Microsoft Marketplace. This integration allows developers to leverage the robust performance analysis of WindowsPerf ver. 3.7.2 directly within the familiar environment of Visual Studio, enhancing productivity and efficiency. Read the extension Wiki pages for more details.

Furthermore, we are excited to announce that our team is actively working on extending this support to Visual Studio Code. This upcoming feature will broaden the accessibility of WindowsPerf, enabling even more developers to optimize their applications with ease. Stay tuned for more updates on this front!

What to expect in the next release?

  • Stable support for Statistical Profiling Extension, please note that we may release a version with stable SPE-beta support before release 4.0.0 planned at the end of September.
  • Improved documentation on Event Tracing for Windows (ETW) support. We plan to write a comprehensive tutorial on how to use WindowsPerf and its WPA-plugin-etl.

Where to find us?

For source code and binary releases please visit our WindowsPerf webpage at GitLab. Additional project resources include WindowsPerf Wiki and WindowsPerf JIRA project board.

If you have any questions, issues you would like to raise please visit our WindowsPerf GitLab issue page and create a new issue with a clear description of the problem you’re facing or issue you want help with.