GProf

Overview

  • Tutorial: 45 min

    Objectives:
    1. Understand the basics of GProf and how to use it for profiling applications.

GProf (GNU Profiler) is a performance analysis tool for Unix-like operating systems. It helps developers identify which parts of their code are consuming the most execution time, allowing them to optimize performance-critical sections. GProf works by collecting and analyzing data about function calls and execution times during the program’s runtime.

First, start an interactive job on the cluster:

qsub -I -q normal -P vp91 -l walltime=02:00:00 -l ncpus=48 -l mem=192GB -l wd

Then load the necessary modules to compile and run the application with gprof enabled.

module load gcc/10.3.0
module load openmpi/4.1.7
module load cuda/12.9.0
module load papi/7.1.0

In this tutorial, we will use the LULESH application as an example to demonstrate how to use gprof for profiling. First, set the environment variables for the source and build directories:

export SRCDIR=/scratch/vp91/$USER/LULESH
export INSTALLDIR=/scratch/vp91/$USER/LULESH/build

where

  • INSTALLDIR is the directory where the application will be installed (build)

  • SRCDIR is the directory where the source code of the application is located (git repository)

Then compile the application with gprof enabled using CMake. Use the following command to configure the build:

cmake -G Ninja \
    -DCMAKE_BUILD_TYPE=Debug \
    -DMPI_CXX_COMPILER="$(which mpicxx)" \
    -DWITH_MPI=On \
    -DWITH_OPENMP=On \
    -DCMAKE_INSTALL_PREFIX="$INSTALLDIR" \
    -DCMAKE_CXX_FLAGS="-pg" \
    "$SRCDIR"

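  • -DCMAKE_CXX_FLAGS="-pg" enables profiling with gprof: the compiler instruments each function to record calls, and the program writes the collected data when it exits.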

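Build the application (for example, by running cmake --build . in the directory where you ran the configure command), then execute the following command to run it and generate the profiling data:

./lulesh2.0 -s 20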

When the program exits normally, it writes a file named gmon.out to the current working directory.

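To control where the profiling data is written, set the environment variable GMON_OUT_PREFIX before running the application. With glibc, the profile is then written to a file named after this prefix with the process ID appended (for example gmon.out.12345, where the PID is only illustrative), so pass that file name to gprof instead of gmon.out. Separate per-process files also keep multiple processes, such as MPI ranks, from overwriting each other's data.

export GMON_OUT_PREFIX=/scratch/vp91/$USER/gmon.out
./lulesh2.0 -s 20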
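
Because the file name now ends in a process ID that is not known in advance, it can help to collect the matching files before analysis. The following is a minimal sketch, assuming glibc's "<prefix>.<pid>" naming scheme; the helper name list_gmon_files is hypothetical and not part of gprof.

```shell
# Hypothetical helper: print every profile file written under a given prefix.
# Assumes glibc's naming scheme "<prefix>.<pid>".
list_gmon_files() {
    prefix=$1
    for f in "$prefix".*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Usage sketch: write one report per process, named after its PID.
# list_gmon_files "$GMON_OUT_PREFIX" | while read -r f; do
#     gprof ./lulesh2.0 "$f" > "profile.${f##*.}.txt"
# done
```

The usage loop is left commented out because it needs the profiled binary and its data files to be present.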

To analyze the profiling data, use the following command:

gprof ./lulesh2.0 gmon.out > profile.txt

This will create a file named profile.txt containing the profiling results, which can be viewed using any text editor or command-line tools like less or cat.
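
For a quick look at the hottest functions, the first rows of the flat profile can be pulled out of profile.txt. The sketch below is one way to do this with awk. The function name top_functions is hypothetical, and the code assumes the usual gprof layout, in which the flat profile table follows a header line beginning with "time   seconds" and ends at the first blank line.

```shell
# Hypothetical helper: print the first N data rows of the flat profile.
# Assumes the table follows a header line that starts with "time   seconds"
# and ends at the first blank line.
top_functions() {
    report=$1
    rows=${2:-10}
    awk -v n="$rows" '
        /^ *time +seconds/ { in_table = 1; next }   # header row found
        in_table && NF == 0 { exit }                # blank line ends the table
        in_table && shown < n { print; shown++ }    # emit at most n rows
    ' "$report"
}

# Usage: top_functions profile.txt 5
```

If the report layout differs on your system, adjust the header pattern accordingly.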

Flat Profile

The flat profile section of the gprof output provides a summary of the time spent in each function.

Columns of the flat profile:

  1. % time

    • The percentage of the total program runtime that was spent inside this function (not including functions it calls).

    • Example: If your program ran for 10 seconds total and foo() spent 4 seconds in its own code, then % time = 40%.

  2. cumulative seconds

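    • A running total of self seconds: the time spent in this function plus the self seconds of every function listed above it in the table. It does not include time spent in the functions this function calls.

    • Example: If foo() is listed first with 4 self seconds and bar() is listed next with 3 self seconds, then cumulative seconds is 4 for foo() and 7 for bar().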

  3. self seconds

    • The total time spent in this function alone, excluding time spent in functions it calls.

    • Example: If foo() took 4 seconds and called bar() which took 3 seconds, then self seconds = 4 seconds.

  4. calls

    • The number of times this function was called during the program’s execution.

    • Example: If foo() was called 5 times, then calls = 5.

  5. self s/call

    • The average time spent in this function per call, calculated as self seconds divided by calls.

    • Example: If foo() took 4 seconds and was called 5 times, then self s/call = 0.8 s/call. For short-running functions, gprof may report this column in ms/call instead.

  6. total s/call

    • The average time spent in this function and all functions it calls per call, calculated as (self seconds + time spent in the functions it calls) divided by calls.

    • Example: If foo() took 7 seconds (4 seconds of its own plus 3 seconds in calls to bar()) and was called 5 times, then total s/call = 1.4 s/call.

  7. name

    • The name of the function being profiled.

    • Example: If the function is named foo(), then name = foo().
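
The relationship between these columns can be checked with a small calculation. The sketch below rebuilds the % time, cumulative seconds, and self s/call columns from self times and call counts. The three functions and their numbers are hypothetical values chosen for illustration, not measurements from LULESH.

```shell
# Hypothetical input: name, self seconds, calls (already sorted by self time).
flat_profile=$(awk '
    { name[NR] = $1; self[NR] = $2; calls[NR] = $3; total += $2 }
    END {
        for (i = 1; i <= NR; i++) {
            cumulative += self[i]          # running sum of self seconds
            printf "%5.1f %6.2f %6.2f %5d %6.2f  %s\n",
                100 * self[i] / total,     # % time
                cumulative,                # cumulative seconds
                self[i],                   # self seconds
                calls[i],                  # calls
                self[i] / calls[i],        # self s/call
                name[i]
        }
    }' <<'EOF'
foo 4.0 5
bar 3.0 10
baz 3.0 2
EOF
)
printf '%s\n' "$flat_profile"
```

Note that the cumulative value for bar is 7.00 only because foo is listed above it, not because of any caller relationship between the two.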

Call Graph

The call graph section of the gprof output provides a detailed view of function calls and their relationships.

  1. index

    • A unique identifier for each function in the call graph.

    • Example: If foo() is the first function listed, it might have index = 1.

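  2. % time

    • The percentage of the total program runtime that was spent in this function, including the time spent in the functions it calls. This differs from the flat profile, where % time counts only the function's own code.

    • Example: If your program ran for 10 seconds total and foo() spent 4 seconds in its own code and 3 seconds in functions it called, then % time = 70%.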

  3. self

    • The total time spent in this function alone, excluding time spent in functions it calls.

    • Example: If foo() took 4 seconds and called bar() which took 3 seconds, then self = 4 seconds.

  4. children

    • The total time spent in functions called by this function.

    • Example: If foo() called bar() which took 3 seconds, then children = 3 seconds.

  5. called

    • The number of times this function was called during the program’s execution.

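    • Example: If foo() was called 5 times, then called = 5.

    • In the lines for a function's callers and callees, this column has the form n/m: n is the number of calls made along that particular caller-callee pair, and m is the total number of non-recursive calls to the called function. Recursive calls, if any, are shown after a + sign.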

  6. name

    • The name of the function being profiled.

    • Example: If the function is named foo(), then name = foo().
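
As a worked check of the call graph's % time column, the sketch below computes it from self and children time. The variable names and the numbers (a 10-second run in which foo() spends 4 seconds in its own code and 3 seconds in its callees) are hypothetical.

```shell
# Hypothetical values, in seconds.
total_runtime=10
foo_self=4
foo_children=3

# Call graph % time = (self + children) / total runtime.
foo_pct=$(awk -v s="$foo_self" -v c="$foo_children" -v t="$total_runtime" \
    'BEGIN { printf "%.1f", 100 * (s + c) / t }')
echo "foo: % time = $foo_pct"
```

The same function would show 40% in the flat profile, since that table counts only the 4 seconds of self time.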

Key Points

  1. GProf is a performance analysis tool for Unix-like operating systems.

  2. It helps identify performance bottlenecks in code by analyzing function calls and execution times.

  3. To use GProf with a C++ application, compile and link the code with the -pg flag.

  4. Run the application to generate a gmon.out file.

  5. Analyze the data using the gprof command.

  6. The flat profile section summarizes the time spent in each function.

  7. The call graph section provides a detailed view of function calls and their relationships.