GProf
Overview
Tutorial: 45 min
- Objectives:
Understand the basics of GProf and how to use it for profiling applications.
GProf (GNU Profiler) is a performance analysis tool for Unix-like operating systems. It helps developers identify which parts of their code are consuming the most execution time, allowing them to optimize performance-critical sections. GProf works by collecting and analyzing data about function calls and execution times during the program’s runtime.
First, start an interactive job on the cluster:
qsub -I -q normal -P vp91 -l walltime=02:00:00 -l ncpus=48 -l mem=192GB -l wd
Then load the necessary modules to compile and run the application with gprof enabled.
module load gcc/10.3.0
module load openmpi/4.1.7
module load cuda/12.9.0
module load papi/7.1.0
In this tutorial, we will use the LULESH application as an example to demonstrate how to use gprof for profiling. First, set the environment variables for the source and build directories:
export SRCDIR=/scratch/vp91/$USER/LULESH

export INSTALLDIR=/scratch/vp91/$USER/LULESH/build
where
SRCDIR is the directory where the source code of the application is located (git repository)
INSTALLDIR is the directory where the application will be installed (build)
Then compile the application with gprof enabled using CMake. Use the following command to configure the build:
cmake -G Ninja \
  -DCMAKE_BUILD_TYPE=Debug \
  -DMPI_CXX_COMPILER="$(which mpicxx)" \
  -DWITH_MPI=On \
  -DWITH_OPENMP=On \
  -DCMAKE_INSTALL_PREFIX="$INSTALLDIR" \
  -DCMAKE_CXX_FLAGS="-pg" \
  "$SRCDIR"
-DCMAKE_CXX_FLAGS="-pg" adds the -pg flag, which tells the compiler to instrument the code for profiling with gprof.
The configure step only generates the build files. Build the application (for example, by running cmake --build . in the build directory), then run it to generate the profiling data:
./lulesh2.0 -s 20
This will generate a file named gmon.out in the current directory after the program completes.
To save the profiling data in a specific location, set the environment variable
GMON_OUT_PREFIX before running the application. With glibc, the output file is then named
after this prefix with the process ID appended (for example gmon.out.12345), so each process
of a parallel run writes its own file.
export GMON_OUT_PREFIX=/scratch/vp91/$USER/gmon.out

./lulesh2.0 -s 20
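Because glibc appends the process ID to the GMON_OUT_PREFIX value, the exact file name changes from run to run. A minimal sketch of one way to locate the newest profile file is shown below; latest_gmon is a hypothetical helper name, not part of gprof, and it assumes the file names contain no newlines.

```shell
# Hypothetical helper: print the most recently modified profile file
# whose name starts with "<prefix>." (the suffix is the process ID).
latest_gmon() {
    prefix="$1"
    # ls -t sorts newest first; the error is silenced when nothing matches.
    ls -t "$prefix".* 2>/dev/null | head -n 1
}

# Usage, following the GMON_OUT_PREFIX setting of this tutorial:
# gprof ./lulesh2.0 "$(latest_gmon "$GMON_OUT_PREFIX")" > profile.txt
```

If nothing matches the prefix, the helper prints nothing, so it is worth checking for an empty result before passing it to gprof.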
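To analyze the profiling data, pass the executable and the profile file to gprof. If GMON_OUT_PREFIX was set, use the generated file name (the prefix followed by the process ID) instead of gmon.out:
gprof ./lulesh2.0 gmon.out > profile.txt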
This will create a file named profile.txt containing the profiling results, which can be
viewed using any text editor or command-line tools like less or cat.
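By default the report contains both the flat profile and the call graph, along with explanatory text. As a sketch, the wrappers below use gprof's -b (brief), -p (flat profile) and -q (call graph) options to print one section at a time. The names flat_profile and call_graph are hypothetical helpers introduced here for illustration.

```shell
# Print only the flat profile, without the explanatory text.
# $1 = executable, $2 = profile data file
flat_profile() {
    gprof -b -p "$1" "$2"
}

# Print only the call graph, without the explanatory text.
call_graph() {
    gprof -b -q "$1" "$2"
}

# Usage:
# flat_profile ./lulesh2.0 gmon.out > flat.txt
# call_graph ./lulesh2.0 gmon.out > callgraph.txt
```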
Flat Profile
The flat profile section of the gprof output provides a summary of the time spent in each function.
GProf columns in the output file:
% time
The percentage of the total program runtime that was spent inside this function (not including functions it calls).
Example: If your program ran for 10 seconds total and foo() spent 4 seconds in its own code, then % time = 40%.
cumulative seconds
A running sum of the self seconds of this function and all functions listed above it in the flat profile. It does not include time spent in the functions this function calls.
Example: If foo() is listed first with 4 self seconds and bar() is listed second with 3 self seconds, then cumulative seconds is 4 for foo() and 7 for bar().
self seconds
The total time spent in this function alone, excluding time spent in functions it calls.
Example: If foo() took 4 seconds and called bar() which took 3 seconds, then self seconds = 4 seconds.
calls
The number of times this function was called during the program’s execution.
Example: If foo() was called 5 times, then calls = 5.
self s/call
The average time spent in this function per call, calculated as self seconds divided by calls. gprof may label this column self ms/call when the values are small enough to report in milliseconds.
Example: If foo() took 4 seconds and was called 5 times, then self s/call = 0.8 s/call.
total s/call
The average time spent in this function and all functions it calls (its descendants) per call.
Example: If foo() took 7 seconds (including calls to bar()) and was called 5 times, then total s/call = 1.4 s/call.
name
The name of the function being profiled.
Example: If the function is named foo(), then name = foo().
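The relationships between the flat profile columns can be checked with a small calculation. The sketch below is not gprof output: flat_columns is a hypothetical helper, and the input extends the foo() and bar() examples with a made-up third function baz() and made-up call counts so that the self times add up to 10 seconds. It follows the gprof convention that cumulative seconds is the running sum of self seconds down the list. Input lines hold name, self seconds and calls, sorted by self seconds; output lines hold name, % time, cumulative seconds, self seconds, calls and self s/call.

```shell
# Hypothetical helper: derive flat-profile columns from self seconds and calls.
flat_columns() {
    awk '
        # Store every row and accumulate the total run time.
        { name[NR] = $1; self[NR] = $2; calls[NR] = $3; total += $2 }
        END {
            for (i = 1; i <= NR; i++) {
                cum += self[i]    # running sum of self seconds
                printf "%s %.1f %.1f %.1f %d %.2f\n",
                    name[i], 100 * self[i] / total, cum,
                    self[i], calls[i], self[i] / calls[i]
            }
        }'
}

printf 'foo 4 5\nbar 3 10\nbaz 3 1\n' | flat_columns
# foo 40.0 4.0 4.0 5 0.80
# bar 30.0 7.0 3.0 10 0.30
# baz 30.0 10.0 3.0 1 3.00
```

The last row's cumulative seconds equals the total sampled run time, which is a quick sanity check when reading a real flat profile.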
Call Graph
The call graph section of the gprof output provides a detailed view of function calls and their relationships.
index
A unique identifier for each function in the call graph.
Example: If foo() is the first function listed, it might have index = 1.
% time
The percentage of the total program runtime that was spent in this function and its children (the functions it calls). This differs from the flat profile, where % time covers only the function's own code.
Example: If your program ran for 10 seconds total and foo() spent 4 seconds in its own code and 3 seconds in bar(), then % time = 70%.
self
The total time spent in this function alone, excluding time spent in functions it calls.
Example: If foo() took 4 seconds and called bar() which took 3 seconds, then self = 4 seconds.
children
The total time spent in functions called by this function.
Example: If foo() called bar() which took 3 seconds, then children = 3 seconds.
called
The number of times this function was called during the program’s execution.
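Example: If foo() was called 5 times, then called = 5.
On the lines for a function's parents and children, the count is shown as two numbers separated by a slash: the number of calls made along that caller-callee pair, followed by the total number of non-recursive calls to the callee. For a recursive function, the primary line shows the non-recursive and recursive calls separated by a plus sign.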
name
The name of the function being profiled.
Example: If the function is named foo(), then name = foo().
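The self and children columns add up to the time attributed to a function's whole subtree, and dividing that sum by the total run time gives the percentage that gprof reports on the function's primary line in the call graph. A minimal sketch of that arithmetic, using the foo() and bar() numbers from the examples (callgraph_percent is a hypothetical helper, not a gprof command):

```shell
# Hypothetical helper: call-graph % time from self, children and total seconds.
callgraph_percent() {
    awk -v s="$1" -v c="$2" -v t="$3" \
        'BEGIN { printf "%.1f\n", 100 * (s + c) / t }'
}

# foo(): 4 s in its own code, 3 s in bar(), 10 s total run time
callgraph_percent 4 3 10
# 70.0
```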
Key Points
GProf is a performance analysis tool for Unix-like operating systems.
It helps identify performance bottlenecks in code by analyzing function calls and execution times.
To use GProf with a C++ application, compile and link the code with the -pg flag.
Run the application to generate a gmon.out file.
Analyze the data using the gprof command.
The Flat Profile section summarizes the time spent in each function.
The Call Graph section provides a detailed view of function calls and their relationships.