GProf
==========================

.. admonition:: Overview
    :class: Overview

    * **Tutorial:** 45 min

    **Objectives:**

    #. Understand the basics of GProf and how to use it for profiling applications.

GProf (GNU Profiler) is a performance analysis tool for Unix-like operating systems. It helps developers identify which parts of their code consume the most execution time so that they can optimize performance-critical sections. GProf works by counting function calls through compiler-inserted instrumentation and by sampling the program counter at regular intervals while the program runs.

First, start an interactive job on the cluster:

.. code-block:: bash
    :linenos:

    qsub -I -q normal -P vp91 -l walltime=02:00:00 -l ncpus=48 -l mem=192GB -l wd

Then load the modules needed to compile and run the application with gprof enabled.

.. code-block:: bash
    :linenos:

    module load gcc/10.3.0
    module load openmpi/4.1.7
    module load cuda/12.9.0
    module load papi/7.1.0

In this tutorial, we use the **LULESH** application as an example to demonstrate profiling with gprof.

First, set the environment variables for the source and build directories:

.. code-block:: bash
    :linenos:

    export SRCDIR=/scratch/vp91/$USER/LULESH
    export INSTALLDIR=/scratch/vp91/$USER/LULESH/build

where

* ``SRCDIR`` is the directory that contains the source code of the application (the git repository)
* ``INSTALLDIR`` is the directory where the application will be built and installed

Then compile the application with gprof enabled using CMake. Use the following command to configure the build:

.. code-block:: bash
    :linenos:

    cmake -G Ninja \
        -DCMAKE_BUILD_TYPE=Debug \
        -DMPI_CXX_COMPILER="$(which mpicxx)" \
        -DWITH_MPI=On \
        -DWITH_OPENMP=On \
        -DCMAKE_INSTALL_PREFIX="$INSTALLDIR" \
        -DCMAKE_CXX_FLAGS="-pg" \
        "$SRCDIR"

* ``-DCMAKE_CXX_FLAGS="-pg"`` adds the ``-pg`` flag, which instruments the code for profiling with gprof.

After configuring, build the application (for example, with ``cmake --build .`` from the build directory) so that the ``lulesh2.0`` executable exists.

Now run the application to generate the profiling data:

.. code-block:: bash
    :linenos:

    ./lulesh2.0 -s 20

This writes a file named ``gmon.out`` to the current directory when the program exits normally.

To save the profiling data in a specific location, set the environment variable ``GMON_OUT_PREFIX`` before running the application. This variable specifies the path prefix for the output file. With glibc, the process ID is appended to the prefix, so the file is named ``<prefix>.<pid>``.

.. code-block:: bash
    :linenos:

    export GMON_OUT_PREFIX=/scratch/vp91/$USER/gmon.out
    ./lulesh2.0 -s 20

To analyze the profiling data, use the following command:

.. code-block:: bash
    :linenos:

    gprof ./lulesh2.0 gmon.out > profile.txt

This creates a file named ``profile.txt`` containing the profiling results, which can be viewed with any text editor or with command-line tools such as ``less`` or ``cat``. If ``GMON_OUT_PREFIX`` was set, pass the path of the generated file in place of ``gmon.out``.

Flat Profile
----------------

The flat profile section of the gprof output summarizes the time spent in each function, sorted by self time. It has the following columns:

1. **% time**

   * The percentage of the total program runtime that was spent inside this function (not including functions it calls).
   * Example: If your program ran for 10 seconds total and foo() spent 4 seconds in its own code, then % time = 40%.

2. **cumulative seconds**

   * A running total of the self seconds of this function and of all functions listed above it in the flat profile. It does not include time spent in functions it calls.
   * Example: If foo() is listed first with 4 seconds of self time and bar() is listed next with 3 seconds of self time, then cumulative seconds for bar() = 7 seconds.

3. **self seconds**

   * The total time spent in this function alone, excluding time spent in functions it calls.
   * Example: If foo() took 4 seconds in its own code and called bar() which took 3 seconds, then self seconds = 4 seconds.

4. **calls**

   * The number of times this function was called during the program's execution.
   * Example: If foo() was called 5 times, then calls = 5.

5. **self s/call**

   * The average time spent in this function per call, calculated as self seconds divided by calls.
   * Example: If foo() took 4 seconds in its own code and was called 5 times, then self s/call = 0.8 s/call. When the values are small, gprof may label this column ``self ms/call`` and report milliseconds instead (800 ms/call here).

6. **total s/call**

   * The average time spent in this function and all functions it calls per call, calculated as (self seconds + time in called functions) divided by calls.
   * Example: If foo() took 7 seconds (including calls to bar()) and was called 5 times, then total s/call = 1.4 s/call.

7. **name**

   * The name of the function being profiled.
   * Example: If the function is named foo(), then name = foo().

Call Graph
----------------

The call graph section of the gprof output shows, for each function, which functions called it and which functions it called. Each entry has the following columns:

1. **index**

   * A unique number, shown in square brackets, that identifies each function in the call graph.
   * Example: If foo() is the first function listed, it might have index = [1].

2. **% time**

   * The percentage of the total program runtime that was spent in this function and in the functions it calls. Unlike the flat profile, this value includes the children.
   * Example: If your program ran for 10 seconds total, foo() spent 4 seconds in its own code, and the functions it called took 3 seconds, then % time = 70%.

3. **self**

   * The total time spent in this function alone, excluding time spent in functions it calls.
   * Example: If foo() took 4 seconds in its own code and called bar() which took 3 seconds, then self = 4 seconds.

4. **children**

   * The total time spent in functions called by this function.
   * Example: If foo() called bar() which took 3 seconds, then children = 3 seconds.

5. **called**

   * The number of times this function was called during the program's execution.
   * Example: If foo() was called 5 times, then called = 5.
   * On the parent and child lines of an entry, this column shows two numbers separated by a slash: the number of calls along that particular caller/callee edge and the total number of calls to the function. For example, 3/5 on a parent line means that this parent made 3 of the 5 calls.

6. **name**

   * The name of the function being profiled.
   * Example: If the function is named foo(), then name = foo().

.. admonition:: Key Points
    :class: hint

    #. GProf is a performance analysis tool for Unix-like operating systems.
    #. It helps identify performance bottlenecks in code by analyzing function calls and execution times.
    #. To use GProf with a C++ application, compile the code with the ``-pg`` flag.
    #. Run the application to generate a ``gmon.out`` file.
    #. Analyze the data using the ``gprof`` command.
    #. The Flat Profile section summarizes the time spent in each function.
    #. The Call Graph section provides a detailed view of function calls and their relationships.
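
The per-call and percentage columns can be checked by hand. The following is a minimal sketch, not real gprof output: it reuses the illustrative foo()/bar() numbers from the examples in this tutorial (10 seconds total runtime, 4 seconds of self time in foo(), 3 seconds in its children, 5 calls). The helper name ``derive_columns`` is hypothetical and is introduced here only for illustration. The call graph percentage follows gprof's documented definition, which counts the children's time as well.

.. code-block:: bash
    :linenos:

    #!/usr/bin/env bash
    # Hypothetical helper: derive gprof-style columns from raw timings.
    # Arguments: total runtime (s), self time (s), children time (s), calls.
    derive_columns() {
        awk -v total="$1" -v self="$2" -v children="$3" -v calls="$4" 'BEGIN {
            # Flat profile: share of runtime spent in the function itself.
            printf "flat %% time       = %.1f\n", 100 * self / total
            # Flat profile: average self time per call.
            printf "self s/call       = %.2f\n", self / calls
            # Flat profile: average self + children time per call.
            printf "total s/call      = %.2f\n", (self + children) / calls
            # Call graph: share of runtime in the function and its children.
            printf "call graph %% time = %.1f\n", 100 * (self + children) / total
        }'
    }

    # Illustrative values for foo() taken from the examples in this tutorial.
    derive_columns 10 4 3 5

With these values the script reports 40.0 and 70.0 for the two percentages, and 0.80 and 1.40 for the two per-call times.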