The options used for basic profile-guided optimization (PGO) are:
-prof_gen for generating instrumented code
-prof_use for generating a profile-optimized executable
If your code's behavior differs greatly between executions, make sure that the benefit of the profile information is worth the effort required to keep the profiles up to date. In basic profile-guided optimization, the following options are used in the phases of PGO:
The -prof_gen option instruments the program for profiling to get the execution count of each basic block. It is used in phase 1 of the PGO to instruct the compiler to produce instrumented code in your object files in preparation for instrumented execution. Parallel make is automatically supported for -prof_gen compilations.
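Conceptually, the instrumentation that -prof_gen adds amounts to a counter per basic block that is incremented each time that block executes. The hand-written C sketch below models only that idea. The counter array `bb_count`, the block numbering, and the function `clamp_to_byte` are hypothetical illustrations, not what the compiler actually emits.

```c
#include <stddef.h>

/* Hypothetical block numbering for clamp_to_byte(). -prof_gen inserts its
   own counters automatically; this only models the idea by hand. */
enum { BB_ENTRY, BB_NEGATIVE, BB_TOO_LARGE, BB_IN_RANGE, BB_COUNT };

static unsigned long bb_count[BB_COUNT];

int clamp_to_byte(int x)
{
    bb_count[BB_ENTRY]++;              /* entry block: runs on every call */
    if (x < 0) {
        bb_count[BB_NEGATIVE]++;       /* runs only for negative inputs */
        return 0;
    }
    if (x > 255) {
        bb_count[BB_TOO_LARGE]++;      /* runs only for inputs above 255 */
        return 255;
    }
    bb_count[BB_IN_RANGE]++;           /* runs for inputs in 0..255 */
    return x;
}

/* Clear all counters, e.g. before modeling a fresh run. */
void reset_counts(void)
{
    for (size_t i = 0; i < BB_COUNT; i++)
        bb_count[i] = 0;
}
```

With a workload of mostly in-range values, these counts would mark BB_IN_RANGE as hot and the two clamping blocks as cold. That is the kind of information the compiler later uses in phase 3.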
The -prof_use option is used in phase 3 of the PGO to instruct the compiler to produce a profile-optimized executable. It also merges the available dynamic-information (.dyn) files into a pgopti.dpi file.
The dynamic-information files are produced in phase 2 when you run the instrumented executable.
If you perform multiple executions of the instrumented program, -prof_use merges the dynamic-information files again and overwrites the previous pgopti.dpi file.
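The merge in phase 3 can be pictured as combining the per-block counts from every instrumented run into a single profile. The C sketch below is a minimal model under one assumption: merging simply sums the counts block by block. The `run_profile` type and the `merge_runs` function are hypothetical. The real .dyn and pgopti.dpi formats and merge rules are internal to the compiler. The command lines in the leading comment are also assumptions, including the driver name `icc`, the file name `app.c`, and the input files.

```c
/* Assumed three-phase workflow (driver name "icc", file name app.c, and
   the input files are assumptions for illustration):
     phase 1:  icc -prof_gen app.c -o app        instrumented build
     phase 2:  ./app < input1; ./app < input2    each run writes a .dyn file
     phase 3:  icc -prof_use app.c -o app        merges .dyn files into pgopti.dpi
   The code below models only the merge step of phase 3. */
#include <stddef.h>

#define NUM_BLOCKS 4 /* hypothetical number of instrumented basic blocks */

/* Block counts recorded by one run (a stand-in for one .dyn file). */
typedef struct {
    unsigned long count[NUM_BLOCKS];
} run_profile;

/* Assumed merge policy: sum each block's count over all runs.
   *merged (a stand-in for pgopti.dpi) is overwritten, not accumulated,
   mirroring how -prof_use replaces the previous pgopti.dpi. */
void merge_runs(const run_profile *runs, size_t n_runs, run_profile *merged)
{
    for (size_t b = 0; b < NUM_BLOCKS; b++) {
        unsigned long total = 0;
        for (size_t r = 0; r < n_runs; r++)
            total += runs[r].count[b];
        merged->count[b] = total;
    }
}
```

In this model, every merge recomputes the profile from all available runs. An additional instrumented run is therefore reflected the next time -prof_use performs the merge.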
-fnsplit- disables function splitting. Function splitting is enabled by -prof_use in phase 3 to improve code locality by splitting routines into different sections: one section to contain the cold or very infrequently executed code and one section to contain the rest of the code (hot code).
You can use -fnsplit- to disable function splitting for the following reasons:
Most importantly, to improve debugging capability: in the debug symbol table, it is difficult to represent a split routine, that is, a routine with some of its code in the hot code section and the rest in the cold code section.
The -fnsplit- option disables the splitting within a routine but enables function grouping, an optimization in which entire routines are placed either in the cold code section or the hot code section. Function grouping does not degrade debugging capability.
Another reason arises when the profile data does not represent the actual program behavior, that is, when a routine that the profile shows as infrequently executed is actually used frequently.
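The hot/cold distinction behind function splitting and function grouping can be illustrated with ordinary C. For the hypothetical routines below, assume the profile shows the following:

- `digit_sum` is called often, but its error branch almost never runs.
- `report_bad_char` is rarely called at all.

With function splitting, the compiler could move the error branch of `digit_sum` into the cold section. With -fnsplit-, `digit_sum` stays in one piece, while function grouping could still place all of `report_bad_char` in the cold section. Actual placement depends on the real profile data. The command in the comment, including the driver name `icc`, is an assumption.

```c
/* Assumed phase 3 build with splitting disabled:
     icc -prof_use -fnsplit- app.c -o app */
#include <stdio.h>

/* Hypothetical cold routine: assumed to be called rarely, so function
   grouping could place the whole routine in the cold code section. */
static void report_bad_char(char c, size_t pos)
{
    fprintf(stderr, "unexpected character '%c' at position %zu\n", c, pos);
}

/* Hypothetical hot routine: sums the decimal digits in s.
   Returns 0 and stores the sum in *out, or returns -1 (leaving *out
   unchanged) if s contains a non-digit character. */
int digit_sum(const char *s, long *out)
{
    long total = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] < '0' || s[i] > '9') {
            report_bad_char(s[i], i);   /* cold path: a splitting candidate */
            return -1;
        }
        total += s[i] - '0';            /* hot path: the loop body */
    }
    *out = total;
    return 0;
}
```

If the profile were wrong and bad input were common in production, splitting would place frequently executed code in the cold section. That is the second reason given above for using -fnsplit-.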
Note
For Itanium®-based applications, if you intend to use the -prof_use option with optimizations at the -O3 level, the -O3 option must be on. If you intend to use the -prof_use option with optimizations at the -O2 level or lower, you can generate the profile data with the default options.