Debugging Multithreaded Programs

The debugging techniques for multithreaded programs discussed in this section apply to both the OpenMP Fortran API and the Intel Fortran parallel compiler directives. When a program uses parallel decomposition directives, a bug might be caused either by an incorrect program statement or by an incorrect parallel decomposition directive. In either case, the program being debugged can be executed by multiple threads simultaneously.

To debug multithreaded programs, you can use:

Intel Debugger for IA-32 and Intel Debugger for Itanium-based applications (idb)
Intel Fortran Compiler debugging options and methods; in particular, Compiling Source Lines with Debugging Statements.
Intel parallelization extension routines for low-level debugging.
VTune(TM) Performance Analyzer to identify the problem areas.

Other well-known debugging methods and tips include:

Correct the program in a single-threaded, uniprocessor environment
Statically analyze locks
Use trace statements (such as print statements)
Think in parallel, make very few assumptions
Step through your code
Make sense of threads and call stack information
Identify the primary thread
Know what thread you are debugging
Single stepping in one thread does not mean single stepping in others
Watch out for context switches
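
The trace-statement tip in the list above can be sketched as follows. This is a minimal illustration rather than code from the compiler documentation: the program name trace_demo and the message text are assumptions. Each thread prints its own thread number from inside a parallel region, which also helps with knowing which thread produced a given line of output.

```fortran
! Minimal sketch: trace statements inside an OpenMP parallel region.
! The program name and the message text are illustrative assumptions.
program trace_demo
  use omp_lib              ! standard OpenMP module (omp_get_thread_num, omp_get_num_threads)
  implicit none
  integer :: id, nthreads

!$omp parallel private(id, nthreads)
  id = omp_get_thread_num()
  nthreads = omp_get_num_threads()
  ! Serialize the output so that lines from different threads do not interleave.
!$omp critical
  print '(a,i0,a,i0)', 'trace: thread ', id, ' of ', nthreads
!$omp end critical
!$omp end parallel
end program trace_demo
```

The program must be compiled with OpenMP enabled (for example, with the -openmp option used elsewhere in this section). The order of the trace lines can change from run to run, because it depends on how the threads are scheduled.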

Debugger Limitations for Multithreaded Programs

Debuggers such as Intel Debugger for IA-32 and Intel Debugger for Itanium-based applications support the debugging of programs that are executed by multiple threads. However, the currently available versions of such debuggers do not directly support the debugging of parallel decomposition directives, and therefore, there are limitations on the debugging features.

Some of the new features used in OpenMP are not yet fully supported by the debuggers, so it is important to understand how these features work to know how to debug them. The two problem areas are:

Multiple entry points
Shared variables

You can use routine names (for example, padd) and entry names (for example, _PADD, ___PADD_6__par_loop0). By default, the Fortran compiler first converts lowercase and mixed-case routine names to uppercase. For example, pAdD() becomes PADD(), and prefixing one underscore gives the entry name _PADD. The secondary entry name is constructed after that, which is why the "__par_loop" part of the entry name stays in lowercase. The debugger did not accept the uppercase routine name "PADD" for setting a breakpoint; it accepted the lowercase routine name "padd" instead.

Debugging Parallel Regions

The compiler implements a parallel region by enabling the code in the region and putting it into a separate, compiler-created entry point. Although this differs from outlining (the technique employed by other compilers, which creates a separate subroutine), the same debugging technique can be applied.

Constructing an Entry-point Name

The compiler-generated parallel region entry point name is constructed by concatenating the following strings, in order:

"__" (two underscore characters)
entry point name for the original routine (for example, _parallel)
"_" character
line number of the parallel region
one of the following, depending on the directive:
    __par_region for OpenMP parallel regions (!$OMP PARALLEL)
    __par_loop for OpenMP parallel loops (!$OMP PARALLEL DO)
    __par_section for OpenMP parallel sections (!$OMP PARALLEL SECTIONS)
sequence number of the parallel region (for each source file, the sequence number starts from zero)
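
The concatenation rule above can be sketched as a small helper that assembles a name from its parts. The module, function, and argument names are hypothetical and exist only for this illustration; the compiler does not provide such a routine. The sketch uses a deferred-length character result, so it assumes a Fortran 2003 or later compiler.

```fortran
! Sketch: build a parallel-region entry point name from its parts,
! following the concatenation rule listed above.
! Module, function, and argument names are illustrative assumptions.
module entry_name_sketch
  implicit none
contains
  function region_entry_name(routine_entry, line, suffix, seq) result(name)
    character(len=*), intent(in) :: routine_entry  ! for example "_parallel"
    integer, intent(in)          :: line           ! line number of the directive
    character(len=*), intent(in) :: suffix         ! "__par_region", "__par_loop", or "__par_section"
    integer, intent(in)          :: seq            ! per-file sequence number, starting at 0
    character(len=:), allocatable :: name
    character(len=16) :: line_str, seq_str

    ! Convert the two integers to text without leading blanks.
    write (line_str, '(i0)') line
    write (seq_str,  '(i0)') seq

    name = '__' // routine_entry // '_' // trim(line_str) // suffix // trim(seq_str)
  end function region_entry_name
end module entry_name_sketch

program show_entry_name
  use entry_name_sketch
  implicit none
  ! Routine entry "_parallel", region at line 3, first region in the file.
  print '(a)', region_entry_name('_parallel', 3, '__par_region', 0)
end program show_entry_name
```

With these inputs the program prints ___parallel_3__par_region0. A helper like this can be handy for working out the name to give a debugger when setting a breakpoint on a parallel region, provided the inputs match what the compiler actually used on the target platform.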

Debugging Code with Parallel Region

Example 1 illustrates debugging code that contains a parallel region. The machine code listing in Example 1 is produced by this command:

ifc -openmp -g -O0 -S file.f90

Consider the code of subroutine parallel in Example 1.

Subroutine PARALLEL() source listing

1 subroutine parallel
2 integer id,OMP_GET_THREAD_NUM
3 !$OMP PARALLEL PRIVATE(id)
4 id = OMP_GET_THREAD_NUM()
5 !$OMP END PARALLEL
6 end

The parallel region is at line 3. The compiler created two entry points: parallel_ and ___parallel_3__par_region0. The first entry point corresponds to the subroutine parallel(), while the second entry point corresponds to the OpenMP parallel region at line 3.

Example 1 Debugging Code with Parallel Region

Machine Code Listing of the Subroutine parallel()

        .globl parallel_
parallel_:
..B1.1:                    # Preds ..B1.0
..LN1:
pushl     %ebp                                    #1.0
movl      %esp, %ebp                              #1.0
subl      $44, %esp                               #1.0
pushl     %edi                                    #1.0
movl      $.2.1_2_kmpc_loc_struct_pack.0, (%esp) #1.0
call      __kmpc_global_thread_num                #1.0
                     # LOE eax
..B1.21:                   # Preds ..B1.1
addl      $4, %esp                                #1.0
movl      %eax, -44(%ebp)                         #1.0
                     # LOE
..B1.2:                   # Preds ..B1.21
movl      -44(%ebp), %eax                        #1.0
movl      %eax, -24(%ebp)                        #1.0
..LN2:
pushl     %edi                                   #3.0
movl      $.2.1_2_kmpc_loc_struct_pack.1, (%esp) #3.0
call      __kmpc_ok_to_fork                      #3.0
                     # LOE eax
..B1.22:                  # Preds ..B1.2
addl      $4, %esp                               #3.0
movl      %eax, -40(%ebp)                        #3.0
                     # LOE
..B1.3:                   # Preds ..B1.22
movl      -40(%ebp), %eax                        #3.0
testl     %eax, %eax                             #3.0
jne       ..B1.7     # Prob 50%                  #3.0
                     # LOE
..B1.4:                   # Preds ..B1.3
addl      $-8, %esp                              #3.0
movl      $.2.1_2_kmpc_loc_struct_pack.1, (%esp) #3.0
movl      -24(%ebp), %eax                        #3.0
movl      %eax, 4(%esp)                          #3.0
call      __kmpc_serialized_parallel             #3.0
                     # LOE
..B1.23:                  # Preds ..B1.4
addl      $8, %esp                               #3.0
                     # LOE
..B1.5:                   # Preds ..B1.23
addl      $-8, %esp                              #3.0
lea       -24(%ebp), %eax                        #3.0
movl      %eax, (%esp)                           #3.0
movl      $___kmpv_zeroparallel__0, 4(%esp)      #3.0
call      _parallel__3__par_region0              #3.0
                      # LOE
..B1.24:                   # Preds ..B1.5
addl      $8, %esp                               #3.0
                      # LOE
..B1.6:                    # Preds ..B1.24
addl      $-8, %esp                              #3.0
movl      $.2.1_2_kmpc_loc_struct_pack.1, (%esp) #3.0
movl      -24(%ebp), %eax                        #3.0
movl      %eax, 4(%esp)                          #3.0
call      __kmpc_end_serialized_parallel         #3.0
                       # LOE
..B1.25:                    # Preds ..B1.6
addl      $8, %esp                               #3.0
jmp       ..B1.8       # Prob 100%               #3.0
                       # LOE
..B1.7:                     # Preds ..B1.3
addl      $-12, %esp                             #3.0
movl      $.2.1_2_kmpc_loc_struct_pack.1, (%esp) #3.0
movl      $0, 4(%esp)                            #3.0
movl      $_parallel__3__par_region0, 8(%esp)    #3.0
call      __kmpc_fork_call                       #3.0
                       # LOE
..B1.26:                    # Preds ..B1.7
addl      $12, %esp                              #3.0
                       # LOE
..B1.8:                     # Preds ..B1.26 ..B1.25
..LN3:
pushl     %edi                                   #6.0
movl      $.2.1_2_kmpc_loc_struct_pack.2, (%esp) #6.0
call      __kmpc_ok_to_fork                      #6.0
                       # LOE eax
..B1.27:                    # Preds ..B1.8
addl      $4, %esp                               #6.0
movl      %eax, -36(%ebp)                        #6.0
                       # LOE
..B1.9:                     # Preds ..B1.27
movl      -36(%ebp), %eax                        #6.0
testl     %eax, %eax                             #6.0
jne       ..B1.13      # Prob 50%                #6.0
                       # LOE
..B1.10:                    # Preds ..B1.9
addl      $-8, %esp                              #6.0
movl      $.2.1_2_kmpc_loc_struct_pack.2, (%esp) #6.0
movl      -24(%ebp), %eax                        #6.0
movl      %eax, 4(%esp)                          #6.0
call      __kmpc_serialized_parallel             #6.0
                       # LOE
..B1.28:                    # Preds ..B1.10
addl      $8, %esp                               #6.0
                       # LOE
..B1.11:                    # Preds ..B1.28
addl      $-8, %esp                              #6.0
lea       -24(%ebp), %eax                        #6.0
movl      %eax, (%esp)                           #6.0
movl      $___kmpv_zeroparallel__1, 4(%esp)      #6.0
call      _parallel__6__par_region1              #6.0
                       # LOE
..B1.29:                    # Preds ..B1.11
addl      $8, %esp                               #6.0
                       # LOE
..B1.12:                    # Preds ..B1.29
addl      $-8, %esp                              #6.0
movl      $.2.1_2_kmpc_loc_struct_pack.2, (%esp) #6.0
movl      -24(%ebp), %eax                        #6.0
movl      %eax, 4(%esp)                          #6.0
call      __kmpc_end_serialized_parallel         #6.0
                       # LOE
..B1.30:                    # Preds ..B1.12
addl      $8, %esp                               #6.0
jmp       ..B1.14      # Prob 100%               #6.0
                       # LOE
..B1.13:                    # Preds ..B1.9
addl      $-12, %esp                             #6.0
movl      $.2.1_2_kmpc_loc_struct_pack.2, (%esp) #6.0
movl      $0, 4(%esp)                            #6.0
movl      $_parallel__6__par_region1, 8(%esp)    #6.0
call      __kmpc_fork_call                       #6.0
                       # LOE
..B1.31:                    # Preds ..B1.13
addl      $12, %esp                              #6.0
                       # LOE
..B1.14:                    # Preds ..B1.31 ..B1.30
..LN4:
leave                                            #9.0
ret                                              #9.0
                       # LOE
.type parallel_,@function
.size parallel_,.-parallel_
.globl _parallel__3__par_region0
_parallel__3__par_region0:
# parameter 1: 8 + %ebp
# parameter 2: 12 + %ebp
..B1.15:                    # Preds ..B1.0
pushl     %ebp                                   #9.0
movl      %esp, %ebp                             #9.0
subl      $44, %esp                              #9.0
..LN5:
call      omp_get_thread_num_                    #4.0
                       # LOE eax
..B1.32:                    # Preds ..B1.15
movl      %eax, -32(%ebp)                        #4.0
                       # LOE
..B1.16:                    # Preds ..B1.32
movl      -32(%ebp), %eax                        #4.0
movl      %eax, -20(%ebp)                        #4.0
..LN6:
leave                                            #9.0
ret                                              #9.0
                       # LOE
.type _parallel__3__par_region0,@function
.size _parallel__3__par_region0,.-_parallel__3__par_region0
.globl _parallel__6__par_region1
_parallel__6__par_region1:
# parameter 1: 8 + %ebp
# parameter 2: 12 + %ebp
..B1.17:                     # Preds ..B1.0
pushl     %ebp                                   #9.0
movl      %esp, %ebp                             #9.0
subl      $44, %esp                              #9.0
..LN7:
call      omp_get_thread_num_                    #7.0
                       # LOE eax
..B1.33:                    # Preds ..B1.17
movl      %eax, -28(%ebp)                        #7.0
                       # LOE
..B1.18:                    # Preds ..B1.33
movl      -28(%ebp), %eax                        #7.0
movl      %eax, -16(%ebp)                        #7.0
..LN8:
leave                                            #9.0
ret                                              #9.0
.align    4,0x90
# mark_end;

Debugging the program at this level is just like debugging a program that uses POSIX threads directly. Breakpoints can be set in the threaded code just as in any other routine. With the GNU debugger, breakpoints can be set on source-level routine names (such as parallel) and on entry point names (such as parallel_ and _parallel__3__par_region0). Note that the Intel Fortran Compiler for Linux converted the uppercase Fortran subroutine name to lowercase.

Debugging Multiple Threads

When in a debugger, you can switch from one thread to another. Each thread has its own program counter, so each thread can be at a different place in the code. Example 2 shows a Fortran subroutine PADD(). A breakpoint can be set at the entry point of the OpenMP parallel region.

Source listing of the Subroutine PADD()

      SUBROUTINE PADD(A, B, C, N)
      INTEGER N
      INTEGER A(N), B(N), C(N)
      INTEGER I, ID, OMP_GET_THREAD_NUM
!$OMP PARALLEL DO SHARED (A, B, C, N) PRIVATE(ID)
      DO I = 1, N
        ID = OMP_GET_THREAD_NUM()
        C(I) = A(I) + B(I) + ID
      ENDDO
!$OMP END PARALLEL DO
      END
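
To step through PADD in a debugger, the subroutine needs a caller. The following is a minimal sketch of such a driver, not the original test program: the array size, the initial values, and the output format are assumptions, and the program name parallel is chosen only to match the main program named in the call stack description. PADD is repeated in free form, with the standard omp_lib module in place of the INTEGER declaration of OMP_GET_THREAD_NUM, so that the file compiles on its own.

```fortran
! Sketch of a driver for stepping through PADD in a debugger.
! Assumptions: n = 8, the initial array values, and the output format.
program parallel
  implicit none
  integer, parameter :: n = 8
  integer :: a(n), b(n), c(n), i

  do i = 1, n
    a(i) = i
    b(i) = 10 * i
  end do

  call padd(a, b, c, n)      ! a convenient place for a first breakpoint

  ! c(i) = a(i) + b(i) + id, so the values depend on which thread ran iteration i.
  print '(8(i0,1x))', c
end program parallel

! PADD from the listing above, restated in free form.
subroutine padd(a, b, c, n)
  use omp_lib                ! standard OpenMP module providing omp_get_thread_num
  implicit none
  integer :: n
  integer :: a(n), b(n), c(n)
  integer :: i, id

!$omp parallel do shared(a, b, c, n) private(id)
  do i = 1, n
    id = omp_get_thread_num()
    c(i) = a(i) + b(i) + id
  end do
!$omp end parallel do
end subroutine padd
```

Compile with OpenMP and debugging information enabled (for example, with options such as -openmp -g -O0, as in the command shown for Example 1). Because the thread number is added to each element, the printed values vary with the number of threads and the way the iterations are divided among them.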

The Call Stack Dumps

The first call stack below is obtained by breaking at the entry to subroutine PADD using the GNU debugger. At this point, the program has not executed any OpenMP regions and therefore has only one thread. The call stack shows the system runtime function __libc_start_main calling the Fortran main program parallel(), which in turn calls subroutine padd().

When the program is executed by more than one thread, you can switch from one thread to another. The second and third call stacks are obtained by breaking at the entry to the parallel region. The call stack of the master thread contains the complete call sequence, with _padd__6__par_loop0() at the top. Invoking a threaded entry point involves a layer of Intel OpenMP library function calls (that is, functions with the __kmp prefix). The call stack of the worker thread contains a partial call sequence that begins with a layer of Intel OpenMP library function calls.

ERRATA: The GNU debugger sometimes fails to properly unwind the call stack of the immediate caller of the Intel OpenMP library function __kmpc_fork_call().

Call Stack Dump of Master Thread upon Entry to Subroutine PADD

Switching from One Thread to Another

Call Stack Dump of Master Thread upon Entry to Parallel Region

Call Stack Dump of Worker Thread upon Entry to Parallel Region

Example 2 Debugging Code Using Multiple Threads with Shared Variables

Subroutine PADD() Machine Code Listing

     .globl padd_
padd_:
# parameter 1: 8 + %ebp
# parameter 2: 12 + %ebp
# parameter 3: 16 + %ebp
# parameter 4(n): 20 + %ebp
..B1.1:                      # Preds ..B1.0
..LN1:
pushl     %ebp                                   #1.0
movl      %esp, %ebp                             #1.0
subl      $208, %esp                             #1.0
movl      %ebx, -4(%ebp)                         #1.0
pushl     %edi                                   #1.0
movl      $.2.1_2_kmpc_loc_struct_pack.0, (%esp) #1.0
call      __kmpc_global_thread_num               #1.0
                        # LOE eax
..B1.34:                     # Preds ..B1.1
addl      $4, %esp                               #1.0
movl      %eax, -28(%ebp)                        #1.0
                        # LOE
..B1.2:                      # Preds ..B1.34
movl      -28(%ebp), %eax                        #1.0
movl      %eax, -208(%ebp)                       #1.0
movl      $4, %eax                               #1.0
movl      %eax, -184(%ebp)                       #1.0
movl      %eax, -188(%ebp)                       #1.0
movl      20(%ebp), %eax                         #1.0
movl      (%eax), %eax                           #1.0
movl      %eax, -24(%ebp)                        #1.0
testl     %eax, %eax                             #1.0
jg        ..B1.5        # Prob 50%               #1.0
                       # LOE
..B1.3:                     # Preds ..B1.2
  movl      $0, -24(%ebp)                       #1.0
                       # LOE
..B1.5:                     # Preds ..B1.2 ..B1.3
movl      -24(%ebp), %eax                        #1.0
movl      %eax, -164(%ebp)                       #1.0
movl      $1, %eax                               #1.0
movl      %eax, -176(%ebp)                       #1.0
movl      %eax, -168(%ebp)                       #1.0
movl      20(%ebp), %edx                         #1.0
movl      (%edx), %edx                           #1.0
movl      %edx, -172(%ebp)                       #1.0
movl      -164(%ebp), %edx                       #1.0
movl      %edx, -192(%ebp)                       #1.0
movl      8(%ebp), %edx                          #1.0
movl      %edx, -196(%ebp)                       #1.0
movl      $4, -204(%ebp)                         #1.0
movl      -204(%ebp), %edx                       #1.0
negl      %edx                                   #1.0
addl      -196(%ebp), %edx                       #1.0
movl      %edx, -200(%ebp)                       #1.0
movl      %eax, -180(%ebp)                       #1.0
movl      -192(%ebp), %eax                       #1.0
testl     %eax, %eax                             #1.0
jg        ..B1.8        # Prob 50%               #1.0
                       # LOE
..B1.6:                     # Preds ..B1.5
movl      -172(%ebp), %eax                       #1.0
testl     %eax, %eax                             #1.0
jg        ..B1.8        # Prob 50%               #1.0
                       # LOE
..B1.7:                     # Preds ..B1.6
movl      $0, -172(%ebp)                         #1.0
                       # LOE
..B1.8:                     # Preds ..B1.6 ..B1.7 ..B1.5
movl      $4, %eax                               #1.0
movl      %eax, -140(%ebp)                       #1.0
movl      %eax, -144(%ebp)                       #1.0
movl      $1, %edx                               #1.0
movl      %edx, -132(%ebp)                       #1.0
movl      %edx, -124(%ebp)                       #1.0
movl      20(%ebp), %ecx                         #1.0
movl      (%ecx), %ecx                           #1.0
movl      %ecx, -128(%ebp)                       #1.0
movl      -164(%ebp), %ecx                       #1.0
movl      %ecx, -148(%ebp)                       #1.0
movl      12(%ebp), %ecx                         #1.0
movl      %ecx, -152(%ebp)                       #1.0
movl      %eax, -160(%ebp)                       #1.0
movl      -160(%ebp), %eax                       #1.0
negl      %eax                                   #1.0
addl      -152(%ebp), %eax                       #1.0
movl      %eax, -156(%ebp)                       #1.0
movl      %edx, -136(%ebp)                       #1.0
movl      -148(%ebp), %eax                       #1.0
testl     %eax, %eax                             #1.0
jg        ..B1.11       # Prob 50%               #1.0
                       # LOE
..B1.9:                     # Preds ..B1.8
movl      -128(%ebp), %eax                       #1.0
testl     %eax, %eax                             #1.0
jg        ..B1.11       # Prob 50%               #1.0
                       # LOE
..B1.10:                    # Preds ..B1.9
movl      $0, -128(%ebp)                         #1.0
                       # LOE
..B1.11:                    # Preds ..B1.9 ..B1.10 ..B1.8
movl      $4, %eax                               #1.0
movl      %eax, -100(%ebp)                       #1.0
movl      %eax, -104(%ebp)                       #1.0
movl      $1, %edx                               #1.0
movl      %edx, -92(%ebp)                        #1.0
movl      %edx, -84(%ebp)                        #1.0
movl      20(%ebp), %ecx                         #1.0
movl      (%ecx), %ecx                           #1.0
movl      %ecx, -88(%ebp)                        #1.0
movl      -164(%ebp), %ecx                       #1.0
movl      %ecx, -108(%ebp)                       #1.0
movl      16(%ebp), %ecx                         #1.0
movl      %ecx, -112(%ebp)                       #1.0
movl      %eax, -120(%ebp)                       #1.0
movl      -120(%ebp), %eax                       #1.0
negl      %eax                                   #1.0
addl      -112(%ebp), %eax                       #1.0
movl      %eax, -116(%ebp)                       #1.0
movl      %edx, -96(%ebp)                        #1.0
movl      -108(%ebp), %eax                       #1.0
testl     %eax, %eax                             #1.0
jg        ..B1.14       # Prob 50%               #1.0
                       # LOE
..B1.12:                   # Preds ..B1.11
movl      -88(%ebp), %eax                        #1.0
testl     %eax, %eax                             #1.0
jg        ..B1.14       # Prob 50%               #1.0
                       # LOE
..B1.13:                   # Preds ..B1.12
movl      $0, -88(%ebp)                          #1.0
                       # LOE
..B1.14:                   # Preds ..B1.12 ..B1.13 ..B1.11
..LN2:
pushl     %edi                                   #6.0
movl      $.2.1_2_kmpc_loc_struct_pack.1, (%esp) #6.0
call      __kmpc_ok_to_fork                      #6.0
                        # LOE eax
..B1.35:                    # Preds ..B1.14
addl      $4, %esp                               #6.0
movl      %eax, -20(%ebp)                        #6.0
                        # LOE
..B1.15:                    # Preds ..B1.35
movl      -20(%ebp), %eax                        #6.0
testl     %eax, %eax                             #6.0
jne       ..B1.19        # Prob 50%              #6.0
                        # LOE
..B1.16:                    # Preds ..B1.15
addl      $-8, %esp                              #6.0
movl      $.2.1_2_kmpc_loc_struct_pack.1, (%esp) #6.0
movl      -208(%ebp), %eax                       #6.0
movl      %eax, 4(%esp)                          #6.0
call      __kmpc_serialized_parallel             #6.0
                         # LOE
..B1.36:                     # Preds ..B1.16
addl      $8, %esp                               #6.0
                         # LOE
..B1.17:                     # Preds ..B1.36
addl      $-24, %esp                             #6.0
lea       -208(%ebp), %eax                       #6.0
movl      %eax, (%esp)                           #6.0
movl      $___kmpv_zeropadd__0, 4(%esp)          #6.0
movl      -196(%ebp), %eax                       #6.0
movl      %eax, 8(%esp)                          #6.0
movl      -152(%ebp), %eax                       #6.0
movl      %eax, 12(%esp)                         #6.0
movl      -112(%ebp), %eax                       #6.0
movl      %eax, 16(%esp)                         #6.0
lea       20(%ebp), %eax                         #6.0
movl      %eax, 20(%esp)                         #6.0
call      _padd__6__par_loop0                    #6.0
                        # LOE
..B1.37:                    # Preds ..B1.17
addl      $24, %esp                              #6.0
                        # LOE
..B1.18:                    # Preds ..B1.37
addl      $-8, %esp                              #6.0
movl      $.2.1_2_kmpc_loc_struct_pack.1, (%esp) #6.0
movl      -208(%ebp), %eax                       #6.0
movl      %eax, 4(%esp)                          #6.0
call      __kmpc_end_serialized_parallel         #6.0
                        # LOE
..B1.38:                    # Preds ..B1.18
addl      $8, %esp                               #6.0
jmp       ..B1.31        # Prob 100%             #6.0
                        # LOE
..B1.19:                    # Preds ..B1.15
addl      $-28, %esp                             #6.0
movl      $.2.1_2_kmpc_loc_struct_pack.1, (%esp) #6.0
movl      $4, 4(%esp)                            #6.0
movl      $_padd__6__par_loop0, 8(%esp)          #6.0
movl      -196(%ebp), %eax                       #6.0
movl      %eax, 12(%esp)                         #6.0
movl      -152(%ebp), %eax                       #6.0
movl      %eax, 16(%esp)                         #6.0
movl      -112(%ebp), %eax                       #6.0
movl      %eax, 20(%esp)                         #6.0
lea       20(%ebp), %eax                         #6.0
movl      %eax, 24(%esp)                         #6.0
call      __kmpc_fork_call                       #6.0
                        # LOE
..B1.39:                    # Preds ..B1.19
addl      $28, %esp                              #6.0
jmp       ..B1.31        # Prob 100%             #6.0
                        # LOE
..B1.20:                    # Preds ..B1.30
movl      $1, %eax                               #6.0
movl      %eax, -72(%ebp)                        #6.0
..LN3:
movl      -80(%ebp), %edx                        #10.0
..LN4:
movl      %edx, -68(%ebp)                        #6.0
..LN5:
movl      -80(%ebp), %edx                        #10.0
..LN6:
movl      %edx, -64(%ebp)                        #6.0
movl      $0, -60(%ebp)                          #6.0
movl      %eax, -56(%ebp)                        #6.0
addl      $-36, %esp                             #6.0
movl      $.2.1_2_kmpc_loc_struct_pack.1, (%esp) #6.0
movl      -8(%ebp), %edx                         #6.0
movl      %edx, 4(%esp)                          #6.0
movl      $34, 8(%esp)                           #6.0
lea       -60(%ebp), %edx                        #6.0
movl      %edx, 12(%esp)                         #6.0
lea       -72(%ebp), %edx                        #6.0
movl      %edx, 16(%esp)                         #6.0
lea       -68(%ebp), %edx                        #6.0
movl      %edx, 20(%esp)                         #6.0
lea       -56(%ebp), %edx                        #6.0
movl      %edx, 24(%esp)                         #6.0
movl      %eax, 28(%esp)                         #6.0
movl      %eax, 32(%esp)                         #6.0
call      __kmpc_for_static_init_4               #6.0
                        # LOE
..B1.40:                    # Preds ..B1.20
addl      $36, %esp                              #6.0
                        # LOE
..B1.21:                    # Preds ..B1.40
movl      -72(%ebp), %eax                        #6.0
movl      -64(%ebp), %edx                        #6.0
cmpl      %edx, %eax                             #6.0
jg        ..B1.26        # Prob 50%              #6.0
                        # LOE
..B1.22:                    # Preds ..B1.21
movl      -68(%ebp), %eax                        #6.0
movl      -64(%ebp), %edx                        #6.0
cmpl      %edx, %eax                             #6.0
jg        ..B1.24        # Prob 50%              #6.0
                        # LOE
..B1.23:                    # Preds ..B1.22
movl      -68(%ebp), %eax                        #6.0
movl      %eax, -16(%ebp)                        #6.0
jmp       ..B1.25        # Prob 100%             #6.0
                        # LOE
..B1.24:                    # Preds ..B1.22
movl      -64(%ebp), %eax                        #6.0
movl      %eax, -16(%ebp)                        #6.0
                        # LOE
..B1.25:                    # Preds ..B1.24 ..B1.23
movl      -16(%ebp), %eax                        #6.0
movl      %eax, -68(%ebp)                        #6.0
movl      -72(%ebp), %eax                        #6.0
movl      %eax, -76(%ebp)                        #6.0
jmp       ..B1.27        # Prob 100%             #6.0
                        # LOE
..B1.26:                    # Preds ..B1.28 ..B1.21
addl      $-8, %esp                              #6.0
movl      $.2.1_2_kmpc_loc_struct_pack.1, (%esp) #6.0
movl      -8(%ebp), %eax                         #6.0
movl      %eax, 4(%esp)                          #6.0
call      __kmpc_for_static_fini                 #6.0
                        # LOE
..B1.41:                    # Preds ..B1.26
addl      $8, %esp                               #6.0
jmp       ..B1.31        # Prob 100%             #6.0
                        # LOE
..B1.27:                    # Preds ..B1.28 ..B1.25
..LN7:
call      omp_get_thread_num_                    #8.0
                         # LOE eax
..B1.42:                     # Preds ..B1.27
movl      %eax, -12(%ebp)                        #8.0
                         # LOE
..B1.28:                     # Preds ..B1.42
movl      -12(%ebp), %eax                        #8.0
movl      %eax, -52(%ebp)                        #8.0
..LN8:
movl      -76(%ebp), %eax                        #9.0
..LN9:
movl      16(%ebp), %edx                         #6.0
..LN10:
movl      -76(%ebp), %ecx                        #9.0
..LN11:
movl      20(%ebp), %ebx                         #6.0
..LN12:
movl      -4(%ebx,%ecx,4), %ecx                  #9.0
addl      -4(%edx,%eax,4), %ecx                  #9.0
addl      -52(%ebp), %ecx                        #9.0
movl      -76(%ebp), %eax                        #9.0
..LN13:
movl      24(%ebp), %edx                         #6.0
..LN14:
movl      %ecx, -4(%edx,%eax,4)                  #9.0
..LN15:
incl      -76(%ebp)                              #10.0
movl      -76(%ebp), %eax                        #10.0
movl      -68(%ebp), %edx                        #10.0
cmpl      %edx, %eax                             #10.0
jle       ..B1.27       # Prob 50%               #10.0
jmp       ..B1.26       # Prob 100%              #10.0
                       # LOE
.type padd_,@function
.size padd_,.-padd_
.globl _padd__6__par_loop0
_padd__6__par_loop0:
# parameter 1: 8 + %ebp
# parameter 2: 12 + %ebp
# parameter 3: 16 + %ebp
# parameter 4: 20 + %ebp
# parameter 5: 24 + %ebp
# parameter 6: 28 + %ebp
..B1.30:                     # Preds ..B1.0
..LN16:
pushl     %ebp                                   #13.0
movl      %esp, %ebp                             #13.0
subl      $208, %esp                             #13.0
movl      %ebx, -4(%ebp)                         #13.0
..LN17:
movl      8(%ebp), %eax                          #6.0
movl      (%eax), %eax                           #6.0
movl      %eax, -8(%ebp)                         #6.0
movl      28(%ebp), %eax                         #6.0
..LN18:
movl      (%eax), %eax                           #7.0
movl      (%eax), %eax                           #7.0
movl      %eax, -80(%ebp)                        #7.0
movl      $1, -76(%ebp)                          #7.0
movl      -80(%ebp), %eax                        #7.0
testl     %eax, %eax                             #7.0
jg        ..B1.20       # Prob 50%               #7.0
                       # LOE
..B1.31:                   # Preds ..B1.41 ..B1.39 ..B1.38 ..B1.30
..LN19:
movl      -4(%ebp), %ebx                         #13.0
leave                                            #13.0
ret                                              #13.0
.align    4,0x90
# mark_end;

Debugging Shared Variables

When a variable appears in a PRIVATE, FIRSTPRIVATE, LASTPRIVATE, or REDUCTION clause on some block, the variable is made private to the parallel region by redeclaring it in the block. SHARED data, however, is not declared in the threaded code. Instead, it gets its declaration at the routine level. At the machine code level, these shared variables become incoming subroutine call arguments to the threaded entry points (such as ___PADD_6__par_loop0).

In Example 2, the entry point ___PADD_6__par_loop0 has six incoming parameters. The corresponding OpenMP parallel region has four shared variables. The first two parameters (parameters 1 and 2) are reserved for the compiler's use, and each of the remaining four parameters corresponds to one shared variable. These four parameters exactly match the last four parameters to __kmpc_fork_call() in the machine code of PADD.

Note
The FIRSTPRIVATE, LASTPRIVATE, and REDUCTION variables also require shared variables to get the values into or out of the parallel region.
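
The point of the note can be illustrated with a hand-written equivalent of a REDUCTION(+:total) clause. This is a conceptual sketch only and is not the code the compiler generates; the program and variable names are assumptions. Each thread accumulates into a private copy, and a shared variable is still needed to carry the combined result out of the parallel region.

```fortran
! Conceptual sketch only: a hand-written equivalent of REDUCTION(+:total).
! This is NOT the code the compiler generates; all names are illustrative.
program reduction_sketch
  implicit none
  integer, parameter :: n = 100
  integer :: i, total, partial

  total = 0                  ! shared: carries the result out of the region
!$omp parallel private(partial) shared(total)
  partial = 0                ! private: each thread accumulates its own copy
!$omp do
  do i = 1, n
    partial = partial + i
  end do
!$omp end do
  ! Combine each private copy into the shared variable, one thread at a time.
!$omp critical
  total = total + partial
!$omp end critical
!$omp end parallel

  print '(a,i0)', 'total = ', total
end program reduction_sketch
```

The program prints total = 5050 for any number of threads. In a debugger, partial corresponds to the per-thread private copy, while total is the kind of shared variable that has to be passed into the threaded entry point.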

Because debuggers lack support for this mapping, the correspondence between the shared variables (by their original names) and their contents cannot be seen in the debugger at the threaded entry point level. However, you can move up the call stack to the frame of one of the subroutines and examine the contents of the variables at that level. In Example 2, the contents of the shared variables A, B, C, and N can be examined if you move to the call stack of PARALLEL().