OpenMP

Language: CPP

Concurrency/Parallelism

OpenMP was introduced in 1997 as a standard API for parallel programming on shared-memory architectures. It is supported by major compilers like GCC, Clang, Intel, and MSVC. OpenMP has become a key tool in scientific computing, engineering simulations, and data-intensive applications where developers need to scale across multiple cores without manually managing threads.

OpenMP (Open Multi-Processing) is an API that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran. It provides a set of compiler directives, runtime library routines, and environment variables for parallelizing code easily on multi-core CPUs.
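These three ingredients compose as in the minimal sketch below: a parallel directive opens a team of threads, runtime library routines report each thread's identity, and the OMP_NUM_THREADS environment variable can size the team from outside the program.

#include <omp.h>
#include <cstdio>

int main() {
    // Compiler directive: fork a team of threads for this block.
    #pragma omp parallel
    {
        // Runtime library routines: query thread id and team size.
        // The team size can be set externally, e.g. OMP_NUM_THREADS=4.
        std::printf("Hello from thread %d of %d\n",
                    omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}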

Installation

Linux: GCC and Clang support OpenMP with the -fopenmp flag
macOS: brew install libomp, then compile with -Xpreprocessor -fopenmp -lomp (Apple Clang does not bundle the OpenMP runtime)
Windows: MSVC supports OpenMP (2.0 syntax) with the /openmp flag
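
For example, typical build commands look like the following (main.cpp is an assumed file name):

# GCC or Clang on Linux
g++ -fopenmp -O2 main.cpp -o main

# Apple Clang on macOS; the -I/-L paths depend on your Homebrew prefix
clang++ -Xpreprocessor -fopenmp -O2 main.cpp -o main \
    -I"$(brew --prefix libomp)/include" -L"$(brew --prefix libomp)/lib" -lomp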

Usage

OpenMP provides compiler pragmas (directives) for parallel loops, sections, tasks, synchronization, and reductions, enabling shared-memory parallelism with minimal changes to existing code.

Parallel for loop

#include <omp.h>
#include <iostream>

int main() {
    // Iterations 0..7 are divided among the threads in the team.
    #pragma omp parallel for
    for (int i = 0; i < 8; i++) {
        // Note: output from different threads may interleave.
        std::cout << "Thread " << omp_get_thread_num() << " processing index: " << i << std::endl;
    }
    return 0;
}

Distributes loop iterations across available threads.

Reduction (sum)

#include <omp.h>
#include <iostream>

int main() {
    int sum = 0;
    // Each thread accumulates into a private copy of sum;
    // the copies are combined with + when the loop ends.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= 100; i++) sum += i;
    std::cout << "Sum: " << sum << std::endl;  // 5050
    return 0;
}

Performs a parallel reduction: each thread sums its share of the range into a private copy of sum, and OpenMP combines the partial sums when the loop ends.

Parallel sections

#pragma omp parallel sections
{
    // Each section runs exactly once, on whichever thread picks it up.
    #pragma omp section
    { task1(); }

    #pragma omp section
    { task2(); }
}

Runs independent tasks concurrently in different sections.

Tasks

#pragma omp parallel
{
    // single: one thread creates the tasks...
    #pragma omp single
    {
        #pragma omp task
        taskA();

        #pragma omp task
        taskB();
    }
    // ...but any idle thread in the team may execute them.
}

Creates explicit tasks for fine-grained concurrency: the single directive ensures each task is created once, while any thread in the team may execute it.

Critical section

#pragma omp parallel for
for (int i = 0; i < 100; i++) {
    // Only one thread at a time may enter this block.
    #pragma omp critical
    {
        std::cout << "Index: " << i << std::endl;
    }
}

Ensures that only one thread executes the critical section at a time.

Barrier synchronization

#pragma omp parallel
{
    initialize();        // phase 1: every thread initializes
    #pragma omp barrier  // no thread proceeds until all threads arrive
    compute();           // phase 2: safe to read phase-1 results
}

Makes every thread wait at the barrier until all threads in the team have reached it, guaranteeing that initialization is complete before computation begins.

Error Handling

Unexpected results due to race conditions: protect shared data with `critical`, `atomic`, or a `reduction` clause (see the atomic sketch after this list).
Program runs slower with OpenMP enabled: check whether the workload is too small; thread startup and scheduling overhead may outweigh the benefit.
Excessive memory usage: limit private copies of large arrays; declare data `shared` where that is safe.
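
As a minimal sketch of the `atomic` option, which is cheaper than `critical` for a single scalar update, this contrived histogram fills shared bins without a data race:

#include <omp.h>
#include <iostream>

int main() {
    int bins[4] = {0, 0, 0, 0};
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
        // atomic protects one memory update, with far less
        // overhead than a full critical section.
        #pragma omp atomic
        bins[i % 4]++;
    }
    std::cout << "Bin 0: " << bins[0] << std::endl;  // 250
    return 0;
}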

Best Practices

Start with coarse-grained parallelism (e.g., loop parallelization) before fine-grained tasks.

Use `reduction` for safe accumulation of results across threads.

Avoid false sharing by aligning shared data on cache line boundaries.
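
For instance, a minimal sketch assuming 64-byte cache lines (PaddedCounter is a hypothetical type introduced here for illustration):

#include <omp.h>
#include <iostream>

// Pad each per-thread counter to its own cache line so threads
// do not invalidate each other's lines on every increment.
struct alignas(64) PaddedCounter {
    long value = 0;
};

int main() {
    PaddedCounter counters[8];
    #pragma omp parallel num_threads(8)
    {
        int id = omp_get_thread_num();
        for (int i = 0; i < 1000000; i++)
            counters[id].value++;  // no false sharing between threads
    }
    long total = 0;
    for (auto& c : counters) total += c.value;
    std::cout << "Total: " << total << std::endl;
    return 0;
}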

Use `schedule(dynamic)` for irregular workloads to balance load.
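
A minimal sketch on an artificially irregular loop (the workload is contrived for illustration):

#include <omp.h>
#include <cmath>
#include <iostream>

int main() {
    double total = 0.0;
    // Iterations do very different amounts of work, so a static
    // split would leave some threads idle; dynamic scheduling hands
    // out chunks of 4 iterations to threads as they finish.
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < 1000; i++) {
        double x = 0.0;
        for (int j = 0; j < i * 100; j++)  // cost grows with i
            x += std::sin(j);
        total += x;
    }
    std::cout << "Total: " << total << std::endl;
    return 0;
}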

Profile performance; more threads do not always mean faster execution.