Language: CPP
Concurrency/Parallelism
OpenMP was introduced in 1997 as a standard API for parallel programming on shared-memory architectures. It is supported by major compilers like GCC, Clang, Intel, and MSVC. OpenMP has become a key tool in scientific computing, engineering simulations, and data-intensive applications where developers need to scale across multiple cores without manually managing threads.
OpenMP (Open Multi-Processing) is an API that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran. It provides a set of compiler directives, runtime library routines, and environment variables that make it easy to parallelize code on multi-core CPUs.
GCC/Clang support OpenMP with the -fopenmp flag.
On macOS with Apple Clang: brew install libomp, then compile with -Xpreprocessor -fopenmp -lomp.
MSVC supports OpenMP with the /openmp flag.
OpenMP provides compiler pragmas (directives) for parallel loops, sections, tasks, synchronization, and reductions. It enables shared-memory parallelism with minimal changes to code.
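As a small illustration of the runtime library routines mentioned above, a minimal sketch (the thread count of 4 is arbitrary; it could also come from the OMP_NUM_THREADS environment variable):
#include <omp.h>
#include <iostream>
int main() {
    omp_set_num_threads(4);  // request 4 threads for subsequent parallel regions
    std::cout << "Max threads: " << omp_get_max_threads() << std::endl;
    #pragma omp parallel
    {
        #pragma omp single
        std::cout << "Threads in this region: " << omp_get_num_threads() << std::endl;
    }
    return 0;
}Queries and sets the thread count through the OpenMP runtime library.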
#include <omp.h>
#include <iostream>
int main() {
    #pragma omp parallel for
    for (int i = 0; i < 8; i++) {
        std::cout << "Thread " << omp_get_thread_num() << " processing index: " << i << std::endl;
    }
    return 0;
}
Distributes loop iterations across available threads.
#include <omp.h>
#include <iostream>
int main() {
    int sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= 100; i++) sum += i;
    std::cout << "Sum: " << sum << std::endl;
}
Performs a parallel reduction to compute the sum of numbers.
#pragma omp parallel sections
{
    #pragma omp section
    { task1(); }
    #pragma omp section
    { task2(); }
}
Runs independent tasks concurrently in different sections.
#pragma omp parallel
{
    #pragma omp single
    {
        #pragma omp task
        taskA();
        #pragma omp task
        taskB();
    }
}
Creates explicit parallel tasks for fine-grained concurrency.
#pragma omp parallel for
for (int i = 0; i < 100; i++) {
    #pragma omp critical
    {
        std::cout << "Index: " << i << std::endl;
    }
}
Ensures that only one thread executes the critical section at a time.
#pragma omp parallel
{
    initialize();
    #pragma omp barrier
    compute();
}
Synchronizes all threads before continuing.
Start with coarse-grained parallelism (e.g., loop parallelization) before fine-grained tasks.
Use `reduction` for safe accumulation of results across threads.
Avoid false sharing by aligning shared data on cache line boundaries (see the padded-counter sketch after this list).
Use `schedule(dynamic)` for irregular workloads to balance load (see the dynamic-scheduling sketch after this list).
Profile performance; more threads do not always mean faster execution.
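A minimal sketch of the false-sharing tip, assuming a 64-byte cache line; the PaddedCounter struct name and loop count are illustrative. Each thread increments a counter that lives on its own cache line:
#include <omp.h>
#include <iostream>
#include <vector>
struct alignas(64) PaddedCounter {  // padded to an assumed 64-byte cache line
    long value = 0;
};
int main() {
    std::vector<PaddedCounter> counters(omp_get_max_threads());
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        for (int i = 0; i < 1000000; i++) counters[id].value++;  // touches only this thread's cache line
    }
    long total = 0;
    for (const auto& c : counters) total += c.value;
    std::cout << "Total: " << total << std::endl;
    return 0;
}Keeps per-thread data on separate cache lines so threads do not invalidate each other's caches. Compile as C++17 or later so the over-aligned struct is allocated with correct alignment.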
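And a sketch of `schedule(dynamic)` for irregular workloads; the work() function and the chunk size of 4 are illustrative assumptions. Iterations are handed out in small chunks, so threads that finish early simply grab more:
#include <omp.h>
#include <cmath>
#include <iostream>
double work(int i) {  // hypothetical function whose cost grows with i (irregular workload)
    double x = 0.0;
    for (int k = 0; k < i * 1000; k++) x += std::sin(k);
    return x;
}
int main() {
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)  // chunks of 4 iterations assigned on demand
    for (int i = 0; i < 200; i++) total += work(i);
    std::cout << "Total: " << total << std::endl;
    return 0;
}Balances load by assigning iterations to threads at run time instead of statically.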