Intel oneTBB

Language: C++

Concurrency/Parallelism

Threading Building Blocks (TBB) was originally developed by Intel and first released in 2006 to address the growing need for parallelism as CPUs moved to multi-core architectures. With the oneAPI initiative, Intel rebranded the library as oneTBB (starting with the 2021 release series) and opened it up as a community-driven open-source project. Today, oneTBB is widely used in HPC, finance, gaming, scientific computing, and data analytics.

Intel oneTBB (Threading Building Blocks) is a C++ template library that simplifies parallel programming by providing high-level abstractions for tasks, parallel loops, pipelines, and concurrent data structures. It enables developers to harness multicore processors efficiently without directly managing threads.

Installation

Linux (Debian/Ubuntu): sudo apt install libtbb-dev
macOS: brew install tbb
Windows: vcpkg install tbb
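
To verify the installation and the link flag, a quick sanity check can be compiled and run. This is a minimal sketch; it assumes the <tbb/version.h> header and the TBB_VERSION_MAJOR/TBB_VERSION_MINOR macros shipped with oneTBB, and a build command such as g++ check.cpp -ltbb.

#include <tbb/version.h>
#include <iostream>

int main() {
    // Print the compile-time version reported by the installed headers.
    std::cout << "oneTBB " << TBB_VERSION_MAJOR << "." << TBB_VERSION_MINOR << std::endl;
}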

Usage

oneTBB provides parallel loops, parallel algorithms, concurrent containers, and task schedulers. It abstracts away thread creation and synchronization, making it easier to write scalable parallel code.

Parallel for loop

#include <tbb/parallel_for.h>
#include <iostream>

int main() {
    tbb::parallel_for(0, 10, [](int i) {
        std::cout << "Processing index: " << i << std::endl;
    });
    return 0;
}

Runs a parallel loop from 0 to 9 across available CPU cores.

Parallel reduce

#include <tbb/parallel_reduce.h>
#include <tbb/blocked_range.h>
#include <iostream>

int main() {
    int sum = tbb::parallel_reduce(
        tbb::blocked_range<int>(0, 100), 0,            // iteration range and identity value
        [](const tbb::blocked_range<int>& r, int init) {
            for (int i = r.begin(); i < r.end(); ++i)  // sum one subrange
                init += i;
            return init;
        },
        [](int x, int y) { return x + y; }             // combine partial sums
    );
    std::cout << "Sum: " << sum << std::endl;
}

Performs a parallel sum reduction of numbers from 0 to 99.

Parallel pipeline

#include <tbb/parallel_pipeline.h>
#include <iostream>

int main() {
    tbb::parallel_pipeline(
        4, // at most 4 items in flight at any time
        // producer: serial, in-order filter that emits the integers 0..9
        tbb::make_filter<void, int>(tbb::filter_mode::serial_in_order, [](tbb::flow_control& fc) -> int {
            static int count = 0;
            if (count < 10) return count++;
            fc.stop(); // signal end of input
            return 0;
        }) &
        // consumer: parallel filter that processes items concurrently
        tbb::make_filter<int, void>(tbb::filter_mode::parallel, [](int x) {
            std::cout << "Processing item: " << x << std::endl;
        })
    );
}

Implements a two-stage pipeline: a serial, in-order producer that emits ten items and a parallel consumer, with at most four items in flight at once.

Concurrent hash map

#include <tbb/concurrent_hash_map.h>
#include <iostream>

int main() {
    tbb::concurrent_hash_map<int, int> cmap;
    cmap.insert({1, 100});
    cmap.insert({2, 200});
    tbb::concurrent_hash_map<int, int>::const_accessor a; // read-only accessor; locks the element while held
    if (cmap.find(a, 1)) std::cout << "Key 1 value: " << a->second << std::endl;
}

Demonstrates thread-safe access to a concurrent hash map.

Task arenas

#include <tbb/task_arena.h>
#include <tbb/parallel_for.h>
#include <iostream>

int main() {
    tbb::task_arena arena(2); // restrict to 2 threads
    arena.execute([] {
        tbb::parallel_for(0, 5, [](int i) {
            std::cout << "Running in restricted arena: " << i << std::endl;
        });
    });
}

Executes a parallel loop inside an arena restricted to two worker threads.

Error Handling

Performance degradation due to false sharing: Pad or align frequently updated per-thread data to cache-line boundaries to avoid cache contention (see the sketch after this list).
Deadlocks or stalls in pipelines: Ensure the input filter eventually calls flow_control::stop() and that every filter forwards or consumes its items.
Oversubscription of CPU cores: Avoid mixing TBB with raw threads or other thread pools; rely on TBB's task scheduler to size the worker pool.
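
The false-sharing advice above can be illustrated with a small sketch. The PaddedCounter type and the 64-byte cache-line size are illustrative assumptions, not part of the oneTBB API.

#include <tbb/parallel_for.h>
#include <iostream>

// Give each counter its own cache line so concurrent updates from different
// worker threads do not invalidate one another's lines.
struct alignas(64) PaddedCounter {
    long value = 0;
};

int main() {
    PaddedCounter counters[8]; // one slot per index
    tbb::parallel_for(0, 8, [&](int i) {
        for (int j = 0; j < 1000000; ++j)
            counters[i].value += 1; // each task touches only its own line
    });
    long total = 0;
    for (const auto& c : counters) total += c.value;
    std::cout << "Total: " << total << std::endl;
}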

Best Practices

Prefer TBB algorithms (`parallel_for`, `parallel_reduce`) over manual thread management.

Use concurrent containers (`concurrent_vector`, `concurrent_hash_map`) to avoid data races; a `concurrent_vector` sketch appears at the end of this section.

Avoid oversubscribing threads by letting TBB manage the thread pool.
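
One way to follow this advice when other components also create threads is to cap TBB's parallelism explicitly. Below is a minimal sketch using tbb::global_control; the limit of 4 threads is an arbitrary illustration.

#include <tbb/global_control.h>
#include <tbb/parallel_for.h>
#include <iostream>

int main() {
    // Limit TBB to at most 4 worker threads while gc is alive.
    tbb::global_control gc(tbb::global_control::max_allowed_parallelism, 4);
    tbb::parallel_for(0, 8, [](int i) {
        std::cout << "Task " << i << std::endl;
    });
}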

Combine with STL algorithms and lambda expressions for clean, modern C++ parallelism.

Use task arenas to isolate workloads with different concurrency needs.
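
As a complement to the concurrent_hash_map example above, the following sketch shows concurrent_vector used as a thread-safe growable container; the squared values are only an illustration.

#include <tbb/concurrent_vector.h>
#include <tbb/parallel_for.h>
#include <iostream>

int main() {
    tbb::concurrent_vector<int> results;
    // push_back is safe to call concurrently from multiple tasks.
    tbb::parallel_for(0, 100, [&](int i) {
        results.push_back(i * i);
    });
    std::cout << "Collected " << results.size() << " results" << std::endl;
}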