Number of OpenMP threads limited to 8

Hi,

We noticed that some OMP parallel loops are set to use at most 8 threads (e.g., here: openfhe-development/src/core/include/lattice/hal/default/dcrtpoly-impl.h at 7b8346f4eac27121543e36c17237b919e03ec058 · openfheorg/openfhe-development · GitHub). Looking at the git history, this seems related to this thread: Single Threading performs faster than Multi-Threading.
Is there any documentation of the analysis that led to the 8-thread maximum, and some intuition for why more threads would cause a slowdown?
Is there any chance this maximum could be made configurable, or are you sure that more threads would not scale well?

Thanks!

Hi Lena,
You are correct: a maximum thread limit for select OMP loops was imposed to address the issue you referenced.

Most OMP for loops in OpenFHE parallelize over a number of independent towers. The OMP loops where limits were imposed instead parallelize over the ring dimension, so a single thread must access data in all towers. This leads to a higher potential for false sharing and much greater memory movement overhead when the thread count is high. There is no documentation, but we performed extensive benchmarking on 16-, 32-, and 72-core systems to determine the limits that produced the best average-case performance (4 in some cases and 8 in others). Currently, we have no plans to make the thread limits a configurable parameter.
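
To make the two loop shapes concrete, here is a minimal sketch. It is not the actual OpenFHE code: the `Towers` layout, the `CappedThreads` helper, and both function names are hypothetical, and the default cap of 8 is taken only from the number discussed in this thread. The first function parallelizes over towers, so each thread stays within one tower. The second parallelizes over the ring dimension, so every iteration reads from all towers, and the team size is capped with the standard OpenMP `num_threads` clause.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical layout: towers[t][i] is coefficient i of tower t.
using Towers = std::vector<std::vector<std::uint64_t>>;

// Hypothetical helper: never use more than `limit` threads,
// and never more than OpenMP would use by default.
inline int CappedThreads(int limit) {
#ifdef _OPENMP
    return std::max(1, std::min(omp_get_max_threads(), limit));
#else
    (void)limit;
    return 1;  // built without OpenMP: the loops below run serially
#endif
}

// Shape 1: parallel over towers. Each thread works on whole towers,
// so threads do not touch each other's data.
void AddScalarPerTower(Towers& towers, std::uint64_t c) {
    const std::int64_t numTowers = static_cast<std::int64_t>(towers.size());
#pragma omp parallel for
    for (std::int64_t t = 0; t < numTowers; ++t) {
        for (auto& x : towers[t]) x += c;
    }
}

// Shape 2: parallel over the ring dimension. Every iteration reads one
// coefficient from every tower, so each thread streams through all towers.
// The team size is capped (8 here is only an illustrative default).
std::vector<std::uint64_t> SumAcrossTowers(const Towers& towers, int threadLimit = 8) {
    const std::size_t ringDim = towers.empty() ? 0 : towers[0].size();
    for (const auto& tower : towers) assert(tower.size() == ringDim);

    std::vector<std::uint64_t> out(ringDim, 0);
    const int nThreads = CappedThreads(threadLimit);
    (void)nThreads;  // unused when compiled without OpenMP
    const std::int64_t n = static_cast<std::int64_t>(ringDim);
#pragma omp parallel for num_threads(nThreads)
    for (std::int64_t i = 0; i < n; ++i) {
        std::uint64_t acc = 0;
        for (const auto& tower : towers) acc += tower[static_cast<std::size_t>(i)];
        out[static_cast<std::size_t>(i)] = acc;
    }
    return out;
}
```

Compiled with OpenMP enabled (for example `-fopenmp` on GCC or Clang), a call such as `SumAcrossTowers(towers, 4)` would run the second loop with at most 4 threads, while `AddScalarPerTower` uses the default team size. Passing the cap as an argument is only a convenience for experimenting with different limits in this sketch; it says nothing about how the limits are set in OpenFHE.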