Hi Lena,
You are correct: a maximum thread limit was imposed on select OMP loops to address the issue you referenced.
Most OMP for loops in OpenFHE parallelize over a number of independent towers. The OMP loops where limits were imposed parallelize over the ring dimension instead, which requires each thread to access data in all towers. This leads to a higher potential for false sharing and much greater memory movement overhead when the thread count is high. There is no documentation, but we performed extensive benchmarking on 16-, 32-, and 72-core systems to determine the limits that produced the best average-case performance (4 threads in some cases, 8 in others). Currently, we have no plans to make the thread limits a configurable parameter.
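
To make the difference between the two loop patterns concrete, here is a minimal sketch. It is not OpenFHE code: the `Towers` layout, the function names `DoubleEachTower` and `SumAcrossTowers`, and the `maxThreads` parameter are all hypothetical and only illustrate the access patterns described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout (not OpenFHE's): towers[t][i] is coefficient i of tower t.
using Towers = std::vector<std::vector<std::uint64_t>>;

// Pattern 1: parallelize over towers. Each thread works on its own tower,
// so threads touch disjoint memory regions.
void DoubleEachTower(Towers& towers) {
    const std::ptrdiff_t numTowers = static_cast<std::ptrdiff_t>(towers.size());
#pragma omp parallel for
    for (std::ptrdiff_t t = 0; t < numTowers; ++t) {
        for (auto& coeff : towers[t]) {
            coeff *= 2;
        }
    }
}

// Pattern 2: parallelize over the ring dimension. Every iteration reads one
// coefficient from ALL towers, so each thread strides across every tower's
// memory. The num_threads clause caps the team size for this loop only
// (maxThreads must be positive; e.g. 4 or 8).
std::vector<std::uint64_t> SumAcrossTowers(const Towers& towers, int maxThreads) {
    const std::ptrdiff_t ringDim =
        towers.empty() ? 0 : static_cast<std::ptrdiff_t>(towers[0].size());
    std::vector<std::uint64_t> out(static_cast<std::size_t>(ringDim), 0);
#pragma omp parallel for num_threads(maxThreads)
    for (std::ptrdiff_t i = 0; i < ringDim; ++i) {
        std::uint64_t acc = 0;
        for (const auto& tower : towers) {
            acc += tower[static_cast<std::size_t>(i)];
        }
        out[static_cast<std::size_t>(i)] = acc;
    }
    return out;
}
```

Calling `SumAcrossTowers(towers, 4)` caps that one loop at 4 threads while other parallel regions keep the default team size. Built without OpenMP support, the pragmas are ignored and both functions run serially with the same results.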