Hi there,
I know that OpenFHE internally uses OpenMP for parallelizing certain operations, including CKKS bootstrapping. Furthermore, when a user employs OpenMP at the application level, the parallelization at the library level is disabled. My question is: is there a way to enable both application-level and library-level parallelization simultaneously?
I need to bootstrap multiple ciphertexts, and I’ve found that application-level parallelization is much faster than library-level parallelization. I am currently processing 16 ciphertexts on a machine with 48 CPU cores, so I’m far from full CPU utilization. I believe performance could improve if both levels of parallelization were active.
In general, you should be able to do this using nested parallelism in OpenMP (see the Stack Overflow question "OpenMP: What is the benefit of nesting parallelizations?" for an example discussion). We do not support this configuration because it is easy to misuse and can cause the system to lock up. Internally, the low-level code simply parallelizes loops over all threads that are currently available. You should be able to control, in the outer (application) code, how many threads you make available to a given CKKS bootstrapping instance.
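A minimal sketch of the nested-parallelism mechanism in plain OpenMP, without OpenFHE, assuming an OpenMP 3.0+ compiler (built with -fopenmp). Many runtimes allow only one active parallel level by default, so an inner region opened inside an outer one runs on a single thread, which matches the behavior described in the question. Raising the limit with omp_set_max_active_levels lets the inner team grow. The function name inner_team_size is hypothetical.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Opens an outer team of 2 threads; each outer thread then opens an inner
// team requesting `inner_threads` threads. Returns the inner team size seen
// by outer thread 0. Without OpenMP support the pragmas are ignored and 1 is
// returned.
int inner_team_size(bool allow_nesting, int inner_threads) {
    assert(inner_threads >= 1);
    int observed = 1;
#ifdef _OPENMP
    const int prev_levels = omp_get_max_active_levels();
    // 1 active level: nested regions get a single thread.
    // 2 active levels: the inner team may grow to the requested size.
    omp_set_max_active_levels(allow_nesting ? 2 : 1);
    #pragma omp parallel num_threads(2)
    {
        const int outer_id = omp_get_thread_num();  // private to each outer thread
        #pragma omp parallel num_threads(inner_threads)
        {
            // Exactly one thread in the whole program writes `observed`.
            if (outer_id == 0 && omp_get_thread_num() == 0) {
                observed = omp_get_num_threads();
            }
        }
    }
    omp_set_max_active_levels(prev_levels);  // restore the caller's setting
#else
    (void)allow_nesting;
#endif
    return observed;
}
```

With nesting disabled the inner team size is 1; with it enabled the runtime can honor the requested inner size (subject to limits such as OMP_THREAD_LIMIT).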
I’m not sure I follow. I currently have a std::vector of 16 ciphertexts and bootstrap them in a for loop with #pragma omp parallel for. When this function runs, only 16 cores are utilized. However, I know that the computations required for bootstrapping can be parallelized internally, and they are when there is no parallelism at the application level. Given that I have 32 free cores, nested parallelism could give a large speedup.
In your application code, you can split the 32 threads into smaller groups, e.g., of 16 or 8 threads each. Then invoke the bootstrapping within that code (it will use only the threads in that group). Again, this configuration is not supported. Please read the OpenMP literature on nested parallelism to see how this could be achieved.
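The grouping idea can be sketched in plain OpenMP as follows, assuming an OpenMP 3.0+ compiler. fake_bootstrap and bootstrap_all are hypothetical names, not OpenFHE functions: fake_bootstrap stands in for a library routine that, like the low-level code described above, parallelizes a loop over whatever threads its caller makes available. Each outer thread calls omp_set_num_threads(inner) before invoking it, which sets the team size for parallel regions that thread opens next. Whether OpenFHE's internal loops respond to this in the same way is an assumption, and the configuration remains unsupported.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for a library routine: parallelizes its loop over
// whatever team size the calling thread currently allows.
void fake_bootstrap(std::vector<double>& ct) {
    #pragma omp parallel for
    for (std::size_t i = 0; i < ct.size(); ++i) {
        ct[i] = 2.0 * ct[i];
    }
}

// Processes all ciphertexts with `outer` concurrent workers; each worker
// gives the inner (library-level) loop a group of `inner` threads.
void bootstrap_all(std::vector<std::vector<double>>& cts, int outer, int inner) {
    assert(outer >= 1 && inner >= 1);
#ifdef _OPENMP
    const int prev_levels = omp_get_max_active_levels();
    omp_set_max_active_levels(2);  // allow one level of nesting
#endif
    #pragma omp parallel for num_threads(outer) schedule(dynamic)
    for (std::size_t c = 0; c < cts.size(); ++c) {
#ifdef _OPENMP
        omp_set_num_threads(inner);  // group size for regions this thread opens
#endif
        fake_bootstrap(cts[c]);
    }
#ifdef _OPENMP
    omp_set_max_active_levels(prev_levels);  // restore the caller's setting
#endif
}
```

For the setup in the question (16 ciphertexts on 48 cores), bootstrap_all(cts, 16, 3) would request 16 × 3 = 48 threads in total. Keeping outer × inner at or below the core count matters, since oversubscription usually hurts performance.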