Running OpenFHE in the most efficient way for an intensive application

I have access to 96 CPUs of an AMD EPYC 7713 64-Core Processor to run my OpenFHE application. When I run it naively on the system, it takes about 90 minutes to complete a single round of my job, and I have 100 rounds to go through in my for-loop. I am heavily underutilizing these resources, as I use only about 10% of what I have access to.

What is the best way to parallelize OpenFHE or get the maximum throughput for my job on this system? I am not really resource-bound. I have looked at openfhe-development/src/core/examples/parallel.cpp at main · openfheorg/openfhe-development · GitHub, and I am not sure whether I should use OpenMP to parallelize all my loops, or whether there are OpenFHE configuration options I can set in the library build or in my own code to increase performance by default.
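To make the OpenMP option concrete, here is a minimal sketch of what parallelizing the outer loop could look like. It assumes the rounds are independent of each other, which may not hold for the real job. `run_round` and `run_all_rounds` are hypothetical placeholders standing in for the actual work; they are not OpenFHE functions, and the arithmetic inside is a dummy workload.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical placeholder for one round of the job. In the real application
// this is where the OpenFHE calls would go; it is NOT an OpenFHE function.
std::uint64_t run_round(int round) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 1; i <= 1000; ++i) {
        acc += i * static_cast<std::uint64_t>(round + 1);
    }
    return acc;
}

// Runs all rounds, distributing loop iterations across OpenMP threads.
// Assumption: rounds do not depend on each other's results.
// If the code is compiled without OpenMP support (e.g. without -fopenmp),
// the pragma is ignored and the loop simply runs serially.
std::vector<std::uint64_t> run_all_rounds(int num_rounds) {
    std::vector<std::uint64_t> results(num_rounds);
    #pragma omp parallel for schedule(dynamic)
    for (int r = 0; r < num_rounds; ++r) {
        results[r] = run_round(r);  // each iteration writes only its own slot
    }
    return results;
}
```

Calling `run_all_rounds(100)` would correspond to the 100 rounds described above. If the rounds do depend on each other, this outer-loop approach does not apply as written and the parallelism would have to come from inside each round instead.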

Thanks