Are there any published results that quantitatively compare latency and precision before and after the PR? It would be great to see what kind of effect we should expect in our applications.
In terms of speed-up, we saw between 1.04x and 1.53x for various configurations, with the low end of the range corresponding to very sparse packing and the high end (largest speed-up) corresponding to full packing. But I don’t think these benchmarks have been published.
In terms of precision, some benchmarks showcasing the precision of the two modes are here.
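To make the speed-up figures quoted above easier to relate to application latency, here is a minimal sketch that converts a speed-up factor into a relative latency reduction. The helper name `latency_reduction` is made up for illustration; only the 1.04x and 1.53x factors come from this thread, and the formula assumes "speed-up" means old time divided by new time.

```python
def latency_reduction(speedup: float) -> float:
    """Fraction of the original latency that is saved for a given speed-up.

    Assumes speedup = old_time / new_time, so new_time = old_time / speedup.
    """
    if speedup <= 0:
        raise ValueError("speed-up factor must be positive")
    return 1.0 - 1.0 / speedup


# The two endpoints reported above (sparse packing and full packing).
for factor in (1.04, 1.53):
    print(f"{factor:.2f}x speed-up -> {latency_reduction(factor):.1%} lower latency")
# prints:
# 1.04x speed-up -> 3.8% lower latency
# 1.53x speed-up -> 34.6% lower latency
```

Under that assumption, the reported range corresponds to roughly 4% to 35% less bootstrapping time, depending on how densely the slots are packed.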
Thank you very much!! A couple of questions/notes:
Does this multithreaded mode need to be activated explicitly, or is it enabled by default on any multithreaded CPU?
It looks like, after this update, the function that I borrowed from (OpenFHE CKKS chooses modulus chain that doesn't guarantee 128 bits of security), namely print_moduli_chain, now returns only the moduli in Q and no longer those in the whole QP. Nothing crazy: for now I am using the EstimateLogP function in CryptoParametersRNS. However, I was wondering whether something in this update prevents poly.GetNumOfElements() from counting the auxiliary moduli, or whether this is intended behavior.
Thank you very much, as always, for the hard work and for keeping everything open source :)))
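For readers wondering why the difference between Q and QP matters here: with hybrid key switching, the security estimate is taken against the total size of QP, so a listing that covers only Q undercounts the relevant bit length. The sketch below is plain Python and does not call OpenFHE. The moduli are hypothetical placeholders chosen only for their bit sizes (they are not real library output and are not checked for primality), and `log2_product` is an illustrative helper name.

```python
import math


def log2_product(moduli):
    """Return log2 of the product of the moduli, computed as a sum of logs."""
    return sum(math.log2(m) for m in moduli)


# Hypothetical placeholder values, NOT taken from an actual OpenFHE context.
q_moduli = [2**60 - 1, 2**50 - 1, 2**50 - 1]  # ciphertext moduli (Q)
p_moduli = [2**60 - 1]                        # auxiliary moduli (P)

log_q = log2_product(q_moduli)
log_p = log2_product(p_moduli)
log_qp = log_q + log_p

print(f"log2(Q)  ~ {log_q:.1f} bits")
print(f"log2(P)  ~ {log_p:.1f} bits")
print(f"log2(QP) ~ {log_qp:.1f} bits")
```

With these placeholder values, Q alone is about 160 bits while QP is about 220 bits, which is the quantity one would compare against a security table for the chosen ring dimension.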
I would like to add some clarifications because there are two major optimizations for CKKS bootstrapping in v1.5:
@andreea.alexandru described the second one (issue #1074) and the speed-up she mentioned is for the single-threaded execution for the StC-First mode as compared to the already existing ModRaise mode. BTW, the updated documentation for CKKS bootstrapping is available here (the relevant flag is BTSlotsEncoding). An example of the new CKKS mode is provided in the updated simple CKKS bootstrapping example.
The other optimization is independent and primarily affects the multithreaded execution of the ModRaise mode of CKKS bootstrapping and of functional CKKS bootstrapping (both of which already existed in v1.4.2). The speed-ups for this case are summarized here. The speed-up for regular CKKS bootstrapping can be up to 2x (on a server machine). For functional bootstrapping (for larger bit sizes), it can be even larger than 2x (on a server machine).
The StC-First mode takes advantage of both optimizations in the multithreaded setting.
Regarding EstimateLogP, I suggest adding a new topic in the Library Questions section, and we will look into it there. This topic focuses on clarifications related to the v1.5.0 release.