Hardware-Friendly Algorithm/Reference for Modulus Switching in TFHE Bootstrapping

Hi there!
Is there any hardware-friendly algorithm for modulus switching (MS) in TFHE bootstrapping? The two points below explain why I am asking.

  1. Take STD128 GINX-style gate bootstrapping as an example. To my knowledge, there are two MS operations in the process: the first is performed before key switching to reduce the size of the switching key, and the second converts the current modulus back to the original modulus. (Please let me know if I have anything wrong.)
  2. For the second MS, RoundqQ (as given in lwe-pke.cpp) can be replaced by a simple shift, since both moduli passed to it are powers of 2. However, the first MS switches from the NTT-friendly prime (here, 134215681) to a power of 2 (here, 2^14), so it cannot be done with shifts alone and requires expensive operations such as floating-point or fixed-point multiplication, or division.
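A minimal sketch of one division-free option for the first MS, assuming bit-exact rounding is wanted: because Q is fixed, a "magic" multiplier can be precomputed with the division-by-invariant-integers technique (Granlund and Montgomery), so that round(v * 2^14 / Q) becomes one integer multiplication, a right shift and a mask. Only Q = 134215681 and 2^14 come from the post; the function names (`ms_reference`, `magic_constant`, `ms_mul_shift`, `ms_pow2`) are hypothetical and are not OpenFHE's `RoundqQ`. The power-of-2 case is included for comparison.

```python
# Sketch: division-free modulus switching from an odd modulus Q to 2^LOG_T.
# Q and 2^14 are the values quoted in the post; all names are hypothetical.

Q = 134215681      # NTT-friendly prime from the post
LOG_T = 14         # target modulus 2^14 from the post
T = 1 << LOG_T


def ms_reference(v: int) -> int:
    """Exact round(v * T / Q) mod T, computed with an integer division by Q."""
    # Q is odd, so v*T/Q is never exactly halfway between integers; adding
    # floor(Q/2) before the floor division therefore rounds to nearest.
    return ((v * T + Q // 2) // Q) % T


def magic_constant(d: int, n_bits: int) -> tuple[int, int]:
    """Find (m, s) with (x * m) >> s == x // d for every 0 <= x < 2^n_bits.

    Condition from the Granlund-Montgomery method:
    m = ceil(2^s / d) and m*d - 2^s <= 2^(s - n_bits).
    """
    s = n_bits
    while True:
        m = -(-(1 << s) // d)  # ceil(2^s / d)
        if m * d - (1 << s) <= 1 << (s - n_bits):
            return m, s
        s += 1  # always terminates by s = n_bits + d.bit_length()


# Largest dividend x = v*T + floor(Q/2), reached at v = Q - 1.
X_MAX = (Q - 1) * T + Q // 2
N_BITS = X_MAX.bit_length()
M, S = magic_constant(Q, N_BITS)  # computed once, offline


def ms_mul_shift(v: int) -> int:
    """Same result as ms_reference, using add, multiply, shift and mask only."""
    x = (v << LOG_T) + (Q >> 1)
    return ((x * M) >> S) & (T - 1)


def ms_pow2(v: int, log_from: int, log_to: int) -> int:
    """round(v * 2^log_to / 2^log_from) mod 2^log_to; needs log_from > log_to."""
    shift = log_from - log_to
    return ((v + (1 << (shift - 1))) >> shift) & ((1 << log_to) - 1)


if __name__ == "__main__":
    print(f"multiplier M: {M.bit_length()} bits, shift S = {S}")
    for v in (0, 1, Q // 2, Q - 1):
        assert ms_mul_shift(v) == ms_reference(v)
```

Since Q is fixed per parameter set, `M` and `S` would be derived offline, and the per-coefficient cost is one wide multiplication (about `N_BITS` by `M.bit_length()` bits) plus an add, a shift and a mask. If bit-exact rounding is not required, a narrower constant might be acceptable, because an occasional off-by-one result only adds a little extra noise; whether that fits depends on the parameter set's error budget.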

In summary, I am wondering whether there is any reference discussing this topic, for example, something similar to MS in BGV, which computes a correction term so that the division becomes exact.
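For comparison, the BGV correction term can be sketched on a single coefficient. This assumes the usual BGV setting in which the larger modulus factors as p * q and the plaintext modulus t is coprime to p; the toy parameters and function names are hypothetical. Adding a small delta with delta = -c (mod p) and delta = 0 (mod t) makes the division by p exact while only rescaling the plaintext by the known constant p^{-1} mod t. The trick relies on p dividing the larger modulus, which is exactly what fails when switching from a prime to 2^14, so it illustrates the kind of technique being asked about rather than a direct solution.

```python
# Scalar sketch of a BGV-style correction term (toy, hypothetical parameters).
# Assumed setting: the larger modulus is p * q, and t is coprime to p.


def centered(x: int, mod: int) -> int:
    """Representative of x modulo `mod` in the range (-mod/2, mod/2]."""
    r = x % mod
    return r - mod if r > mod // 2 else r


def correction(c: int, p: int, t: int) -> int:
    """Small delta with delta = -c (mod p) and delta = 0 (mod t)."""
    t_inv = pow(t, -1, p)  # t^{-1} mod p; requires gcd(t, p) == 1
    return t * centered(-c * t_inv, p)


def bgv_switch(c: int, p: int, q: int, t: int) -> int:
    """Switch one coefficient from modulus p*q to modulus q by exact division."""
    c_shifted = c + correction(c, p, t)
    assert c_shifted % p == 0  # exact division: no rounding, no floating point
    # Because delta = 0 (mod t), (c + delta) / p equals c * p^{-1} modulo t
    # before the final reduction mod q.
    return (c_shifted // p) % q


# Toy "ciphertext" m + t*e with message m = 3, t = 17 and small noise e = 5.
print(bgv_switch(3 + 17 * 5, p=97, q=101, t=17))  # prints 97
```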

The pictures show the parameters provided for STD128; note that they are not from the latest version.

Thanks for reading!