I have a question about the ForwardTransformToBitReverseInPlace method in the NumberTheoreticTransformNat class.
Source name is transformnat-impl.h.
Function Name is below. template <typename VecType> void NumberTheoreticTransformNat<VecType>::ForwardTransformToBitReverseInPlace(const VecType& rootOfUnityTable, const VecType& preconRootOfUnityTable, VecType* element);
Does the Cooley-Tukey (CT) butterfly operation follow the structure shown in the diagram “Figure 1”?
The legend is “Figure 2”.
This structure differs from the general Cooley-Tukey butterfly operation shown “Figure 3”
Figure (1) is not complete. First, we do not skip points as we move from one stage to the next. Also, the twiddle factors for each stage are not shown.
Anyway, forward NTT in OpenFHE employs the standard Cooley-Tukey dataflow with a slight modification of the twiddles order.
The NTT implementation in OpenFHE follows the NTT algorithms in this work.
At a high level, for NTT, in stage 0 \leq s \lt \log_{2}{N}, you divide the points into 2^{s+1} groups of points. The butterfly operation is then applied to these groups in an interleaved manner, processing pairs of groups sequentially. My description might not be very clear, but it should be straightforward to figure out the pattern if you trace ForwardTransformToBitReverseInPlace.
I have traced the code.
I included the traced code (Figure 4) and its output (Figure 5) in the attached images.
The trace code is a butterfly operation part.
The indices of the elements used in the operation (loIdx, hiIdx) and the indices of omega (omegaIdx) were output according to the base code.
I would like to confirm the case of Step m=2, t=2, logt=1 specifically Step i=1.
In this case, the butterfly operation is applied to ‘elements 2–5’.
However, I understand that in the standard Cooley-Tukey FFT, this operation would involve elements 4–7.
Additionally, in Step m=2, t=2, logt=1 no operation is performed for elements 6 and 7.
This does not seem correct to me. You might have implemented it incorrectly. Instead of pulling the code out of OpenFHE and re-implementing it, why don’t you trace it inside OpenFHE itself? This way, you can rule out the possibility of incorrect implementation. Your code differs significantly from OpenFHE’s implementation.
The code you provided must have an issue since all elements must be touched in every stage in NTT.
Upon rechecking in OpenFHE, I identified the cause of my misunderstanding. I had incorrectly interpreted the behavior of GetMSB.
For example, I assumed GetMSB(4) would be equivalent to log2(4) = 2, counting from 0. However, it actually counts from 1, making GetMSB(4) = 3.
As a result, the Cooley-Tukey FFT was being executed correctly.
I will close this Q&A. Thank you very much for your support.