I understand that OpenFHE’s NTT implementation uses the Cooley-Tukey NTT.
[Code Link]
L302: void NumberTheoreticTransformNat<VecType>::ForwardTransformToBitReverseInPlace
I would like to verify the correctness of its computation results.
I compared its output with the Stockham NTT described in the following paper,
but the results did not match, even after considering bit-reversal.
[Paper Link]
https://cris.maastrichtuniversity.nl/ws/portalfiles/portal/125839995/Collins_2022_Computer_Science_for_Continuous_Data.pdf#page=334
Algorithm 1. Stockham radix-2 NTT algorithm
Input Used
I initialized all elements to 2 as follows:
for (uint32_t i = 0; i < element->GetLength(); ++i) {
(*element)[i] = 2;
}
const auto modulus{element->GetModulus()};
const uint32_t n(element->GetLength() >> 1);
Expected Output
In a Stockham NTT implementation, I would expect the DC component (index 0) to contain the total sum of the elements,
i.e., (2 × element->GetLength()) mod modulus, while all other components should be 0.
Actual Output
However, the following results were obtained (excerpt):
873423077313605891
318723509706611100
736541911239092811
…
Question
Is the Cooley-Tukey NTT expected to produce different results from Stockham NTT in this case?
In other words, does Cooley-Tukey NTT inherently distribute the output differently, or should it behave similarly to Stockham NTT in terms of DC component accumulation?
If the results are correct, could you explain why they differ from my expectations?
Any guidance or clarification would be greatly appreciated.
For reference, the modulus used in this NTT implementation is 1152921504606748673.
Thank you for your help!