Confirmation on Cooley-Tukey NTT Results

I understand that OpenFHE’s NTT implementation uses the Cooley-Tukey NTT.

[Code Link]

L302: void NumberTheoreticTransformNat<VecType>::ForwardTransformToBitReverseInPlace

I would like to verify the correctness of its computation results.

I compared its output with the Stockham NTT described in the following paper,
but the results did not match, even after considering bit-reversal.

[Paper Link]
https://cris.maastrichtuniversity.nl/ws/portalfiles/portal/125839995/Collins_2022_Computer_Science_for_Continuous_Data.pdf#page=334
Algorithm 1. Stockham radix-2 NTT algorithm

Input Used

I initialized all elements to 2 as follows:

for (uint32_t i = 0; i < element->GetLength(); ++i) {
(*element)[i] = 2;
}
const auto modulus{element->GetModulus()};
const uint32_t n(element->GetLength() >> 1);

Expected Output

In a Stockham NTT implementation, I would expect the DC component (index 0) to contain the total sum of the elements,
i.e., (2 × element->GetLength()) mod modulus, while all other components should be 0.

Actual Output

However, the following results were obtained (excerpt):

873423077313605891
318723509706611100
736541911239092811

Question

Is the Cooley-Tukey NTT expected to produce different results from Stockham NTT in this case?
In other words, does Cooley-Tukey NTT inherently distribute the output differently, or should it behave similarly to Stockham NTT in terms of DC component accumulation?

If the results are correct, could you explain why they differ from my expectations?
Any guidance or clarification would be greatly appreciated.

For reference, the modulus used in this NTT implementation is 1152921504606748673.

Thank you for your help!

I think you should be able to compute OpenFHE’s NTT using the Stockham (let’s call it fast NTT (FNTT)) algorithm.

One thing you need to consider: NTT in OpenFHE is negacyclic (defined here). So you need to make sure that your Stockham implementation does compute the negacyclic version of NTT. From a first look, the paper you referenced seems to implement cyclic convolution rather than negacyclic. I would check this first.

Thank you for your response.

I have reviewed the concepts of negacyclic and cyclic convolutions based on the reference you provided:
[Reference] Negacyclic Polynomial Multiplication || Math ∩ Programming

However, I am still unsure how these properties specifically impact the results of OpenFHE’s NTT.
My main concern is understanding why OpenFHE’s NTT behaves differently from a conventional NTT or FFT.

My Understanding
As far as I understand, in a standard FFT or NTT, if we consider the following input:

N = 4, modulus = 17
A = {3, 3, 3, 3}
then, performing

B = NTT(A)
would typically yield the following result:

B = {12, 0, 0, 0}

That is, the first element (B[0]) should be the sum of all elements in A, and all other components should be 0.
However, when performing the NTT according to OpenFHE’s algorithm, I obtained the following result instead:

B = {11, 2, 4, 12}
I would like to understand whether this difference arises due to negacyclic properties, or if there is another underlying factor at play.

Additionally, I tested the actual OpenFHE implementation by setting all elements to 2 and obtained the following results (excerpt):

873423077313605891
318723509706611100
736541911239092811

This result was directly obtained using OpenFHE’s actual implementation.

Could you clarify why this discrepancy occurs?
I appreciate your guidance.

In negacyclic convolution, the inputs are “twisted,” that is, they are multiplied by twisting factors. This is clearly mentioned in Section 2.2 in the paper I referenced. So, your input A becomes A', where A'[i] = A[i]*\zeta^i \mod{modulus}.

Twisting is done implicitly (as shown in the referenced paper) in OpenFHE’s NTT for better performance.

Following your advice, I applied the twisting process to the input and was able to obtain the expected results.
Thank you very much.

※ Personal note:
The twisting process is described in the following image.

Thus, the following process is required before performing NTT and after performing iNTT.
for (i = 0; i < N; ++i) {
element[i] = (element[i] * RootOfUnity[i]) % modulus;
}
OpenFHE_CooleyTukeyNTT(element);

OpenFHE_GentlemanSandeiNTT(element);
for (i = 0; i < N; ++i) {
element[i] = (element[i] * RootOfUnity_Inverse[i]) % modulus;
}

We compared the data arrangement after performing Cooley-Tukey NTT with Stockham NTT.
Initially, we assumed that the only difference in array indexing would be bit-reversal ordering.

However, after comparing the actual results, we observed a specific ordering pattern different from simple bit-reversal.
Do you have any insights into this reordering pattern?
Additionally, we would appreciate any references or papers that explain this phenomenon.

Below is an example of our observed data:

Observed Data (N=8)

Cooley-Tukey NTT Index 0 1 2 3 4 5 6 7
Corresponding Index in Stockham NTT 7 3 5 1 6 2 4 0

(This table shows how the output of Stockham NTT corresponds to the index positions in Cooley-Tukey NTT.)

For example, given the input data c = {1,2,3,4,5,6,7,8}, the Stockham NTT output s follows this ordering:

s = {c[7], c[3], c[5], c[1], c[6], c[2], c[4], c[0]}

Observed Data (N=16)

Cooley-Tukey NTT Index 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Corresponding Index in Stockham NTT 15 4 10 1 13 7 8 2 14 5 11 0 12 6 9 3

Observed Data (N=32)

Cooley-Tukey NTT Index 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Corresponding Index in Stockham NTT 31 2 17 11 24 6 21 15 28 1 19 8 26 5 23 12 30 3 16 10 25 7 20 14 29 0 18 9 27 4 22 13

If you have any information about the rule behind this ordering, we would greatly appreciate your insights.

They are different beyond bit reversing. The indexing patterns of CT/GS and Stockham are different. You need to study both algorithms carefully to determine the differences.

One thing I can confirm is that Stockham algorithm can be used to compute negacyclic NTT. I did that in the past without the explicit twisting step. You just need to ensure your implementation is correct and it uses the correct set of twiddle factors in each NTT stage.

Thank you for your response. I understood that the indexing rules differ beyond just bit-reversal.

Regarding your comment on using the Stockham algorithm for negacyclic NTT, I would love to learn more about how you implemented it. Could you share any details on how you ensured correctness without the explicit twisting step?

Additionally, if there are any references (papers, websites, or other resources) that discuss this approach, I would greatly appreciate it if you could point me in the right direction.

I did it a long time ago and it mainly was prototyping. To understand the indexing pattern correctly, check Figure 6 here - this figure is read both ways: left-to-right and right-to-left for IFFT and FFT, respectively.

For FFT Stockham, check Algorithm 9.5.6 in Crandall and Pomerance’s Book - “Prime Numbers: A Computational Perspective”.

For Stockham’s negacyclic NTT, I do not know if there are public implementations. However, a Google search shows plenty of papers claiming that. I think once you understand the indexing pattern correctly, and use the right set of twiddle factors, you should be able to implement it correctly.

(post deleted by author)

I tried OpenFHE with N=16384 and was able to align Stockham’s ordering with Cooley-Tukey using the following sorting algorithm. Thank you very much.

  • python code
    def swap_Stockham_to_CooleyTukey(el):
    n = len(el)
    tmp = el[0]
    for i in range(n - 1):
    el[i] = el[i + 1]
    el[n - 1] = tmp
    BitReverseVec(el)
    return el