Confirmation on Cooley-Tukey NTT Results

Yusaku_suzuki · March 10, 2025, 6:05am

I understand that OpenFHE’s NTT implementation uses the Cooley-Tukey NTT.

[Code Link]

github.com/openfheorg/openfhe-development

src/core/include/math/hal/intnat/transformnat-impl.h

main

//==================================================================================
// BSD 2-Clause License
//
// Copyright (c) 2014-2022, NJIT, Duality Technologies Inc. and other contributors
//
// All rights reserved.
//
// Author TPOC: contact@openfhe.org
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
//
// 1. Redistributions of source code must retain the above copyright notice, this
//    list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright notice,
//    this list of conditions and the following disclaimer in the documentation
//    and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"

This file has been truncated. show original

L302: void NumberTheoreticTransformNat<VecType>::ForwardTransformToBitReverseInPlace

I would like to verify the correctness of its computation results.

I compared its output with the Stockham NTT described in the following paper,
but the results did not match, even after considering bit-reversal.

[Paper Link]
https://cris.maastrichtuniversity.nl/ws/portalfiles/portal/125839995/Collins_2022_Computer_Science_for_Continuous_Data.pdf#page=334
Algorithm 1. Stockham radix-2 NTT algorithm

Input Used

I initialized all elements to 2 as follows:

for (uint32_t i = 0; i < element->GetLength(); ++i) {
(*element)[i] = 2;
}
const auto modulus{element->GetModulus()};
const uint32_t n(element->GetLength() >> 1);

Expected Output

In a Stockham NTT implementation, I would expect the DC component (index 0) to contain the total sum of the elements,
i.e., (2 × element->GetLength()) mod modulus, while all other components should be 0.

Actual Output

However, the following results were obtained (excerpt):

873423077313605891
318723509706611100
736541911239092811
…

Question

Is the Cooley-Tukey NTT expected to produce different results from Stockham NTT in this case?
In other words, does Cooley-Tukey NTT inherently distribute the output differently, or should it behave similarly to Stockham NTT in terms of DC component accumulation?

If the results are correct, could you explain why they differ from my expectations?
Any guidance or clarification would be greatly appreciated.

For reference, the modulus used in this NTT implementation is 1152921504606748673.

Thank you for your help!

Caesar · March 10, 2025, 3:55pm

I think you should be able to compute OpenFHE’s NTT using the Stockham (let’s call it fast NTT (FNTT)) algorithm.

One thing you need to consider: NTT in OpenFHE is negacyclic (defined here). So you need to make sure that your Stockham implementation does compute the negacyclic version of NTT. From a first look, the paper you referenced seems to implement cyclic convolution rather than negacyclic. I would check this first.

Yusaku_suzuki · March 12, 2025, 7:17am

Thank you for your response.

I have reviewed the concepts of negacyclic and cyclic convolutions based on the reference you provided:
[Reference] Negacyclic Polynomial Multiplication || Math ∩ Programming

However, I am still unsure how these properties specifically impact the results of OpenFHE’s NTT.
My main concern is understanding why OpenFHE’s NTT behaves differently from a conventional NTT or FFT.

My Understanding
As far as I understand, in a standard FFT or NTT, if we consider the following input:

N = 4, modulus = 17
A = {3, 3, 3, 3}
then, performing

B = NTT(A)
would typically yield the following result:

B = {12, 0, 0, 0}

That is, the first element (B[0]) should be the sum of all elements in A, and all other components should be 0.
However, when performing the NTT according to OpenFHE’s algorithm, I obtained the following result instead:

B = {11, 2, 4, 12}
I would like to understand whether this difference arises due to negacyclic properties, or if there is another underlying factor at play.

Additionally, I tested the actual OpenFHE implementation by setting all elements to 2 and obtained the following results (excerpt):

873423077313605891
318723509706611100
736541911239092811

This result was directly obtained using OpenFHE’s actual implementation.

Could you clarify why this discrepancy occurs?
I appreciate your guidance.

Caesar · March 12, 2025, 1:39pm

In negacyclic convolution, the inputs are “twisted,” that is, they are multiplied by twisting factors. This is clearly mentioned in Section 2.2 in the paper I referenced. So, your input A becomes A', where A'[i] = A[i]*\zeta^i \mod{modulus}.

Twisting is done implicitly (as shown in the referenced paper) in OpenFHE’s NTT for better performance.

Yusaku_suzuki · March 13, 2025, 2:37am

Following your advice, I applied the twisting process to the input and was able to obtain the expected results.
Thank you very much.

※ Personal note:
The twisting process is described in the following image.

Thus, the following process is required before performing NTT and after performing iNTT.
for (i = 0; i < N; ++i) {
element[i] = (element[i] * RootOfUnity[i]) % modulus;
}
OpenFHE_CooleyTukeyNTT(element);

OpenFHE_GentlemanSandeiNTT(element);
for (i = 0; i < N; ++i) {
element[i] = (element[i] * RootOfUnity_Inverse[i]) % modulus;
}

Yusaku_suzuki · March 14, 2025, 12:57am

We compared the data arrangement after performing Cooley-Tukey NTT with Stockham NTT.
Initially, we assumed that the only difference in array indexing would be bit-reversal ordering.

However, after comparing the actual results, we observed a specific ordering pattern different from simple bit-reversal.
Do you have any insights into this reordering pattern?
Additionally, we would appreciate any references or papers that explain this phenomenon.

Below is an example of our observed data:

Observed Data (N=8)

Cooley-Tukey NTT Index	0	1	2	3	4	5	6	7
Corresponding Index in Stockham NTT	7	3	5	1	6	2	4	0

(This table shows how the output of Stockham NTT corresponds to the index positions in Cooley-Tukey NTT.)

For example, given the input data c = {1,2,3,4,5,6,7,8}, the Stockham NTT output s follows this ordering:

s = {c[7], c[3], c[5], c[1], c[6], c[2], c[4], c[0]}

Observed Data (N=16)

Cooley-Tukey NTT Index	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
Corresponding Index in Stockham NTT	15	4	10	1	13	7	8	2	14	5	11	0	12	6	9	3

Observed Data (N=32)

Cooley-Tukey NTT Index	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25	26	27	28	29	30	31
Corresponding Index in Stockham NTT	31	2	17	11	24	6	21	15	28	1	19	8	26	5	23	12	30	3	16	10	25	7	20	14	29	0	18	9	27	4	22	13

If you have any information about the rule behind this ordering, we would greatly appreciate your insights.

Caesar · March 14, 2025, 6:32pm

They are different beyond bit reversing. The indexing patterns of CT/GS and Stockham are different. You need to study both algorithms carefully to determine the differences.

One thing I can confirm is that Stockham algorithm can be used to compute negacyclic NTT. I did that in the past without the explicit twisting step. You just need to ensure your implementation is correct and it uses the correct set of twiddle factors in each NTT stage.

Yusaku_suzuki · March 17, 2025, 7:31am

Thank you for your response. I understood that the indexing rules differ beyond just bit-reversal.

Regarding your comment on using the Stockham algorithm for negacyclic NTT, I would love to learn more about how you implemented it. Could you share any details on how you ensured correctness without the explicit twisting step?

Additionally, if there are any references (papers, websites, or other resources) that discuss this approach, I would greatly appreciate it if you could point me in the right direction.

Caesar · March 17, 2025, 3:04pm

I did it a long time ago and it mainly was prototyping. To understand the indexing pattern correctly, check Figure 6 here - this figure is read both ways: left-to-right and right-to-left for IFFT and FFT, respectively.

For FFT Stockham, check Algorithm 9.5.6 in Crandall and Pomerance’s Book - “Prime Numbers: A Computational Perspective”.

For Stockham’s negacyclic NTT, I do not know if there are public implementations. However, a Google search shows plenty of papers claiming that. I think once you understand the indexing pattern correctly, and use the right set of twiddle factors, you should be able to implement it correctly.

Yusaku_suzuki · March 27, 2025, 7:27am

I tried OpenFHE with N=16384 and was able to align Stockham’s ordering with Cooley-Tukey using the following sorting algorithm. Thank you very much.

python code
def swap_Stockham_to_CooleyTukey(el):
n = len(el)
tmp = el[0]
for i in range(n - 1):
el[i] = el[i + 1]
el[n - 1] = tmp
BitReverseVec(el)
return el

Topic		Replies	Views
Question about the Cooley-Tukey operation in NTT Library Questions	4	36	January 29, 2025
How to use custom NTT/CRT library Library Questions	4	503	February 27, 2023
Is adapting a NTT done in a GPU to OpenFHE difficult? Library Questions	1	169	March 12, 2024
How to transform something to NTT space Library Questions	4	195	March 6, 2025
NTT or FFT in TFHE and CKKS scheme? FHE Questions	3	404	December 27, 2023