Polynomial Multiplication & other high-level operations on hardware backend

I would like to implement polynomial multiplication and other higher-level operations using a hardware-integrated backend implemented through the hardware abstraction layer.

I investigated the current backends that are provided (HEXL). From what I was able to understand from mubintvecnathexl.cpp, the backend is able to execute mostly lower-level operations such as Modular Multiplication (ModMul).

In benativehexl-math-impl.cpp, I found a template for Polynomial Multiplication (PolynomialMultiplication). However after running some examples, it seems that the function is not called during the execution of a program that contains for example an EvalMult() operation. I was only able to see PolynomialMultiplication() be executed within unittests, which I believe only test for correctness.

Ideally I would want the backend be able to process a “customized” polynomial multiplication operation instead of low-level ModMults.

Is there a way to accomplish this? If so, could you please point me to the right direction?

Thank you very much.

The PolynomialMultiplication() you are referring to is only used for correctness checks. It implements the schoolbook polynomial multiplication of complexity N^2, where N is the input polynomial length. We do not recommend using this algorithm in FHE implementation.

For your purpose, you are advised to implement the faster version of polynomial multiplication that uses NTT and runs in N\log_2{}N. At a high level, this algorithm does the following:
Given polynomials a and b:

  1. A=NTT(a), forward NTT
  2. B=NTT(b), forward NTT
  3. C=A \bigodot B, where \bigodot is point-wise modular multiplication
  4. c=INTT(C), inverse NTT

A concrete example illustrating this algorithm is provided in UnitTest CRT_polynomial_mult_small which can be found here.

Note that OpenFHE uses the described algorithm above extensively in the low-level implementations of the different schemes. Sometimes the input would be already in the NTT representation (which means we do not need to invoke steps 1 and 2), other times we do not need to convert the product into the coefficient representation (that is, we do not need to invoke step 4). For that reason, we advise implementing polynomial multiplication from the low-level operations it is composed of such as NTT, INTT, and ModMul, which should be provided by your accelerator.

Hello Caesar, thank you for your reply.
I am more concerned about the technical aspect of the process.

Could you point out which method is invoked in the regular execution when openFHE tries to perform polynomial multiplication using 2 polynomials?

Ideally I would like to have my custom backend execute a polynomial multiplication operation of the form poly_mult(A,B).
What is the procedure I would have to follow to accomplish this?
Thank you.

First, I do not see why there would be a need to implement poly_mult(A, B) as an atomic operation. In OpenFHE, we provide a complete set of low-level operations that can be used to compose any ring operation that might be required by the supported FHE schemes. Assuming your accelerator implements these low-level operations, you should be able to implement poly_mult(A, B) or potentially any other ring operation from the low-level operations.

If you still want to implement a new custom method, you may want to follow the guidelines below. For the sake of illustration, let’s assume you are interested in adding a new custom method to DCRTPoly. In order to provide your own implementation of the new custom method,
you will need to do the following:

  1. declare your method in dcrtpoly.h.
  2. declare your method in dcrtpoly-interface.h.
  3. declare your method in <accelerator-name>dcrtpoly.h, check hexldcrtpoly.h for an example.
  4. make sure you provide the implementation in <accelerator-name>dcrtpoly.cpp check hexldcrtpoly.cpp for an example.

As a guiding example, you can trace the evolution of any of the methods declared in hexldcrtpoly.h and do the same for your custom method.

Lastly, I would like to note that polynomial multiplication is implemented in dcrtpoly.cpp as Times. If you are interested in implementing poly_mult, I recommend using this method.

Hi @homer_g.
the Current HAL layer is insufficient for any PCI based accelerator, and also stops at the lowest math and transform level – it needs to be augmented up to the lattice layer in order to really accelerate commercial level FHE workloads. We plan to do this under our DARPA program.
You are actually the vanguard some of this by being the first one to investigate what is and isn’t there in the HAL for a PCI interfaced accelerator.

I think we need some definitions here.

what do you call a poly?
and tell me exactly what your poly_mult(a,b) would do, and how you see it used in FHE?

Because in OpenFHE our poly’s are basically 1) all in RNS form and are called DCRTpoly 2) components of other things such as cipertexts and keys.
Rarely do you just multiply two DCRTpolynomials hence the reason there isn’t an obvious stand alone call. @Caesar points out Times but that is an elementwise hadamard multiply of all the towers in a DCRTpoly and not a “multiplication of two polynomials” which requires a convolution of their coefficients. The code that @Caesar pointed out is what is used for that as direct convolution is N^2 vs NlogN and N is LARGE.

I would suggest maybe you provide us with a list of what your expected acceleration primitives are and then we may be able to suggest similar points in the OpenFHE code. Basically you have a small number of primitives and a huge number of OpenFHE functions, so wouldn’t that make sense?

I wonder if there are any further discussion?

A few days before I did asked any comment on following question.
“How to use custom NTT/CRT library” How to use custom NTT/CRT library

Basically, it is the same question how to offload (plug-in) the other implementation of NTT/CRT routines on the OpenFHE framework.