GPU-based Matrix Multiplication for TFHE and CKKS?

Hey folks,

I am new to FHE and want to use it for my deep learning project.

I have just started learning TFHE and CKKS, and I have a question: how do I do matrix multiplication on a GPU in each scheme? On the CKKS side, I understand that the data is encoded into polynomials and that we have to use diagonal encoding to perform matrix multiplication efficiently.
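For concreteness, here is a plain-Python sketch of that diagonal method (Halevi-Shoup style) for a matrix-vector product. There is no encryption here; it only shows the rotate-multiply-accumulate access pattern that CKKS libraries implement with homomorphic slot rotations:

```python
# Plain-Python sketch of the "diagonal" method for matrix-vector
# multiplication: M*v computed as n rotations plus n elementwise
# multiply-adds, the pattern CKKS implementations use on packed slots.

def rotate(v, k):
    """Cyclic left rotation: rotate(v, k)[i] == v[(i + k) % len(v)]."""
    return v[k:] + v[:k]

def diagonal_matvec(M, v):
    n = len(v)
    acc = [0.0] * n
    for k in range(n):
        # k-th generalized diagonal of M: diag[i] = M[i][(i + k) % n]
        diag = [M[i][(i + k) % n] for i in range(n)]
        rot_v = rotate(v, k)
        # One elementwise multiply-and-add per diagonal; under CKKS this
        # would be a single homomorphic rotation plus one multiplication.
        acc = [a + d * r for a, d, r in zip(acc, diag, rot_v)]
    return acc

M = [[1, 2], [3, 4]]
v = [5, 6]
print(diagonal_matvec(M, v))  # -> [17.0, 39.0], i.e. the usual M*v
```

Summing M[i][(i+k) % n] * v[(i+k) % n] over all k covers every column index j exactly once, which is why this equals the ordinary row-times-column product.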

I am not able to wrap my head around the TFHE side of things. As I understand it, TFHE encodes each number by placing its data in the most significant bits (MSBs), adds noise to the least significant bits (LSBs), and encrypts each number independently while preserving the shape of the matrix. Since the numbers are encrypted independently and the shape of the data stays the same, can I use existing GEMM implementations for CPUs and GPUs with TFHE? For example, say I have a 256x256 matrix and I want to multiply it by a 256x512 matrix. With TFHE, each number in my matrices will be encrypted. When I then do the matrix multiplication on a GPU, can I use existing GEMM kernels with their efficient tiling algorithms? If not, what am I missing?

I am currently trying to compare the two schemes with respect to their performance on accelerators. With my current understanding, TFHE should be the better choice, since we can leverage existing algorithms, whereas for CKKS we have to design new ones. So why is CKKS considered better for deep learning from a performance point of view?

Thanks!

Fundamentally, a BinFHE ciphertext (TFHE is one of a few similar schemes) encodes either one bit or a small integer of a few bits, and the result is at least a kilobyte in size. You can check this by serializing a BinFHE ciphertext to binary and looking at the file size; multiply that by the number of ciphertexts you need to represent a 32- or 64-bit value and it gets bigger still. That is nowhere near the size of a 32- or 64-bit float, so you cannot leverage existing GPU algorithms directly.
Any GPU algorithm that relies on data locality for efficiency (your GEMM tiling, etc.) does not carry over to ciphertexts, which are orders of magnitude larger than machine words.
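A quick back-of-the-envelope calculation makes the blow-up concrete. The sizes below are illustrative assumptions (roughly 2 KB per bit-level ciphertext), not measured values from any particular library; serialize a ciphertext yourself to get real numbers:

```python
# Rough arithmetic on ciphertext expansion, using assumed sizes.
# TFHE_CT_BYTES is a ballpark guess for one bit-level ciphertext,
# not a measurement from any specific library.

PLAINTEXT_FLOAT_BYTES = 4      # one float32 weight
TFHE_CT_BYTES = 2 * 1024       # assumed size of one bit-level ciphertext
BITS_PER_VALUE = 32            # ciphertexts needed per 32-bit value

bytes_per_encrypted_value = TFHE_CT_BYTES * BITS_PER_VALUE
expansion = bytes_per_encrypted_value / PLAINTEXT_FLOAT_BYTES
print(f"one encrypted 32-bit value: {bytes_per_encrypted_value // 1024} KB")
print(f"expansion vs. plaintext float32: {expansion:.0f}x")

# A 256x256 matrix of such ciphertexts:
matrix_bytes = 256 * 256 * bytes_per_encrypted_value
print(f"256x256 encrypted matrix: {matrix_bytes / 2**30:.1f} GiB")
# A GEMM tile sized for ~100 KB of GPU shared memory would hold only a
# couple of ciphertexts, so classic tiling strategies gain nothing.
```

Even if the exact constants differ by a factor of a few, the conclusion is the same: the working set per "element" is so large that cache- and shared-memory-oriented GEMM kernels stop making sense.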

More importantly, CKKS is used because it supports approximate floating-point arithmetic (think block floating point) rather than integers. How many ML algorithms are integer-based?


TFHE works with small-precision integers or bits, one at a time. CKKS works with vectors of 4K to 64K real numbers (like floats or doubles), and one homomorphic operation is applied to the whole vector at once. This is why CKKS is typically several orders of magnitude faster than TFHE for machine learning applications, which work with large amounts of data. See https://openfheorg.github.io/aaai-2024-lab-materials/AAAI_Tutorial.pdf (section "FHE for ML") for more information.
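A simple operation count shows where the packing advantage comes from. The slot count and the per-element comparison below are illustrative assumptions, not benchmarks of any implementation:

```python
# Sketch of why CKKS batching pays off: one homomorphic operation acts
# on every slot at once. Numbers here are illustrative assumptions.

SLOTS = 16384   # e.g. a common CKKS configuration with ~16K slots
n = 256         # matrix dimension from the question

# Diagonal-method matrix-vector product for an n x n matrix:
# n rotations + n multiplications + n additions, each over all slots.
ckks_ops = 3 * n

# Per-element (bit/integer-at-a-time) evaluation: each of the n*n
# multiplications and n*(n-1) additions is its own homomorphic op.
per_element_ops = n * n + n * (n - 1)

print(f"CKKS bulk homomorphic ops: {ckks_ops}")
print(f"per-element homomorphic ops: {per_element_ops}")
print(f"ratio: {per_element_ops / ckks_ops:.0f}x fewer bulk ops")
```

Individual CKKS operations are heavier than individual TFHE operations, but for data-parallel workloads like neural-network layers, the reduction in operation count dominates, which matches the orders-of-magnitude gap described above.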
