Why relinearization is so time consuming?

Hello everyone
I am a student with an experimental project to build some numerical simulations (PDE solution) over ciphertext. I am trying to analyze the effectiveness of my simulations using OpenFHE wrapped in Julia, comparing them with simulations over plaintext. For this purpose, I’m running some tests to measure time consumption of basic operations like EvalMult and Relinearize. I have uploaded images with time measurements of the ciphertext EvalMultNoRelin and Relinearize (3-element-ciphertext convert to 2-element-ciphertext) in a log scale for different ring dimensions. It turns out that Relinearize is something like 10 times slower than EvalMultNoRelin. As long as I understand with evaluated key one only needs to do several element multiplications and additions in Relinearize, that should be comparable with number of element multiplications and additions in EvalMultNoRelin. Could someone help find the reason for such poor performance of Relinearize? It looks like this is a memory-constrained calculation, if I’m right, could someone give me a hint where in the Relinearize code so many memory accesses are performed. Maybe there are already some publications explaining performance of OpenFHE arithmetical operations?

Thank you in advance!

All experiments are performed with single OMP thread.
Used parameters, code is in Julia, but with exactly same function names as in C++. ring_dim is shown in the plots.

    parameters = CCParams{CryptoContextCKKSRNS}()

    secret_key_distribution = UNIFORM_TERNARY
    SetSecretKeyDist(parameters, secret_key_distribution)

    SetSecurityLevel(parameters, HEStd_NotSet)
    SetRingDim(parameters, ring_dim)
    SetBatchSize(parameters, div(ring_dim,2))

    rescale_technique = FLEXIBLEAUTO
    dcrt_bits = 59;
    first_modulus = 60;

    SetScalingModSize(parameters, dcrt_bits)
    SetScalingTechnique(parameters, rescale_technique)
    SetFirstModSize(parameters, first_modulus)

    SetMultiplicativeDepth(parameters, 30)

    cc = GenCryptoContext(parameters)

    Enable(cc, PKE)
    Enable(cc, KEYSWITCH)
    Enable(cc, LEVELEDSHE)
    Enable(cc, ADVANCEDSHE)
    Enable(cc, FHE)

Relinearization, also known as key switching, does not only include multiplications and additions. It also includes quite many NTT/INTT invocations. The number of NTT/INTT is associated with several factors depending on the key switching algorithm used.

More on this can be found in these resources:

  1. Revisiting Homomorphic Encryption Schemes for Finite Fields
  2. Does Fully Homomorphic Encryption Need Compute Acceleration?

Thank you very much, these papers are exactly what I’m looking for to get deep into details!