ClearEvalAutomorphismKeys behavior is strange when you examine the memory usage pattern

Hi, I’m trying out some features of OpenFHE.

To start with, I am running the test with OpenFHE commit hash 85a8f6325c50e3e39d07d253a1fe150bad2d49ce.

I am trying out functions EvalRotateKeyGen, SerializeEvalAutomorphismKey, DeserializeEvalAutomorphismKey, and ClearEvalAutomorphismKeys.

Below is an example of my test code.

    // cryptocontext generated
    // mem check 1
    cryptocontext->EvalRotateKeyGen(...); // about 150 rotation keys
    // mem check 2
    cryptocontext->SerializeEvalAutomorphismKey("rot1", ...);
    cryptocontext->ClearEvalAutomorphismKeys();
    // mem check 3

    cryptocontext->EvalRotateKeyGen(...); // about 150 rotation keys
    // mem check 4
    cryptocontext->SerializeEvalAutomorphismKey("rot2", ...);
    cryptocontext->ClearEvalAutomorphismKeys();
    // mem check 5

    cryptocontext->DeserializeEvalAutomorphismKey("rot1", ...);
    // mem check 6
    cryptocontext->ClearEvalAutomorphismKeys();
    // mem check 7

    cryptocontext->DeserializeEvalAutomorphismKey("rot2", ...);
    // mem check 8
    cryptocontext->ClearEvalAutomorphismKeys();
    // mem check 9

In the code, I generated two rotation key sets, each containing about 150 rotation keys, and serialized them to the files "rot1" and "rot2".
After clearing all automorphism keys from the crypto context, I deserialized the rotation keys from the "rot1" file and then cleared all automorphism keys again.
The same procedure was repeated for the "rot2" file.

During this procedure, I checked memory usage nine times.
I verified that just before rotation key generation the memory usage was 17GB, and that the 150 rotation keys are about 47GB when serialized to a file.

Therefore, I expected to observe a memory usage pattern such as 17GB(1) → 64GB(2) → 17GB(3) → 64GB(4) → 17GB(5) → 64GB(6) → 17GB(7) → 64GB(8) → 17GB(9).
However, what I actually observed was 17GB(1) → 73GB(2) → 66GB(3) → 74GB(4) → 64GB(5) → 110GB(6) → 109GB(7) → 110GB(8) → 109GB(9).

I have several questions about this behavior.
First, why is the memory usage at check 2 about 9GB higher than I expected?
Second, why is at least 47GB of memory not freed at check 3? Why is only about 7GB freed?
Third, at check 4, memory did not increase by 47GB compared to check 3. Am I right in guessing that the unfreed memory from check 3 is being reused?
Fourth, if the library somehow reuses unfreed memory, why does usage jump by 46GB at check 6? Or does this mean deserialized keys are treated differently?
Lastly, at checks 7 and 9, why is almost no memory freed?

Could I get some explanation of what might be happening here, and some advice on how to fix it?

For further detail, I measured memory usage by running the free command every second and recording the output.
My machine uses 5.5GB of memory at idle.
The parameters and configuration for the crypto context are as follows.

    CCParams<CryptoContextCKKSRNS> parameters;
    SecretKeyDist secretKeyDist = UNIFORM_TERNARY;
    parameters.SetSecretKeyDist(secretKeyDist);
    parameters.SetSecurityLevel(HEStd_128_classic);
    parameters.SetRingDim(1<<17);
    parameters.SetBatchSize(batchSize);
    ScalingTechnique rescaleTech = FLEXIBLEAUTO;
    usint dcrtBits               = 59;
    usint firstMod               = 60;
    parameters.SetScalingModSize(dcrtBits);
    parameters.SetScalingTechnique(rescaleTech);
    parameters.SetFirstModSize(firstMod);
    vector<uint32_t> levelBudget = {4, 4};
    uint32_t levelsAvailableAfterBootstrap = 17;
    uint32_t multDepth = levelsAvailableAfterBootstrap + FHECKKSRNS::GetBootstrapDepth(levelBudget, secretKeyDist);
    parameters.SetMultiplicativeDepth(multDepth);

    cc = GenCryptoContext(parameters);

    cc->Enable(PKE);
    cc->Enable(KEYSWITCH);
    cc->Enable(LEVELEDSHE);
    cc->Enable(ADVANCEDSHE);
    cc->Enable(FHE);

    vector<uint32_t> bsgsDim = {0, 0};
    cc->EvalBootstrapSetup(levelBudget, bsgsDim, batchSize);

    keys = cc->KeyGen();

Thank you in advance.

Additionally, I tried using Google's AddressSanitizer (ASan) to look for the memory leak, and when the code is compiled with -fsanitize=address, the problems all go away.
The memory usage pattern becomes 17GB(1) → 64GB(2) → 18GB(3) → 65GB(4) → 18GB(5) → 64GB(6) → 18GB(7) → 65GB(8) → 18GB(9), very close to what I expected.
Could this just be some kind of optimization?

Following the hint from ASan, I found that the memory allocator is the source of the problem.
Switching the memory allocator to jemalloc fixed the issue, and afterwards I found there is a tcmalloc option in the OpenFHE CMake file.
Turning that tcmalloc option on also resolves the issue.

I guess this issue is resolved.

Can you provide further details about your environment?
OS? Windows, Unix-like, WSL, a virtual machine, etc.
Compiler?
Architecture: how many CPUs and sockets?
Was this behavior observed with a multi-threaded or single-threaded build?

I am using Ubuntu 20.04.6 LTS with g++/gcc 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.2), two Intel(R) Xeon(R) Gold 6326 CPUs @ 2.90GHz (one per socket), and OpenFHE built with the OPENMP option turned on.