How to run ciphertext conversion (CKKS to FHEW scheme switching) with 2^14 slots in 60 GB of memory

I want to run CKKS scheme switching on a ciphertext with 2^14 slots on a machine with 60 GB of memory. When the ciphertext is converted to FHEW ciphertexts, memory usage reaches 59.3 GB (other programs running on my computer take up about 8 GB). When EvalSign is then evaluated in FHEW, the program is killed. Is there a way to change the parameters so that the program can run? I also don't know whether calling reset() frees memory. Here is my code:

    // Step 1: Setup CryptoContext for CKKS
    ScalingTechnique scTech = FLEXIBLEAUTO;
    uint32_t multDepth      = 17;
    if (scTech == FLEXIBLEAUTOEXT)
        multDepth += 1;

    uint32_t scaleModSize = 42;
    uint32_t firstModSize = 48;
    uint32_t ringDim      = 1<<15;
    SecurityLevel sl      = HEStd_NotSet;
    BINFHE_PARAMSET slBin = TOY;
    uint32_t logQ_ccLWE   = 17;
    uint32_t slots        = 1<<14;  // full-packed
    uint32_t batchSize    = slots;

    CCParams<CryptoContextCKKSRNS> parameters;
    parameters.SetMultiplicativeDepth(multDepth);
    parameters.SetScalingModSize(scaleModSize);
    parameters.SetFirstModSize(firstModSize);
    parameters.SetScalingTechnique(scTech);
    parameters.SetSecurityLevel(sl);
    parameters.SetRingDim(ringDim);
    parameters.SetBatchSize(batchSize);
    parameters.SetSecretKeyDist(UNIFORM_TERNARY);
    parameters.SetKeySwitchTechnique(HYBRID);
    parameters.SetNumLargeDigits(4);

    CryptoContext<DCRTPoly> cc = GenCryptoContext(parameters);

    // Enable the features that you wish to use
    cc->Enable(PKE);
    cc->Enable(KEYSWITCH);
    cc->Enable(LEVELEDSHE);
    cc->Enable(ADVANCEDSHE);
    cc->Enable(SCHEMESWITCH);

    std::cout << "CKKS scheme is using ring dimension " << cc->GetRingDimension();
    std::cout << ", number of slots " << slots << ", and supports a multiplicative depth of " << multDepth << std::endl
              << std::endl;

    // Generate encryption keys
    auto keys = cc->KeyGen();

    // Step 2: Prepare the FHEW cryptocontext and keys for FHEW and scheme switching
    SchSwchParams params;
    params.SetSecurityLevelCKKS(sl);
    params.SetSecurityLevelFHEW(slBin);
    params.SetCtxtModSizeFHEWLargePrec(logQ_ccLWE);
    params.SetNumSlotsCKKS(slots);
    params.SetNumValues(slots);
    auto privateKeyFHEW = cc->EvalSchemeSwitchingSetup(params);
    auto ccLWE          = cc->GetBinCCForSchemeSwitch();

    ccLWE->BTKeyGen(privateKeyFHEW);
    cc->EvalSchemeSwitchingKeyGen(keys, privateKeyFHEW);

    std::cout << "FHEW scheme is using lattice parameter " << ccLWE->GetParams()->GetLWEParams()->Getn();
    std::cout << ", logQ " << logQ_ccLWE;
    std::cout << ", and modulus q " << ccLWE->GetParams()->GetLWEParams()->Getq() << std::endl << std::endl;

    // Set the scaling factor to be able to decrypt; the LWE mod switch is performed on the ciphertext at the last level
    auto pLWE1           = ccLWE->GetMaxPlaintextSpace().ConvertToInt();  // Small precision
    double scaleSignFHEW = 1.0;
    cc->EvalCompareSwitchPrecompute(pLWE1, scaleSignFHEW);

    std::cout << "x1: " << std::endl;
    std::vector<double> x1(slots);
    for (uint32_t i = 0; i < slots; i++) {
        x1[i] = -30 + i * 0.001;
    }
    std::cout << "x1 end!!!\n";

    // Encoding as plaintexts
    Plaintext ptxt1 = cc->MakeCKKSPackedPlaintext(x1, 1, 0, nullptr, slots);

    // Encrypt the encoded vectors
    auto c1 = cc->Encrypt(keys.publicKey, ptxt1);
    ptxt1.reset();
    std::cout << "CKKS->FHEW before:"<<c1->GetLevel() << std::endl;
    auto LWECiphertexts = cc->EvalCKKStoFHEW(c1, slots);
    c1.reset();
    std::cout << "Cryptography conversion completed" << std::endl;
    // std::vector<LWECiphertext> LWESign(slots);

    // LWEPlaintext plainLWE;
    for(uint32_t i=0;i<slots;++i){
        LWECiphertexts[i]=ccLWE->EvalSign(LWECiphertexts[i]);
        // if(i<1000){
        //     ccLWE->Decrypt(privateKeyFHEW, LWECiphertexts[i], &plainLWE, pLWE1);
        //     std::cout << plainLWE << " ";
        // }
    }
    std::cout << "sign Calculation completed" << std::endl;
    auto c2 = cc->EvalFHEWtoCKKS(LWECiphertexts, slots);
    LWECiphertexts.clear();
    std::cout << "FHEW->CKKS end:"<<c2->GetLevel() << std::endl;

    Plaintext plaintextDec3;
    cc->Decrypt(keys.secretKey, c2, &plaintextDec3);
    plaintextDec3->SetLength(slots);
    std::cout << "Decrypted switched result: " << plaintextDec3 << std::endl;
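
On the question of whether reset() frees memory: in OpenFHE, Plaintext, Ciphertext<DCRTPoly>, and LWECiphertext are std::shared_ptr handles. This means reset() (or clear() on a vector of them) releases an object only when no other handle still refers to it. The plain C++ sketch below illustrates these ownership rules, using a hypothetical BigObject in place of the OpenFHE types. It does not address the main memory cost here. The precomputed decoding matrix and the evaluation keys are presumably held by the CryptoContext, so resetting local variables does not release them.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a large OpenFHE object (Plaintext, Ciphertext<DCRTPoly>
// and LWECiphertext are all std::shared_ptr handles in OpenFHE).
struct BigObject {
    std::vector<double> data;
    explicit BigObject(std::size_t n) : data(n, 0.0) {}
};

// reset() drops only this handle: if another handle still owns the object,
// nothing is freed yet.
long ownersAfterReset() {
    auto p = std::make_shared<BigObject>(1 << 10);
    auto q = p;            // second owner, e.g. a copy kept in another variable
    p.reset();             // p gives up ownership ...
    return q.use_count();  // ... but q keeps the object alive (count == 1)
}

// The object is destroyed as soon as its last owner is reset.
bool freedAfterLastReset() {
    auto p = std::make_shared<BigObject>(1 << 10);
    std::weak_ptr<BigObject> watcher = p;  // observes without owning
    p.reset();
    return watcher.expired();              // true: the memory was released
}

// clear() destroys the handles stored in the vector, which frees the objects
// they alone own; shrink_to_fit() additionally asks (non-bindingly) for the
// vector's own buffer to be released.
void releaseAll(std::vector<std::shared_ptr<BigObject>>& v) {
    v.clear();
    v.shrink_to_fit();
}
```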

Currently, the homomorphic decoding required in scheme switching is done via a linear transformation (matrix-vector multiplication), which is not efficient for fully-packed ciphertexts with a large ring dimension. The plaintext decoding matrix is precomputed and stored in memory, which I assume is the cause of the large memory consumption.
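
The following is a rough model (an assumption, not OpenFHE's exact storage layout) of why this grows so quickly. Suppose the decoding map on n slots is stored as about one plaintext per diagonal, and each plaintext is a DCRTPoly with a given number of RNS limbs, each holding ringDim 64-bit words. Under that model, going from full packing with 2^13 slots (ring dimension 2^14) to full packing with 2^14 slots (ring dimension 2^15) multiplies the storage by 4, because both the number of plaintexts and the size of each one double. That is in the same ballpark as the 16 GB versus roughly 60 GB reported in this thread. The limb count of 18 used in the checks is only illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical back-of-the-envelope model, NOT OpenFHE's exact accounting:
// roughly one stored plaintext per diagonal of the slots x slots decoding map,
// each plaintext having `towers` RNS limbs of `ringDim` 64-bit words.
std::uint64_t decodingMatrixBytes(std::uint64_t slots, std::uint64_t ringDim,
                                  std::uint64_t towers) {
    const std::uint64_t bytesPerWord = 8;  // one 64-bit word per coefficient per limb
    return slots * ringDim * towers * bytesPerWord;
}

// Growth factor between two (slots, ringDim) configurations with the same
// number of limbs; the limb count cancels out of the ratio.
double growthFactor(std::uint64_t slotsA, std::uint64_t ringDimA,
                    std::uint64_t slotsB, std::uint64_t ringDimB) {
    return static_cast<double>(slotsB * ringDimB) /
           static_cast<double>(slotsA * ringDimA);
}
```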

There are three possible options:

  1. Don’t precompute the plaintext matrix, but generate it on the fly when performing the homomorphic decoding. This would involve combining EvalLTPrecomputeSwitch and EvalSlotsToCoeffsSwitch.
  2. Implement the homomorphic decoding using FFT, the way it is done in CKKS bootstrapping. This would involve changing the above two methods, but also the methods for setup and key gen.
  3. Don’t work with fully-packed ciphertexts. Instead, work with multiple sparsely-packed ciphertexts with a smaller batch size. You can always go from fully packed to sparsely packed and back using multiplicative masking, homomorphic rotations, and SetSlots.

Can you give me an example? I’m a beginner and don’t know much about OpenFHE yet.

The first two options are advanced, so I assume you are asking about the third one if you are a beginner.

Instead of setting slots to 1<<14, you can try a value of 1<<10. If you really want to encode 1<<14 numbers, construct 16 CKKS ciphertexts instead of a single ciphertext, and see whether this fits in your available RAM. Then, for each of these ciphertexts, run EvalCKKStoFHEW, EvalSign, and EvalFHEWtoCKKS (Setup, KeyGen, and Precompute need to run only once). At the end, you may want a single CKKS ciphertext encoding all 1<<14 results instead of 16 CKKS ciphertexts each encoding 1<<10. In that case, apply `SetSlots` with 1<<14 to each of the ciphertexts, multiply each one by the mask [1 1 1 … 1 0 0 0 … 0] (ones in the first 1<<10 slots), rotate the j-th one by -(j-1)·(1<<10), and add them all to obtain one ciphertext holding {c1, c2, …, c16} one after the other. Below is a small example for this last part.

To learn more about the meaning of the various parameters and how to work with OpenFHE, check out the Webinars page on OpenFHE.org.

    #include "openfhe.h"

    #include <algorithm>
    #include <iostream>
    #include <vector>

    using namespace lbcrypto;

    int main() {
        // Step 1: Setup CryptoContext
        uint32_t multDepth = 2;
        uint32_t scaleModSize = 50;
        uint32_t batchSize = 8;

        CCParams<CryptoContextCKKSRNS> parameters;
        parameters.SetMultiplicativeDepth(multDepth);
        parameters.SetScalingModSize(scaleModSize);
        parameters.SetBatchSize(batchSize);
        parameters.SetRingDim(4*batchSize);
        parameters.SetSecurityLevel(HEStd_NotSet);

        CryptoContext<DCRTPoly> cc = GenCryptoContext(parameters);

        cc->Enable(PKE);
        cc->Enable(KEYSWITCH);
        cc->Enable(LEVELEDSHE);

        // Step 2: Key Generation
        auto keys = cc->KeyGen();
        cc->EvalMultKeyGen(keys.secretKey);
        cc->EvalRotateKeyGen(keys.secretKey, {-8});  // -8 == -batchSize

        // Step 3: Encoding and encryption of inputs

        // Inputs
        std::vector<double> x1 = {0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0};
        std::vector<double> x2 = {8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0};

        // Encoding as plaintexts
        Plaintext ptxt1 = cc->MakeCKKSPackedPlaintext(x1);
        Plaintext ptxt2 = cc->MakeCKKSPackedPlaintext(x2);

        std::cout << "Input x1: " << ptxt1 << std::endl;
        std::cout << "Input x2: " << ptxt2 << std::endl;

        // Encrypt the encoded vectors
        auto c1 = cc->Encrypt(keys.publicKey, ptxt1);
        auto c2 = cc->Encrypt(keys.publicKey, ptxt2);

        // Create a multiplicative mask for the first batchSize elements: [1 ... 1 0 ... 0]
        std::vector<double> mask(2*batchSize, 0.0);
        std::fill(mask.begin(), mask.begin() + batchSize, 1.0);
        Plaintext ptxtMask = cc->MakeCKKSPackedPlaintext(mask, 1, 0, nullptr, 2*batchSize);

        // Construct a fully-packed ciphertext containing x1 and x2.
        // After SetSlots, each sparsely-packed input appears twice: [x | x]
        c1->SetSlots(2*batchSize);
        c2->SetSlots(2*batchSize);
        auto c1M = cc->EvalMult(c1, ptxtMask);  // [x1 | 0]

        auto c2M = cc->EvalMult(c2, ptxtMask);  // [x2 | 0]
        auto cFull = cc->EvalAdd(c1M, cc->EvalRotate(c2M, -8));  // [x1 | x2]

        Plaintext result;
        cc->Decrypt(keys.secretKey, cFull, &result);
        result->SetLength(2*batchSize);
        std::cout << "x1|x2 = " << result << std::endl;

        return 0;
    }
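
The slot bookkeeping in the example above generalizes to packing k sparse results of n slots each into one vector of k·n slots. The sketch below is a cleartext model only (no encryption). rotate() mirrors EvalRotate, where a negative index shifts values to the right. setSlots() mirrors the assumed behavior of SetSlots on a sparsely-packed ciphertext, in which the n values repeat across the larger slot vector. chunk() shows the cleartext split of the input before encoding. With n = 1<<10 and k = 16, this corresponds to the 2^14 case, and the repacking would need rotation keys for -n, -2n, …, -15n.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Slots = std::vector<double>;

// Cleartext model of EvalRotate(ct, k): out[i] = v[(i + k) mod size],
// so a negative k moves values toward higher slot indices.
Slots rotate(const Slots& v, long k) {
    const long n = static_cast<long>(v.size());
    Slots out(v.size());
    for (long i = 0; i < n; ++i) {
        const long src = ((i + k) % n + n) % n;  // wrap negative offsets too
        out[static_cast<std::size_t>(i)] = v[static_cast<std::size_t>(src)];
    }
    return out;
}

// Model (assumption) of SetSlots(m) on a sparsely-packed vector of size n < m:
// the n values appear repeated periodically across the m slots.
Slots setSlots(const Slots& v, std::size_t m) {
    Slots out(m);
    for (std::size_t i = 0; i < m; ++i) out[i] = v[i % v.size()];
    return out;
}

// Pack k vectors of size n into one vector of size k*n:
// SetSlots, mask the first n slots, rotate block j right by j*n, add.
Slots pack(const std::vector<Slots>& parts) {
    const std::size_t n = parts.at(0).size();
    const std::size_t m = n * parts.size();
    Slots mask(m, 0.0);
    std::fill(mask.begin(), mask.begin() + n, 1.0);
    Slots acc(m, 0.0);
    for (std::size_t j = 0; j < parts.size(); ++j) {
        assert(parts[j].size() == n);
        Slots s = setSlots(parts[j], m);
        for (std::size_t i = 0; i < m; ++i) s[i] *= mask[i];  // EvalMult by mask
        s = rotate(s, -static_cast<long>(j * n));             // EvalRotate by -j*n
        for (std::size_t i = 0; i < m; ++i) acc[i] += s[i];   // EvalAdd
    }
    return acc;
}

// Split the cleartext input into chunks of size n before encoding/encryption.
std::vector<Slots> chunk(const Slots& x, std::size_t n) {
    assert(n > 0 && x.size() % n == 0);
    std::vector<Slots> parts;
    for (std::size_t off = 0; off < x.size(); off += n)
        parts.emplace_back(x.begin() + off, x.begin() + off + n);
    return parts;
}
```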

In fact, I understand the third option; what I don’t understand is the first one. My main goal is ciphertext conversion (scheme switching) so that I can evaluate the sign function after the conversion, which is why my program looks like the one above. My understanding is that memory usage stays the same even if I split the ciphertext into 16, so I would like to keep everything packed in one ciphertext and change the parameters to reduce memory consumption. What’s the difference between sparse packing and full packing? In ciphertext conversion, full packing with 2^13 messages takes up 16 GB of memory, while with sparse packing memory usage goes above 60 GB and the process is killed. Also, once I have finished with some data, such as ptxt1 in my code, how can I release its memory? Of course, the biggest problem at present is how to convert a ciphertext holding 2^14 messages.

Sparse packing consumes less memory than full packing. This is because decoding a sparsely-packed ciphertext requires a smaller matrix, which in turn needs fewer plaintexts and fewer rotation keys (rotation keys can occupy a lot of memory when the ring dimension is high). Since you reuse this same decoding matrix for all the sparsely-packed ciphertexts, the total memory should be lower.
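
To put rough numbers on this, the sketch below counts the stored diagonals and the distinct nonzero rotation keys for a linear map on n slots. It assumes a baby-step/giant-step (BSGS) diagonal evaluation with g = ceil(sqrt(n)) baby steps. BSGS is a standard way to evaluate such maps, but OpenFHE's exact set of plaintexts and keys may differ, so treat this as a model.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical cost model for an n-slot linear map evaluated with the diagonal
// method and a baby-step/giant-step split n ~ g * h.
struct LTCost {
    std::uint64_t plaintexts;    // one encoded diagonal per diagonal of the n x n map
    std::uint64_t rotationKeys;  // distinct nonzero rotation amounts needed
};

LTCost bsgsCost(std::uint64_t n) {
    // g = ceil(sqrt(n)) baby steps (rotations 1..g-1),
    // h = ceil(n / g) giant steps (rotations g, 2g, ..., (h-1)g)
    const std::uint64_t g =
        static_cast<std::uint64_t>(std::ceil(std::sqrt(static_cast<double>(n))));
    const std::uint64_t h = (n + g - 1) / g;
    return {n, (g - 1) + (h - 1)};
}
```

For 2^10 slots the model gives 1024 diagonals and 62 rotation keys, versus 16384 diagonals and 254 rotation keys for 2^14 slots. Because all 16 sparse ciphertexts share the 2^10-slot matrix, the stored diagonals drop by a factor of 16. The repacking step adds a few extra rotation keys on top of this.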

Can you tell me how to split a ciphertext into several sub-ciphertexts while it stays encrypted? For example, if the current CKKS ciphertext has 2^{15} slots, how do I divide it into 16 sub-ciphertexts with 2^{11} slots each?

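You have to extract each piece of 2^{11} slots (rotate it to the front and multiply by a mask), then replicate it across the whole slot vector by rotating and adding. This can be done by adding 2^{15}/2^{11} rotated copies. Alternatively, as in the code below, it can be done with log2(2^{15}/2^{11}) = 4 rotate-and-add steps whose rotation amount doubles each time. The example splits 16 slots into 4 pieces of 4.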

    #include "openfhe.h"

    #include <algorithm>
    #include <iostream>
    #include <vector>

    using namespace lbcrypto;

    int main() {
        // Step 1: Setup CryptoContext
        uint32_t multDepth = 2;
        uint32_t scaleModSize = 50;
        uint32_t batchSizeIn = 16;
        uint32_t batchSizeOut = 4;
        uint32_t ringDim = 2*batchSizeIn;

        CCParams<CryptoContextCKKSRNS> parameters;
        parameters.SetMultiplicativeDepth(multDepth);
        parameters.SetScalingModSize(scaleModSize);
        parameters.SetBatchSize(batchSizeIn);
        parameters.SetRingDim(ringDim);
        parameters.SetSecurityLevel(HEStd_NotSet);

        CryptoContext<DCRTPoly> cc = GenCryptoContext(parameters);

        cc->Enable(PKE);
        cc->Enable(KEYSWITCH);
        cc->Enable(LEVELEDSHE);

        // Step 2: Key Generation
        auto keys = cc->KeyGen();
        cc->EvalMultKeyGen(keys.secretKey);
        int32_t index1 = batchSizeOut;
        int32_t index2 = 2*batchSizeOut;
        int32_t index3 = 3*batchSizeOut;
        cc->EvalRotateKeyGen(keys.secretKey, {index1, index2, index3});

        // Step 3: Encoding and encryption of inputs

        // Input
        std::vector<double> x1 = {0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0};

        // Encoding as plaintexts
        Plaintext ptxt1 = cc->MakeCKKSPackedPlaintext(x1);

        std::cout << "Input x: " << ptxt1 << std::endl;

        // Encrypt the encoded vector
        auto c1 = cc->Encrypt(keys.publicKey, ptxt1);

        // Create a multiplicative mask to extract the smaller vectors: [1 ... 1 0 ... 0]
        std::vector<double> mask(batchSizeIn, 0.0);
        std::fill(mask.begin(), mask.begin() + batchSizeOut, 1.0);
        Plaintext ptxtMask = cc->MakeCKKSPackedPlaintext(mask, 1, 0, nullptr, batchSizeIn);

        // Construct four sparsely-packed ciphertexts: rotate piece k to the front, then mask
        auto c1M = cc->EvalMult(c1, ptxtMask);
        auto c2M = cc->EvalMult(cc->EvalRotate(c1, index1), ptxtMask);
        auto c3M = cc->EvalMult(cc->EvalRotate(c1, index2), ptxtMask);
        auto c4M = cc->EvalMult(cc->EvalRotate(c1, index3), ptxtMask);

        // Replicate each piece across all slots with log2(batchSizeIn/batchSizeOut) doubling
        // rotate-and-add steps, so the vector is periodic with period batchSizeOut
        for (uint32_t j = 1; j < batchSizeIn / batchSizeOut; j <<= 1) {
            auto temp = cc->EvalAtIndex(c1M, j * batchSizeOut);
            cc->EvalAddInPlace(c1M, temp);
            temp = cc->EvalAtIndex(c2M, j * batchSizeOut);
            cc->EvalAddInPlace(c2M, temp);
            temp = cc->EvalAtIndex(c3M, j * batchSizeOut);
            cc->EvalAddInPlace(c3M, temp);
            temp = cc->EvalAtIndex(c4M, j * batchSizeOut);
            cc->EvalAddInPlace(c4M, temp);
        }
        c1M->SetSlots(batchSizeOut);
        c2M->SetSlots(batchSizeOut);
        c3M->SetSlots(batchSizeOut);
        c4M->SetSlots(batchSizeOut);

        Plaintext result;
        cc->Decrypt(keys.secretKey, c1M, &result);
        std::cout << "First quarter = " << result << std::endl;

        cc->Decrypt(keys.secretKey, c2M, &result);
        std::cout << "Second quarter = " << result << std::endl;

        cc->Decrypt(keys.secretKey, c3M, &result);
        std::cout << "Third quarter = " << result << std::endl;

        cc->Decrypt(keys.secretKey, c4M, &result);
        std::cout << "Fourth quarter = " << result << std::endl;

        return 0;
    }
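
To check the slot arithmetic of the extraction without any encryption, the cleartext model below mirrors the example. rotate() stands in for EvalRotate/EvalAtIndex, zeroing slots stands in for the multiplicative mask, and the j <<= 1 loop is the doubling rotate-and-add. This is an illustration, not OpenFHE code. Note that the doubling trick, as written, assumes the ratio batchSizeIn/batchSizeOut is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Slots = std::vector<double>;

// Cleartext model of EvalRotate / EvalAtIndex(ct, k): out[i] = v[(i + k) mod size].
Slots rotate(const Slots& v, long k) {
    const long n = static_cast<long>(v.size());
    Slots out(v.size());
    for (long i = 0; i < n; ++i) {
        const long src = ((i + k) % n + n) % n;
        out[static_cast<std::size_t>(i)] = v[static_cast<std::size_t>(src)];
    }
    return out;
}

// Extract piece `idx` (of size nOut) from a vector of size nIn: rotate the piece
// to the front, zero every other slot (the mask), then replicate it with
// log2(nIn / nOut) doubling rotate-and-add steps. The result has period nOut.
Slots extractPiece(const Slots& v, std::size_t nOut, std::size_t idx) {
    assert(nOut > 0);
    const std::size_t nIn = v.size();
    const std::size_t ratio = nIn / nOut;
    assert(nIn % nOut == 0 && idx < ratio);
    assert((ratio & (ratio - 1)) == 0);  // doubling only works for powers of two
    Slots s = rotate(v, static_cast<long>(idx * nOut));
    for (std::size_t i = nOut; i < nIn; ++i) s[i] = 0.0;  // mask: keep first nOut
    for (std::size_t j = 1; j < ratio; j <<= 1) {
        const Slots t = rotate(s, static_cast<long>(j * nOut));
        for (std::size_t i = 0; i < nIn; ++i) s[i] += t[i];
    }
    return s;
}
```

Reading the first nOut entries of the result corresponds to calling SetSlots(batchSizeOut) on c1M … c4M.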

Thank you very much for your help. :grinning:

Hello, is there a way to process the decomposed ciphertexts in parallel in the for loop that performs the conversion?

After extraction by masking, as done for c1M, c2M, c3M, and c4M in the example above, each sparsely-packed ciphertext is independent and can be processed in parallel.

I don’t know much about parallel programming, can you give me an example?

You can read any OpenMP tutorial to see how to parallelize that for loop. Keep in mind that OpenFHE already uses multithreading internally (for the functionality below the user application) if you specified -DWITH_OPENMP=ON when running cmake.
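
Below is a minimal OpenMP sketch of that idea. processBlock is a hypothetical placeholder for the per-ciphertext work (EvalCKKStoFHEW, EvalSign, and EvalFHEWtoCKKS in this thread, or the EvalSign loop in the original post). Each iteration writes only to its own output slot, so iterations do not race. Whether the OpenFHE calls themselves are safe to run concurrently on a shared CryptoContext is an assumption you should verify. Also, because OpenFHE may already use OpenMP internally, adding another parallel loop can oversubscribe the cores, and limiting the thread count (for example with OMP_NUM_THREADS) may help.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the per-ciphertext work; here just a cleartext sign.
double processBlock(double x) { return x < 0.0 ? -1.0 : 1.0; }

// Iterations are independent: iteration i reads inputs[i] and writes only
// outputs[i]. Compile with -fopenmp (GCC/Clang) to run it in parallel; without
// that flag the pragma is ignored and the loop runs serially.
std::vector<double> processAll(const std::vector<double>& inputs) {
    std::vector<double> outputs(inputs.size());  // preallocated: no push_back in the loop
    const long n = static_cast<long>(inputs.size());
#pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        outputs[static_cast<std::size_t>(i)] = processBlock(inputs[static_cast<std::size_t>(i)]);
    }
    return outputs;
}
```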