How to reduce Multiplication Depth in CKKS

Hi,
I want to do secure inference of a simple neural network, where the client holds the network (i.e., its weights) and the user holds the input.

So I am trying to implement the network with arbitrary weight vectors. Each layer needs a matrix-vector multiplication (which I treat as n inner products) followed by an activation function applied to the resulting vector. The n inner products use a multiplicative depth of 2, and the activation function uses a depth of 3, so each layer consumes 5 levels in total. How can I reduce the multiplicative depth?

For the inner product I am doing plaintext-ciphertext multiplication.
For the activation function I use a polynomial approximation of degree 3.
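
For reference, a minimal sketch of how a per-layer budget like this maps onto the CKKS context setup in OpenFHE (the layer count and parameter values below are illustrative, not taken from the actual program):

#include "openfhe.h"

using namespace lbcrypto;

CryptoContext<DCRTPoly> MakeContext() {
    // Illustrative assumption: 2 layers at 5 levels each, plus one spare level.
    const uint32_t numLayers     = 2;
    const uint32_t depthPerLayer = 5;   // 2 (inner product + mask) + 3 (degree-3 activation)
    const uint32_t multDepth     = numLayers * depthPerLayer + 1;

    CCParams<CryptoContextCKKSRNS> parameters;
    parameters.SetMultiplicativeDepth(multDepth);
    parameters.SetScalingModSize(50);   // illustrative scaling modulus size
    parameters.SetBatchSize(8);         // 8 packed slots per vector

    CryptoContext<DCRTPoly> cc = CryptoContextFactory<DCRTPoly>::GenCryptoContext(parameters);
    cc->Enable(PKE);
    cc->Enable(KEYSWITCH);
    cc->Enable(LEVELEDSHE);
    return cc;
}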

Sounds to me like you don't have many options. One trick the FHE community uses is to pack things in specific formats to reduce the number of encrypted multiplications. I can't say for sure without looking at your code, data format, etc., but it sounds like you've gotten it as small as you can get it. You'll probably have to use bootstrapping at this point.
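
For reference, a rough sketch of what enabling CKKS bootstrapping looks like in OpenFHE. The level budget and slot count are illustrative assumptions, and in a real program the setup and key generation are done once, not per refresh:

#include "openfhe.h"

using namespace lbcrypto;

// Sketch: set up CKKS bootstrapping, then refresh a ciphertext that has run out of levels.
Ciphertext<DCRTPoly> RefreshWithBootstrapping(CryptoContext<DCRTPoly>& cc,
                                              const KeyPair<DCRTPoly>& keys,
                                              const Ciphertext<DCRTPoly>& ct) {
    std::vector<uint32_t> levelBudget = {4, 4};  // levels spent on CoeffsToSlots / SlotsToCoeffs
    uint32_t numSlots = 8;                       // match the packing used in the application

    cc->Enable(FHE);
    cc->EvalBootstrapSetup(levelBudget);
    cc->EvalBootstrapKeyGen(keys.secretKey, numSlots);

    // The refreshed ciphertext comes back with (nearly) a full level budget.
    return cc->EvalBootstrap(ct);
}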

Explain why the multiplicative depth of the inner product is 2. It should be 1, I believe.
Also, what is the degree of your approximation polynomial? If you are unsure of the degree, please provide your polynomial.

The degree of the approximating polynomial is 3.
I don't understand why the inner product takes two multiplicative depths. Is there any way to reduce it?

Can you share a minimal working example of your issue? I can’t help without examining some code.

// Computes the inner product of the encrypted vector input1 with the plaintext
// vector input2 and writes (or accumulates) the result into slot i of output.
// Assumes the vectors occupy 8 slots (batch size 8).
void innerProduct(Ciphertext<DCRTPoly>& output, const Ciphertext<DCRTPoly>& input1,
                  const Plaintext& input2, CryptoContext<DCRTPoly>& cc, int i,
                  bool accumulate = true) {
    // Slot-wise plaintext-ciphertext multiplication (first level).
    auto cMul   = cc->EvalMult(input1, input2);
    auto ctemp2 = cMul;

    // Rotate-and-sum over 8 slots using power-of-two rotations (1, 2, 4),
    // so every slot ends up holding the full sum of the 8 products.
    for (int r = 1; r < 8; r *= 2) {
        auto ctemp1 = cc->EvalRotate(ctemp2, r);
        ctemp2      = cc->EvalAdd(ctemp2, ctemp1);
    }

    // Apply the mask to clear everything except slot i (second level).
    std::vector<double> x1(8, 0.0);
    x1[i] = 1;
    Plaintext mask = cc->MakeCKKSPackedPlaintext(x1);

    // Put the result in output.
    if (accumulate) {
        output += cc->EvalMult(ctemp2, mask);
    } else {
        output = cc->EvalMult(ctemp2, mask);
    }
}

Here is the inner product function. It takes two multiplicative depths. Can I reduce that somehow?
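
For context, a sketch of how a function like this could be driven for a full matrix-vector product. MatVecProduct and weightRows are illustrative names, not part of the original code; weightRows[i] is assumed to be the plaintext encoding of row i of the weight matrix:

// Sketch: matrix-vector product as one inner product per matrix row.
// The result for row i ends up in slot i of the output ciphertext.
Ciphertext<DCRTPoly> MatVecProduct(CryptoContext<DCRTPoly>& cc,
                                   const Ciphertext<DCRTPoly>& encInput,
                                   const std::vector<Plaintext>& weightRows) {
    Ciphertext<DCRTPoly> result;
    for (size_t i = 0; i < weightRows.size(); i++) {
        // The first row initializes the output; later rows accumulate into it.
        innerProduct(result, encInput, weightRows[i], cc, static_cast<int>(i), i != 0);
    }
    return result;
}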

// Applies a degree-3 Chebyshev approximation of ReLU to every slot of input1
// on the interval [-a, b]. (Despite the name, this no longer calls EvalLogistic.)
void EvalLogisticExample(Ciphertext<DCRTPoly>& output, const Ciphertext<DCRTPoly>& input1,
                         CryptoContext<DCRTPoly>& cc, double a, double b) {

    uint32_t polyDegree = 3;
    double lowerBound   = -a;
    double upperBound   = b;

    // output = cc->EvalLogistic(input1, lowerBound, upperBound, polyDegree);
    output = cc->EvalChebyshevFunction(
        [](double x) -> double { return (x < 0) ? 0 : x; },  // ReLU
        input1, lowerBound, upperBound, polyDegree);
}

Here is the activation function evaluation on a vector (ciphertext). Can this be modified so that it takes less multiplicative depth? Right now it takes 3 levels.
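
(For comparison only, not something attempted in this thread: if the model can be trained with a square activation instead of an approximated ReLU, as is done in CryptoNets-style encrypted inference, the activation costs a single level. A minimal sketch:)

// Sketch (assumption): a square activation, x -> x^2, applied slot-wise.
// This consumes only one multiplicative level, but the network has to be
// trained with this non-linearity for the results to make sense.
void EvalSquareActivation(Ciphertext<DCRTPoly>& output, const Ciphertext<DCRTPoly>& input1,
                          CryptoContext<DCRTPoly>& cc) {
    output = cc->EvalSquare(input1);
}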


We have an internal EvalInnerProduct. Is there a reason you’re choosing not to use it?

See also this related topic: Chebyshev multiplicative depth

This doesn’t answer your question, but you should read through OpenFHE Lattice Cryptography Library - Arbitrary Smooth Function Evaluation if you haven’t already done so.

It takes two levels because you have a homomorphic multiplication and then you also multiply by a mask to clear all values except for one of the slots. It is the multiplication by a mask that takes the second level. You could rearrange the encoding so that after EvalMult and EvalSum you get the same values in all slots (of a subring element) if this is what you desire (depends on what the desired output should be). The EvalInnerProduct in OpenFHE does not multiply by a mask, and hence costs only 1 level.
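
For illustration, a sketch of the one-level variant using the built-in call. InnerProductOneLevel is an illustrative name; the call assumes the EvalSum keys were generated with EvalSumKeyGen, and the batch size of 8 matches the vectors earlier in the thread:

// Sketch: one-level inner product via OpenFHE's EvalInnerProduct.
// Requires cc->EvalSumKeyGen(keys.secretKey) at key-generation time.
// There is no plaintext mask, so no second level is consumed; the result
// is left in the slots rather than isolated in a single masked slot.
Ciphertext<DCRTPoly> InnerProductOneLevel(CryptoContext<DCRTPoly>& cc,
                                          const Ciphertext<DCRTPoly>& encVector,
                                          const Plaintext& weightsRow) {
    const uint32_t batchSize = 8;  // number of packed entries
    return cc->EvalInnerProduct(encVector, weightsRow, batchSize);
}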

Okay.
Is there any way to use parallel processing so that the program takes less time, similar to what can be done during the “make” of the library?

It looks like you are asking a new question. Are you asking about multithreading support in OpenFHE? Please open a new topic for this.

Yes.
Okay, I am doing that.