How should I set the batch size so that it fits gradients whose lengths differ from layer to layer? I found that when the batch size is set very large but the first layer's gradient has only a few hundred parameters, the gradient is still transmitted in units of the full batch size, which makes transmission very inefficient.
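A common way around this mismatch is to stop sending one fixed-size message per layer. Instead, small layers are packed together into a shared message, and large layers are split across several messages. This is similar in spirit to tensor fusion in Horovod or gradient bucketing in PyTorch DDP. The sketch below is a minimal, framework-free illustration under two assumptions: "batch size" means the maximum number of gradient elements per message, and each layer's gradient is a flat array of known length. The function `plan_chunks` is hypothetical and is not part of any library.

```python
from typing import List, Tuple

# One slice of a layer's flattened gradient: (layer_index, start_offset, length)
Slice = Tuple[int, int, int]


def plan_chunks(layer_sizes: List[int], batch_size: int) -> List[List[Slice]]:
    """Pack per-layer gradients into messages of at most batch_size elements.

    Hypothetical helper: small layers share a message instead of each one
    occupying a full batch_size slot, and large layers are split across
    consecutive messages.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    messages: List[List[Slice]] = []
    current: List[Slice] = []
    free = batch_size  # remaining capacity in the message being filled

    for layer, size in enumerate(layer_sizes):
        start = 0
        while start < size:
            # Take as much of this layer as fits in the current message
            take = min(free, size - start)
            current.append((layer, start, take))
            start += take
            free -= take
            if free == 0:
                # Message is full: close it and start a new one
                messages.append(current)
                current, free = [], batch_size

    if current:
        # Last, partially filled message is sent without padding
        messages.append(current)
    return messages


if __name__ == "__main__":
    # Example: a small first layer, a large middle layer, a small last layer
    for msg in plan_chunks([300, 5000, 200], batch_size=4096):
        print(msg, "elements:", sum(length for _, _, length in msg))
```

With layer sizes `[300, 5000, 200]` and a batch size of 4096, this plan produces two messages carrying 4096 and 1404 elements. Sending each layer separately in batch-size units would need four messages. In a real system the right capacity depends on network latency versus bandwidth, so it is usually tuned empirically rather than derived from any single layer's size.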