The bottleneck is FHEW, which does not support SIMD-style packing of a vector of inputs: each input is processed in its own ciphertext, so the parallelism is limited by the number of threads on your machine.
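To get a feel for what a thread-bound limit means in practice, here is a rough back-of-the-envelope model. It is only a sketch: the function name and the assumption that every FHEW evaluation takes the same time and scales perfectly across threads are mine, not something measured in OpenFHE.

```python
import math

def estimated_wall_time(num_inputs, num_threads, seconds_per_eval):
    """Rough wall-clock estimate when every input needs its own evaluation.

    Assumes (hypothetically) uniform per-evaluation time and perfect
    scaling across threads; real runs will be slower than this.
    """
    if num_inputs < 0 or num_threads < 1:
        raise ValueError("need num_inputs >= 0 and num_threads >= 1")
    # Inputs are processed in rounds of at most num_threads at a time.
    rounds = math.ceil(num_inputs / num_threads)
    return rounds * seconds_per_eval
```

For example, `estimated_wall_time(1024, 8, 0.1)` models 1024 inputs on 8 threads as 128 sequential rounds. The per-evaluation time of 0.1 s is a placeholder, so plug in a timing from your own machine.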
To make the scheme switching implementation more practical, either use sparse packing with fewer slots (more ciphertexts, but a cheaper transformation per ciphertext, mentioned here) or implement the transformation via FFT (the way it is done for CKKS bootstrapping).
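The trade-off between the two options can be sketched with simple counting. The helper below is hypothetical, and its operation counts are plaintext counts for a slots-by-slots linear transform (a direct matrix-vector product versus an FFT-style factorization). They serve only as a proxy for how the cost grows, not as homomorphic costs in OpenFHE.

```python
import math

def packing_tradeoff(num_values, slots):
    """Return (ciphertexts, dense_ops, fft_ops) for a given slot count.

    dense_ops and fft_ops are plaintext operation counts used as a
    rough proxy; they are not measured homomorphic costs.
    """
    if slots < 1 or slots & (slots - 1):
        raise ValueError("slots must be a power of two")
    # Fewer slots per ciphertext means more ciphertexts overall.
    ciphertexts = math.ceil(num_values / slots)
    # Direct matrix-vector product: one multiply per matrix entry.
    dense_ops = slots * slots
    # FFT-style factorization: about slots * log2(slots) operations.
    log_slots = slots.bit_length() - 1
    fft_ops = slots * max(1, log_slots)
    return ciphertexts, dense_ops, fft_ops
```

Comparing `packing_tradeoff(4096, 4096)` with `packing_tradeoff(4096, 256)` shows the shape of the trade-off. Sparse packing multiplies the number of ciphertexts by 16 but shrinks the per-ciphertext transform, while the FFT-style count grows far more slowly than the dense one at any slot count.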
More promising are the recent methods for functional bootstrapping in CKKS: https://eprint.iacr.org/2024/1637.pdf, https://eprint.iacr.org/2024/1623.pdf. The latter is implemented in OpenFHE, and we plan to expand it in upcoming releases.