This article presents a new deep cross-output knowledge transfer approach based on least-squares support vector machines, called DCOT-LS-SVMs. Its aim is to improve the generalizability of least-squares support vector machines (LS-SVMs) while avoiding the complicated parameter-tuning process required by many kernel machines. The proposed approach has two significant characteristics: 1) DCOT-LS-SVMs adopts a stacked hierarchical architecture that combines several LS-SVM modules layer by layer, where each higher-layer module receives additional input features formed from the predictions of all previous modules; and 2) cross-output knowledge transfer leverages the predictions of the previous module to improve learning in the current module. With this approach, model parameters such as the tradeoff parameter C and the kernel width δ can be assigned randomly to each module, which greatly simplifies the learning process. Moreover, DCOT-LS-SVMs autonomously and quickly determines the extent of cross-output knowledge transfer between adjacent modules through a fast leave-one-out cross-validation strategy. In addition, since imbalanced datasets are common in real-world scenarios, we present an imbalanced version of DCOT-LS-SVMs, called IDCOT-LS-SVMs. The effectiveness of the proposed approaches is demonstrated through comparisons with five competing methods on UCI datasets and through a case study on the diagnosis of prostate cancer.
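To make the stacked architecture concrete, the following is a minimal sketch of layer-by-layer LS-SVM stacking with randomly assigned parameters. It is not the authors' implementation: the function names, the random ranges for C and the kernel width, and the number of modules are all illustrative assumptions, and the cross-output transfer weighting selected by the fast leave-one-out strategy in DCOT-LS-SVMs is not reproduced here. Each module solves the standard LS-SVM linear system in closed form, and every subsequent module sees the original features augmented with the decision values of all previous modules.

```python
import numpy as np

def rbf_kernel(A, B, width):
    # Gaussian RBF kernel: K(x, z) = exp(-||x - z||^2 / (2 * width^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

class LSSVM:
    """One LS-SVM classification module (closed-form dual solution)."""
    def __init__(self, C, width):
        self.C, self.width = C, width

    def fit(self, X, y):
        n = len(y)
        K = rbf_kernel(X, X, self.width)
        # LS-SVM KKT system: [[0, 1^T], [1, K + I/C]] [b; alpha] = [0; y]
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = K + np.eye(n) / self.C
        sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
        self.b, self.alpha, self.X = sol[0], sol[1:], X
        return self

    def decision(self, X):
        return rbf_kernel(X, self.X, self.width) @ self.alpha + self.b

def fit_stacked_lssvm(X, y, n_modules=3, rng=None):
    """Train a stack of LS-SVM modules; each module's input is the
    original features plus the decision values of all earlier modules.
    C and the kernel width are drawn at random per module (illustrative
    ranges), so no per-module parameter tuning is performed."""
    if rng is None:
        rng = np.random.default_rng(0)
    modules, Xaug = [], X
    for _ in range(n_modules):
        C = 10.0 ** rng.uniform(0, 3)        # random tradeoff parameter
        width = 10.0 ** rng.uniform(-1, 1)   # random kernel width
        m = LSSVM(C, width).fit(Xaug, y)
        modules.append(m)
        # Augment the feature space with this module's predictions
        Xaug = np.hstack([Xaug, m.decision(Xaug)[:, None]])
    return modules

def predict_stacked(modules, X):
    # Reproduce the same augmentation at prediction time
    Xaug = X
    for m in modules:
        d = m.decision(Xaug)
        Xaug = np.hstack([Xaug, d[:, None]])
    return np.sign(d)  # label comes from the top-layer module
```

The closed-form solve is what makes LS-SVMs attractive as stackable building blocks: each module costs one dense linear system rather than a quadratic program, so adding layers remains cheap.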