Systems that generate three-dimensional point clouds from the scanning data of a moving camera provide information about an object beyond its color, and they open up a range of prospective research fields. In animal husbandry, for example, the characteristics of a dairy cow's body parts can be analyzed to improve its fertility and milk production efficiency. However, previous solutions that generate depth images from stereo data with traditional stereo matching algorithms have several drawbacks, such as poor-quality depth images and missing information in overexposed regions. Using a single camera to reconstruct a complete 3D point cloud of a dairy cow poses further challenges. One is point cloud misalignment when two adjacent point clouds with only a small overlapping area are combined. Another is the difficulty of generating point clouds from objects that have little motion. We therefore propose an integrated system that uses two cameras to overcome these disadvantages. Specifically, our framework has two main parts: the data recording part applies state-of-the-art convolutional neural networks to improve depth image quality, and the dairy cow 3D reconstruction part uses the simultaneous localization and calibration framework to reduce drift and provide a better-quality reconstruction. The experimental results show that our approach improves the quality of the generated point cloud to some extent. This work provides the input data for deep-learning-based analysis of dairy cow characteristics.