Leveraging multiple heterogeneous measurements to predict wind power has long been a challenging task in the electrical engineering community. This paper presents a deep architecture for wind power prediction that combines multitask learning and multimodal learning, termed the predictive stacked autoencoder (PSAE). PSAE is a unified framework integrating multiple stacked autoencoders (SAEs), one feature fusion layer, and one prediction terminal layer; compared with conventional prediction models, it expands the architecture along two dimensions, depth and width. First, the SAEs at the bottom of PSAE extract features from each kind of measurement separately. Next, the feature fusion layer encodes the high-order features extracted by the different SAEs into a unified feature that is more informative and representative for wind power prediction. Finally, the prediction terminal layer functions as a regression machine that generates the predicted targets from the fused features. Trained in an end-to-end (E2E) manner, PSAE is capable of learning heterogeneous features jointly and performing sequence-to-sequence (S2S) prediction. Experiments on multi-step short-term prediction are conducted on real-world data, and the results demonstrate the superiority of PSAE over prior methods.
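
The data flow described above (one SAE per measurement type, a feature fusion layer, and a prediction terminal layer) can be illustrated with a minimal forward-pass sketch in NumPy. This is not the authors' implementation: the layer sizes, sigmoid activations, concatenation-based fusion, linear output head, and all function names (`build_psae`, `encode`, `forward`) are assumptions made for illustration, and both the autoencoder reconstruction stage and the end-to-end training loop are omitted.

```python
import numpy as np


def init_layer(rng, n_in, n_out):
    """Return small random weights and zero biases for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def encode(x, layers):
    """Pass one measurement type through the encoder half of its SAE."""
    h = x
    for W, b in layers:
        h = sigmoid(h @ W + b)
    return h


def build_psae(rng, input_dims, hidden_dims, fusion_dim, horizon):
    """Assumed layout: one encoder stack per input, then fusion, then a head."""
    encoders = []
    for d in input_dims:
        sizes = [d] + list(hidden_dims)
        encoders.append(
            [init_layer(rng, a, b) for a, b in zip(sizes[:-1], sizes[1:])]
        )
    # Fusion is assumed to act on the concatenated top-level features.
    fusion = init_layer(rng, hidden_dims[-1] * len(input_dims), fusion_dim)
    # The prediction terminal layer is assumed to be a linear regressor
    # that emits `horizon` future steps at once.
    head = init_layer(rng, fusion_dim, horizon)
    return {"encoders": encoders, "fusion": fusion, "head": head}


def forward(params, inputs):
    """Map a list of per-measurement batches to multi-step predictions."""
    feats = [encode(x, enc) for x, enc in zip(inputs, params["encoders"])]
    fused_in = np.concatenate(feats, axis=1)   # width: join the SAE outputs
    Wf, bf = params["fusion"]
    fused = sigmoid(fused_in @ Wf + bf)        # unified feature
    Wo, bo = params["head"]
    return fused @ Wo + bo                     # predicted target sequence


rng = np.random.default_rng(0)
input_dims = (24, 24, 12)  # hypothetical window lengths for three measurement types
params = build_psae(rng, input_dims, hidden_dims=(16, 8), fusion_dim=10, horizon=6)
batch = [rng.normal(size=(4, d)) for d in input_dims]  # synthetic data, batch of 4
pred = forward(params, batch)
print(pred.shape)  # (4, 6)
```

In this sketch, "depth" corresponds to the number of layers in each encoder stack and "width" to the number of parallel encoders. Because the whole path from inputs to predictions is a single differentiable function, all parameters could in principle be updated jointly from the prediction loss, which is the sense in which the abstract describes end-to-end training.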