OBJECTIVE: Recognizing cortical arousals and sleep stages is pivotal for diagnosing sleep disorders, yet it depends on labor-intensive manual scoring. Current automated approaches treat the two tasks separately and rely heavily on electroencephalograms or other physiological signals, which limits their accessibility and practical use. METHODS: We introduce CSleep, a collaborative learning approach that relies exclusively on raw photoplethysmography (PPG). It comprises two processes: 1) a task-independent representation learning process, which improves the model's capacity to acquire general representations from PPG signals, and 2) a self-task and cross-task training process, which uses a shared encoder, temporal modeling, and a cross-task attention branch to capture the relationship between cortical arousal and sleep stages. The CSleep framework also accommodates different time granularities, making it more adaptable to practical requirements. RESULTS: CSleep was evaluated on two distinct datasets: one composed of healthy individuals and another of patients with atherosclerosis. CSleep effectively estimates clinical parameters such as the arousal index (mean bias error of 4.8) and total sleep time (mean bias error of 1.3 minutes). SIGNIFICANCE: By relying solely on PPG signals, CSleep offers a novel, practical solution for sleep monitoring that is more accessible and cost-effective for homecare applications. Its ability to jointly recognize cortical arousal and sleep stages suggests potential for supporting the diagnosis of sleep disorders and for providing valuable insights in home-based monitoring.
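
The clinical parameters named in RESULTS can be derived from per-epoch predictions. The following is a minimal sketch, not the CSleep implementation. It assumes 30-second epochs, a wake label of 0, total sleep time as the duration of all non-wake epochs, arousal index as arousals per hour of sleep, and mean bias error as the signed mean of predicted minus reference values. The abstract does not state these conventions, and all function names are hypothetical.

```python
from statistics import fmean

EPOCH_SECONDS = 30  # assumed epoch length; the abstract mentions several granularities
WAKE_LABEL = 0      # assumed integer code for the wake stage


def total_sleep_time_minutes(stages, epoch_seconds=EPOCH_SECONDS):
    """Total sleep time: duration of all epochs not labeled as wake."""
    sleep_epochs = sum(1 for s in stages if s != WAKE_LABEL)
    return sleep_epochs * epoch_seconds / 60.0


def arousal_index(n_arousals, tst_minutes):
    """Arousal index: number of arousals per hour of sleep."""
    if tst_minutes <= 0:
        return float("nan")  # undefined when no sleep was recorded
    return n_arousals / (tst_minutes / 60.0)


def mean_bias_error(predicted, reference):
    """Signed mean difference (predicted - reference) across subjects."""
    return fmean(p - r for p, r in zip(predicted, reference))


# Toy example with made-up labels (not data from the study).
stages = [0, 1, 1, 2, 0, 3]                      # four sleep epochs
tst = total_sleep_time_minutes(stages)           # 4 * 30 s = 2.0 minutes
ai = arousal_index(n_arousals=3, tst_minutes=120.0)
mbe = mean_bias_error([10.0, 12.0], [8.0, 13.0])
```

Under this sign convention a positive mean bias error means the model overestimates the reference on average. Some authors reverse the order of subtraction, so the convention should be checked before comparing reported values.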