Cancer treatment has advanced significantly in recent decades; however, many patients still experience treatment failure or resistance. Attempts to identify determinants of response have been hampered by a lack of tools that simultaneously accommodate small datasets, sparse or missing measurements, and multimodal clinicogenomic data while remaining interpretable enough to yield biological or clinical insights. We introduce the Clinical Transformer, an explainable transformer-based deep-learning framework that addresses these challenges. Our framework makes the most of available data through self-supervised, gradual, and transfer learning, and yields survival predictions that surpass the performance of state-of-the-art methods across diverse, independent datasets. The framework's generative capability enables in silico perturbation experiments to test counterfactual hypotheses. By perturbing immune-associated features in immunotherapy-naive patients, we identify a patient subset that may benefit from immunotherapy, and we validate this finding across three independent immunotherapy-treated cohorts. We anticipate that our work will empower the scientific community to further harness data for the benefit of patients.