BACKGROUND: This study investigates the image quality and the agreement of derived cardiac function parameters for a novel joint image reconstruction and segmentation approach based on disentangled representation learning, which enables real-time cardiac cine imaging during free-breathing.
METHODS: A multi-tasking neural network architecture incorporating disentangled representation learning was trained on simulated examinations based on data from a public repository, together with cardiovascular magnetic resonance (CMR) scans acquired specifically for model development. An exploratory feasibility study evaluated the method on undersampled real-time acquisitions, obtained with an in-house developed spiral balanced steady-state free precession pulse sequence, in eight healthy participants and five patients with intermittent atrial fibrillation. Images and predicted left ventricle segmentations were compared to the reference standard of electrocardiography (ECG)-gated segmented Cartesian cine with repeated breath-holds and corresponding manual segmentation.
RESULTS: On a 5-point Likert scale, the image quality of the real-time breath-hold approach (RT-BH) was comparable to that of Cartesian cine in healthy participants (RT-BH: 1.99 ± 0.98, Cartesian: 1.94 ± 0.86, p = 0.052), whereas the real-time free-breathing approach (RT-FB) was slightly inferior (RT-FB: 2.40 ± 0.98, p < 0.001). In patients with arrhythmia, both real-time approaches demonstrated favorable image quality compared with Cartesian cine (RT-BH: 2.10 ± 1.28, p < 0.001; RT-FB: 2.40 ± 1.13, p < 0.01; Cartesian: 2.68 ± 1.13). Intra-observer reliability was good (intraclass correlation coefficient = 0.77, 95% confidence interval [0.75, 0.79], p < 0.001). In functional analysis, a positive bias was observed for ejection fractions derived from the proposed model compared to the clinical reference standard (RT-BH mean: 58.5 ± 5.6%, bias: +3.47%, 95% confidence interval [-0.86, 7.79%]; RT-FB mean: 57.9 ± 10.6%, bias: +1.45%, 95% confidence interval [-3.02, 5.91%]; Cartesian mean: 54.9 ± 6.7%).
CONCLUSION: The introduced real-time CMR imaging technique enables high-quality cardiac cine data acquisitions in 1-2 min, eliminating the need for ECG gating and breath-holds. This approach offers a promising alternative to the current clinical practice of segmented acquisition, with shorter scan times, improved patient comfort, and increased robustness to arrhythmia and patient non-compliance.
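
The functional analysis in RESULTS compares ejection fractions between the real-time methods and the Cartesian reference and reports a bias with an interval. The abstract does not state how those intervals were computed, so the following is only a minimal sketch of one common agreement analysis: the ejection fraction from end-diastolic and end-systolic volumes, followed by the Bland-Altman bias and 95% limits of agreement (bias ± 1.96 standard deviations of the paired differences). The function names and all numeric values are hypothetical and are not data from the study; the authors' actual method may differ.

```python
from statistics import mean, stdev


def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent: (EDV - ESV) / EDV * 100."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return (edv_ml - esv_ml) / edv_ml * 100.0


def bland_altman(test, reference):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean paired difference (test - reference); the limits are
    bias -/+ 1.96 * sample standard deviation of the differences.
    """
    if len(test) != len(reference) or len(test) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [t - r for t, r in zip(test, reference)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical ejection fractions (%), NOT values from the study.
ef_realtime = [60.0, 55.0, 62.0, 58.0]
ef_cartesian = [57.0, 54.0, 58.0, 56.0]

bias, lower, upper = bland_altman(ef_realtime, ef_cartesian)
print(f"bias: {bias:+.2f}%, limits of agreement: [{lower:.2f}, {upper:.2f}]%")
```

A positive bias in this sketch means the first method yields higher ejection fractions on average than the reference, which is the sense in which the abstract reports a positive bias for the proposed model.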