The Swin Transformer is a recently developed transformer architecture that has shown promising results across a range of computer vision tasks. Medical image analysis is a complex and critical task that requires high-dimensional feature extraction, and its principal challenge is the limited availability of annotated training data. To address this, a multitask learning scheme is proposed: the Swin Transformer is trained on multiple medical image analysis tasks simultaneously, so that the model learns general features that can be reused for new tasks and datasets. Medical images also frequently suffer from noise, artifacts, and low contrast. The Swin Transformer offers an adaptive attention mechanism whose attention weights are learned dynamically according to input quality, allowing it to focus selectively on essential regions of an image while discarding noise and irrelevant information. Furthermore, medical images may contain very complex anatomical structures; accordingly, an iterative transformer encoder is proposed that forms a hierarchical structure with gradually decreasing spatial dimensionality between layers, so that attention is applied at multiple scales and captures both local and long-range relationships between image patches. This research proposes a robust Swin Transformer architecture for high-dimensional feature extraction in medical images. The proposed algorithm achieved 80.76% accuracy, 80.28% precision, 78.04% recall, 76.46% F1-score, and 73.46% critical success index.
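As a rough illustration of the hierarchical windowed attention described above, the PyTorch sketch below partitions a feature map into local windows, attends within each window, then merges patches to halve the spatial resolution before attending again at a coarser scale. The module names, window size, and channel dimensions are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of Swin-style hierarchical window attention.
# All sizes and names are illustrative, not the paper's configuration.
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    """Multi-head self-attention applied within non-overlapping windows,
    so attention is computed per local region rather than globally."""
    def __init__(self, dim, window_size=7, num_heads=4):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):  # x: (B, H, W, C)
        B, H, W, C = x.shape
        ws = self.window_size
        # Partition the feature map into ws x ws windows.
        x = x.view(B, H // ws, ws, W // ws, ws, C)
        windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)
        # Attend inside each window independently.
        out, _ = self.attn(windows, windows, windows)
        # Reverse the window partition back to (B, H, W, C).
        out = out.view(B, H // ws, W // ws, ws, ws, C)
        return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

class PatchMerging(nn.Module):
    """Halves spatial resolution and doubles channels, giving the gradually
    decreasing spatial dimensionality between stages mentioned above."""
    def __init__(self, dim):
        super().__init__()
        self.reduce = nn.Linear(4 * dim, 2 * dim)

    def forward(self, x):  # x: (B, H, W, C) -> (B, H/2, W/2, 2C)
        x = torch.cat([x[:, 0::2, 0::2], x[:, 1::2, 0::2],
                       x[:, 0::2, 1::2], x[:, 1::2, 1::2]], dim=-1)
        return self.reduce(x)

# Two stages: fine-scale local attention, then patch merging so the next
# stage's windows cover a larger region (longer-range relationships).
feats = torch.randn(1, 56, 56, 96)    # hypothetical patch embeddings
stage1 = WindowAttention(96)(feats)   # local attention at fine scale
merged = PatchMerging(96)(stage1)     # -> (1, 28, 28, 192)
stage2 = WindowAttention(192)(merged) # attention at a coarser scale
print(stage2.shape)                   # torch.Size([1, 28, 28, 192])
```

Stacking such stages yields the pyramid of feature maps the abstract alludes to: early stages capture fine local structure within small windows, while later, lower-resolution stages relate patches that were originally far apart in the image.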