BACKGROUND: Esophageal squamous cell carcinoma (ESCA) is a cancer that starts in the cells lining the esophagus, the tube connecting the throat to the stomach. It is aggressive and carries a poor prognosis. Understanding the key factors that drive this cancer is crucial for developing better diagnostic tools and treatments.
METHODS: Gene expression profiles of ESCA were analyzed using four datasets (GSE23400, GSE29001, GSE92396, and GSE1420) from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) were identified using the limma package, and a protein-protein interaction (PPI) network was constructed using the STRING database. Hub genes were identified using the degree method. The hub genes were then validated through reverse transcription quantitative PCR (RT-qPCR), mutational and copy number variation (CNV) analysis via the cBioPortal database, promoter methylation analysis using the OncoDB and GSCA databases, survival analysis, immune infiltration analysis through the GSCA database, and functional assays, including knockdown of key genes.
RESULTS: We identified four key hub genes, COL3A1, COL4A1, COL5A2, and CXCL8, that play significant roles in ESCA. These genes were highly expressed in ESCA tissues and cell lines, with expression levels significantly elevated (p-value < 0.001) compared to normal controls. Receiver operating characteristic (ROC) curve analysis revealed exceptional diagnostic performance for all four genes, with area under the curve (AUC) values of 1.0, indicating perfect sensitivity and specificity in distinguishing ESCA from normal controls. Mutational analysis revealed that COL3A1 was altered in 67% of ESCA samples, primarily through missense mutations, while COL5A2 was altered in 50% of the samples, through splice site and missense mutations. Additionally, gene amplification was observed in all four hub genes, further supporting their oncogenic potential in ESCA progression. Significant promoter hypomethylation (p-value < 0.05) was detected in these genes, suggesting a potential regulatory role in their expression. Functional assays demonstrated that knocking down COL3A1 and COL4A1 decreased cell proliferation, colony formation, and migration, indicating their critical roles in tumor progression. These genes were also involved in pathways related to the extracellular matrix and immune system modulation.
CONCLUSION: COL3A1, COL4A1, COL5A2, and CXCL8 are crucial in ESCA development and progression, particularly in remodeling the extracellular matrix, modulating the immune system, and promoting metastasis. These findings suggest that these genes could serve as biomarkers for diagnosing ESCA and as targets for future therapies. Future research should focus on in vivo validation of these findings and on clinical testing to assess the therapeutic potential of targeting these genes in ESCA treatment.
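
The degree method named in METHODS ranks each node of the PPI network by its number of distinct interaction partners. The following is a minimal sketch of that idea only, not the pipeline used in this study: the function name `rank_by_degree`, the placeholder gene labels, and the toy edge list are all hypothetical, and a real analysis would start from an edge list exported from STRING.

```python
from collections import defaultdict


def rank_by_degree(edges):
    """Return (node, degree) pairs sorted by descending degree, ties by name.

    `edges` is an iterable of (node_a, node_b) pairs from an undirected
    network. Duplicate edges and self-loops do not inflate the degree.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # a self-loop is not counted as an interaction partner
        neighbors[a].add(b)
        neighbors[b].add(a)
    return sorted(
        ((node, len(partners)) for node, partners in neighbors.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical toy network; labels are placeholders, not real interactions.
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_B", "GENE_A"),  # duplicate of the first edge, counted once
    ("GENE_D", "GENE_E"),
]

ranking = rank_by_degree(toy_edges)
top_hubs = [gene for gene, _ in ranking[:2]]  # keep the two highest-degree nodes
print(ranking)
print(top_hubs)
```

The cutoff for how many top-ranked nodes count as hubs (two in this sketch) is an arbitrary choice made for illustration; the abstract does not state the threshold used.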