Monoclonal antibodies (mAbs) are among the most transformative products of the biopharmaceutical industry. The introduction of diverse and complex formats, built from several polypeptide chains and engineered with multiple antigen-binding domains, has made manufacturing particularly challenging, especially when assessing the expression levels and yields of these formats. Here we present the largest and most diversified CHO transcriptomics analysis, comprising data from 892 different monoclonal cell lines producing 11 different mAbs with various non-standard, highly complex formats. We apply three robust feature selection methods (one traditional differential expression analysis and two machine learning approaches) to identify genes correlated with high product titer and quality. The Cnpy3 gene is identified as a novel biomarker, showing a very strong negative correlation (Pearson r