BACKGROUND: The visually guided Rutgers Acquired Equivalence Test (RAET), and the various visual and audiovisual versions of the test that share its structure, involve rule acquisition, retrieval, and generalization, and are based on learning stimulus pairs (antecedents and consequents). In an earlier study, we found no difference in acquisition and only a slight enhancement of retrieval and generalization in audiovisual compared with visual learning when complex, readily verbalizable visual stimuli (cartoon faces and colored fish) were used. In this study, we sought to examine whether similar phenomena can be observed with feature-restricted, less verbalizable visual stimuli (geometric shapes). METHODS: A total of 119 healthy adult volunteers completed two computer-based test paradigms: Polygon (PO) and SoundPolygon (SP). PO is a visual test in which the antecedents are shaded circles and the consequents are geometric shapes. SP is an audiovisual test in which the antecedents are sounds and the consequents are the same geometric shapes as in PO. RESULTS: There were no significant differences in performance or reaction times in the acquisition phase between the PO (visual) and SP (audiovisual) tests. However, performance in retrieval and generalization was significantly poorer in the audiovisual test, and reaction times were also longer. CONCLUSION: The acquisition phase appears to be independent of stimulus modality when simple geometric shapes serve as the visual stimuli. However, feature-restricted, less verbalizable visual stimuli make it more difficult to retrieve and generalize already acquired audiovisual information.