In this paper, we propose a novel visual relation detection task, named Group Visual Relation Detection (GVRD), for detecting group visual relations (GVRs), i.e., visual relations whose subjects and/or objects are groups. The task is inspired by the observation that groups are common in the semantic representation of images. GVRD can be regarded as an extension of the existing visual relation detection task, which restricts both the subjects and the objects of visual relations to individuals. To address GVRD, we propose a Simultaneous Group Relation Prediction (SGRP) method that predicts groups and predicates simultaneously. SGRP contains an Entity Construction (EC) module, a Feature Extraction (FE) module, and a Group Relation Prediction (GRP) module. Specifically, the EC module constructs instances, group candidates, and phrase candidates; the FE module extracts visual, location, and semantic features for these entities; and the GRP module simultaneously predicts groups and predicates and generates the GVRs. Moreover, to facilitate research on the GVRD task, we construct a new dataset, named COCO-GVR, which consists of 9,570 images from the COCO dataset and 31,855 manually labeled GVRs. We evaluate SGRP through extensive experiments on the COCO-GVR dataset. The results show that SGRP outperforms baselines derived from state-of-the-art visual relation detection and scene graph generation methods.
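To make the data flow between the EC and FE modules more concrete, the following is a minimal sketch under stated assumptions, not the SGRP implementation. The grouping rule (one group candidate per category with at least two detected instances), the requirement that a phrase candidate's subject and object share no instances, and the union-box location feature are all hypothetical choices made for illustration. The names `Entity`, `construct_entities`, `phrase_candidates`, and `location_feature` are likewise hypothetical. The learned parts (visual and semantic features, and the GRP module's predictions) are not sketched.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class Entity:
    members: FrozenSet[int]  # indices of detected instances; size 1 = individual
    category: str
    box: Box                 # union box enclosing all member instances


def union_box(boxes: List[Box]) -> Box:
    """Smallest axis-aligned box enclosing all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def construct_entities(categories: List[str], boxes: List[Box]) -> List[Entity]:
    """Hypothetical EC step: every instance is an individual entity, plus one
    group candidate per category that has at least two instances."""
    entities = [Entity(frozenset([i]), c, b)
                for i, (c, b) in enumerate(zip(categories, boxes))]
    by_cat: Dict[str, List[int]] = {}
    for i, c in enumerate(categories):
        by_cat.setdefault(c, []).append(i)
    for c, idxs in by_cat.items():
        if len(idxs) >= 2:
            entities.append(
                Entity(frozenset(idxs), c, union_box([boxes[i] for i in idxs])))
    return entities


def phrase_candidates(entities: List[Entity]) -> List[Tuple[Entity, Entity]]:
    """Ordered (subject, object) pairs whose member sets do not overlap.
    A non-empty set is never disjoint from itself, so self-pairs are excluded."""
    return [(s, o) for s in entities for o in entities
            if s.members.isdisjoint(o.members)]


def location_feature(e: Entity, width: float, height: float) -> Tuple[float, ...]:
    """Illustrative location feature: normalized union-box corners plus the
    fraction of the image area the box covers."""
    x1, y1, x2, y2 = e.box
    return (x1 / width, y1 / height, x2 / width, y2 / height,
            (x2 - x1) * (y2 - y1) / (width * height))


if __name__ == "__main__":
    cats = ["person", "person", "person", "frisbee"]
    boxes = [(0, 0, 10, 20), (12, 0, 22, 20), (24, 0, 34, 20), (40, 5, 45, 10)]
    ents = construct_entities(cats, boxes)
    print(len(ents), len(phrase_candidates(ents)))
```

With three people and one frisbee, this yields four individuals plus one "person" group. The group pairs only with the frisbee, because it overlaps each individual person. Enumerating candidates explicitly in this way is one plausible reading of "group candidates" and "phrase candidates"; the actual construction in SGRP may differ.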