In this paper, a video-based behavioural recognition dataset for beef cattle is constructed. The dataset covers five behaviours of beef cattle: standing, lying, drinking, feeding, and ruminating. Six beef cattle in an indoor confinement barn were monitored for 168 h with a single surveillance camera, under varying daytime light conditions as well as at night; the collection setup comprised the camera, storage, a router, and a laptop. Annotation was partly automated: the YOLOv8 object-detection model and the ByteTrack multi-object tracking algorithm were used to label each animal's bounding-box coordinates and identity code. Individual-animal video clips were then cut with the FFmpeg tool and manually annotated with behavioural labels. The dataset includes 500 video clips, 2000 image-recognition samples, over 4000 object-tracking samples, and over 10 GB of frame-sequence images; 4974 video clips of different behavioural types are labelled, totalling about 14 h. On this basis, a TimeSformer multi-behaviour recognition model for beef cattle, based on video understanding, is proposed as a baseline evaluation model. Experimental results show that the model effectively learns the category labels from the behavioural data in the dataset, achieving an average recognition accuracy of 90.33% on the test set. In addition, a data augmentation and oversampling strategy was adopted to address the class-imbalance problem and reduce the risk of model overfitting. The dataset provides a data foundation for research on beef cattle behaviour recognition and is of significance for intelligent perception of cattle health status and improvement of farming efficiency.
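As an illustration of how the oversampling side of such a strategy might balance the five behaviour classes, the sketch below randomly duplicates clip indices of minority classes until every class matches the majority-class count. The per-class counts are hypothetical placeholders, not figures from the dataset, and the function name `oversample` is an assumption for illustration only.

```python
import random

# Hypothetical per-class clip counts for the five behaviours;
# the real class distribution is not stated in the abstract.
clip_counts = {
    "standing": 2000,
    "lying": 1500,
    "feeding": 800,
    "drinking": 300,
    "ruminating": 374,
}

def oversample(counts, seed=0):
    """Randomly duplicate clip indices of minority classes until every
    behaviour class reaches the majority-class count."""
    rng = random.Random(seed)
    target = max(counts.values())
    balanced = {}
    for label, n in counts.items():
        indices = list(range(n))  # original clip indices, kept once each
        # draw (target - n) duplicates uniformly from the existing clips
        extras = [rng.randrange(n) for _ in range(target - n)]
        balanced[label] = indices + extras
    return balanced

balanced = oversample(clip_counts)
print({label: len(ids) for label, ids in balanced.items()})
```

In practice the duplicated clips would typically be passed through the augmentation pipeline (e.g. random crops or flips) rather than replayed verbatim, so repeated samples do not encourage memorisation.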