Low-Cost, Computer Vision-Based, Prebloom Cluster Count Prediction in Vineyards

Jonathan Jaramillo, Justine Vanden Heuvel, Kirstin Petersen

Introduction

Traditional approaches to estimating grape yield typically involve manual cluster counts on a subset of vines, which are then scaled to the whole vineyard. These methods are time-consuming, labor-intensive, and prone to high variability depending on who performs the counts and which vines are selected. Although advanced sensing systems such as LiDAR and multispectral cameras have been explored, their high cost and complexity limit widespread adoption, particularly for smaller vineyard operations.
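The scaling step in the manual approach is simple arithmetic, and a short sketch shows why the choice of sampled vines matters so much. The function name, the sample counts, and the block size below are all hypothetical and are not taken from the study; the sketch only assumes that the sample mean is multiplied by the number of vines.

```python
from statistics import mean


def extrapolate_cluster_count(sample_counts: list[int], total_vines: int) -> float:
    """Scale the mean clusters-per-vine in a sample up to the whole vineyard."""
    if not sample_counts:
        raise ValueError("need at least one sampled vine")
    return mean(sample_counts) * total_vines


# Hypothetical manual counts from two different samples of the same block.
sample_a = [28, 31, 35, 30]
sample_b = [22, 27, 25, 26]
total_vines = 500  # hypothetical block size

estimate_a = extrapolate_cluster_count(sample_a, total_vines)
estimate_b = extrapolate_cluster_count(sample_b, total_vines)
print(estimate_a, estimate_b)
```

With these made-up numbers, two equally plausible samples give whole-block estimates that differ by several thousand clusters, which is the sampling variability the paragraph above describes.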

Methodology

The authors developed a low-cost system that employs a smartphone camera, gimbal, and portable LED lights to capture nighttime video of vines before bloom. This stage of growth was chosen because clusters are visible while foliage remains sparse, increasing detection accuracy and making early yield estimation possible. The videos were processed using a Faster R-CNN object detection model with a ResNet50 backbone, combined with Kernelized Correlation Filter tracking to prevent double counting. The network was pretrained on the COCO dataset and then fine-tuned on thousands of labeled grapevine images. Automated counts were calibrated using a small number of manual counts to account for occlusion and counting errors.
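The role of tracking in this pipeline is to ensure that a cluster seen in many consecutive frames is counted once. The sketch below illustrates that idea only. It swaps in a simple greedy intersection-over-union (IoU) association between frames as a stand-in for the Kernelized Correlation Filter tracker, and it assumes the per-frame bounding boxes have already been produced by a detector. The function names, the 0.3 threshold, and the example boxes are hypothetical.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_unique_clusters(frames: list[list[Box]], iou_threshold: float = 0.3) -> int:
    """Count detections across frames, counting each tracked object once.

    A detection that overlaps a box from the previous frame is treated as
    the same cluster; anything else starts a new track and is counted.
    Tracks are not kept alive through missed frames (a simplification).
    """
    active: list[Box] = []  # boxes carried over from the previous frame
    total = 0
    for detections in frames:
        unmatched = list(active)
        for det in detections:
            # Greedy match against the best-overlapping unmatched track.
            best = max(unmatched, key=lambda t: iou(t, det), default=None)
            if best is not None and iou(best, det) >= iou_threshold:
                unmatched.remove(best)  # same cluster, already counted
            else:
                total += 1  # new cluster entering the view
        active = list(detections)
    return total


# Hypothetical detections: one cluster drifting right over three frames,
# plus a second cluster that appears in the last frame.
frames = [
    [(0, 0, 10, 10)],
    [(2, 0, 12, 10)],
    [(4, 0, 14, 10), (50, 0, 60, 10)],
]
unique_count = count_unique_clusters(frames)
print(unique_count)
```

Summing raw detections over these three frames would give four, while the association step reports two unique clusters. A correlation-filter tracker fills the same role but follows each cluster by appearance rather than by box overlap alone.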

Results

Experiments were conducted at Cornell’s teaching vineyard across two growing seasons on Riesling and Pinot noir vines. Results demonstrated that the automated method significantly outperformed traditional manual counts. On average, the vision-based system achieved an error of 4.9 percent compared to 7.9 percent for manual methods, and its maximum error was nearly half that of human counters. Importantly, the automated approach was more consistent, with less variability tied to which vines were sampled. Labor efficiency was also dramatically improved: achieving equivalent accuracy through manual methods required counting more than fifty panels, while the automated method required calibration on only about twenty panels.
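The calibration and error figures above can be made concrete with a small worked sketch. It assumes calibration is a single ratio of manual to automated counts over the calibration panels, which is one simple way to correct for occlusion and may not be the exact form used in the study. All function names and numbers below are hypothetical and are not results from the paper.

```python
def calibration_factor(auto_counts: list[int], manual_counts: list[int]) -> float:
    """Ratio of manual to automated counts over the calibration panels."""
    if not auto_counts or len(auto_counts) != len(manual_counts):
        raise ValueError("need matching, non-empty calibration counts")
    total_auto = sum(auto_counts)
    if total_auto == 0:
        raise ValueError("automated counts sum to zero")
    return sum(manual_counts) / total_auto


def percent_error(predicted: float, actual: float) -> float:
    """Absolute error as a percentage of the true count."""
    return abs(predicted - actual) / actual * 100


# Hypothetical calibration panels: the camera misses some occluded clusters.
auto_cal = [8, 9, 10]
manual_cal = [10, 12, 11]
factor = calibration_factor(auto_cal, manual_cal)

# Hypothetical automated total for the rest of the block, and a made-up
# ground-truth total used only to illustrate the error calculation.
auto_total = 270
true_total = 320
predicted_total = factor * auto_total
error_pct = percent_error(predicted_total, true_total)
print(round(factor, 3), round(predicted_total, 1), round(error_pct, 2))
```

The design choice here is that manual effort is spent only on the calibration panels, while the automated count covers every panel in the block.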

Discussion

The study highlights that counting clusters directly is more reliable than counting shoots, and that the automated method generalizes across grape cultivars and growing seasons. Beyond technical accuracy, the system offers practical advantages in cost, ease of use, and scalability. The entire setup cost only a few hundred dollars, a fraction of the cost of other advanced sensing systems, and videos could be processed efficiently on standard hardware or cloud platforms.

Conclusion

This research demonstrates that a low-cost, smartphone-based computer vision system can count prebloom grape clusters more accurately, and with less manual labor, than traditional sampling. For growers and agricultural researchers, the system offers a practical tool for early yield estimation that can inform resource management and decision-making.