HyperAIHyperAI

Command Palette

Search for a command to run...

a year ago

Panoptic One-Click Segmentation: Applied to Agricultural Data

Patrick Zimmer Michael Halstead Chris McCool

One-click Deployment of Llama-3.3-70B-Instruct

20 Hours of RTX 5090 Compute Resources for Only $1 (Worth $7)
Go to Notebook

Abstract

In weed control, precision agriculture can help to greatly reduce the use of herbicides, resulting in both economical and ecological benefits. A key element here is the ability to locate and segment all the plants (crop & weed) from image data. Modern instance segmentation techniques can achieve this, however, training such systems requires large amounts of hand-labelled data which is expensive and laborious to obtain. Weakly supervised training can help to greatly reduce labelling efforts and costs. In this paper we propose panoptic one-click segmentation, an efficient and accurate offline tool to produce pseudo-labels from click inputs and thereby reduce labelling effort when creating novel datasets. Our approach jointly estimates the pixel-wise location of all N objects in the scene, compared to traditional approaches which iterate independently through all N objects. This results in a highly efficient technique with greatly reduced training times. Using just 10% of the data to train our panoptic one-click segmentation approach yields 68.1% and 68.8% mean object intersection over union (IoU) on challenging sugar beet and corn image data respectively, providing comparable performance to traditional one-click approaches while being approximately 12 times (an order of magnitude) faster to train. We demonstrate the practical applicability of our system by generating pseudo-labels from click annotations for the remaining 90% of the data. These pseudo-labels are then used to train Mask R-CNN, in a semi-supervised manner, improving the absolute performance (of mean foreground IoU) by 9.4 and 7.9 points for sugar beet and corn data respectively, demonstrating the potential of our approach to rapidly annotate challenging data. Finally, we show that our panoptic one-click segmentation technique is able to recover missed clicks during annotation outlining a further benefit over traditional approaches.

One-sentence Summary

The authors propose Panoptic One-Click Segmentation, a weakly supervised method that jointly estimates all scene objects to generate click-based pseudo-labels, reducing training time by an order of magnitude while achieving 68.1% and 68.8% mean object IoU on sugar beet and corn datasets with only 10% of the labeled data and subsequently improving Mask R-CNN foreground IoU by 9.4 and 7.9 points in semi-supervised training.

Key Contributions

  • This paper introduces a panoptic one-click segmentation framework that generates pseudo-labels from sparse click inputs to reduce manual annotation costs in agricultural plant segmentation.
  • The proposed method jointly estimates the pixel-wise locations of all objects in a scene simultaneously, replacing traditional independent iterative processing to substantially reduce training times.
  • Evaluations on sugar beet and corn datasets demonstrate that training with only 10% of labeled data achieves 68.1% and 68.8% mean object IoU while operating approximately 12 times faster than baseline methods, and the generated pseudo-labels improve downstream Mask R-CNN foreground IoU by 9.4 and 7.9 points respectively.

Introduction

Precision agriculture depends on accurate plant segmentation to enable targeted weed control and reduce herbicide usage, yet training these vision systems traditionally requires expensive pixel-level annotations. Existing weakly supervised methods that generate pseudo-labels from sparse inputs like single clicks remain computationally inefficient because they process each object independently across multiple forward passes. The authors leverage panoptic segmentation to jointly resolve all objects in a scene from a single click per instance during a single forward pass, creating a highly efficient offline annotation tool. This approach accelerates model training by an order of magnitude and successfully generates high-quality pseudo-labels for the vast majority of agricultural datasets, substantially improving downstream instance segmentation performance while requiring minimal manual effort.

Dataset

  • Composition and Sources: The authors evaluate their approach on two agricultural weeding datasets, designated as SB20 and CN20, which contain images featuring multiple, frequently overlapping plant instances across various species and sizes.
  • Subset Details: Annotations for both datasets include keypoint or stem locations to serve as interactive click targets. Due to occlusion or border placement, 63 instances in SB20 and 30 instances in CN20 lack explicit keypoints.
  • Training Strategy and Splits: The authors implement a semi-supervised workflow that trains on a small fraction of manually labeled data. They allocate 10 percent of each dataset for manual supervision and generate pseudo-labels for the remaining 90 percent using models trained on that initial 10 percent split.
  • Processing and Input Generation: For plants without predefined keypoints, the authors calculate the center of mass of the binary mask to determine click coordinates. If that point falls outside the mask, they apply iterative binary erosion until the object disappears and randomly select a coordinate from the penultimate iteration to guarantee an in-bounds location. During training, they add plus or minus 10 pixels of random noise to the click positions to simulate human annotation uncertainty while ensuring the points remain within the target region.

Method

The authors leverage a panoptic segmentation framework to develop a novel one-click segmentation system that jointly estimates the locations of all objects in an image within a single forward pass, significantly reducing computational overhead compared to traditional methods. The proposed approach is built upon Panoptic-Deeplab, a model that combines semantic and instance segmentation by producing three outputs: a semantic segmentation map, a center offset map, and an object center map. The semantic map classifies each pixel into a category, distinguishing between "things" (countable objects such as plants) and "stuff" (non-countable regions like background or textures). The center map identifies the location of each object’s center, while the offset map provides per-pixel displacement vectors toward the nearest object center, enabling pixel assignment to the correct instance during post-processing.

The baseline one-click segmentation method processes each object independently, requiring NNN forward passes for NNN objects in an image. This approach uses a Gaussian-encoded click transform map as an additional input channel to the encoder-decoder network, where each click is represented as a 2-D Gaussian with a standard deviation of 8. When multiple objects are present, this procedure is repeated for each positive click, and optionally, negative clicks from other objects are encoded into a secondary click map to improve scene understanding. However, this iterative process is computationally expensive due to repeated processing of the same image.

In contrast, the proposed panoptic one-click segmentation system operates in a single pass by adapting the Panoptic-Deeplab architecture. The input consists of the RGB image and a click transform map, which serves as both the network input and the ground truth for object centers. The network is modified to predict only two outputs: the semantic segmentation map and the center offset map. The object center estimation head is disabled, and the user-provided click locations are directly used as center locations during post-processing, as shown in the framework diagram . This design eliminates the need for the network to predict object centers, streamlining inference and enabling joint processing of all objects simultaneously.

Furthermore, the system can be extended to recover from annotation errors such as missing clicks by reintroducing the object center estimation head as a third output. In this variant, the network predicts object centers directly, allowing it to estimate missing click locations even when user input is incomplete. The click map remains as an input channel but is no longer used in post-processing, as the network’s predicted centers replace the annotated ones. This adaptation enhances robustness to annotation noise while maintaining the efficiency of a single-pass inference. The overall architecture integrates these components into a unified framework that efficiently processes multiple objects in a single pass, as illustrated in the diagram .

Experiment

The evaluation comprises three experiments assessing a novel panoptic one-click segmentation framework against traditional methods across agricultural datasets. The first experiment validates general segmentation performance and training efficiency, demonstrating that the panoptic approach inherently prevents object overlaps and trains significantly faster while maintaining comparable accuracy. Subsequent experiments validate the framework's utility in semi-supervised learning and its robustness to missing inputs, showing that it effectively generates high-quality pseudo-labels from minimal annotations and successfully recovers object locations even when the majority of user clicks are absent. Collectively, these findings establish the panoptic method as a highly efficient and resilient tool for rapid dataset creation and annotation recovery.

The authors compare semi-supervised instance segmentation performance using different one-click segmentation methods as pseudo-label sources. Results show that using pseudo-labels generated from one-click models improves performance over using only a small fraction of manually annotated data, with the panoptic approach achieving competitive results. The panoptic system demonstrates robustness to missing input clicks, maintaining recognition quality even when a significant portion of clicks are absent. Using pseudo-labels from one-click models improves instance segmentation performance compared to using only a small manually annotated subset. The panoptic one-click system achieves performance close to fully supervised baselines and outperforms standard one-click methods in some cases. The panoptic system maintains recognition quality even when a large portion of input clicks are missing, indicating robustness to missing annotations.

The authors compare traditional one-click segmentation methods with a proposed panoptic one-click approach, evaluating performance on two datasets. Results show that the panoptic system achieves competitive segmentation accuracy while being significantly faster to train, and it demonstrates robustness to missing input clicks. The panoptic method also performs well in semi-supervised learning tasks, generating pseudo-labels that improve instance segmentation performance. The panoptic one-click system achieves competitive segmentation performance compared to traditional methods while being substantially faster to train. The panoptic approach reduces overlapping errors common in traditional one-click systems and shows robustness when input clicks are missing. In semi-supervised learning, the panoptic system generates pseudo-labels that significantly improve instance segmentation performance over using only a small fraction of manually annotated data.

The authors evaluate a panoptic one-click segmentation system's ability to recover from missing input clicks by comparing performance metrics under varying levels of missing clicks. Results show that the system maintains high recognition quality even when a significant portion of clicks are missing, with only a gradual decline in performance as the percentage of missing clicks increases. The system demonstrates robustness in estimating object centers, as predicted centers yield results close to those with user-provided clicks. The panoptic system maintains high recognition quality even when a large percentage of input clicks are missing. Performance degrades gradually as the percentage of missing clicks increases, with recognition quality remaining substantial at 75% missing clicks. The system achieves recognition quality comparable to user-provided clicks when using network-predicted centers, indicating strong object localization capability.

The experiments compare the proposed panoptic one-click segmentation system against traditional and fully supervised baselines across two datasets to evaluate training efficiency, segmentation accuracy, and resilience to incomplete input. The first set of tests validates the system's effectiveness in semi-supervised instance segmentation, demonstrating that its generated pseudo-labels substantially improve performance over minimal manual annotations and approach fully supervised levels. Additional evaluations assess the model's ability to handle missing user interactions, confirming that it maintains high recognition quality, minimizes overlapping errors, and accurately estimates object centers even when most clicks are absent. Overall, the findings establish the panoptic approach as a faster, more accurate, and highly robust alternative for interactive segmentation tasks.


Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing

HyperAI Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp