Panoptic One-Click Segmentation: Applied to Agricultural Data
Patrick Zimmer, Michael Halstead, Chris McCool
Abstract
In weed control, precision agriculture can help to greatly reduce herbicide use, yielding both economic and ecological benefits. A key element is the ability to locate and segment all plants (crops and weeds) from image data. Modern instance segmentation techniques can achieve this; however, training such systems requires large amounts of hand-labelled data, which is expensive and laborious to obtain. Weakly supervised training can help to greatly reduce labelling effort and cost. In this paper, we propose panoptic one-click segmentation, an efficient and accurate offline tool for producing pseudo-labels from click inputs, which reduces labelling effort when creating new datasets. Our approach jointly estimates the pixel-wise location of all N objects in the scene, in contrast to traditional approaches that iterate independently over each of the N objects. This results in a highly efficient technique with greatly reduced training times. Using just 10% of the data to train our panoptic one-click segmentation approach, we obtain a mean object intersection over union (IoU) of 68.1% and 68.8% on sugar beet and corn image datasets, respectively, providing performance comparable to traditional one-click approaches while being approximately 12 times (an order of magnitude) faster to train. We demonstrate the practical applicability of our system by generating pseudo-labels from click annotations for the remaining 90% of the data. These pseudo-labels are then used to train Mask R-CNN in a semi-supervised manner, improving absolute performance (mean foreground IoU) by 9.4 and 7.9 points for the sugar beet and corn data, respectively, which demonstrates the potential of our approach for rapidly annotating complex data. Finally, we show that our panoptic one-click segmentation technique is able to recover missed clicks during annotation, a further benefit over traditional approaches.
One-sentence Summary
The authors propose Panoptic One-Click Segmentation, a weakly supervised method that jointly estimates all scene objects to generate click-based pseudo-labels, reducing training time by an order of magnitude while achieving 68.1% and 68.8% mean object IoU on sugar beet and corn datasets with only 10% of the labeled data and subsequently improving Mask R-CNN foreground IoU by 9.4 and 7.9 points in semi-supervised training.
Key Contributions
- This paper introduces a panoptic one-click segmentation framework that generates pseudo-labels from sparse click inputs to reduce manual annotation costs in agricultural plant segmentation.
- The proposed method jointly estimates the pixel-wise locations of all objects in a scene simultaneously, replacing traditional independent iterative processing to substantially reduce training times.
- Evaluations on sugar beet and corn datasets demonstrate that training with only 10% of labeled data achieves 68.1% and 68.8% mean object IoU while operating approximately 12 times faster than baseline methods, and the generated pseudo-labels improve downstream Mask R-CNN foreground IoU by 9.4 and 7.9 points respectively.
Introduction
Precision agriculture depends on accurate plant segmentation to enable targeted weed control and reduce herbicide usage, yet training these vision systems traditionally requires expensive pixel-level annotations. Existing weakly supervised methods that generate pseudo-labels from sparse inputs such as single clicks remain computationally inefficient because they process each object independently, one forward pass per object. The authors leverage panoptic segmentation to jointly resolve all objects in a scene, given one click per instance, in a single forward pass, creating a highly efficient offline annotation tool. This approach speeds up training by an order of magnitude and generates pseudo-labels for the 90% of each dataset that is not manually labeled, substantially improving downstream instance segmentation performance while requiring minimal manual effort.
Dataset
- Composition and Sources: The authors evaluate their approach on two agricultural weeding datasets, designated as SB20 and CN20, which contain images featuring multiple, frequently overlapping plant instances across various species and sizes.
- Subset Details: Annotations for both datasets include keypoint or stem locations to serve as interactive click targets. Due to occlusion or border placement, 63 instances in SB20 and 30 instances in CN20 lack explicit keypoints.
- Training Strategy and Splits: The authors implement a semi-supervised workflow that trains on a small fraction of manually labeled data. They allocate 10 percent of each dataset for manual supervision and generate pseudo-labels for the remaining 90 percent using models trained on that initial 10 percent split.
- Processing and Input Generation: For plants without predefined keypoints, the authors calculate the center of mass of the binary mask to determine click coordinates. If that point falls outside the mask, they apply iterative binary erosion until the object disappears and randomly select a coordinate from the penultimate iteration to guarantee an in-bounds location. During training, they add plus or minus 10 pixels of random noise to the click positions to simulate human annotation uncertainty while ensuring the points remain within the target region.
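The click-generation steps described in the last item can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function names (`erode`, `click_from_mask`, `jitter_click`), the 4-connected erosion element, the uniform noise distribution, and the rejection-sampling loop that keeps the jittered click on the object are all assumptions made for the example.

```python
import numpy as np

def erode(mask):
    """One binary erosion step with a 4-connected (cross) element (assumed shape)."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
            & p[1:-1, :-2] & p[1:-1, 2:])

def click_from_mask(mask, rng):
    """Return a (row, col) click that is guaranteed to lie on the mask."""
    mask = mask.astype(bool)
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("empty mask")
    # Center of mass of the binary mask, rounded to the nearest pixel.
    r, c = int(round(float(rows.mean()))), int(round(float(cols.mean())))
    if mask[r, c]:
        return r, c
    # Centroid fell outside the object: erode until the next step would
    # remove everything, then sample from that penultimate result.
    current = mask
    while True:
        nxt = erode(current)
        if not nxt.any():
            break
        current = nxt
    rows, cols = np.nonzero(current)
    i = int(rng.integers(rows.size))
    return int(rows[i]), int(cols[i])

def jitter_click(click, mask, rng, max_shift=10, tries=100):
    """Shift a click by up to +/- max_shift pixels while staying on the mask."""
    r, c = click
    h, w = mask.shape
    for _ in range(tries):
        dr, dc = rng.integers(-max_shift, max_shift + 1, size=2)
        rr, cc = r + int(dr), c + int(dc)
        if 0 <= rr < h and 0 <= cc < w and mask[rr, cc]:
            return rr, cc
    return r, c  # fall back to the unperturbed click
```

A ring-shaped mask is a simple case where the centroid lands in the hole, so the erosion fallback is exercised; a compact mask returns its centroid directly.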
Method
The authors leverage a panoptic segmentation framework to develop a novel one-click segmentation system that jointly estimates the locations of all objects in an image within a single forward pass, significantly reducing computational overhead compared to traditional methods. The proposed approach is built upon Panoptic-Deeplab, a model that combines semantic and instance segmentation by producing three outputs: a semantic segmentation map, a center offset map, and an object center map. The semantic map classifies each pixel into a category, distinguishing between "things" (countable objects such as plants) and "stuff" (non-countable regions like background or textures). The center map identifies the location of each object’s center, while the offset map provides per-pixel displacement vectors toward the nearest object center, enabling pixel assignment to the correct instance during post-processing.
The baseline one-click segmentation method processes each object independently, requiring N forward passes for N objects in an image. This approach uses a Gaussian-encoded click transform map as an additional input channel to the encoder-decoder network, where each click is represented as a 2-D Gaussian with a standard deviation of 8. When multiple objects are present, this procedure is repeated for each positive click, and optionally, negative clicks from other objects are encoded into a secondary click map to improve scene understanding. However, this iterative process is computationally expensive due to repeated processing of the same image.
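As a rough illustration of the click transform map, the sketch below renders each click as a 2-D Gaussian with a standard deviation of 8. The function name `click_map`, the interpretation of the standard deviation as pixels, the peak value of 1, and the use of a per-pixel maximum to combine several clicks are assumptions of this example; the summary above does not specify them.

```python
import numpy as np

def click_map(shape, clicks, sigma=8.0):
    """Render (row, col) clicks as a single-channel Gaussian heat map.

    shape  -- (height, width) of the image
    clicks -- iterable of (row, col) click positions
    sigma  -- Gaussian standard deviation, assumed to be in pixels
    """
    h, w = shape
    rows = np.arange(h, dtype=np.float64)[:, None]
    cols = np.arange(w, dtype=np.float64)[None, :]
    out = np.zeros((h, w), dtype=np.float64)
    for r, c in clicks:
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        # Combine clicks with a maximum so that every peak stays at 1
        # (an assumption; summation would be another plausible choice).
        out = np.maximum(out, g)
    return out

# The map would be stacked with the RGB image as an extra input channel, e.g.
# np.concatenate([rgb, click_map(rgb.shape[:2], clicks)[..., None]], axis=-1)
```

In the baseline, one such map would be built per object (one positive click each), whereas a map containing all clicks at once corresponds to the joint, single-pass setting.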
In contrast, the proposed panoptic one-click segmentation system operates in a single pass by adapting the Panoptic-Deeplab architecture. The input consists of the RGB image and a click transform map, which serves as both the network input and the ground truth for object centers. The network is modified to predict only two outputs: the semantic segmentation map and the center offset map. The object center estimation head is disabled, and the user-provided click locations are directly used as center locations during post-processing, as shown in the framework diagram. This design eliminates the need for the network to predict object centers, streamlining inference and enabling joint processing of all objects simultaneously.
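The post-processing step, in which clicks stand in for predicted centers, might look like the following sketch. It assumes the offset map is stored as a `(2, H, W)` array of (row, column) displacements pointing from each pixel to its instance center, and that each foreground pixel is assigned to the click closest to the position it votes for; the function name `group_pixels` and this array layout are hypothetical.

```python
import numpy as np

def group_pixels(foreground, offsets, clicks):
    """Assign every foreground pixel to one click-defined instance.

    foreground -- (H, W) bool array from the semantic map ("thing" pixels)
    offsets    -- (2, H, W) float array of (d_row, d_col) toward the center
    clicks     -- non-empty list of (row, col) clicks used as centers
    Returns an (H, W) int array: 0 for background, k for the k-th click.
    """
    h, w = foreground.shape
    rows, cols = np.mgrid[0:h, 0:w]
    # Position each pixel "votes" for as its instance center.
    voted = np.stack([rows + offsets[0], cols + offsets[1]], axis=-1)
    centers = np.asarray(clicks, dtype=np.float64)          # (N, 2)
    # Squared distance from every vote to every click: (H, W, N).
    d2 = ((voted[:, :, None, :] - centers[None, None]) ** 2).sum(axis=-1)
    ids = d2.argmin(axis=-1) + 1                            # 1-based ids
    return np.where(foreground, ids, 0)
```

Because every pixel receives exactly one instance id, the resulting masks cannot overlap, which is consistent with the overlap-free behaviour noted in the experiments.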
Furthermore, the system can be extended to recover from annotation errors such as missing clicks by reintroducing the object center estimation head as a third output. In this variant, the network predicts object centers directly, allowing it to estimate missing click locations even when user input is incomplete. The click map remains as an input channel but is no longer used in post-processing, as the network’s predicted centers replace the annotated ones. This adaptation enhances robustness to annotation noise while maintaining the efficiency of a single-pass inference. The overall architecture integrates these components into a unified framework that efficiently processes multiple objects in a single pass, as illustrated in the diagram.
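For the variant with a center head, centers have to be read out of the predicted center heat map. One common way to do this is to keep local maxima above a confidence threshold, as in the sketch below. The function name `find_centers`, the threshold of 0.1, and the window radius of 3 are illustrative assumptions, not values taken from this summary.

```python
import numpy as np

def find_centers(heatmap, threshold=0.1, radius=3):
    """Return (row, col) positions of local maxima in a center heat map.

    A pixel is kept if it equals the maximum of its (2*radius+1)^2
    neighbourhood and exceeds the confidence threshold.
    """
    h, w = heatmap.shape
    padded = np.pad(heatmap, radius, mode="constant", constant_values=-np.inf)
    local_max = np.full((h, w), -np.inf)
    # Sliding-window maximum built from shifted views of the padded map.
    for dr in range(2 * radius + 1):
        for dc in range(2 * radius + 1):
            local_max = np.maximum(local_max, padded[dr:dr + h, dc:dc + w])
    keep = (heatmap == local_max) & (heatmap > threshold)
    return [(int(r), int(c)) for r, c in zip(*np.nonzero(keep))]
```

The returned positions could then take the place of user clicks when grouping pixels into instances, which is how a missed click would be recovered under these assumptions.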
Experiment
The evaluation comprises three experiments assessing a novel panoptic one-click segmentation framework against traditional methods across agricultural datasets. The first experiment validates general segmentation performance and training efficiency, demonstrating that the panoptic approach inherently prevents object overlaps and trains significantly faster while maintaining comparable accuracy. Subsequent experiments validate the framework's utility in semi-supervised learning and its robustness to missing inputs, showing that it effectively generates high-quality pseudo-labels from minimal annotations and successfully recovers object locations even when the majority of user clicks are absent. Collectively, these findings establish the panoptic method as a highly efficient and resilient tool for rapid dataset creation and annotation recovery.
The authors compare semi-supervised instance segmentation performance using different one-click segmentation methods as pseudo-label sources. Pseudo-labels generated by one-click models improve performance over training on only the small manually annotated subset. The panoptic one-click system comes close to the fully supervised baselines and outperforms standard one-click methods in some cases. It also maintains recognition quality when a large portion of input clicks is missing, indicating robustness to missing annotations.
The authors compare traditional one-click segmentation methods with the proposed panoptic one-click approach on two datasets. The panoptic system achieves competitive segmentation accuracy while being substantially faster to train, and it reduces the overlapping errors common in traditional one-click systems. It is also robust to missing input clicks. In semi-supervised learning, its pseudo-labels significantly improve instance segmentation performance over using only a small fraction of manually annotated data.
The authors evaluate the panoptic one-click segmentation system's ability to recover from missing input clicks by comparing performance metrics under varying levels of missing clicks. Recognition quality declines only gradually as the percentage of missing clicks increases and remains substantial at 75% missing clicks. Network-predicted centers yield recognition quality close to that obtained with user-provided clicks, indicating strong object localization capability.
The experiments compare the proposed panoptic one-click segmentation system against traditional and fully supervised baselines across two datasets to evaluate training efficiency, segmentation accuracy, and resilience to incomplete input. The semi-supervised tests show that its pseudo-labels substantially improve instance segmentation performance over minimal manual annotations and approach fully supervised levels. The missing-click evaluations confirm that it maintains high recognition quality and accurately estimates object centers even when most clicks are absent, while its joint formulation reduces overlapping errors. Overall, the findings establish the panoptic approach as a faster, comparably accurate, and robust alternative for click-based annotation.