2 months ago

Alaa Zniber Mounir Ghogho Ouassim Karrakchou Mehdi Zakroum

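Abstract

The Internet of Things (IoT) is transforming many domains as sensors are increasingly embedded in wearables, smart buildings, and connected equipment. Deep learning makes it possible to extract valuable insights from IoT data, but conventional models are too computationally demanding for resource-constrained edge devices. In addition, privacy concerns and real-time processing requirements make local computation essential compared with cloud-based solutions. Inspired by the energy efficiency of the brain, we propose a shallow bidirectional predictive coding network with early exiting, which dynamically halts computation once a performance threshold is reached. This approach substantially reduces memory usage and computational overhead while maintaining high accuracy. We validate the proposed approach on the CIFAR-10 dataset. Experiments show that the model achieves performance comparable to deep networks despite having far fewer parameters and lower computational complexity, demonstrating the potential of biologically inspired architectures for efficient edge AI.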

One-sentence Summary

Researchers from the International University of Rabat, University Mohammed VI Polytechnic, and the University of Leeds propose EE-PCN, a shallow bidirectional predictive coding network with early exiting that dynamically halts computation to achieve deep-network accuracy with minimal memory and FLOPs for extreme edge AI.

Key Contributions

The paper introduces a new derivation of predictive coding cycling rules for bidirectional networks that effectively implements both feedback and feedforward update mechanisms.
A shallow predictive coding network is designed to achieve accuracy comparable to deeper models while significantly reducing the memory footprint for deployment on extreme-edge devices.
The method incorporates a dynamic early-exiting mechanism and knowledge distillation across cycles to adaptively adjust the number of operations, thereby improving inference efficiency and the performance of early exits.

Introduction

The rise of IoT in sectors like health monitoring and smart cities demands real-time data processing on resource-constrained edge devices, yet conventional deep learning models are too computationally heavy and memory-intensive for these environments. While Predictive Coding Networks (PCNs) offer biologically inspired efficiency, prior implementations often double parameter counts compared to standard models and lack adaptive mechanisms, forcing them to perform unnecessary computations on simple inputs. To address these challenges, the authors propose a shallow bidirectional PCN that integrates an early exiting mechanism to dynamically halt inference once a performance threshold is met. This approach leverages knowledge distillation across cycles to maintain high accuracy while drastically reducing memory footprint and computational overhead, making it suitable for extreme edge deployment.

Dataset

The authors use the CIFAR-10 dataset, which contains 60,000 32×32 RGB images evenly distributed across 10 classes, as a stand-in for low-resolution IoT applications such as surveillance and smart farming.
The dataset is split into a training set of 50,000 images and a test set of 10,000 images.
Data augmentation is applied to the training set using random translation and horizontal flipping.
The training data is processed into batches of 128 images for model learning.
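
The augmentation and batching steps listed above can be sketched in plain Python. This is only an illustration under stated assumptions: the maximum shift of 4 pixels, the flip probability of 0.5, zero padding after translation, and the helper names `hflip`, `translate`, `augment`, and `batches` are all hypothetical, since the summary does not give these details. Images are represented as single-channel lists of rows for brevity.

```python
import random

def hflip(image):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in image]

def translate(image, dx, dy, fill=0):
    """Shift an image by (dx, dy) pixels; uncovered pixels get `fill` (assumed padding)."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            # Explicit bounds check: negative indices would otherwise wrap around.
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = image[sy][sx]
    return out

def augment(image, rng, max_shift=4, flip_prob=0.5):
    """Random translation followed by a random horizontal flip (assumed settings)."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    shifted = translate(image, dx, dy)
    return hflip(shifted) if rng.random() < flip_prob else shifted

def batches(items, batch_size=128):
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# 50,000 training images in batches of 128 give 390 full batches plus one of 80.
sizes = [len(b) for b in batches(list(range(50_000)))]
print(len(sizes), sizes[-1])  # 391 80
```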

Method

The authors propose a Predictive Coding Network (PCN) model enhanced with early exiting capabilities to optimize inference efficiency. The architecture consists of a shared backbone serving as a feature extractor, along with multiple downstream task classifiers. The backbone is designed as a bidirectional hierarchy of convolutional and deconvolutional layers.

[Figure: overview of the proposed architecture (image not reproduced here).]

In this framework, blue arrows denote the forward convolutional pass, while red arrows indicate the feedback deconvolutions used to reduce local errors. During inference, the model performs a variable number of cycles, $t \leq T$, over the backbone to iteratively minimize local prediction errors across all layers. Once the cycling process concludes, the final layer feature vector is passed to the classifier corresponding to the current cycle count $t$, indicated by the green arrow. The classification confidence is then compared against a predefined user threshold. If the confidence exceeds the threshold, the inference is terminated and a response is returned. Otherwise, another cycle is initiated, followed by another classification and threshold comparison.
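
The control flow of this early-exit loop can be sketched as follows. It is a minimal illustration, not the authors' implementation: the confidence measure (maximum softmax probability), the callables `run_cycle` and `classifiers`, and the toy demo at the bottom are assumptions introduced here, and the sketch simply returns the last classifier's answer when no earlier exit fires.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_inference(x, run_cycle, classifiers, threshold):
    """Cycle over the backbone until a per-cycle classifier is confident enough.

    run_cycle(state)        -> state after one more backbone cycle (hypothetical)
    classifiers[t - 1](state) -> logits from the classifier attached to cycle t
    Returns (predicted_class, confidence, cycles_used).
    """
    state = x
    max_cycles = len(classifiers)
    for t, classify in enumerate(classifiers, start=1):
        state = run_cycle(state)
        probs = softmax(classify(state))
        confidence = max(probs)  # assumed confidence measure
        # Exit when confident, or when the cycle budget T is exhausted.
        if confidence > threshold or t == max_cycles:
            return probs.index(confidence), confidence, t

# Toy stand-ins: each cycle sharpens the evidence for class 0.
def toy_cycle(state):
    return state + 1.0

toy_classifiers = [lambda s: [s, 0.0, 0.0]] * 5  # T = 5 (same toy head reused)

result = early_exit_inference(0.0, toy_cycle, toy_classifiers, threshold=0.9)
print(result)  # exits at cycle 3 of 5 with confidence of about 0.909
```

Raising the threshold trades computation for certainty: in the toy setup a threshold of 0.999 is never reached, so all five cycles run and the final classifier's prediction is returned.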

The architecture employs $T$ distinct classifiers rather than a single classifier shared across all cycles. This decision is driven by the evolving nature of feature representations throughout the iterative process. Since feature vectors undergo continuous refinement from one cycle to the next, a classifier trained on feature representations from a five-cycle model would be unable to accurately interpret the patterns extracted by a one-cycle model for the same input.

To derive the PC update rules, the authors apply gradient descent to minimize the local errors at each pass. Let $\mathbf{r}_l(t)$ denote the feature representation at convolution layer $l$ and cycle $t$. The representation at layer $l=0$ is fixed as the input image. For $t=0$, all feature representations are initialized through a standard feedforward pass:

$$\mathbf{r}_l(0) = \phi\left(\mathbf{W}_{l-1,l}\,\mathbf{r}_{l-1}(0)\right), \qquad l = 1, \cdots, L$$

where $\phi$ is a nonlinear activation function, taken to be ReLU in the experiments.
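
As a rough illustration of this initialization, the sketch below runs the feedforward pass with small dense matrices standing in for the convolutions $\mathbf{W}_{l-1,l}$. The helper names and the toy weights are hypothetical; only the recurrence itself comes from the text.

```python
def relu(vector):
    """Element-wise ReLU, the activation phi assumed in the experiments."""
    return [max(0.0, v) for v in vector]

def matvec(matrix, vector):
    """Dense matrix-vector product (stand-in for a convolution)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def feedforward_init(image, forward_weights):
    """Return [r_0(0), r_1(0), ..., r_L(0)] with r_0 fixed to the input.

    forward_weights[l - 1] plays the role of W_{l-1,l} for l = 1..L.
    """
    reps = [image]
    for weights in forward_weights:
        reps.append(relu(matvec(weights, reps[-1])))
    return reps

# Toy two-layer example (L = 2) with made-up weights.
w_01 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_12 = [[2.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
reps0 = feedforward_init([1.0, -2.0], [w_01, w_12])
print(reps0)  # [[1.0, -2.0], [1.0, 0.0, 0.0], [2.0, 0.0]]
```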

The feedback pass update rule governs a process in which the higher-layer representation, $\mathbf{r}_{l+1}(t)$, generates a top-down prediction of the lower-layer representation, $\mathbf{r}_{l}(t)$, denoted by $\mathbf{p}_{l}(t)$. This prediction is given by:

$$\mathbf{p}_l(t) = \phi\left[\mathbf{W}_{l+1,l}\,\mathbf{r}_{l+1}(t)\right]$$

The update is carried out by minimizing the local error, defined as $\epsilon_l(t) = \frac{1}{2}\left\|\mathbf{r}_l(t) - \mathbf{p}_l(t)\right\|_2^2$. A gradient step on $\epsilon_l(t)$ with respect to $\mathbf{r}_l(t)$, with $\mathbf{p}_l(t)$ held fixed and step size $\alpha_l$, gives $\mathbf{r}_l(t) - \alpha_l\left(\mathbf{r}_l(t) - \mathbf{p}_l(t)\right)$, which rearranges into the feedback update rule, computed at the midpoint $t+1/2$:

$$\mathbf{r}_l(t+1/2) = (1-\alpha_l)\,\mathbf{r}_l(t) + \alpha_l\,\phi\left[\mathbf{W}_{l+1,l}\,\mathbf{r}_{l+1}(t)\right]$$

The representation of the last layer remains unaffected during the feedback pass by design.
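
A minimal sketch of one feedback pass follows, again with dense matrices standing in for the deconvolutions. The dictionary layout (`feedback_weights[l]` for $\mathbf{W}_{l+1,l}$, `alphas[l]` for $\alpha_l$) and the toy numbers are assumptions made for illustration. Predictions are computed from the cycle-$t$ representations, as the update rule is written.

```python
def relu(vector):
    return [max(0.0, v) for v in vector]

def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def feedback_pass(reps, feedback_weights, alphas):
    """Compute r_l(t + 1/2) for l = 1..L-1 from the representations at cycle t.

    reps[0] is the input image (kept fixed) and reps[L] is the last layer,
    which the feedback pass leaves untouched.
    """
    last = len(reps) - 1
    updated = list(reps)
    for l in range(last - 1, 0, -1):
        # Top-down prediction p_l(t) from the layer above.
        prediction = relu(matvec(feedback_weights[l], reps[l + 1]))
        a = alphas[l]
        updated[l] = [(1 - a) * r + a * p for r, p in zip(reps[l], prediction)]
    return updated

# Toy example with L = 2: only layer 1 is updated.
reps_t = [[1.0], [2.0, 4.0], [1.0]]
half = feedback_pass(reps_t, {1: [[3.0], [-1.0]]}, {1: 0.5})
print(half)  # [[1.0], [2.5, 2.0], [1.0]]
```

Setting `alphas[l]` to 0 leaves layer `l` unchanged, while a value of 1 replaces it entirely with the top-down prediction.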

The feed-forward pass update rule governs a process in which the lower-layer representation generates a bottom-up prediction, which is then used to update the upper-layer representation. The feed-forward prediction is given by:

$$\mathbf{p}_l(t+1/2) = \phi\left[\mathbf{W}_{l-1,l}\,\mathbf{r}_{l-1}(t+1/2)\right]$$

This results in the following feed-forward update rule:

$$\mathbf{r}_l(t+1) = (1-\beta_l)\,\mathbf{r}_l(t+1/2) + \beta_l\,\phi\left[\mathbf{W}_{l-1,l}\,\mathbf{r}_{l-1}(t+1/2)\right]$$

Unlike prior formulations that rely solely on feedback convolution weight matrices, this formulation integrates both top-down and bottom-up predictions, leading to a more comprehensive update mechanism.
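
The feed-forward half of a cycle can be sketched in the same style. Dense matrices stand in for the convolutions, and the dictionary layout (`forward_weights[l]` for $\mathbf{W}_{l-1,l}$, `betas[l]` for $\beta_l$) and the toy values are hypothetical. Each prediction uses the midpoint representation of the layer below, following the rule as stated.

```python
def relu(vector):
    return [max(0.0, v) for v in vector]

def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def feedforward_pass(half_reps, forward_weights, betas):
    """Compute r_l(t + 1) for l = 1..L from the midpoint representations.

    half_reps[0] is the input image and stays fixed.
    """
    updated = list(half_reps)
    for l in range(1, len(half_reps)):
        # Bottom-up prediction p_l(t + 1/2) from the layer below.
        prediction = relu(matvec(forward_weights[l], half_reps[l - 1]))
        b = betas[l]
        updated[l] = [(1 - b) * r + b * p for r, p in zip(half_reps[l], prediction)]
    return updated

# Toy example with L = 2 and made-up weights.
half_reps = [[1.0], [2.5, 2.0], [1.0]]
weights = {1: [[1.0], [2.0]], 2: [[1.0, -1.0]]}
next_reps = feedforward_pass(half_reps, weights, {1: 0.5, 2: 0.5})
print(next_reps)  # [[1.0], [1.75, 2.0], [0.75]]
```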

Regarding training, the classification task is formulated as a multi-objective optimization problem where $T$ losses, denoted as $\mathcal{L}_i$, compete over the shared weights. The authors address this using scalarization, transforming the problem into a single-objective optimization through a weighted average. Furthermore, they incorporate Kullback-Leibler (KL) divergence, denoted as $\mathcal{KD}$, between intermediate logits and the final-cycle logits to facilitate knowledge distillation. In this framework, the deepest network acts as the teacher, while the preceding shallow sub-networks serve as students. The total loss is expressed as:

$$\mathcal{L}_{\mathrm{tot}} = \rho \sum_{i=1}^{T} \lambda_i \mathcal{L}_i + (1-\rho) \sum_{i=1}^{T-1} \mathcal{KD}(\hat{\mathbf{y}}_i, \hat{\mathbf{y}}_T)$$

where $\lambda_i$ is a positive weighting factor for the loss function $\mathcal{L}_i$, $\hat{\mathbf{y}}_i$ represents the logit vector from classifier $i$, and $\rho$ is a balancing factor.
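
The total loss can be evaluated for a single example as in the sketch below. Several details are assumptions, because the summary does not specify them: each $\mathcal{L}_i$ is taken to be cross-entropy, $\mathcal{KD}$ is computed as the KL divergence from the softmax of the final-cycle logits (teacher) to the softmax of each earlier exit (student), and no distillation temperature is used. The example logits, weights, and $\rho$ are made up.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def total_loss(logits_per_cycle, label, lambdas, rho):
    """rho * sum_i lambda_i * L_i + (1 - rho) * sum_{i<T} KD(y_i, y_T)."""
    probs = [softmax(z) for z in logits_per_cycle]
    # Weighted task losses over all T exits (cross-entropy assumed).
    task = sum(lam * -math.log(p[label]) for lam, p in zip(lambdas, probs))
    # Distillation: the final exit teaches every earlier exit.
    teacher = probs[-1]
    distill = sum(kl_divergence(teacher, student) for student in probs[:-1])
    return rho * task + (1.0 - rho) * distill

# Three exits (T = 3) on a 3-class problem, true class 0.
example_logits = [[0.2, 0.1, 0.0], [1.0, 0.2, -0.5], [3.0, 0.1, -1.0]]
print(total_loss(example_logits, label=0, lambdas=[1 / 3] * 3, rho=0.7))
```

When all exits produce identical logits the distillation term vanishes and only the weighted task losses remain, which is a quick sanity check on the implementation.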

The model design leverages PC dynamics to develop shallow networks capable of running on extreme edge devices. The models are based on VGG-like architectures where all convolutions use a 3×3 kernel with a stride of 1 and are followed by a ReLU activation function. Whenever the number of channels changes, max-pooling is applied in the feed-forward direction or upsampling in the feedback direction with a 2×2 kernel. Finally, the early exit classifiers are implemented as simple linear layers to ensure minimal overhead.
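
To make the resolution changes concrete, the sketch below pairs a 2×2 max-pool (feed-forward direction) with a 2×2 upsampling (feedback direction) on a single-channel feature map. Nearest-neighbor upsampling is an assumption, as the summary does not name the upsampling method, and the function names are hypothetical.

```python
def max_pool_2x2(grid):
    """Halve height and width by taking the max of each 2x2 block."""
    h, w = len(grid), len(grid[0])
    return [
        [max(grid[y][x], grid[y][x + 1], grid[y + 1][x], grid[y + 1][x + 1])
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]

def upsample_2x2(grid):
    """Double height and width by repeating each value in a 2x2 block."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

feature_map = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 10, 13, 14],
    [11, 12, 15, 16],
]
pooled = max_pool_2x2(feature_map)
print(pooled)                # [[4, 8], [12, 16]]
print(upsample_2x2(pooled))  # 4x4 map again, each pooled value repeated in a 2x2 block
```

The round trip restores the spatial size that the feedback prediction needs in order to be compared with the lower-layer representation, although pooled-away detail is not recovered.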

Experiment

Experiments validate that recursive processing with PC update rules in shallow models achieves competitive performance on extreme edge devices, outperforming edge-specific baselines and approaching VGG-11 accuracy with significantly fewer parameters.
Results demonstrate that additional processing cycles enhance model expressivity, allowing shallow architectures to better learn complex patterns and distinguish difficult classes.
Integrating an early exiting mechanism significantly reduces computational load and energy consumption, with high-confidence thresholds enabling the model to exit early for most inputs while maintaining accuracy.
The proposed models meet strict memory constraints of frugal microcontrollers, and their recursive nature ensures lower FLOP counts than deep networks for a large portion of the dataset, facilitating extended battery life.
Comparisons confirm that predictive coding rules combining top-down and bottom-up predictions outperform equivalent feed-forward CNNs.

Early Exiting Predictive Coding Neural Networks for Edge AI
