5 days ago
Agent
LLM

MemSlides: A Hierarchical Memory-Driven Agent Framework for Personalized Slide Generation with Multi-Turn Local Revision

Ye Jin Yangyang Xu Jun Zhu Yibo Yang

Abstract

Personalized presentation generation requires more than conditioning on the current prompt or a template: agents must preserve stable user preferences across tasks, retain newly introduced preferences and constraints during multi-turn revision, and carry out local edits reliably. We present MemSlides, a hierarchical memory framework for personalized presentation agents that separates long-term memory from working memory and further divides long-term memory into user profile memory and tool memory. User profile memory stores intent-conditioned profiles for round-0 personalization, working memory carries active preferences and session constraints across revision rounds, and tool memory stores reusable execution experience for reliable local editing. MemSlides couples this memory design with scoped, slide-local revision, so that targeted updates act on the smallest affected region instead of repeatedly regenerating the whole deck. In controlled experiments, user profile memory improves persona-alignment judgments on a multi-persona, multi-intent profile bank, tool-memory injection improves closed-loop modification behavior in diagnostic matched-pair settings, and qualitative case studies illustrate working memory's ability to carry preferences over. Taken together, these results suggest that effective personalization in presentation authoring depends on separating persistent user profiles, session-level working memory, and reusable execution experience across the generation and local-revision stages.

One-sentence Summary

MemSlides is a hierarchical memory-driven agent framework that separates long-term user profile and tool memories from working memory to sustain preferences across multi-turn revision, and pairs this design with scoped, slide-local editing that targets only the affected regions instead of regenerating entire decks; in experiments, user profile memory improves persona-alignment judgments on a multi-persona, multi-intent profile bank, tool-memory injection improves closed-loop modification in diagnostic matched-pair settings, and qualitative cases illustrate working memory's preference carryover.

Key Contributions

  • MemSlides introduces a hierarchical memory framework for personalized presentation agents that partitions long-term memory into user profile and tool components while maintaining a distinct working memory for active session constraints. This architectural separation enables stable preference preservation across tasks and reliable retention of new instructions during multi-turn revisions.
  • The method couples this memory design with scoped slide-local revision, allowing targeted updates to modify only the smallest affected regions rather than repeatedly regenerating full decks. This mechanism preserves precise edit scope throughout iterative authoring workflows.
  • Controlled experiments show that user profile memory improves persona-alignment judgments on a multi-persona, multi-intent profile bank, while tool-memory injection improves closed-loop modification behavior in diagnostic matched-pair settings. Qualitative case studies further illustrate the working memory's capacity to carry preferences over across extended editing sessions.

Introduction

Automatic presentation generation has advanced into agentic systems capable of producing polished slide decks from natural language prompts, significantly reducing the time and cognitive load required for visual communication. Despite these gains, prior systems struggle with persistent personalization, typically treating user preferences as transient prompts and triggering full-deck regeneration for every edit, which makes multi-turn revisions context-heavy and fragile. To address these limitations, the authors develop MemSlides, a framework that pairs scoped slide locality for targeted multi-turn editing with a hierarchical memory architecture. By explicitly separating long-term user profile and execution memory from short-term working memory, the system accumulates individual styling and organizational preferences across sessions, enabling precise, personalized slide generation and revision without overwhelming context constraints.

Dataset

  • Composition and Sources: The authors build a controlled profile bank containing 30 persona-intent entries, organized from 10 occupation-style personas and three role-intent buckets per persona. All entries are generated from a single shared source material through controlled authoring interactions.

  • Subset Details: Each of the 30 entries functions as a read/write unit for long-term memory and captures structured fields such as slide layout structure, preferred chart types, and content notes. The personas cover diverse professional roles including software developers, marketing managers, medical health services managers, and legislators.

  • Usage and Processing: The paper uses this profile bank as a fixed memory reference for personalized slide generation rather than a traditional training dataset. During inference, the model retrieves the completed profile entry matching the current persona and role intent to guide output generation. A no-injection baseline evaluates performance using the same source material without profile memory injection.

  • Construction and Filtering Rules: The authors apply a two-stage construction process. First, they generate initial profile evidence by prompting the model with role-intent variations over the source material. Second, they run a seeded completion step to fill sparse fields using stable persona prompts, an occupation-grounded preference registry, and existing signals. This step strictly follows an only-fill-empty rule, preserves existing intent-specific data, filters out generic constraints like page limits, and attaches provenance tags to track whether each field originated from the seed prompt, registry, or current profile signals. The process explicitly avoids creating synthetic interaction episodes or template usage records.
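The seeded completion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the field names, the `GENERIC_KEYS` set, and the provenance tag strings are hypothetical, chosen only to show how an only-fill-empty rule, generic-constraint filtering, and provenance tagging might fit together.

```python
from typing import Any

# Hypothetical names for generic constraints that should never enter a profile.
GENERIC_KEYS = {"page_limit"}


def is_empty(value: Any) -> bool:
    """Treat None and empty strings, lists, and dicts as sparse fields."""
    return value is None or value == "" or value == [] or value == {}


def complete_profile(profile: dict, sources: list) -> tuple:
    """Fill sparse fields from ordered (tag, candidates) sources.

    Existing non-empty values are never overwritten (only-fill-empty),
    and every surviving field records where its value came from.
    """
    completed = {k: v for k, v in profile.items() if k not in GENERIC_KEYS}
    provenance = {
        k: "current_profile" for k, v in completed.items() if not is_empty(v)
    }
    for tag, candidates in sources:
        for key, value in candidates.items():
            if key in GENERIC_KEYS or is_empty(value):
                continue
            if is_empty(completed.get(key)):
                completed[key] = value
                provenance[key] = tag
    return completed, provenance


profile = {"layout": "two-column", "chart_types": [], "content_notes": "", "page_limit": 10}
seed = {"chart_types": ["bar"], "layout": "single-column"}
registry = {"content_notes": "cite sources", "chart_types": ["line"], "page_limit": 12}
completed, provenance = complete_profile(
    profile, [("seed_prompt", seed), ("registry", registry)]
)
print(completed)
print(provenance)
```

In this toy run the existing `layout` value survives, the empty `chart_types` field is filled from the first source that offers a value, and `page_limit` is dropped entirely. The order of `sources` decides which source wins when several could fill the same field; the summary does not state the order the authors use.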

Method

The authors propose MemSlides, a hierarchical memory framework designed to treat personalized presentation generation as a stateful, multi-turn authoring process rather than a one-shot conversion task. The system initializes a deck through $S_0 = G_{\mathrm{init}}(x, P_u, \tau)$, where $x$ represents source material, $P_u$ stores long-term user preferences, and $\tau$ provides task-specific templates. Subsequent revision cycles operate on a stateful update mechanism defined by:

$$z_t = U(z_{t-1}, f_t; S_{t-1}), \qquad S_t = G_{\mathrm{edit}}(S_{t-1}, x, P_u, \tau, z_t), \quad t \geq 1.$$

This formulation explicitly separates personalization signals by their temporal scope to prevent context drift and preserve already aligned content.
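The recurrence can be read as a simple driver loop. The sketch below assumes that $z_t$ is the session state after round $t$ and that $f_t$ is the user feedback received in round $t$; the summary does not define either symbol, so both readings are assumptions. The `toy_*` functions are hypothetical stand-ins for the real generation and editing models.

```python
def run_session(x, P_u, tau, z0, feedback, G_init, U, G_edit):
    """Run S_0 = G_init(x, P_u, tau), then one state update and one edit per round."""
    S = G_init(x, P_u, tau)
    z = z0
    history = [S]
    for f_t in feedback:
        z = U(z, f_t, S)               # z_t depends on z_{t-1}, f_t and S_{t-1}
        S = G_edit(S, x, P_u, tau, z)  # S_t is produced from S_{t-1}, not from scratch
        history.append(S)
    return history, z


# Hypothetical toy stand-ins: a deck is a dict of slide index -> text.
def toy_init(x, P_u, tau):
    return {i: f"{tau}:{part}" for i, part in enumerate(x)}


def toy_update(z, f_t, S_prev):
    return z + (f_t,)  # the state simply accumulates feedback items


def toy_edit(S_prev, x, P_u, tau, z):
    slide_id, text = z[-1]    # act only on the slide named in the latest feedback
    S_next = dict(S_prev)     # copy so untouched slides stay exactly as they were
    S_next[slide_id] = text
    return S_next


history, z_final = run_session(
    ["a", "b"], {}, "T", (), [(1, "B2")], toy_init, toy_update, toy_edit
)
print(history)
```

The point of the sketch is the data flow: each round edits the previous deck in place of regenerating it, so slides that are not targeted carry over unchanged.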

The architecture divides memory into long-term and working layers. Long-term memory is partitioned into user profile memory and tool memory. User profile memory stores intent-conditioned preferences such as visual styles, layout habits, and content density. Rather than injecting these preferences as a static prompt block, the framework routes compatible items into active temporary memory at the start of each session. The active state evolves through revision rounds via $A_t = \mathcal{U}(A_{t-1}, r_t)$, where the update operator appends newly exposed preferences, resolves explicit conflicts, and preserves non-conflicting constraints. Stable interaction signals are consolidated back into the long-term profile only at job completion to filter out transient requests.
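A minimal sketch of the active-state update and the end-of-job consolidation follows. Two choices in it are assumptions, not details from the source: conflicts are resolved by letting the newest value win, and a preference counts as stable when the same value appears in at least `min_rounds` snapshots of the active state. Preference keys and values are hypothetical.

```python
from collections import Counter


def update_active(active: dict, revealed: dict) -> dict:
    """One step of the active-state update: A_t from A_{t-1} and r_t."""
    updated = dict(active)    # non-conflicting constraints are preserved
    updated.update(revealed)  # new keys are appended; on conflict the newer value wins
    return updated


def consolidate(profile: dict, snapshots: list, min_rounds: int = 2) -> dict:
    """Write back only preferences that stayed in force for several rounds."""
    final = snapshots[-1]
    counts = Counter((k, v) for A in snapshots for k, v in A.items())
    stable = {k: v for k, v in final.items() if counts[(k, v)] >= min_rounds}
    return {**profile, **stable}


A0 = {"font": "serif", "density": "low"}
A1 = update_active(A0, {"density": "high"})  # explicit conflict on density
A2 = update_active(A1, {"accent": "blue"})   # newly exposed preference
print(A2)
print(consolidate({}, [A0, A1, A2]))
```

Here `accent` appears only in the last round, so it is treated as transient and is not written to the long-term profile, while `font` and the revised `density` are.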

Concurrently, tool memory manages the execution reliability of localized edits. The framework structures this component across two temporal granularities, represented as $\mathcal{M}^{\mathrm{tool}}_{t,k} = \big(E_t^{\mathrm{round}}, E_{t,k}^{\mathrm{op}}\big)$. Round-scope experience buffers task-level execution patterns across modify rounds, while operation-scope experience segments raw reasoning and tool observation chains into indexed fragments. These fragments are retrieved before similar future tool calls to minimize backtracking and repeated misuse. Transferable execution patterns are similarly consolidated into long-term storage after each session.
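One way to picture the two granularities is a container with a round-level buffer and a list of operation-level fragments that can be looked up before a tool call. The sketch below is an illustration under stated assumptions: the tool names are hypothetical, and retrieval by word overlap is a stand-in for whatever indexing the authors actually use.

```python
from dataclasses import dataclass, field


@dataclass
class OpFragment:
    tool: str     # tool the fragment was recorded for
    context: str  # short description of the situation
    lesson: str   # what to do (or avoid) next time


@dataclass
class ToolMemory:
    round_experience: list = field(default_factory=list)  # round-scope buffer
    op_fragments: list = field(default_factory=list)      # operation-scope fragments

    def record_op(self, tool: str, context: str, lesson: str) -> None:
        self.op_fragments.append(OpFragment(tool, context, lesson))

    def retrieve(self, tool: str, context: str, top_k: int = 2) -> list:
        """Return lessons for the same tool, ranked by shared words with the query."""
        query = set(context.lower().split())
        scored = []
        for frag in self.op_fragments:
            if frag.tool != tool:
                continue
            overlap = len(query & set(frag.context.lower().split()))
            if overlap:
                scored.append((overlap, frag))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [frag.lesson for _, frag in scored[:top_k]]


mem = ToolMemory()
mem.round_experience.append("global restyle: batch edits before single-slide patches")
mem.record_op("css_batch", "change title color on all slides", "use the shared title selector")
mem.record_op("css_batch", "resize chart on slide 3", "patch the single slide instead")
mem.record_op("patch", "change title color", "unrelated tool")
print(mem.retrieve("css_batch", "change title color"))
```

Filtering by tool before ranking keeps lessons recorded for one tool from leaking into calls to another, which matches the stated goal of reducing repeated misuse.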

Working memory serves as the session-scoped state layer that orchestrates a constrained Plan-Act-Guard execution cycle. Working memory tracks active constraints, resolved targets, and coverage status to transform each revision request into an explicit execution contract. The planning phase defines the inferred scope, target slide paths, and active rule identifiers. The acting phase selects appropriate editing tools, prioritizing batch CSS updates for shared selectors, semantic batch styling for common semantics, or layout-first patch operations for single-slide modifications. The guarding phase enforces completion as a checked state by binding patches to snapshot content hashes and blocking premature finalization until all targeted regions are verified. This constrained execution approach ensures that localized updates operate on minimal effective scopes while maintaining session-level preference carryover across revision turns.
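The guarding phase can be sketched as a small checker that binds each patch to the hash of the slide content it was planned against and refuses to finalize while any target is outstanding. This is a simplified illustration: the class and method names are hypothetical, SHA-256 is an assumed hash choice, and a successfully applied patch is treated as verified, which is cruder than a real verification step.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Guard:
    def __init__(self, deck: dict, targets: list):
        self.deck = deck
        self.pending = set(targets)  # targeted slides not yet verified

    def plan_patch(self, slide_id: str, new_text: str) -> dict:
        """Bind the patch to the slide content seen at planning time."""
        return {
            "slide": slide_id,
            "base_hash": content_hash(self.deck[slide_id]),
            "new_text": new_text,
        }

    def apply(self, patch: dict) -> bool:
        """Apply only if the slide is unchanged since the patch was planned."""
        slide_id = patch["slide"]
        if content_hash(self.deck[slide_id]) != patch["base_hash"]:
            return False  # stale snapshot: the caller must re-plan
        self.deck[slide_id] = patch["new_text"]
        self.pending.discard(slide_id)
        return True

    def can_finalize(self) -> bool:
        return not self.pending


deck = {"s1": "Old title", "s2": "Chart A"}
guard = Guard(deck, ["s1", "s2"])
print(guard.apply(guard.plan_patch("s1", "New title")))
print(guard.can_finalize())
```

After the first patch the guard still blocks finalization because `s2` remains pending, which is the behavior the text describes as enforcing completion as a checked state.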

Experiment

The evaluation assesses MemSlides through controlled persona-alignment judgments and diagnostic matched-pair editing protocols to validate its capacity for personalized presentation generation and precise multi-turn revisions. Findings indicate that integrating user profile memory significantly enhances content, structure, and visual alignment with target personas while maintaining standard presentation quality, as the system leverages long-term preferences for strategic page organization rather than superficial template matching. Additionally, tool-memory injection enables highly targeted localized edits that minimize unintended modifications and streamline the revision workflow. Ultimately, the experiments demonstrate that memory-augmented generation reliably balances personalized content delivery with efficient, precise slide modification.

The authors evaluate the impact of tool-memory injection on localized presentation revision using a diagnostic matched-pair setting. Tool-memory injection yields statistically significant improvements in strict verification and in the core tool time ratio. Closed-loop completion and first-edit latency move in a favorable direction but lack strong paired statistical evidence. Overall, the system achieves more reliable localized editing with less non-inspection tool work than the no-injection baseline.

The authors evaluate persona alignment across ten distinct professional roles, comparing their method against a baseline on five dimensions: Overall, Content, Structure, Visual, and Specificity. The proposed approach scores higher than the baseline for the majority of personas and metrics, with the largest relative gains in Content and Visual for specific personas. Improvements in Structure and Specificity suggest that the system uses long-term profiles to determine page organization and layout fit rather than relying solely on template matching. The advantage is most notable for personas requiring distinct evidence selection or narrative organization, such as Graphic designers and Postsecondary teachers.

The authors demonstrate that local feedback cues from early tasks are consolidated into reusable slide-organization patterns for later presentations. These patterns include structured tables, responsibility schemas, and implementation checklists that persist across multiple jobs. The results indicate that the system converts specific editing instructions into consistent structural templates, which keeps presentations coherent across repeated jobs.

MemSlides is compared with DeepPresenter and SlideTailor on persona alignment across multiple large language models. For GLM-5 and Gemini 3.1 Pro, it achieves the highest scores on all four alignment dimensions: content, structure, visual presentation, and specificity. For GPT-5, MemSlides leads in content and specificity, while the baselines show isolated advantages in structure and visual metrics. Across all tested models, the method consistently outperforms SlideTailor and DeepPresenter in content and specificity.

The experiment evaluates general presentation quality across different model families, showing that the proposed method generally maintains competitive standards while excelling in specific areas. MemSlides achieves the highest average quality score for GPT-5 and leads in visual style and diversity for Gemini 3.1 Pro. Content scores remain competitive across all model families, often surpassing baseline systems. Constraint adherence is strongest for GPT-5 but lower than the baselines for Gemini 3.1 Pro and GLM-5.

The experimental evaluation assesses the framework's editing efficiency, persona alignment, feedback consolidation, and cross-model performance against established baselines. Tool-memory injection improves strict verification and reduces core tool time, and localized feedback is consolidated into consistent, reusable structural templates. By leveraging long-term user profiles, the system tailors content and visual organization to diverse professional personas and maintains coherence across repeated tasks. Across multiple large language models, the approach consistently outperforms competing methods in content and specificity and remains competitive in general presentation quality, although constraint adherence trails the baselines on Gemini 3.1 Pro and GLM-5.

