arXiv:2511.05203 — Submitted for review

Beyond Master and Apprentice

Grounding Foundation Models for Symbiotic Interactive Learning in a Shared Latent Space

Linus Nwankwo*, Björn Ellensohn, Christian Rauch, Elmar Rueckert
Chair of Cyber-Physical Systems, Technical University of Leoben, Austria

Abstract

Headline metrics: Task Completion Rate (avg., ↑ better), Belief Alignment ρ (avg., ↑ better), Clarification Efficiency (avg., ↓ better).

The Master-Apprentice Problem

State-of-the-art language-conditioned HRI frameworks treat communication as a unidirectional process. Symbiotic Interactive Learning (SIL) fundamentally changes this dynamic.

Traditional HRI

Master → Apprentice

Unidirectional, reactive, no learning

Human (bears full burden, commands only) → Agent (passive executor)
Agent is a passive executor — no memory of prior interactions
Excessive corrective burden on the human partner
Human bears the entire reasoning burden
No reciprocal learning — agent never contributes
Avg. task completion: 60.1% (Static LLM baseline)

Symbiotic ↔ Co-Adaptive

Bidirectional, proactive, evolving

Human (co-adapts) ↔ shared latent space ↔ Agent (co-adapts)
Bidirectional belief alignment — iteratively updated shared beliefs
Proactive clarification — agent seeks disambiguation when needed
Episodic + semantic memory — retains learned preferences
EWC anti-forgetting — Fisher information safeguards
Avg. task completion: 90.4% (Full SIL)

Key Contributions

SIL introduces several novel components that together enable co-adaptive human-robot interaction.

01

Characterisation of the Master-Apprentice Problem

We identify and formalise the unidirectional learning problem in language-conditioned HRI, where the agent maintains a static belief B_A^static with ∂θ/∂t = 0, imposing the entire alignment burden on the human.

02

Shared Latent Task Space Formalisation

We model co-adaptation as belief-state evolution: human and agent maintain structured belief states B_H and B_A that co-evolve within a shared latent space Z ⊆ ℝ^d, each modulated by the other's latent embedding via learned influence vectors.
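The co-evolution of the two belief states can be sketched as an iterated update in which each partner's belief is nudged by the other's latent embedding. This is a minimal sketch, assuming linear influence matrices, a fixed step size `eta`, and random initial values; the paper's learned influence vectors and update rule may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256  # latent dimensionality of the shared task space Z

# Hypothetical influence matrices W_HA (human -> agent) and W_AH
# (agent -> human); learned in the paper, random here for illustration.
W_HA = rng.normal(scale=0.01, size=(d, d))
W_AH = rng.normal(scale=0.01, size=(d, d))

def coadapt_step(b_H, b_A, eta=0.1):
    """One co-adaptation step: each belief state is modulated by the
    other partner's latent embedding via its influence matrix."""
    b_H_new = b_H + eta * W_AH @ b_A
    b_A_new = b_A + eta * W_HA @ b_H
    return b_H_new, b_A_new

b_H = rng.normal(size=d)  # human belief state in Z
b_A = np.zeros(d)         # agent starts with an uninformed belief
for _ in range(50):
    b_H, b_A = coadapt_step(b_H, b_A)
```

After repeated steps the agent's belief is no longer zero: information has flowed from the human embedding into B_A, which is the bidirectional dynamic the formalisation captures.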

03

Grounded Foundation Model Pipeline

We employ pre-trained foundation models (SAM for zero-shot segmentation, CLIP for vision-language alignment) for spatial perception, paired with a lightweight latent encoder ϕ : ℝ^768 → Z. GPT-4o provides ensemble-based reasoning and uncertainty quantification.
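The encoder ϕ maps 768-dimensional CLIP-style embeddings into the shared latent space. A minimal sketch, assuming a single linear projection with L2 normalisation (the paper's encoder architecture is not specified here, so this choice is an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical weights for phi: R^768 -> Z with d = 256. A single linear
# layer with unit-norm output is an illustrative assumption.
W_phi = rng.normal(scale=768 ** -0.5, size=(256, 768))

def phi(clip_embedding):
    """Project a 768-d perception embedding into the 256-d task space Z,
    normalised so latents can be compared by cosine similarity."""
    z = W_phi @ clip_embedding
    return z / np.linalg.norm(z)

z = phi(rng.normal(size=768))  # a latent point in Z
```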

04

Memory Architecture with EWC Safeguards

SIL employs dual-component memory (episodic buffer + semantic consolidation) augmented with Elastic Weight Consolidation (EWC). EWC estimates parameter importance via Fisher information F(k) to prevent catastrophic forgetting (λ = 1000) of learned task representations.
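The standard EWC penalty anchors important parameters near their values after a previous task: L_EWC = (λ/2) Σ_i F_i (θ_i − θ*_i)². A minimal sketch with a diagonal Fisher estimate from squared per-sample gradients; λ = 1000 follows the paper, while the toy gradients and parameters are illustrative only.

```python
import numpy as np

lam = 1000.0  # importance coefficient lambda from the paper

def estimate_fisher(grads):
    """Diagonal Fisher estimate: mean of squared per-sample gradients."""
    return np.mean(np.square(grads), axis=0)

def ewc_penalty(theta, theta_star, fisher):
    """Quadratic penalty keeping theta close to theta_star, weighted
    per-parameter by Fisher importance."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

grads = np.array([[0.1, -0.2],
                  [0.3,  0.0]])          # toy per-sample gradients
fisher = estimate_fisher(grads)          # per-parameter importance
penalty = ewc_penalty(np.array([1.0, 1.0]),
                      np.array([0.0, 0.0]), fisher)
```

Parameters with high Fisher importance are heavily penalised for drifting, preserving old task representations while leaving low-importance parameters plastic.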

Demo Videos

Simulation Demo

SIL agent in Gazebo with Unitree Go1 quadruped

Real-World Demo

SIL agent on physical robot

SIL Processing Pipeline

Data flows through the system in four stages; labels correspond to the model architecture shown below.

Natural Language Input

Architecture

SIL architecture diagram showing the full pipeline (Fig. 2).
01

Shared Latent Task Space

Belief states B_H and B_A co-evolve within Z ⊆ ℝ^d (d = 256). Bidirectional influence via learned weight matrices W_HA and W_AH. Alignment is measured by ρ_t; clarification is triggered when ρ_t < τ_mis = 0.6.
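The clarification trigger can be sketched as a threshold test on the alignment score. This sketch assumes ρ_t is the cosine similarity between the two belief states, which is an assumption rather than the paper's stated definition; only τ_mis = 0.6 comes from the text.

```python
import numpy as np

TAU_MIS = 0.6  # misalignment threshold from the paper

def alignment(b_H, b_A):
    """Belief alignment rho_t, taken here to be cosine similarity
    between the two belief states (an illustrative assumption)."""
    return float(b_H @ b_A / (np.linalg.norm(b_H) * np.linalg.norm(b_A)))

def needs_clarification(b_H, b_A):
    """Proactive clarification fires when alignment drops below tau_mis."""
    return alignment(b_H, b_A) < TAU_MIS

aligned = np.ones(4)
skewed = np.array([1.0, -1.0, 0.0, 0.0])  # orthogonal to `aligned`
```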

02

Grounded Foundation Models

SAM + CLIP for zero-shot segmentation and open-vocabulary recognition with dual-fidelity filtering. A GPT-4o ensemble (K temperatures) handles reasoning. A lightweight encoder ϕ : ℝ^768 → Z bridges perception to the task space.
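One common way to turn a K-member ensemble (e.g. the same prompt sampled at K temperatures) into an uncertainty score is normalised vote entropy over the answers. The aggregation rule below is an assumption for illustration, not the paper's stated method:

```python
import math
from collections import Counter

def ensemble_uncertainty(answers):
    """Normalised vote entropy over K ensemble answers:
    0.0 = full agreement, approaching 1.0 = maximal disagreement."""
    counts = Counter(answers)
    k = len(answers)
    probs = [c / k for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(k) if k > 1 else 0.0

# Four of five samples agree, so uncertainty is low but nonzero.
u = ensemble_uncertainty(["pick up the red mug"] * 4
                         + ["pick up the red cup"])
```

A high score would indicate the ensemble disagrees about the instruction's meaning, which is exactly the situation in which proactive clarification is worthwhile.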

03

Dual Memory Architecture

Episodic memory stores interaction-specific traces (capacity: 2000 episodes, 60-day retention). Semantic memory consolidates recurring patterns. Belief-aware retrieval balances semantic similarity (w_s = 0.6) and belief alignment (w_b = 0.4).
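Belief-aware retrieval can be sketched as a weighted blend of two cosine similarities. The weights w_s = 0.6 and w_b = 0.4 come from the text; the episode record fields (`z`, `belief`) are hypothetical names for illustration.

```python
import numpy as np

W_S, W_B = 0.6, 0.4  # semantic-similarity vs belief-alignment weights

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieval_score(query_z, query_belief, episode):
    """Blend semantic similarity (in latent space) with belief alignment."""
    return (W_S * cosine(query_z, episode["z"])
            + W_B * cosine(query_belief, episode["belief"]))

def retrieve(query_z, query_belief, buffer, k=1):
    """Return the top-k episodes from the buffer by blended score."""
    ranked = sorted(buffer,
                    key=lambda e: retrieval_score(query_z, query_belief, e),
                    reverse=True)
    return ranked[:k]
```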

04

EWC Anti-Forgetting

Elastic Weight Consolidation estimates parameter importance via Fisher information F(k). Task shifts are detected via performance windows (10/20 episodes). The importance coefficient λ = 1000 balances plasticity and stability.
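Window-based task-shift detection can be sketched by comparing mean performance over a short recent window against a longer baseline window. The 10/20 window sizes follow the text; the drop threshold is an assumed value for illustration.

```python
import numpy as np

def task_shift(perf_history, short=10, long=20, drop=0.15):
    """Flag a task shift when mean performance over the last `short`
    episodes falls more than `drop` below the mean over the last
    `long` episodes. `drop` is a hypothetical threshold."""
    if len(perf_history) < long:
        return False  # not enough evidence yet
    recent = np.mean(perf_history[-short:])
    baseline = np.mean(perf_history[-long:])
    return bool(baseline - recent > drop)

# Performance collapses over the last 10 episodes -> shift detected.
history = [0.9] * 10 + [0.5] * 10
```

On detection, the agent would consolidate the old task (updating its Fisher estimates) before adapting, so EWC can protect what was just learned.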

Evaluation Dimensions

Results

Ablation variants are compared against full SIL; bars show averaged task-completion rate (TCR, %) per domain.

Qualitative Visualisations

Yellow paths show agent trajectories.

Interactions

Interaction 1: EIF Multi-step Navigation



Resources & Citation

BibTeX

@article{nwankwo2025beyond,
  title={Beyond Master and Apprentice: Grounding Foundation 
         Models for Symbiotic Interactive Learning in a 
         Shared Latent Space},
  author={Nwankwo, Linus and Ellensohn, Bj{\"o}rn and 
          Rauch, Christian and Rueckert, Elmar},
  journal={arXiv preprint arXiv:2511.05203},
  year={2025}
}

Updates

March 2026: Manuscript submitted for publication review.