Submitted for review. Code will be released upon publication.

SIL: Symbiotic Interactive Learning for Language-Conditioned Human-Agent Co-Adaptation

A Framework for Bidirectional Learning and Co-adaptation within a Shared Latent Space in Natural Language-Conditioned HRI

Linus Nwankwo^*, Björn Ellensohn, Christian Rauch, Elmar Rueckert
Chair of Cyber-Physical Systems, Technical University of Leoben, Austria

Licence: CC BY 4.0

Abstract

The Master-Apprentice Problem

State-of-the-art language-conditioned HRI frameworks treat communication as a unidirectional process in which the human commands and the agent executes. SIL instead lets human and agent co-adapt within a shared latent space.

Traditional: Master → Apprentice

Unidirectional, reactive, no learning

Human (full burden) → commands only → Agent (passive executor)

✗ Agent is a passive executor, with no memory of prior interactions

✗ Excessive corrective burden on the human partner

✗ Human bears the entire reasoning burden

✗ No reciprocal learning: the agent never contributes

Avg. task completion: 60.1% (Static LLM baseline)

SIL: Symbiotic ↔ Co-Adaptive

Bidirectional, proactive, evolving

Human (co-adapts) ↔ shared latent space ↔ Agent (co-adapts)

✓ Bidirectional belief alignment: iteratively updated shared beliefs

✓ Proactive clarification: the agent seeks disambiguation when needed

✓ Episodic + semantic memory: retains learned preferences

✓ EWC anti-forgetting: Fisher information safeguards

Avg. task completion: 90.4% (Full SIL)

Key Contributions

SIL makes four contributions that together enable co-adaptive human-robot interaction.

Characterisation of the Master-Apprentice Problem

We identify and formalise the unidirectional learning problem in language-conditioned HRI, where the agent maintains a static belief B^A_static with ∂θ/∂t = 0, imposing the entire alignment burden on the human.

Shared Latent Task Space Formalisation

We model co-adaptation as belief-state evolution. Both human and agent maintain structured belief states B^H and B^A that co-evolve within Z ⊆ ℝ^d, each modulated by the other's latent embedding via learned influence vectors.

Grounded Foundation Model Pipeline

We employ pre-trained FMs (SAM for zero-shot segmentation, CLIP for vision-language alignment) for spatial perception, paired with a lightweight latent encoder ϕ : ℝ⁷⁶⁸ → Z. GPT-4o provides ensemble-based reasoning and uncertainty quantification.

Memory Architecture with EWC Safeguards

SIL employs a dual-component memory (episodic buffer + semantic consolidation) augmented with Elastic Weight Consolidation (EWC). EWC estimates parameter importance via Fisher information F^(k) to prevent catastrophic forgetting of learned task representations (importance coefficient λ = 1000).

Demo Videos

Simulation Demo

SIL agent in Gazebo with Unitree Go1 quadruped

Real-World Demo

SIL agent on physical robot

Architecture

Shared Latent Task Space

Belief states B^H and B^A co-evolve within Z ⊆ ℝ^d (d = 256). Bidirectional influence is applied via learned weight matrices W_HA and W_AH. Alignment is measured by ρ_t; clarification is triggered when ρ_t < τ_mis = 0.6.
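
The alignment check can be illustrated with a small sketch. This page does not define ρ_t, so the code below assumes it is the cosine similarity between the two belief embeddings; only the threshold τ_mis = 0.6 comes from the text. The function names and the toy 3-dimensional vectors are hypothetical (the actual space has d = 256).

```python
import math

TAU_MIS = 0.6  # misalignment threshold stated on this page


def cosine_alignment(b_h, b_a):
    """Assumed alignment measure: cosine similarity of two belief vectors."""
    dot = sum(x * y for x, y in zip(b_h, b_a))
    norm_h = math.sqrt(sum(x * x for x in b_h))
    norm_a = math.sqrt(sum(y * y for y in b_a))
    if norm_h == 0.0 or norm_a == 0.0:
        return 0.0  # treat an empty belief as fully misaligned
    return dot / (norm_h * norm_a)


def needs_clarification(b_h, b_a, tau=TAU_MIS):
    """Return True when the assumed alignment falls below the threshold."""
    return cosine_alignment(b_h, b_a) < tau


# Toy beliefs (hypothetical values, for illustration only)
human_belief = [1.0, 0.0, 0.0]
agent_aligned = [0.9, 0.1, 0.0]   # close to the human belief
agent_drifted = [0.1, 1.0, 0.0]   # mostly orthogonal to it

print(needs_clarification(human_belief, agent_aligned))
print(needs_clarification(human_belief, agent_drifted))
```

Under this assumption the aligned agent stays above the threshold, while the drifted agent falls below it and would ask for clarification.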

Grounded Foundation Models

SAM + CLIP for zero-shot segmentation and open-vocabulary recognition with dual-fidelity filtering. GPT-4o ensemble (K temperatures) for reasoning. Lightweight encoder ϕ : ℝ⁷⁶⁸ → Z bridges perception to task space.
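
One simple way to turn a K-sample ensemble into an uncertainty estimate is to measure disagreement among the sampled answers. The sketch below is not taken from the paper: it assumes the K responses (one per temperature) have already been collected as strings, and the majority vote and normalised-entropy score are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def ensemble_decision(responses):
    """Return (majority answer, uncertainty in [0, 1]) for sampled responses.

    Uncertainty is the entropy of the answer distribution divided by
    log(K), its maximum value when all K responses differ.
    """
    if not responses:
        raise ValueError("need at least one response")
    # Normalise whitespace and case so trivially different strings agree.
    counts = Counter(r.strip().lower() for r in responses)
    total = sum(counts.values())
    answer, _ = counts.most_common(1)[0]
    if len(counts) == 1:
        return answer, 0.0  # full agreement
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return answer, entropy / math.log(total)


# Hypothetical responses from K = 4 temperature settings
sampled = ["go to kitchen", "Go to kitchen", "go to kitchen ", "go to office"]
answer, uncertainty = ensemble_decision(sampled)
print(answer, round(uncertainty, 3))
```

A high uncertainty value could then feed the same kind of clarification decision that the alignment threshold drives, although how SIL combines the two is not specified here.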

Dual Memory Architecture

Episodic memory for interaction-specific traces (2000 episodes, 60 days). Semantic memory consolidates patterns. Belief-aware retrieval balances semantic similarity (w_s=0.6) and belief alignment (w_b=0.4).
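
A minimal sketch of belief-aware retrieval follows, using the stated weights w_s = 0.6 and w_b = 0.4. It assumes the score is a weighted sum of two cosine similarities and that each episode stores an embedding and a belief snapshot; the dictionary layout and function names are hypothetical.

```python
import math

W_S, W_B = 0.6, 0.4  # weights stated on this page


def cosine(u, v):
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieval_score(query_emb, belief, episode):
    """Assumed score: w_s * semantic similarity + w_b * belief alignment."""
    semantic = cosine(query_emb, episode["embedding"])
    alignment = cosine(belief, episode["belief"])
    return W_S * semantic + W_B * alignment


def retrieve(query_emb, belief, episodes, k=1):
    """Return the k highest-scoring episodes."""
    ranked = sorted(
        episodes,
        key=lambda e: retrieval_score(query_emb, belief, e),
        reverse=True,
    )
    return ranked[:k]


# Toy memory with two episodes (hypothetical values)
episodes = [
    {"id": "a", "embedding": [1.0, 0.0], "belief": [0.0, 1.0]},
    {"id": "b", "embedding": [0.8, 0.6], "belief": [1.0, 0.0]},
]
query, current_belief = [1.0, 0.0], [1.0, 0.0]
top = retrieve(query, current_belief, episodes, k=1)
print(top[0]["id"])
```

In this toy case episode "a" is the closer semantic match, but episode "b" ranks first because its stored belief agrees with the current one, which is the effect the belief term is meant to have.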

EWC Anti-Forgetting

Elastic Weight Consolidation estimates parameter importance via Fisher information F^(k). Task-shift detection via performance windows (10/20 episodes). Importance coefficient λ=1000 balances plasticity and stability.
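
For reference, the standard EWC formulation adds a quadratic penalty (λ/2) Σ_i F_i (θ_i − θ*_i)² to the task loss, where θ* are the parameters after the previous task. The sketch below implements that standard form with the stated λ = 1000; the page does not spell out SIL's exact loss, so treat the diagonal Fisher estimate and the function names as assumptions.

```python
LAMBDA_EWC = 1000.0  # importance coefficient stated on this page


def fisher_diagonal(per_sample_grads):
    """Diagonal Fisher estimate: mean of squared per-sample gradients."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def ewc_penalty(theta, theta_star, fisher, lam=LAMBDA_EWC):
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for f, t, s in zip(fisher, theta, theta_star)
    )


# Toy example: parameter 0 mattered for the old task, parameter 1 did not.
grads = [[0.1, 0.0], [0.3, 0.0]]   # hypothetical per-sample gradients
fisher = fisher_diagonal(grads)    # approximately [0.05, 0.0]
theta_star = [1.0, 1.0]            # parameters after the old task
theta = [1.1, 2.0]                 # parameters during the new task
penalty = ewc_penalty(theta, theta_star, fisher)
print(penalty)
```

The large move in parameter 1 costs nothing because its Fisher value is zero, while the small move in parameter 0 is penalised; this is how a large λ trades plasticity for stability.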

Evaluation Dimensions

We conducted a total of 350 interaction episodes across five task domains: Embodied Instruction Following (EIF, n = 120), Memory-Based Info Retrieval (MIIR, n = 60), Query-Oriented Reasoning (QOR, n = 80), Proactive Dialogue & Suggestion (PDS, n = 40), and Long-Term Preference Learning (LPL, n = 50). Each experiment was repeated over 5 independent runs to account for variability in LLM sampling and encoder initialisation.

Results

Figure: average TCR (%) per task domain for full SIL and its ablation variants.

Performance of SIL Across Task Domains

TCR = Task Completion Rate, CE = Clarification Efficiency (↓ better), BA = Belief Alignment

Metric	EIF	MIIR	QOR	PDS	LPL

Ablation Study on SIL's Core Architecture

Metrics averaged across all task categories

Model Variant	TCR (%) ↑	CE ↓	BA (ρ) ↑	Δ TCR

Task Success Rate

Belief Alignment Plot

Resources & Citation

BibTeX

@article{nwankwo2025beyond,
  title={SIL: Symbiotic Interactive Learning for Language-Conditioned Human-Agent Co-Adaptation},
  author={Nwankwo, Linus and Ellensohn, Bj{\"o}rn and 
          Rauch, Christian and Rueckert, Elmar},
  journal={arXiv preprint arXiv:2511.05203},
  year={2025}
}

Updates

March 2026: Manuscript submitted for publication review.
