Distilling LLMs: Training an In-House Model with a Mature One for Retrieval-Augmented Generation
June 25 @ 7:00 pm - 8:00 pm
This presentation outlines a practical pipeline for distilling large language models into compact, in-house alternatives using mature teacher models. We focus on retrieval-augmented generation (RAG) workflows, where distilled students must learn context grounding, citation discipline, and latency-efficient behavior. The methodology covers data curation from teacher-generated RAG trajectories, constrained supervised fine-tuning with context masking, and evaluation using faithfulness, robustness, and efficiency metrics.
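As a rough illustration of the context-masking step, the sketch below (using Hugging Face transformers, with distilgpt2 standing in for the student and invented example strings) sets the labels of the prompt and retrieved-context tokens to -100 so the cross-entropy loss is computed only on the teacher-generated answer:

```python
# Minimal sketch of context-masked supervised fine-tuning. The model
# and the example strings are placeholders, not values from the talk.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "distilgpt2"  # placeholder student; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def build_masked_example(question, context, teacher_answer):
    prompt = f"Context: {context}\nQuestion: {question}\nAnswer: "
    prompt_ids = tokenizer(prompt, add_special_tokens=False).input_ids
    answer_ids = tokenizer(teacher_answer + tokenizer.eos_token,
                           add_special_tokens=False).input_ids
    input_ids = torch.tensor([prompt_ids + answer_ids])
    # Mask the prompt/context span: label -100 is ignored by the loss,
    # so only the answer tokens drive the gradient.
    labels = torch.tensor([[-100] * len(prompt_ids) + answer_ids])
    return input_ids, labels

input_ids, labels = build_masked_example(
    "What year was the bridge completed?",
    "The bridge opened to traffic in 1937 after four years of work.",
    "It was completed in 1937.",
)
loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()  # one masked step; wrap in an optimizer loop in practice
```

Masking the context span discourages the student from memorizing retrieved passages and pushes it toward grounded answer generation instead.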
We demonstrate end-to-end integration with the Hugging Face platform, leveraging the Model Hub for teacher selection, Datasets for curated training data, trl and peft for parameter-efficient training, Spaces for interactive validation, and Leaderboards for community benchmarking. Comparative analysis shows distillation achieves 85–95% of teacher performance at 20% of the inference cost, enabling deployable, privacy-compliant RAG systems.
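As a hypothetical sketch of how trl and peft fit together in this pipeline, the snippet below wires trl's SFTTrainer to a peft LoRA config; the student checkpoint, file path, and hyperparameters are placeholders, and the trajectories file is assumed to hold teacher-generated RAG examples with a "text" column (the default field SFTTrainer reads):

```python
# Hypothetical parameter-efficient distillation run with trl + peft.
# Model name, dataset path, and hyperparameters are illustrative only.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Teacher-generated RAG trajectories, assumed to carry a "text" column.
train_data = load_dataset(
    "json", data_files="teacher_rag_trajectories.json", split="train"
)

# LoRA adapter: only a small set of low-rank weights is trained.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # placeholder student checkpoint
    args=SFTConfig(
        output_dir="student-rag-distilled",
        per_device_train_batch_size=4,
        num_train_epochs=1,
    ),
    train_dataset=train_data,
    peft_config=lora,
)
trainer.train()
```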
The talk concludes with mitigation strategies for common pitfalls and future directions in agentic and multi-teacher distillation.
Speaker(s): Zichao Li
Agenda:
7:00PM – Introduction to the IEEE Hamilton Section
7:15PM – Presentation
8:00PM – Q&A
8:15PM – Refreshments
Room: Multipurpose Room 3, Bldg: Trafalgar Park Community Centre, 133 Rebecca St, Oakville, Ontario, Canada, L6K 1J5