RLHF & DPO Explained (In Simple Terms!)

Length 19:38 • 2.7K Views • 5 months ago

Entry Point AI

Related Videos

Prompt Engineering, RAG, and Fine-tuning: Benefits and When to Use

LoRA & QLoRA Fine-tuning Explained In-Depth

Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning

November 4, 2024

How to write a scientific blogpost - video 1 of 3 - the preparation stage

Is inequality the problem? (Professor Lane Kenworthy)

BUS 104-01 11/21/24

Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained

Large Language Models (LLMs) Explained

Reinforcement Learning from Human Feedback (RLHF) Explained

Quantitative Autumn 2024 - Jack Buckner, Oregon State University

Aligning LLMs with Direct Preference Optimization

Streamed 9 months ago

COMM 223 Final Exam Review Pt 4

20241122-Nobel

Fine-tuning Datasets with Synthetic Inputs

15min History of Reinforcement Learning and Human Feedback

Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math

Proximal Policy Optimization Explained

Quantization vs Pruning vs Distillation: Optimizing NNs for Inference

Stanford CS224N | 2023 | Lecture 10 - Prompting, Reinforcement Learning from Human Feedback
