(Produced by Ken Cen) Generative AI Part 28: The LLM Alignment Revolution - PPO X DPO Policy Optimization

Covers the KL penalty, the frozen model, entropy, the advantage function, the value function, and the Bradley-Terry preference model

5.00 (1 review)

Udemy

platform

Chinese

language

Data Science

category

instructor

44

students

5.5 hours

content

Jun 2025

last update

$24.99

regular price

What you will learn

Gain an in-depth understanding of the PPO policy gradient and the PPO clipped objective

Gain an in-depth understanding of the value function loss, policy entropy, and the total PPO loss

Gain an in-depth understanding of what DPO is and how it solves the constrained optimization problem

Learn how to implement SFT (supervised fine-tuning) in PyTorch

Learn how to implement DPO (Direct Preference Optimization) in PyTorch

6621101

udemy ID

18/05/2025

course created date

28/06/2025

course indexed date

Bot

course submitted by