๐Ÿชด ๋””์ง€ํ„ธ ๊ฐ€๋“ 

ํƒœ๊ทธ: RLHF

1๊ฑด์˜ ํ•ญ๋ชฉ

  • 2026๋…„ 3์›” 22์ผ

    Policy Gradient๋ฅผ ์ฒ˜์Œ๋ถ€ํ„ฐ ์ดํ•ดํ•˜๊ธฐ

    • LLM
    • NLP
    • Reinforcement-Learning
    • RLHF

Created with Quartz v4.5.2 ยฉ 2026

  • GitHub
  • Discord Community