토니의 연습장
Pretraining -> SFT (Supervised Fine-Tuning) -> RL (Reinforcement Learning) (distillation). As RL training proceeds, you find an 'aha moment' where the model unlocks abilities it did not previously have. Now, reasoning ability is developed through reinforcement learning (RL)..
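The RL stage mentioned above is typically driven by a verifiable reward: the model only scores when its final answer can be checked against a reference. The sketch below is illustrative only — the `<answer>...</answer>` tag format and the helper names `extract_answer` / `reward` are assumptions for this sketch, not taken from the post or any specific paper.

```python
# Minimal sketch of a verifiable reward for reasoning-style RL fine-tuning.
# Tag format and function names are illustrative assumptions.
import re

def extract_answer(completion: str) -> str:
    """Pull the final answer out of an <answer>...</answer> tag, if present."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return m.group(1).strip() if m else ""

def reward(completion: str, gold: str) -> float:
    """1.0 only if the extracted answer matches the reference exactly."""
    return 1.0 if extract_answer(completion) == gold else 0.0

print(reward("reasoning steps ... <answer>42</answer>", "42"))  # 1.0
print(reward("reasoning with no final tag", "42"))              # 0.0
```

Because there is no partial credit, the policy has to discover working reasoning chains on its own — which is one intuition for why new capabilities appear to 'unlock' during RL rather than being taught directly.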
https://chatgpt.com/share/69186036-e3d4-8009-b4f6-100009b1e463 (ChatGPT - translation and equation walkthrough)
Reference: https://youtu.be/qpHgHcWxB5I
https://huggingface.co/papers/2403.13372 (Paper page - LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models)
https://ponder.ing/ko/flow/3d9cfe25-6dcb-477c-af2c-ed6027864890
Reference: https://www.nvidia.com/ko-kr/on-demand/session/gtc25-s72431/
https://www.acmicpc.net/problem/14502

from collections import deque
from itertools import combinations

# BOJ 14502: place 3 extra walls in the lab, let the virus spread, maximize the safe area.
N, M = map(int, input().split())
B = [list(map(int, input().split())) for _ in range(N)]
cells = [(i, j) for i in range(N) for j in range(M) if B[i][j] == 0]
viruses = [(i, j) for i in range(N) for j in range(M) if B[i][j] == 2]

max_safe = 0
for combination in combinations(cells, 3):
    for row, col in combination:
        B[row][col] = 1                      # build the 3 walls
    q = deque(viruses)                       # BFS: spread the virus from every source
    infected = set()
    while q:
        r, c = q.popleft()
        for nr, nc in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):
            if 0 <= nr < N and 0 <= nc < M and B[nr][nc] == 0 and (nr, nc) not in infected:
                infected.add((nr, nc))
                q.append((nr, nc))
    max_safe = max(max_safe, len(cells) - 3 - len(infected))
    for row, col in combination:
        B[row][col] = 0                      # tear the walls back down
print(max_safe)
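As a quick sanity check, the same brute-force search can be wrapped in a function and run on a tiny grid whose answer is easy to verify by hand: sealing the single virus at (0,0) with walls at (0,1) and (1,0) leaves all 5 remaining empty cells safe. The function name `max_safe_area` is just for this sketch.

```python
from collections import deque
from itertools import combinations

def max_safe_area(board):
    """Brute-force BOJ 14502: try every placement of 3 extra walls,
    spread the virus with BFS, and return the best safe-cell count."""
    N, M = len(board), len(board[0])
    B = [row[:] for row in board]            # work on a copy, keep the caller's grid intact
    cells = [(i, j) for i in range(N) for j in range(M) if B[i][j] == 0]
    viruses = [(i, j) for i in range(N) for j in range(M) if B[i][j] == 2]
    best = 0
    for walls in combinations(cells, 3):
        for r, c in walls:
            B[r][c] = 1
        q, infected = deque(viruses), set()
        while q:
            r, c = q.popleft()
            for nr, nc in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):
                if 0 <= nr < N and 0 <= nc < M and B[nr][nc] == 0 and (nr, nc) not in infected:
                    infected.add((nr, nc))
                    q.append((nr, nc))
        best = max(best, len(cells) - 3 - len(infected))
        for r, c in walls:
            B[r][c] = 0
    return best

grid = [
    [2, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
]
print(max_safe_area(grid))  # 5: two walls seal the virus, and 8 - 3 = 5 zeros remain safe
```

Restoring the walls after each combination (instead of re-copying the grid) keeps the inner loop cheap, which matters since the search visits every 3-subset of the empty cells.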
