Stable Diffusion Theory

토니의 연습장

bellmake 2024. 8. 23. 15:59

1. Pixel Space

: VAE encoding/decoding between pixel space and latent space (512x512 image <-> 64x64 latent)
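
The 512x512 <-> 64x64 mapping can be checked with a little arithmetic. The sketch below is only an illustration: the downsampling factor of 8 and the 4 latent channels are the values used by the Stable Diffusion v1 VAE and are assumed here (the note above states only the spatial sizes), and the helper names `latent_shape` and `compression_ratio` are hypothetical.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Return (channels, height, width) of the VAE latent for an RGB image.

    downsample=8 and latent_channels=4 are assumed SD v1 values.
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be multiples of the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, downsample=8, latent_channels=4):
    """How many times fewer values the latent holds than the RGB image."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (3 * height * width) / (c * h * w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Working in the much smaller latent is what lets the U-Net run at a manageable cost.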

2. Diffusion Model

: U-Net

3. Conditioning

: Text guidance

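- CLIP extracts an embedding vector from the text. This embedding serves as the Tau vector, which is fed to the model repeatedly through attention (cross-attention in the U-Net layers).

- CLIP's CLIPTextModel and CLIPTokenizer are used for this.

- CLIPVisionModel is used in the step that screens images generated by SD for violent or sexually explicit content.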
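
How the Tau vector reaches the model through attention can be sketched with a toy cross-attention in plain Python. This is a simplified illustration under stated assumptions, not the actual U-Net layer: the learned query/key/value projections are omitted, the dimensions are tiny, and `cross_attention`, `tau`, and `latent_features` are hypothetical names with made-up values.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Scaled dot-product attention.

    queries come from the image latent; keys and values come from the
    text embedding (Tau). Learned projections are left out for brevity.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) * scale for k in keys])
        weights.append(w)
        outputs.append([sum(wi * v[d] for wi, v in zip(w, values)) for d in range(dim)])
    return outputs, weights


# Toy data: 3 text tokens and 2 latent positions, each 2-dimensional.
tau = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
latent_features = [[2.0, 0.0], [0.0, 2.0]]

attended, attn_weights = cross_attention(latent_features, tau, tau)
print(attended)
print(attn_weights)
```

Each latent position ends up as a weighted mix of the text token embeddings, which is the sense in which the text "guides" the model at every layer where this attention is applied.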

4. Image Generation

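: At generation time the forward diffusion (noising) process is skipped. Starting from a random latent vector, the U-Net denoises it step by step while attending at every step to the Tau vector that CLIP produced from the text, and the VAE decoder then turns the final latent into the image.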
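
The control flow described above can be sketched as follows. This shows only the loop structure (random latent, repeated conditioned denoising, one final decode). `toy_denoise_step` and `toy_decode` are hypothetical stand-ins for the U-Net plus scheduler and for the VAE decoder, and they do no real diffusion math.

```python
import random


def generate(tau, denoise_step, decode, latent_size=4, steps=10, seed=0):
    """Run the text-to-image loop: random latent -> denoise xN -> decode."""
    rng = random.Random(seed)
    # No forward diffusion: start directly from Gaussian noise.
    latent = [rng.gauss(0.0, 1.0) for _ in range(latent_size)]
    for t in reversed(range(steps)):
        # The same conditioning vector is supplied at every step.
        latent = denoise_step(latent, t, tau)
    # The decoder is applied once, after the loop.
    return decode(latent)


def toy_denoise_step(latent, t, tau):
    """Hypothetical stand-in for U-Net + scheduler: move halfway toward tau."""
    return [z + 0.5 * (c - z) for z, c in zip(latent, tau)]


def toy_decode(latent):
    """Hypothetical stand-in for the VAE decoder: clamp values to [0, 1]."""
    return [min(1.0, max(0.0, z)) for z in latent]


tau = [0.2, 0.8, 0.5, 0.1]  # made-up conditioning vector
image = generate(tau, toy_denoise_step, toy_decode, steps=20)
print(image)
```

In a real pipeline the denoising step would predict noise with the U-Net and update the latent according to a scheduler, but the order of operations is the same.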

[ Reference ] Stable Diffusion XL

[ Reference ]
