[Pytorch] Softmax with CrossEntropyLoss 역전파

728x90

Softmax & Sigmoid with CrossEntropyLoss (CE Loss) forward and backward (calculate gradient).

아래 사진에 자세한 풀이과정을 적어놓음.

softmax-with-CELoss의 입력(x)에 대한 역전파(미분값)는 softmax(x) - target(label).
feaures of taget=1 and taget=0에 대해서는 부호를 반대로 역전파하여 모델이 cateogry classification ability를 학습한다.
target이 one-hot label (single classification)이라는 전제하에, Softmax를 Sigmoid로 대체해도 미분값이 같기 때문에 동일한 결과를 얻을 수 있다.
multi-class classification 문제에서는 주로 sigmoid를 사용한다. 이유는 softmax를 수행시 target들의 값의 합이 1이 아닐 가능성이 크기 때문에, 미분값의 계수 조절의 변동이 심하므로 학습이 불안정하다.
KLdivergence Loss (두 feature의 분포를 비슷하게 만드는 loss)는 CE Loss와 동일한 수식의 미분값을 전달한다.
다른 점은 Target의 분포가 CE loss는 one-hot label이고, KLD loss는 그렇지 않다(값이 분포됨). 그렇기에 정답이 아닌 클래스에 해당하는 feature들에게도 target과의 차이를 구해서(미분값) 전달하여 분포가 비슷해지는 효과를 볼 수 있다. (Knowledge Distillation에서 주로 사용됨)

[Pytorch] Cosin Similarity Loss +역전파 (4)	2023.07.30
[Pytorch] Negative Pair Loss 코드 (4)	2023.07.21
[Pytorch] L1-loss, L2-loss, KLD-loss 역전파 (13)	2023.07.20
[Pytorch] torch kld-loss (KD-loss) 코드 (7)	2023.07.14

관련글