AdaCLIP Code Review

Defining Static Prompts & Dynamic Prompts

        if self.enabled: # only when enabled, the parameters should be constructed
            if 'S' in prompting_type: # static prompts
                # learnable
                self.static_prompts = nn.ParameterList(
                    [nn.Parameter(torch.empty(self.length, self.channel))
                     for _ in range(self.depth)])

                for single_para in self.static_prompts:
                    nn.init.normal_(single_para, std=0.02)

            if 'D' in prompting_type: # dynamic prompts
                self.dynamic_prompts = [0.] # place holder

    def set_dynamic_prompts(self, dynamic_prompts):
        self.dynamic_prompts = dynamic_prompts

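prompting_type defaults to 'SD', so both if blocks run
torch.empty() : creates a tensor of the given size
- Here it creates a tensor of size (self.length $\times$ self.channel)
- Memory is allocated but not initialized, so the initial values are unspecified
  - The tensor holds whatever values happen to be in that memory
nn.Parameter() : a class that wraps a tensor and registers it as a learnable parameter once it is assigned to a PyTorch module
- The tensor created with torch.empty() becomes a model parameter, so the optimizer updates it during training
for _ in range(self.depth) : list comprehension
- Sets the number of tensors to depth
- Produces depth tensors, each of size (self.length $\times$ self.channel)
nn.ParameterList() : holds parameters in list form inside a PyTorch Module
- Wrapping the nn.Parameter objects from the list comprehension in nn.ParameterList registers each one with the module; a plain Python list would hide them from parameters() and therefore from the optimizer
nn.init.normal_(single_para, std=0.02) : initializes the parameter in place from a normal distribution with mean 0 (the default) and standard deviation 0.02, overwriting the uninitialized values left by torch.empty()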
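self.dynamic_prompts = [0.] : initializes the dynamic prompts as a list holding a single float, 0.0
- The value itself carries no meaning; it is only a placeholder so that the attribute exists
- Unlike static_prompts, this is a plain Python list, so it is not registered as a learnable parameter
- set_dynamic_prompts() later replaces the placeholder wholesale with whatever is passed in; the list is not mutated in place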
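
The points above can be checked with a minimal sketch. `PromptHolder` and its default sizes are hypothetical names chosen for illustration, not the AdaCLIP class itself; only the static_prompts / dynamic_prompts / set_dynamic_prompts pattern is taken from the excerpt.

```python
import torch
import torch.nn as nn


class PromptHolder(nn.Module):
    """Hypothetical stand-in for the prompt module in the excerpt."""

    def __init__(self, length=4, channel=8, depth=3, prompting_type="SD"):
        super().__init__()
        if "S" in prompting_type:
            # One learnable (length x channel) tensor per layer.
            self.static_prompts = nn.ParameterList(
                [nn.Parameter(torch.empty(length, channel)) for _ in range(depth)]
            )
            for p in self.static_prompts:
                nn.init.normal_(p, std=0.02)  # overwrite the uninitialized memory
        if "D" in prompting_type:
            self.dynamic_prompts = [0.0]  # plain list, not a registered parameter

    def set_dynamic_prompts(self, dynamic_prompts):
        self.dynamic_prompts = dynamic_prompts  # replaces the placeholder


holder = PromptHolder()
# Only the static prompts show up as parameters; the placeholder does not.
param_shapes = [tuple(p.shape) for p in holder.parameters()]
print(param_shapes)

holder.set_dynamic_prompts(torch.zeros(2, 4, 8))  # illustrative shape only
print(type(holder.dynamic_prompts))
```

Here the optimizer would see three (4, 8) tensors, one per layer, while the dynamic prompts stay outside parameters() and are simply swapped in by the setter.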

Loss Computation

def train_one_batch(self, items):
    image = items['img'].to(self.device)
    cls_name = items['cls_name']

    # pixel level
    anomaly_map, anomaly_score = self.clip_model(image, cls_name, aggregation=False)

    if not isinstance(anomaly_map, list):
        anomaly_map = [anomaly_map]

    # losses
    gt = items['img_mask'].to(self.device)
    gt = gt.squeeze()

    gt[gt > 0.5] = 1
    gt[gt <= 0.5] = 0

    is_anomaly = items['anomaly'].to(self.device)
    is_anomaly[is_anomaly > 0.5] = 1
    is_anomaly[is_anomaly <= 0.5] = 0
    loss = 0

    # classification loss
    classification_loss = self.loss_focal(anomaly_score, is_anomaly.unsqueeze(1))
    loss += classification_loss

    # seg loss
    seg_loss = 0
    for am in anomaly_map:  # anomaly_map is always a list at this point
        seg_loss += (self.loss_focal(am, gt) + self.loss_dice(am[:, 1, :, :], gt) +
                     self.loss_dice(am[:, 0, :, :], 1-gt))

    loss += seg_loss

    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()

    return loss

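Anomaly Map → Focal + Dice
- input : anomaly_map (the model's predicted per-pixel anomaly probabilities), gt (the ground-truth mask labels)
- am[:, 1, :, :] : the predicted probability map for anomalous pixels; am[:, 0, :, :] is the normal-pixel map, which is compared against 1-gt
- Dice Loss is computed by measuring the overlap between the predicted pixels and the ground-truth pixels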
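
To make the two Dice terms concrete, here is a minimal sketch using a common soft Dice formulation. `soft_dice_loss`, the smoothing constant `eps`, and the tensor shapes are assumptions for illustration; AdaCLIP's own loss_dice may differ in detail.

```python
import torch


def soft_dice_loss(pred, target, eps=1.0):
    """A common soft Dice loss (assumed form, not necessarily AdaCLIP's)."""
    pred = pred.reshape(pred.shape[0], -1)
    target = target.reshape(target.shape[0], -1).float()
    intersection = (pred * target).sum(dim=1)       # overlap per sample
    total = pred.sum(dim=1) + target.sum(dim=1)     # sum of both areas
    dice = (2 * intersection + eps) / (total + eps)
    return 1 - dice.mean()


# Hypothetical batch: 2 images, 2 channels (normal, anomalous), 4x4 pixels.
gt = torch.zeros(2, 4, 4)
gt[:, :2, :2] = 1                                   # top-left block is anomalous
perfect = torch.stack([1 - gt, gt], dim=1)          # channel 0 normal, channel 1 anomalous
swapped = torch.stack([gt, 1 - gt], dim=1)          # deliberately wrong prediction


def dice_pair(am, gt):
    # Same pairing as in train_one_batch: channel 1 vs gt, channel 0 vs 1-gt.
    return soft_dice_loss(am[:, 1, :, :], gt) + soft_dice_loss(am[:, 0, :, :], 1 - gt)


loss_perfect = dice_pair(perfect, gt)
loss_swapped = dice_pair(swapped, gt)
print(loss_perfect.item(), loss_swapped.item())
```

A prediction that matches the mask drives both terms toward zero, while the swapped prediction has no overlap in either channel and is penalized twice.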
Anomaly Score → Focal
- input : anomaly_score (the model's predicted anomaly score), is_anomaly (the ground-truth anomaly label)
- is_anomaly.unsqueeze(1) : adds one dimension to is_anomaly
  - This matches the dimensions of the model prediction used in the loss computation
  - The values are unchanged; only the shape goes from (B,) to (B, 1)
    - e.g. is_anomaly = tensor([0, 1, 0, 1, 1]) → tensor([[0], [1], [0], [1], [1]])
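
The shape change can be verified directly. In this sketch `pred` is a hypothetical (batch, 1) prediction used only to show why mismatched ranks are risky; the actual shape of anomaly_score in AdaCLIP is not stated in the excerpt.

```python
import torch

is_anomaly = torch.tensor([0.0, 1.0, 0.0, 1.0, 1.0])
target = is_anomaly.unsqueeze(1)          # (5,) -> (5, 1), values untouched
print(is_anomaly.shape, target.shape)

# Hypothetical prediction with an explicit trailing dimension.
pred = torch.rand(5, 1)

# Mismatched ranks silently broadcast (5, 1) against (5,) into (5, 5).
broadcast = pred - is_anomaly
# With the extra dimension the comparison stays elementwise.
aligned = pred - target
print(broadcast.shape, aligned.shape)
```

The silent broadcast is the main reason to align the label shape explicitly before it reaches an elementwise loss.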
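The focal and Dice terms are summed over every anomaly map to give the total segmentation loss, which is then added to the final loss used to train the model
- final loss = classification loss + segmentation loss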
Studying Focal Loss

[ DL ] Focal Loss(Focal Loss for Dense Object Detection)
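
As a quick reference for the focal loss used in both the classification and segmentation terms, here is a minimal binary version following the formula FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t) from the paper named above. `binary_focal_loss` and its defaults (alpha=0.25, gamma=2.0) are an illustrative sketch; AdaCLIP's loss_focal is a separate implementation that may handle multiple classes and other settings.

```python
import torch


def binary_focal_loss(prob, target, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss sketch.

    prob:   predicted probability of the positive class, in [0, 1]
    target: 0/1 labels with the same shape as prob
    """
    prob = prob.clamp(eps, 1 - eps)
    p_t = prob * target + (1 - prob) * (1 - target)        # probability of the true class
    alpha_t = alpha * target + (1 - alpha) * (1 - target)  # class-balancing weight
    return (-alpha_t * (1 - p_t) ** gamma * torch.log(p_t)).mean()


target = torch.tensor([1.0, 1.0])
easy = binary_focal_loss(torch.tensor([0.9, 0.9]), target)  # confident and correct
hard = binary_focal_loss(torch.tensor([0.1, 0.1]), target)  # confident and wrong
ce_easy = -torch.log(torch.tensor(0.9))                     # plain cross-entropy for comparison
print(easy.item(), hard.item(), ce_easy.item())
```

The (1 - p_t)^gamma factor shrinks the loss of well-classified examples far below plain cross-entropy, so the abundant easy pixels (mostly normal ones in anomaly detection) do not dominate training.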