Algorithm

MoCo

Abstract (MoCo V2): Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR. In this note, we verify the effectiveness of two of SimCLR’s design improvements by implementing them in the MoCo framework. With simple modifications to MoCo, namely using an MLP projection head and more data augmentation, we establish stronger baselines that outperform SimCLR and do not require large training batches. We hope this will make state-of-the-art unsupervised learning research more accessible. Code will be made public.

class MoCoModel[source]

MoCoModel(encoder, projector) :: Module

MoCo model

You can either use the MoCoModel module to create a model by passing predefined encoder and projector models, or use create_moco_model, passing just a predefined encoder; the input channels are set when the encoder is created (n_in in the example below). In the MoCo v2 paper, the model consists of an encoder and an MLP projector, following SimCLR's design improvements.

You may refer to: official implementation
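
As a rough illustration of the encoder-plus-projector structure described above, here is a minimal PyTorch sketch. It is not the library's MoCoModel source: SketchMoCoModel, make_mlp_projector and tiny_encoder are hypothetical names, and the layer choices are assumptions made only for illustration.

```python
import torch
import torch.nn as nn


def make_mlp_projector(in_features, hidden_size=256, projection_size=128):
    # Hypothetical helper: a 2-layer MLP head (Linear -> ReLU -> Linear).
    return nn.Sequential(
        nn.Linear(in_features, hidden_size),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_size, projection_size),
    )


class SketchMoCoModel(nn.Module):
    # Hypothetical stand-in: an encoder followed by an MLP projector.
    def __init__(self, encoder, projector):
        super().__init__()
        self.encoder = encoder
        self.projector = projector

    def forward(self, x):
        return self.projector(self.encoder(x))


# Toy encoder for illustration only: conv -> global average pool -> flatten.
tiny_encoder = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

model = SketchMoCoModel(tiny_encoder, make_mlp_projector(16))
out = model(torch.randn(2, 3, 32, 32))
print(out.shape)
```

The projector's input size must match the flattened encoder output (16 features in this toy case); a real encoder would have a much larger feature dimension.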

create_moco_model[source]

create_moco_model(encoder, hidden_size=256, projection_size=128, bn=False, nlayers=2)

Create MoCo model

encoder = create_encoder("tf_efficientnet_b0_ns", n_in=3, pretrained=False, pool_type=PoolingType.CatAvgMax)
model = create_moco_model(encoder, hidden_size=2048, projection_size=128)
out = model(torch.randn((2,3,224,224))); out.shape
torch.Size([2, 128])

MoCo Callback

The following parameters can be passed:

  • aug_pipelines: list of augmentation pipelines (List[Pipeline]) created using functions from the self_supervised.augmentations module. Each Pipeline should be created with split_idx=0. You can simply use the get_moco_aug_pipelines utility to get aug_pipelines.
  • K: queue size. For simplicity, K needs to be a multiple of the batch size and smaller than the total number of training samples. You can try different values, e.g. bs*2^k for varying k, where bs is the batch size.
  • m: momentum for the key encoder update. 0.999 is a good default according to the paper.
  • temp: temperature scaling for the cross-entropy loss, similar to SimCLR.
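
To make the roles of K and m more concrete, the following is a small pure-Python sketch. It is not the callback's code: candidate_queue_sizes, momentum_update and RingQueue are hypothetical names, and the dataset size of 1000 is an arbitrary assumption. It lists queue sizes of the form bs*2^k, shows the moving-average rule used for the key encoder, and shows why a queue size that is a multiple of the batch size lets the write pointer wrap cleanly.

```python
def candidate_queue_sizes(bs, n_train, max_k=10):
    """Return queue sizes bs * 2**k (k = 0..max_k) that are smaller than n_train."""
    sizes = []
    for k in range(max_k + 1):
        size = bs * 2 ** k
        if size >= n_train:
            break
        sizes.append(size)
    return sizes


def momentum_update(key_weight, query_weight, m=0.999):
    """Exponential moving average: key <- m * key + (1 - m) * query."""
    return m * key_weight + (1.0 - m) * query_weight


class RingQueue:
    """Fixed-size FIFO of keys; the oldest batch is overwritten first."""

    def __init__(self, K, bs):
        assert K % bs == 0, "K must be a multiple of the batch size"
        self.K, self.bs, self.ptr = K, bs, 0
        self.items = [None] * K

    def enqueue(self, batch_keys):
        assert len(batch_keys) == self.bs
        self.items[self.ptr:self.ptr + self.bs] = batch_keys
        # Wraps back to 0 exactly at the end because K % bs == 0.
        self.ptr = (self.ptr + self.bs) % self.K


sizes = candidate_queue_sizes(bs=8, n_train=1000)  # n_train is an assumed value
queue = RingQueue(K=16, bs=8)
for step in range(3):
    queue.enqueue([f"key_{step}_{i}" for i in range(8)])
```

After the third enqueue, the first batch has been overwritten by the third, while the second batch is still in the queue.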

You may refer to the official implementation.

Our implementation doesn't use shuffle BN; instead, it uses the current batch for both positives and negatives during loss calculation. This should handle the "signature" issue coming from batch norm, which is argued to let the model cheat on same-batch positives. This modification not only simplifies the implementation but also allows training on a single GPU. The official shuffle BN implementation depends on DDP (DistributedDataParallel) and only supports multi-GPU environments. Not everyone has access to multiple GPUs, and we hope this modification makes MoCo more accessible.

For more details about our proposed custom implementation, you may refer to this GitHub issue.
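
One possible reading of a loss that draws both positives and negatives from the current batch is sketched below in PyTorch. This is not the source of MOCO.lf: the function name moco_loss_sketch, the tensor shapes, and the choice to concatenate current-batch keys with queued keys are assumptions made for illustration. Each query's positive is the key at the same batch index, and all other current-batch keys plus the queued keys act as negatives.

```python
import torch
import torch.nn.functional as F


def moco_loss_sketch(q, k, queue, temp=0.07):
    # q: (bs, dim) query embeddings; k: (bs, dim) key embeddings;
    # queue: (K, dim) keys stored from earlier batches.
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    queue = F.normalize(queue, dim=1)
    # Similarity of every query to current-batch keys and queued keys.
    logits = q @ torch.cat([k, queue], dim=0).t() / temp  # (bs, bs + K)
    # The positive for query i is key i, i.e. column i of the logits.
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)


torch.manual_seed(0)
bs, dim, K = 8, 128, 128  # assumed sizes, chosen only for the demo
q = torch.randn(bs, dim, requires_grad=True)
k = torch.randn(bs, dim)  # in practice produced by the key encoder without gradients
queue = torch.randn(K, dim)
loss = moco_loss_sketch(q, k, queue)
loss.backward()
```

With random embeddings the loss is large; it drops toward zero as each query aligns with its own key and moves away from the other keys.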

The MoCo algorithm uses two views of a given image, so the MOCO callback expects a list of two augmentation pipelines in aug_pipelines.

You can simply use the helper function get_moco_aug_pipelines(), which accepts augmentation-related arguments such as size, rotate and jitter, and returns a list of two pipelines that can be passed to the callback. This function uses get_multi_aug_pipelines, which in turn uses get_batch_augs. For more information, you may refer to the self_supervised.augmentations module.

Alternatively, you may pass your own list of aug_pipelines, which needs to be List[Pipeline, Pipeline] where each pipeline is created as Pipeline(..., split_idx=0). Here, split_idx=0 forces the augmentations to be applied in training mode.

get_moco_aug_pipelines[source]

get_moco_aug_pipelines(size, rotate=True, jitter=True, bw=True, blur=True, resize_scale=(0.2, 1.0), resize_ratio=(0.75, 1.3333333333333333), rotate_deg=30, jitter_s=0.6, blur_s=(4, 32), same_on_batch=False, flip_p=0.5, rotate_p=0.3, jitter_p=0.3, bw_p=0.3, blur_p=0.3, stats=([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), cuda=True, xtra_tfms=[])

class MOCO[source]

MOCO(aug_pipelines, K, m=0.999, temp=0.07, print_augs=False) :: Callback

Basic class handling tweaks of the training loop by changing a Learner in various events

Example Usage

path = untar_data(URLs.MNIST_TINY)
items = get_image_files(path)
tds = Datasets(items, [PILImageBW.create, [parent_label, Categorize()]], splits=GrandparentSplitter()(items))
dls = tds.dataloaders(bs=8, after_item=[ToTensor(), IntToFloatTensor()], device='cpu')
fastai_encoder = create_encoder('xresnet18', n_in=1, pretrained=False)
model = create_moco_model(fastai_encoder, hidden_size=1024, projection_size=128, bn=True)
aug_pipelines = get_moco_aug_pipelines(size=28, rotate=False, jitter=False, bw=False, blur=False, stats=None, cuda=False)
learn = Learner(dls, model, cbs=[MOCO(aug_pipelines=aug_pipelines, K=128, print_augs=True), ShortEpochCallback(0.001)])
Pipeline: RandomResizedCrop -> RandomHorizontalFlip
Pipeline: RandomResizedCrop -> RandomHorizontalFlip
learn.summary()
MoCoModel (Input shape: 8)
============================================================================
Layer (type)         Output Shape         Param #    Trainable 
============================================================================
                     8 x 32 x 14 x 14    
Conv2d                                    288        True      
BatchNorm2d                               64         True      
ReLU                                                           
Conv2d                                    9216       True      
BatchNorm2d                               64         True      
ReLU                                                           
____________________________________________________________________________
                     8 x 64 x 14 x 14    
Conv2d                                    18432      True      
BatchNorm2d                               128        True      
ReLU                                                           
MaxPool2d                                                      
Conv2d                                    36864      True      
BatchNorm2d                               128        True      
ReLU                                                           
Conv2d                                    36864      True      
BatchNorm2d                               128        True      
Sequential                                                     
ReLU                                                           
Conv2d                                    36864      True      
BatchNorm2d                               128        True      
ReLU                                                           
Conv2d                                    36864      True      
BatchNorm2d                               128        True      
Sequential                                                     
ReLU                                                           
____________________________________________________________________________
                     8 x 128 x 4 x 4     
Conv2d                                    73728      True      
BatchNorm2d                               256        True      
ReLU                                                           
Conv2d                                    147456     True      
BatchNorm2d                               256        True      
____________________________________________________________________________
                     []                  
AvgPool2d                                                      
____________________________________________________________________________
                     8 x 128 x 4 x 4     
Conv2d                                    8192       True      
BatchNorm2d                               256        True      
ReLU                                                           
Conv2d                                    147456     True      
BatchNorm2d                               256        True      
ReLU                                                           
Conv2d                                    147456     True      
BatchNorm2d                               256        True      
Sequential                                                     
ReLU                                                           
____________________________________________________________________________
                     8 x 256 x 2 x 2     
Conv2d                                    294912     True      
BatchNorm2d                               512        True      
ReLU                                                           
Conv2d                                    589824     True      
BatchNorm2d                               512        True      
____________________________________________________________________________
                     []                  
AvgPool2d                                                      
____________________________________________________________________________
                     8 x 256 x 2 x 2     
Conv2d                                    32768      True      
BatchNorm2d                               512        True      
ReLU                                                           
Conv2d                                    589824     True      
BatchNorm2d                               512        True      
ReLU                                                           
Conv2d                                    589824     True      
BatchNorm2d                               512        True      
Sequential                                                     
ReLU                                                           
____________________________________________________________________________
                     8 x 512 x 1 x 1     
Conv2d                                    1179648    True      
BatchNorm2d                               1024       True      
ReLU                                                           
Conv2d                                    2359296    True      
BatchNorm2d                               1024       True      
____________________________________________________________________________
                     []                  
AvgPool2d                                                      
____________________________________________________________________________
                     8 x 512 x 1 x 1     
Conv2d                                    131072     True      
BatchNorm2d                               1024       True      
ReLU                                                           
Conv2d                                    2359296    True      
BatchNorm2d                               1024       True      
ReLU                                                           
Conv2d                                    2359296    True      
BatchNorm2d                               1024       True      
Sequential                                                     
ReLU                                                           
AdaptiveAvgPool2d                                              
AdaptiveMaxPool2d                                              
Flatten                                                        
Linear                                    1049600    True      
BatchNorm1d                               2048       True      
ReLU                                                           
____________________________________________________________________________
                     8 x 128             
Linear                                    131200     True      
____________________________________________________________________________

Total params: 12,378,016
Total trainable params: 12,378,016
Total non-trainable params: 0

Optimizer used: <function Adam at 0x7f8976115a70>
Loss function: <bound method MOCO.lf of MOCO>

Callbacks:
  - TrainEvalCallback
  - ShortEpochCallback
  - MOCO
  - Recorder
  - ProgressCallback
b = dls.one_batch()
learn._split(b)
learn.pred = learn.model(*learn.xb)
axes = learn.moco.show(n=5)
learn.fit(1)
/Users/turgutlu/anaconda3/envs/fastai/lib/python3.7/site-packages/ipykernel_launcher.py:24: UserWarning: Key encoder and queue are already defined, keeping them.
epoch train_loss valid_loss time
0 00:07
learn.recorder.losses
[tensor(1.2937)]