Algorithm

SimCLR

SimCLR Framework

SimCLR is a simple contrastive learning framework that trains on 2 augmented views of the same image with the InfoNCE loss. Different views of the same image are treated as positive pairs, whereas all the other images in a batch are treated as negatives.
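
To make the loss concrete, here is a minimal sketch of the InfoNCE (NT-Xent) computation over a batch of projections; the library's actual implementation may differ in details such as masking and reduction:

import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temp=0.07):
    "Sketch of InfoNCE: positives are the two views of the same image."
    N = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)          # (2N, D) unit-norm projections
    sim = z @ z.T / temp                                 # temperature-scaled cosine similarities
    mask = torch.eye(2*N, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))           # exclude self-similarity
    targets = torch.cat([torch.arange(N, 2*N), torch.arange(N)]).to(z.device)
    return F.cross_entropy(sim, targets)                 # view i is positive with view i+N

nt_xent_loss(torch.randn(4, 128), torch.randn(4, 128))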

SimCLR V2

SimCLR V2 Framework

SimCLR has a follow-up paper with a few minor changes and improvements. The code differences between SimCLR and SimCLR V2 are minimal and there is a good amount of overlap, which is why both versions are implemented here in the same module. Also, SimCLR V2 is more about the added step of knowledge distillation than about the contrastive learning itself.

  • One difference in SimCLR V2 is that the MLP projection head has 3 layers instead of 2 (see the sketch after this list).

  • Another difference is using a larger model for the pretraining/self-supervised learning task. The original paper mentions that scaling up the model from ResNet-50 to ResNet-152 (3×+SK) gave a 29% relative gain in top-1 accuracy when fine-tuning with only 1% labeled data.

  • Also, a few additions were made to the data augmentation pipeline, such as Gaussian blur.
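
For reference, here is a hypothetical sketch of the 2-layer vs. 3-layer projection head; in practice the library builds this head for you inside create_simclr_model via the nlayers argument, and details such as batch norm (the bn argument) may differ:

import torch.nn as nn

def make_projector(in_dim, hidden_size=2048, projection_size=128, nlayers=3):
    "nlayers=2 roughly matches SimCLR V1, nlayers=3 matches SimCLR V2."
    layers = []
    for _ in range(nlayers - 1):
        layers += [nn.Linear(in_dim, hidden_size), nn.ReLU(inplace=True)]
        in_dim = hidden_size
    layers.append(nn.Linear(in_dim, projection_size))
    return nn.Sequential(*layers)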

Note that the self_supervised.augmentations module is highly flexible: it supports all the augmentations from the popular self-supervised learning algorithms by default, and it also lets you pass any custom augmentations. It should always be adjusted based on the data and problem at hand for best performance.
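
For instance, assuming xtra_tfms (see the get_simclr_aug_pipelines signature below) accepts any callable that operates on a batched tensor, a custom augmentation could be added like this; random_invert here is purely hypothetical:

import torch

def random_invert(x):
    "Invert roughly 20% of the images in a batch."
    mask = torch.rand(x.size(0), 1, 1, 1, device=x.device) < 0.2
    return torch.where(mask, 1 - x, x)

aug_pipelines = get_simclr_aug_pipelines(size=224, xtra_tfms=[random_invert])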

Quote from the SimCLR V2 paper: In our experiments, we set the width of projection head’s middle layers to that of its input, so it is also adjusted by the width multiplier. However, a wider projection head improves performance even when the base network remains narrow.

The SimCLR model consists of an encoder and a projector (MLP) head. The definition of this module is fairly simple, as shown below.

class SimCLRModel[source]

SimCLRModel(encoder, projector) :: Module

Compute predictions of concatenated xi and xj
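
Conceptually, the module does little more than the following sketch (not the library's exact code):

import torch.nn as nn

class SimCLRModelSketch(nn.Module):
    "Project encoder features of the concatenated views xi and xj."
    def __init__(self, encoder, projector):
        super().__init__()
        self.encoder, self.projector = encoder, projector
    def forward(self, x):                  # x: concatenated views, shape (2N, C, H, W)
        return self.projector(self.encoder(x))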

Instead of directly using SimCLRModel by passing both an encoder and a projector, the create_simclr_model function can be used by minimally passing a predefined encoder; the projector is created for you.

create_simclr_model[source]

create_simclr_model(encoder, hidden_size=256, projection_size=128, bn=False, nlayers=2)

Create SimCLR model

You can use self_supervised.layers module to create an encoder. It supports all timm and fastai models available out of the box.

We define the number of input channels with n_in, the projector/MLP's hidden size with hidden_size, its final projection size with projection_size, and its number of layers with nlayers.

encoder = create_encoder("tf_efficientnet_b0_ns", n_in=3, pretrained=False, pool_type=PoolingType.CatAvgMax)
model = create_simclr_model(encoder, hidden_size=2048, projection_size=128, nlayers=2)
out = model(torch.randn((2,3,224,224))); out.shape
torch.Size([2, 128])

SimCLR Callback

The following parameters can be passed:

  • aug_pipelines: list of augmentation pipelines (List[Pipeline]) created using functions from the self_supervised.augmentations module. Each Pipeline should be set to split_idx=0. You can simply use the get_simclr_aug_pipelines utility to get aug_pipelines.
  • temp: temperature scaling for the cross entropy loss (defaults to the paper's best value)

The SimCLR algorithm uses 2 views of a given image, so the SimCLR callback expects a list of 2 augmentation pipelines in aug_pipelines.
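
Conceptually, each pipeline produces one view of the batch and the two views are concatenated before the forward pass, along the lines of this sketch (assumed, not the callback's exact code):

import torch

x = torch.randn(4, 3, 224, 224)   # a batch of images
pipe1, pipe2 = aug_pipelines      # the two augmentation pipelines
xi, xj = pipe1(x), pipe2(x)       # two augmented views of the same images
inp = torch.cat([xi, xj])         # (8, 3, 224, 224): the "concatenated xi and xj"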

You can simply use the helper function get_simclr_aug_pipelines(), which accepts augmentation-related arguments such as size, rotate, jitter... and returns a list of 2 pipelines that can be passed to the callback. This function uses get_multi_aug_pipelines, which in turn uses get_batch_augs. For more information you may refer to the self_supervised.augmentations module.

Also, you may choose to pass your own list of aug_pipelines, which needs to be List[Pipeline, Pipeline] where each is Pipeline(..., split_idx=0). Here, split_idx=0 forces the augmentations to be applied in training mode.
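
For example, a custom pair could look like the following sketch; it assumes RandomResizedCrop and RandomHorizontalFlip from self_supervised.augmentations, and their constructor arguments are illustrative:

from fastcore.transform import Pipeline

tfms = [RandomResizedCrop((28, 28)), RandomHorizontalFlip()]
aug_pipelines = [Pipeline(tfms, split_idx=0),
                 Pipeline(tfms, split_idx=0)]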

get_simclr_aug_pipelines[source]

get_simclr_aug_pipelines(size, rotate=True, jitter=True, bw=True, blur=True, resize_scale=(0.2, 1.0), resize_ratio=(0.75, 1.3333333333333333), rotate_deg=30, jitter_s=0.6, blur_s=(4, 32), same_on_batch=False, flip_p=0.5, rotate_p=0.3, jitter_p=0.3, bw_p=0.3, blur_p=0.3, stats=([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), cuda=True, xtra_tfms=[])

aug_pipelines = get_simclr_aug_pipelines(size=28, rotate=False, jitter=False, bw=False, blur=False, stats=None, cuda=False)
aug_pipelines
[Pipeline: RandomResizedCrop -> RandomHorizontalFlip,
 Pipeline: RandomResizedCrop -> RandomHorizontalFlip]

class SimCLR[source]

SimCLR(aug_pipelines, temp=0.07, print_augs=False) :: Callback

Basic class handling tweaks of the training loop by changing a Learner in various events

Distributed SimCLR Callback

DistributedSimCLR is a distributed implementation of the InfoNCE loss. It effectively increases the number of negative samples to all available samples across all GPUs during the loss calculation. For example, if you use a batch size of 16 per GPU and 8 GPUs, then the loss will be calculated using a similarity matrix of size (16×8) × (16×8) = 128×128. The literature and experiments suggest that more negatives help training.
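
Conceptually, each process all-gathers the projections computed on the other GPUs and uses them as extra negatives, roughly like this sketch (it assumes an initialized process group; the library's implementation may differ, for example in how gradients flow through the gathered tensors):

import torch
import torch.distributed as dist

def gather_negatives(z):
    "Collect projections from all processes to enlarge the negative pool."
    gathered = [torch.zeros_like(z) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, z)      # gathered copies carry no gradient
    gathered[dist.get_rank()] = z     # keep the local tensor so its gradients survive
    return torch.cat(gathered)        # (world_size * N, D)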

The following callback should be used together with DistributedDataParallel and inside a Python script which will be executed in launch mode, such as:

python -m fastai.launch script.py --FOLD 4 --size 640 --bs 12 --epochs 10 --lr 1e-3 --arch_name tf_efficientnet_b7_ns

For more details about distributed operations like all gather, see https://pytorch.org/tutorials/intermediate/dist_tuto.html.

Training Tip: You can maximize your effective batch size by using gradient checkpointing (see the self_supervised.layers module), fp16 (Learner.to_fp16() in fastai) and the distributed callback. Don't worry if you don't have access to multiple GPUs; usually just using gradient checkpointing and fp16 is enough to increase your batch size to 256 and beyond, which is enough to train highly competitive models.
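
For example, mixed precision is a one-liner in fastai; the gradient checkpointing helpers live in self_supervised.layers and their exact API is not covered here:

learn = Learner(dls, model, cbs=[SimCLR(aug_pipelines, temp=0.07)])
learn = learn.to_fp16()   # fastai mixed precision, roughly halves activation memory
learn.fit(10)             # larger batches now fit on the same GPU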

class DistributedSimCLR[source]

DistributedSimCLR(aug_pipelines=[], temp=0.07, print_augs=False) :: Callback

Basic class handling tweaks of the training loop by changing a Learner in various events

Example Usage

path = untar_data(URLs.MNIST_TINY)
items = get_image_files(path)
tds = Datasets(items, [PILImageBW.create, [parent_label, Categorize()]], splits=GrandparentSplitter()(items))
dls = tds.dataloaders(bs=5, after_item=[ToTensor(), IntToFloatTensor()], device='cpu')
fastai_encoder = create_encoder('xresnet18', n_in=1, pretrained=False)
model = create_simclr_model(fastai_encoder, hidden_size=2048, projection_size=128)
aug_pipelines = get_simclr_aug_pipelines(size=28, rotate=False, jitter=False, bw=False, blur=False, stats=None, cuda=False)
learn = Learner(dls, model, cbs=[SimCLR(aug_pipelines, temp=0.07, print_augs=True),ShortEpochCallback(0.001)])
Pipeline: RandomResizedCrop -> RandomHorizontalFlip
Pipeline: RandomResizedCrop -> RandomHorizontalFlip

Also, with the show() method you can inspect the data augmentations as a sanity check. You can use existing augmentation functions from the augmentations module.

b = dls.one_batch()
learn._split(b)
learn('before_batch')
axes = learn.sim_clr.show(n=5)
learn.fit(1)
epoch train_loss valid_loss time
0 00:01
learn.recorder.losses
[tensor(2.5995)]