This module combines CLIP and MoCo to increase the number of negative samples. This is useful when large batch sizes are out of reach, for example when no GPUs with large memory are available, or when there is no multi-GPU machine on which to use a distributed InfoNCE loss implementation.
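The two MoCo ingredients referred to above, a momentum-updated key encoder and a fixed-size queue of past keys reused as extra negatives, can be sketched without any framework. This is an illustrative sketch only, not this module's code: `momentum_update` and `NegativeQueue` are hypothetical names, and plain Python lists stand in for tensors. `K` and `m` mirror the `CLIPMOCO` arguments of the same names.

```python
from collections import deque

def momentum_update(key_params, query_params, m=0.999):
    """EMA update: key <- m * key + (1 - m) * query, element-wise."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]

class NegativeQueue:
    """FIFO store holding at most K key embeddings (oldest dropped first)."""
    def __init__(self, K=4096):
        self.items = deque(maxlen=K)

    def enqueue(self, keys):
        # Once the queue is full, each new key displaces the oldest one.
        self.items.extend(keys)

    def negatives(self):
        return list(self.items)

# Tiny demonstration with K=4 and 2-d "embeddings" (hypothetical values).
queue = NegativeQueue(K=4)
queue.enqueue([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
queue.enqueue([[2.0, 2.0], [3.0, 3.0]])  # pushes out the oldest key
key_params = momentum_update([1.0, 1.0], [0.0, 2.0], m=0.9)
```

Because the queue is decoupled from the batch, the number of negatives is set by `K` rather than by the batch size, which is the point of combining MoCo with CLIP on limited hardware.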






class ClipTokenizer[source]

ClipTokenizer(context_length=77) :: DisplayedTransform

Tokenizer from



vitb32_config(input_res, context_length, vocab_size)

ViT-B/32 configuration, uses 32x32 patches


vitl14_config(input_res, context_length, vocab_size)

ViT-L/14 configuration, uses 14x14 patches
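
The patch sizes of the two configurations determine how many image tokens the vision transformer sees. The arithmetic can be sketched as below; `num_patches` is a hypothetical helper, not part of this module, and it assumes square inputs split into non-overlapping patches.

```python
def num_patches(input_res, patch_size):
    """Return (patches per side, total patches) for a square image."""
    if input_res % patch_size != 0:
        raise ValueError("input_res must be divisible by patch_size")
    side = input_res // patch_size
    return side, side * side

b32_side, b32_total = num_patches(224, 32)  # ViT-B/32 at 224x224
l14_side, l14_total = num_patches(224, 14)  # ViT-L/14 at 224x224
print(b32_side, b32_total)  # 7 49
print(l14_side, l14_total)  # 16 256
```

The 7 x 7 grid for ViT-B/32 agrees with the `2 x 768 x 7 x 7` output shape of the first `Conv2d` in the model summary in the example usage.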

class Bottleneck[source]

Bottleneck(inplanes, planes, stride=1) :: Module

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes::

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call :meth:to, etc.

:ivar training: Boolean represents whether this module is in training or evaluation mode. :vartype training: bool

class AttentionPool2d[source]

AttentionPool2d(spacial_dim:int, embed_dim:int, num_heads:int, output_dim:int=None) :: Module

class ModifiedResNet[source]

ModifiedResNet(layers, output_dim, heads, input_resolution=224, width=64) :: Module

A ResNet class that is similar to torchvision's but contains the following changes:

  • There are now 3 "stem" convolutions as opposed to 1, with an average pool instead of a max pool.
  • Performs anti-aliasing strided convolutions, where an avgpool is prepended to convolutions with stride > 1
  • The final pooling layer is a QKV attention instead of an average pool

class LayerNorm[source]

LayerNorm(normalized_shape:Union[int, List[int], Size], eps:float=1e-05, elementwise_affine:bool=True, device=None, dtype=None) :: LayerNorm

Subclass torch's LayerNorm to handle fp16.

class QuickGELU[source]

QuickGELU() :: Module
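
In the original OpenAI CLIP code, QuickGELU is defined as `x * sigmoid(1.702 * x)`, a cheaper approximation of GELU. Assuming this module follows that definition, the scalar sketch below compares it with the exact GELU; `quick_gelu` and `exact_gelu` are illustrative names, not this module's API.

```python
import math

def quick_gelu(x):
    """x * sigmoid(1.702 * x), a cheap approximation of GELU."""
    return x * (1.0 / (1.0 + math.exp(-1.702 * x)))

def exact_gelu(x):
    """Reference GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"{x:+.1f}  quick={quick_gelu(x):+.4f}  exact={exact_gelu(x):+.4f}")
```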

class ResidualAttentionBlock[source]

ResidualAttentionBlock(d_model:int, n_head:int, attn_mask:Tensor=None) :: Module

class Transformer[source]

Transformer(width:int, layers:int, heads:int, attn_mask:Tensor=None, checkpoint=False, checkpoint_nchunks=2) :: Module

class VisualTransformer[source]

VisualTransformer(input_resolution:int, patch_size:int, width:int, layers:int, heads:int, output_dim:int, **kwargs) :: Module

class CLIPMOCO[source]

CLIPMOCO(embed_dim:int, image_resolution:int, vision_layers:Union[Tuple[int, int, int, int], int], vision_width:int, vision_patch_size:int, context_length:int, vocab_size:int, transformer_width:int, transformer_heads:int, transformer_layers:int, K=4096, m=0.999, **kwargs) :: Module

Parameter            Type                                    Default   Details
embed_dim            int                                     -         -
image_resolution     int                                     -         vision
vision_layers        Union[Tuple[int, int, int, int], int]   -         -
vision_width         int                                     -         -
vision_patch_size    int                                     -         -
context_length       int                                     -         text
vocab_size           int                                     -         -
transformer_width    int                                     -         -
transformer_heads    int                                     -         -
transformer_layers   int                                     -         -
K                    int                                     4096      -
m                    float                                   0.999     -
kwargs               -                                       -         -


A useful proxy metric for tracking training performance and convergence.

class RetrievalAtK[source]

RetrievalAtK(k=20, **kwargs) :: AccumMetric

Stores predictions and targets on CPU in accumulate to perform final calculations with func.
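
The idea behind a retrieval-at-k metric can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `retrieval_at_k` is a hypothetical function, `sim[i][j]` is assumed to be the similarity between image `i` and text `j`, and the matching pair is assumed to sit on the diagonal.

```python
def retrieval_at_k(sim, k):
    """Fraction of rows whose matching column (the diagonal) ranks in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        # Rank = number of candidates scoring strictly higher than the true match.
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return hits / len(sim)

sim = [
    [0.9, 0.1, 0.3],  # image 0: true text ranked 1st
    [0.8, 0.5, 0.2],  # image 1: true text ranked 2nd
    [0.7, 0.6, 0.1],  # image 2: true text ranked 3rd
]
r1 = retrieval_at_k(sim, k=1)
r2 = retrieval_at_k(sim, k=2)
r3 = retrieval_at_k(sim, k=3)
```

Here one of three images is retrieved correctly at k=1, two of three at k=2, and all three at k=3, so the metric rises toward 1.0 as the paired embeddings align.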

CLIP-MoCo Callback

class CLIPMOCOTrainer[source]

CLIPMOCOTrainer(after_create=None, before_fit=None, before_epoch=None, before_train=None, before_batch=None, after_pred=None, after_loss=None, before_backward=None, before_step=None, after_cancel_step=None, after_step=None, after_cancel_batch=None, after_batch=None, after_cancel_train=None, after_train=None, before_validate=None, after_cancel_validate=None, after_validate=None, after_cancel_epoch=None, after_epoch=None, after_cancel_fit=None, after_fit=None) :: Callback

MoCo Loss for CLIP. Can be used with or without DistributedDataParallel
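
A MoCo-style contrastive loss treats the current positive key as class 0 and every queued key as an additional negative class. The sketch below shows one direction of such a loss for a single query in plain Python. It is an assumption-laden illustration, not the trainer's actual code: `info_nce_with_queue` is a hypothetical name, the temperature of 0.07 is an assumed value, and embeddings are assumed to be unit-normalized.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce_with_queue(query, positive_key, queue, temperature=0.07):
    """Cross-entropy over [positive, negatives...] logits, positive at index 0."""
    logits = [dot(query, positive_key)] + [dot(query, neg) for neg in queue]
    logits = [l / temperature for l in logits]
    # Numerically stable log-sum-exp.
    mx = max(logits)
    log_sum_exp = mx + math.log(sum(math.exp(l - mx) for l in logits))
    return log_sum_exp - logits[0]

query = [1.0, 0.0]
positive_key = [1.0, 0.0]            # aligned with the query
queue = [[0.0, 1.0], [-1.0, 0.0]]    # stored negatives (hypothetical values)
loss_good = info_nce_with_queue(query, positive_key, queue)
loss_bad = info_nce_with_queue(query, [0.0, 1.0], queue)  # misaligned positive
```

The loss is near zero when the query matches its positive and grows when a queued negative scores as high as the positive, which is why a larger queue makes the task harder without requiring a larger batch.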

Example Usage

from fastai.vision.all import *  # untar_data, Datasets, Learner, ShortEpochCallback, ...
# ClipTokenizer, vitb32_config, CLIPMOCO and CLIPMOCOTrainer are defined in this module.

# Map MNIST_TINY folder labels to the text captions used as the second input.
num2txt = {'3': 'three', '7': 'seven'}
def num_to_txt(o): return num2txt[o]
def dummy_targ(o): return 0 # loss func is not called without it

path = untar_data(URLs.MNIST_TINY)
items = get_image_files(path)
clip_tokenizer = ClipTokenizer()
# Two inputs (image, tokenized text) plus a dummy target.
tds = Datasets(items, [PILImage.create, [parent_label, num_to_txt], dummy_targ], n_inp=2, splits=GrandparentSplitter()(items))
dls = tds.dataloaders(bs=2, after_item=[Resize(224), clip_tokenizer, ToTensor()], after_batch=[IntToFloatTensor()], device='cpu')
vitb32_config_dict = vitb32_config(224, clip_tokenizer.context_length, clip_tokenizer.vocab_size)
clip_model = CLIPMOCO(K=4096, m=0.999, **vitb32_config_dict, checkpoint=False, checkpoint_nchunks=0)
learner = Learner(dls, clip_model, loss_func=noop, cbs=[CLIPMOCOTrainer(), ShortEpochCallback(0.001)])
CLIPMOCO (Input shape: 2 x torch.Size([2, 77]))
Layer (type)         Output Shape         Param #    Trainable 
                     2 x 768 x 7 x 7     
Conv2d                                    2359296    True      
LayerNorm                                 1536       True      
LayerNorm                                 1536       True      
                     2 x 1 x 3072        
Linear                                    2362368    True      
                     2 x 1 x 768         
Linear                                    2360064    True      
LayerNorm                                 1536       True      
LayerNorm                                 1536       True      
                     2 x 1 x 3072        
Linear                                    2362368    True      
                     2 x 1 x 768         
Linear                                    2360064    True      
LayerNorm                                 1536       True      
LayerNorm                                 1536       True      
                     2 x 1 x 3072        
Linear                                    2362368    True      
                     2 x 1 x 768         
Linear                                    2360064    True      
LayerNorm                                 1536       True      
LayerNorm                                 1536       True      
                     2 x 1 x 3072        
Linear                                    2362368    True      
                     2 x 1 x 768         
Linear                                    2360064    True      
LayerNorm                                 1536       True      
LayerNorm                                 1536       True      
                     2 x 1 x 3072        
Linear                                    2362368    True      
                     2 x 1 x 768         
Linear                                    2360064    True      
LayerNorm                                 1536       True      
LayerNorm                                 1536       True      
                     2 x 1 x 3072        
Linear                                    2362368    True      
                     2 x 1 x 768         
Linear                                    2360064    True      
LayerNorm                                 1536       True      
LayerNorm                                 1536       True      
                     2 x 1 x 3072        
Linear                                    2362368    True      
                     2 x 1 x 768         
Linear                                    2360064    True      
LayerNorm                                 1536       True      
LayerNorm                                 1536       True      
                     2 x 1 x 3072        
Linear                                    2362368    True      
                     2 x 1 x 768         
Linear                                    2360064    True      
LayerNorm                                 1536       True      
LayerNorm                                 1536       True      
                     2 x 1 x 3072        
Linear                                    2362368    True      
                     2 x 1 x 768         
Linear                                    2360064    True      
LayerNorm                                 1536       True      
LayerNorm                                 1536       True      
                     2 x 1 x 3072        
Linear                                    2362368    True      
                     2 x 1 x 768         
Linear                                    2360064    True      
LayerNorm                                 1536       True      
LayerNorm                                 1536       True      
                     2 x 1 x 3072        
Linear                                    2362368    True      
                     2 x 1 x 768         
Linear                                    2360064    True      
LayerNorm                                 1536       True      
LayerNorm                                 1536       True      
                     2 x 1 x 3072        
Linear                                    2362368    True      
                     2 x 1 x 768         
Linear                                    2360064    True      
LayerNorm                                 1536       True      
LayerNorm                                 1536       True      
LayerNorm                                 1024       True      
                     2 x 1 x 2048        
Linear                                    1050624    True      
                     2 x 1 x 512         
Linear                                    1049088    True      
LayerNorm                                 1024       True      
LayerNorm                                 1024       True      
                     2 x 1 x 2048        
Linear                                    1050624    True      
                     2 x 1 x 512         
Linear                                    1049088    True      
LayerNorm                                 1024       True      
LayerNorm                                 1024       True      
                     2 x 1 x 2048        
Linear                                    1050624    True      
                     2 x 1 x 512         
Linear                                    1049088    True      
LayerNorm                                 1024       True      
LayerNorm                                 1024       True      
                     2 x 1 x 2048        
Linear                                    1050624    True      
                     2 x 1 x 512         
Linear                                    1049088    True      
LayerNorm                                 1024       True      
LayerNorm                                 1024       True      
                     2 x 1 x 2048        
Linear                                    1050624    True      
                     2 x 1 x 512         
Linear                                    1049088    True      
LayerNorm                                 1024       True      
LayerNorm                                 1024       True      
                     2 x 1 x 2048        
Linear                                    1050624    True      
                     2 x 1 x 512         
Linear                                    1049088    True      
LayerNorm                                 1024       True      
LayerNorm                                 1024       True      
                     2 x 1 x 2048        
Linear                                    1050624    True      
                     2 x 1 x 512         
Linear                                    1049088    True      
LayerNorm                                 1024       True      
LayerNorm                                 1024       True      
                     2 x 1 x 2048        
Linear                                    1050624    True      
                     2 x 1 x 512         
Linear                                    1049088    True      
LayerNorm                                 1024       True      
LayerNorm                                 1024       True      
                     2 x 1 x 2048        
Linear                                    1050624    True      
                     2 x 1 x 512         
Linear                                    1049088    True      
LayerNorm                                 1024       True      
LayerNorm                                 1024       True      
                     2 x 1 x 2048        
Linear                                    1050624    True      
                     2 x 1 x 512         
Linear                                    1049088    True      
LayerNorm                                 1024       True      
LayerNorm                                 1024       True      
                     2 x 1 x 2048        
Linear                                    1050624    True      
                     2 x 1 x 512         
Linear                                    1049088    True      
LayerNorm                                 1024       True      
LayerNorm                                 1024       True      
                     2 x 1 x 2048        
Linear                                    1050624    True      
                     2 x 1 x 512         
Linear                                    1049088    True      
LayerNorm                                 1024       True      
                     2 x 77 x 512        
Embedding                                 25296896   True      
LayerNorm                                 1024       True      
                     2 x 768 x 7 x 7     
Conv2d                                    2359296    False     
LayerNorm                                 1536       False     
LayerNorm                                 1536       False     
                     2 x 1 x 3072        
Linear                                    2362368    False     
                     2 x 1 x 768         
Linear                                    2360064    False     
LayerNorm                                 1536       False     
LayerNorm                                 1536       False     
                     2 x 1 x 3072        
Linear                                    2362368    False     
                     2 x 1 x 768         
Linear                                    2360064    False     
LayerNorm                                 1536       False     
LayerNorm                                 1536       False     
                     2 x 1 x 3072        
Linear                                    2362368    False     
                     2 x 1 x 768         
Linear                                    2360064    False     
LayerNorm                                 1536       False     
LayerNorm                                 1536       False     
                     2 x 1 x 3072        
Linear                                    2362368    False     
                     2 x 1 x 768         
Linear                                    2360064    False     
LayerNorm                                 1536       False     
LayerNorm                                 1536       False     
                     2 x 1 x 3072        
Linear                                    2362368    False     
                     2 x 1 x 768         
Linear                                    2360064    False     
LayerNorm                                 1536       False     
LayerNorm                                 1536       False     
                     2 x 1 x 3072        
Linear                                    2362368    False     
                     2 x 1 x 768         
Linear                                    2360064    False     
LayerNorm                                 1536       False     
LayerNorm                                 1536       False     
                     2 x 1 x 3072        
Linear                                    2362368    False     
                     2 x 1 x 768         
Linear                                    2360064    False     
LayerNorm                                 1536       False     
LayerNorm                                 1536       False     
                     2 x 1 x 3072        
Linear                                    2362368    False     
                     2 x 1 x 768         
Linear                                    2360064    False     
LayerNorm                                 1536       False     
LayerNorm                                 1536       False     
                     2 x 1 x 3072        
Linear                                    2362368    False     
                     2 x 1 x 768         
Linear                                    2360064    False     
LayerNorm                                 1536       False     
LayerNorm                                 1536       False     
                     2 x 1 x 3072        
Linear                                    2362368    False     
                     2 x 1 x 768         
Linear                                    2360064    False     
LayerNorm                                 1536       False     
LayerNorm                                 1536       False     
                     2 x 1 x 3072        
Linear                                    2362368    False     
                     2 x 1 x 768         
Linear                                    2360064    False     
LayerNorm                                 1536       False     
LayerNorm                                 1536       False     
                     2 x 1 x 3072        
Linear                                    2362368    False     
                     2 x 1 x 768         
Linear                                    2360064    False     
LayerNorm                                 1536       False     
LayerNorm                                 1536       False     
LayerNorm                                 1024       False     
                     2 x 1 x 2048        
Linear                                    1050624    False     
                     2 x 1 x 512         
Linear                                    1049088    False     
LayerNorm                                 1024       False     
LayerNorm                                 1024       False     
                     2 x 1 x 2048        
Linear                                    1050624    False     
                     2 x 1 x 512         
Linear                                    1049088    False     
LayerNorm                                 1024       False     
LayerNorm                                 1024       False     
                     2 x 1 x 2048        
Linear                                    1050624    False     
                     2 x 1 x 512         
Linear                                    1049088    False     
LayerNorm                                 1024       False     
LayerNorm                                 1024       False     
                     2 x 1 x 2048        
Linear                                    1050624    False     
                     2 x 1 x 512         
Linear                                    1049088    False     
LayerNorm                                 1024       False     
LayerNorm                                 1024       False     
                     2 x 1 x 2048        
Linear                                    1050624    False     
                     2 x 1 x 512         
Linear                                    1049088    False     
LayerNorm                                 1024       False     
LayerNorm                                 1024       False     
                     2 x 1 x 2048        
Linear                                    1050624    False     
                     2 x 1 x 512         
Linear                                    1049088    False     
LayerNorm                                 1024       False     
LayerNorm                                 1024       False     
                     2 x 1 x 2048        
Linear                                    1050624    False     
                     2 x 1 x 512         
Linear                                    1049088    False     
LayerNorm                                 1024       False     
LayerNorm                                 1024       False     
                     2 x 1 x 2048        
Linear                                    1050624    False     
                     2 x 1 x 512         
Linear                                    1049088    False     
LayerNorm                                 1024       False     
LayerNorm                                 1024       False     
                     2 x 1 x 2048        
Linear                                    1050624    False     
                     2 x 1 x 512         
Linear                                    1049088    False     
LayerNorm                                 1024       False     
LayerNorm                                 1024       False     
                     2 x 1 x 2048        
Linear                                    1050624    False     
                     2 x 1 x 512         
Linear                                    1049088    False     
LayerNorm                                 1024       False     
LayerNorm                                 1024       False     
                     2 x 1 x 2048        
Linear                                    1050624    False     
                     2 x 1 x 512         
Linear                                    1049088    False     
LayerNorm                                 1024       False     
LayerNorm                                 1024       False     
                     2 x 1 x 2048        
Linear                                    1050624    False     
                     2 x 1 x 512         
Linear                                    1049088    False     
LayerNorm                                 1024       False     

Total params: 193,876,992
Total trainable params: 109,587,456
Total non-trainable params: 84,289,536

Optimizer used: <function Adam at 0x7fbd8d0189e0>
Loss function: <bound method CLIPMOCOTrainer.lf of CLIPMOCOTrainer>

  - TrainEvalCallback
  - ShortEpochCallback
  - CLIPMOCOTrainer
  - Recorder
  - ProgressCallback
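
The parameter counts in the summary above can be cross-checked with simple arithmetic. The sketch below is a sanity check, not part of the library: the helper names are hypothetical, and it assumes the patch-embedding `Conv2d` takes 3-channel input with a 32x32 kernel and no bias, which is consistent with the reported count.

```python
def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weight matrix plus bias

def layernorm_params(width):
    return 2 * width  # weight and bias

def conv2d_params(c_in, c_out, kernel):
    return c_in * c_out * kernel * kernel  # assumed bias-free

# (computed, reported in the summary)
checks = {
    "Conv2d 3->768, 32x32": (conv2d_params(3, 768, 32), 2359296),
    "Linear 768->3072": (linear_params(768, 3072), 2362368),
    "Linear 3072->768": (linear_params(3072, 768), 2360064),
    "LayerNorm 768": (layernorm_params(768), 1536),
    "Linear 512->2048": (linear_params(512, 2048), 1050624),
    "Linear 2048->512": (linear_params(2048, 512), 1049088),
    "LayerNorm 512": (layernorm_params(512), 1024),
}
for name, (computed, reported) in checks.items():
    print(f"{name:22s} computed={computed:>9d} reported={reported:>9d}")

# The Embedding layer has width 512, so its size implies the vocabulary size.
vocab_size, remainder = divmod(25296896, 512)

# Trainable and non-trainable counts should add up to the reported total.
total = 109587456 + 84289536
```

The non-trainable half corresponds to the momentum (key) encoders, whose layers are listed with `Trainable` set to `False`.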