This module combines CLIP with MoCo to increase the number of negative samples seen by the contrastive loss. This is useful when you don't have the compute to support large batch sizes, such as GPUs with large memory, or multi-GPU machines that could leverage a distributed InfoNCE loss implementation.
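The core idea is the MoCo trick: keep momentum-updated copies of the encoders and a queue of past key embeddings, so the InfoNCE loss sees K negatives regardless of the batch size. A minimal sketch of such a queue-based InfoNCE (an illustration of the idea, not this library's exact implementation):

```python
import torch
import torch.nn.functional as F

def moco_infonce(query, key, queue, temperature=0.07):
    """InfoNCE where negatives come from a memory queue instead of the batch.

    query: (B, D) embeddings from the online encoder
    key:   (B, D) embeddings of the matching pairs from the momentum encoder
    queue: (K, D) embeddings of past keys, used as extra negatives
    """
    query, key, queue = (F.normalize(t, dim=-1) for t in (query, key, queue))
    l_pos = (query * key).sum(dim=-1, keepdim=True)     # (B, 1) positive logits
    l_neg = query @ queue.t()                           # (B, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(len(query), dtype=torch.long)  # the positive is always at index 0
    return F.cross_entropy(logits, labels)

# toy usage: 8 pairs, 128-dim embeddings, a queue of 4096 negatives
loss = moco_infonce(torch.randn(8, 128), torch.randn(8, 128), torch.randn(4096, 128))
```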


class ClipTokenizer[source]
ClipTokenizer(context_length=77) ::DisplayedTransform
Tokenizer from https://github.com/openai/CLIP/blob/main/clip/simple_tokenizer.py
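A quick look at the tokenizer's attributes; the end-to-end example at the bottom of this page feeds context_length and vocab_size straight into vitb32_config:

```python
# ClipTokenizer exposes the context length and vocabulary size the text encoder expects
clip_tokenizer = ClipTokenizer(context_length=77)
print(clip_tokenizer.context_length, clip_tokenizer.vocab_size)
```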
vitb32_config[source]
vitb32_config(input_res,context_length,vocab_size)
ViT-B/32 configuration, uses 32x32 patches
vitl14_config[source]
vitl14_config(input_res,context_length,vocab_size)
ViT-L/14 configuration, uses 14x14 patches
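Both helpers return a keyword-argument dict that can be expanded directly into the CLIPMOCO constructor, mirroring the full example below (77 is CLIP's standard context length and 49,408 its BPE vocabulary size):

```python
# build a ViT-B/32 CLIP-MoCo model from the config helper
cfg = vitb32_config(input_res=224, context_length=77, vocab_size=49408)
model = CLIPMOCO(**cfg, K=4096, m=0.999)
```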
class Bottleneck[source]
Bottleneck(inplanes,planes,stride=1) ::Module
A modified ResNet bottleneck block used by ModifiedResNet below: its convolutions run with stride 1, and any downsampling is handled by an average pool (the anti-aliased striding described under ModifiedResNet).
class AttentionPool2d[source]
AttentionPool2d(spacial_dim:int,embed_dim:int,num_heads:int,output_dim:int=None) ::Module
The attention pooling head used as ModifiedResNet's final pooling layer: the spatial feature map is flattened, a learned positional embedding is added, and multi-head QKV attention produces a single pooled embedding of size output_dim.
class ModifiedResNet[source]
ModifiedResNet(layers,output_dim,heads,input_resolution=224,width=64) ::Module
A ResNet class that is similar to torchvision's but contains the following changes:
- There are now 3 "stem" convolutions as opposed to 1, with an average pool instead of a max pool.
- Performs anti-aliasing strided convolutions, where an avgpool is prepended to convolutions with stride > 1 (sketched below)
- The final pooling layer is a QKV attention instead of an average pool
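The second bullet is the blur-pool style trick: rather than letting a strided convolution alias, an average pool does the downsampling and the convolution runs with stride 1. Roughly (a sketch, not the library's exact code):

```python
import torch.nn as nn

# anti-aliased downsampling: average-pool first, then a stride-1 convolution
def downsample(in_ch: int, out_ch: int, stride: int = 2) -> nn.Sequential:
    return nn.Sequential(
        nn.AvgPool2d(stride),                                # handles the spatial reduction
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # conv keeps stride 1
    )
```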
class LayerNorm[source]
LayerNorm(normalized_shape:Union[int, List[int], Size], eps:float=1e-05, elementwise_affine:bool=True, device=None, dtype=None) ::LayerNorm
Subclass torch's LayerNorm to handle fp16.
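The usual fp16-safety pattern here (also used in the upstream CLIP code) is to run the normalization in float32 and cast the result back to the input's dtype; a minimal sketch:

```python
import torch
import torch.nn as nn

class LayerNormFP32(nn.LayerNorm):
    "Illustrative fp16-safe LayerNorm: normalize in float32, return in the input's dtype."
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        orig_dtype = x.dtype
        out = super().forward(x.type(torch.float32))
        return out.type(orig_dtype)
```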
class QuickGELU[source]
QuickGELU() ::Module
The fast sigmoid-based approximation of GELU used by CLIP's transformer MLPs (see the snippet below).
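For reference, QuickGELU in the CLIP codebase is the sigmoid approximation of GELU:

```python
import torch

def quick_gelu(x: torch.Tensor) -> torch.Tensor:
    # x * sigmoid(1.702 * x), a cheap approximation of GELU
    return x * torch.sigmoid(1.702 * x)
```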
class ResidualAttentionBlock[source]
ResidualAttentionBlock(d_model:int,n_head:int,attn_mask:Tensor=None) ::Module
A pre-LayerNorm Transformer block: multi-head self-attention followed by a 4x-wide QuickGELU MLP, each wrapped in a residual connection and preceded by its own LayerNorm (sketched below).
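This layout is also what the per-block pattern in the learner.summary() output further down reflects (two LayerNorms, an attention, and a Linear-QuickGELU-Linear MLP). A self-contained sketch, not the module's exact code:

```python
import torch
import torch.nn as nn

class PreLNBlock(nn.Module):
    "Illustrative pre-LayerNorm transformer block: x + attn(ln(x)), then x + mlp(ln(x))."
    def __init__(self, d_model: int, n_head: int):
        super().__init__()
        self.ln_1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head)
        self.ln_2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),  # e.g. 768 -> 3072 for ViT-B/32
            nn.GELU(),                        # QuickGELU in the actual module
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.ln_1(x)
        x = x + self.attn(y, y, y, need_weights=False)[0]
        return x + self.mlp(self.ln_2(x))
```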
class Transformer[source]
Transformer(width:int,layers:int,heads:int,attn_mask:Tensor=None,checkpoint=False,checkpoint_nchunks=2) ::Module
A stack of `layers` ResidualAttentionBlocks of width `width`. When checkpoint=True, the forward pass uses gradient checkpointing (controlled by checkpoint_nchunks) to trade extra compute for lower activation memory (see the sketch below).
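A sketch of that mechanism, assuming checkpoint_nchunks sets how many segments the block stack is split into (an assumption based on the argument names, not the library's exact code):

```python
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

def run_blocks(blocks: nn.Sequential, x, use_checkpoint: bool = False, nchunks: int = 2):
    "Run a stack of blocks, optionally recomputing activations in `nchunks` segments on backward."
    if use_checkpoint:
        return checkpoint_sequential(blocks, nchunks, x)
    return blocks(x)
```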
class VisualTransformer[source]
VisualTransformer(input_resolution:int,patch_size:int,width:int,layers:int,heads:int,output_dim:int, **kwargs) ::Module
CLIP's Vision Transformer image encoder: the image is split into patches by a strided convolution, a class embedding and positional embeddings are added, the token sequence is run through the Transformer above, and the class token is projected to output_dim.
class CLIPMOCO[source]
CLIPMOCO(embed_dim:int, image_resolution:int, vision_layers:Union[Tuple[int, int, int, int], int], vision_width:int, vision_patch_size:int, context_length:int, vocab_size:int, transformer_width:int, transformer_heads:int, transformer_layers:int, K=4096, m=0.999, **kwargs) ::Module
The CLIP model (a vision encoder and a text Transformer projected into a shared embedding space) combined with MoCo: momentum copies of the encoders produce key embeddings, and a queue of K past keys supplies negatives beyond the current batch; m is the momentum coefficient of the key encoders' EMA update.
| | Type | Default | Details |
|---|---|---|---|
| embed_dim | int | | |
| image_resolution | int | | vision |
| vision_layers | Union[Tuple[int, int, int, int], int] | | |
| vision_width | int | | |
| vision_patch_size | int | | |
| context_length | int | | text |
| vocab_size | int | | |
| transformer_width | int | | |
| transformer_heads | int | | |
| transformer_layers | int | | |
| K | int | 4096 | |
| m | float | 0.999 | |
| kwargs | | | |
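K and m are the MoCo-specific knobs: K is the size of the negative queue and m the momentum of the EMA update that keeps the frozen key encoders trailing the trainable ones (which is why the second half of the learner.summary() output below shows a non-trainable copy of every layer). A sketch of the momentum update, assuming the standard MoCo rule:

```python
import torch

@torch.no_grad()
def momentum_update(query_encoder, key_encoder, m: float = 0.999):
    "EMA update: key_params <- m * key_params + (1 - m) * query_params."
    for p_q, p_k in zip(query_encoder.parameters(), key_encoder.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)
```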
Retrieval@K is a useful proxy metric for tracking training performance and convergence.
class RetrievalAtK[source]
RetrievalAtK(k=20, **kwargs) ::AccumMetric
Stores predictions and targets on CPU in accumulate to perform final calculations with func.
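For paired image-text embeddings, retrieval@k asks how often the matching caption for an image appears among its top-k most similar texts. A rough illustration of the computation (not necessarily the metric's exact implementation):

```python
import torch
import torch.nn.functional as F

def retrieval_at_k(image_emb: torch.Tensor, text_emb: torch.Tensor, k: int = 20) -> torch.Tensor:
    "Fraction of images whose matching caption ranks in the top-k by cosine similarity."
    image_emb, text_emb = F.normalize(image_emb, dim=-1), F.normalize(text_emb, dim=-1)
    sims = image_emb @ text_emb.t()                      # (N, N) image-to-text similarities
    ranks = sims.argsort(dim=-1, descending=True)        # per-image ranking of all captions
    target = torch.arange(len(image_emb)).unsqueeze(1)   # caption i is the match for image i
    return (ranks[:, :k] == target).any(dim=-1).float().mean()
```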
class CLIPMOCOTrainer[source]
CLIPMOCOTrainer(after_create=None,before_fit=None,before_epoch=None,before_train=None,before_batch=None,after_pred=None,after_loss=None,before_backward=None,before_step=None,after_cancel_step=None,after_step=None,after_cancel_batch=None,after_batch=None,after_cancel_train=None,after_train=None,before_validate=None,after_cancel_validate=None,after_validate=None,after_cancel_epoch=None,after_epoch=None,after_cancel_fit=None,after_fit=None) ::Callback
MoCo loss for CLIP. Can be used with or without DistributedDataParallel.
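The smoke test below builds a tiny image-caption dataset from MNIST_TINY (digit labels mapped to their English names), tokenizes the captions with ClipTokenizer, and wires a ViT-B/32 CLIPMOCO, the CLIPMOCOTrainer callback, and several RetrievalAtK metrics into a fastai Learner.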
```python
from fastai.vision.all import *
from self_supervised.multimodal.clip_moco import *  # module path assumed from this page's location

num2txt = {'3': 'three', '7': 'seven'}
def num_to_txt(o): return num2txt[o]
def dummy_targ(o): return 0  # loss func is not called without it

path = untar_data(URLs.MNIST_TINY)
items = get_image_files(path)
clip_tokenizer = ClipTokenizer()
tds = Datasets(items, [PILImage.create, [parent_label, num_to_txt], dummy_targ],
               n_inp=2, splits=GrandparentSplitter()(items))
dls = tds.dataloaders(bs=2, after_item=[Resize(224), clip_tokenizer, ToTensor()],
                      after_batch=[IntToFloatTensor()], device='cpu')

vitb32_config_dict = vitb32_config(224, clip_tokenizer.context_length, clip_tokenizer.vocab_size)
clip_model = CLIPMOCO(K=4096, m=0.999, **vitb32_config_dict, checkpoint=False, checkpoint_nchunks=0)

learner = Learner(dls, clip_model, loss_func=noop,
                  cbs=[CLIPMOCOTrainer(), ShortEpochCallback(0.001)],
                  metrics=[RetrievalAtK(k=5),
                           RetrievalAtK(k=20),
                           RetrievalAtK(k="mean"),
                           RetrievalAtK(k="median")])
learner.summary()
```
CLIPMOCO (Input shape: 2 x torch.Size([2, 77]))
============================================================================
Layer (type) Output Shape Param # Trainable
============================================================================
2 x 768 x 7 x 7
Conv2d 2359296 True
LayerNorm 1536 True
LayerNorm 1536 True
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 True
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 True
LayerNorm 1536 True
LayerNorm 1536 True
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 True
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 True
LayerNorm 1536 True
LayerNorm 1536 True
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 True
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 True
LayerNorm 1536 True
LayerNorm 1536 True
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 True
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 True
LayerNorm 1536 True
LayerNorm 1536 True
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 True
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 True
LayerNorm 1536 True
LayerNorm 1536 True
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 True
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 True
LayerNorm 1536 True
LayerNorm 1536 True
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 True
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 True
LayerNorm 1536 True
LayerNorm 1536 True
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 True
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 True
LayerNorm 1536 True
LayerNorm 1536 True
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 True
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 True
LayerNorm 1536 True
LayerNorm 1536 True
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 True
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 True
LayerNorm 1536 True
LayerNorm 1536 True
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 True
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 True
LayerNorm 1536 True
LayerNorm 1536 True
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 True
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 True
LayerNorm 1536 True
LayerNorm 1536 True
LayerNorm 1024 True
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 True
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 True
LayerNorm 1024 True
LayerNorm 1024 True
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 True
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 True
LayerNorm 1024 True
LayerNorm 1024 True
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 True
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 True
LayerNorm 1024 True
LayerNorm 1024 True
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 True
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 True
LayerNorm 1024 True
LayerNorm 1024 True
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 True
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 True
LayerNorm 1024 True
LayerNorm 1024 True
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 True
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 True
LayerNorm 1024 True
LayerNorm 1024 True
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 True
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 True
LayerNorm 1024 True
LayerNorm 1024 True
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 True
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 True
LayerNorm 1024 True
LayerNorm 1024 True
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 True
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 True
LayerNorm 1024 True
LayerNorm 1024 True
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 True
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 True
LayerNorm 1024 True
LayerNorm 1024 True
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 True
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 True
LayerNorm 1024 True
LayerNorm 1024 True
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 True
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 True
LayerNorm 1024 True
____________________________________________________________________________
2 x 77 x 512
Embedding 25296896 True
LayerNorm 1024 True
____________________________________________________________________________
2 x 768 x 7 x 7
Conv2d 2359296 False
LayerNorm 1536 False
LayerNorm 1536 False
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 False
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 False
LayerNorm 1536 False
LayerNorm 1536 False
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 False
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 False
LayerNorm 1536 False
LayerNorm 1536 False
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 False
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 False
LayerNorm 1536 False
LayerNorm 1536 False
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 False
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 False
LayerNorm 1536 False
LayerNorm 1536 False
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 False
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 False
LayerNorm 1536 False
LayerNorm 1536 False
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 False
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 False
LayerNorm 1536 False
LayerNorm 1536 False
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 False
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 False
LayerNorm 1536 False
LayerNorm 1536 False
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 False
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 False
LayerNorm 1536 False
LayerNorm 1536 False
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 False
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 False
LayerNorm 1536 False
LayerNorm 1536 False
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 False
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 False
LayerNorm 1536 False
LayerNorm 1536 False
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 False
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 False
LayerNorm 1536 False
LayerNorm 1536 False
____________________________________________________________________________
2 x 1 x 3072
Linear 2362368 False
QuickGELU
____________________________________________________________________________
2 x 1 x 768
Linear 2360064 False
LayerNorm 1536 False
LayerNorm 1536 False
LayerNorm 1024 False
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 False
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 False
LayerNorm 1024 False
LayerNorm 1024 False
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 False
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 False
LayerNorm 1024 False
LayerNorm 1024 False
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 False
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 False
LayerNorm 1024 False
LayerNorm 1024 False
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 False
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 False
LayerNorm 1024 False
LayerNorm 1024 False
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 False
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 False
LayerNorm 1024 False
LayerNorm 1024 False
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 False
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 False
LayerNorm 1024 False
LayerNorm 1024 False
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 False
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 False
LayerNorm 1024 False
LayerNorm 1024 False
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 False
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 False
LayerNorm 1024 False
LayerNorm 1024 False
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 False
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 False
LayerNorm 1024 False
LayerNorm 1024 False
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 False
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 False
LayerNorm 1024 False
LayerNorm 1024 False
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 False
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 False
LayerNorm 1024 False
LayerNorm 1024 False
____________________________________________________________________________
2 x 1 x 2048
Linear 1050624 False
QuickGELU
____________________________________________________________________________
2 x 1 x 512
Linear 1049088 False
LayerNorm 1024 False
____________________________________________________________________________
Total params: 193,876,992
Total trainable params: 109,587,456
Total non-trainable params: 84,289,536
Optimizer used: <function Adam at 0x7fbd8d0189e0>
Loss function: <bound method CLIPMOCOTrainer.lf of CLIPMOCOTrainer>
Callbacks:
- TrainEvalCallback
- ShortEpochCallback
- CLIPMOCOTrainer
- Recorder
- ProgressCallback