WeeNet Manifesto
Models of the world, unite; you have nothing to lose but your priors!
This post serves as a manifesto for the term “WeeNet,” which I’ve been using for the past few years. Conceptually it’s nothing new (either in machine learning or general software development), but I believe that the term is a useful one to describe a particular way of thinking about machine learning (ML) systems.
What is a WeeNet?
A WeeNet is a simple DNN model that exhibits the minimum set of behaviours you are trying to explore in your ML system. It could be a single layer or many. The point is that, when developing and debugging, it's often easier to work with a small, understandable model than a large, complex one.
Below is an example of a one-layer WeeNet in PyTorch:
import torch.nn as nn

class WeeNet(nn.Module):
    """A single-convolution WeeNet: the smallest model worth debugging."""

    def __init__(self, in_c, num_filters, kdim, stride=1, padding=0, groups=1):
        super(WeeNet, self).__init__()
        # One convolution, no activation: the only behaviour under test.
        self.layer1 = nn.Conv2d(
            in_c,
            num_filters,
            kernel_size=kdim,
            stride=stride,
            padding=padding,
            groups=groups,
            bias=False,
        )

    def forward(self, x):
        out = self.layer1(x)
        return out
Why use a WeeNet?
As you can see above, we have a single convolutional layer, with no activation function.
However, it has parameters that you can tune (in_c, num_filters, kdim, etc.), which might help you test how your system handles a single kernel, perhaps varying the stride, padding, or number of groups.
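To make that concrete, here is a minimal sketch of such a sweep; the shapes and parameter values are arbitrary, chosen purely for illustration:

import torch

# Hypothetical sweep: a few stride/padding/groups combinations on random data.
for stride, padding, groups in [(1, 0, 1), (2, 1, 1), (1, 1, 4)]:
    model = WeeNet(in_c=8, num_filters=16, kdim=3,
                   stride=stride, padding=padding, groups=groups)
    x = torch.randn(1, 8, 32, 32)  # a batch of one 8-channel 32x32 input
    y = model(x)
    print(stride, padding, groups, tuple(y.shape))  # check the output shape by hand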
In comparison, a full model might show many errors that are just variations of the same bug. Have fun wading through those logs!
I’ve used this technique a lot when working on optimised kernels and DNN compiler flows (for example, DNN64, a compiler for the Nintendo 64). You’d be surprised how many times the problem is an off-by-one error in a for loop!
WeeNets are fast to run, load, and export, which helps to keep your development loop tight. They are easy to extend, so you can add more complexity as you need it. And finally, they are simple to understand, so you can more easily reason about what is going wrong.
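As a rough sketch of the export step (assuming an ONNX-consuming toolchain is what you’re feeding; the sizes and file name are just placeholders):

import torch

model = WeeNet(in_c=3, num_filters=8, kdim=3, padding=1)
dummy = torch.randn(1, 3, 64, 64)
# Export to ONNX so the compiler or runtime under test can consume it.
torch.onnx.export(model, dummy, "weenet_conv.onnx")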
Ultimately the only thing that matters is how well you run real models, but in my experience, WeeNets can help you get there faster.
Everything should be made as simple as possible, but not simpler. – Albert Einstein (maybe?)
How to use WeeNets
The idea is to use a WeeNet as a test case for your system, especially during development. A WeeNet is not a specific architecture, but rather an approach for building and testing your system.
When building a DNN compiler, for example, you might start with a one-layer model, then a two-layer model, and so on.
Depending on the system you’re building, getting a single layer to work is by itself no mean feat, and can often expose many bugs and spurious assumptions.
Next, adding a second layer can expose more, such as how data is passed between layers, or how you handle different shapes of tensors. As more complex structures start to come in (like skip connections, handling different data types, limited memory, etc.), you can start to see how your system will handle these cases.
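To illustrate, here is a hypothetical two-layer WeeNet; the layer sizes and the ReLU in between are my own choices rather than a prescribed architecture, and the interesting new behaviour is simply passing the first layer's output into the second:

import torch.nn as nn

class TwoLayerWeeNet(nn.Module):
    """A two-layer WeeNet: the first point at which inter-layer data flow matters."""

    def __init__(self, in_c, mid_c, out_c, kdim=3, padding=1):
        super(TwoLayerWeeNet, self).__init__()
        self.layer1 = nn.Conv2d(in_c, mid_c, kernel_size=kdim, padding=padding, bias=False)
        self.act = nn.ReLU()
        self.layer2 = nn.Conv2d(mid_c, out_c, kernel_size=kdim, padding=padding, bias=False)

    def forward(self, x):
        # The output of layer1 must be handed to layer2 with the right shape and layout.
        return self.layer2(self.act(self.layer1(x)))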
Once you’re satisfied that a decent chunk of your system is working, you can move on to running real models (i.e., known workloads such as ResNet, BERT, etc.). If one of these models fails, you can build new WeeNets to isolate the problem: for example, does a WeeNet of the model’s core block still work?
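As a sketch of that idea, a WeeNet of a ResNet-style basic block (channel counts and kernel size arbitrary) isolates the skip connection on its own:

import torch.nn as nn

class ResidualWeeNet(nn.Module):
    """A WeeNet of a ResNet-style basic block: two convs plus a skip connection."""

    def __init__(self, channels, kdim=3, padding=1):
        super(ResidualWeeNet, self).__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=kdim, padding=padding, bias=False)
        self.act = nn.ReLU()
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=kdim, padding=padding, bias=False)

    def forward(self, x):
        out = self.conv2(self.act(self.conv1(x)))
        return self.act(out + x)  # the skip connection is the behaviour under test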
Aren’t these essentially just unit and integration tests?
Yes.