Hypernetworks can generate INRs & model functional similarity
TLDR
Hypernetworks can be interpreted as
- generators of implicit neural representations (INRs), and
- quantifiers of functional similarity between models.
I've worked on hypernetworks (neural networks that parameterize other neural networks) for a while now, and this work has led to a workshop paper at ICLR 2025 and a main conference paper at EMNLP 2025. While working on these papers, I've had time to think about hypernetworks in a couple of fascinating ways, which I describe below.
Hypernetworks as generators of INRs
Implicit neural representations (INRs) are a clever and rather under-appreciated class of representations. Contrary to the popular approach of predicting representations, INRs are representations themselves: given a function $f: x \mapsto y$, if we train a neural network $g_\theta$ to predict $y$ from $x$, the parameters $\theta$ of this network implicitly represent the function $f$.
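To make this concrete, here is a minimal sketch, assuming NumPy and a one-hidden-layer tanh MLP as the INR. The names `fit_inr` and `inr_forward`, the width, the learning rate and the target $f = \sin$ on $[-\pi, \pi]$ are all illustrative choices, not taken from any particular paper. After training, the weights $\theta$ are the representation of $f$.

```python
import numpy as np

def inr_forward(params, x):
    """Evaluate a one-hidden-layer tanh MLP g_theta at points x of shape (N, 1)."""
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)          # (N, H) hidden activations
    return h @ W2 + b2, h             # (N, 1) predictions, plus h for backprop

def fit_inr(f, hidden=32, steps=2000, lr=0.01, n_points=64, seed=0):
    """Fit an INR to the 1-D function f on [-pi, pi] by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-np.pi, np.pi, n_points).reshape(-1, 1)
    y = f(x)
    params = [rng.normal(size=(1, hidden)), np.zeros(hidden),
              rng.normal(size=(hidden, 1)) / np.sqrt(hidden), np.zeros(1)]
    losses = []
    for _ in range(steps):
        W1, b1, W2, b2 = params
        y_hat, h = inr_forward(params, x)
        err = y_hat - y                                  # (N, 1)
        losses.append(float(np.mean(err ** 2)))
        # Manual backprop of the mean squared error
        g_out = 2 * err / len(x)                         # dL/dy_hat
        g_W2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (1 - h ** 2)              # through tanh
        g_W1 = x.T @ g_h
        g_b1 = g_h.sum(axis=0)
        params = [W1 - lr * g_W1, b1 - lr * g_b1, W2 - lr * g_W2, b2 - lr * g_b2]
    return params, losses

theta, losses = fit_inr(np.sin)
# theta (97 numbers in total) is now an implicit representation of sin on [-pi, pi]
n_params = sum(p.size for p in theta)
```

Nothing about $f$ is stored explicitly: querying `inr_forward(theta, x)` at any $x$ in the interval recovers an approximation of $\sin(x)$.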
Hence, when learning several functions $f_1, \dots, f_K$, we can choose to predict the weights (or INRs) $\theta_k$ for each $f_k$ using a generating function $h_\phi$ as $\theta_k = h_\phi(k)$, such that $g_{\theta_k} \approx f_k$. From the deep learning perspective, this generating function is called a hypernetwork.
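The hypernetwork side can be sketched in the same style. This is a hypothetical setup: each function $f_k$ is assumed to be identified by a code vector $z_k$ (standing in for the index $k$), and $h_\phi$ is a small MLP whose output vector is split and reshaped into the INR's weights. The hypernetwork is left untrained here; in practice $\phi$ would be fitted by backpropagating a reconstruction loss such as $\sum_k \lVert g_{h_\phi(z_k)}(x) - f_k(x) \rVert^2$ through both networks.

```python
import numpy as np

HIDDEN = 16
# Parameter shapes of the target INR g_theta: R -> R (one hidden tanh layer)
SHAPES = [(1, HIDDEN), (HIDDEN,), (HIDDEN, 1), (1,)]
N_THETA = sum(int(np.prod(s)) for s in SHAPES)

def unflatten(theta_flat):
    """Split a flat weight vector into the INR's weight matrices and biases."""
    params, i = [], 0
    for s in SHAPES:
        n = int(np.prod(s))
        params.append(theta_flat[i:i + n].reshape(s))
        i += n
    return params

def hypernetwork(phi, z):
    """Hypothetical h_phi: map a function code z (shape (k,)) to flat INR weights theta."""
    A, a, B, b = phi
    return np.tanh(z @ A + a) @ B + b                    # (N_THETA,)

def inr(theta_flat, x):
    """Evaluate g_theta at points x of shape (N, 1) using generated weights."""
    W1, b1, W2, b2 = unflatten(theta_flat)
    return np.tanh(x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
k, hyper_hidden = 4, 32
phi = [rng.normal(size=(k, hyper_hidden)) / np.sqrt(k), np.zeros(hyper_hidden),
       rng.normal(size=(hyper_hidden, N_THETA)) / np.sqrt(hyper_hidden), np.zeros(N_THETA)]

x = np.linspace(-1, 1, 5).reshape(-1, 1)
z1, z2 = rng.normal(size=k), rng.normal(size=k)          # codes for two functions f_1, f_2
y1 = inr(hypernetwork(phi, z1), x)                       # samples of g_{theta_1}
y2 = inr(hypernetwork(phi, z2), x)                       # samples of g_{theta_2}
```

A single set of hypernetwork parameters $\phi$ thus yields a different INR for every code, which is what lets one model stand in for a whole family of functions.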
This ability to conditionally generate INRs is what makes hypernetworks well suited to physics-informed machine learning. In particular, the process $t \mapsto u_t$, which maps a timestep $t$ to a spatial function $u_t(x) = u(x, t)$, is just another formulation of (time-dependent) partial differential equations (PDEs), which describe time-varying functions of space in differential form.
Hypernetworks can predict such spatial functions given a timestep, $\theta_t = h_\phi(t)$ such that $g_{\theta_t}(x) \approx u(x, t)$, since they can conditionally generate an INR corresponding to the function at timestep $t$; this means they can be used to forecast PDEs. Indeed, they've been used to do exactly this in a very interesting NeurIPS paper that I've presented in a tutorial here.
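As a toy illustration of forecasting, consider 1-D advection with solution $u(x, t) = \sin(x - t)$ (an example chosen for this sketch, not taken from the paper above). Assuming an INR that is linear in fixed Fourier features of $x$ and a linear hypernetwork over fixed features of $t$, $h_\phi$ can be fitted by least squares on $t \in [0, 1]$ and then queried at an unseen timestep. Real approaches use nonlinear INRs and hypernetworks trained by gradient descent; this only shows the data flow $t \to \theta_t \to g_{\theta_t}(x)$.

```python
import numpy as np

def phi_x(x):
    """Fixed spatial features of the (linear) INR: g_theta(x) = theta . phi_x(x)."""
    return np.stack([np.sin(x), np.cos(x), np.ones_like(x)], axis=-1)   # (..., 3)

def psi_t(t):
    """Fixed temporal features fed to the hypernetwork."""
    return np.stack([np.cos(t), np.sin(t), np.ones_like(t)], axis=-1)   # (..., 3)

def hypernetwork(H, t):
    """Linear hypernetwork: theta_t = H psi(t), the INR weights at timestep t."""
    return H @ psi_t(t)                                                 # (3,)

def fit_hypernetwork(x, t, u):
    """Least-squares fit of H so that g_{theta_t}(x) ~ u(x, t) on training samples."""
    # u ~ phi_x(x)^T H psi_t(t), which is linear in the entries of H (row-major flatten)
    design = np.einsum("ni,nj->nij", phi_x(x), psi_t(t)).reshape(len(x), -1)
    vec_H, *_ = np.linalg.lstsq(design, u, rcond=None)
    return vec_H.reshape(3, 3)

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 2 * np.pi, 200)
t_train = rng.uniform(0.0, 1.0, 200)              # training window t in [0, 1]
u_train = np.sin(x_train - t_train)               # advection solution u(x, t) = sin(x - t)
H = fit_hypernetwork(x_train, t_train, u_train)

# Forecast: generate the INR for an unseen timestep and query it on a spatial grid
x_grid = np.linspace(0, 2 * np.pi, 50)
theta_future = hypernetwork(H, 1.5)
u_pred = phi_x(x_grid) @ theta_future
max_err = np.max(np.abs(u_pred - np.sin(x_grid - 1.5)))
```

The forecast is accurate here only because $\sin(x - t) = \sin x \cos t - \cos x \sin t$ lies exactly in the span of the chosen features; without such a convenient basis, extrapolating in $t$ is much harder.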
Hypernetworks as quantifiers of functional similarity
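Let's say that we wish to know the functional similarity between two encoders $f_A: \mathbb{R}^{m} \to \mathbb{R}^{d}$ and $f_B: \mathbb{R}^{m} \to \mathbb{R}^{d}$. A common approach is to extract features from $N$ inputs for each encoder and use a similarity function like Centered Kernel Alignment (CKA) on these features: $s = \mathrm{CKA}(F_A, F_B)$, where $s$ is a score denoting the functional similarity between the two models, and $F_A, F_B \in \mathbb{R}^{N \times d}$ are the stacks of encoder outputs for the $N$ inputs.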
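For reference, linear CKA can be computed in a few lines. This sketch assumes NumPy, linear (rather than kernel) CKA, and toy linear encoders applied to random inputs; `linear_cka` is a hypothetical helper name.

```python
import numpy as np

def linear_cka(F_A, F_B):
    """Linear CKA between feature stacks F_A (N, d_A) and F_B (N, d_B)."""
    F_A = F_A - F_A.mean(axis=0, keepdims=True)   # center each feature column
    F_B = F_B - F_B.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(F_B.T @ F_A, "fro") ** 2
    return cross / (np.linalg.norm(F_A.T @ F_A, "fro") * np.linalg.norm(F_B.T @ F_B, "fro"))

rng = np.random.default_rng(0)
N, m, d = 500, 10, 8
X = rng.normal(size=(N, m))                              # N shared inputs
F_A = X @ rng.normal(size=(m, d))                        # features of a toy linear encoder f_A
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))             # random orthogonal matrix
s_same = linear_cka(F_A, 3.0 * F_A @ Q)                  # rotated and rescaled copy of f_A
s_other = linear_cka(F_A, X @ rng.normal(size=(m, d)))   # an unrelated linear encoder
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, which is why the rotated and rescaled copy of $f_A$ scores 1 while the unrelated encoder scores lower.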
However, let's consider another way to quantify the functional similarity between $f_A$ and $f_B$: using a hypernetwork $h_\phi$ that predicts the weights $W \in \mathbb{R}^{C \times d}$ of a linear classifier on top of the two encoders. Here, $C$ is the number of classes.
Writing $g_A = W \circ f_A$ and $g_B = W \circ f_B$ for the composite functions that classify the inputs, we outline the scheme that lets the hypernetwork reveal the functional similarity between $f_A$ and $f_B$ as follows:
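- Given a dataset $\mathcal{D}$ and the current state of the hypernetwork's parameters $\phi$, predict the classifier weights $W$ with $h_\phi$.
- Obtain the classification losses of $g_A$ and $g_B$ on $\mathcal{D}$ as $\mathcal{L}_A$ and $\mathcal{L}_B$.
- Train the hypernetwork's parameters only on encoder $f_A$, e.g. with a gradient step $\phi' = \phi - \eta \nabla_\phi \mathcal{L}_A$.
- Predict $W'$ with $h_{\phi'}$ and obtain the classification losses of $g'_A = W' \circ f_A$ and $g'_B = W' \circ f_B$ as $\mathcal{L}'_A$ and $\mathcal{L}'_B$.
- Then, the magnitude of $\Delta \mathcal{L}_B = \mathcal{L}_B - \mathcal{L}'_B$ indicates the functional similarity between encoders $f_A$ and $f_B$.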
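The scheme above can be sketched end to end in NumPy under a few assumptions of my own: the encoders are fixed and represented only by their output features, the hypernetwork is a linear map $\Phi$ from a fixed random code to the flattened $W$ (a stand-in for whatever input the real $h_\phi$ receives), $\Phi$ starts at zero so that $W = 0$, the loss is softmax cross-entropy, and "training on $f_A$" means a few gradient steps. All function names are hypothetical.

```python
import numpy as np

def softmax_xent(logits, y):
    """Mean softmax cross-entropy and its gradient w.r.t. the logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n = len(y)
    loss = -np.mean(np.log(p[np.arange(n), y]))
    grad = p.copy()
    grad[np.arange(n), y] -= 1.0
    return loss, grad / n

def predict_classifier(Phi, code, C, d):
    """Hypothetical linear hypernetwork: W = reshape(Phi @ code), shape (C, d)."""
    return (Phi @ code).reshape(C, d)

def loss_and_grad(Phi, code, feats, y, C, d):
    """Classification loss of W composed with an encoder, and its gradient w.r.t. Phi."""
    W = predict_classifier(Phi, code, C, d)
    loss, g_logits = softmax_xent(feats @ W.T, y)        # logits f(x) W^T, shape (N, C)
    g_W = g_logits.T @ feats                             # (C, d)
    return loss, np.outer(g_W.reshape(-1), code)         # chain rule through W = Phi @ code

def hyper_similarity(F_A, F_B, y, C, steps=1, lr=0.1, seed=0):
    """Return (Delta L_A, Delta L_B) after training the hypernetwork on encoder A only."""
    d = F_A.shape[1]
    rng = np.random.default_rng(seed)
    code = rng.normal(size=8)
    code /= np.linalg.norm(code)
    Phi = np.zeros((C * d, code.size))                   # W = 0 at the start
    L_A, _ = loss_and_grad(Phi, code, F_A, y, C, d)
    L_B, _ = loss_and_grad(Phi, code, F_B, y, C, d)
    for _ in range(steps):                                # update Phi using encoder A only
        _, g = loss_and_grad(Phi, code, F_A, y, C, d)
        Phi -= lr * g
    L_A_new, _ = loss_and_grad(Phi, code, F_A, y, C, d)
    L_B_new, _ = loss_and_grad(Phi, code, F_B, y, C, d)
    return L_A - L_A_new, L_B - L_B_new

rng = np.random.default_rng(1)
N, m, d, C = 400, 10, 6, 3
X = rng.normal(size=(N, m))
F_A = X @ (rng.normal(size=(m, d)) / np.sqrt(m))         # features of a toy encoder f_A
y = np.argmax(F_A @ rng.normal(size=(d, C)), axis=1)     # labels learnable from f_A
drop_A, drop_same = hyper_similarity(F_A, F_A.copy(), y, C)   # f_B identical to f_A
_, drop_neg = hyper_similarity(F_A, -F_A, y, C)               # f_B = -f_A
_, drop_other = hyper_similarity(F_A, X @ rng.normal(size=(m, d)) / np.sqrt(m), y, C)
```

In this toy setting the identical encoder reproduces A's loss drop exactly, while the sign-flipped encoder's loss rises, so the sign of $\Delta \mathcal{L}_B$ is worth reading alongside its magnitude.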
In other words, if training the hypernetwork using only network A lowers the loss of network B as well, then networks A and B can be called functionally similar.
This is because the hypernetwork is able to parameterise both of them well from a common state reached using only network A.
The above interpretations of hypernetworks are avenues that I think are rather under-utilised in the current literature, and they can offer creative and potentially powerful ways of modulating neural networks. I'd be happy to know what you think of the above, or about parameter prediction in general. Hit me up for a chat on X/Twitter if you like.