Jaisidh Singh

Hypernetworks can generate INRs & model functional similarity

TLDR

Hypernetworks can be interpreted as (1) conditional generators of implicit neural representations (INRs), and (2) quantifiers of functional similarity between networks.
I've worked on hypernetworks (neural networks that parameterize other neural networks) for a little while, which has led to a workshop paper at ICLR 2025 and a main conference paper at EMNLP 2025. While working on these papers, I've had the time to think about hypernetworks in a couple of fascinating ways, described below.

Hypernetworks as generators of INRs

Implicit neural representations (INRs) are a clever and rather under-appreciated class of representations. In contrast to the popular approach of predicting representations, INRs are representations themselves: given f(x)=y, if we train a neural network gθ to predict y from x, the parameters θ of this network implicitly represent the function f(·).

Hence, when learning several functions f1, …, fn, we can instead predict the weights (i.e., the INRs) for each fi using a generating function H(·), with H(i)=θi such that gθi(x)=fi(x). From the deep learning perspective, this generating function H(·) is called a hypernetwork.
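To make this concrete, here's a minimal PyTorch sketch of a hypernetwork that maps a function index i to the parameters θi of a tiny INR. All layer sizes and the one-hot conditioning are illustrative choices, not from any particular paper:

```python
import torch
import torch.nn as nn

class Hypernetwork(nn.Module):
    """Maps a conditioning code (here, a one-hot function index i)
    to the parameters theta_i of a small INR g_theta."""
    def __init__(self, cond_dim: int, inr_hidden: int = 32):
        super().__init__()
        # Shapes of the target INR g_theta: R^1 -> R^1, one hidden layer.
        self.shapes = [
            ("w1", (inr_hidden, 1)), ("b1", (inr_hidden,)),
            ("w2", (1, inr_hidden)), ("b2", (1,)),
        ]
        n_params = sum(torch.Size(s).numel() for _, s in self.shapes)
        self.net = nn.Sequential(
            nn.Linear(cond_dim, 64), nn.ReLU(), nn.Linear(64, n_params)
        )

    def forward(self, cond: torch.Tensor) -> dict:
        # Predict one flat parameter vector, then carve it into tensors.
        flat = self.net(cond)
        params, offset = {}, 0
        for name, shape in self.shapes:
            n = torch.Size(shape).numel()
            params[name] = flat[offset:offset + n].view(shape)
            offset += n
        return params

def inr_forward(params: dict, x: torch.Tensor) -> torch.Tensor:
    """Run the generated INR g_theta_i on inputs x of shape [N, 1]."""
    h = torch.relu(x @ params["w1"].T + params["b1"])
    return h @ params["w2"].T + params["b2"]

# Generate the INR for "function index" i = 1, encoded as a one-hot vector.
H = Hypernetwork(cond_dim=4)
theta_i = H(torch.eye(4)[0])                 # parameters theta_1
y = inr_forward(theta_i, torch.randn(8, 1))
print(y.shape)                               # torch.Size([8, 1])
```

Note that gradients flow through the predicted parameters back into H, so training the INR's outputs trains the hypernetwork.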

This ability to conditionally generate INRs is what makes hypernetworks strongly applicable in physics-informed machine learning. In particular, we can see how the process is just another formulation of partial differential equations (PDEs), which denote time-varying functions of space νt(x) in differential form.

Hypernetworks can very well predict spatial functions given a timestep: H(t)=θt such that gθt(x)=νt(x). Since they conditionally generate an INR corresponding to the function at timestep t, they can be used to forecast PDEs. Indeed, they've been used to do so in a very interesting NeurIPS paper that I've presented in a tutorial here.
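A minimal PyTorch sketch of this time-conditioned setup, where H takes a scalar t and emits the flattened weights of a tiny spatial INR (layer sizes and the tanh activation are arbitrary choices):

```python
import torch
import torch.nn as nn

# Hypernetwork H: scalar timestep t -> flattened weights of a tiny
# INR g_theta_t: R^1 -> R^1 with one hidden layer of width `hidden`.
hidden = 16
n_params = (hidden * 1 + hidden) + (1 * hidden + 1)   # w1, b1, w2, b2
H = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, n_params))

def nu_t(t: float, x: torch.Tensor) -> torch.Tensor:
    """Evaluate the generated spatial function nu_t at points x: [N, 1]."""
    p = H(torch.tensor([[t]])).squeeze(0)
    w1 = p[:hidden].view(hidden, 1)
    b1 = p[hidden:2 * hidden]
    w2 = p[2 * hidden:3 * hidden].view(1, hidden)
    b2 = p[3 * hidden:]
    h = torch.tanh(x @ w1.T + b1)
    return h @ w2.T + b2

x = torch.linspace(0, 1, 10).unsqueeze(1)   # spatial grid
out = nu_t(0.5, x)                          # the field at t = 0.5
print(out.shape)                            # torch.Size([10, 1])
```

Training would fit nu_t(t, x) to observed (or PDE-constrained) field values over many (t, x) pairs; at forecast time, you simply query H at unseen timesteps.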

Hypernetworks as quantifiers of functional similarity

Let's say that we wish to know the functional similarity between two encoders fA, fB: ℝ^m → ℝ^d. The most straightforward way to do this would be to collect a stack of encodings for N inputs from each encoder and use a similarity function like Centered Kernel Alignment (CKA) on these features: s1 = CKA(OA, OB), where OA, OB ∈ ℝ^{N×d} are the stacked encodings for the N inputs and s1 ∈ [0,1] is a score denoting the functional similarity between the two models.
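For reference, here is a minimal NumPy sketch of linear CKA on two stacks of encodings (the dimensions and random data are purely illustrative):

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between feature stacks X: [N, d_A] and Y: [N, d_B].
    Returns a similarity score in [0, 1]."""
    # Center each feature dimension over the N samples.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # HSIC-based formulation for the linear kernel.
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(num / den)

rng = np.random.default_rng(0)
O_A = rng.normal(size=(100, 16))            # encodings from f_A
O_B = O_A @ rng.normal(size=(16, 16))       # a linear map of O_A
O_C = rng.normal(size=(100, 16))            # unrelated encodings

print(linear_cka(O_A, O_A))                 # identical features -> 1
print(linear_cka(O_A, O_B), linear_cka(O_A, O_C))
```

Identical features score 1, and unrelated random features score close to 0, which is what makes CKA a convenient baseline for s1.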

However, let's think of another way to quantify the functional similarity between fA and fB: specifically, by using a hypernetwork H that predicts linear classifiers WA, WB ∈ ℝ^{d×k} on top of the two encoders fA, fB. Here, k is the number of classes.

Writing the composite function gj(·)=Wj fj(·), j∈{A,B} to classify the inputs, the scheme that lets the hypernetwork depict functional similarity between fA and fB is: train H using only the classification loss of gA, while tracking the loss of gB under the classifier weights H predicts for it.

In other words, if training the hypernetwork using only network A lowers the loss of network B as well, then networks A and B can be called functionally similar.

This is because the hypernetwork is able to parameterise both of them well from a common state achieved using only network A.
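Here's a toy PyTorch sketch of this idea, not the exact scheme from either paper: fB is constructed as a rotation of fA, so the two encoders are functionally similar by design; the hypernetwork (a shared MLP over per-encoder embeddings, an assumption of mine) is trained using only network A's loss, while network B's loss is tracked as the similarity probe:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, k, m, N = 8, 3, 5, 64

# Two frozen "encoders"; f_B is an orthogonal rotation of f_A, so they
# are functionally equivalent up to a linear map (a toy construction).
f_A, f_B = nn.Linear(m, d), nn.Linear(m, d)
with torch.no_grad():
    Q, _ = torch.linalg.qr(torch.randn(d, d))
    f_B.weight.copy_(Q @ f_A.weight)
    f_B.bias.copy_(Q @ f_A.bias)
f_A.requires_grad_(False)
f_B.requires_grad_(False)

# Hypernetwork H: a learned embedding per encoder -> flat classifier W_j.
emb = nn.Embedding(2, 16)
H = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, d * k))

def loss_for(j: int, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Cross-entropy of g_j(x) = W_j f_j(x) with the predicted W_j."""
    W = H(emb(torch.tensor(j))).view(k, d)
    logits = (f_A if j == 0 else f_B)(x) @ W.T
    return nn.functional.cross_entropy(logits, y)

x = torch.randn(N, m)
y = torch.randint(0, k, (N,))
opt = torch.optim.Adam(list(H.parameters()) + list(emb.parameters()), lr=1e-2)

before_A, before_B = loss_for(0, x, y).item(), loss_for(1, x, y).item()
for _ in range(200):                  # train on network A's loss ONLY
    opt.zero_grad()
    loss_for(0, x, y).backward()
    opt.step()
after_A, after_B = loss_for(0, x, y).item(), loss_for(1, x, y).item()
print(f"A: {before_A:.3f} -> {after_A:.3f}, B: {before_B:.3f} -> {after_B:.3f}")
```

The drop (or lack of drop) in B's loss is the similarity signal: H was never shown B's gradients, so any improvement on B must come from the shared hypernetwork state shaped by A alone.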


The above interpretations of hypernetworks are avenues that I think are rather under-utilised in the current literature, and can offer creative and potentially powerful ways of modulating neural networks. I'd be happy to know what you think of the above, or about parameter prediction in general. Hit me up for a chat on X/Twitter if you like.