By LI Haoyang 2020.10.19
Orthogonal Convolution

Content: OCNN / Kernel Orthogonality VS Orthogonal CNN / Orthogonal Convolution / Convolution as a matrix-vector multiplication / Convolutional orthogonality / Performance / Robustness under Adversarial Attack / Inspirations
Paper: https://arxiv.org/abs/1911.12207v1
Code: https://github.com/samaonline/Orthogonal-Convolutional-Neural-Networks
Jiayun Wang, Yubei Chen, Rudrasis Chakraborty, Stella X. Yu. Orthogonal Convolutional Neural Networks. CVPR 2020.
For a convolution layer $Y = \mathrm{Conv}(K, X)$, the differences between the two notions of orthogonality are:
Kernel orthogonality
It views the convolution as a multiplication between the kernel matrix $K \in \mathbb{R}^{M \times Ck^2}$ (the kernel tensor reshaped so that each row is a flattened filter) and the im2col matrix $\tilde{X} \in \mathbb{R}^{Ck^2 \times H'W'}$, i.e. $Y = K\tilde{X}$.
The orthogonality is enforced by penalizing the disparity between the Gram matrix of the kernel and the identity matrix, i.e. $\|KK^\top - I\|_F$.
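A minimal sketch of this kernel-orthogonality penalty, assuming a PyTorch 4-D kernel tensor (the function name is mine, not the paper's):

```python
import torch

def kernel_orth_penalty(K):
    # Kernel (row) orthogonality penalty ||W W^T - I||_F,
    # with W the kernel reshaped to shape (M, C*k*k).
    M = K.shape[0]
    W = K.reshape(M, -1)                  # each row is one flattened filter
    gram = W @ W.t()                      # M x M Gram matrix of the filters
    return (gram - torch.eye(M, device=K.device)).norm()
```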
Orthogonal convolution
It keeps the input and the output intact by connecting them with a doubly block-Toeplitz (DBT) matrix $\mathcal{K}$ built from the filter tensor $K$, i.e. $\mathrm{vec}(Y) = \mathcal{K}\,\mathrm{vec}(X)$, and enforces the orthogonality of $\mathcal{K}$ directly.
For a convolution layer with input tensor $X \in \mathbb{R}^{C \times H \times W}$ and kernel $K \in \mathbb{R}^{M \times C \times k \times k}$, which can be viewed as $M$ different filters, i.e. $K_i \in \mathbb{R}^{C \times k \times k}$, the convolution's output tensor is $Y = \mathrm{Conv}(K, X) \in \mathbb{R}^{M \times H' \times W'}$, where $H' \times W'$ is the output spatial size. Since convolution is linear, it can be rewritten in matrix-vector form:

$$y = \mathcal{K} x$$

In which, $x \in \mathbb{R}^{CHW}$ and $y \in \mathbb{R}^{MH'W'}$ are the flattened versions of $X$ and $Y$, and $\mathcal{K} \in \mathbb{R}^{MH'W' \times CHW}$.
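As a sanity check of this matrix-vector view, the column of $\mathcal{K}$ indexed by $(c, h, w)$ is simply the flattened response to a one-hot input, so the DBT matrix can be assembled column by column. The sketch below is illustrative only (sizes and names are my own), not how the paper builds $\mathcal{K}$:

```python
import torch
import torch.nn.functional as F

# Illustrative sizes (assumptions, not from the paper's experiments).
C, H, W = 2, 5, 5         # input channels / height / width
M, k, S, P = 3, 3, 1, 1   # filters, kernel size, stride, padding

X = torch.randn(C, H, W)
K = torch.randn(M, C, k, k)

# Ordinary convolution: Y = Conv(K, X), shape (M, H', W').
Y = F.conv2d(X.unsqueeze(0), K, stride=S, padding=P).squeeze(0)

def dbt_matrix(K, C, H, W, stride, padding):
    # Column (c, h, w) of the DBT matrix is the flattened response
    # to a one-hot input with a single 1 at channel c, position (h, w).
    cols = []
    for c in range(C):
        for h in range(H):
            for w in range(W):
                e = torch.zeros(1, C, H, W)
                e[0, c, h, w] = 1.0
                out = F.conv2d(e, K, stride=stride, padding=padding)
                cols.append(out.flatten())
    return torch.stack(cols, dim=1)       # shape (M*H'*W', C*H*W)

K_dbt = dbt_matrix(K, C, H, W, S, P)
print(torch.allclose(K_dbt @ X.flatten(), Y.flatten(), atol=1e-5))  # True
```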
Each row of $\mathcal{K}$ has only $Ck^2$ non-zero entries, corresponding to a particular filter $K_i$ placed at a particular spatial location, so $\mathcal{K}$ can be constructed as a doubly block-Toeplitz (DBT) matrix from the kernel tensor $K$.
To increase the orthogonality of the convolution, their approach is to regularize the spectrum of $\mathcal{K}$ to be uniform.
The uniform spectrum requires a row-orthogonal convolution ($\mathcal{K}\mathcal{K}^\top = I$) in the fat matrix case ($MH'W' \leq CHW$) and a column-orthogonal convolution ($\mathcal{K}^\top\mathcal{K} = I$) in the tall matrix case ($MH'W' \geq CHW$), so that $\mathcal{K}$ is a normalized frame and preserves the norm.
Row Orthogonality
Each row of $\mathcal{K}$ is a filter $K_i$ placed at an output location $(h, w)$ and then flattened, denoted as $\mathcal{K}_{ihw,\cdot}$.
The row orthogonality condition is:

$$\langle \mathcal{K}_{ih_1w_1,\cdot},\, \mathcal{K}_{jh_2w_2,\cdot} \rangle = \begin{cases} 1, & (i, h_1, w_1) = (j, h_2, w_2) \\ 0, & \text{otherwise} \end{cases}$$
For pairs of rows whose filter placements do not overlap, the flattened filters have disjoint support and the rows are naturally orthogonal, i.e. those with $|h_1 - h_2| \geq k$ or $|w_1 - w_2| \geq k$ in input coordinates.
In implementation, they can utilize the spatial symmetry and vary $(h_2, w_2)$ only, with $(h_1, w_1)$ fixed. The region where orthogonality must be checked can be realized by the original convolution with padding $P = \lfloor \frac{k-1}{S} \rfloor \cdot S$, where $S$ denotes the stride.
Then the condition is equivalent to the following self-convolution:

$$\mathrm{Conv}(K, K, \text{padding} = P, \text{stride} = S) = I_{r0}$$

In which, $I_{r0} \in \mathbb{R}^{M \times M \times (2\lfloor\frac{k-1}{S}\rfloor + 1) \times (2\lfloor\frac{k-1}{S}\rfloor + 1)}$ is a tensor with all zero entries except for the center $M \times M$ entries, which form an identity matrix.
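In code this condition becomes a convolution of the kernel with itself, which is what gets penalized during training. A sketch under the padding rule stated above (function name and details are mine, not copied from the released code):

```python
import torch
import torch.nn.functional as F

def orth_dist_row(K, stride=1):
    # Row-orthogonality penalty ||Conv(K, K, padding=P, stride=S) - I_r0||_F^2.
    M, C, k, _ = K.shape
    P = ((k - 1) // stride) * stride
    # Treat K both as a batch of M inputs (C x k x k) and as the M filters.
    Z = F.conv2d(K, K, stride=stride, padding=P)      # (M, M, n, n)
    n = Z.shape[-1]                                   # n = 2*floor((k-1)/S) + 1
    target = torch.zeros_like(Z)
    target[:, :, n // 2, n // 2] = torch.eye(M, device=K.device)
    return ((Z - target) ** 2).sum()
```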
Column Orthogonality
One can obtain a column of $\mathcal{K}$ by the following equation:

$$\mathcal{K}_{\cdot, ihw} = \mathcal{K}\, e_{ihw} = \mathrm{vec}\big(\mathrm{Conv}(K, E_{ihw})\big)$$

In which, $e_{ihw}$ is the flattened vector of an artificial input $E_{ihw} \in \mathbb{R}^{C \times H \times W}$, which has all zeros except a single 1 entry at channel $i$, position $(h, w)$.
The column orthogonality condition is:

$$\langle \mathcal{K}_{\cdot, ih_1w_1},\, \mathcal{K}_{\cdot, jh_2w_2} \rangle = \begin{cases} 1, & (i, h_1, w_1) = (j, h_2, w_2) \\ 0, & \text{otherwise} \end{cases}$$

There is also a simpler equivalent of this condition (with stride 1):

$$\mathrm{Conv}(K^\top, K^\top, \text{padding} = k - 1, \text{stride} = 1) = I_{c0}$$

In which, $K^\top \in \mathbb{R}^{C \times M \times k \times k}$ is the input-output transposed $K$, i.e. $K^\top_{c, m, :, :} = K_{m, c, :, :}$, and $I_{c0} \in \mathbb{R}^{C \times C \times (2k-1) \times (2k-1)}$ has all zeros except for the center $C \times C$ entries, which form an identity matrix.
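The same self-convolution trick applies to the column condition by swapping the two channel axes first; a minimal sketch for stride 1 (again, my own naming):

```python
import torch
import torch.nn.functional as F

def orth_dist_col(K):
    # Column-orthogonality penalty ||Conv(K^T, K^T, padding=k-1) - I_c0||_F^2,
    # with K^T the input-output transposed kernel (channel axes swapped).
    M, C, k, _ = K.shape
    Kt = K.transpose(0, 1)                        # (C, M, k, k)
    Z = F.conv2d(Kt, Kt, padding=k - 1)           # (C, C, 2k-1, 2k-1)
    target = torch.zeros_like(Z)
    target[:, :, k - 1, k - 1] = torch.eye(C, device=K.device)
    return ((Z - target) ** 2).sum()
```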
The kernel row and column orthogonality is a special case of the conditions above where only the center (zero-shift) entries are checked, i.e. the self-convolution with padding 0 (the two coincide when the stride satisfies $S \geq k$, since then $P = 0$):

$$\mathrm{Conv}(K, K, \text{padding} = 0) = I_r, \qquad \mathrm{Conv}(K^\top, K^\top, \text{padding} = 0) = I_c$$

In which, $I_r \in \mathbb{R}^{M \times M}$ and $I_c \in \mathbb{R}^{C \times C}$ are identity matrices.
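A quick numerical check of this reduction (illustrative sizes, my own snippet): with padding 0 the self-convolution only compares zero-shift placements and equals the kernel Gram matrix.

```python
import torch
import torch.nn.functional as F

M, C, k = 4, 3, 3
K = torch.randn(M, C, k, k)
Z = F.conv2d(K, K, padding=0).squeeze(-1).squeeze(-1)  # (M, M) self-convolution
G = K.reshape(M, -1) @ K.reshape(M, -1).t()            # kernel Gram matrix
print(torch.allclose(Z, G, atol=1e-5))                 # True
```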
Kernel orthogonality is therefore a necessary but not sufficient condition for orthogonal convolution.
The paper gives the following lemma:

Lemma 1. The row orthogonality and the column orthogonality are equivalent in the MSE sense, i.e. $\|\mathcal{K}\mathcal{K}^\top - I\|_F^2 = \|\mathcal{K}^\top\mathcal{K} - I\|_F^2 + c$, where $c$ is a constant.

It therefore suffices to enforce the row orthogonality, and the final loss with regularization is:

$$\min\; L + \lambda\, L_{\mathrm{orth}}, \qquad L_{\mathrm{orth}} = \big\|\mathrm{Conv}(K, K, \text{padding} = P, \text{stride} = S) - I_{r0}\big\|_F^2$$

where $L$ is the task loss, $\lambda$ is the regularization weight, and $L_{\mathrm{orth}}$ is summed over all convolution layers.
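A sketch of how this objective could be assembled during training, reusing the orth_dist_row sketch from above and summing the penalty over all convolution layers (lam = 0.1 is a placeholder value, not the paper's setting):

```python
import torch.nn as nn
import torch.nn.functional as F

def ocnn_loss(model, logits, labels, lam=0.1):
    # L_task + lambda * L_orth, with L_orth summed over every Conv2d layer.
    task_loss = F.cross_entropy(logits, labels)
    orth_loss = sum(
        orth_dist_row(m.weight, stride=m.stride[0])   # sketch defined earlier
        for m in model.modules() if isinstance(m, nn.Conv2d)
    )
    return task_loss + lam * orth_loss
```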
The uniform spectrum of $\mathcal{K}$ makes each convolution layer approximately a 1-Lipschitz function. Given a small perturbation $\Delta x$ of the input $x$, the change of the output is bounded: $\|\mathcal{K}(x + \Delta x) - \mathcal{K}x\| = \|\mathcal{K}\Delta x\| \leq \sigma_{\max}(\mathcal{K})\,\|\Delta x\| \approx \|\Delta x\|$.
It sounds very natural that a more orthogonal set of kernels captures a more diverse set of features, and a more diverse set of features helps model performance. From another perspective, orthogonal kernels have bounded gain, so the gradients propagated during training are more stable, which yields smoother and more stable training.
However, by enforcing strong orthogonality, the transformation performed by the linear part of each layer is restricted to a norm-preserving map (essentially a rotation of the space), which seems a little too restrictive for the model to work at its best.
Since the Lipschitz bound of each layer is lowered, it is reasonable for the network to be more robust to adversarial attacks, but the reported resistance does not seem competitive with adversarial training.