How to set the padding of a convolution so that the stride acts as a scale factor

[Cover image]

Cover image source: A guide to convolution arithmetic for deep learning - Vincent Dumoulin, Francesco Visin - ArXiv

During a convolution, the kernel is shifted across the input image. Let's consider the case where the kernel moves from left to right. The top-to-bottom case is exactly the same.

Constraints

Let's denote the padding size as $p$, the image width as $w$, the kernel size as $k$, and the stride as $s$. The total length of the image width plus the padding on both sides is then:

$$2p + w$$

At the beginning, the kernel is placed at the left end of this total length and is then moved to the right in steps of $s$ until no more space is available. Denote the number of placeable positions by $n$; the output width of the convolution is then also $n$. It must satisfy:

$$2p + w - s < k + (n-1) s \leq 2p + w \tag{a}$$
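Constraint (a) can be checked by literally sliding the kernel along the padded row. The following is a minimal sketch, not code from the original article; the helper names `count_positions` and `satisfies_a` are hypothetical, and it assumes the kernel fits at least once ($k \leq 2p + w$).

```python
def count_positions(w: int, k: int, s: int, p: int) -> int:
    """Slide a kernel of size k over a padded row and count its placements."""
    total = 2 * p + w           # image width plus padding on both sides
    n = 0
    left = 0                    # left edge of the kernel, starting flush left
    while left + k <= total:    # the kernel must fit entirely inside the row
        n += 1
        left += s               # move right by one stride
    return n


def satisfies_a(w: int, k: int, s: int, p: int, n: int) -> bool:
    """Inequality (a): 2p + w - s < k + (n - 1)s <= 2p + w."""
    reach = k + (n - 1) * s     # right edge of the last kernel placement
    return 2 * p + w - s < reach <= 2 * p + w


n = count_positions(w=8, k=3, s=2, p=1)
print(n, satisfies_a(8, 3, 2, 1, n))
```

For $w = 8$, $k = 3$, $s = 2$, $p = 1$ the kernel fits at left edges 0, 2, 4 and 6, so $n = 4$, and that is the only value of $n$ for which (a) holds.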

When does the stride $s$ act as a scale factor? Formally, when the following holds (which requires $w$ to be divisible by $s$):

$$n = \frac{w}{s} \tag{b}$$

Substituting (b) into (a) yields:

$$2p + w - s < k + w - s \leq 2p + w$$

The left side yields:

$$p < \frac{1}{2}k$$

And the right side yields:

$$p \geq \frac{1}{2}(k - s)$$

Thus:

$$\frac{1}{2}(k - s) \leq p < \frac{1}{2}k$$
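The bound on $p$ can be sanity-checked numerically. The sketch below is illustrative only: `output_width` and `valid_paddings` are hypothetical helper names, the output width uses the closed form $\lfloor (2p + w - k)/s \rfloor + 1$ implied by (a), and $p$ is assumed to be a non-negative integer.

```python
def output_width(w: int, k: int, s: int, p: int) -> int:
    """Output width of a convolution along one axis (assumes 2p + w >= k)."""
    return (2 * p + w - k) // s + 1


def valid_paddings(k: int, s: int) -> list:
    """All non-negative integers p with (k - s)/2 <= p < k/2."""
    lowest = max(0, (k - s + 1) // 2)   # ceil((k - s) / 2), clipped at zero
    bound = (k + 1) // 2                # ceil(k / 2), exclusive upper bound
    return list(range(lowest, bound))


for k, s in [(3, 1), (3, 2), (4, 1), (4, 2), (7, 4)]:
    w = 4 * s  # any width divisible by s; 4 * s is an arbitrary choice
    ps = valid_paddings(k, s)
    # Every padding inside the bound must scale the width by exactly 1/s.
    assert all(output_width(w, k, s, p) == w // s for p in ps)
    print(f"k={k}, s={s}: valid p = {ps}")
```

Note that the list is empty for $k = 4$, $s = 1$, and that a larger stride can admit more than one padding (for example $k = 7$, $s = 4$).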

When $k$ is an even number

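Since $p$ must be a non-negative integer, and the largest integer strictly below $\frac{1}{2}k$ is $\frac{1}{2}k - 1$, the lower bound $\frac{1}{2}(k - s) \leq \frac{1}{2}k - 1$ requires

$$s \geq 2$$

Otherwise no valid $p$ exists. A padding that works for every such $s$ is then

$$p = \frac{1}{2}k - 1$$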

P.S. Therefore, when $k$ is an even number, a size-preserving convolution ($s = 1$) is not available with symmetric padding.
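A small numeric check of the even-kernel case, as a sketch under stated assumptions: `output_width` is a hypothetical helper using the closed form $\lfloor (2p + w - k)/s \rfloor + 1$, and $k = 4$, $w = 8$ are arbitrary example values.

```python
def output_width(w: int, k: int, s: int, p: int) -> int:
    """Output width of a convolution along one axis (assumes 2p + w >= k)."""
    return (2 * p + w - k) // s + 1


k, w = 4, 8  # an even kernel and an arbitrary example width

# Stride 2: p = k/2 - 1 halves the width.
halved = output_width(w, k, s=2, p=k // 2 - 1)

# Stride 1: n - w = 2p - k + 1 is odd for even k, so no symmetric padding
# can give n == w.
stride_one = [output_width(w, k, s=1, p=p) for p in range(k)]

print(halved, stride_one)
```

With stride 1 the output width jumps from $w - 1$ to $w + 1$ as $p$ increases, skipping $w$ itself.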

When $k$ is an odd number

Since $p$ must be a non-negative integer, the largest valid choice, which satisfies the lower bound for any $s \geq 1$, is

$$p = \left[\frac{1}{2}k\right]$$
where $[\cdot]$ takes the integer part of the value.

This can be conveniently implemented as

p = k // 2
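Both cases can be combined into one helper. This is a minimal sketch rather than a definitive implementation: the names `scale_padding` and `output_width` are hypothetical, and the check assumes the width $w$ is divisible by the stride $s$.

```python
def scale_padding(k: int, s: int) -> int:
    """Padding p for which the output width equals w / s (w divisible by s)."""
    if k % 2 == 1:
        return k // 2          # odd kernel: valid for any stride s >= 1
    if s < 2:
        raise ValueError("an even kernel needs stride >= 2")
    return k // 2 - 1          # even kernel


def output_width(w: int, k: int, s: int, p: int) -> int:
    """Output width of a convolution along one axis (assumes 2p + w >= k)."""
    return (2 * p + w - k) // s + 1


# Check a few kernel/stride pairs on a width divisible by the stride.
for k, s in [(1, 1), (3, 1), (3, 2), (5, 2), (7, 4), (4, 2), (6, 3)]:
    w = 8 * s  # arbitrary example width
    assert output_width(w, k, s, scale_padding(k, s)) == w // s
```

In a framework such as PyTorch, the resulting value is what one would pass as the `padding` argument of a layer like `torch.nn.Conv2d`, together with the matching `kernel_size` and `stride`.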