Cover image source: A guide to convolution arithmetic for deep learning - Vincent Dumoulin, Francesco Visin - ArXiv
When doing convolution, the kernel is shift around the input image. Let's consider the case when the kernel moves left-to-right. The up-to-down scenario are just the same.
Let's denote the padding size as , the image width as , the kernel size as , and the stride as . Then the total length of image width plus padding should be:
At the begging, the kernel is placed on the left of this total length, and is then moved towards right side with step length of , until no more space is available. Denotes the number of the placable potions as , then the output width of the convolution is also . It should satisfys:
2p + w - s < k + (n-1) s \leq 2p + w \tag{a}
When will the stride acts as a scale factor? It can be formally written as:
n = \frac{w}{s} \tag{b}
Substitute (b) to (a) yields:
The left side yields:
And the right side yields:
Thus:
Since , we must have
Otherwise there would not be any available . Then
P.S. Therefore, when is even number, size preserved convolution is not available.
Since , then
where takes the integer part of the value.
This can be conveniently implemented as
p = k // 2