This tutorial covers three subtopics on local image operations. In general, many standard operations may be realized with Python package functions (such as those in scipy). We also cover explicit implementation, both for exposition and for the development and validation of new local functions.
Image processing and computer vision operations may frequently be implemented with available Python packages. A simple example of image convolution is shown in the following.
import numpy as np
from v4 import vx
im = vx.Vx("uint8", [0, 6, 0, 4],1)
im.i[0][2] = 10
im.i[1][1] = 10
im.i[2][2] = 20
im.i[3][3] = 30
print(im.i)
[[ 0  0 10  0  0  0]
 [ 0 10  0  0  0  0]
 [ 0  0 20  0  0  0]
 [ 0  0  0 30  0  0]]
The basic VX image class may be created either from a v4 file or explicitly as shown above. The structure element .i is the image data in a numpy array; im also contains the image metadata.
kernel = np.ones((3,3))
print(kernel)
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
from scipy import ndimage
res = ndimage.convolve(im.i, kernel, mode='constant', cval=0.0)
print(res)
[[10 20 20 10  0  0]
 [10 40 40 30  0  0]
 [10 30 60 50 30  0]
 [ 0 20 50 50 30  0]]
from v4 import vd
vd.dispmvx(res, scale='table')
<scaled size: (4 x 6) >
An explicit (but less efficient) implementation of convolution may be achieved with the help of the vx.Vx embedim function, which pads the image. It is very common to make the output of a local image operation have the same dimensions as the input image; padding is a convenient mechanism to achieve this in explicit, pixel-level programs.
## create a result image "resvx" and a padded image "ip"
resvx = im.i * 0
ip = vx.Vx(np.copy(im.i))
ip.embedim((1,1,1,1)) #zero pad the original image
print (ip.i)
vd.dispsvx(im.i, ip.i, scale='table', capt="Images im and ip")
[[ 0  0  0  0  0  0  0  0]
 [ 0  0  0 10  0  0  0  0]
 [ 0  0 10  0  0  0  0  0]
 [ 0  0  0 20  0  0  0  0]
 [ 0  0  0  0 30  0  0  0]
 [ 0  0  0  0  0  0  0  0]]
Images im and ip <scaled size: (4 x 6) (6 x 8) >
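For readers without the v4 package, the same one-pixel zero border can be sketched with NumPy's np.pad. This is only an assumed equivalent of the embedim call above, not its implementation; the array img is a stand-in that reproduces the values of im.i printed earlier.

```python
import numpy as np

# Stand-in for im.i (values copied from the printed image above).
img = np.zeros((4, 6), dtype=np.uint8)
img[0, 2] = 10
img[1, 1] = 10
img[2, 2] = 20
img[3, 3] = 30

# Add a one-pixel border of zeros on every side.
padded = np.pad(img, pad_width=1, mode="constant", constant_values=0)

print(padded.shape)  # (6, 8)
# The original pixel (y, x) now sits at (y + 1, x + 1) in the padded array.
print(padded[1, 3], padded[4, 4])  # 10 30
```

The +1 shift between image and padded coordinates is the offset that the explicit loops have to account for.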
Consider below the explicit implementation of convolution:
for y in range(np.shape(resvx)[0]):
    for x in range(np.shape(resvx)[1]):
        for yi in range(np.shape(kernel)[0]):
            for xi in range(np.shape(kernel)[1]):
                resvx[y,x] += ip.i[y+yi,x+xi] * kernel[yi,xi]
print(resvx)
vd.dispmvx(resvx, scale='table')
[[10 20 20 10  0  0]
 [10 40 40 30  0  0]
 [10 30 60 50 30  0]
 [ 0 20 50 50 30  0]]
<scaled size: (4 x 6) >
Note that the outcome is identical to that of the scipy function. The same computation with the two kernel loops expanded is shown below.
for y in range(np.shape(resvx)[0]):
    for x in range(np.shape(resvx)[1]):
        resvx[y,x] = (ip.i[y,x] * kernel[0,0] + ip.i[y,x+1] * kernel[0,1] + ip.i[y,x+2] * kernel[0,2] +
                      ip.i[y+1,x] * kernel[1,0] + ip.i[y+1,x+1] * kernel[1,1] + ip.i[y+1,x+2] * kernel[1,2] +
                      ip.i[y+2,x] * kernel[2,0] + ip.i[y+2,x+1] * kernel[2,1] + ip.i[y+2,x+2] * kernel[2,2])
print(resvx)
vd.dispmvx(resvx, scale='table')
[[10 20 20 10  0  0]
 [10 40 40 30  0  0]
 [10 30 60 50 30  0]
 [ 0 20 50 50 30  0]]
<scaled size: (4 x 6) >
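One detail worth noting: the explicit loops above multiply ip.i[y+yi, x+xi] by kernel[yi, xi] without flipping the kernel, which strictly speaking is a correlation. It matches a convolution here only because the all-ones kernel is symmetric. The following NumPy-only sketch illustrates the difference with a deliberately asymmetric kernel; the helper names correlate_explicit and convolve_explicit are hypothetical and not part of v4 or scipy.

```python
import numpy as np

def correlate_explicit(img, kernel):
    """Sliding-window sum of products with zero padding (no kernel flip)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2  # assumes odd kernel dimensions
    padded = np.pad(img.astype(np.float64), ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape, dtype=np.float64)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            for yi in range(kh):
                for xi in range(kw):
                    out[y, x] += padded[y + yi, x + xi] * kernel[yi, xi]
    return out

def convolve_explicit(img, kernel):
    """True convolution: correlate with the kernel flipped along both axes."""
    return correlate_explicit(img, kernel[::-1, ::-1])

# A single bright pixel (an impulse) and an asymmetric kernel.
impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0
k = np.arange(1.0, 10.0).reshape(3, 3)

conv = convolve_explicit(impulse, k)
corr = correlate_explicit(impulse, k)

# Convolving an impulse reproduces the kernel; correlating reproduces it flipped.
print(np.array_equal(conv[1:4, 1:4], k))              # True
print(np.array_equal(corr[1:4, 1:4], k[::-1, ::-1]))  # True
```

When validating an explicit loop against a library convolution, an asymmetric test kernel of this kind exposes a missing flip that a symmetric kernel would hide.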
An issue with practical images is that the pixels are typically in byte format, which is adequate for viewing images with the human visual system (HVS). There are several details that need to be addressed when performing computations on byte-valued pixels. The strategy for dealing with this low-dynamic-range pixel format may depend upon the application.
For recent deep learning systems, a common approach is to convert the pixels to float format and to scale their values from 0.0 to 1.0. For run-time systems, special hardware and data formats may be used to improve efficiency.
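A minimal sketch of the float conversion described above, using NumPy only. The pixel values are hypothetical, and dividing by 255 is one common convention for mapping the byte range onto 0.0 to 1.0.

```python
import numpy as np

# Stand-in byte image (hypothetical values for illustration).
img_u8 = np.array([[0, 80, 160, 240]], dtype=np.uint8)

# Convert to float32 and scale so that 0 maps to 0.0 and 255 maps to 1.0.
img_f = img_u8.astype(np.float32) / 255.0

print(img_f.dtype)  # float32
print(img_f.min(), img_f.max())

# Converting back for viewing: rescale, round, clip, and cast to uint8.
back = np.clip(np.rint(img_f * 255.0), 0, 255).astype(np.uint8)
print(np.array_equal(back, img_u8))  # True
```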
For many applications, arbitrary scaling of input pixels may be a problem, especially when the desired output is also a byte image for human review. Below we give examples of two approaches when using byte pixel data.
Consider the following modified image:
# Consider a modified input image "ib"
ib = im.i * 8
print (ib)
vd.dispmvx(im.i, ib, scale='table', capt="im.i and ib")
[[  0   0  80   0   0   0]
 [  0  80   0   0   0   0]
 [  0   0 160   0   0   0]
 [  0   0   0 240   0   0]]
im.i and ib <scaled size: (4 x 6) (4 x 6) >
# now consider the convolution of both im.i and ib
imc = ndimage.convolve(im.i, kernel, mode='constant', cval=0.0)
ibc = ndimage.convolve(ib, kernel, mode='constant', cval=0.0)
vd.dispmvx(imc, ibc, scale='table')
print(imc)
<scaled size: (4 x 6) (4 x 6) >
[[10 20 20 10  0  0]
 [10 40 40 30  0  0]
 [10 30 60 50 30  0]
 [ 0 20 50 50 30  0]]
We see that the result for ib is incorrect because of numerical overflow: our array dtype 'uint8' cannot represent any value greater than 255 or less than 0. Note that Python does not provide any error message to inform us that an overflow occurred. Since the kernel is (by default) of type float, one way to achieve a valid output is to scale the kernel. Alternatively, one could change the array dtype to “int16” (or “float32”); however, in that case, it would be necessary to scale the result back to a byte in order to create a conventional image file for human viewing.
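The silent wraparound can be seen in isolation with plain NumPy arithmetic. The sketch below uses hypothetical values; it shows uint8 sums wrapping modulo 256, the same sums computed in a wider type, and one possible way (saturation with np.clip) to return to a byte result.

```python
import numpy as np

a = np.array([240, 160, 80], dtype=np.uint8)
b = np.array([80, 160, 80], dtype=np.uint8)

# uint8 + uint8 stays uint8; sums above 255 wrap around modulo 256, silently.
wrapped = a + b
print(wrapped)  # [ 64  64 160]

# Widening first keeps the true sums.
wide = a.astype(np.int16) + b.astype(np.int16)
print(wide)  # [320 320 160]

# To return to a byte image, saturate to the 0..255 range before casting back.
saturated = np.clip(wide, 0, 255).astype(np.uint8)
print(saturated)  # [255 255 160]
```

Saturation discards information above 255, so whether to clip or to rescale depends on the application.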
In general, it is common practice to scale (positive-valued) convolution kernels so that they sum to one; each output is then a weighted average of the input pixels, so no overflow is possible. The situation is more complex if any kernel elements have negative values.
ibc1 = ndimage.convolve(ib, kernel/5, mode='constant', cval=0.0)
ibc2 = ndimage.convolve(ib.astype('int16'), kernel, mode='constant', cval=0.0)
vd.dispmvx(ibc1, ibc2, scale='table')
vd.dispsvx(ibc1, ibc2, scale='table')
<scaled size: (4 x 6) (4 x 6) >
<scaled size: (4 x 6) (4 x 6) >
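The kernel/5 scaling above is sufficient for this particular image, but it does not make the kernel sum to one. A sketch of the sum-to-one normalization follows, assuming a non-negative kernel and using NumPy only. The array ib is a stand-in that reproduces the values printed earlier, and the shifted-slice accumulation is just one way to write the filter, not the scipy implementation.

```python
import numpy as np

# Stand-in for ib (values copied from the printed image above).
ib = np.zeros((4, 6), dtype=np.uint8)
ib[0, 2] = 80
ib[1, 1] = 80
ib[2, 2] = 160
ib[3, 3] = 240

kernel = np.ones((3, 3))
kernel_norm = kernel / kernel.sum()  # non-negative weights that sum to 1

# Explicit zero-padded filtering, accumulated in float to avoid overflow.
padded = np.pad(ib.astype(np.float64), 1)
acc = np.zeros(ib.shape, dtype=np.float64)
for yi in range(3):
    for xi in range(3):
        acc += padded[yi:yi + ib.shape[0], xi:xi + ib.shape[1]] * kernel_norm[yi, xi]

# Each output is a weighted average of byte values, so it stays within 0..255.
result = np.rint(acc).astype(np.uint8)
print(acc.max() <= 255.0)  # True
print(result)
```

Because every output is an average of values in the byte range, the cast back to uint8 is safe for any byte input image, not only for this one.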
Python array indexing starts at zero, so the “first” element of a 2D array has index [0,0]. This means that a padded image has a “spatial” offset with respect to the initial image, which makes indexing a little messier, as shown in the loop-expanded explicit example below.
# 3 x 3 kernel loop expanded convolution
for y in range(np.shape(resvx)[0]):
    for x in range(np.shape(resvx)[1]):
        resvx[y,x] = (ip.i[y,x] * kernel[0,0] + ip.i[y,x+1] * kernel[0,1] + ip.i[y,x+2] * kernel[0,2] +
                      ip.i[y+1,x] * kernel[1,0] + ip.i[y+1,x+1] * kernel[1,1] + ip.i[y+1,x+2] * kernel[1,2] +
                      ip.i[y+2,x] * kernel[2,0] + ip.i[y+2,x+1] * kernel[2,1] + ip.i[y+2,x+2] * kernel[2,2])
vd.dispsvx(im.i, resvx, scale='table')
<scaled size: (4 x 6) (4 x 6) >
Consider that we want a convolution of a pixel with its four-connected nearest neighbors. One way to do this is to use the kernel
0 1 0
1 1 1
0 1 0
knn = np.array([[0,1,0],[1,1,1],[0,1,0]])
ibnn = ndimage.convolve(im.i, knn, mode='constant', cval=0.0)
vd.dispsvx(im.i, ibnn, scale='table', capt="input image and 4NN convolution")
input image and 4NN convolution <scaled size: (4 x 6) (4 x 6) >
Consider the explicit version of this convolution below. While the relative indices of the NN pixels are +/- one pixel from the center pixel location, an offset (of +1 in this case) needs to be added to all ip pixel indices to accommodate the padding.
resnn = 0 * im.i
for y in range(np.shape(resnn)[0]):
    for x in range(np.shape(resnn)[1]):
        # the center pixel (y, x) of im.i is at (y+1, x+1) in the padded image ip.i
        resnn[y,x] = (ip.i[y,x+1] +
                      ip.i[y+1,x] + ip.i[y+1,x+1] + ip.i[y+1,x+2] +
                      ip.i[y+2,x+1])
vd.dispsvx(im.i, resnn, scale='table', capt="input image and 4NN convolution")
input image and 4NN convolution <scaled size: (4 x 6) (4 x 6) >
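One way to keep the padding offset from spreading through the index arithmetic is to write the neighborhood as offsets relative to the center pixel and add the pad width in a single place. The following is a NumPy-only sketch under that assumption; the names PAD and offsets are illustrative, and img is a stand-in that reproduces the values of im.i.

```python
import numpy as np

# Stand-in for im.i (values copied from the printed image above).
img = np.zeros((4, 6), dtype=np.uint8)
img[0, 2] = 10
img[1, 1] = 10
img[2, 2] = 20
img[3, 3] = 30

PAD = 1
padded = np.pad(img, PAD)  # one-pixel zero border

# Neighborhood as (dy, dx) offsets from the center: center plus 4 neighbors.
offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

res = np.zeros(img.shape, dtype=np.int32)  # wider type so sums cannot wrap
for y in range(img.shape[0]):
    for x in range(img.shape[1]):
        for dy, dx in offsets:
            # PAD shifts image coordinates into padded-array coordinates.
            res[y, x] += padded[y + PAD + dy, x + PAD + dx]

print(res)
```

Changing the neighborhood then only requires editing the offsets list, and a larger neighborhood only requires a larger PAD.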
In summary: (1) good library packages exist for many standard image processing functions in Python. (2) When developing programs for images, be aware of pixel precision in your program design, and remember that Python does not usually report overflow errors. (3) When developing or validating new algorithms with pixel indexing, be careful to get the index values correct when image padding is involved.