High Performance Signal and Image Processing on VxWorks

KaKay photo

The VxWorks platform includes the Intel Integrated Performance Primitives (IPP).  This set of libraries is used for signal processing, image processing, video processing, cryptography, and other computations involving large vectors and matrices.  Intel IPP enables VxWorks applications to include text and object detection in aerial photos, machine vision in automated manufacturing, audio signal processing in aircraft, and high-speed encryption and decryption of signals.

Intel IPP libraries are designed to take advantage of Intel Streaming SIMD Extensions (SSE) and Intel Advanced Vector Extensions (AVX) instructions.  Depending on the extension, these instructions process 128, 256, or 512 bits of data in a single instruction, which accelerates vector and matrix processing.

For optimization on Intel Architecture, the VxWorks platform also includes the Intel C++ Compiler (ICC).  VxWorks libraries, kernel, drivers, and applications can all be built with ICC for the Intel Architecture.

Recently, the VxWorks platform was upgraded to support ICC version 16 and Intel IPP version 9.  This new version of Intel IPP contains optimizations for a majority of the image processing, color conversion, and computer vision functions for SSE 4.2 and Intel AVX2.  The details below explore some features of Intel IPP v9 and the benefits of this version upgrade.

Image Processing

In machine vision, the first processing steps are often noise reduction and edge detection.  Once edges are identified, the computer can go through a series of pattern recognition steps to identify objects, text, and faces.  The Intel IPP image processing library has a set of functions that support machine vision in a few simple steps.

I started with a 6 megapixel image of an aircraft to see the result of edge detection with Intel IPP’s computer vision library in VxWorks.  A typical high definition (HD) 1920 x 1080 image is 2 megapixels, so I’m working with an image that is 3 times the size of a typical HD image.  Figure 1 shows the result of each Intel IPP matrix conversion.  The first step converts the color image to grayscale.  Then a 5×5 low-pass Gaussian filter is used to remove image noise.  Finally, the Canny Edge Detection function is used to create the edge detection image.
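To make the first two steps concrete, here is a minimal plain-C sketch of a grayscale conversion followed by a 5×5 low-pass filter.  It is not the Intel IPP API and not how Intel IPP implements these steps.  The function names are hypothetical, and the BT.601 luma weights, the binomial kernel [1 4 6 4 1]/16, and the replicated border are my assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert interleaved 8-bit RGB to 8-bit grayscale using integer
 * approximations of the BT.601 luma weights (0.299, 0.587, 0.114).
 * The weights 77 + 150 + 29 sum to 256, so ">> 8" renormalizes. */
void rgb_to_gray(const uint8_t *rgb, uint8_t *gray, size_t num_pixels)
{
    for (size_t i = 0; i < num_pixels; i++) {
        unsigned r = rgb[3 * i];
        unsigned g = rgb[3 * i + 1];
        unsigned b = rgb[3 * i + 2];
        gray[i] = (uint8_t)((77u * r + 150u * g + 29u * b + 128u) >> 8);
    }
}

/* Clamp an index into [0, n - 1] so border pixels are replicated. */
static int clamp_index(int i, int n)
{
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

/* 5x5 low-pass filter done as two 1-D passes of the binomial kernel
 * [1 4 6 4 1] / 16 (an approximation of a Gaussian).
 * src, tmp, and dst each hold width * height bytes. */
void gaussian5x5(const uint8_t *src, uint8_t *tmp, uint8_t *dst,
                 int width, int height)
{
    static const int k[5] = {1, 4, 6, 4, 1};

    /* Horizontal pass: src -> tmp */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int j = 0; j < 5; j++)
                sum += k[j] * src[y * width + clamp_index(x + j - 2, width)];
            tmp[y * width + x] = (uint8_t)((sum + 8) >> 4);
        }
    }

    /* Vertical pass: tmp -> dst */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int j = 0; j < 5; j++)
                sum += k[j] * tmp[clamp_index(y + j - 2, height) * width + x];
            dst[y * width + x] = (uint8_t)((sum + 8) >> 4);
        }
    }
}
```

Splitting the 5×5 filter into a horizontal and a vertical pass needs 10 multiplications per pixel instead of 25.  A loop like this still touches one pixel at a time, which is the kind of work a SIMD-optimized library handles many pixels per instruction.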

Figure 1 – Canny Edge Detection using Intel IPP.   Image shown is a portion of a smaller version of the 6 megapixel image.  From left to right:  original image, image after grayscale conversion and low-pass filter, final edge image.


The image in Figure 1 is a fragment of a reduced-size version of the 6 megapixel airplane image.  The detected edges are a good representation of the edges in the original image.

Development time advantage

As a comparison, I tried to replicate the edge detection functionality without the use of Intel IPP.  The edge detection algorithm used in Intel IPP is the Canny algorithm.  The Wikipedia page on the Canny edge detector gives a good description of the algorithm.

The first obvious advantage of Intel IPP was the great reduction in development time needed to implement edge detection.  Even after searching the internet for a ready-made Canny Edge Detection C library, it still took me many hours to get the application to detect edges properly.  The one C-like library I did find turned out to have a few bugs.  To fix them, I ended up studying the Canny Edge Detection algorithm in depth and modifying the code to follow the actual algorithm.  The resulting source code was hardly recognizable from its original source, especially after I applied the low-level array access optimizations typical of VxWorks C applications.

Performance advantage

The second obvious advantage of Intel IPP was its performance.  The self-written implementation processed the image sequentially, one pixel at a time.  The application ran on a single processor core, in a single VxWorks task, and did not make use of Intel AVX instructions.  It would have doubled or tripled my development time to rework the code to take advantage of VxWorks multicore processing and Intel AVX.

I’m running VxWorks on an Intel Core i7-4700EQ processor, which is from the 4th generation Intel Core processor family, running at 3.4 GHz.  This processor supports Intel AVX2.  For a 6 megapixel image without many edges to detect, Intel IPP on VxWorks took 28.8 milliseconds to go from color image to edge image, or about 35 frames per second.  By contrast, my sequential processing implementation on VxWorks took 1.4 seconds, or about 0.7 frames per second.  That makes it roughly 49 times slower, a difference of between one and two orders of magnitude.

I also ran the same image processing application against the earlier version of Intel IPP.  Intel IPP 8 took 119 milliseconds, compared with 28.8 milliseconds for Intel IPP 9.  So with the upgrade from Intel IPP 8 to Intel IPP 9, this application ran roughly 4 times faster.
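The frame rates and ratios quoted here follow directly from the measured per-frame times.  As a small worked check, the helper names below are hypothetical and the inputs are the timings reported in this post:

```c
/* Frames per second for a given per-frame processing time in milliseconds. */
double frames_per_second(double frame_ms)
{
    return 1000.0 / frame_ms;
}

/* How many times faster the "fast" timing is than the "slow" timing. */
double speedup(double slow_ms, double fast_ms)
{
    return slow_ms / fast_ms;
}
```

With these, frames_per_second(28.8) is about 34.7, frames_per_second(1400.0) is about 0.71, speedup(1400.0, 28.8) is about 48.6, and speedup(119.0, 28.8) is about 4.1.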

Figure 2.  Contrast in image processing speed for a 6 megapixel image

One advantage of writing the Canny Edge Detection algorithm as a C application myself is the ability to see the algorithm step by step.  The algorithm starts by producing a gradient magnitude image, which shows the magnitude of the change in intensity between neighboring pixels.  The larger magnitudes show roughly where the edges are.  Then, using two directional gradient images, an edge-thinning process called non-maximum suppression is applied.  Each gradient in the edge-thinned image is then categorized as a strong edge, a weak edge, or no edge, based on a high threshold and a low threshold.  In image #5 in Figure 3, the red lines show the strong edges and the blue lines show the weak edges.  In a process called hysteresis thresholding, a weak edge is kept as a true edge only if it touches a line connected to a strong edge.
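The gradient and categorization steps can be sketched in a few lines of C.  This is an illustrative sketch, not my application's source and not Intel IPP's implementation.  The function and enum names are hypothetical, the Sobel operator is one common choice of gradient filter, and approximating the magnitude as |gx| + |gy| is an assumption made to keep the sketch in integer arithmetic.  Non-maximum suppression is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum edge_class { NO_EDGE = 0, WEAK_EDGE = 1, STRONG_EDGE = 2 };

/* Gradient magnitude at an interior pixel (x, y) of a grayscale image,
 * using 3x3 Sobel kernels and the approximation |gx| + |gy|.
 * The caller must keep 1 <= x <= width - 2 and 1 <= y <= height - 2. */
int sobel_magnitude(const uint8_t *img, int width, int x, int y)
{
    const uint8_t *p = img + (size_t)y * (size_t)width + (size_t)x;

    /* Horizontal change: right column minus left column. */
    int gx = (p[-width + 1] + 2 * p[1] + p[width + 1])
           - (p[-width - 1] + 2 * p[-1] + p[width - 1]);

    /* Vertical change: bottom row minus top row. */
    int gy = (p[width - 1] + 2 * p[width] + p[width + 1])
           - (p[-width - 1] + 2 * p[-width] + p[-width + 1]);

    return abs(gx) + abs(gy);
}

/* Double-threshold categorization of one gradient magnitude. */
enum edge_class classify(int magnitude, int low, int high)
{
    if (magnitude >= high)
        return STRONG_EDGE;
    if (magnitude >= low)
        return WEAK_EDGE;
    return NO_EDGE;
}
```

For example, a pixel sitting on a vertical step from intensity 0 to 100 gets gx = 400 and gy = 0, so with thresholds of 100 and 300 it is categorized as a strong edge.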

The image in Figure 3 is a fragment of a reduced-size image.

Figure 3. Canny Edge Detection from sequential processing C application.  From left to right, top row:  Original image, grayscale conversion, gradient map.  Bottom row:  Edge thinning, strong-weak edge categorization, final edge image using hysteresis thresholding


Using equivalent thresholds, the Intel IPP application and the sequential processing VxWorks implementation yield roughly the same edges.  The biggest differences are in performance and development time.

One important note is the reduced application stack usage when using Intel IPP on VxWorks.  In the hysteresis thresholding phase of the Canny Edge Detection algorithm, where the weak edges are traced for contact with strong edges, the sequential processing implementation uses a recursive trace routine.  Even after I minimized the argument list of the trace routine, the application easily used 5 times more stack space than the Intel IPP application for large images with dense edges.
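The stack growth comes from recursion depth, which follows the length of the connected edges being traced.  One common way to bound task stack usage is to replace the recursion with an explicit work list allocated from the heap.  The sketch below shows that alternative technique.  It is not what my implementation does, and I am not claiming it is what Intel IPP does internally.  The names are hypothetical, and the label values assume the strong, weak, and no-edge categories described earlier.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum edge_class { NO_EDGE = 0, WEAK_EDGE = 1, STRONG_EDGE = 2 };

/* Hysteresis thresholding without recursion.
 * labels holds width * height values of NO_EDGE, WEAK_EDGE, or STRONG_EDGE
 * and is updated in place: weak pixels connected to a strong pixel become
 * STRONG_EDGE, and all other weak pixels become NO_EDGE.
 * width and height must be positive.
 * Returns 0 on success, -1 if the work list cannot be allocated. */
int hysteresis(uint8_t *labels, int width, int height)
{
    size_t n = (size_t)width * (size_t)height;
    size_t top = 0;

    /* Each pixel is pushed at most once, so n entries are enough. */
    size_t *work = malloc(n * sizeof *work);
    if (work == NULL)
        return -1;

    /* Seed the work list with every strong pixel. */
    for (size_t i = 0; i < n; i++)
        if (labels[i] == STRONG_EDGE)
            work[top++] = i;

    /* Pop a pixel and promote any weak pixels among its 8 neighbors. */
    while (top > 0) {
        size_t i = work[--top];
        int x = (int)(i % (size_t)width);
        int y = (int)(i / (size_t)width);

        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                int nx = x + dx;
                int ny = y + dy;
                if (nx < 0 || ny < 0 || nx >= width || ny >= height)
                    continue;
                size_t j = (size_t)ny * (size_t)width + (size_t)nx;
                if (labels[j] == WEAK_EDGE) {
                    labels[j] = STRONG_EDGE;
                    work[top++] = j;
                }
            }
        }
    }

    /* Weak pixels that were never reached are not true edges. */
    for (size_t i = 0; i < n; i++)
        if (labels[i] == WEAK_EDGE)
            labels[i] = NO_EDGE;

    free(work);
    return 0;
}
```

The trade-off is that the memory moves from the task stack to the heap, where its worst-case size is known in advance from the image dimensions.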

Edge Density

The hysteresis thresholding process in the Canny Edge Detection algorithm involves a large amount of line tracing.  The more edges in the picture, the longer the application takes.  Here is another image, slightly smaller at 5.1 megapixels, but with many more edges to identify.

Figure 5 shows a fragment of an image with lots of edges. The red lines show where edges were detected.

Figure 5. Portion of a 5 megapixel image of a cockpit with denser edges


While the full size cockpit image is about 18% smaller than the full size airplane image, the frame rate for the smaller image was slightly lower.  So the processing frame rate can be affected by the content of the image.

Image size

Applications that commonly use machine vision, such as aerial mapping and automated manufacturing, have different image size requirements.  I ran the VxWorks edge detection application against a number of other image sizes to see the effect of image size on performance.

Figure 6.  Intel Edison Arduino board.  0.5 megapixel image processed in 7.1 milliseconds by Intel IPP on VxWorks.


Figure 7.  Portion of an aerial map.  The full size image is 2 megapixels and was processed in 23.2 milliseconds by Intel IPP on VxWorks.

Figure 8 shows the performance of Intel IPP image processing on various image sizes.  As a general guideline, smaller images are processed at a higher frame rate.  However, the actual frame rate depends on image content.

Figure 8.  Frame rate of various image sizes on Intel IPP 9 with VxWorks

Signal Processing

So far, I’ve focused on the image processing libraries of Intel IPP.  VxWorks also comes with the Intel IPP signal processing libraries.

Figure 9 shows an audio signal.  The signal was processed on VxWorks with the Intel IPP Fast Fourier Transform (FFT) routine, which converts the audio signal into the frequency profile shown.  In speech recognition, the next step is for the application to run through pattern matching algorithms to identify phonemes.  A series of phonemes can then be identified as a word.

Figure 9.  Signal processing on VxWorks using Intel IPP.  Left side:  Audio signal.  Right side:  Frequency profile.

I have not yet gathered performance measurements of the FFT routines, but I expect Intel IPP 9 to show similar performance improvements over Intel IPP 8.  Detailed performance comparisons for the signal processing library can be a topic for a different blog entry.

You can find out more about VxWorks at http://www.windriver.com/products/vxworks/. Intel ICC and Intel IPP are part of every VxWorks platform. Wind River continues to provide highly optimized software libraries for use with VxWorks, including those on the Intel Architecture.