TRIPOD

(Template for Real-Time Image PrOcessing Development)

by Prof. Paul Y. Oh (Copyright 2001)

TRIPOD and this tutorial are freeware and distributable.
If used, author acknowledgement and/or bibliographic citation is appreciated.

Keywords: machine vision, computer vision, robotic vision, real-time, image processing, MFC, C++, Win32 API, Windows, framegrabber, video camera, tracking, centroid, binarize, image features, bitmap, BMP, USB camera, Logitech, LEGO Vision Command

The left figure is a screen grab of a Windows application programmed using TRIPOD. The top viewport displays, in real-time, color images captured by a Logitech USB camera. At the same time, the bottom viewport displays image processing results; in this case, a binary (black and white) image thresholded at an 8-bit grayscale intensity of 150.

TRIPOD was written to empower enthusiasts with an easy-to-use, open-source software Template to rapidly program Real-time Image PrOcessing algorithms. Enthusiasts include those wishing to use a digital video camera (e.g. a Logitech USB camera) for Developing robotic, machine and computer vision applications.

The template consists of MFC VC++ source files on top of which you build your image processing program, as shown in the next figure. The template grabs a frame, gives you a pointer to its pixel data and displays the image. You simply add your code for pixel processing.

Motivation and Audience

Machine vision is often difficult even though some of the fundamental concepts are easy to grasp. Implementing those concepts on images acquired in real-time (30 frames/sec) by a video camera typically demands advanced skills in low-level software programming. Even for ANSI C/C++ programmers, Windows programming (e.g. the Win32 API and MFC) presents a steep learning curve. Furthermore, framegrabbers and CCD cameras are often expensive, and their bundled software development kits (SDKs) are typically proprietary, involving licensing and royalties. Beyond affordability, such SDKs often provide just ActiveX components and not the source code. As such, developing your custom application is limited to whatever functions the SDK gives you.

This tutorial's audience is characterized by these frustrations. TRIPOD is a set of MFC VC++ files (all source code and compiled files are freely downloadable). With TRIPOD, this tutorial will show in detail how to implement image processing algorithms.

The tutorial assumes you know or have the following:

- a working knowledge of ANSI C/C++, whether you program in UNIX or Windows
- access to a Windows PC (Win98 or later, 233 MHz minimum, 128 MB RAM) with Microsoft Visual C++ (MFC) installed
- a Logitech USB camera, such as the QuickCam Express or LEGO Vision Command (apx. $50 USD), and Logitech's free QuickCam SDK (QCSDK)

TRIPOD was created to overcome such difficulties and impact the largest possible audience. The itemized assumptions above are common denominators: readers will probably be advanced in one or more skills, but will know ANSI C/C++ whether they program in UNIX or Windows. Most readers will have access to a Win98 machine even if they typically use a Linux-based PC or Sun Sparc. The Win98 machine requirements are modest, and Logitech's USB cameras are widely available and affordable (apx. $50 USD for the Logitech QuickCam Express).

The tutorial demonstrates writing a grayscale threshold program using the TRIPOD template, following the step-by-step instructions below.

Why Logitech?

Logitech's USB cameras are widely available. In fact, LEGO's Vision Command camera is a Logitech USB camera, so robot builders can make use of TRIPOD. Logitech also provides a free QuickCam software development kit (QCSDK) which can be downloaded from http://developer.logitech.com/sdk/. Alternatively, the 10 MB file can be downloaded from Boondog's local server. The SDK includes PDF documentation and software examples. Unlike Intel's OpenCV or Microsoft's Vision SDK, it doesn't require DirectX or Video for Windows (VFW) programming knowledge. With the QCSDK one can easily and quickly write applications that take snapshots and record AVI files. However, writing more interesting image processing applications with Logitech's QuickCam ActiveX component is more complex. Functions to access a frame's pixels exist, but unfortunately there's little documentation or source code commentary on doing so.

Readers with Win32 API and MFC skills can probably dissect the software examples; by doing so, a template resembling TRIPOD could emerge. However, the examples do not provide a pointer to, or an array of, pixels in the form most machine vision programmers are used to, namely a row-column vector. Lastly, Windows' default image format is the bitmap (BMP), and the QCSDK provides images in 24-bit RGB (red, green, blue) Truecolor. Machine vision programmers most often use 8-bit grayscale images with pixels stored left-to-right and top-to-bottom (e.g. PGM).

TRIPOD was thus written with these considerations in mind: (1) the MFC VC++ programming framework is provided so that you can just manipulate pixels by adding ANSI C code in the appropriate section; (2) the 24-bit RGB BMP is converted into a 24-bit grayscale version (you can still work with the color BMP if you wish); (3) code comments and this code description explain the BMP's left-to-right, bottom-to-top storage format; (4) frame data is provided in the standard row-column vector.

Coding Objective

"Hello World" is the first application programmers begin with. Likewise, machine vision developers start out with thresholding and binary image generation. For example their program would create the binary image (black or white pixels only) on the right from the grayscale image on the left.

An 8-bit grayscale image means that each pixel ranges in value from 0 to 255. An image has (WxH) pixels, where W is width (number of columns) and H is height (number of rows). The algorithm reads each pixel. Pixels below a threshold value are made black (grayscale value = 0); otherwise they are made white (grayscale value = 255). A thresholding pseudocode is thus:


	for(row = 0; row < H; row++) {
	   for(col = 0; col < W; col++) {
	      if(pixel[row, col] < thresholdValue)
	         pixel[row, col] = 0;   /* make pixel black */
	      else
	         pixel[row, col] = 255; /* make pixel white */
	   }
	}
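
For readers who want to try the algorithm outside the template first, here is the pseudocode expressed as a small, runnable ANSI C function. It is a minimal sketch: it assumes a hypothetical 8-bit grayscale image already stored as a row-column vector, one byte per pixel (the row-column format is explained in the Code Commentary below):

	/* Threshold an 8-bit grayscale image stored in row-column format.
	   pixel points to W*H bytes; pixels are modified in place.        */
	void threshold(unsigned char *pixel, unsigned int W, unsigned int H,
	               unsigned char thresholdValue)
	{
	   unsigned int row, col;

	   for(row = 0; row < H; row++) {
	      for(col = 0; col < W; col++) {
	         if(pixel[row*W + col] < thresholdValue)
	            pixel[row*W + col] = 0;    /* make pixel black */
	         else
	            pixel[row*W + col] = 255;  /* make pixel white */
	      }
	   }
	}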

If you have installed your Logitech USB camera and the QCSDK, then you can try running brinarize.exe (Version 1.0 uploaded 04/29/02) and see it work (NB: note the spelling has an extra "r" in brinarize.exe).

Step-by-Step Instructions

Assuming you have only minimal MFC VC++ experience, step-by-step instructions using TRIPOD to create brinarize.exe follow. This program will display two frames. The top one is unaltered and just gives a live display from your Logitech USB camera. Below it, a binary version at a specified threshold value is displayed. brinarize.exe was tested on a Pentium III 500 MHz, 128 MB RAM PC with a LEGO Vision Command USB camera, and both images displayed at approximately 30 frames/sec.

Assuming also that you installed your Logitech USB camera and the QCSDK, download tripod.zip (2.5 MB) (Version 1.0 uploaded 04/29/02) and unzip it to C:/tripod.

STEP 1: Creating a Win32 Application

Create a Win32 Application: from the menubar choose File - New and select "Win32 Application". For the Location, choose C:/tripod and type brinarize for the Project name (NB: note the extra "r" in brinarize). The Win32 checkbox should be checked. When your screen looks like the following, click OK.

STEP 2: Creating an MFC Project

After you click OK above, choose "Empty Project" when the popup box asks you for a project type. When you click the Finish button, VC++ automatically creates your project's structure and makefiles. Click the "FileView" tab (near the screen's bottom) and you should see the Source, Header and Resource folders. From the menubar choose File - Save All and then Build - Rebuild All. You shouldn't get any compile errors since these folders are empty.

STEP 3: Copying TRIPOD files to your applications folder

Using Explorer, copy the TRIPOD template files from the C:/tripod folder to your application project folder, e.g. C:/tripod/brinarize:


   StdAfx.h, resource.h, tripod.cpp, tripod.h, tripod.rc, tripodDlg.cpp, 
   tripodDlg.h, videoportal.h, videoportal.cpp.  

You should also copy the res folder. In VC++, choose File - Save All.

STEP 4: Including TRIPOD files to your VC++ Project

In VC++, click the "FileView" tab and expand the brinarize files. You should see folders for Source Files, Header Files and Resources. Click the "Source Files" folder once, then right click and choose "Add Files". You should see the left image in the figure below:

Browse to "C:/tripod/brinarize" and add tripod.cpp. Expand the Source Files folder and you should see tripod.cpp listed, as shown in the middle image in the above figure.

Repeat the above, adding the following files to your Source Files folder:

 
	tripod.rc, tripodDlg.cpp, videoportal.cpp

Next, add the following files to your Header Files folder:


	StdAfx.h, tripod.h, resource.h, tripodDlg.h, videoportal.h

Once all these files have been added, your workspace tree should look like the right image of the above figure.

STEP 5: Including QCSDK and MFC Shared DLLs

QCSDK include files need to be added to your project. From the menubar click Project - Settings. Next, click on the root directory (brinarize) and then the C/C++ tab. Under the "Category" combo pulldown box choose "Preprocessor". In the "Additional Include Directories" edit box, type /QCSDK1/inc. Your screen should look like the following figure:

Next, click the "General" tab and under the "Microsoft Foundations Class" pulldown menu, choose "Use MFC in a shared DLL" as shown in the following figure:

Finish off by clicking OK. Next, save all your work by clicking File - Save All. Then compile your project by choosing Build - Rebuild All.

STEP 6: Adding your image processing code

The TRIPOD source, header and resource files used in the previous steps grab the color image frame, convert the red, green and blue pixels into grayscale values, and store the frame pixels in a malloc'ed row-column vector. All that remains is for you to add your image processing routines.

Your added code goes in the "tripodDlg.cpp" file, under the CTripodDlg::doMyImageProcessing function.

For example, to threshold you would add the following code:




  void CTripodDlg::doMyImageProcessing(LPBITMAPINFOHEADER lpThisBitmapInfoHeader)
  {
     // doMyImageProcessing: This is where you'd write your own image processing code
     // Task: Read a pixel's grayscale value and process accordingly

     unsigned int   W, H;             // Width and Height of current frame [pixels]
     unsigned int   row, col;         // Pixel's row and col positions
     unsigned long  i;                // Dummy variable for row-column vector
     BYTE           thresholdValue;   // Value to threshold grayvalue

     char           str[80];          // To print message
     CDC            *pDC;             // Device context needed to print message

     W = lpThisBitmapInfoHeader->biWidth;  // biWidth: number of columns
     H = lpThisBitmapInfoHeader->biHeight; // biHeight: number of rows

     // In this example, the grayscale image (stored in m_destinationBmp) is
     // thresholded to create a binary image. A threshold value close to 255
     // means that only colors close to white will remain white in the binarized
     // BMP; all other colors will become black

     thresholdValue = 150;

     for (row = 0; row < H; row++) {
        for (col = 0; col < W; col++) {

           // Recall each pixel is composed of 3 bytes
           i = (unsigned long)(row*3*W + 3*col);

           // Add your code to operate on each pixel. For example,
           // *(m_destinationBmp + i) refers to the i'th byte in destinationBmp.
           // Since destinationBmp is a 24-bit grayscale image, you must also apply
           // the same operation to *(m_destinationBmp + i + 1) and
           // *(m_destinationBmp + i + 2)

           // Threshold: if a pixel's grayValue is at or below thresholdValue
           if( *(m_destinationBmp + i) <= thresholdValue)
              *(m_destinationBmp + i) =
              *(m_destinationBmp + i + 1) =
              *(m_destinationBmp + i + 2) = 0;   // Make pixel BLACK
           else
              *(m_destinationBmp + i) =
              *(m_destinationBmp + i + 1) =
              *(m_destinationBmp + i + 2) = 255; // Make pixel WHITE
        }
     }

     // Print a message at pixel position (x, y) = (75, 580). Comment out if not needed
     pDC = GetDC();
     sprintf(str, "Binarized at a %d threshold", thresholdValue);
     pDC->TextOut(75, 580, str);
     ReleaseDC(pDC);
  }


STEP 7: Save, Compile and Execute

Once you've implemented your image processing algorithm, choose File - Save All. Next choose Build - Rebuild All. If compiling is successful, choose Build - Execute brinarize.exe. Your application should launch and successfully threshold and display real-time binarized images. A screen shot of brinarize.exe is shown below:

brinarize.zip (2.4 MB) (Version 1.0 uploaded 04/29/02) contains the source code and executable, in case you encounter any problems with the above steps.

Code Commentary

Preliminaries

Non-MFC/Win32/VC++ programmers are often surprised when they can't find main and encounter unfamiliar data types like LPBITMAPINFOHEADER and CDC in MFC VC++ code. MFC is a set of classes designed specifically for Windows graphical user interface (GUI) programming. As such, ANSI C/C++ programmers may find it hard to jump into Windows programming without knowing the classes, the data types and perhaps the Win32 API (application programming interface). This learning curve can be discouraging when one just wishes to port an image processing algorithm to Windows.

ANSI C/C++ programmers will recognize that in doMyImageProcessing, the code nested between the two for loops is ANSI C. m_destinationBmp is a pointer to an array of pixels and *(m_destinationBmp + i) is the value of the i'th pixel. The two for loops allow you to read, process and write each pixel. After cycling through the array, a final m_destinationBmp results and can be displayed. doMyImageProcessing and displaying m_destinationBmp run in real-time (30 frames/sec) if the nested code is not computationally intensive (like simple threshold calculations).

m_destinationBmp points to a 24-bit grayscale bitmap. It is 320 pixels wide by 240 pixels high. It is malloc'ed and created in the function grayScaleTheFrameData. In this function, sourceBmp points to the actual pixel data in the 24-bit RGB color image captured by your Logitech camera. Being RGB, each pixel in sourceBmp is represented by three bytes (red, green and blue).

The reason for creating m_destinationBmp is that machine vision developers often use grayscale images to reduce computation cost. If you need color data, then just use sourceBmp.

An image is an arranged set of pixels. A 2-dimensional array like myImage[r,c], where r and c are the pixel's row and column positions respectively, is an intuitive arrangement, as illustrated in the above figure (left). For example, myImage is a (3x4) image having three rows and four columns. myImage[2,1], which refers to the pixel at row 2, column 1, has a pixel intensity value "J".

An alternative arrangement, often encountered in machine vision, is the row-column format, which uses a 1-dimensional vector and is shown in the figure above (right). A particular pixel is referenced by:


		(myImage + r*W + c)

where myImage is the starting address of the pixels, r and c are the pixel's row and column positions respectively, and W is the total number of columns in the image (i.e. the image's width in pixels). To access the pixel's value, one uses the ANSI C dereferencing operator:


		*(myImage + r*W + c)

For example, with r=2, c=1 and W=4, (myImage + r*W + c) yields (myImage + 9). In vector form myImage[9], which is the same as *(myImage + 9), has the pixel value "J".
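
The equivalence is easy to verify in a few lines of ANSI C. The sketch below builds the (3x4) example image above as a 2-dimensional array, views the same memory as a row-column vector, and prints the pixel "J" both ways:

	#include <stdio.h>

	int main(void)
	{
	   char myArray[3][4] = { {'A','B','C','D'},
	                          {'E','F','G','H'},
	                          {'I','J','K','L'} };
	   char *myImage = &myArray[0][0];  /* same pixels, row-column vector view */
	   int  r = 2, c = 1, W = 4;        /* row, column and image width         */

	   /* Both print J: myArray[2][1] and *(myImage + 2*4 + 1), i.e. myImage[9] */
	   printf("%c %c\n", myArray[r][c], *(myImage + r*W + c));
	   return 0;
	}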

The row-column format has several advantages over the array. First, memory for an array must be allocated before a program runs. This forces a programmer to size an array according to the largest possible image the program might encounter. As such, small images requiring smaller arrays would lead to wasted memory. Furthermore, passing an array between functions forces copying it on the stack which again wastes memory and takes time. Pointers are much more computationally efficient and memory can be malloc'ed at run-time. Second, once image pixels are arranged in row-column format, you can access a particular pixel with a single variable, as well as take advantage of pointer arithmetic e.g. *(pointToImage++). Arrays take two variables and do not have similar arithmetic operators. For these two reasons row-column formats are used in machine vision, especially when more computationally intensive and time-consuming image processing is involved.

A 24-bit image uses three bytes to specify a single pixel. Often these bytes are the pixel's red, green and blue (RGB) contributions. RGB is also known as the Truecolor format since 16 million different colors are possible with 24 bits. As mentioned above, m_destinationBmp and sourceBmp are 24-bit grayscale and Truecolor images respectively. m_destinationBmp makes all three bytes of a single pixel equal in intensity value. The intensity is a gray value computed from the amount of red, green and blue in the pixel. As such:


  	*(m_destinationBmp + i), *(m_destinationBmp + i + 1), *(m_destinationBmp + i + 2)

are equal (see the function grayScaleTheFrameData if interested). This is the reason the thresholding code in Step 6 above sets all three bytes to either black or white.

Bitmaps, the default image format of the Windows operating system, can be saved to a disk file and typically have the .BMP extension. Bitmaps can also exist in memory and be loaded, displayed and resized. There are two caveats to using bitmaps. First, pixels are stored left-to-right but bottom-to-top; when a bitmap is viewed, pixels towards the bottom are stored closer to the image's starting address. Second, a pixel's color components are stored in reverse order; the first, second and third bytes are the amounts of blue, green and red respectively. Again, the grayScaleTheFrameData function can be referenced to see this reverse ordering of color.
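
To make both caveats concrete, here is a minimal ANSI C sketch of the conversion idea. It is not TRIPOD's actual grayScaleTheFrameData (consult that function for the real implementation): it walks a 24-bit bitmap's pixel bytes in blue-green-red order and writes a 24-bit grayscale copy. A simple average of the three color bytes is assumed as the gray value; other weightings are possible.

	#include <stdlib.h>

	/* Convert a 24-bit BGR bitmap (W columns, H rows, stored bottom-up)
	   into a malloc'ed 24-bit grayscale copy with the same layout.      */
	unsigned char *grayscaleCopy(const unsigned char *sourceBmp,
	                             unsigned int W, unsigned int H)
	{
	   unsigned char *destBmp = (unsigned char *)malloc(3UL*W*H);
	   unsigned int  row, col;
	   unsigned long i;
	   unsigned char gray;

	   if(destBmp == NULL)
	      return NULL;

	   for(row = 0; row < H; row++) {      /* row 0 is the BOTTOM image row */
	      for(col = 0; col < W; col++) {
	         i = (unsigned long)(row*3*W + 3*col);

	         /* Color bytes are stored in reverse order: blue, green, red */
	         gray = (unsigned char)((sourceBmp[i]        /* blue  */
	                               + sourceBmp[i + 1]    /* green */
	                               + sourceBmp[i + 2])   /* red   */ / 3);

	         /* A grayscale pixel has all three bytes equal */
	         destBmp[i] = destBmp[i + 1] = destBmp[i + 2] = gray;
	      }
	   }
	   return destBmp;
	}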

Code Operation

The flowchart shows brinarize.exe's function calling sequence. The application begins with a call to OnInitDialog. Code here initializes the two videoportals' sizes. A call to allocateDib allocates memory to display both the image captured by your Logitech camera and the image resulting from doMyImageProcessing, i.e. binarizing.

The Logitech SDK defines a notification flag called NOTIFICATIONMSG_VIDEOHOOK which goes true whenever a new image frame is acquired by your Logitech camera. After OnInitDialog, the code in OnPortalNotificationProcessedview checks for this flag and executes. Code here assigns the pointer lpBitmapPixelData to the frame's pixel data, grayscales the color image, executes your doMyImageProcessing algorithm and then displays your image processing results. If your doMyImageProcessing is not computationally time-consuming, OnPortalNotificationProcessedview will execute at 30 frames/sec.

Displaying the results of your image processing algorithm is handled by the function displayMyResults. It uses the Win32 GDI function StretchDIBits, which stretches a device-independent bitmap (DIB) image to fit the videoportal's display window.
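
For reference, a StretchDIBits call has the following shape. This is a minimal sketch with hypothetical names (pDC, destRect, pBitmapInfo, pPixels), not TRIPOD's verbatim displayMyResults:

	#include <afxwin.h>   // MFC: CDC and CRect (assumes an MFC project)

	// Stretch a 320x240 24-bit DIB to fill a destination rectangle
	void drawFrame(CDC *pDC, const CRect &destRect,
	               const BITMAPINFO *pBitmapInfo, const BYTE *pPixels)
	{
	   ::StretchDIBits(pDC->GetSafeHdc(),           // destination device context
	                   destRect.left, destRect.top, // upper-left of display area
	                   destRect.Width(), destRect.Height(),
	                   0, 0, 320, 240,              // the whole source image
	                   pPixels,                     // DIB pixel bytes (BGR, bottom-up)
	                   pBitmapInfo,                 // header describing the DIB
	                   DIB_RGB_COLORS,              // colors are literal RGB values
	                   SRCCOPY);                    // plain copy raster operation
	}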

Where To Go From Here

This step-by-step tutorial offers a rapid method to develop real-time image processing applications in Windows. TRIPOD is a set of files that serve as a template into which you can easily integrate your machine vision algorithms in ANSI C; the pointer to image pixels, in row-column format, offered in the doMyImageProcessing function requires no low-level knowledge of MFC, VC++ or the Win32 API. Real-time binarization was illustrated using TRIPOD.

Machine vision is fascinating and fun, but its implementation often requires specific programming skills and specialized hardware that obscure the computer vision theory. For example, binarizing images is fundamentally a simple concept. TRIPOD's files and Logitech's videoportal ActiveX component take care of the low-level issues like acquiring frame data, processing pixels and displaying results. TRIPOD and a Logitech USB camera enable any developer with ANSI C knowledge to quickly implement real-time computer vision algorithms.

Binarizing images just served as a "Hello World" introduction to machine vision. Algorithms like detecting edges and colors, tracking regions and counting objects can be implemented just as easily. Handbooks like Myler's The Pocket Handbook of Image Processing Algorithms in C (ISBN: 0-13-642240-3) provide code for such applications. Such algorithms can add value to your Logitech USB camera with tasks like surveillance, robot navigation and image recognition.

Future work for this tutorial's author includes additional code examples like tracking and visual-servoing. Additionally, the author hopes to develop a Visual Basic (VB) version of TRIPOD; VB offers rapid development of Windows GUIs without a steep learning curve.

Review of Existing Machine Vision Software

Ideally, machine vision development would focus on generating better algorithms and formulating theories rather than struggling with low-level software/hardware issues. For example, algorithms for improved image understanding/recognition or tracking multiple targets are active research areas. However, with many existing Windows-based packages, implementing these algorithms is frustratingly difficult, oftentimes demanding specialized DirectX, Win32 API and MFC knowledge. As such, machine vision researchers and developers often resort to Linux/Unix-based platforms to have more control over low-level details. The reality, however, is that software/hardware for Windows-based PCs is more prolific and often affordable. Furthermore, end-users of one's machine vision code, like surveillance and manufacturing companies, will most likely want a Windows-based version.

Reviewed below are the author's experiences with some existing software/hardware packages that offer real-time image handling on a Windows PC, where images are grabbed through a video camera, processed and displayed. Software that only works with static image files or on non-Windows platforms is not reviewed. The review's purpose is to let potential developers ascertain a package's potential before investing time and money. This author's experiences have been frustrating for a number of reasons, and hence TRIPOD was conceived and developed: (1) Some packages only have canned solutions, forcing you to use the package's machine vision functions. Often source code is not available or there's little documentation. As such, the package does not lend itself to developing one's own algorithms. (2) Some packages require a strong skill set in DirectX, Win32 API and MFC. Machine vision development typically begins with getting a pointer to the frame's pixel data; algorithms are then a matter of processing pixels. The prerequisite skill set, however, makes implementation frustratingly time-consuming, especially since there are few books on DirectX programming dedicated to processing video. (3) Some packages have run-time licenses, where distributing your machine vision solutions forces purchasing additional licenses.

Free or Low-Cost Packages

Intel's OpenCV, an open-source computer vision software library, is a beautiful offering and has a dedicated user group. Frustrating, however, is that it has compile bugs, the code is not well documented and some examples rely on compiled object code or DirectX. Those comfortable with DirectX, MFC and the Win32 API can perhaps debug and develop applications; those without such experience will find the learning curve very steep.

Logitech's QCSDK provides an ActiveX component and accompanying manuals with which one can quickly develop a VB, VC++ or Win32 API application to display real-time video captured by a Logitech USB camera. The manuals explain how to display video acquired by the camera, record video to an AVI file, take snapshots and add text to displays. The Vidbert example source code can be studied to understand how to get a pointer to the frame's image data. Unfortunately the code is not well documented, Vidbert's operation is not explained in the manuals, and it does not arrange pixels in a standard row-column format. The ActiveX component is free and has no run-time licenses. The QCSDK has a lot of potential, as this tutorial has shown. James Matthews gives a nice programming tutorial as well as a motion detection example.

Another package's free download is an executable that demos some applications created with its ActiveX component. Not all the demos work. The commercial version (apx. $149 USD) offers some canned Visual Basic functions like negation and motion detection. Accessing the actual pixel data is not available, however, and calling the technical help desk did not resolve questions about the non-working demos, pixel data access and compile failures. As such, this package is not really for developers who wish to design their own real-time processing algorithms. It can give VB programmers a quick way to display video captured by a Logitech QuickCam camera, but Logitech's SDK provides the same functionality for free.

A third package offers a 14-day free trial OCX (ActiveX) component with potential for real-time image processing with USB and Video for Windows (VFW) compatible cameras and framegrabbers. The commercial version is priced at $99.95 USD ($39.95 USD student version). There are several examples but little documentation. The help file does suggest that a pointer to the image frame's pixel data is available, and it appears that programmers skilled in MFC might make use of this OCX.

Finally, Pong Suvan has a number of nice programs for machine vision. The executables demonstrate some canned processing examples like Sobel filters and blob analysis. He gives some explanations on using his software to develop image processing applications, but not in enough detail for non-MFC programmers. Pong frankly states that he cannot give the full source code; as such, it isn't clear exactly how to access the frame's pixel data.

Research Lab Level Packages

The two packages below are relatively expensive and require a separate (CCD) camera purchase.

Coreco is a machine vision hardware/software OEM. $3600 USD buys a single license for Sherlock, which only runs on Coreco's framegrabber line; any code that may be distributed requires purchasing additional licenses. The PCVision framegrabber costs $1250 USD. Additionally, once the software is installed on a PC, it cannot be re-installed on another PC. This author bought the package as a potential development platform for his research students. Although the software's ActiveX component has canned solutions like blob analysis, region-based tracking and optical character recognition, it sadly does not allow one to easily do custom programming. Also, the algorithms are 2D and perhaps one or two generations behind what's available in research labs. As such, it is not well suited as a machine vision research development platform; Sherlock's market is geared more to manufacturers wishing for a turnkey parts inspection solution.

The $2500 USD (educational discount) Matrox Imaging Library (MIL) and $595 USD Meteor II framegrabber are well suited for machine vision development. MIL does require run-time licenses, but its programming is very straightforward with comprehensive manuals. Developers with ANSI C knowledge can quickly begin developing code.

Review Conclusions

TRIPOD was created to impact the largest audience, namely those interested in machine vision who know ANSI C. TRIPOD's hardware demands are modest (Win98, 233 MHz minimum, 128 MB RAM) and affordable ($50 USD Logitech USB camera). Non-research-lab machine vision developers may find Matrox's MIL too expensive. Also, equipping classroom PCs with Matrox's MIL is financially out of reach for most schools, which prevents professors from exposing students to real-time image processing experiences. The $50 Logitech USB camera, however, costs less than most textbooks. Every student could potentially buy one and do course exercises and projects at home. Projects could include tracking, motion detection, optical character recognition, structure-from-motion, visual-servoing and disparity/stereo. If TRIPOD is used, then professors can focus on teaching fundamental machine vision theory, with ANSI C as the only prerequisite; professors would not have to go off on tangents with DirectX, ActiveX, Win32 API and MFC lectures. This author intends to introduce an undergraduate/graduate machine vision course with student homework and projects implemented using TRIPOD. Pedagogic experiences will be shared in the future.

Author Information

TRIPOD and tutorial were developed by Paul Y. Oh, a robotics professor in the mechanical engineering department of Drexel University in Philadelphia, PA, USA. Prof. Oh's research interests include visual-servoing, robotics, mechatronics and 3D reconstruction of urban areas from aerial photos. Prof. Oh's technical publications can be found in the IEEE Robotics and Automation proceedings and transactions or downloaded.

Special Notes for Win2K and WinXP Installation

It appears the Logitech SDK only works with certain webcam drivers, 5.4.3 in particular. To get TRIPOD to work with WinXP, you must set up your QuickCam Express or LEGO Vision Command camera with the 5.4.3 driver. Note: TRIPOD has only been tested with the QuickCam Express and LEGO Vision Command cameras (Win98/XP) and the QuickCam 3000 (WinXP). The QuickCam 3000 Pro comes with Logitech driver 6.0, the QuickCam 4000 Pro comes with Logitech driver 7.0, and the LEGO Vision Command comes with Logitech driver 5.3.2.3222.