Keywords: machine vision, computer vision, robotic vision, real-time, image processing, MFC, C++, Win32 API, Windows, framegrabber, video camera, tracking, centroid, binarize, image features, bitmap, BMP, USB camera, Logitech, LEGO Vision Command
The left figure is a screen grab of a Windows application programmed using TRIPOD. The top viewport displays, in real-time and in color, images captured by a Logitech USB camera. At the same time, the bottom viewport displays image processing results; in this case, a binary (black and white) image thresholded at an 8-bit grayscale intensity of 150.
TRIPOD was written to empower enthusiasts with an easy-to-use, open-source software Template to rapidly program Real-time Image PrOcessing algorithms. Enthusiasts include those wishing to use a digital video camera (e.g. a Logitech USB camera) for developing robotic, machine and computer vision applications.
The template consists of MFC VC++ source files on top of which you build your
image processing program, as shown in the next figure. The template
grabs a frame, gives you a pointer to its pixel data and displays the image.
You simply add your code for pixel processing.
The tutorial assumes you know or have the following:
Machine vision is often difficult even though some of the fundamental concepts are easy to grasp. Implementing concepts on images acquired in real-time (30 frames/sec) by a video camera typically demands advanced skills in low-level software programming. Even for ANSI C/C++ programmers, Windows programming (e.g. the Win32 API and MFC) has a steep learning curve. Furthermore, framegrabbers and CCD cameras are often expensive, and their bundled software development kits (SDKs) are typically proprietary and involve licensing and royalties. Beyond affordability, such SDKs often provide only ActiveX components and not the source code. As such, developing your custom application is limited to the functions the SDK gives you.
TRIPOD was created to overcome such difficulties and reach the largest possible audience. The itemized assumptions above are common denominators; readers will probably be advanced in one or more skills, but most will know ANSI C/C++ whether they program in UNIX or Windows. Most readers will have access to a Win98 machine even if they typically use a Linux-based PC or Sun Sparc. The Win98 machine requirements are modest, and Logitech's USB cameras are prolific and affordable (approx. $50 USD for the Logitech QuickCam Express).
The tutorial demonstrates writing a grayscale threshold program using the TRIPOD template. The tutorial breakdown:
Readers with Win32 API and MFC skills could probably dissect the software examples, and in doing so, a template resembling TRIPOD could emerge. However, the examples do not provide what most machine vision programmers are used to, namely a pointer to an array of pixels in row-column format. Lastly, Windows' default image format is the bitmap (BMP), and the QCSDK provides images in 24-bit RGB (red, green, blue) Truecolor. Most machine vision programmers instead use 8-bit grayscale images with pixels stored left-to-right and top-to-bottom (e.g. PGM).
TRIPOD was thus written with these considerations in mind: (1) The MFC VC++ programming framework is provided so that you can just manipulate pixels by adding ANSI C code in the appropriate section; (2) The 24-bit RGB BMP is converted into a 24-bit grayscale version (you can still work with the color BMP if you wish); (3) Code comments and this code description explain the BMP's left-to-right, bottom-to-top storage format; (4) Frame data is provided as a standard row-column vector.
An 8-bit grayscale image means that each pixel ranges in value from 0 to 255. An image has (WxH) pixels, where W is the width (number of columns) and H is the height (number of rows). The algorithm reads each pixel. Pixels below a threshold value are made black (grayscale value = 0); otherwise they are made white (grayscale value = 255). A thresholding pseudocode is thus:
for(row = 0; row < H; row++) {
    for(col = 0; col < W; col++) {
        if(pixel[row][col] < thresholdValue)
            pixel[row][col] = 0;   /* make pixel black */
        else
            pixel[row][col] = 255; /* make pixel white */
    }
}
If you have installed your Logitech USB camera and the QCSDK, then you can try running brinarize.exe (Version 1.0, uploaded 04/29/02) to see it work (NB: note the spelling has an "r" in brinarize.exe).
Assuming you have also installed your Logitech USB camera and the QCSDK, download tripod.zip (2.5 MB) (Version 1.0, uploaded 04/29/02) and unzip it to C:/tripod.
STEP 1: Creating a Win32 Application
Create a Win32 Application: from the menubar choose File - New and select "Win32 Application". For the Location, choose C:/tripod, and type brinarize for the Project name (NB: note the spelling has an "r" in brinarize). The Win32 checkbox should be checked. When your screen looks like the following, click OK.
STEP 2: Creating an MFC Project
After you click OK above, choose "Empty Project" when the popup box asks you for a project type. When you click the Finish button, VC++ automatically creates your project's structure and makefiles. Click the "FileView" tab (near the screen's bottom) and you will see the Source, Header and Resource folders. From the menubar choose File - Save All and then Build - Rebuild All. You shouldn't get any compile errors since these folders are empty.
STEP 3: Copying TRIPOD files to your applications folder
Using Explorer, copy the TRIPOD template files from the C:/tripod folder to your application project folder e.g. C:/tripod/brinarize:
StdAfx.h, resource.h, tripod.cpp, tripod.h, tripod.rc, tripodDlg.cpp, tripodDlg.h, videoportal.h, videoportal.cpp.
You should also copy the res folder. In VC++, choose FILE - Save All.
STEP 4: Including TRIPOD files to your VC++ Project
In VC++, click the "FileView" tab and expand the brinarize files. You should see folders for Source Files, Header Files and Resources. Click the "Source Files" folder once, then right-click and choose "Add Files". You should see the left image in the figure below:
Browse to "C:/tripod/brinarize" and add tripod.cpp. Expand the Source Files folder and you should see tripod.cpp listed, as shown in the middle image of the above figure.
Repeat the above, adding the following files to your Source Files folder:
tripod.rc, tripodDlg.cpp, videoportal.cpp
Next, add the following files to your Header Files folder:
StdAfx.h, tripod.h, resource.h, tripodDlg.h, videoportal.h
Once all these files have been added, your workspace tree should look like the right image of the above figure.
STEP 5: Including QCSDK and MFC Shared DLLs
QCSDK include files need to be added to your project. From the menubar click Project - Settings. Next, click on the root directory (brinarize) and then the C/C++ tab. Under the "Category" pulldown box choose "Preprocessor". In the "Additional Include Directories" edit box, type /QCSDK1/inc. Your screen should look like the following figure:
Next, click the "General" tab and under the "Microsoft Foundation Classes" pulldown menu, choose "Use MFC in a shared DLL" as shown in the following figure:
Finish off by clicking OK. Next, save all your work by clicking File - Save All. Then compile your project by choosing Build - Rebuild All.
STEP 6: Adding your image processing code
The TRIPOD source, header and resource files used in the previous steps
grab the color image frame, convert the red, green and blue pixels into grayscale
values, and store the frame pixels in a malloc'ed row-column vector. All that
remains is for you to add your image processing routines.
Your added code goes in the "tripodDlg.cpp" file, under the
CTripodDlg::doMyImageProcessing function.
For example, to threshold, you would add the code shown in red in the doMyImageProcessing listing reproduced later in this tutorial.
STEP 7: Save, Compile and Execute
Once you've implemented your image processing algorithm, choose FILE - Save All.
Next choose Build - Rebuild All. If compiling is successful choose
Build - Execute brinarize.exe. Your application should launch and successfully
threshold and display real-time binarized images. A screen shot of brinarize.exe
is shown below:
brinarize.zip (2.4 MB)
(Version 1.0 uploaded 04/29/02) contains the source code and executable if you encounter any problems with the above steps.
ANSI C/C++ programmers will recognize that in doMyImageProcessing,
the code nested between the two for loops is ANSI C.
m_destinationBmp is a pointer to an array of pixels and
*(m_destinationBmp + i) is the value of the i'th pixel. The two
for loops allow you to read, process and write each pixel. After
cycling through the array, a final m_destinationBmp results and
can be displayed. doMyImageProcessing and the display of m_destinationBmp
run in real-time (30 frames/sec) if the nested code is not computationally
intensive (like simple threshold calculations).
m_destinationBmp points to a 24-bit grayscale bitmap. It is
320 pixels wide by 240 pixels high. It is malloc'ed and created in the function
grayScaleTheFrameData. In this function, sourceBmp points
to the actual pixel data in the 24-bit RGB color image captured by your Logitech
camera. Being RGB, each pixel in sourceBmp is represented by
three bytes (red, green and blue).
The reason for creating m_destinationBmp is that machine
vision developers often use grayscale images to reduce computation cost. If you
need color data, then just use sourceBmp.
An image is an arranged set of pixels. A 2-dimensional array like
myImage[r,c] where r and c are the pixel's row and
column positions respectively, is an intuitive arrangement as illustrated
in the above figure (left). For example, myImage is a (3x4) image
having three rows and four columns. myImage[2,1], which refers to
the pixel at row 2 column 1, has a pixel intensity value "J".
An alternative arrangement, often encountered in machine vision, is the row-column
format, which uses a 1-dimensional vector as shown in the figure above (right).
A particular pixel is referenced by:

(myImage + r*W + c)
where myImage is the starting address of the pixels, r and c
are the pixel's row and column positions respectively, and W is the total
number of columns in the image (i.e. the image's width in pixels). To access the
pixel's value, one uses the ANSI C de-referencing operator:

*(myImage + r*W + c)
For example, for r=2, c=1 and W=4,
(myImage + r*W + c) yields (myImage + 9). In vector form,
myImage[9], which is the same as *(myImage + 9),
has the pixel value "J".
The row-column format has several advantages over the array. First, memory
for a statically declared array must be allocated before a program runs. This forces a
programmer to size the array according to the largest possible image the program
might encounter, so small images requiring smaller arrays would lead
to wasted memory. Furthermore, a 2-dimensional array's column count must be fixed
at compile time for the compiler to index it, which makes passing images of arbitrary
sizes between functions awkward. Pointers are much more flexible and memory can be
malloc'ed at run-time. Second, once image pixels are arranged in row-column format,
you can access a particular pixel with a single index variable, as well as take
advantage of pointer arithmetic, e.g. *(pointerToImage++). Arrays
take two index variables and do not have similar arithmetic operators. For these
two reasons row-column formats are used in machine vision, especially when
more computationally intensive and time-consuming image processing is involved.
In a 24-bit grayscale image, all three bytes of a pixel are equal (see the function
grayScaleTheFrameData if interested). This is the reason the code seen in Step 6
above sets all three bytes to either black or white when thresholding.
Bitmaps, the default image format of the Windows operating system, can
be saved to a disk file and typically have the .BMP extension. Bitmaps can
also exist in memory and be loaded, displayed and resized.
There are two caveats to using bitmaps. First, pixels are stored left-to-right
but bottom-to-top; when a bitmap is viewed, pixels towards the bottom are
stored closer to the image's starting address. Second, a pixel's color components
are stored in reverse order; the first, second and third bytes are the
amounts of blue, green and red respectively. Again, the grayScaleTheFrameData
function can be referenced to see this reverse ordering of color.
The Logitech SDK defines a flag called NOTIFICATIONMSG_VIDEOHOOK
which goes true whenever a new image frame is acquired by your Logitech camera.
After OnInitDialog, the code in OnPortalNotificationProcessedview
checks for this flag and executes. Code here assigns the pointer
lpBitmapPixelData to the frame's pixel data, grayscales the color image,
executes your doMyImageProcessing algorithm and then displays your image
processing results. If your doMyImageProcessing is not computationally
time-consuming, OnPortalNotificationProcessedview will execute at
30 frames/sec.
Displaying the results of your image processing algorithm is handled by
the function displayMyResults. It uses the Win32 function
StretchDIBits, which stretches a device-independent bitmap image
to fit the videoportal's display window.
Machine vision is fascinating and fun, but its implementation often requires
specific programming skills and specialized hardware that obscure
computer vision theory. For example, binarizing images is, fundamentally, a
simple concept. TRIPOD's files and Logitech's Videoportal ActiveX
component take care of the low-level issues like acquiring frame data, processing
pixels and displaying results. TRIPOD and a Logitech USB camera enable
any developer with ANSI C knowledge to quickly implement real-time computer
vision algorithms.
Binarizing images just served as a "Hello World" introduction to machine vision.
Algorithms like detecting edges and colors, tracking regions and counting objects
can be implemented just as easily. Handbooks like Myler's The Pocket Handbook
of Image Processing Algorithms in C (ISBN: 0-13-642240-3) provide code for
such applications. Such algorithms can add value to your Logitech USB camera with
tasks like surveillance, robot navigation and image recognition.
Future work for this tutorial's author includes additional code examples
like tracking and visual-servoing. Additionally, the author hopes to
develop a Visual Basic (VB) version of TRIPOD. VB offers rapid development
of Windows GUIs without a steep learning curve.
Reviewed below are the author's experiences with some existing software/hardware packages
that offer real-time image handling with a Windows PC, where images are grabbed through
a video camera, processed and displayed. Software that only works with static image files
or non-Windows platforms is not reviewed. The review's purpose is to help potential developers
ascertain a package's potential before investing time and money. This
author's experiences have been frustrating for a number of reasons, and hence TRIPOD
was conceived and developed: (1) Some packages only offer canned solutions, forcing
you to use the package's machine vision functions. Often source code is not available
or there is little documentation. As such, the package does not lend itself to developing
one's own algorithms. (2) Some packages require a strong skill set in DirectX, Win32 API
and MFC. Machine vision development begins with getting a pointer to the
frame's pixel data; algorithms are then a matter of processing pixels. The pre-requisite
skill set however makes implementation frustratingly time-consuming, especially since
there are few books on DirectX programming dedicated to processing video.
(3) Some packages have run-time licenses, where distributing your machine vision
solutions forces purchasing additional licenses.
void CTripodDlg::doMyImageProcessing(LPBITMAPINFOHEADER lpThisBitmapInfoHeader)
{
// doMyImageProcessing: This is where you'd write your own image processing code
// Task: Read a pixel's grayscale value and process accordingly
unsigned int W, H; // Width and Height of current frame [pixels]
unsigned int row, col; // Pixel's row and col positions
unsigned long i; // Dummy variable for row-column vector
BYTE thresholdValue; // Value to threshold grayvalue
char str[80]; // To print message
CDC *pDC; // Device context needed to print message
W = lpThisBitmapInfoHeader->biWidth; // biWidth: number of columns
H = lpThisBitmapInfoHeader->biHeight; // biHeight: number of rows
// In this example, the grayscale image (stored in m_destinationBmp) is
// thresholded to create a binary image. A threshold value close to 255
// means that only grayvalues close to white will remain white in the
// binarized BMP; all others will become black
thresholdValue = 150;
for (row = 0; row < H; row++) {
for (col = 0; col < W; col++) {
// Recall each pixel is composed of 3 bytes
i = (unsigned long)(row*3*W + 3*col);
// Add your code to operate on each pixel. For example,
// *(m_destinationBmp + i) refers to the first byte of the pixel at (row, col)
// Since m_destinationBmp is a 24-bit grayscale image, you must also apply
// the same operation to *(m_destinationBmp + i + 1) and *(m_destinationBmp + i + 2)
// Threshold: if a pixel's grayvalue is less than or equal to thresholdValue
if( *(m_destinationBmp + i) <= thresholdValue)
*(m_destinationBmp + i) =
*(m_destinationBmp + i + 1) =
*(m_destinationBmp + i + 2) = 0; // Make pixel BLACK
else
*(m_destinationBmp + i) =
*(m_destinationBmp + i + 1) =
*(m_destinationBmp + i + 2) = 255; // Make pixel WHITE
}
}
// To print message at (row, column) = (75, 580). Comment if not needed
pDC = GetDC();
sprintf(str, "Binarized at a %d threshold", thresholdValue);
pDC->TextOut(75, 580, str);
ReleaseDC(pDC);
}
Code Commentary
Preliminaries
Non-MFC/Win32/VC++ programmers are often surprised when they can't find main
and encounter unfamiliar data types like LPBITMAPINFOHEADER and CDC
in MFC VC++ code. MFC is a set of classes designed specifically for Windows
graphical-user interface (GUI) programming. As such, ANSI C/C++ programmers
may find it hard to jump into Windows programming without knowing the classes,
data types and perhaps the Win32 API (application programmer's interface). This
learning curve can be discouraging when one just wishes to port their image processing
algorithm to Windows.
A 24-bit image uses three bytes to specify a single pixel. Often these bytes
are the pixel's red, green and blue (RGB) contributions. RGB is also known as
the Truecolor format since 16 million different colors are possible
with 24-bits. As mentioned above, m_destinationBmp and
sourceBmp are 24-bit grayscale and Truecolor images respectively.
m_destinationBmp makes all three bytes of a single pixel equal in
intensity value. The intensity is a gray value computed from the amount of
red, green and blue in the pixel. As such, for the i'th pixel,

*(m_destinationBmp + i), *(m_destinationBmp + i + 1) and *(m_destinationBmp + i + 2)

are equal (see the function grayScaleTheFrameData if interested).
Code Operation
The flowchart shows brinarize.exe's function-calling sequence.
The application begins with a call to OnInitDialog. Code here
initializes the two videoportals' sizes. A call to allocateDib allocates
memory to display both the image captured by your Logitech camera and the image
resulting from doMyImageProcessing, i.e. binarizing.
Where To Go From Here
This step-by-step tutorial offers a rapid method to develop real-time image
processing applications in Windows. TRIPOD is a set of files that
serve as a template into which you can easily integrate your machine vision
algorithms in ANSI C; the pointer to image pixels and row-column format offered
in the doMyImageProcessing function do not require low-level knowledge
of MFC, VC++, or the Win32 API. A real-time binarization was illustrated
using TRIPOD.
Review of Existing Machine Vision Software
Ideally machine vision development would focus on generating better algorithms and
formulating theories rather than struggling with low-level software/hardware issues.
For example, algorithms for improved image understanding/recognition or tracking
multiple targets are active research areas. However with many existing Windows-based
packages, implementing these algorithms is frustratingly difficult, oftentimes
demanding specialized DirectX, Win32 API and MFC knowledge. As such, machine vision
researchers and developers often resort to Linux/Unix-based platforms to have
more control over low-level details. The reality however is that software/hardware
for Windows-based PCs is more prolific and often more affordable. Furthermore, end-users
of one's machine vision code, like surveillance and manufacturing companies, will most
likely want a Windows-based version.
Free or Low-Cost Packages
This open-source computer vision software library is a beautiful offering and
has a dedicated user group. Frustrating, however, is that it has compile bugs,
its code is not well documented, and some examples rely on compiled object code
or DirectX. Those comfortable with DirectX, MFC and the Win32 API can perhaps
debug it and develop applications. Those without such experience will find the
learning curve very steep.
With Logitech's ActiveX component and the accompanying manuals, one
can quickly develop a VB, VC++ or Win32 API application to display real-time
video captured by a Logitech USB camera. The manuals explain how to display
video acquired by the camera, record video to an AVI file, take snapshots
and add text to displays. The Vidbert example source code can be studied to
understand how to get a pointer to the frame's image data. Unfortunately the code
is not well documented, Vidbert's operation is not explained in the manuals,
and it does not arrange pixels in a standard row-column format. The ActiveX component
is free and has no run-time licenses. The QCSDK has a lot of potential, as this
tutorial has shown. James Matthews gives a nice programming
tutorial as
well as a motion
detection example.
The 14-day free trial OCX (ActiveX) component has potential for real-time
image processing with USB and Video For Windows (VFW) compatible cameras
and framegrabbers. The commercial version is priced at $99.95 USD ($39.95 USD for the
student version). There are several examples but little documentation. The help file does
suggest that a pointer to the image frame's pixel data is available. Furthermore, it appears
that programmers skilled in MFC might make good use of this OCX.
Pong Suvan has a number of nice programs for machine vision. The executables
demonstrate some canned processing examples like Sobel filters and blob analysis.
He gives some explanations on using his software to develop image processing
applications but not in enough detail for non-MFC programmers. Pong frankly
states that he cannot give the full source code. As such, it isn't clear
exactly how to access the frame's pixel data.
Research Lab Level Packages
The two packages below are relatively expensive and require a separate (CCD) camera purchase.
Coreco is a machine vision hardware/software OEM. $3600 USD buys a single license for Sherlock,
which only runs on Coreco's framegrabber line. Any code that is distributed requires purchasing
additional licenses. The PCVision framegrabber costs $1250 USD. Additionally, once the
software is installed on a PC, it cannot be re-installed on another PC. This author bought the
package as a potential development platform for his research students. Although the software's
ActiveX component has canned solutions like blob analysis, region-based tracking and optical
character recognition, it sadly does not allow easy custom programming. Also,
the algorithms are 2D only and perhaps one or two generations behind what's available in
research labs. As such, it is not well suited as a machine vision research development platform.
Sherlock's market is geared more towards manufacturers wishing for a turnkey parts inspection solution.
The $2500 USD (educational discount) Matrox Imaging Library (MIL) and $595 USD Meteor II
framegrabber are well suited for machine vision development. MIL does require run-time
licenses, but its programming is very straightforward with comprehensive manuals. Developers
with ANSI C knowledge can quickly begin developing code.
Review Conclusions
TRIPOD was created to reach the largest audience, namely those interested in machine
vision who know ANSI C. TRIPOD's hardware demands are modest (Win98, 233 MHz minimum,
128 MB RAM) and affordable ($50 USD Logitech USB camera). Non-research-lab machine vision
developers may find Matrox's MIL too expensive. Also, equipping classroom PCs with Matrox's MIL
is financially out of reach for most schools. This prevents professors from exposing course students
to real-time image processing experiences. The $50 Logitech USB camera, however,
costs less than most textbooks. Every student could potentially buy one and do course
exercises and projects at home. Projects could include tracking, motion detection, optical character
recognition, structure-from-motion, visual-servoing and disparity/stereo. If TRIPOD is used,
then professors can focus on teaching fundamental machine vision theory, and
ANSI C would be the only pre-requisite. Professors would not have to digress into
DirectX, ActiveX, Win32 API and MFC lectures. This author intends to introduce
an undergraduate/graduate machine vision course with student homework and projects implemented
using TRIPOD. Pedagogic experiences will be shared in the future.
Author Information
TRIPOD and tutorial were developed by Paul Y. Oh, a robotics professor
in the mechanical engineering department of Drexel University in Philadelphia,
PA, USA. Prof. Oh's research interests
include visual-servoing,
robotics, mechatronics and 3D reconstruction
of urban areas from aerial photos. Prof. Oh's technical publications can be found in the
IEEE Robotics and Automation proceedings and transactions or
downloaded.
Special Notes for Win2K and WinXP Installation
It looks like the Logitech SDK only works with certain
webcam drivers - 5.4.3 in particular. To get TRIPOD to
work with WinXP, you must set up your QuickCam Express
or LEGO Vision Command camera with the 5.4.3 driver.
Note: TRIPOD has only been tested with the Quickcam Express
and LEGO Vision Command Cameras (Win 98/XP) and the Quickcam 3000 (WinXP).
Quickcam 3000 Pro comes with Logitech driver 6.0
Quickcam 4000 Pro comes with Logitech driver 7.0
The LEGO Vision Command comes with Logitech driver 5.3.2.3222