
Thursday, March 17, 2011

Median Filtering in OpenCV

I was just browsing through the OpenCV source to learn more about how it implements smoothing. I noticed a few interesting things that say something more general about how OpenCV2 is structured.

In OpenCV1 there is cvSmooth(), which lets you pass a parameter like CV_GAUSSIAN or CV_MEDIAN to specify what kind of smoothing you want. In OpenCV2, this function coexists with the OpenCV2-style functions cv::medianBlur() and cv::GaussianBlur() (note that Gaussian is capitalized because it is a proper name). If you scroll to the very bottom of smooth.cpp, you'll find cvSmooth(), where it becomes evident that the newer cv::medianBlur() and cv::GaussianBlur() are the implementations, while cvSmooth() is just a wrapper that calls them.
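For example, both styles end up in the same place. Here's a minimal sketch, assuming OpenCV 2.x (where both the C and C++ APIs are available) and with "input.png" as a placeholder file name:

#include <opencv2/opencv.hpp>

using namespace cv;

int main() {
    Mat src = imread("input.png"), dst;      // "input.png" is just a placeholder

    // OpenCV2-style calls: these are the actual implementations
    medianBlur(src, dst, 5);
    GaussianBlur(src, dst, Size(5, 5), 1.5);

    // OpenCV1-style call: cvSmooth() just forwards to cv::medianBlur()
    IplImage iplSrc = src, iplDst = dst;
    cvSmooth(&iplSrc, &iplDst, CV_MEDIAN, 5);
    return 0;
}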

Reading the documentation, I was surprised to find that many of the blurring functions support in-place processing. Because every output pixel of a median filter depends on a neighborhood of input pixels, writing results back over the source would corrupt later computations, so in-place operation is a non-trivial property. Digging into cv::medianBlur(), you'll find:

void medianBlur(const Mat& src0, Mat& dst, int ksize) {
 ...
 // make sure dst is allocated with the same size and type as the source
 dst.create( src0.size(), src0.type() );
 ...
 // pad the source into a working copy so the filter can run at the edges
 cv::copyMakeBorder( src0, src, 0, 0,
  ksize/2, ksize/2, BORDER_REPLICATE );
 ...
}

First, it calls Mat::create() on the dst Mat (in CV1, this would be an IplImage*). Mat::create() ensures that dst is the right size and type. If you pass it an unallocated Mat then this step allocates it for you, which makes it easy to use but less efficient. Then it does a copyMakeBorder(), which makes it safe to run the median filter on the edges of the image. So even if you give medianBlur() an allocated dst Mat, it's still going to be allocating a big working image for doing the blur! Finally, there's this mess of an if statement:

double img_size_mp = (double)(size.width*size.height)/(1 << 20);
if( ksize <= 3 + (img_size_mp < 1 ? 12 : img_size_mp < 4 ? 6 : 2) *
              (MEDIAN_HAVE_SIMD && checkHardwareSupport(CV_CPU_SSE2) ? 1 : 3) )
{
    medianBlur_8u_Om( src, dst, ksize );
}
else
{
    medianBlur_8u_O1( src, dst, ksize );
}

This is actually a really pleasant surprise. There are two things that might happen here: medianBlur_8u_Om() or medianBlur_8u_O1(). The _Om() function's running time per pixel grows with the kernel size (O(m), linear in the kernel width), while the _O1() function takes a constant amount of time per pixel no matter how big the kernel is (O(1), constant time). The O(1) algorithm isn't trivial, and was only published in a 2007 paper (Perreault and Hébert's constant-time median filter). If the O(1) function is available, why not just always use that? The answer is in the if statement above: sometimes when your kernel size is smaller (relative to your total image size) it's actually faster to use the O(m) function. OpenCV has gone to the trouble of figuring out where that cutoff is, and this if statement encodes it, automatically switching between the implementations for us.
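Pulled out as a standalone helper, the cutoff logic reads a little more clearly. This is the same arithmetic as the snippet above, just with SIMD availability passed in as a flag instead of queried via checkHardwareSupport():

// Returns true if OpenCV would take the O(m) branch (medianBlur_8u_Om)
// for this image size and kernel size.
bool usesLinearTimeMedian(int width, int height, int ksize, bool haveSSE2)
{
    double img_size_mp = (double)(width * height) / (1 << 20);  // megapixels
    int cutoff = 3 + (img_size_mp < 1 ? 12 : img_size_mp < 4 ? 6 : 2)
                   * (haveSSE2 ? 1 : 3);
    return ksize <= cutoff;
}

For a 640x480 image with SSE2, for example, the cutoff works out to 15, so any kernel up to 15x15 goes through the O(m) code.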

In conclusion, if you need the most blazingly-fast median filtering code ever, first you need to figure out which side of the if statement you're on (O(m) or O(1)). Then you should prepare a reusable buffer for yourself using cv::copyMakeBorder(), and call medianBlur_8u_O1() or medianBlur_8u_Om() directly.
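A rough sketch of that idea for a fixed-size video stream, assuming you've copied medianBlur_8u_O1() out of smooth.cpp into your own code (it's declared static, so it isn't callable from outside the library), and that every frame has the same size so the buffers only get allocated once:

#include <opencv2/opencv.hpp>
using namespace cv;

// Assumed to be copied out of smooth.cpp; only the declaration is shown here.
void medianBlur_8u_O1(const Mat& src, Mat& dst, int ksize);

int main()
{
    VideoCapture capture(0);
    Mat frame, padded, dst;
    const int ksize = 31;   // large kernel, so the O(1) path is the right one

    while (capture.read(frame))
    {
        // create() and copyMakeBorder() only allocate on the first frame;
        // after that the buffers already match and get reused.
        dst.create(frame.size(), frame.type());
        copyMakeBorder(frame, padded, 0, 0, ksize/2, ksize/2, BORDER_REPLICATE);
        medianBlur_8u_O1(padded, dst, ksize);  // or medianBlur_8u_Om(), per the cutoff

        imshow("median", dst);
        waitKey(1);
    }
    return 0;
}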

Saturday, May 08, 2010

Gaze-informed Perceptual Compression

A video chat program that tracks your eye movement and sends gaze information to the other user. The other user's computer compresses the entire image heavily, with the exception of what you're looking at. To you, it just looks like the entire image is clear.

Sunday, February 21, 2010

3D Video Scanner for Cheap

Here's a way you might try making a 3D video scanner for the cost of a webcam:

  • Webcam with VSYNC broken out
  • Bright LED or LED array
  • Ambient illumination

Mount the LED at approximately the same location as the camera lens. Turn the LED on for alternating VSYNC pulses. The 3D decoding process is as follows: the light intensity at every point can be modeled using the equation i = r * (a + s), where:

  • i is the captured intensity at that pixel
  • r is the reflectivity at that point
  • a is the ambient illumination at that point
  • s is the illumination due to the LED source at that point

Sampling with the LED on and off yields two equations:

  1. i_on = r * (a + s)
  2. i_off = r * (a + 0)

And s falls off with distance according to an inverse-square law:

  • s(d) = f / d^2

Where f is a scaling factor that puts the LED's contribution on the same scale as a. Solving for d yields:

  • i_off = r * a
  • i_off / a = r
  • i_on = (i_off / a) * (a + (f / d^2))
  • ((a * i_on) / i_off) - a = f / d^2
  • a * ((i_on / i_off) - 1) = f / d^2
  • d = sqrt(f / (a * ((i_on / i_off) - 1)))

The values for a and f can be approximated by hand, or calibrated against a reference plane. a must be truly uniform across the scene, but if the LED is mounted approximately at the same location as the lens, then f can be calibrated automatically in a way that also accounts for the LED's non-point-source qualities.
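A minimal sketch of the per-pixel decoding step in OpenCV, under the assumptions above; the function name and the hand-picked a and f values are placeholders, and the two frames would come from consecutive VSYNC-aligned captures:

#include <opencv2/opencv.hpp>
using namespace cv;

// d = sqrt(f / (a * ((i_on / i_off) - 1))), applied at every pixel.
Mat depthFromLedPair(const Mat& onFrame8u, const Mat& offFrame8u, double a, double f)
{
    Mat i_on, i_off, ratio, depth;
    onFrame8u.convertTo(i_on, CV_32F);
    offFrame8u.convertTo(i_off, CV_32F);

    divide(i_on, i_off, ratio);                 // i_on / i_off
    subtract(ratio, Scalar::all(1.0), ratio);   // (i_on / i_off) - 1
    divide(f / a, ratio, depth);                // f / (a * ((i_on / i_off) - 1))
    sqrt(depth, depth);                         // d
    return depth;
}

Pixels where the LED adds nothing (i_on nearly equal to i_off) blow up toward infinity, which is expected: no source contribution means no distance information at that pixel.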

The disadvantages here are primarily the assumption about ambient illumination, and the simplified material model. The advantages would be the cost and utter simplicity. The fact that it relies on a non-coded point source for illumination means you can work with infrared just as easily as visible light. Furthermore, it actually relies on ambient illumination while many other systems try to minimize it.

Friday, January 08, 2010

Flash Mob 3D Scanning

  1. Pick a local monument.
  2. Organize a flash mob via Craigslist.
  3. Instruct everyone to take photos of the monument.
  4. Everyone then uploads and tags their photos.
  5. The tagged photos are then fed into Photosynth for 3D reconstruction.

As a variant, people can just record video and walk around. This relies on the videos being high resolution.

Saturday, December 05, 2009

3D capture in performance

Capturing 3D data from moving scenes is a hard problem in itself, but the harder problem is how to integrate it with a performance in real time. First you have to hide the 3D capture process from the audience, and then you need to present something related to the 3D capture in an engaging way.

Re-projecting coarse effects can be approximated with a multi-camera system, but re-projecting fine details is risky because of system latency.

The real question is: what can you do with 3D data that you can't do with 2D images? The most interesting idea I have: you can cast shadows on a 3D form in a way you can't on a 2D form.

Removing Image Noise

Thresholding a gray-level image into a foreground/background mask often leaves fine noise behind.

I've used a few different approaches to dealing with this, but unfortunately they all share the same problem: inner corners get rounded off. These algorithms can't tell the difference between a black pixel in an inner corner and a black pixel that is just noise.

Perhaps what we should really be using is something like bilateral filtering (in Photoshop, "Surface Blur"). Bilateral filtering preserves sharp edges, while blurring large undefined regions. Unfortunately, bilateral filtering creates a sort of "glow" that still has issues with corners.
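For reference, here's a minimal way to try that in OpenCV; the function name is mine, and the diameter and sigma values are arbitrary starting points, not tuned:

#include <opencv2/opencv.hpp>
using namespace cv;

// Bilateral-filter the gray image before thresholding, so smoothing happens
// within regions but (mostly) not across strong edges.
Mat cleanThreshold(const Mat& gray8u, double thresh)
{
    Mat smoothed, mask;
    bilateralFilter(gray8u, smoothed, 9, 50.0, 50.0);   // d, sigmaColor, sigmaSpace
    threshold(smoothed, mask, thresh, 255, THRESH_BINARY);
    return mask;
}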

Maybe we need something more like "smart blur"? It seems to not propagate across edges...