Thursday, March 17, 2011

Median Filtering in OpenCV

I was just browsing through the OpenCV source to learn more about how it implements smoothing. I noticed a few interesting things that say something more general about how OpenCV2 is structured.

In OpenCV1 there is cvSmooth(), which lets you pass a parameter like CV_GAUSSIAN or CV_MEDIAN to specify what kind of smoothing you want. In OpenCV2, this function coexists with CV2-style functions like cv::medianBlur() and cv::GaussianBlur() (note that Gaussian is capitalized because it is a proper name). If you scroll to the very bottom of smooth.cpp, you'll find cvSmooth(), where it becomes evident that the newer cv::medianBlur() and cv::GaussianBlur() are the real implementations, while cvSmooth() is a wrapper that simply calls them.

Reading the documentation, I was surprised to find that many of the blurring functions support in-place processing. Due to the way median filtering works, in-place operation is a non-trivial property. Digging into cv::medianBlur(), you'll find:

void medianBlur(const Mat& src0, Mat& dst, int ksize) {
...
 dst.create( src0.size(), src0.type() );
...
 cv::copyMakeBorder( src0, src, 0, 0,
  ksize/2, ksize/2, BORDER_REPLICATE );
...
}

First, it calls Mat::create() on the dst Mat (in CV1, this would be an IplImage*). Mat::create() ensures that dst is the right size and type. If you pass it an unallocated Mat then this step allocates it for you, which makes it easy to use but less efficient. Then it does a copyMakeBorder(), which makes it safe to run the median filter on the edges of the image. So even if you give medianBlur() an allocated dst Mat, it's still going to be allocating a big working image for doing the blur! Finally, there's this mess of an if statement:

double img_size_mp = (double)(size.width*size.height)/(1 << 20);
if( ksize <=
  3 + (img_size_mp < 1 ? 12 : img_size_mp < 4 ? 6 : 2)*
  (MEDIAN_HAVE_SIMD && checkHardwareSupport(CV_CPU_SSE2) ? 1 : 3)) {
 medianBlur_8u_Om( src, dst, ksize );
} else {
 medianBlur_8u_O1( src, dst, ksize );
}

This is actually a really pleasant surprise. There are two things that might happen here: medianBlur_8u_Om() or medianBlur_8u_O1(). The two differ in how the cost per pixel scales with the kernel size. The _Om() function takes longer per pixel as the kernel gets bigger (called O(n) time, or linear time, where n is the kernel size), while the _O1() function takes a constant amount of time per pixel, regardless of how big the kernel is (O(1) time, or constant time). Either way, the total running time still grows with the number of pixels in the image. The O(1) algorithm isn't trivial, and was only published in 2007 (Perreault and Hébert, "Median Filtering in Constant Time"). If the O(1) function is available, why not just always use that? The answer is in the if statement above: the constant-time algorithm carries more fixed overhead, so when your kernel is small (and the cutoff depends on your total image size and on whether SSE2 is available) it's actually faster to use the O(n) function. OpenCV has gone to the trouble of figuring out where that cutoff is, and this if statement encodes it, automatically switching between the implementations for us.
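To make the cutoff concrete, here is the condition from the if statement pulled out into a small standalone function. This is only a sketch: maxKsizeForOm() and usesOm() are names I made up, and the haveSimd flag stands in for MEDIAN_HAVE_SIMD && checkHardwareSupport(CV_CPU_SSE2). The arithmetic itself is copied from the snippet above.

```cpp
// Hypothetical helper (not an OpenCV function): the largest ksize that the
// quoted condition would still send to medianBlur_8u_Om().
// haveSimd stands in for MEDIAN_HAVE_SIMD && checkHardwareSupport(CV_CPU_SSE2).
int maxKsizeForOm(int width, int height, bool haveSimd) {
    // Image size in units of 2^20 pixels, as in the original expression.
    const double imgSizeMp = static_cast<double>(width) * height / (1 << 20);
    const int sizeFactor = imgSizeMp < 1 ? 12 : imgSizeMp < 4 ? 6 : 2;
    return 3 + sizeFactor * (haveSimd ? 1 : 3);
}

// True when the quoted if statement would pick the _Om() branch.
bool usesOm(int width, int height, int ksize, bool haveSimd) {
    return ksize <= maxKsizeForOm(width, height, haveSimd);
}
```

For a 640x480 image with SSE2 available, this gives a cutoff of 15: kernels up to 15x15 go to _Om(), anything bigger goes to _O1(). Without SSE2 the cutoff for the same image rises to 39.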

In conclusion, if you need the most blazingly fast median filtering code ever, first you need to figure out which side of the if statement you're on (O(n) or O(1)). Then you should prepare a reusable buffer for yourself using cv::copyMakeBorder(), and call medianBlur_8u_O1() or medianBlur_8u_Om() directly.
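As a rough illustration of the reusable-buffer idea, here is a minimal sketch in plain C++ with no OpenCV dependency. It is not OpenCV's code: padReplicate() and medianInPlace() are hypothetical names, the median is the naive sort-the-window kind rather than either of the fast variants, and it pads all four sides for simplicity (the call quoted earlier pads only left and right). The point is just that the filter reads from a padded copy, which is what makes overwriting the source safe, and that the caller can keep that copy around between calls.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copy a w x h 8-bit image into `padded`, repeating the
// edge pixels r times on every side (the same idea as BORDER_REPLICATE).
void padReplicate(const std::vector<std::uint8_t>& src, int w, int h, int r,
                  std::vector<std::uint8_t>& padded) {
    const int pw = w + 2 * r, ph = h + 2 * r;
    // resize() only allocates when the buffer is too small, so a buffer kept
    // by the caller is reused across calls.
    padded.resize(static_cast<std::size_t>(pw) * ph);
    for (int y = 0; y < ph; ++y) {
        const int sy = std::max(0, std::min(y - r, h - 1));
        for (int x = 0; x < pw; ++x) {
            const int sx = std::max(0, std::min(x - r, w - 1));
            padded[y * pw + x] = src[sy * w + sx];
        }
    }
}

// Naive median filter (ksize must be odd). All reads come from the padded
// copy, so writing results straight back into `img` is safe.
void medianInPlace(std::vector<std::uint8_t>& img, int w, int h, int ksize,
                   std::vector<std::uint8_t>& padded) {
    const int r = ksize / 2, pw = w + 2 * r;
    const int count = ksize * ksize, mid = count / 2;
    padReplicate(img, w, h, r, padded);
    std::vector<std::uint8_t> window(count);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int n = 0;
            for (int dy = 0; dy < ksize; ++dy)
                for (int dx = 0; dx < ksize; ++dx)
                    window[n++] = padded[(y + dy) * pw + (x + dx)];
            std::nth_element(window.begin(), window.begin() + mid, window.end());
            img[y * w + x] = window[mid];
        }
    }
}
```

Calling medianInPlace() repeatedly with the same `padded` vector on same-sized frames means the working image is allocated once instead of on every call.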

Monday, March 14, 2011

Social Media Predictors

Who is winning on the internet right now?

Let's say you watch a video on YouTube. You're the 500th viewer, and later that day it explodes to 100k views. This gives you a score of 500/100000 = .005. The next day you watch a video, you're the 500th viewer, but the video never goes beyond 1k views. So your score that day is 500/1000 = .5. Your average score is (.005 + .5) / 2 = ~.25.

Let's say the person with the lowest score is winning. Unfortunately, the only institution that's really in a position to calculate this score is Google.

Thursday, February 10, 2011

libfreenect, three months in

it's been three months,
we're already telling students,
"you need to threshold the depth image"
and waving around our kinect for a more complete perspective

now go get your kinect
and put it in the same spot you put it
when you first brought it home.

do you remember the feeling
of a new eye in your house?
a welcome intruder?

watch it sitting there and try to remember
the feeling that things are somehow "more 3d"
now that the computer can see it too.

when you first brought it home
was it pointing away from you?
proving itself to you,
identifying the scale of a scene
larger than itself?

it's been three months,
which direction are you pointing it now?

Wednesday, November 10, 2010

libfreenect

is it just me, or is this kind of exciting?

not exciting because it's a new "gadget",
but because it's a different kind of tool.

without the ps3eye,
the eyewriter wouldn't exist in its current form.
what should we make with kinect?
is there anything we couldn't do before?

how long until we tell students
"to detect someone,
first you need to threshold the depth image" or,
"for a full 3d map of a space,
you'll need about 4 kinects in the center of the room"

how long until the new posture is "hands forward" instead of "hands up"?
"superman" instead of "surrender"?

how long until we just wave a kinect around,
get a complete 3d map of a space
feed it into our projection mapping toolkit
and start making interesting work
instead of worrying about the mapping?

and finally, what kind of work is inevitable with 3d sensing?
how long until there is a clear 3d interaction aesthetic?
and we say "i've seen this before, i bet they did it with a kinect" ;)

Sunday, August 29, 2010

In Response to "Glitching vs..."

A couple of days ago Evan Meaney wrote a blog post titled "Glitching vs. Processing vs. Moshing vs. Signal Interference". I really appreciate Evan's glitch work, from "Ceibas Cycle" to his writings in "on glitching", where he describes the inevitable collaboration with information theory present in all digital work. But this most recent post just doesn't make any sense.

He states that the post is inspired by the diverse work submitted to GLI.TC/H. I can understand wanting to make some loose categories to help group submissions, but the language in the post sounds like he's building a framework. He asks readers to "use this space as a means to explore and delineate, to observe and report, to enumerate...". But it's hard to do that with just four independent categories (with names like "glitching" and "processing") with no unifying structure other than the context of visual arts.

I think I understand where Evan is coming from, so I'd like to try again.

Let's start with noise.

Noise is what happens when we don't understand something. Noise can manifest in any medium: confusion about the clothing of a particular culture, inability to separate a visual foreground from background, the misunderstanding of a foreign rhythm or melody as arrhythmic or atonal. In our failure to contextualize, we create noise.

Glitch means finding noise when we expect to understand. Glitch is an experience, driven by expectation, emerging from consciousness rather than computation. Just as noise would not exist without us to misunderstand it, glitch would not exist without us to misexpect it.

Glitch art is about dwelling in and exploring these experiences, which sometimes means attempting to reproduce them. These reproductions may be executed in a variety of ways. Sometimes it will involve imitating the processes that regularly lead to glitches. This includes direct memory corruption at the byte level, redirection of streams, removal of key frames, analog interference, and circuit bending. Other times it eschews these processes, and opts to evoke the sensation by other means: through the "Bad TV" effect, or in the choice of palette, shapes, motion, melody, etc.

Most of the time, glitch art falls somewhere in between, drawing on the processes that give rise to glitches, but ultimately focused on evoking the experience by whatever means necessary.

Evan suggests that "a true glitch is not reproducible". I believe "true" glitch is unrelated to reproducibility. "True" glitch is tied solely to expectation. Something "stable" seems to no longer be a glitch simply because it's packaged as one (i.e., as a "glitch"), which removes the possibility of expecting anything else.

That said, acknowledging that glitch is an experience gives us freedom as artists to share that experience regardless of the procedural purity of our practice. Saying that we're just "imperfect" is a cop-out based on a misguided understanding of what glitch artists are aiming for.

Monday, May 10, 2010

3D Scanning as Dense Microphone Array

Sound is the displacement of matter over time.

A microphone detects sound at a single point, either via direct physical coupling, or using optical methods (as with laser microphones).

3D scanning can also detect displacement of reflective matter over time. A 3D scanning setup with a very large angle between the camera and projector can detect very minor displacement variations, and with a high-framerate camera this displacement can be measured at audio frequencies. Every pixel then corresponds to a virtual laser microphone: instead of the usual microphone at a point, a fringe analysis microphone comprises N points, as determined by the camera resolution.
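In data terms, this amounts to turning a stack of displacement frames into N audio channels, one per pixel, sampled at the camera's frame rate (so the highest frequency you can capture is half the frame rate). A minimal sketch of that reshaping, assuming the scanner already gives you a displacement value per pixel per frame; framesToChannels() and the data layout are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout: frames[t][p] is the measured displacement of pixel p
// at frame t. Returns channels[p][t]: one signal per pixel with its mean
// removed, so each channel swings around zero like an audio waveform.
std::vector<std::vector<double>> framesToChannels(
        const std::vector<std::vector<double>>& frames) {
    if (frames.empty()) return {};
    const std::size_t numFrames = frames.size();
    const std::size_t numPixels = frames[0].size();
    std::vector<std::vector<double>> channels(
        numPixels, std::vector<double>(numFrames));
    for (std::size_t p = 0; p < numPixels; ++p) {
        double mean = 0.0;
        for (std::size_t t = 0; t < numFrames; ++t) mean += frames[t][p];
        mean /= static_cast<double>(numFrames);
        for (std::size_t t = 0; t < numFrames; ++t)
            channels[p][t] = frames[t][p] - mean;
    }
    return channels;
}
```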

Saturday, May 08, 2010

Gaze-informed Perceptual Compression

A video chat program that tracks your eye movement and sends gaze information to the other user. The other user's computer heavily compresses the whole image it sends you, with the exception of the region you're looking at. To you, it just looks like the entire image is clear.
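The sender's side could be as simple as picking a compression quality per image block based on how far the block is from the reported gaze point. A minimal sketch, with every name and parameter invented for illustration (no particular codec is assumed):

```cpp
#include <cmath>

// Hypothetical: choose a quality level for the block centered at
// (blockX, blockY), given the viewer's gaze point in the same pixel
// coordinates. Blocks within clearRadius of the gaze stay sharp.
int blockQuality(double blockX, double blockY, double gazeX, double gazeY,
                 double clearRadius, int highQuality, int lowQuality) {
    const double dist = std::hypot(blockX - gazeX, blockY - gazeY);
    return dist <= clearRadius ? highQuality : lowQuality;
}
```

A smoother version might ramp the quality down with distance instead of using a hard edge, but the hard edge is enough to show the idea.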