Saturday, May 19, 2012

Rewriting Things

In the life of every nontrivial project, there is a point where you look back and ask yourself: "How did we get here? What a tangled web we have woven! Can we ever recover? If we push ahead, are we digging ourselves into a deeper hole, or is there a light at the other end?"

I'd like to offer some personal advice.

If you look at your code base and say "There is so much that's right, even if it's really messy and has some serious bugs." then you should keep going, especially if other people are involved in the project. If you look at your code and say "Actually, no one is really using this yet, and if I switch things around before people have to deal with it, they'll thank me later." then you should head back. Start over. Get it right from the beginning while you have a chance, or it's really going to hurt later.

You'll know when it's ok to start over, because you won't be too worried about it. You won't deliberate very long; the choice will be pretty obvious.

But if there is some turmoil in your heart, you need to step back and get some perspective. If it's taking you weeks or months to figure out what to do, don't give up. The answer is that you need to work through it. You need to spend that time reflecting on exactly what is wrong, and what you can do about it. If you abandon all your work at that point, you're going to regret it later. This is especially true if you divide the community that is contributing to it: they'll end up duplicating their effort just to learn the same lessons.

Take OpenCV as an example. The C interface was a solid backbone for thousands of developers, but the community got to a point where it needed to work past the limitations of that language. C++ looked incredibly appealing, but it would require some significant changes to how OpenCV works in order to support those features. What did they do? They didn't rewrite OpenCV from scratch, but they started building a new foundation by reflecting on what worked and what didn't work in the original interface. They took the whole community along with them and now OpenCV 2 is strong and better than ever. One of the most important things is that a lot of the same people are involved. If there had been a fork, or someone had decided to start from scratch, then the lessons learned during the development of OpenCV 1 would not have been completely applied to OpenCV 2. Some of the same paradigms might be there, but a complete, continuous understanding would be missing.

On the other hand, look at pd and Max. They could have combined efforts, and learned from each other in a more direct way. But now we have two slightly different environments that are each missing features of the other.

This might all sound unnecessarily verbose for such a simple message. But that's because this is actually a metaphor for something completely unrelated to software development.

Not the Best

I want to be the best, in everything that I do. I try to always push myself, which is great, but this competitive spirit isn't always a good thing. I've been learning recently that an unbalanced competitive spirit can have a bunch of terrible side effects. It can cause:

  • Jealousy for the success of others.
  • A tendency to seek disadvantages for others.
  • Condescending behavior in order to discourage others.
  • Unnecessary frustration when you're not at your best.

All of these things can also lead to passive aggressive behavior. Passive aggression is a way of internalizing these effects, sinking deeper into them, while putting up a front of being above them. Passive aggressive behavior puts you on a pedestal, by acknowledging that you could sink to jealousy, or frustration, or anything else — but you're "better than that". If you can actually rise above these things, there is no need to explicitly acknowledge your progress.

A great example of passive aggressive behavior is providing positive sentiments after a negative statement. Telling someone you would appreciate it if they changed their behavior, and following it up with "thanks!" is one way of pushing the point that you're "better than that," and you're not "really frustrated," when in fact you're trying to mask your frustration. Better responses include: not saying anything and letting it go, or stating clearly, without any masking, how you feel. If it feels like you're exposed and your frustration is out in the open, you're probably doing it right.

An unbalanced desire to "be the best" can also cause an unwillingness towards empathic behavior. In order to have empathy for others, you need to join them where they are, and relate to their state of mind. This requires a loss of ego, and a loss of pride. Instead, the opposite behavior is chosen: the competitive person becomes sarcastic, or offers outlandish advice, or attempts to make the problem seem insignificant or trivial. I don't think I'm very sarcastic, but I regularly offer outlandish advice when people come to me with problems. Sometimes it's an effective way of dealing with your own problems, but it doesn't always mean the same thing in the form of advice.

Sarcasm is maybe the most dangerous of these responses. When someone hears sarcasm, the first reaction is disbelief: the statement sounds ridiculous. Then, they're forced into a reversal of their understanding, where they accept the true intention of the speaker. Sarcasm forces the person hearing it into the mindset of the person speaking it. This is key: sarcasm is a shield against empathy. With sarcasm, you reject the validity of another person's situation and instead force them to empathize with you.

I'm still learning these lessons, but right now they're informed by a healthy dose of criticism from a variety of open source software developers, and exactly one failed romance.

Tuesday, January 17, 2012

Make Fewer Decisions, Write Less Code

I have two suggestions for aspiring C++ programmers:

  • Make fewer decisions.
  • Write less code.

These might seem counterintuitive at first, so let me explain.

Let's say you have a setter method for an object.

class MyObject {
public:
 void setSize(int);
private:
 int size;
};

When you implement that method, how do you name the argument? Here is one way:

void MyObject::setSize(int size) {
 this->size = size;
}

You may have also seen "int _size" or "int inSize", or maybe MyObject has a variable "mSize" instead, or any number of other combinations. I personally do it the above way because I have a thorough understanding of variable scope, and this happens to minimize the number of unique symbols.

The important thing isn't which approach you pick, it's that you consistently use that solution. Each time you see a setter method, you should be able to write out those two lines without thinking twice. In other words, when you come across a mundane problem you should always have the same solution. You shouldn't even have to make a decision: every time there is more than one way to do something, pick the way that works most of the time, and always use it.

As another example, perhaps you have a collection of things you want to loop through. Here is the first thing I write:

for(int i = 0; i < n; i++)

Code formatting is a kind of recurring decision. What is the smallest set of rules you can formulate for the way you format code? These are two for the above line:

  • Binary operators are surrounded by spaces.
  • Semicolons are followed by spaces.

By minimizing the number of rules you have to follow, and having them cover as many situations as possible, you can reduce the number of decisions you have to make. Maybe the above rules can be further simplified by removing the word "binary"?

Similarly important are your design pattern choices, all the way down to variable and enumeration idioms:

After writing the above line, you can define "n" on the line above. You'll probably need it again later. And sometimes you can even use something like boost to write the whole thing with a single idiom.
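
Here is a rough sketch of what that habit can look like. The function names are made up for illustration, and since I can't assume you have boost installed, the second version uses the C++11 range-based for as a stand-in for a library idiom.

```cpp
#include <vector>

// Index loop: write the for line first, then define n on the line above it.
int totalIndexed(const std::vector<int>& values) {
 int total = 0;
 int n = (int) values.size();
 for(int i = 0; i < n; i++) {
  total += values[i];
 }
 return total;
}

// Same loop as a single idiom (range-based for, standing in for a boost macro).
int totalRanged(const std::vector<int>& values) {
 int total = 0;
 for(int value : values) {
  total += value;
 }
 return total;
}
```

Either one is fine. The point is to pick one and stop thinking about it.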

If you're ever asking yourself how your code should be formatted or whether to use < or <= for a loop condition, you're probably wasting time that could be better spent on high level decisions. When I'm using Xcode I use its terrible auto-indent feature to make sure my code is consistent, even though I aesthetically disagree with some of its decisions — it's the fastest way of normalizing my code.

Besides making fewer decisions (and, in the process, writing more consistent code), it's important to write less. Writing less means finding creative ways to use fewer symbols, fewer objects, fewer control statements, fewer loops. Writing less should never make things more complicated. Consider the two functions below:

bool both(bool a, bool b) {
 if(a) {
  if(b) {
   return true;
  }
  if(!b) {
   return false;
  }
 }
}
bool both(bool a, bool b) {
 return a && b;
}

It's an extreme example, but the point is: when you write less code, there are fewer opportunities to make mistakes (there is actually a mistake in the first one, can you see it?). In the situation above, you can simplify your code with truth table analysis. To sum the numbers from 1 to n you can (arguably) simplify your code with recursion, or the more efficient analytic version:

int sum(int n) {
 int sum = 0;
 for(int i = 1; i <= n; i++) {
  sum += i;
 }
 return sum;
}
int sum(int n) {
 if(n > 0) {
  return n + sum(n - 1);
 } else {
  return 0;
 }
}
int sum(int n) {
 return (n * (n + 1)) / 2;
}

For each of those techniques, how many ways could they be wrong? Which one is the simplest?
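
One way to answer the first question is to check the techniques against each other. This is only a sketch: the names sumLoop, sumRecursive, sumAnalytic and allAgree are mine, and it assumes n is small enough that n * (n + 1) fits in an int.

```cpp
// Three ways to sum the numbers from 1 to n.
int sumLoop(int n) {
 int sum = 0;
 for(int i = 1; i <= n; i++) {
  sum += i;
 }
 return sum;
}

int sumRecursive(int n) {
 if(n > 0) {
  return n + sumRecursive(n - 1);
 } else {
  return 0;
 }
}

int sumAnalytic(int n) {
 return (n * (n + 1)) / 2;
}

// Returns true if all three versions agree for every n from 0 to maxN.
bool allAgree(int maxN) {
 for(int n = 0; n <= maxN; n++) {
  int expected = sumLoop(n);
  if(sumRecursive(n) != expected || sumAnalytic(n) != expected) {
   return false;
  }
 }
 return true;
}
```

If any one of them has an off-by-one or a wrong term, allAgree() returns false, but it can't tell you which one is wrong.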

Thursday, March 17, 2011

Median Filtering in OpenCV

I was just browsing through the OpenCV source to learn more about how it implements smoothing. I noticed a few interesting things that say something more general about how OpenCV2 is structured.

In OpenCV1 there is cvSmooth(), which lets you pass a parameter like CV_GAUSSIAN or CV_MEDIAN to specify what kind of smoothing you want. In OpenCV2, this function coexists with CV2-style functions like cv::medianBlur() and cv::GaussianBlur() (note that Gaussian is capitalized because it is a proper name). If you scroll to the very bottom of smooth.cpp, you'll find cvSmooth(), where it becomes evident that the newer cv::medianBlur() and cv::GaussianBlur() are the implementations, while cvSmooth() is a wrapper that simply calls them.

Reading the documentation, I was surprised to find that many of the blurring functions support in-place processing. Due to the way median filtering works, in-place operation is a non-trivial property. Digging into cv::medianBlur(), you'll find:

void medianBlur(const Mat& src0, Mat& dst, int ksize) {
...
 dst.create( src0.size(), src0.type() );
...
 cv::copyMakeBorder( src0, src, 0, 0,
  ksize/2, ksize/2, BORDER_REPLICATE );
...
}

First, it calls Mat::create() on the dst Mat (in CV1, this would be an IplImage*). Mat::create() ensures that dst is the right size and type. If you pass it an unallocated Mat then this step allocates it for you, which makes it easy to use but less efficient. Then it does a copyMakeBorder(), which makes it safe to run the median filter on the edges of the image. So even if you give medianBlur() an allocated dst Mat, it's still going to be allocating a big working image for doing the blur! Finally, there's this mess of an if statement:

double img_size_mp = (double)(size.width*size.height)/(1 << 20);
if( ksize <=
  3 + (img_size_mp < 1 ? 12 : img_size_mp < 4 ? 6 : 2)*
  (MEDIAN_HAVE_SIMD && checkHardwareSupport(CV_CPU_SSE2) ? 1 : 3)) {
 medianBlur_8u_Om( src, dst, ksize );
} else {
 medianBlur_8u_O1( src, dst, ksize );
}

This is actually a really pleasant surprise. There are two things that might happen here: medianBlur_8u_Om() or medianBlur_8u_O1(). Both still have to visit every pixel, so the difference is in the work done per pixel. As its name suggests, the _Om() function does work per pixel that grows with the kernel size (linear in the kernel size, which I'll loosely call O(n) time), while the _O1() function does a constant amount of work per pixel, regardless of how big the kernel is (O(1) time, or constant time). The O(1) algorithm isn't trivial, and was only published in a 2007 paper. If the O(1) function is available, why not just always use that? The answer is in the if statement above: when your kernel size is small enough (with a cutoff that depends on your total image size), it's actually faster to use the O(n) function. OpenCV has gone to the trouble of figuring out where that cutoff is, and this if statement encodes that cutoff, automatically switching between the implementations for us.
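
To make the cutoff easier to read, here is the same expression pulled apart into a standalone sketch. largestOmKernel() and usesOm() are hypothetical helpers of mine, not OpenCV functions, and hasSimd stands in for the MEDIAN_HAVE_SIMD && checkHardwareSupport(CV_CPU_SSE2) check.

```cpp
#include <cassert>

// Hypothetical helper (not part of OpenCV): the largest ksize for which
// the if statement above would pick medianBlur_8u_Om().
int largestOmKernel(int width, int height, bool hasSimd) {
 assert(width > 0 && height > 0);
 // Image size in units of 2^20 pixels, as in the original expression.
 double imgSizeMp = ((double) width * height) / (1 << 20);
 // Bigger images get a smaller allowance for the _Om() path.
 int base = imgSizeMp < 1 ? 12 : imgSizeMp < 4 ? 6 : 2;
 // Without SIMD support the allowance is tripled.
 int scale = hasSimd ? 1 : 3;
 return 3 + base * scale;
}

// True when the _Om() branch would be taken for this kernel and image size.
bool usesOm(int ksize, int width, int height, bool hasSimd) {
 return ksize <= largestOmKernel(width, height, hasSimd);
}
```

For a 640x480 image with SIMD available this gives a cutoff of 15, so under these assumptions a 5x5 median would take the _Om() path and a 17x17 median would take the _O1() path.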

In conclusion, if you need the most blazingly-fast median filtering code ever, first you need to figure out which side of the if statement you're on (O(n) or O(1)). Then you should prepare a reusable buffer for yourself using cv::copyMakeBorder(), and call medianBlur_8u_O1() or medianBlur_8u_Om() directly.
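
If it's not obvious why the working copy matters, here is a minimal 1D sketch of the same idea using plain std::vector instead of cv::Mat. It is not OpenCV's implementation: medianFilter1D() is a made-up name, the median is found naively with std::nth_element, and ksize is assumed to be odd.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Naive 1D median filter with a replicated border (illustration only).
std::vector<int> medianFilter1D(const std::vector<int>& src, int ksize) {
 assert(ksize > 0 && ksize % 2 == 1);
 int n = (int) src.size();
 int r = ksize / 2;
 if(n == 0) {
  return std::vector<int>();
 }
 // Padded working copy: the edge values are repeated r times on each side,
 // which is roughly what BORDER_REPLICATE does for an image.
 std::vector<int> padded(n + 2 * r);
 for(int i = 0; i < n + 2 * r; i++) {
  int j = std::min(std::max(i - r, 0), n - 1);
  padded[i] = src[j];
 }
 // Every output reads only from the untouched copy, never from dst,
 // so the caller could safely overwrite src with the result.
 std::vector<int> dst(n);
 std::vector<int> window(ksize);
 for(int i = 0; i < n; i++) {
  std::copy(padded.begin() + i, padded.begin() + i + ksize, window.begin());
  std::nth_element(window.begin(), window.begin() + r, window.end());
  dst[i] = window[r];
 }
 return dst;
}
```

Filtering {1, 9, 2, 8, 3} with ksize 3 gives {1, 2, 8, 3, 3}. If each output were written straight back into the input instead, later windows would read values that had already been filtered.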

Monday, March 14, 2011

Social Media Predictors

Who is winning on the internet right now?

Let's say you watch a video on YouTube. You're the 500th viewer, and later that day it explodes to 100k views. This gives you a score of 500/100000 = .005. The next day you watch a video, you're the 500th viewer, but the video never goes beyond 1k views. So your score that day is 500/1000 = .5. Your average score is (.005 + .5) / 2 = ~.25.

Let's say the person with the lowest score is winning. Unfortunately, the only institution that's really in a position to calculate this score is Google.
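
The arithmetic is simple enough to sketch. The View struct and function names are invented for illustration, and the only data is the two videos from the example above.

```cpp
#include <cassert>
#include <vector>

// One watched video: which viewer you were, and where the view count ended up.
struct View {
 double viewerRank;
 double finalViews;
};

// Lower is better: you watched early relative to the final audience.
double score(const View& view) {
 return view.viewerRank / view.finalViews;
}

// Average score over everything you watched.
double averageScore(const std::vector<View>& views) {
 assert(!views.empty());
 double total = 0;
 for(const View& view : views) {
  total += score(view);
 }
 return total / views.size();
}
```

With the two videos above, {500, 100000} and {500, 1000}, averageScore() comes out to 0.2525.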

Thursday, February 10, 2011

libfreenect, three months in

it's been three months,
we're already telling students,
"you need to threshold the depth image"
and waving around our kinect for a more complete perspective

now go get your kinect
and put it in the same spot you put it
when you first brought it home.

do you remember the feeling
of a new eye in your house?
a welcome intruder?

watch it sitting there and try to remember
the feeling that things are somehow "more 3d"
now that the computer can see it too.

when you first brought it home
was it pointing away from you?
proving itself to you,
identifying the scale of a scene
larger than itself?

it's been three months,
which direction are you pointing it now?

Wednesday, November 10, 2010

libfreenect

is it just me, or is this kind of exciting?

not exciting because it's a new "gadget",
but because it's a different kind of tool.

without the ps3eye,
the eyewriter wouldn't exist in its current form.
what should we make with kinect?
is there anything we couldn't do before?

how long until we tell students
"to detect someone,
first you need to threshold the depth image" or,
"for a full 3d map of a space,
you'll need about 4 kinects in the center of the room"

how long until the new posture is "hands forward" instead of "hands up"?
"superman" instead of "surrender"?

how long until we just wave a kinect around,
get a complete 3d map of a space
feed it into our projection mapping toolkit
and start making interesting work
instead of worrying about the mapping?

and finally, what kind of work is inevitable with 3d sensing?
how long until there is a clear 3d interaction aesthetic?
and we say "i've seen this before, i bet they did it with a kinect" ;)