Tuesday, September 05, 2006

Spamming Turing

One of the reasons I love Larry Kagan's art is for its origin: the shadows were originally a problem he sought to eradicate instead of a feature to take advantage of. What if we apply this idea to spam email? What could it possibly be good for?

If we look at the problem at the document level, there is the potential for communicating relevant information in the form of spam. But we already know this doesn't work: nothing is relevant to everyone with an email address (much less to everyone in your address book, as some people have proven to me with their cute animals/national anthem/animated GIF forwards).

On the other hand, if we look at it at a larger structural level, we see something more interesting: at least half a billion email addresses receiving spam, most of which have some sort of spam filter in place (this is a guess based on the popularity of Yahoo! Mail, Hotmail, Gmail, etc.). Content-based spam filters are, in a sense, fitness functions for the human-ness of a message. What's more, when an email gets past a spam filter, you get a real live human to decide whether it's legitimate or not.

Ignoring any ethical dilemmas, I propose a learning system that tries to "reach out" to others via email, revising its attempts based on the clicks each different email receives (of course, there would be a URL in the message). I predict it will derive a shorter version of the Nigerian email scam, or something with the same theme ("I'm in trouble and need a response").
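To make the loop concrete, here is a toy sketch of the kind of system I have in mind. It sends nothing: the "readers" are a made-up scoring rule (`simulated_click_rate`), and the vocabulary, the scoring weights, and every function name are my own assumptions for illustration. It also uses an expected click rate instead of counting noisy individual clicks, which is a simplification. The only point is the shape of the idea: generate variants, keep the ones that get clicked, mutate, repeat.

```python
import random

# Hypothetical vocabulary; a real system would have far more to work with.
VOCAB = ["hello", "help", "please", "reply", "urgent", "weather", "invoice", "friend"]

def simulated_click_rate(message):
    """Stand-in for real readers. The rule is invented, not measured data."""
    score = 0.05
    if "help" in message:
        score += 0.3
    if "reply" in message:
        score += 0.3
    score -= 0.02 * len(message)  # assumption: shorter messages do slightly better
    return max(score, 0.0)

def mutate(message, rng):
    """Return a copy of the message with one word added, dropped, or swapped."""
    words = list(message)
    choice = rng.choice(["add", "drop", "swap"])
    if choice == "add" or not words:
        words.insert(rng.randrange(len(words) + 1), rng.choice(VOCAB))
    elif choice == "drop":
        words.pop(rng.randrange(len(words)))
    else:
        words[rng.randrange(len(words))] = rng.choice(VOCAB)
    return tuple(words)

def evolve(generations=200, population_size=20, seed=0):
    """Keep the better-clicked half each generation and refill with mutants."""
    rng = random.Random(seed)
    population = [
        tuple(rng.choice(VOCAB) for _ in range(6))
        for _ in range(population_size)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=simulated_click_rate, reverse=True)
        survivors = ranked[: population_size // 2]
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=simulated_click_rate)

if __name__ == "__main__":
    best = evolve()
    print(" ".join(best), simulated_click_rate(best))
```

Because the survivors are carried over unchanged, the best score never gets worse from one generation to the next. With this invented scoring rule the pressure is toward short messages that ask for help and a reply, which is roughly the outcome I predicted above, though here it is baked into the rule rather than discovered from real people.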

3 comments:

Jason said...

Google seems to do something similar to this with Gmail among Gmail users, I think, which is why Gmail's spam filter is awesome.

Kyle said...

Gmail does collect spam characteristics from all users, which puts them in an ideal spot to pull this off. They would just let some genetic programming environment sit around all day, trying to evolve code that generates a message that can get past their spam filter. Whenever such a message was produced, they would send it to a bunch of their users, who would either read it and respond (the "from" email address would have to be faked of course) or call it "junk", providing a fitness function for the GP environment.

Jason said...

That's freaking ingenious. Like a self-instituted arms race against yourself (or, even more fun, against another department of your own company).

I need to train myself to start thinking of solving certain problems in those terms.