Tuesday, February 28, 2006

Chunking Language

When we copy something, we shift our focus from the source to our copy, back and forth, until we're done. A simple observation: the number of symbols we can take in at one glance grows with our fluency and shrinks with the information density of the text.

To transcribe Chinese, I have to study each stroke; in Russian, I manage a word at a time; in Spanish, whole phrases; and in English, entire sentences or more. The more fluent you are, the easier it is to see the effect of information density: it takes more glances to copy from an encyclopedia than from a novel.

This effect seems to appear in all sorts of languages: when I first started writing in classical staff notation, I had to count lines and study each note; over time, I could take in entire measures at a glance. Code is another interesting case: depending on how often you write in a language and how verbose its syntax is, you take in bigger or smaller "chunks" at once.

I'd expect this all comes down to working memory: we can only hold so much in it at once, and the more a pattern is already anchored in our long-term memory, the more of it fits in a single chunk (essentially because it's stored compressed, as references to things we already know). If we could show a more direct relation between familiarity with a symbol and the speed at which we copy it (or the size of our chunks), this could be developed into a technique for mining word-frequency information.
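To make that last idea concrete, here is a minimal sketch in Python of the inversion. Everything in it is hypothetical and made up for illustration (the glance log, the scoring rule): if we could record how many words a copyist takes in per glance, and chunk size really does track familiarity, then a word that consistently falls inside large chunks is probably a frequent one.

from collections import defaultdict

# Hypothetical transcription log: each entry is one glance, recorded as
# the span of words the copyist took in before looking back at the copy.
glance_log = [
    ["the", "cat", "sat", "on", "the", "mat"],  # six words in one glance
    ["perspicacious"],                          # rare word: a glance by itself
    ["the", "dog", "slept"],
    ["sesquipedalian"],
    ["on", "the", "rug"],
]

def familiarity_scores(log):
    """Score each word by the average size of the chunks it appears in.

    Words swallowed inside big chunks score high (familiar, likely frequent);
    words that force a single-word glance score low (unfamiliar, likely rare).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for chunk in log:
        for word in chunk:
            totals[word] += len(chunk)
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}

if __name__ == "__main__":
    for word, score in sorted(familiarity_scores(glance_log).items(),
                              key=lambda item: -item[1]):
        print(f"{score:4.1f}  {word}")

In practice the glance log would have to come from something like eye-tracking or keystroke timing, and the scores would need calibrating per copyist, but the inversion itself is no more complicated than this.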
