I’ve spent a while trying to figure out why computer vision remains mostly confined to research labs, despite the many thousands of people working on it and the many algorithms and codebases available for it. One analogy that occurs to me is image compression.
There are infinitely many ways to compress an image, and each one gives a different result. In principle, thousands of people around the world could each work alone on this very hard problem. But it would be better to combine the best ideas and have everyone use the result.
While codecs and computer vision seem quite different, they share an important similarity: in a computer vision pipeline, from pre-processing to feature extraction, each step produces less data than it receives. At the end of the analysis you might be left with a single fact, that this is an image of your house, which takes just a few bytes. This compression is also precisely what a codec does.
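To make the "each step produces less data" point concrete, here is a toy pipeline that records how many bytes each stage holds. It is a minimal sketch under stated assumptions: the stages (grayscale, thresholded gradient edges, a 4x4 grid of edge densities) are my own illustrative choices, the function name pipeline_sizes is hypothetical, and the final "classifier" is a placeholder rule, not a real model.

```python
import numpy as np

def pipeline_sizes(rgb):
    """Run a toy vision pipeline on an (H, W, 3) uint8 image and record bytes per stage.

    Assumes H and W are divisible by 4 (needed for the 4x4 feature grid).
    """
    sizes = {"rgb image": rgb.nbytes}

    # Stage 1: RGB -> grayscale with BT.601 luma weights (3 bytes per pixel -> 1).
    gray = (rgb.astype(np.float64) @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)
    sizes["grayscale"] = gray.nbytes

    # Stage 2: edge map from the gradient magnitude, stored as 1 bit per pixel.
    gy, gx = np.gradient(gray.astype(np.float64))
    edges = np.hypot(gx, gy) > 30.0  # threshold chosen arbitrarily for this sketch
    sizes["edge bitmap"] = np.packbits(edges).nbytes

    # Stage 3: features = fraction of edge pixels in each cell of a 4x4 grid.
    h, w = edges.shape
    cells = edges.reshape(4, h // 4, 4, w // 4).mean(axis=(1, 3))
    features = cells.astype(np.float32)
    sizes["features"] = features.nbytes

    # Stage 4: hypothetical placeholder "classifier" that maps features to a label.
    label = "house" if features.mean() > 0.1 else "other"
    sizes["label"] = len(label.encode("utf-8"))
    return sizes
```

For a 480x640 image the stage sizes are fixed by the shapes alone, whatever the pixel values: 921600, 307200, 38400, 64 and 5 bytes.

```python
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
for stage, n in pipeline_sizes(image).items():
    print(f"{stage:12s} {n:>7d} bytes")
```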
Another similarity is that decoding is much simpler than encoding. Decompressing an image is typically faster than compressing it, and encoders can get smarter over time without the decoder even noticing. Likewise, plenty of software today can generate a photo-realistic image of a house; the computer is doing the reverse of what happens in our eyes.
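The point that encoders can improve while the decoder stays the same can be shown with a toy run-length format. This is a hypothetical format made up for illustration, not a real codec: the data is stored as (value, count) pairs, and the function names are mine. One fixed decoder accepts the output of both a naive encoder and a smarter one.

```python
def decode(pairs):
    """The fixed decoder: expand (value, count) pairs back into a list of samples."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out

def encode_naive(data):
    """A simple encoder: every sample becomes its own run of length 1."""
    return [(v, 1) for v in data]

def encode_smart(data):
    """A better encoder: merge consecutive equal samples into a single run."""
    pairs = []
    for v in data:
        if pairs and pairs[-1][0] == v:
            pairs[-1] = (v, pairs[-1][1] + 1)  # extend the current run
        else:
            pairs.append((v, 1))  # start a new run
    return pairs

row = [0, 0, 0, 0, 255, 255, 0, 0]
assert decode(encode_naive(row)) == row
assert decode(encode_smart(row)) == row
print(len(encode_naive(row)), len(encode_smart(row)))  # 8 3
```

The decoder never changed, yet the smarter encoder produced less data; all of the cleverness lives on the encoding side.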
So perhaps computer vision is in the same position: thousands of people around the world each take an image and extract data from it in their own way, but the best result would come from some combination of their ideas. To be fair, the analogy doesn’t tell us how hard the problem is. Will it take the best ideas of 3 people, or of 50?
Answering that involves looking at each piece. There is plenty of good free code for image processing, which is an important piece of computer vision. Detecting lines and edges seems less settled. I suspect there are many workable ways to do it, and we should just pick a robust one and move on. [More here]
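As one example of picking a robust method and moving on, here is a minimal Sobel edge detector using only NumPy. The Sobel operator is my choice for illustration, not a claim that it is the best method, and the function names and the threshold parameter are hypothetical. Current releases of scikit-image (imported as skimage) already ship ready-made detectors such as skimage.filters.sobel and skimage.feature.canny.

```python
import numpy as np

# Sobel kernels: SOBEL_X responds to horizontal intensity changes, SOBEL_Y to vertical ones.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def filter3x3(image, kernel):
    """Apply a 3x3 kernel by correlation, padding the border by replicating edge pixels."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def sobel_edges(image, threshold):
    """Return a boolean edge map: True where the Sobel gradient magnitude exceeds threshold."""
    img = np.asarray(image, dtype=np.float64)
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    return np.hypot(gx, gy) > threshold

# A vertical step from 0 to 100 between columns 2 and 3.
step = np.zeros((5, 6))
step[:, 3:] = 100.0
print(sobel_edges(step, threshold=50).astype(int))
```

On this step image the detector marks columns 2 and 3 in every row, the two columns on either side of the intensity jump. Because the gradient magnitude is used, it does not matter that filter3x3 computes a correlation rather than a flipped convolution.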
I’ve found that the best codebase for people who want to work on computer vision is http://stefanv.github.com/scikits.image/index.html. It is written in Python, builds on SciPy, and uses a distributed version control system (DVCS).
So let’s get going.