Scene recognition and object recognition systems could work in tandem, according to a paper presented last weekend at the International Conference on Learning Representations. In fact, the researchers believe that one might reinforce the other.
The findings come after researchers created the most successful scene classification system ever, using the world’s largest machine-learning database of images labelled by scene type and a process known as deep learning. Interestingly, as the system learned how to classify scenes it also learned how to classify objects.
“Deep learning works very well, but it’s very hard to understand why it works — what is the internal representation that the network is building,” Antonio Torralba of MIT said in a press release. Torralba, an associate professor of computer science and engineering, was a senior author on new paper outlining the findings. “It could be that the representations for scenes are parts of scenes that don’t make any sense, like corners or pieces of objects. But it could be that it’s objects: To know that something is a bedroom, you need to see the bed; to know that something is a conference room, you need to see a table and chairs. That’s what we found, that the network is really finding these objects.”
Deep learning is a modern take on neural networks: a classic artificial intelligence technique that can be used to identify features from training data, but makes no assumptions about what features will look like. The data is assessed in layers, modeled after neurons in the human brain. Each layer has processing units that perform random computations on incoming data, before passing it on to the next layer. This layered structure is what guides neural nets, instead of feature assumptions. The more data a neural network is fed, the more it can adjust its efforts to produce reliable findings. Each unit in a neural network responds differently. For example, one unit might respond to a particular feature, and if that feature is present it will respond strongly. But if that feature is not present, there is a weak response or no response.
The first layers of the neural network classified geometric patterns, while the higher layers selected classes of objects.
This “suggests that even if you have some very limited labels and very limited tasks, if you train a model that is a powerful model on them, it could also be doing less limited things,” Alexei Efros, an associate professor of computer science at the University of California at Berkeley said in a press release. “This kind of emergent behavior is really neat.”