But NEIL also makes associations between these things to obtain common-sense information that people just seem to know without ever saying it - that cars are often found on roads, that buildings tend to be vertical and that ducks look sort of like geese. Based on text references, it might seem that the colour associated with sheep is black, but people - and NEIL - nevertheless know that sheep are typically white.
"Images are the best way to learn visual properties", stated Abhinav Gupta, assistant research professor in Carnegie Mellon's Robotics Institute. "Images also include a lot of common sense information about the world. People learn this by themselves and, with NEIL, we hope that computers will do so as well."
A computer cluster has been running the NEIL programme since late July and has already analysed three million images, identifying 1,500 types of objects in half a million images and 1,200 types of scenes in hundreds of thousands of images. It has connected the dots to learn 2,500 associations from thousands of instances.
The research team, including Xinlei Chen, a Ph.D. student in CMU's Language Technologies Institute, and Abhinav Shrivastava, a Ph.D. student in robotics, will present its findings on December 4 at the IEEE International Conference on Computer Vision in Sydney, Australia.
One motivation for the NEIL project is to create the world's largest visual structured knowledge base, where objects, scenes, actions, attributes and contextual relationships are labelled and catalogued.
"What we have learned in the last 5-10 years of computer vision research is that the more data you have, the better computer vision becomes", Abhinav Gupta stated.
Some projects, such as ImageNet and Visipedia, have tried to compile this structured data with human assistance. But the scale of the Internet is so vast - Facebook alone holds more than 200 billion images - that the only hope of analysing it all is to teach computers to do it largely by themselves.
Abhinav Shrivastava said NEIL can sometimes make erroneous assumptions that compound mistakes, so people need to be part of the process. A Google Image search, for instance, might convince NEIL that "pink" is just the name of a singer, rather than a colour.
"People don't always know how or what to teach computers", he observed. "But humans are good at telling computers when they are wrong."
People also tell NEIL what categories of objects, scenes, etc., to search for and analyse. But sometimes, what NEIL finds can surprise even the researchers. It can be anticipated, for instance, that a search for "apple" might return images of fruit as well as laptop computers. But Abhinav Gupta and his landlubbing team had no idea that a search for "F-18" would return images not only of a fighter jet but also of F18-class catamarans.
As its search proceeds, NEIL develops subcategories of objects - tricycles can be for kids or for adults and can be motorised, and cars come in a variety of brands and models. And it begins to notice associations - that zebras tend to be found in savannahs, for instance, and that stock trading floors are typically crowded.
NEIL is computationally intensive, the research team noted. The programme runs on two clusters of computers that include 200 processing cores.
This research is supported by the Office of Naval Research and Google Inc.
The public can now view NEIL's findings at the project website at http://www.neil-kb.com.