One of the creative methods Google has used for associating keywords with images is their Image Labeler game, which has been in “beta” for some years. As you may be aware, it takes images from their extensive repository of spidered pictures and shows each one simultaneously to two different people who opt to play the game. Each participant submits keywords describing the image, while also trying to match the keywords submitted by the other participant.
If you’ve reviewed many websites and webpages, you’ll quickly see that there are a great many cases where Google might spider some images yet have very little data to go on about what each image depicts. Ideally, webmasters place images on webpages with clear captions right below them, and also use the ALT attribute of the IMG tag to say what the image depicts. (ALT text, or “Alternative Text”, is an attribute that lets a designer supply some metadata with an image: it describes the image in words, enabling screen readers and audio browsers to speak that description for blind and vision-impaired web users, and the text can also be used by search engines.) Well-optimized sites might even use descriptive keywords in their image filenames. However, a webpage designer frequently neglects to do such things, leaving search engines to work out which keywords the images should appear for.
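To make the caption, ALT text and filename signals concrete, here is a minimal sketch in Python of how a crawler might pull the `src` and `alt` values out of a page. The sample markup, the file paths and the `collect_images` helper are all made up for illustration; this is not a description of how Google's crawler actually works.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect (src, alt) pairs for every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "img":
            attr_map = dict(attrs)
            self.images.append((attr_map.get("src"), attr_map.get("alt")))


# Hypothetical page fragment: one well-described image, one bare one.
SAMPLE_HTML = """
<img src="/images/golden-gate-bridge-sunset.jpg" alt="Golden Gate Bridge at sunset">
<p>The Golden Gate Bridge at sunset.</p>
<img src="/images/IMG_0042.jpg">
"""


def collect_images(html):
    """Return a list of (src, alt) tuples; alt is None when the attribute is missing."""
    parser = ImgCollector()
    parser.feed(html)
    return parser.images
```

The first image gives a search engine a descriptive filename, ALT text and a caption to work with. The second gives it nothing but a camera-generated filename, which is the kind of gap the Image Labeler game helps to fill.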
So, Google’s Image Labeler game is one of many methods they’re using to overcome the lack of info they encounter in crawling the web. (They also employ some more sophisticated techniques in combination with this, such as supervised multiclass labeling and optical character recognition (“OCR”).)
It recently struck me that Google could easily make use of the Image Labeler in another way as well — a sort of hidden, “off-label use” of the technology.
Since ALT text is normally not displayed on the page, it works as a way of hiding text, and it has been misused quite a bit by less-ethical SEO marketers and by people who wish to take shortcuts in promoting webpages. Search engineers have long warned marketers not to put too much text into the ALT attribute, or else it could be deemed spammy behavior.
So, one way that Google could use Image Labeler would be to occasionally check the honesty of the website designer. Images with suspicious amounts of ALT text could be analyzed via Image Labeler, and if people didn’t submit words closely related to the ALT text, the image and the site where it’s hosted could receive a low quality score, or even be penalized.
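As a rough illustration of that check, the sketch below scores how much of an image's ALT text is backed up by what players typed. Everything in it is an assumption made for the example: the function names, the simple word-overlap measure and the sample inputs. Nothing here is known to be Google's method, and a real system would presumably also handle synonyms, stemming and stop words.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def alt_label_agreement(alt_text, player_labels):
    """Fraction of distinct ALT-text words that at least one player also submitted.

    Returns 0.0 when the ALT text contains no words at all.
    """
    alt_words = tokenize(alt_text)
    if not alt_words:
        return 0.0
    label_words = set()
    for label in player_labels:
        label_words |= tokenize(label)
    return len(alt_words & label_words) / len(alt_words)
```

An honest ALT text such as "red sports car" checked against the labels "car", "red" and "road" scores two out of three. A keyword-stuffed ALT text would share only a word or two with what players actually see, so its score would fall toward zero, and a low score on an image already flagged for long ALT text is what could feed into a quality rating.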
One type of image that is frequently prone to keyword-stuffing is a site’s logo graphic. Websites that have suspiciously high amounts of text within an image that has “logo” in the filename could be submitted via Image Labeler, and if the game’s players didn’t validate the keywords associated, it could get the website dinged. Here’s one example of a Dallas plumber’s over-stuffed logo, and its ALT text:
Of course, Google has other ways of detecting keyword-stuffing in IMG ALT text. I’ve heard their image search engineers suggest that ALT text that’s longer than a certain number of words could be considered automatically suspect. Likewise, I’ve seen instances where they appear to be using OCR to automatically convert text embedded in graphics into searchable keyword text.
But I’m also sometimes seeing images with text in them presented in the Image Labeler game, so I think it’s quite possible that they are indeed employing the “off-label use” I’m describing here.
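The simpler checks mentioned above, an ALT text word limit and a closer look at files with “logo” in the name, could be sketched like this. The cutoff of 10 words is purely an assumed value for the example, since the actual number isn't known, and the function and reason names are hypothetical.

```python
import os

# Assumed cutoff for illustration only; the real threshold, if any, is not public.
MAX_ALT_WORDS = 10


def alt_word_count(alt_text):
    """Count whitespace-separated words in the ALT text."""
    return len(alt_text.split())


def review_reasons(src, alt_text, max_words=MAX_ALT_WORDS):
    """Return the list of reasons an image might be queued for human labeling."""
    reasons = []
    too_long = alt_word_count(alt_text) > max_words
    if too_long:
        reasons.append("alt_too_long")
    # A logo with a long ALT text is the classic keyword-stuffing pattern.
    if too_long and "logo" in os.path.basename(src).lower():
        reasons.append("stuffed_logo")
    return reasons
```

An image that comes back with one or more reasons would be a natural candidate to show to Image Labeler players, whose keywords could then confirm or contradict the ALT text.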