For a while now, I’ve been covering how Google’s increasing focus on quality measurements is steadily translating into actual ranking factors. Four years ago, I first conjectured that Usability could supplant SEO. Back then, we could see that Google’s human evaluators added quality ratings into the mix, affecting page rankings. Since then, Google has added helpful tools for usability testing and page speed diagnostics. This year, they’ve continued this progression by incorporating page speed as a ranking factor, and the recent “Mayday Update” apparently shifted some ranking weight from keyword relevancy to quality criteria.
Considering Google’s desire to quantify and assess elements of quality in webpages, what else might they attempt to measure algorithmically and base rankings upon?
One area that occurs to me is testing the body text of pages, particularly the main text of articles and blog posts.
Spammers frequently use programs to automatically assemble snippets of text in order to target many combinations of keyword phrases without hand-writing every page. Some spammers merely steal others’ content, screen-scraping pages and redisplaying them on their own sites. The more savvy ones know that search engines seek to detect duplicate content and try to credit the originating sites as authoritative for matching keyword phrases. Recognizing that purely identical text can get filtered by duplication detection, these spammers may automatically insert random words throughout the text, resulting in weird sentences and nonsensical writing.
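As an aside, this is part of why random word insertion only partially defeats duplicate detection. Here’s a minimal sketch of shingle-based similarity scoring, an approach from the research literature (w-shingling); it isn’t necessarily what any search engine actually runs, and the sample texts are invented:

```python
# Minimal sketch of shingle-based near-duplicate detection.
# Illustrative only; production systems are far more elaborate.

def shingles(text, w=4):
    """Return the set of all w-word windows in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

original = "the quick brown fox jumps over the lazy dog near the quiet river bank"
# Same text with random words inserted, as a spammer might do:
spun = "the quick zesty brown fox jumps over the lazy dog near frog the quiet river bank"

score = jaccard(shingles(original), shingles(spun))
print(f"similarity: {score:.2f}")  # still well above zero despite the insertions
```

The inserted words break only the shingles that overlap them; long untouched runs still match, so the copied text remains detectable.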
Less dramatically, marketers who want to rapidly develop thousands of pages of content sometimes resort to copywriting companies that outsource article assignments to third-world countries. Poorly trained writers produce terrible grammar and bad spelling. And foreign companies sometimes hire bad translators to convert their pages for English readers. (Such bad writing can be entertaining; check out Engrish for samples that will make you grin.)
Spammy sites and pages with poorly-written articles would be deemed low-quality by most consumers. Most of us don’t want to end up on such sites, and Google doesn’t want to lead us there. So, if I were them, I’d try to find ways to detect such poor content.
But, would it be possible for Google to detect bad writing?
I think the answer to that is a resounding “yes”!
Have you noticed how many different software packages offer spellcheck functionality? And, software such as Microsoft Word can assess grammar in documents as well as word/phrase variety and reading level.
So, it would be possible for Google to detect bad writing. I’m not sure how computationally intensive this sort of functionality would be for them, but it could easily be incorporated as a subsidiary process that operates after many of the more rapid ranking systems have already assessed a webpage. A grammar/spelling grading process could trickle in ratings over time.
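As a rough thought experiment, such a grading pass could be as simple as measuring what fraction of a page’s words fall outside a known dictionary. The tiny word list and scoring below are purely illustrative; nothing here is based on anything Google has disclosed:

```python
import re

# Toy stand-in for a real dictionary; an actual system would load a
# large word list (and handle names, slang, neologisms, and so on).
DICTIONARY = {"the", "internet", "is", "full", "of", "badly", "written",
              "pages", "and", "search", "engines", "could", "score", "them"}

def misspelling_rate(text):
    """Fraction of words not found in the dictionary."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in DICTIONARY)
    return unknown / len(words)

page = "The internet is fulll of badly writen pages"
print(f"misspelling rate: {misspelling_rate(page):.0%}")
# A ranking pipeline might flag pages above some chosen threshold.
```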
Is Google using such a process?
For now, this is unclear. It would be hard to isolate the effects of such a process, since their human quality evaluators might also negatively rate a page for some of the same reasons. A few different search engine patents relating to the display of search snippets, the assessment of reviews, and the ranking of news stories mention the possibility of utilizing grammar and spelling in ranking processes. This sort of text analysis appears simple enough that it seems a no-brainer that Google would use such a system.
For spammers, I’d say that the bar is moving higher. Ignoring quality for the sake of a fast buck is getting more difficult. The days of merely sprinkling a keyword phrase randomly among paragraphs of words are probably coming to a close.
For marketers, the implications of text quality assessment should be clear. Cheaply written content ought to be avoided for sustainable, long-term benefit. And, if you wish to improve your rankings, have a professional writer with good English skills look over all of your content and correct any errors they find.
You don’t even need proof that this could be used as a quality ranking factor for it to make sense to clean up the grammar and spelling on your site. Poor grammar and spelling can make a bad impression on consumers, resulting in loss of trust and lower conversion rates. Improving your site’s user experience provides both short- and long-term benefits.
Interesting theory. I know that they’re definitely doing something in this regard, probably more than just “duplicate content” detection.
I guess now we all need to incorporate a grammar checker into our daily routine?
I think, at baseline, a spellcheck would be in order. After that, there are a number of possible text analyzers. Grammar validation would be a good idea, but it could also be good to check for sufficient variety in the writing, and maybe even the reading level of a document, although I don’t know what an ideal reading level would be for the internet; it’s likely somewhat variable depending upon the industry/content.
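For anyone curious, here’s a rough sketch of what those last two checks might look like: the standard Flesch Reading Ease formula plus a simple measure of word variety. The syllable counter is a crude heuristic, and none of this reflects any known Google implementation:

```python
import re

def count_syllables(word):
    """Rough syllable estimate: runs of vowels, ignoring a trailing 'e'."""
    word = word.lower().rstrip("e") or "e"
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier reading."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[a-zA-Z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    n = len(words)
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

def type_token_ratio(text):
    """Vocabulary variety: distinct words divided by total words."""
    words = [w.lower() for w in re.findall(r"[a-zA-Z']+", text)]
    return len(set(words)) / len(words) if words else 0.0

sample = "Search engines could score writing quality. Clear writing helps readers."
print(f"reading ease: {flesch_reading_ease(sample):.1f}")
print(f"variety: {type_token_ratio(sample):.2f}")
```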
I’m all for this if they can do it properly… thinking about how to code something like that makes my head hurt, though. I wonder if they’re also going to base some factors on length, and perhaps give even more weight to the age of indexed articles.
Ryan, Google holds some existing patents that specifically cite text length as important. I’ve seen this mainly in the area of assessing the importance, authority, and trust of online reviews that people write about businesses, but it would make sense for an assessment of text length to be incorporated in some way. I think the relative amount of text on a page, and its arrangement, could algorithmically define whether a document is an article or not.
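As a toy illustration of that last idea, one could compare the volume of visible text against the volume of markup. The word-count and ratio thresholds below are invented purely for the example:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page."""
    def __init__(self):
        super().__init__()
        self.chunks = []
    def handle_data(self, data):
        self.chunks.append(data)

def looks_like_article(html, min_words=250, min_text_ratio=0.25):
    """Crude check: enough body text, with text dominating the markup."""
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks)
    word_count = len(text.split())
    text_ratio = len(text) / max(1, len(html))
    return word_count >= min_words and text_ratio >= min_text_ratio

page = "<html><body><p>" + "word " * 300 + "</p></body></html>"
print(looks_like_article(page))  # True
```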
OK, you mean those word-salad keyword dumps that try to appear as content generated by real users? Or are you talking about grads from 7-11 university who majored in a new language known as apu?
Google could try to filter out some of the MFA spam. They would “MAKE FOR BERRY BERRY HAPPY” users.
Thank you for the article. With Caffeine live right now, I would think Google would be too busy to evaluate each post or manually audit the billions of pages that go live each day, but there are a good few factors that could determine whether content is written for users or not.
Google denies using analytics data in rankings, but I have a hard time believing them, as I suspect they use bounce rates and similar factors to see whether users like the content. If users don’t spend much time reading a blog post, that could indicate a low-quality post.
But then again, one could argue that the post was so good it converted the user into a customer, or that it was to the point and easy to read.
It seems like there would be some algorithmic ways to test for bad writing, per your post, but I think there are a lot of signals that could point to a block of text being put on a page specifically for keyword spamming. It would be interesting to see how the manual reviewers react to seeing these elements on a page, since sometimes it’s unclear whether the publisher is truly keyword spamming or is just a poor UI designer.
The spinning software out there is getting pretty damn sophisticated too. I’ve heard of some internet marketers being able to generate spun text that actually sounds real and would pass a human review.
Hi Chris,
I’d bake a cake the day it was proved Google was using writing quality as a ranking factor. From an aesthetic viewpoint, you would think this could be seen as the MOST important factor for a document. I have always thought that Google used inbound links as a sort of substitute for this, figuring that if people link to a document, it must be good. But it would be so much more satisfying if there were a direct evaluation of the copy itself. That being said… most spellcheckers still don’t know that words like ‘blog’ and ‘Google’ are real, so we have a ways to go with those.
Really nice article!
I agree with Dev Basu: some of the text-spinning software is coming of age and becoming more sophisticated at crafting meaningful sentences.
There is no way this can be true. If I decided to open up a blog, but I’m not an “English scholar,” as my grade 10 teacher once said, Google will punish me? Even though I may be putting out some great content that users love? Give me a break; they will not do this.
There are probably some easily-detectable hallmarks of auto-generated text that differentiate it from “non-English scholar” writing.
However, I’d still say that many users find frequent grammar and spelling mistakes irritating.
If, as you say, it’s badly written yet good content, Google would probably let other signals inform their algorithm and give it more weight despite the writing mistakes. For instance, if the blog achieved lots of links and mentions and such.
I wasn’t at all saying this is the be-all and end-all of Google ranking factors. Yet it could be one of the hundreds they are employing.