Monday, July 16, 2007

Text Readability and numerical methods of analysis

I accidentally stumbled upon the concept of text readability while searching for some books on Amazon. They have a feature that shows you how easy a book is to read, based on automated indexing and analysis methods. I did a little research and came up with this collection of links:

SMOG (Simple Measure Of Gobbledygook) estimates the years of education needed to understand a text. As input data it uses the number of polysyllables (words with 3 or more syllables) and the number of sentences. Note: if your text needs a PhD to read, that doesn't mean it's smart, only that it is difficult.
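To make the SMOG inputs concrete, here is a minimal sketch of the commonly published formula. The function name smog_grade is my own, and it assumes you have already counted the polysyllables and sentences yourself (counting syllables reliably is the hard part and is not shown).

```python
import math


def smog_grade(polysyllables: int, sentences: int) -> float:
    """Estimate the SMOG grade from pre-counted inputs.

    Assumption: the caller supplies the counts; this hypothetical
    helper only applies the published formula.
    """
    if sentences <= 0:
        raise ValueError("sentences must be positive")
    # The polysyllable count is normalised to a 30-sentence sample.
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291
```

For example, smog_grade(16, 30) works out to about 7.3, i.e. roughly a seventh-grade text.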

Flesch-Kincaid Readability Tests - the Reading Ease and Grade Level tests. They both use as input values the number of words, sentences and syllables.
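Both Flesch-Kincaid scores come from the same two ratios, words per sentence and syllables per word. A small sketch, assuming the counts are already available (the function names are made up for illustration):

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

Note that the two scores move in opposite directions: longer sentences and longer words lower the Reading Ease and raise the Grade Level.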

Automated Readability Index - also tries to determine the years of US education needed to understand a text. It uses as input values characters/word and words/sentence.
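Since the ARI only needs character, word and sentence counts, it is the easiest one to compute by machine. A rough sketch, with a hypothetical function name and the counts supplied by the caller:

```python
def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    """ARI from pre-counted characters (letters and digits), words and sentences."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
```

With 500 characters, 100 words and 5 sentences this gives a score of about 12.1.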

Fry Readability Formula - a graphical method of determining the education level needed to understand a text. It takes the average number of sentences and syllables per hundred words and plots those two values on a graph.

Gunning fog index - another estimate of the years of formal education needed to understand a text. It uses words/sentence and the share of complex words among the total words. A complex word is the same thing as a polysyllable, only with a higher readability index :)
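The fog index boils down to one line once the counts are known. A minimal sketch, assuming you have already decided which words count as complex (the function name is mine):

```python
def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning fog index from pre-counted inputs.

    Assumption: complex_words is the number of words with three or
    more syllables, counted by the caller.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    average_sentence_length = words / sentences
    percent_complex = 100 * complex_words / words
    return 0.4 * (average_sentence_length + percent_complex)
```

A 100-word sample with 5 sentences and 10 complex words scores about 12.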

Raygor Readability Estimate - another graph-based method that looks very similar to the Fry.

Coleman-Liau Index - like the ARI and unlike the others, it relies on characters rather than syllables to compute readability. It uses the total number of characters, words and sentences.
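The published Coleman-Liau formula works from letters and sentences per 100 words, so no syllable counting is needed. A small sketch under that assumption, with a hypothetical function name:

```python
def coleman_liau(letters: int, words: int, sentences: int) -> float:
    """Coleman-Liau index from pre-counted letters, words and sentences."""
    if words <= 0:
        raise ValueError("words must be positive")
    letters_per_100_words = letters / words * 100
    sentences_per_100_words = sentences / words * 100
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8
```

For 500 letters, 100 words and 5 sentences the result is roughly 12.1.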

Linsear Write - Uses number of simple and complex words and the number of sentences.
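Linsear Write is more of a scoring procedure than a formula. The sketch below follows the rule as it is usually described for a 100-word sample: simple words (two syllables or fewer) score 1, complex words (three or more) score 3. Treat it as an illustration, not a reference implementation; the function name is made up.

```python
def linsear_write(simple_words: int, complex_words: int, sentences: int) -> float:
    """Linsear Write grade for a sample, from pre-counted inputs.

    Assumption: the counts come from a sample of about 100 words,
    classified by the caller.
    """
    if sentences <= 0:
        raise ValueError("sentences must be positive")
    provisional = (simple_words * 1 + complex_words * 3) / sentences
    # The usual rule halves the provisional score, subtracting 2
    # first when it is 20 or below.
    if provisional > 20:
        return provisional / 2
    return (provisional - 2) / 2
```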

Zipf's law - an empirical law (based on observation rather than derived theoretically). It states that the frequency of any word in a natural language text is roughly inversely proportional to its rank in the frequency table.
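If you want to check Zipf's law on your own text, a frequency table is all you need. This is a rough sketch: the tokenizer is a deliberately naive regex of my choosing, and rank_frequency is a hypothetical name.

```python
import re
from collections import Counter


def rank_frequency(text: str) -> list[tuple[int, str, int]]:
    """Return (rank, word, frequency) rows, most frequent word first."""
    # Naive tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


if __name__ == "__main__":
    for rank, word, freq in rank_frequency("the cat and the dog and the bird"):
        # Under Zipf's law, rank * freq stays roughly constant on large texts.
        print(rank, word, freq, rank * freq)
```

On a text this short the pattern will not show, of course; you need a reasonably large sample before rank times frequency starts to level out.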

But how does that help me?!

Well, there are online tools that do the work for you:
Tests Document Readability And Improve It
Lingua::EN::Fathom Perl CGI
EULA Analyser
Style and Diction
Reproducible Fry Graphs
Readability Studio

This text for example has the following stats:
Gunning Fog index : 12.93
Coleman Liau index : 11.25
Flesch-Kincaid Grade level : 11.39
ARI (Automated Readability Index) : 10.21
SMOG : 12.72
Flesch Reading Ease : 44.16

Which means that if you didn't finish high-school, you're pretty much screwed :)