Spell Checkers and Tree Traversal

TLDR:

Spelling sucks. Tree Traversal is used in search algorithms (like spell checkers). If you use VSCode try Code Spell Checker.

My original plan for this post was to discuss how spell checkers work and maybe even show snippets from my new favorite tool, Code Spell Checker for VSCode. As I began researching, reality set in that this topic covers far more than is reasonable for a short blog post and life with a toddler and a full-time software engineering Bootcamp will not allow the weeks or months of time to begin to explore the field. That being said, I have learned a few things that are worth sharing.

Before we get to that, some background. I can’t begin to explain how much I hate spelling. It's not that I don’t appreciate how important it is, I know it would be nearly impossible to communicate without strict adherence to a standard set of rules. As for why I hate spelling, I'm dyslexic, and somewhere between moderately to severely so. I have adapted and handle it pretty well so that I really don’t notice it too much anymore. Sometimes I know I can’t spell a word, but for the most part, I have learned to spell (well close enough) but the process is one of bruit memorization, I write a word enough that I finally learn it. Shortcuts, phonetic spelling, etc. just don’t work for me.

Now, much of the above was a lie. I can only say that I “handle it” because of the inclusion of spell checkers in almost every application. Without a spellchecker, I would still be a disaster. Even with them, I had a boss in all seriousness threaten to make me take spelling classes before. I avoided the topic and him for a week or two and never brought it up again (I know you occasionally read these, but you can’t make me anymore :-P ). Now, a good spell checker is rarely enough to get me through, for example “dyslexic”, I literally have no idea how to spell that. Modern technology comes to the rescue again, a quick “Hey Siri” or “Alexa, how do you spell….” and I'm off to the races again. As an aside, Rudolf Berlin, the man who coined the term dyslexia, was a sadist in my opinion, that word is impossible.

P.S. the engineers at Grammarly. You are my heroes.

Now, my blog is focused on software engineering, so why do I bring this up? Well, as it turns out spelling is EXTREMELY important, more so than anything else I have done. Hmans are surprisigly guud at interperting writen laguage, eevn wen it is eror ridled, so you can get away with the occasional error in a report or email. Computers on the other hand can’t, they do exactly what we tell them to. So, something as simple as navigating to a web page would fall to pieces if you misspelled the URL www.linkeddin.com. More insidious would be misspelling method names or file names in a highly collaborative environment. Even if you are consistent, as soon as someone else tries to expand on your code, it all falls apart because they call .really_cool_method but the program only has the .realy_cool_method you wrote.

Ok, that is why I am mildly obsessed with spell checkers (and will be resisting this topic as I can learn more), but how do they work. Well, in the very highest of levels it's five simple steps.

  1. Read text on-screen
  2. Search the dictionary for matches on each word.
  3. If the word is not in the dictionary, find close matches.
  4. Highlight the offending word and allow the user to select a replacement.
  5. Replace the offending word with the selected.

Wow, simple enough (ha). There is so much to cover here we are only sratch the begninngin and discuss the first two steps.

Step 1. Read text on-screen:

For web applications, this is very straightforward, JavaScript has many ways of grabbing text as the user is inputting them. For example event listeners looking for when you press the space bar.

document.addEventListener(“keydown”, function(e) {
if (e.key === “Spacebar”) {
// Word find and parse function
// Spell check function
}
});

Yes, I'm glossing over the actions of actually grabbing the text, split()ing it into an array, and then running the individual words through your spell checker, but that is not too difficult (and from what I can tell, not how professional applications actually do it anyway).

Step 2. Search the dictionary for a match:

This can be very simple, simply iterate through each word in your source dictionary and compare it.

dictonary = #array of words
searchWord = #word you are checking
spelled_correclty = false
for each item in dictonary
if word == item
spelled_correctly = true
end
end
if !sesplled_crrectly
#Highlight word and run function to find recomendations
end

But unless you have a true beast of a computer, this type of search is going to be painful, to say the least. You can spread this up by implementing different search algorithms (like binary search) but the fastest ways involve both the search method and the search data structure.

Tree traversal search method involves first having a structured data set to look through. The search Tree (or Trie) is a character by character map of the possible combinations that result in words. It also turns out that this structured data format is heavily used in many areas of computer science to help with search algorithms, not just in spell checkers.

Now, these trees can get very complicated in the effort to improve both search efficiency and storage efficiency but they still work off of the same parent child principle.

Needless to say, since data trees can be complicated and are used in many different areas, there are many different ways to search through them, all with the advantages and disadvantages.

At a high level there are two main approaches Depth First and Bredth First. A debt first search progressively moves down each branch before moving to the next. Below is a simple debth first search that checkes each node value (letter in your word dictionary tree) against the character value in the string. If it matches, you grab the next child node, if not you move on to the next character. This is in essence the method that spell checkers use (though deployed applications are far more optimized).

Bredth first is simular in concept, but instead of diving to the bottom of a tree, it checks everything at the same level before going deeper. Its advantage is completeness where every branch is checked before moving on. This has applications in garbage collection (memory management), path finding and many other areas.

So, as I mentioned above, this is a very brief introduction to spell checkers and tree traversal. Later we will go over how we determine what we do when a word is not in the dictonary. In the meantime take a moment to recognize the ammazing work that has gone into creating these tools.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store