Scripts that help me manage my websites

Extracting text for spellchecks, counting words and validating the library

As a writer, I like to make sure my documentation is grammatically correct and free of spelling mistakes. This is difficult enough when using a normal word processor, but when you embed your prose as code comments, where English gets mixed up with assembly code, very few automated tools are going to help. I use a text editor to create my sites, and running a text editor's spellchecker on source code is not on the optimal path to happiness.

To help with this issue, I've written a suite of scripts that help me improve the quality of my non-code content. They can be found in the code-analysis folder of the bbcelite-scripts repository, and they break down into four groups:

  • Text extraction, which lets me extract the non-code text from the source repositories and web content, so I can check the spelling and grammar using a word processor
  • Word counts, so I can see just how much I'm writing
  • Code images, which provide a fascinating view into the structure of the game binaries
  • Validation, which can check the underlying repository logic that's used to generate my sites

These scripts use Perl and Python, and they are tailored to my repository structure. If you've managed to follow the installation instructions in the bbcelite-scripts repository, then you can run these scripts locally to see what they do. They're not particularly fancy, but they enable me to do things that would otherwise be impossible.

Here are a few choice highlights from the code image and word count tools, just for fun.

Binary code as an image
-----------------------

The smallest version of Elite is, not surprisingly, the Electron version. To see just how small the Electron version is, we can convert the main game binary into an image, with one byte per pixel and each byte's value shown in greyscale: 0 is black, 255 is white, and intermediate values are shades of grey. The result is a 156-pixel square, like this:

The game binary for Acorn Electron Elite as an image
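
To make the byte-to-pixel mapping concrete, here is a minimal Python sketch of the idea. It is not the script from the repository: the function names are hypothetical, the choice of the PGM image format is mine (it needs nothing beyond the standard library), and padding the unused pixels at the end of the square with black is an assumption.

```python
import math
from pathlib import Path
from typing import Tuple


def bytes_to_square_pgm(data: bytes) -> Tuple[int, bytes]:
    """Return (side, pgm_bytes) for a square greyscale image of data."""
    # Find the smallest square that holds every byte at one byte per pixel
    side = math.isqrt(len(data))
    if side * side < len(data):
        side += 1

    # Assumption: pad any unused pixels at the end of the square with black
    pixels = data.ljust(side * side, b"\x00")

    # Binary PGM (P5) with a maximum value of 255, so a byte value of 0 is
    # shown as black and 255 is shown as white
    header = f"P5\n{side} {side}\n255\n".encode("ascii")
    return side, header + pixels


def write_square_image(binary_path: str, image_path: str) -> int:
    """Convert the binary file into a PGM image and return the side length."""
    side, pgm = bytes_to_square_pgm(Path(binary_path).read_bytes())
    Path(image_path).write_bytes(pgm)
    return side
```

Calling write_square_image with the path to a game binary and an output path ending in .pgm writes the image and returns the side of the square in pixels. The side is just the square root of the file size rounded up, which is why a bigger binary gives a bigger square.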

There are two candidates for the largest version. Elite-A has both a standard and second processor version, as well as 23 ship files, and if we put these all together, we get a 405-pixel square:

The game binary for Elite-A as an image

This is probably a bit unfair, as the NES version has the largest single-platform binary with unrepeated code, giving us a 363-pixel square in which you can clearly see the division into ROM banks:

The game binary for NES Elite as an image

As for the other games I've analysed, they are pretty small compared to the larger Elite versions. Aviator is the smallest of all, with a 141-pixel square:

The game binary for Aviator as an image

Then comes Revs with a 185-pixel square (and that's including all five of the Acornsoft track files):

The game binary for Revs as an image

And finally Lander, with its four-byte instructions, has a 199-pixel square:

The game binary for Lander as an image

You can see more images in the code-images folder in the repository.

Word counts
-----------

Another interesting tool is the word count script. This tool extracts the commentary and deep dive text from the repositories and website, but instead of saving it to a text file so it can be checked, the script counts the words.
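
As a rough illustration of the approach, here is a minimal Python sketch that counts the words in the comments of an assembly source file. It is not the script from the repository: the function name is hypothetical, the backslash comment marker is an assumption about the source format, and treating a word as a run of letters, digits and apostrophes is my own simplification.

```python
import re


def count_comment_words(source: str, marker: str = "\\") -> int:
    """Count the words that appear after the comment marker on each line."""
    total = 0
    for line in source.splitlines():
        # Everything after the first marker on the line is treated as
        # commentary (assumption: the marker never appears inside a string)
        _, found, comment = line.partition(marker)
        if found:
            # A word is a run of letters, digits and apostrophes, so
            # decorative lines made of asterisks or dashes count as zero
            total += len(re.findall(r"[A-Za-z0-9']+", comment))
    return total
```

Lines without a comment marker contribute nothing, so the code itself is ignored and only the commentary is counted.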

Here are some highlights from the script:

  • At the time of writing (summer 2024), the deep dives across all four of my disassembly sites total 373,303 words. In comparison, Dostoyevsky's hefty tome Crime and Punishment is only 211,591 words.
  • The NES Elite source code contains 400,109 words of commentary (this is ignoring the code - it's just the commentary). The Lord of the Rings contains 455,125 words, which is not a great deal bigger.
  • If you add up the unique commentary across my four projects (so that's the commentary from NES Elite, Aviator, Revs and Lander), then it comes to 856,398 words. This is more than the complete set of Shakespeare's plays, which come to just 835,997 words.
  • The total amount of commentary across all my projects is 3,176,752 words. If you add up all the words in The Lord of the Rings, War and Peace, Crime and Punishment and the King James Bible, you only get 2,037,140 words.
  • The total amount of commentary and deep dive content across all my projects is 3,550,055 words. If you add up the four tomes from the last bullet point, and add in the full set of Harry Potter novels, you get 3,121,765 words, which is quite a bit less.

I like to write, I guess.

You can see more results and analysis in the word-count-code-comments and word-count-deep-dives folders in the repository.