I Love Finite State Machines!
Friday, 16th August 2019
I learned about Finite State Machines over twenty years ago and was captivated by them. I've used them in all kinds of projects, from game logic to more serious things. I used them to control the enemies in Ossuary, my simple ZX Spectrum roguelike, and for simple tokenisation in projects like BOOPL and Tiny BASIC.
The idea, if you've not used them before, is that at any time you're in a particular state. Depending on what your next input is, you might stay in the same state, or move to another state. Each state has its own rules about what to do with the input and what other states it might change to.
So in Ossuary, for example, a monster might be idle, or aimless wandering, or chasing the player. In each of these states its input is the player position and other actions; if the monster is idle or wandering, a player attacking or moving too close will cause it to switch to chasing the player; if chasing, a player getting too far ahead may cause the monster to switch to aimless wandering.
In Tiny BASIC I've used a FSM in the tokeniser to switch between states like reading a number, reading a keyword, reading a multi-character operator like <>, and the default state of not quite knowing what's coming next. There are other methods for tokenising, such as holding an array of regular expressions which is gradually reduced until only one matches. But for a simple language like Tiny BASIC the FSM approach is more readable and easier to understand.
I did get a bit carried away and started using a FSM for parsing, too. But this is where the FSM hits its limits. To contrast with simple tokenising, in parsing you need to remember your context. It's not enough to know that you're parsing an expression; to include that expression in its proper place in the parse tree you need to know that the expression is part of, for example, a PRINT or a LET statement. At least that's the impression I'm getting from my attempts to develop my interpreter.
So after writing a partial parser that got as far as interpreting and verifying line numbers, I had to scrap much of what I'd done on the parser and replace it with a hierarchical approach. Now the function that parses a statement will call directly the functions that parse the different types of statement, which in turn will call the functions that parse their consituent parts like the strings and expressions.
The parser is back to being able to identify and validate line numbers. Hopefully soon, it'll start to recognise actual statements and put them in their proper place in the parse tree.