Damian Walker

Personal Web Pages

Tiny BASIC: The Great Refactoring

Sunday, 18th August 2019

With further work on Tiny BASIC I managed to get the parser to recognise simple LET statements in the form:

LET X=5

It doesn't properly understand expressions yet, so as far as the parser is concerned, an expression for now is just a number. Despite this, I did implement the logic that an expression is split into terms, which are split into factors, as fits the original specification. And at this point, I noticed that the parser was getting a bit cumbersome.
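As a rough illustration of that layering, here is a minimal sketch in C. It is not the project's actual code: every type and function name below (`Factor`, `Term`, `Expression`, `LetStatement`, `parse_let` and so on) is hypothetical, and it reads straight from a string rather than from tokeniser output. The expression, term and factor levels all exist, but each simply delegates downwards until a factor is read as a plain number.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parse-tree nodes; the names are illustrative only. */
typedef struct { int value; } Factor;      /* for now, a factor is just a number */
typedef struct { Factor factor; } Term;    /* room to grow: factor ((*|/) factor)* */
typedef struct { Term term; } Expression;  /* room to grow: term ((+|-) term)* */
typedef struct { char variable; Expression expression; } LetStatement;

static const char *skip_spaces(const char *p) {
    while (*p == ' ') p++;
    return p;
}

/* Each parse_* function returns 1 on success and 0 on failure,
 * advancing *p past whatever it consumed. */
static int parse_factor(const char **p, Factor *out) {
    const char *s = skip_spaces(*p);
    char *end;
    if (!isdigit((unsigned char)*s)) return 0;
    out->value = (int)strtol(s, &end, 10);
    *p = end;
    return 1;
}

static int parse_term(const char **p, Term *out) {
    return parse_factor(p, &out->factor);
}

static int parse_expression(const char **p, Expression *out) {
    return parse_term(p, &out->term);
}

/* Recognises LET <variable>=<expression>, where the variable is one capital letter. */
int parse_let(const char *line, LetStatement *out) {
    const char *p;
    assert(line != NULL && out != NULL);
    p = skip_spaces(line);
    if (strncmp(p, "LET", 3) != 0) return 0;
    p = skip_spaces(p + 3);
    if (!isupper((unsigned char)*p)) return 0;
    out->variable = *p++;
    p = skip_spaces(p);
    if (*p++ != '=') return 0;
    if (!parse_expression(&p, &out->expression)) return 0;
    return *skip_spaces(p) == '\0'; /* nothing may follow the expression */
}
```

With this sketch, parse_let("LET X=5", &statement) succeeds and stores 5 in statement.expression.term.factor.value, while something like "LET X=5+" is rejected because the operators haven't been written yet.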

By "cumbersome" I mean that it's approaching 500 lines, and barely does a fraction of what it needs to do when complete. The logic for the actual parsing is mixed together with the logic of handling the data structures that make up the parse tree, and it's getting difficult to search-by-sight for a particular function.

I envisage that expressions will be the most complex part of the language, as most of the statements have simple components, or lists, and the occasional noise word. So now's the time to do some refactoring, before I proceed to try to parse complete expressions. Not only will I refactor the parser, but for consistency I'll refactor the tokeniser too.

For the tokeniser, the plan is to separate all the code that deals with tokens from the act of tokenising. So one module, for the tokens, will act a bit like an OOP class, having a constructor and a destructor, and some set/get functions. Since the token structure isn't private, anything can be set or read from outside the module, but since C requires careful handling of dynamically-allocated memory, the dynamic parts of a token (like its textual content) at least benefit from helpful set functions. Most of this work I've already done.
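A token module along these lines might look something like the following. This is only a sketch under assumptions: the structure fields, the `TokenClass` values and the function names (`token_create`, `token_set_content`, `token_get_content`, `token_destroy`) are all invented for illustration and are not taken from the real source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token classes; the real tokeniser may use different ones. */
typedef enum { TOKEN_NONE, TOKEN_WORD, TOKEN_NUMBER, TOKEN_SYMBOL } TokenClass;

/* The structure is public, so simple fields can be set directly. */
typedef struct {
    TokenClass token_class;
    int line;
    char *content; /* dynamically allocated, owned by the token */
} Token;

/* Constructor: returns an empty token, or NULL if allocation fails. */
Token *token_create(void) {
    Token *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->token_class = TOKEN_NONE;
    t->line = 0;
    t->content = NULL;
    return t;
}

/* Setter for the dynamic part: copies the text and frees any old copy.
 * Returns 1 on success, 0 if memory could not be allocated. */
int token_set_content(Token *t, const char *text) {
    char *copy;
    assert(t != NULL && text != NULL);
    copy = malloc(strlen(text) + 1);
    if (copy == NULL) return 0;
    strcpy(copy, text);
    free(t->content);
    t->content = copy;
    return 1;
}

/* Getter: never returns NULL, so callers can print the result safely. */
const char *token_get_content(const Token *t) {
    assert(t != NULL);
    return t->content != NULL ? t->content : "";
}

/* Destructor: releases the content and then the token itself. */
void token_destroy(Token *t) {
    if (t == NULL) return;
    free(t->content);
    free(t);
}
```

The simple fields can still be assigned directly (t->line = 10, say), while the setter keeps the allocation, copying and freeing of the text in one place.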

The parser will be more complicated. There are multiple data structures: for statements in general, for particular types of statement, for expressions. These will probably all want their own modules. There are also parse errors; I'll be separating these out into an error handling module, which can also be used by the tokeniser and by the runtime module. Eventually, only the actual parsing logic will be left in the parser.
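To give an idea of what a shared error handling module could look like, here is one possible shape. All of it is assumed for the sake of illustration: the error codes, the `ErrorState` structure, the message wording and the policy of keeping only the first error are guesses, not the design that ends up in Tiny BASIC.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical error codes covering the tokeniser, parser and runtime. */
typedef enum {
    ERR_NONE,
    ERR_BAD_TOKEN,           /* tokeniser */
    ERR_EXPECTED_EXPRESSION, /* parser */
    ERR_DIVIDE_BY_ZERO       /* runtime */
} ErrorCode;

typedef struct {
    ErrorCode code;
    int line; /* source line on which the error was detected */
} ErrorState;

void error_init(ErrorState *e) {
    assert(e != NULL);
    e->code = ERR_NONE;
    e->line = 0;
}

/* Records an error, keeping the first one if several are reported. */
void error_set(ErrorState *e, ErrorCode code, int line) {
    assert(e != NULL);
    if (e->code == ERR_NONE) {
        e->code = code;
        e->line = line;
    }
}

const char *error_message(ErrorCode code) {
    switch (code) {
    case ERR_NONE:                return "No error";
    case ERR_BAD_TOKEN:           return "Unrecognised token";
    case ERR_EXPECTED_EXPRESSION: return "Expected expression";
    case ERR_DIVIDE_BY_ZERO:      return "Divide by zero";
    }
    return "Unknown error";
}

/* Writes "Line N: message" into buf; returns what snprintf returns. */
int error_format(const ErrorState *e, char *buf, size_t size) {
    assert(e != NULL && buf != NULL);
    return snprintf(buf, size, "Line %d: %s", e->line, error_message(e->code));
}
```

The tokeniser, parser and runtime would each call error_set() on the same ErrorState, leaving the wording of messages to the one module.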

What this means is that I'll be doing quite a lot of work over a period of days, while making no progress on functionality. But it'll probably give me a burst of speed when it's done, as I'll be working with cleaner source code. Until I mess it up again.