The Language System

Building the Lexer From Scratch

The lexer's job is simple: read characters and group them into meaningful chunks called tokens. But let's actually build it step by step.

What It Does

You give it text, it gives you tokens. That's it.

For example, shape circle c1 { radius: 50 } becomes:

SHAPE token (the keyword "shape")
IDENTIFIER token with value "circle"
IDENTIFIER token with value "c1"
LBRACE token (the "{")
IDENTIFIER token with value "radius"
COLON token
NUMBER token with value 50
RBRACE token (the "}")
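
The code in this section builds tokens with new Token(type, value, line, column), but the Token class itself is not shown. A minimal sketch, assuming it is a plain data holder with those four fields (the field names are an assumption) and that each token records the position of its first character:

```javascript
// Hypothetical Token class: the field names are assumed from the
// constructor calls used throughout this section.
class Token {
    constructor(type, value, line, column) {
        this.type = type;      // e.g. 'SHAPE', 'IDENTIFIER', 'NUMBER'
        this.value = value;    // the text or parsed value (null for EOF)
        this.line = line;      // 1-based line number
        this.column = column;  // 1-based column number
    }
}

// The token list for: shape circle c1 { radius: 50 }
// written out by hand, with 1-based start positions.
const expected = [
    new Token('SHAPE', 'shape', 1, 1),
    new Token('IDENTIFIER', 'circle', 1, 7),
    new Token('IDENTIFIER', 'c1', 1, 14),
    new Token('LBRACE', '{', 1, 17),
    new Token('IDENTIFIER', 'radius', 1, 19),
    new Token('COLON', ':', 1, 25),
    new Token('NUMBER', 50, 1, 27),
    new Token('RBRACE', '}', 1, 30),
];

console.log(expected.map((t) => t.type).join(' '));
```

Note that the NUMBER token carries the number 50, not the string "50".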

Building the Basic Structure From Scratch

To build a lexer from scratch, you need to track where you are in the input string and what character you're currently looking at. Here's how to set it up step by step:

Step 1: Create the Lexer Class Start with a class that holds the input and tracks position:

export class Lexer {
    constructor(input) {
        // Step 1.1: Store the input string
        // This is the source code we're going to tokenize.
        // We store it as a property so all methods can access it.
        this.input = input;           // The source code string

        // Step 1.2: Track position in the string
        // position is the index into the input string (0-based).
        // We start at 0 because we haven't read anything yet.
        this.position = 0;            // Current character position

        // Step 1.3: Track line number for error messages
        // When the lexer encounters an error, we need to tell the user
        // where it happened. Line numbers start at 1 (not 0) because that's
        // what users expect ("error on line 5" not "error on line 4").
        this.line = 1;                // Current line number

        // Step 1.4: Track column number for error messages
        // Column tells us which character on the line has the error.
        // Like line, it starts at 1 (first character is column 1).
        this.column = 1;              // Current column number

        // Step 1.5: Store the current character
        // Instead of always accessing this.input[this.position], we store
        // the current character in a property. This keeps the rest of the
        // code shorter and easier to read.
        // If input is empty, this.input[0] is undefined, so we use ?? null
        // to convert undefined to null (which we use to mean "end of input").
        this.currentChar = this.input[0] ?? null;  // The character we're looking at
    }
}

Why Track Line/Column? When parsing fails, you need to tell the user "error on line 5, column 12". Without this, errors are useless. Users can't fix errors if they don't know where they are. The line/column tracking is essential for good error messages.

How It Works:

input is the source code string (e.g., "shape circle c1 { radius: 50 }")
position is the index into that string (0 = first character, 1 = second character, etc.)
line and column track position for error messages
currentChar is a convenience - instead of writing this.input[this.position] everywhere, we store it

Building This Step by Step:

Create a new file lexer.mjs
Export a class called Lexer
Add a constructor that takes input as a parameter
Store input as this.input
Initialize position to 0
Initialize line to 1
Initialize column to 1
Set currentChar to the first character (or null if input is empty)

Building the Main Tokenization Loop From Scratch

The core method is getNextToken(). This is the heart of the lexer - it reads characters and returns tokens. Here's how to build it step by step:

Step 1: Create the Main Loop Structure The loop continues until we reach the end of input (when currentChar is null):

getNextToken() {
    // Step 1.1: Loop while there are characters to read
    // currentChar is null when we've reached the end of the input string.
    // We keep looping until we've processed everything.
    while (this.currentChar !== null) {
        // Inside the loop, we check what kind of character we're looking at
        // and handle it appropriately. Each check is in order of likelihood
        // (most common first) for performance.

        // Step 1.2: Skip whitespace (spaces, tabs, newlines)
        // Whitespace doesn't create tokens - it just separates other tokens.
        // We check for whitespace first because it's very common.
        // /\s/ is a regex that matches any whitespace character.
        if (/\s/.test(this.currentChar)) {
            this.skipWhitespace();  // Skip all consecutive whitespace
            continue;  // Go to next iteration (don't create a token)
            // continue skips the rest of the loop body and starts the next
            // iteration. This means we don't try to create a token from whitespace.
        }

        // Step 1.3: Skip comments (lines starting with //)
        // Comments also don't create tokens. We check if current character
        // is '/' and the next character (peek) is also '/'.
        // peek() looks at the next character without consuming it.
        if (this.currentChar === '/' && this.peek() === '/') {
            this.skipComment();  // Skip the entire comment line
            continue;  // Go to next iteration (don't create a token)
        }

        // Step 1.4: Handle numbers (digits 0-9)
        // If we see a digit, we know we're starting a number token.
        // /\d/ matches any digit (0-9).
        if (/\d/.test(this.currentChar)) {
            return this.number();  // Read the number and return a NUMBER token
            // We return immediately because number() handles reading all digits
            // and returns a complete token.
        }

        // Step 1.5: Handle identifiers and keywords (letters and underscores)
        // Identifiers are names like "circle", "c1", "radius".
        // Keywords are special identifiers like "shape", "param".
        // We check for letters or underscore first character.
        // /[a-zA-Z_]/ matches any letter (upper or lower) or underscore.
        if (/[a-zA-Z_]/.test(this.currentChar)) {
            return this.identifier();  // Read the identifier and return a token
            // identifier() will check if it's a keyword or regular identifier
            // and return the appropriate token type.
        }

        // Step 1.6: Handle strings (text in quotes)
        // Strings start with a double quote character.
        if (this.currentChar === '"') {
            return this.parseString();  // Read the string and return a STRING token
        }

        // Step 1.7: Handle hex colors (like #FF0000)
        // Hex colors start with a '#' character.
        if (this.currentChar === '#') {
            return this.parseHexColor();  // Read the hex color and return a HEXCOLOR token
        }

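        // Step 1.8: Handle operators and punctuation
        // This includes things like '{', '}', ':', ',', etc.
        // We'll implement this next, but for now we'll handle the common ones.
        // ... (we'll get to this in detail)

        // Step 1.8b: Reject anything else
        // If no check above matched, report the character. Without this the
        // loop would never advance and would spin on the same character forever.
        this.error(`Unexpected character '${this.currentChar}'`);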
    }

    // Step 1.9: Return EOF token when input is exhausted
    // When the loop exits (currentChar is null), we've read everything.
    // Return an EOF (End Of File) token to signal we're done.
    return new Token('EOF', null, this.line, this.column);
    // EOF token has no value (null) but has line/column for consistency.
}

The Pattern Explained: The lexer follows a simple pattern:

Check what kind of character we're looking at
Call the appropriate method to read that token type
Return the token immediately
Skip whitespace/comments without creating tokens (use continue)

Why This Order Matters:

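Whitespace is checked first because it's the most common case and never produces a token
Comments are checked before any operator handling, so the first / of // is never mistaken for an operator
Numbers, identifiers, strings, and hex colors each start with a different kind of character, so their relative order affects speed but not correctness
Operators and punctuation come last as the fallback for whatever is left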

Building This Step by Step:

Create the getNextToken() method in your Lexer class
Add the while loop that continues while currentChar !== null
Add whitespace check first (most common case)
Add comment check second
Add number check (if digit, call number())
Add identifier check (if letter/underscore, call identifier())
Add string check (if quote, call parseString())
Add hex color check (if '#', call parseHexColor())
Add EOF return at the end (when loop exits)
Implement each helper method (skipWhitespace, number, identifier, etc.) one by one

Building Helper Methods From Scratch

You need several helper methods to make the lexer work. Here's how to build each one step by step:

Step 1: Build the advance() Method This is the most important helper - it moves forward one character and updates all tracking:

advance() {
    // Step 1.1: Move position forward by one
    // This consumes the current character and moves to the next one.
    this.position++;

    // Step 1.2: Check if we've reached the end of input
    // If position is >= input.length, we've read all characters.
    if (this.position >= this.input.length) {
        this.currentChar = null;  // End of input - set to null to signal EOF
        // We don't update column here because we're at EOF.
    } else {
        // Step 1.3: We're not at EOF, so get the next character
        // Read the character at the new position.
        this.currentChar = this.input[this.position];

        // Step 1.4: Update column number
        // Column tracks horizontal position on the current line.
        // We increment it because we moved one character to the right.
        this.column++;
    }
}

Why advance() is Critical: Every time you consume a character (read it and process it), you must call advance() to move forward. Without it, you'd be stuck reading the same character forever. The method also handles end-of-input detection by setting currentChar to null.
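
To see advance() at work in isolation, here is a small sketch. Cursor is a hypothetical stand-in that keeps only the fields advance() touches; it is not the full Lexer class.

```javascript
// Hypothetical stand-in for the Lexer's cursor state.
class Cursor {
    constructor(input) {
        this.input = input;
        this.position = 0;
        this.column = 1;
        this.currentChar = input.length > 0 ? input[0] : null;
    }

    // Same logic as advance() above.
    advance() {
        this.position++;
        if (this.position >= this.input.length) {
            this.currentChar = null;  // End of input
        } else {
            this.currentChar = this.input[this.position];
            this.column++;
        }
    }
}

// Walk the input one character at a time, recording character@column.
const cursor = new Cursor('c1');
const seen = [];
while (cursor.currentChar !== null) {
    seen.push(`${cursor.currentChar}@${cursor.column}`);
    cursor.advance();
}

console.log(seen);                // [ 'c@1', '1@2' ]
console.log(cursor.currentChar);  // null
```

If the call to cursor.advance() were removed from the loop, currentChar would never change and the loop would not terminate.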

Step 2: Build the peek() Method This looks ahead without consuming the character:

peek() {
    // Step 2.1: Check if there's a next character
    // We look at position + 1 (the next character) without moving.
    // If position + 1 is >= input.length, there's no next character.
    if (this.position + 1 >= this.input.length) {
        return null;  // No next character - return null
    }

    // Step 2.2: Return the next character without consuming it
    // We read input[position + 1] but don't call advance().
    // This lets us "look ahead" to decide what to do next.
    return this.input[this.position + 1];
}

Why peek() is Useful: Sometimes you need to check the next character before deciding what to do. For example, to distinguish = from ==, you peek at the next character. If it's also =, you have ==. If not, you have just =. This is called "lookahead" in parsing.
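
The = versus == decision can be sketched as a standalone function. peekAt and classifyEquals are hypothetical helpers that take the input and position as arguments instead of reading them from the lexer:

```javascript
// Mirrors peek(): return the next character, or null if there is none.
function peekAt(input, position) {
    return position + 1 < input.length ? input[position + 1] : null;
}

// Use one character of lookahead to tell '=' from '=='.
function classifyEquals(input, position) {
    if (input[position] !== '=') {
        return null;  // Not an equals sign at all
    }
    return peekAt(input, position) === '=' ? 'EQUALS' : 'ASSIGN';
}

console.log(classifyEquals('a == b', 2));  // EQUALS
console.log(classifyEquals('a = b', 2));   // ASSIGN
console.log(peekAt('a =', 2));             // null (nothing after the last character)
```

Because peekAt only reads, calling it any number of times leaves the position unchanged.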

Step 3: Build the skipWhitespace() Method This skips all consecutive whitespace characters:

skipWhitespace() {
    // Step 3.1: Loop while current character is whitespace
    // /\s/ matches any whitespace: space, tab, newline, etc.
    // We continue until we hit a non-whitespace character.
    while (this.currentChar && /\s/.test(this.currentChar)) {
        // Step 3.2: Handle newlines specially
        // Newlines change both line and column.
        if (this.currentChar === '\n') {
            this.advance();   // Consume the newline first
            this.line++;      // Move to next line
            this.column = 1;  // Next character is the first on the new line
        } else {
            // Step 3.3: Move forward one character
            // advance() updates position, currentChar, and column.
            this.advance();
        }
    }
    // When loop exits, currentChar is either null (EOF) or a non-whitespace character.
}

Why Skip Whitespace: Whitespace doesn't create tokens - it just separates them. shape circle c1 has whitespace between tokens, but we don't want whitespace tokens. We skip all whitespace and continue to the next meaningful character.
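
The line and column bookkeeping is easy to get wrong by one, so it helps to check it on a small input. The sketch below is a standalone function, not the lexer method; skipWhitespaceFrom is a hypothetical name, and it reports where the first non-whitespace character sits:

```javascript
// Skip leading whitespace and return the resulting position.
// Lines and columns are 1-based, as in the lexer.
function skipWhitespaceFrom(input) {
    let position = 0;
    let line = 1;
    let column = 1;
    while (position < input.length && /\s/.test(input[position])) {
        if (input[position] === '\n') {
            line++;
            column = 1;  // The next character is the first on the new line
        } else {
            column++;
        }
        position++;
    }
    return { position, line, column };
}

// Two spaces, a newline, a tab, a space, then "shape".
console.log(skipWhitespaceFrom('  \n\t shape'));
// { position: 5, line: 2, column: 3 }
```

The s of shape is the third character on line 2, which is what the result reports.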

Step 4: Build the skipComment() Method This skips single-line comments (// style):

skipComment() {
    // Step 4.1: Skip the first '/' character
    // We already know currentChar is '/' (checked in getNextToken).
    // We need to consume it.
    this.advance();  // Skip first /

    // Step 4.2: Skip the second '/' character
    // peek() already confirmed the next char is '/'.
    // Now we consume it.
    this.advance();  // Skip second /

    // Step 4.3: Skip everything until newline or EOF
    // Comments run until the end of the line.
    // We loop until we hit '\n' (newline) or null (EOF).
    while (this.currentChar !== null && this.currentChar !== '\n') {
        this.advance();  // Skip each character in the comment
    }
    // When loop exits, we're at the newline (or EOF).
    // The newline will be handled by skipWhitespace() if called next.
}

Why Comments Need Special Handling: Comments are like whitespace - they don't create tokens. But they're more complex because they can span multiple characters. We need to skip everything until the end of the line.
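
As a quick check of where comment skipping stops, here is a standalone sketch. skipLineComment is a hypothetical function that works on an index rather than on lexer state:

```javascript
// Return the index just past a // comment that starts at `position`.
// Assumes input[position] and input[position + 1] are both '/'.
function skipLineComment(input, position) {
    position += 2;  // Skip the two slashes
    while (position < input.length && input[position] !== '\n') {
        position++;  // Skip the comment text
    }
    return position;  // Index of the newline, or input.length at EOF
}

const source = '// a note\nshape';
const end = skipLineComment(source, 0);
console.log(end);                          // 9
console.log(JSON.stringify(source[end]));  // "\n"
```

The function stops on the newline rather than after it, so the whitespace skipper still sees the newline and can update the line count.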

Building These Methods Step by Step:

Start with advance() - it's the foundation
Add peek() - needed for lookahead
Add skipWhitespace() - needed to skip spaces
Add skipComment() - needed to skip comments

skipWhitespace() and skipComment() both rely on advance(). peek() only reads ahead and never moves the position.

Building the Identifier and Keyword Reader From Scratch

When you see a letter or underscore, you need to read until you hit something that can't be part of an identifier. This method handles both regular identifiers (like variable names) and keywords (like "shape", "param").

How to Build It Step by Step:

Step 1: Create the Method Structure Start with an empty method that will accumulate characters:

identifier() {
    let result = '';
    // We'll build the identifier string character by character
    // result starts empty and we'll append characters to it
}

Step 2: Read Valid Identifier Characters Identifiers can contain letters (a-z, A-Z), digits (0-9), and underscores (_). Keep reading while the current character matches this pattern:

identifier() {
    let result = '';

    // Keep reading while it's a valid identifier character
    // /[a-zA-Z0-9_]/ matches: letters (upper or lower), digits, or underscore
    // The loop continues until we hit a character that's not valid for identifiers
    // (like a space, operator, punctuation, etc.)
    while (this.currentChar && /[a-zA-Z0-9_]/.test(this.currentChar)) {
        result += this.currentChar;  // Append current character to result
        this.advance();               // Move to next character
    }
    // When loop exits, we've read the complete identifier
    // currentChar is now something that can't be part of an identifier
}

Why This Pattern Works:

We start with an empty string
Each iteration adds one character and moves forward
The loop stops when we hit an invalid character (space, operator, etc.)
At the end, result contains the complete identifier

Step 3: Check if It's a Keyword After reading the identifier, check if it matches a keyword. Keywords are special identifiers that have meaning in the language:

identifier() {
    // Remember where the identifier starts, so the token reports the
    // position of its first character rather than the position after it
    const startLine = this.line;
    const startColumn = this.column;
    let result = '';

    // Read the identifier (steps 1-2 above)
    while (this.currentChar && /[a-zA-Z0-9_]/.test(this.currentChar)) {
        result += this.currentChar;
        this.advance();
    }

    // Check if it's a keyword
    // Keywords are special identifiers that have language meaning
    const keywords = {
        'shape': 'SHAPE',    // Keyword for creating shapes
        'param': 'PARAM',    // Keyword for defining parameters
        'if': 'IF',          // Keyword for conditional statements
        'for': 'FOR',        // Keyword for loops
        // ... add all keywords your language supports
    };

    // Determine token type
    // If it's a keyword, return the keyword token type
    // Otherwise, it's a regular identifier
    const lowerResult = result.toLowerCase();  // Convert to lowercase for comparison
    // Object.hasOwn() ignores inherited properties, so an identifier named
    // "constructor" is not mistaken for a keyword
    const tokenType = Object.hasOwn(keywords, lowerResult)
        ? keywords[lowerResult]
        : 'IDENTIFIER';

    return new Token(tokenType, result, startLine, startColumn);
}

Why Case-Insensitive for Keywords: Users type SHAPE, Shape, shape - they should all work. The language should be forgiving about case for keywords. But regular identifiers like myShape vs myshape are different (case-sensitive). This gives flexibility: keywords work in any case, but variable names are case-sensitive.
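
The case rule can be tried out on its own. The sketch below is a standalone function, not the lexer method; classifyWord is a hypothetical name, and the keyword table is stored in a Map (an assumption, chosen so that inherited object properties such as "constructor" can never match):

```javascript
// Keyword table, keyed by the lowercase spelling.
const KEYWORDS = new Map([
    ['shape', 'SHAPE'],
    ['param', 'PARAM'],
    ['if', 'IF'],
    ['for', 'FOR'],
]);

// Look up the lowercased word, but keep the original spelling as the value.
function classifyWord(word) {
    const type = KEYWORDS.get(word.toLowerCase()) ?? 'IDENTIFIER';
    return { type, value: word };
}

console.log(classifyWord('shape').type);        // SHAPE
console.log(classifyWord('Shape').type);        // SHAPE
console.log(classifyWord('SHAPE').type);        // SHAPE
console.log(classifyWord('myShape'));           // { type: 'IDENTIFIER', value: 'myShape' }
console.log(classifyWord('constructor').type);  // IDENTIFIER
```

Only the lookup is lowercased. The value keeps the user's spelling, which is why myShape and myshape remain different identifiers.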

The Complete Method:

identifier() {
    // Remember where the identifier starts
    const startLine = this.line;
    const startColumn = this.column;
    let result = '';

    // Keep reading while it's a valid identifier character
    while (this.currentChar && /[a-zA-Z0-9_]/.test(this.currentChar)) {
        result += this.currentChar;
        this.advance();
    }

    // Check if it's a keyword
    const keywords = {
        'shape': 'SHAPE',
        'param': 'PARAM',
        'if': 'IF',
        'for': 'FOR',
        // ... all keywords
    };

    // Case-insensitive check (own properties only, so names like
    // "constructor" stay plain identifiers)
    const lowerResult = result.toLowerCase();
    const tokenType = Object.hasOwn(keywords, lowerResult)
        ? keywords[lowerResult]
        : 'IDENTIFIER';

    return new Token(tokenType, result, startLine, startColumn);
}

Why Read First, Then Check: This approach is simpler than trying to match keywords as you go. By reading the whole identifier first, you can then do a simple dictionary lookup. If you tried to match keywords character-by-character, you'd need complex state machines and backtracking. This way is cleaner and easier to extend with new keywords.

Building This Method:

Create identifier() method
Add empty result string
Add while loop that reads valid identifier characters
Add keyword dictionary
Add case-insensitive keyword lookup
Return token with appropriate type

Building the Number Reader From Scratch

Numbers can be integers (50) or decimals (3.14). You need to read all digits, handle decimal points, and convert the string to an actual number.

How to Build It Step by Step:

Step 1: Create the Method and Initialize Start with an empty string to accumulate digits:

number() {
    let result = '';
    // We'll build the number string digit by digit
    // Then convert it to an actual number at the end
}

Step 2: Read Integer Part (Digits Before Decimal Point) Read all consecutive digits. This handles both integers and the integer part of decimals:

number() {
    let result = '';

    // Read digits
    // /\d/ matches any digit (0-9)
    // Keep reading while we see digits
    while (this.currentChar && /\d/.test(this.currentChar)) {
        result += this.currentChar;  // Append digit to result
        this.advance();              // Move to next character
    }
    // When loop exits, we've read all consecutive digits
    // currentChar is now either a decimal point, or something else
}

Why Read Digits First: This approach handles both integers (50) and decimals (3.14) with the same initial logic. The first loop reads the whole number part (before decimal), then we check if there's more.

Step 3: Check for Decimal Point After reading digits, check if there's a decimal point. If yes, read the fractional part:

number() {
    let result = '';

    // Read integer part (step 2)
    while (this.currentChar && /\d/.test(this.currentChar)) {
        result += this.currentChar;
        this.advance();
    }

    // Check for decimal point
    // If currentChar is '.', we have a decimal number
    if (this.currentChar === '.') {
        result += '.';      // Add decimal point to result
        this.advance();     // Move past the decimal point

        // Read fractional digits (after decimal point)
        // Same pattern as integer part
        while (this.currentChar && /\d/.test(this.currentChar)) {
            result += this.currentChar;
            this.advance();
        }
        // Now we have the complete decimal number in result
    }
    // If no decimal point, result already contains the integer
}

Why This Order Works:

Read integer digits first (handles 50 and the 3 in 3.14)
Then check for decimal point
If decimal point exists, read fractional digits (the 14 in 3.14)
This handles both cases with the same code structure

Step 4: Convert String to Number After reading the number string, convert it to an actual JavaScript number:

number() {
    // Remember where the number starts, so the token reports the
    // position of its first digit rather than the position after it
    const startLine = this.line;
    const startColumn = this.column;
    let result = '';

    // Read integer part
    while (this.currentChar && /\d/.test(this.currentChar)) {
        result += this.currentChar;
        this.advance();
    }

    // Check for decimal point and read fractional part
    if (this.currentChar === '.') {
        result += '.';
        this.advance();
        while (this.currentChar && /\d/.test(this.currentChar)) {
            result += this.currentChar;
            this.advance();
        }
    }

    // Convert to actual number
    // parseFloat() converts string to number
    // '50' becomes 50, '3.14' becomes 3.14
    const numValue = parseFloat(result);

    // Return token with number value (not string)
    return new Token('NUMBER', numValue, startLine, startColumn);
}

Why Convert to Number: The token value must be a number (not a string) so the parser and interpreter can do math with it. parseFloat('50') returns the number 50, not the string '50'. This is essential for arithmetic operations later.
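
The difference shows up as soon as the value is used in arithmetic. A small illustration using only built-in JavaScript behavior:

```javascript
const asText = '50';
const asNumber = parseFloat(asText);

console.log(typeof asText);    // string
console.log(typeof asNumber);  // number

// With a string, + concatenates instead of adding.
console.log(asText + 20);      // 5020
console.log(asNumber + 20);    // 70
```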

The Complete Method:

number() {
    // Remember where the number starts
    const startLine = this.line;
    const startColumn = this.column;
    let result = '';

    // Read digits
    while (this.currentChar && /\d/.test(this.currentChar)) {
        result += this.currentChar;
        this.advance();
    }

    // Check for decimal point
    if (this.currentChar === '.') {
        result += '.';
        this.advance();

        // Read more digits after decimal
        while (this.currentChar && /\d/.test(this.currentChar)) {
            result += this.currentChar;
            this.advance();
        }
    }

    // Convert to actual number
    const numValue = parseFloat(result);
    return new Token('NUMBER', numValue, startLine, startColumn);
}

Why This Pattern Works:

We stop reading digits when we hit a non-digit character (space, operator, etc.)
This naturally handles both integers and decimals
parseFloat() handles the conversion automatically
The token value is a number, ready for arithmetic operations

Building This Method:

Create number() method with empty result string
Add while loop to read integer digits
Add check for decimal point
If decimal point exists, add it to result and read fractional digits
Convert result string to number using parseFloat()
Return NUMBER token with numeric value

Why not handle negative numbers here: If the lexer treated -50 as a single token, it would also have to decide whether the - in 50 -20 starts a negative number or means subtraction, and that depends on what came before it. Instead, -50 becomes two tokens: MINUS followed by NUMBER(50). The parser then treats the MINUS as a unary operator when it appears where a value is expected. This separation keeps the lexer simple - it just tokenizes, it doesn't understand operator precedence.
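
A stripped-down tokenizer makes the point concrete. This is a sketch that only understands whitespace, digits, and -, and tokenizeArithmetic is a hypothetical name. The minus sign always becomes its own token, whatever surrounds it:

```javascript
// Tokenize integers and '-' only; everything else is an error.
function tokenizeArithmetic(input) {
    const tokens = [];
    let i = 0;
    while (i < input.length) {
        const ch = input[i];
        if (/\s/.test(ch)) {
            i++;
        } else if (ch === '-') {
            tokens.push({ type: 'MINUS', value: '-' });
            i++;
        } else if (/\d/.test(ch)) {
            let text = '';
            while (i < input.length && /\d/.test(input[i])) {
                text += input[i];
                i++;
            }
            tokens.push({ type: 'NUMBER', value: parseFloat(text) });
        } else {
            throw new Error(`Unexpected character '${ch}'`);
        }
    }
    return tokens;
}

const summarize = (tokens) => tokens.map((t) => t.type).join(' ');

console.log(summarize(tokenizeArithmetic('-50')));      // MINUS NUMBER
console.log(summarize(tokenizeArithmetic('50 - 20')));  // NUMBER MINUS NUMBER
console.log(summarize(tokenizeArithmetic('50 -20')));   // NUMBER MINUS NUMBER
```

The last two inputs produce the same tokens, so the lexer never has to guess whether a - is a sign or an operator.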

Building the String Parser From Scratch

Strings are text between double quotes and can contain escape sequences like \n for newline. You need to read everything between the quotes and handle escape sequences properly.

How to Build It Step by Step:

Step 1: Skip the Opening Quote When you see a ", you know a string is starting. Skip past it:

parseString() {
    let result = '';
    this.advance();  // Skip opening quote
    // We've already seen the opening quote in getNextToken()
    // Now we need to read everything until the closing quote
}

Step 2: Read Characters Until Closing Quote Loop through characters, stopping when you hit the closing quote or end of file:

parseString() {
    let result = '';
    this.advance();  // Skip opening quote

    // Read characters until closing quote or EOF
    // Loop continues while currentChar is not null (EOF) and not '"' (closing quote)
    while (this.currentChar !== null && this.currentChar !== '"') {
        // We'll handle the character here
        result += this.currentChar;
        this.advance();
    }
    // When loop exits, we've either hit the closing quote or EOF
}

Step 3: Handle Escape Sequences When you see a backslash, the next character is special. Handle escape sequences:

parseString() {
    let result = '';
    this.advance();  // Skip opening quote

    while (this.currentChar !== null && this.currentChar !== '"') {
        // Check if this is an escape sequence
        if (this.currentChar === '\\') {
            this.advance();  // Skip the backslash

            // Handle escape sequences
            // The character after backslash tells us what to do
            if (this.currentChar === 'n') {
                result += '\n';  // Newline character
            } else if (this.currentChar === 't') {
                result += '\t';  // Tab character
            } else if (this.currentChar === '"') {
                result += '"';   // Escaped quote (literal quote in string)
            } else if (this.currentChar === '\\') {
                result += '\\';  // Escaped backslash (literal backslash)
            } else {
                // Unknown escape sequence - just use the character as-is
                result += this.currentChar;
            }
            this.advance();  // Move past the escape sequence character
        } else {
            // Normal character - just add it
            result += this.currentChar;
            this.advance();
        }
    }
}

Why Escape Sequences Matter:

\n becomes a newline character (ASCII 10) - allows multi-line strings
\t becomes a tab character (ASCII 9) - allows indentation in strings
\" becomes a literal quote - allows quotes inside strings
\\ becomes a literal backslash - allows backslashes in strings
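
The four escapes above can be tried out with a small standalone decoder. decodeEscapes is a hypothetical helper that takes the raw text between the quotes, and the lookup table is one possible way to write the if/else chain:

```javascript
// Escape table for the four sequences listed above.
const ESCAPES = new Map([
    ['n', '\n'],
    ['t', '\t'],
    ['"', '"'],
    ['\\', '\\'],
]);

// Turn the raw text between the quotes into the string value
// the STRING token would carry.
function decodeEscapes(raw) {
    let result = '';
    for (let i = 0; i < raw.length; i++) {
        if (raw[i] === '\\' && i + 1 < raw.length) {
            const next = raw[i + 1];
            // Unknown escapes fall back to the character itself.
            result += ESCAPES.has(next) ? ESCAPES.get(next) : next;
            i++;  // Skip the character that followed the backslash
        } else {
            result += raw[i];
        }
    }
    return result;
}

// In source form this is: say \"hi\"\tnow
const raw = 'say \\"hi\\"\\tnow';
const decoded = decodeEscapes(raw);

console.log(raw.length);      // 15
console.log(decoded.length);  // 12
console.log(decoded === 'say "hi"\tnow');  // true
```

Each two-character escape collapses to one character, which is why the decoded string is three characters shorter.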

Step 4: Validate Closing Quote After the loop, check that we actually found a closing quote:

parseString() {
    // Remember where the string starts (at the opening quote), so the
    // token reports that position rather than the position after the string
    const startLine = this.line;
    const startColumn = this.column;
    let result = '';
    this.advance();  // Skip opening quote

    // Read characters and handle escapes (steps 2-3)
    while (this.currentChar !== null && this.currentChar !== '"') {
        if (this.currentChar === '\\') {
            this.advance();
            // Handle escape sequences...
            if (this.currentChar === 'n') {
                result += '\n';
            } // ... etc
            this.advance();
        } else {
            result += this.currentChar;
            this.advance();
        }
    }

    // Validate we found closing quote
    if (this.currentChar === '"') {
        this.advance();  // Skip closing quote
    } else {
        // We hit EOF before finding closing quote - error!
        this.error('Unterminated string literal');
    }

    return new Token('STRING', result, startLine, startColumn);
}

Why Check for Closing Quote: The loop exits when currentChar is null (EOF) or '"' (closing quote). If it's null, the string was never closed - that's an error. If it's '"', we successfully found the closing quote and can continue.

The Complete Method:

parseString() {
    // Remember where the string starts (at the opening quote)
    const startLine = this.line;
    const startColumn = this.column;
    let result = '';
    this.advance();  // Skip opening quote

    while (this.currentChar !== null && this.currentChar !== '"') {
        if (this.currentChar === '\\') {
            // Escape sequence
            this.advance();  // Skip the backslash
            if (this.currentChar === 'n') {
                result += '\n';
            } else if (this.currentChar === 't') {
                result += '\t';
            } else if (this.currentChar === '"') {
                result += '"';  // Escaped quote
            } else if (this.currentChar === '\\') {
                result += '\\';  // Escaped backslash
            } else {
                result += this.currentChar;  // Unknown escape, just use the char
            }
            this.advance();
        } else {
            result += this.currentChar;
            this.advance();
        }
    }

    if (this.currentChar === '"') {
        this.advance();  // Skip closing quote
    } else {
        this.error('Unterminated string literal');
    }

    return new Token('STRING', result, startLine, startColumn);
}

How Escape Sequences Work: When we see a backslash, we know the next character is special. We skip the backslash, check what follows, and append the character it stands for. The backslash acts as an escape character - it tells the lexer "the next character has special meaning, don't treat it literally." The loop stops at either null (end of input) or '"' (closing quote), so the check after the loop tells the two apart: a quote means the string closed normally, and null means it was never closed and we raise an error.

Building This Method:

Create parseString() method
Skip opening quote with advance()
Add while loop that continues until closing quote or EOF
Inside loop, check for backslash (escape sequence)
If backslash, handle escape sequences (\n, \t, \", \\)
If normal character, add it to result
After loop, validate closing quote exists
Return STRING token with the parsed string value

Building the Hex Color Parser From Scratch

Hex colors start with # and can be 3, 4, 6, or 8 hex digits. You need to read the hex digits and validate the format.

How to Build It Step by Step:

Step 1: Start with the Hash Symbol Hex colors always start with #. We've already seen it in getNextToken(), so skip past it:

parseHexColor() {
    let result = '#';
    this.advance();  // Skip the #
    // We start result with '#' because hex colors include it
    // Now we need to read the hex digits
}

Step 2: Read Hex Digits Read all consecutive hexadecimal digits (0-9, a-f, A-F):

parseHexColor() {
    let result = '#';
    this.advance();  // Skip the #

    // Read hex digits
    // /[0-9a-fA-F]/ matches any hexadecimal digit
    // Keep reading while we see valid hex characters
    while (this.currentChar && /[0-9a-fA-F]/.test(this.currentChar)) {
        result += this.currentChar;  // Append hex digit to result
        this.advance();              // Move to next character
    }
    // When loop exits, we've read all hex digits
    // currentChar is now something that's not a hex digit
}

Why Hex Digits: Hexadecimal uses base-16, so digits are 0-9 and letters A-F (or a-f). This allows values from 0-15 per digit, which is perfect for color values (0-255 in decimal = 00-FF in hex).

Step 3: Validate the Length After reading digits, check that the length is valid. Hex colors must be 3, 4, 6, or 8 digits (not counting the #):

parseHexColor() {
    // Remember where the color starts (at the #), so the token reports
    // that position rather than the position after the last digit
    const startLine = this.line;
    const startColumn = this.column;
    let result = '#';
    this.advance();  // Skip the #

    // Read hex digits (step 2)
    while (this.currentChar && /[0-9a-fA-F]/.test(this.currentChar)) {
        result += this.currentChar;
        this.advance();
    }

    // Validate length
    // result includes the '#', so we subtract 1 to get digit count
    const hexLength = result.length - 1;  // Minus the #

    // Valid lengths: 3, 4, 6, or 8 digits
    if (hexLength === 3 || hexLength === 4 || hexLength === 6 || hexLength === 8) {
        return new Token('HEXCOLOR', result, startLine, startColumn);
    } else {
        // Invalid length - error!
        this.error(`Invalid hex color format: ${result}`);
    }
}

Why Validate Length:

#FF (2 digits) is invalid - too short
#FFF (3 digits) is valid - RGB shorthand, gets expanded to #FFFFFF
#FFFF (4 digits) is valid - RGBA shorthand with alpha
#FFFFFF (6 digits) is valid - full RGB
#FFFFFFFF (8 digits) is valid - full RGBA with alpha
Other lengths are invalid
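
The same rule can be written as a single regular expression, which is handy for checking the cases in this list. isValidHexColor is a hypothetical helper that checks a complete string; it is not part of the lexer:

```javascript
// '#' followed by exactly 3, 4, 6, or 8 hex digits.
function isValidHexColor(text) {
    return /^#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/.test(text);
}

const samples = ['#FF', '#FFF', '#FFFF', '#FFFFF', '#FFFFFF', '#FFFFFFFF'];
for (const sample of samples) {
    console.log(sample, isValidHexColor(sample));
}
// #FF false
// #FFF true
// #FFFF true
// #FFFFF false
// #FFFFFF true
// #FFFFFFFF true
```

The 5-digit case shows why the check lists the valid lengths instead of testing a range.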

The Complete Method:

parseHexColor() {
    // Remember where the color starts (at the #)
    const startLine = this.line;
    const startColumn = this.column;
    let result = '#';
    this.advance();  // Skip the #

    // Read hex digits
    while (this.currentChar && /[0-9a-fA-F]/.test(this.currentChar)) {
        result += this.currentChar;
        this.advance();
    }

    // Validate length
    const hexLength = result.length - 1;  // Minus the #
    if (hexLength === 3 || hexLength === 4 || hexLength === 6 || hexLength === 8) {
        return new Token('HEXCOLOR', result, startLine, startColumn);
    } else {
        this.error(`Invalid hex color format: ${result}`);
    }
}

Why Validate: Validation ensures we only accept properly formatted hex colors. #FF (2 digits) is rejected, while #FFF, #FFFF, #FFFFFF, and #FFFFFFFF (3, 4, 6, and 8 digits) are accepted.

Building This Method:

Create parseHexColor() method
Start result with '#' and skip past it
Add while loop to read hex digits (0-9, a-f, A-F)
Calculate hex length (result.length - 1, excluding the #)
Validate length is 3, 4, 6, or 8
Return HEXCOLOR token or error if invalid

Handling Operators and Punctuation

Single characters are straightforward:

// In getNextToken(), after checking for identifiers, numbers, etc.

switch (this.currentChar) {
    case '{':
        this.advance();
        return new Token('LBRACE', '{', this.line, this.column);
    case '}':
        this.advance();
        return new Token('RBRACE', '}', this.line, this.column);
    case ':':
        this.advance();
        return new Token('COLON', ':', this.line, this.column);
    case ',':
        this.advance();
        return new Token('COMMA', ',', this.line, this.column);
    // ... etc
}

Multi-character operators need special handling:

// Check for == before =
if (this.currentChar === '=' && this.peek() === '=') {
    this.advance();  // Skip first =
    this.advance();  // Skip second =
    return new Token('EQUALS', '==', this.line, this.column - 1);
}

if (this.currentChar === '=') {
    this.advance();
    return new Token('ASSIGN', '=', this.line, this.column);
}

Order matters: Check == before =, otherwise == becomes two ASSIGN tokens.
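
To see the ordering problem in isolation, here is a small standalone sketch. The scanEquals function and its checkDoubleFirst flag are hypothetical and exist only for this demonstration; passing false simulates a lexer that tests for = before ==.

```javascript
// Hypothetical standalone scanner, used only to illustrate check ordering.
// It recognizes '==' and '=' and skips spaces.
function scanEquals(input, checkDoubleFirst) {
    const tokens = [];
    let i = 0;
    while (i < input.length) {
        const ch = input[i];
        const next = input[i + 1];
        if (ch === ' ') {
            i += 1;
        } else if (checkDoubleFirst && ch === '=' && next === '=') {
            tokens.push('EQUALS');  // Two characters consumed as one token
            i += 2;
        } else if (ch === '=') {
            tokens.push('ASSIGN');
            i += 1;
        } else {
            throw new Error(`Unexpected character '${ch}'`);
        }
    }
    return tokens;
}

console.log(scanEquals('== =', true));   // [ 'EQUALS', 'ASSIGN' ]
console.log(scanEquals('== =', false));  // [ 'ASSIGN', 'ASSIGN', 'ASSIGN' ]
```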

Error Handling

When something goes wrong, throw an error with position info:

error(message) {
    throw new Error(`Lexer error at line ${this.line}, col ${this.column}: ${message}`);
}

Why include position? "Error: Unexpected character" is useless. "Error at line 5, col 12: Unexpected character '&'" is helpful.

Running the lexer on a sample program should output an array of tokens. If it doesn't, you've got a bug.

Common Issues

Tokenizing stops early:

Check that you're calling advance() after reading each character
Make sure you're not skipping valid characters

Keywords not recognized:

Check the keywords object has the right case
Verify the case-insensitive comparison works

Numbers parsed wrong:

Make sure you're using parseFloat(), not keeping as string
Check decimal point handling

Strings break:

Verify escape sequence handling
Check that you're consuming the closing quote

The lexer is the simplest part. Get this right, and the parser becomes much easier.

The Token Class

Every token has:

type - What kind of token (IDENTIFIER, NUMBER, SHAPE, etc.)
value - The actual content ("circle", 50, etc.)
line and column - Where it came from (for error messages)

The line/column tracking is crucial. When something breaks, you need to tell the user where.

Keywords

All the reserved words are in a big object in identifier(). When you see a letter, the lexer reads until it can't anymore, then checks that object. If it finds a match, it's a keyword token. Otherwise, it's an IDENTIFIER.

Gotcha: The keyword check is case-insensitive (result.toLowerCase()), but the actual token value keeps the original casing. This matters for some edge cases.
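
A minimal sketch of that lookup, assuming a tiny subset of the keywords object. The classifyWord helper is hypothetical; in the real lexer this logic lives inside identifier().

```javascript
// Assumed subset of the keywords table; the real object is larger.
const keywords = {
    'shape': 'SHAPE',
    'param': 'PARAM',
    'if': 'IF',
};

// Hypothetical helper: classify a word the lexer has already read.
function classifyWord(word) {
    const lookupKey = word.toLowerCase();  // Case-insensitive match
    // hasOwnProperty avoids matching inherited keys such as 'constructor'
    const isKeyword = Object.prototype.hasOwnProperty.call(keywords, lookupKey);
    return {
        type: isKeyword ? keywords[lookupKey] : 'IDENTIFIER',
        value: word,                       // Original casing is preserved
    };
}

console.log(classifyWord('Shape'));  // { type: 'SHAPE', value: 'Shape' }
console.log(classifyWord('c1'));     // { type: 'IDENTIFIER', value: 'c1' }
```

The hasOwnProperty check is a design choice of this sketch: with a plain object, a user identifier like constructor would otherwise match a property inherited from Object.prototype.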

Numbers

Numbers are straightforward - read digits, maybe a decimal point, maybe more digits. The lexer converts the string to an actual number using parseFloat().

Gotcha: Negative numbers aren't handled in the lexer. -50 becomes two tokens: MINUS and NUMBER(50). The parser handles the negation. This is actually fine - it keeps the lexer simple.
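
The number-reading rule can be sketched as a standalone function. The readNumber name and its return shape are assumptions made for this example; the real lexer works on this.currentChar and advance() instead of an index.

```javascript
// Hypothetical standalone version of number reading.
// `start` must point at a digit. Returns the parsed value and the
// index just past the number.
function readNumber(input, start) {
    let pos = start;
    let text = '';

    // Integer part
    while (pos < input.length && /[0-9]/.test(input[pos])) {
        text += input[pos];
        pos += 1;
    }

    // Optional fractional part: a '.' followed by at least one digit
    if (input[pos] === '.' && /[0-9]/.test(input[pos + 1] || '')) {
        text += '.';
        pos += 1;
        while (pos < input.length && /[0-9]/.test(input[pos])) {
            text += input[pos];
            pos += 1;
        }
    }

    return { value: parseFloat(text), end: pos };
}

console.log(readNumber('50 }', 0));   // { value: 50, end: 2 }
console.log(readNumber('3.25,', 0));  // { value: 3.25, end: 4 }
console.log(readNumber('-50', 1));    // { value: 50, end: 3 }
```

The last call starts at index 1 because the minus sign is not part of the number; it would already have been emitted as a separate MINUS token.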

Strings and Colors

Strings are between double quotes. The lexer handles escape sequences (\n, \", etc.).
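
Here is a minimal sketch of string reading with escapes. The readString function is hypothetical, and the exact escape set (\n, \t, \", \\) and the treatment of unknown escapes are assumptions, since this guide only lists "\n, \", etc.".

```javascript
// Hypothetical standalone string reader.
// `start` must point at the opening double quote.
function readString(input, start) {
    const escapes = { n: '\n', t: '\t', '"': '"', '\\': '\\' };
    let pos = start + 1;  // Skip the opening quote
    let value = '';

    while (pos < input.length && input[pos] !== '"') {
        if (input[pos] === '\\') {
            const next = input[pos + 1];
            if (next === undefined) break;  // Dangling backslash at end of input
            // Assumption: unknown escapes keep the escaped character as-is
            value += Object.prototype.hasOwnProperty.call(escapes, next) ? escapes[next] : next;
            pos += 2;
        } else {
            value += input[pos];
            pos += 1;
        }
    }

    if (input[pos] !== '"') {
        throw new Error('Unterminated string');
    }
    return { value, end: pos + 1 };  // end is just past the closing quote
}

console.log(readString('"hello" rest', 0));  // { value: 'hello', end: 7 }
```

Note that the closing quote is consumed (end points past it). Forgetting that step is the "Strings break" issue listed under Common Issues.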

Hex colors start with # and can be 3, 4, 6, or 8 hex digits (the 4 and 8 include alpha). The lexer validates the length - if it's wrong, it throws an error.

There's also a COLORNAME token type for named colors like "red", "blue", "gray" (or "grey" - we support both spellings because people are inconsistent).

Adding New Keywords

Want to add a new keyword? Three steps:

Add it to the keywords object in identifier():

const keywords = {
 // ... existing stuff
 'mykeyword': 'MYKEYWORD',
};

The parser needs to handle MYKEYWORD tokens (see parser section)
The interpreter needs to do something with it (see interpreter section)

That's it. The lexer part is the easiest.

Building the Parser From Scratch

The parser takes tokens and builds an Abstract Syntax Tree (AST). This is where we figure out what the code actually means. Let's build it step by step.

What the Parser Does

The parser converts a flat list of tokens into a tree structure. For example, shape circle c1 { radius: 50 } becomes:

{
    type: 'shape',
    shapeType: 'circle',
    name: 'c1',
    params: {
        radius: { type: 'number', value: 50 }
    }
}

The interpreter doesn't care about the original text - it only works with AST nodes. The AST represents the structure of the program.

Building the Basic Parser Structure From Scratch

The parser takes tokens from the lexer and builds an Abstract Syntax Tree (AST). You need a class that holds the lexer and tracks the current token.

How to Build It Step by Step:

Step 1: Create the Parser Class Start with a class that holds the lexer and initializes the first token:

export class Parser {
    constructor(lexer) {
        // Step 1.1: Store the lexer reference
        // The parser needs the lexer to get tokens
        this.lexer = lexer;

        // Step 1.2: Get the first token (look ahead one token)
        // We need to start with one token already loaded
        // This is called "lookahead" - we always have one token ahead
        // This is LL(1) parsing - we only need to look at one token to decide what to do next
        this.currentToken = this.lexer.getNextToken();
    }
}

Why Load First Token: We need to know what token we're currently looking at before we can parse. By loading the first token in the constructor, we're ready to parse immediately. This is a common pattern in recursive descent parsers. LL(1) means "Left-to-right, Leftmost derivation, 1 token lookahead" - we only need to peek at one token to decide what to parse next.

Step 2: Build the eat() Method This is the core method - it consumes a token if it matches what you expect:

eat(tokenType) {
    // Step 2.1: Check if current token matches expected type
    // If it matches, we can consume it
    if (this.currentToken.type === tokenType) {
        // Step 2.2: Save the token (in case caller needs it)
        const token = this.currentToken;

        // Step 2.3: Consume the token and get the next one
        // Move forward by getting the next token from the lexer
        this.currentToken = this.lexer.getNextToken();  // Move to next token

        // Step 2.4: Return the consumed token
        // Some parsing methods need the token value
        return token;
    } else {
        // Step 2.5: Token doesn't match - error!
        // This is a syntax error - the code doesn't match the grammar
        this.error(`Expected ${tokenType} but got ${this.currentToken.type}`);
    }
}

Why eat() is Essential: This is the core of the parser. It checks whether the current token matches what we expect and, if it does, moves on to the next one. This is called "consuming" a token. If the token doesn't match, we throw an error. The pattern keeps parsing code clean: instead of checking and advancing everywhere, you just call eat('SHAPE') and it handles both the check and the advancement.

Step 3: Build the Error Method When parsing fails, you need to report an error with position information:

error(message) {
    // Throw an error with position information
    // Include line and column from the current token for better error messages
    throw new Error(`Parser error at line ${this.currentToken.line}, col ${this.currentToken.column}: ${message}`);
}

Why Include Position: Error messages are much more helpful when they include where the error occurred. Users can quickly find and fix the problem.

The Complete Basic Structure:

export class Parser {
    constructor(lexer) {
        this.lexer = lexer;
        this.currentToken = this.lexer.getNextToken();  // Look ahead one token
    }

    error(message) {
        throw new Error(`Parser error at line ${this.currentToken.line}, col ${this.currentToken.column}: ${message}`);
    }

    eat(tokenType) {
        if (this.currentToken.type === tokenType) {
            const token = this.currentToken;
            this.currentToken = this.lexer.getNextToken();  // Move to next token
            return token;
        } else {
            this.error(`Expected ${tokenType} but got ${this.currentToken.type}`);
        }
    }
}

How It Works:

Constructor stores lexer and loads first token (lookahead)
eat() consumes expected tokens and errors on unexpected ones
error() throws parsing errors with position information
This foundation supports all parsing methods

Why Look Ahead: We always have currentToken set to the next token we're about to process. This is LL(1) parsing - we only need to look at one token to decide what to do next. This makes the parser simple and efficient.

Building This Step by Step:

Create Parser class with constructor
Store lexer reference in constructor
Load first token in constructor (this.currentToken = this.lexer.getNextToken())
Add eat() method that checks token type, saves token, advances, and returns token
Add error() method for error reporting with position information
This structure is the foundation for all parsing methods
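
To try eat() without a real lexer, you can feed the parser from an array. This is a sketch under stated assumptions: ArrayLexer and MiniParser are hypothetical names, and the token objects are plain literals with the same type, value, line, and column fields as Token.

```javascript
// Hypothetical stand-in for the real lexer: serves tokens from an array.
class ArrayLexer {
    constructor(tokens) {
        this.tokens = tokens;
        this.index = 0;
    }

    getNextToken() {
        if (this.index < this.tokens.length) {
            return this.tokens[this.index++];
        }
        // Keep returning EOF once the array is exhausted
        return { type: 'EOF', value: null, line: 0, column: 0 };
    }
}

// Minimal parser with the same constructor/eat()/error() shape as above.
class MiniParser {
    constructor(lexer) {
        this.lexer = lexer;
        this.currentToken = this.lexer.getNextToken();  // One token of lookahead
    }

    error(message) {
        throw new Error(`Parser error at line ${this.currentToken.line}, col ${this.currentToken.column}: ${message}`);
    }

    eat(tokenType) {
        if (this.currentToken.type !== tokenType) {
            this.error(`Expected ${tokenType} but got ${this.currentToken.type}`);
        }
        const token = this.currentToken;
        this.currentToken = this.lexer.getNextToken();
        return token;
    }
}

const parser = new MiniParser(new ArrayLexer([
    { type: 'SHAPE', value: 'shape', line: 1, column: 1 },
    { type: 'IDENTIFIER', value: 'circle', line: 1, column: 7 },
]));

parser.eat('SHAPE');
const shapeType = parser.eat('IDENTIFIER').value;
console.log(shapeType);                 // circle
console.log(parser.currentToken.type);  // EOF
```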

The Main Parse Method

The entry point parses a whole program:

parse() {
    const statements = [];

    while (this.currentToken.type !== 'EOF') {
        statements.push(this.parseStatement());
    }

    return statements;  // Array of AST nodes
}

A program is just a list of statements. We parse each one until we hit End-Of-File.

Parsing Statements

The parseStatement() method dispatches based on the current token:

parseStatement() {
    switch (this.currentToken.type) {
        case 'PARAM':
            return this.parseParam();
        case 'SHAPE':
            return this.parseShape();
        case 'UNION':
        case 'DIFFERENCE':
        case 'INTERSECTION':
            return this.parseBooleanOperation();
        case 'IF':
            return this.parseIfStatement();
        case 'FOR':
            return this.parseForLoop();
        case 'DEF':
            return this.parseFunctionDefinition();
        default:
            this.error(`Unexpected token: ${this.currentToken.type}`);
    }
}

How it works: Look at the current token type, call the appropriate parsing method. Each method knows how to parse its specific construct.

Gotcha: The order in the switch doesn't matter, but make sure you're checking for keywords before falling through to generic cases. If IF is a keyword, it should be handled here, not as an identifier.

Parsing a Shape Statement

Let's parse shape circle c1 { radius: 50, x: 0 }:

parseShape() {
    this.eat('SHAPE');                    // Consume 'shape' keyword

    const shapeType = this.currentToken.value;
    this.eat('IDENTIFIER');               // Consume shape type ('circle')

    const name = this.currentToken.value;
    this.eat('IDENTIFIER');               // Consume shape name ('c1')

    this.eat('LBRACE');                   // Consume '{'

    // Parse properties until we see '}'
    const params = {};
    while (this.currentToken.type !== 'RBRACE') {
        const key = this.currentToken.value;
        this.eat('IDENTIFIER');           // Property name ('radius')
        this.eat('COLON');                // Consume ':'
        const value = this.parseExpression();  // Parse the value
        params[key] = value;

        // Optional comma (we allow trailing commas)
        if (this.currentToken.type === 'COMMA') {
            this.eat('COMMA');
        }
    }

    this.eat('RBRACE');                   // Consume '}'

    return {
        type: 'shape',
        shapeType: shapeType,
        name: name,
        params: params
    };
}

Step by step:

Eat the SHAPE token (verify it's there)
Get the shape type (next identifier)
Get the shape name (next identifier)
Eat the opening brace
Parse properties in a loop until we see the closing brace
Each property is key: value
Eat the closing brace
Return the AST node

The loop: We keep parsing properties until we hit RBRACE. This handles zero or more properties. The while loop naturally handles the "zero or more" part of the grammar.

Parsing Expressions

Expressions are where it gets interesting. You need to handle operator precedence. 2 + 3 * 4 should be 2 + (3 * 4), not (2 + 3) * 4.

The trick is to use separate methods for each precedence level:

parseExpression() {
    let node = this.parseTerm();  // Start with terms (higher precedence)

    // Handle + and - (lowest precedence)
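    while (this.currentToken.type === 'PLUS' || this.currentToken.type === 'MINUS') {
        const operatorToken = this.currentToken;
        this.eat(operatorToken.type);
        node = {
            type: 'binary_op',
            // Store the symbol ('+' or '-') because applyBinaryOperator() switches on symbols.
            // This assumes the lexer stores the symbol as the token's value.
            operator: operatorToken.value,
            left: node,
            right: this.parseTerm()  // Right side is also a term
        };
    }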

    return node;
}

parseTerm() {
    let node = this.parseFactor();  // Start with factors (highest precedence)

    // Handle * and / (higher precedence than + and -)
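    while (this.currentToken.type === 'MULTIPLY' || this.currentToken.type === 'DIVIDE') {
        const operatorToken = this.currentToken;
        this.eat(operatorToken.type);
        node = {
            type: 'binary_op',
            // Store the symbol ('*' or '/') because applyBinaryOperator() switches on symbols.
            // This assumes the lexer stores the symbol as the token's value.
            operator: operatorToken.value,
            left: node,
            right: this.parseFactor()  // Right side is also a factor
        };
    }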

    return node;
}

parseFactor() {
    const token = this.currentToken;

    // Numbers
    if (token.type === 'NUMBER') {
        this.eat('NUMBER');
        return { type: 'number', value: token.value };
    }

    // Identifiers (parameters, shape references)
    if (token.type === 'IDENTIFIER') {
        this.eat('IDENTIFIER');
        return { type: 'identifier', value: token.value };
    }

    // Strings
    if (token.type === 'STRING') {
        this.eat('STRING');
        return { type: 'string', value: token.value };
    }

    // Parentheses
    if (token.type === 'LPAREN') {
        this.eat('LPAREN');
        const expr = this.parseExpression();
        this.eat('RPAREN');
        return expr;
    }

    // Unary minus (negative numbers)
    if (token.type === 'MINUS') {
        this.eat('MINUS');
        return {
            type: 'unary_op',
            operator: 'minus',
            operand: this.parseFactor()
        };
    }

    this.error(`Unexpected token in expression: ${token.type}`);
}

How precedence works:

parseExpression() calls parseTerm() for each operand, so multiplication and division are grouped before + or - is applied
parseTerm() calls parseFactor() for each operand, so numbers, identifiers, and parenthesized expressions are grouped before * or / is applied
When you see 2 + 3 * 4:
- parseExpression() calls parseTerm() for the left side, which returns 2 and stops at the +
- parseExpression() eats the +, then calls parseTerm() again for the right side
- That second parseTerm() sees the *, so it calls parseFactor() for the left (3) and right (4) and returns 3 * 4 as one node
- Result: 2 + (3 * 4) - correct!

The while loops: They handle left-associativity. 10 - 5 - 2 becomes (10 - 5) - 2, not 10 - (5 - 2).
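
Both properties (precedence and left-associativity) can be checked with a self-contained sketch. It uses the same three-level structure but reads from a plain token array and only supports numbers. The parseTokens and show helpers are hypothetical, and storing the operator as a symbol is an assumption that matches the AST examples in this guide.

```javascript
// Self-contained sketch of the three precedence levels.
function parseTokens(tokens) {
    let pos = 0;
    const peek = () => tokens[pos] || { type: 'EOF' };
    const next = () => tokens[pos++];

    function parseFactor() {
        const token = next();
        if (!token || token.type !== 'NUMBER') {
            throw new Error('Expected NUMBER');
        }
        return { type: 'number', value: token.value };
    }

    function parseTerm() {
        let node = parseFactor();
        while (peek().type === 'MULTIPLY' || peek().type === 'DIVIDE') {
            const operator = next().value;
            node = { type: 'binary_op', operator, left: node, right: parseFactor() };
        }
        return node;
    }

    function parseExpression() {
        let node = parseTerm();
        while (peek().type === 'PLUS' || peek().type === 'MINUS') {
            const operator = next().value;
            node = { type: 'binary_op', operator, left: node, right: parseTerm() };
        }
        return node;
    }

    return parseExpression();
}

// Render the tree with explicit parentheses so the grouping is visible
function show(node) {
    if (node.type === 'number') return String(node.value);
    return `(${show(node.left)} ${node.operator} ${show(node.right)})`;
}

const num = value => ({ type: 'NUMBER', value });
const op = (type, value) => ({ type, value });

console.log(show(parseTokens([num(2), op('PLUS', '+'), num(3), op('MULTIPLY', '*'), num(4)])));
// (2 + (3 * 4))
console.log(show(parseTokens([num(10), op('MINUS', '-'), num(5), op('MINUS', '-'), num(2)])));
// ((10 - 5) - 2)
```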

Parsing Parameters

Parameters are simple: param size 100

parseParam() {
    this.eat('PARAM');

    const name = this.currentToken.value;
    this.eat('IDENTIFIER');

    const value = this.parseExpression();

    return {
        type: 'param',
        name: name,
        value: value
    };
}

The value can be any expression - a number, a calculation, a function call, etc.

Parsing If Statements

If statements: if condition { ... } else { ... }

parseIfStatement() {
    this.eat('IF');

    const condition = this.parseCondition();  // Parse the condition

    this.eat('LBRACE');
    const thenBody = [];
    while (this.currentToken.type !== 'RBRACE') {
        thenBody.push(this.parseStatement());
    }
    this.eat('RBRACE');

    let elseBody = null;
    if (this.currentToken.type === 'ELSE') {
        this.eat('ELSE');
        this.eat('LBRACE');
        elseBody = [];
        while (this.currentToken.type !== 'RBRACE') {
            elseBody.push(this.parseStatement());
        }
        this.eat('RBRACE');
    }

    return {
        type: 'if_statement',
        condition: condition,
        thenBody: thenBody,
        elseBody: elseBody
    };
}

The else clause: It's optional. Check if the next token is ELSE, and only parse it if it is.

Parsing the body: The body is a list of statements. We parse them in a loop until we hit the closing brace.

Parsing For Loops

For loops: for i from 0 to 10 step 2 { ... }

parseForLoop() {
    this.eat('FOR');

    const variable = this.currentToken.value;
    this.eat('IDENTIFIER');

    this.eat('FROM');
    const from = this.parseExpression();

    this.eat('TO');
    const to = this.parseExpression();

    let step = null;
    if (this.currentToken.type === 'STEP') {
        this.eat('STEP');
        step = this.parseExpression();
    }

    this.eat('LBRACE');
    const body = [];
    while (this.currentToken.type !== 'RBRACE') {
        body.push(this.parseStatement());
    }
    this.eat('RBRACE');

    return {
        type: 'for_loop',
        variable: variable,
        from: from,
        to: to,
        step: step,
        body: body
    };
}

The step is optional: If there's no STEP keyword, we use null and the interpreter defaults to 1.

Parsing Boolean Operations

Boolean operations: union u1 { add c1, add r1 }

parseBooleanOperation() {
    const operation = this.currentToken.type.toLowerCase();  // 'union', 'difference', etc.
    this.eat(this.currentToken.type);

    const name = this.currentToken.value;
    this.eat('IDENTIFIER');

    this.eat('LBRACE');
    const shapes = [];
    while (this.currentToken.type !== 'RBRACE') {
        if (this.currentToken.type === 'ADD' || this.currentToken.type === 'SUBTRACT') {
            const op = this.currentToken.type.toLowerCase();
            this.eat(this.currentToken.type);
            const shapeRef = this.parseExpression();  // Could be identifier or expression
            shapes.push({ op: op, shape: shapeRef });

            // Optional comma between entries, as in: add c1, add r1
            if (this.currentToken.type === 'COMMA') {
                this.eat('COMMA');
            }
        } else {
            this.error(`Expected 'add' or 'subtract' in boolean operation`);
        }
    }
    this.eat('RBRACE');

    return {
        type: 'boolean_operation',
        operation: operation,
        name: name,
        shapes: shapes
    };
}

The shapes array: Each entry has an operation (add or subtract) and a shape reference. The shape reference is usually an identifier, but we parse it as an expression to be flexible.

Error Recovery

When parsing fails, you want helpful errors:

error(message) {
    const line = this.currentToken ? this.currentToken.line : 'unknown';
    const column = this.currentToken ? this.currentToken.column : 'unknown';
    const tokenType = this.currentToken ? this.currentToken.type : 'EOF';
    throw new Error(`Parser error at line ${line}, col ${column}: ${message}. Got token: ${tokenType}`);
}

Include context: Tell the user what token you expected and what you got. This makes debugging much easier.

Common Issues

Parser stops early:

Check that you're eating all required tokens
Verify you're not skipping tokens accidentally

Wrong precedence:

Make sure parseExpression() calls parseTerm(), which calls parseFactor()
Check that operators are handled at the right level

Infinite loops:

Make sure loops have exit conditions
Verify you're consuming tokens in loops (not just checking them)

AST structure wrong:

Check that you're returning objects with type fields
Verify nested structures match what the interpreter expects

The parser is the most complex part. Get the expression parsing right, and the rest follows naturally.

Statement Parsing

The parseStatement() method is a big switch statement. It looks at the current token type and calls the appropriate parsing method:

PARAM → parseParam()
SHAPE → parseShape()
UNION/DIFFERENCE/INTERSECTION → parseBooleanOperation()
IF → parseIfStatement()
etc.

Gotcha: The switch only works if the lexer has already turned if into an IF token. If a keyword comes through as a generic IDENTIFIER, it falls into the default case and you'll get a confusing "Unexpected token" error.

Shape Parsing

Shapes follow the pattern: shape <type> <name> { <properties> }

The parser:

Eats the SHAPE token
Gets the shape type (identifier like "circle")
Gets the shape name (another identifier)
Eats the {
Parses properties until it sees }
Eats the }

Properties are key: value pairs. The key is an identifier, then a colon, then an expression (which can be a number, string, parameter reference, math expression, etc.).

Expression Parsing

This is where it gets interesting. Expressions need to respect operator precedence - 2 + 3 * 4 should be 2 + (3 * 4), not (2 + 3) * 4.

The parser handles this with separate methods:

parseExpression() handles + and -
parseTerm() handles * and /
parseFactor() handles the base cases (numbers, identifiers, parentheses, etc.)

The trick is that parseExpression() calls parseTerm(), which calls parseFactor(). This creates the right precedence automatically.

How it works: When you see 2 + 3 * 4, parseExpression() sees the +, so it:

Takes what's on the left (2, from parseTerm())
Eats the +
Takes what's on the right (3 * 4, from parseTerm() which handles the *)

So the * gets grouped before the +, which is what we want.

AST Node Structure

Different node types have different structures, but they all have a type field. Here are the common ones:

Shape node:

{
    type: 'shape',
    shapeType: 'circle',
    name: 'c1',
    params: { radius: { type: 'number', value: 50 } }
}

Parameter node:

{
    type: 'param',
    name: 'size',
    value: { type: 'number', value: 100 }
}

Binary operation:

{
    type: 'binary_op',
    operator: '+',
    left: { type: 'number', value: 10 },
    right: { type: 'number', value: 20 }
}

If statement:

{
    type: 'if_statement',
    condition: { /* expression */ },
    thenBody: [ /* statements */ ],
    elseBody: [ /* statements */ ]  // null when there is no else clause
}

The structure is pretty straightforward - it mirrors the code structure.

Adding New Syntax

To add a new language construct:

Add the keyword to the lexer (if it's a keyword)
Add a parsing method like parseMyNewThing()
Add a case to parseStatement() that calls it
Make sure it returns an AST node with a type field
Add interpreter support (see below)

The parser is pretty modular - adding new constructs is usually straightforward.

Building the Interpreter From Scratch

The interpreter is where things actually happen. It walks through the AST and creates shapes, sets parameters, runs loops, etc. Let's build it step by step.

The Environment

The interpreter needs somewhere to store runtime state. That's the Environment:

export class Environment {
    constructor() {
        this.parameters = new Map();  // Parameter name → value
        this.shapes = new Map();      // Shape name → shape object
        this.layers = new Map();      // Layer name → layer object
        this.functions = new Map();   // Function name → function definition
    }

    setParameter(name, value) {
        this.parameters.set(name, value);
    }

    getParameter(name) {
        if (!this.parameters.has(name)) {
            throw new Error(`Parameter not found: ${name}`);
        }
        return this.parameters.get(name);
    }

    createShapeWithName(type, name, params) {
        const shape = {
            type: type,
            shapeType: type,
            params: params,
            transform: {
                position: params.position || [0, 0],
                rotation: params.rotation || 0,
                scale: [1, 1]
            }
        };
        this.shapes.set(name, shape);
        return shape;
    }
}

Why Maps? Fast lookups. When you reference param.size, we need to find it quickly. Maps are O(1) lookup.

The shape structure: Shapes have type, params, and transform. The renderer uses this structure. Don't change it without updating the renderer.

Basic Interpreter Structure

Start with a class that holds the environment:

export class Interpreter {
    constructor() {
        this.env = new Environment();
        this.functions = new Map();
        this.constraints = [];
        this.currentLoopCounter = undefined;  // For loop name mangling
        this.currentFunctionContext = null;   // For function name mangling
    }

    interpret(ast) {
        let result = null;
        for (const node of ast) {
            result = this.evaluateNode(node);
        }
        return {
            parameters: this.env.parameters,
            shapes: this.env.shapes,
            layers: this.env.layers,
            functions: this.functions,
            constraints: this.constraints,
            result: result
        };
    }
}

The main loop: Walk through each AST node, evaluate it, return everything at the end. The result object contains all the runtime state that other systems need.

Evaluating Nodes

The evaluateNode() method dispatches to specific evaluators:

evaluateNode(node) {
    // Handle name mangling for loops
    if (node.type === 'shape' && this.currentLoopCounter !== undefined) {
        node = {
            ...node,
            name: `${node.name}_${this.currentLoopCounter}`
        };
    }

    switch (node.type) {
        case 'param':
            return this.evaluateParam(node);
        case 'shape':
            return this.evaluateShape(node);
        case 'boolean_operation':
            return this.evaluateBooleanOperation(node);
        case 'if_statement':
            return this.evaluateIfStatement(node);
        case 'for_loop':
            return this.evaluateForLoop(node);
        case 'function_definition':
            return this.evaluateFunctionDefinition(node);
        case 'function_call':
            return this.evaluateFunctionCall(node);
        default:
            throw new Error(`Unknown node type: ${node.type}`);
    }
}

Name mangling: If we're in a loop, we append the loop counter to shape names. This prevents name collisions when the same shape name is used in multiple loop iterations.

Evaluating Parameters

Parameters are simple: store the value in the environment.

evaluateParam(node) {
    const value = this.evaluateExpression(node.value);
    this.env.setParameter(node.name, value);
    return value;
}

The value is an expression: It could be 100, or 50 + 50, or param.otherParam * 2. We evaluate it first, then store the result.

Evaluating Shapes

This is where shapes get created:

evaluateShape(node) {
    // Generate unique name
    // Loop mangling was already applied in evaluateNode(), so only the
    // function-call context is added here. Appending the loop counter a
    // second time would produce names like c1_0_0 instead of c1_0.
    let shapeName = node.name;
    if (this.currentFunctionContext) {
        shapeName = `${shapeName}_${this.currentFunctionContext.name}_${this.currentFunctionContext.callId}`;
    }

    // Evaluate all parameter expressions
    const params = {};
    for (const [key, expr] of Object.entries(node.params)) {
        const evaluatedValue = this.evaluateExpression(expr);
        params[key] = this.processShapeParameter(key, evaluatedValue);
    }

    // Apply defaults and process special parameters
    this.processShapeFillParameters(node.shapeType, params);

    // Create the shape
    const shape = this.env.createShapeWithName(node.shapeType, shapeName, params);
    return shape;
}

Step by step:

Generate a unique name (handles loops/functions)
Evaluate each parameter expression
Process and validate parameters
Apply shape-specific defaults
Create the shape object
Store it in the environment
Return it

Parameter evaluation: Each property value is an expression. radius: 50 is easy, but radius: param.size * 2 needs evaluation. We evaluate each one.

Evaluating Expressions

Expressions can be literals, identifiers, binary operations, function calls, etc.:

evaluateExpression(node) {
    switch (node.type) {
        case 'number':
            return node.value;

        case 'string':
            return node.value;

        case 'identifier':
            // Check if it's a parameter
            if (this.env.parameters.has(node.value)) {
                return this.env.getParameter(node.value);
            }
            // Check if it's a shape reference
            if (this.env.shapes.has(node.value)) {
                return node.value;  // Return name as string for boolean ops
            }
            throw new Error(`Undefined identifier: ${node.value}`);

        case 'binary_op':
            const left = this.evaluateExpression(node.left);
            const right = this.evaluateExpression(node.right);
            return this.applyBinaryOperator(node.operator, left, right);

        case 'unary_op':
            const operand = this.evaluateExpression(node.operand);
            if (node.operator === 'minus') {
                return -operand;
            }
            throw new Error(`Unknown unary operator: ${node.operator}`);

        case 'array':
            return node.elements.map(el => this.evaluateExpression(el));

        case 'function_call':
            return this.evaluateFunctionCall(node);

        default:
            throw new Error(`Unknown expression type: ${node.type}`);
    }
}

Recursive evaluation: Expressions can contain other expressions. 10 + param.size has a binary operation with a number and an identifier. We evaluate recursively.

Identifier lookup: First check parameters, then shapes. If neither, it's an error. This means you can't have a parameter and shape with the same name (the parameter wins).
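
Recursive evaluation and the parameter-first lookup can be sketched without the full interpreter. In this sketch the evaluate function is hypothetical, the parameters and shapes maps stand in for the Environment, and only + and * are supported to keep it short.

```javascript
// Self-contained sketch of recursive expression evaluation.
function evaluate(node, parameters, shapes) {
    switch (node.type) {
        case 'number':
            return node.value;
        case 'identifier':
            // Parameters are checked first, so they win over shapes
            if (parameters.has(node.value)) return parameters.get(node.value);
            if (shapes.has(node.value)) return node.value;  // Shape references stay as names
            throw new Error(`Undefined identifier: ${node.value}`);
        case 'binary_op': {
            // Each side may itself be a nested expression
            const left = evaluate(node.left, parameters, shapes);
            const right = evaluate(node.right, parameters, shapes);
            if (node.operator === '+') return left + right;
            if (node.operator === '*') return left * right;
            throw new Error(`Unknown operator: ${node.operator}`);
        }
        default:
            throw new Error(`Unknown expression type: ${node.type}`);
    }
}

const parameters = new Map([['size', 40], ['c1', 7]]);
const shapes = new Map([['c1', {}], ['r1', {}]]);

// AST for: 10 + size * 2
const ast = {
    type: 'binary_op', operator: '+',
    left: { type: 'number', value: 10 },
    right: {
        type: 'binary_op', operator: '*',
        left: { type: 'identifier', value: 'size' },
        right: { type: 'number', value: 2 },
    },
};

console.log(evaluate(ast, parameters, shapes));                                  // 90
console.log(evaluate({ type: 'identifier', value: 'c1' }, parameters, shapes));  // 7 (the parameter wins)
console.log(evaluate({ type: 'identifier', value: 'r1' }, parameters, shapes));  // r1
```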

Binary Operators

Apply operators to evaluated operands:

applyBinaryOperator(op, left, right) {
    switch (op) {
        case '+': return left + right;
        case '-': return left - right;
        case '*': return left * right;
        case '/': 
            if (right === 0) throw new Error('Division by zero');
            return left / right;
        case '%': return left % right;
        case '==': return left === right;
        case '!=': return left !== right;
        case '<': return left < right;
        case '<=': return left <= right;
        case '>': return left > right;
        case '>=': return left >= right;
        case 'and': return left && right;
        case 'or': return left || right;
        default:
            throw new Error(`Unknown operator: ${op}`);
    }
}

Type coercion: JavaScript does this automatically. "5" + 3 becomes "53" (string concatenation), "5" * 3 becomes 15 (numeric multiplication). This is usually what you want, but be aware of it.

Division by zero: Check for this explicitly. JavaScript returns Infinity, but that's probably not what you want.
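
The behaviors described above are easy to verify in plain JavaScript. The strictDivide function that follows is a hypothetical, stricter variant for comparison; it is not what applyBinaryOperator() does, since that method only checks for a zero divisor.

```javascript
// Plain JavaScript behavior, shown for reference
console.log('5' + 3);  // 53 (a string, from concatenation)
console.log('5' * 3);  // 15 (a number)
console.log(10 / 0);   // Infinity
console.log(0 / 0);    // NaN

// Hypothetical stricter divide: rejects non-numbers and zero divisors
function strictDivide(left, right) {
    if (typeof left !== 'number' || typeof right !== 'number') {
        throw new Error(`Cannot divide ${typeof left} by ${typeof right}`);
    }
    if (right === 0) {
        throw new Error('Division by zero');
    }
    return left / right;
}

console.log(strictDivide(10, 4));  // 2.5
```

Whether to reject strings outright is a language design decision. The stricter check trades convenience for earlier, clearer errors.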

Evaluating If Statements

If statements evaluate conditionally:

evaluateIfStatement(node) {
    const condition = this.evaluateExpression(node.condition);

    if (condition) {
        // Evaluate then body
        for (const stmt of node.thenBody) {
            this.evaluateNode(stmt);
        }
    } else if (node.elseBody) {
        // Evaluate else body
        for (const stmt of node.elseBody) {
            this.evaluateNode(stmt);
        }
    }

    return null;  // If statements don't return values
}

The condition: Can be any expression that evaluates to a truthy/falsy value. param.size > 50, true, param.enabled and param.visible, etc.

The bodies: Arrays of statements. We evaluate each one in order.

Evaluating For Loops

For loops create multiple shapes:

evaluateForLoop(node) {
    const from = this.evaluateExpression(node.from);
    const to = this.evaluateExpression(node.to);
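    const step = node.step ? this.evaluateExpression(node.step) : 1;

    // A zero or negative step can never move the counter past `to`,
    // so reject it instead of risking a loop that never ends
    if (step <= 0) {
        throw new Error('For loop step must be greater than 0');
    }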

    // Set loop counter for name mangling
    this.currentLoopCounter = from;

    while (this.currentLoopCounter <= to) {
        // Evaluate loop body
        for (const stmt of node.body) {
            this.evaluateNode(stmt);
        }

        this.currentLoopCounter += step;
    }

    // Clear loop counter
    this.currentLoopCounter = undefined;
}

How it works:

Evaluate the range (from, to, step)
Set currentLoopCounter to the starting value
Loop while counter <= to
Evaluate the body (shapes created here get the counter appended to their names)
Increment counter
Clear the counter when done

Name mangling: When evaluateNode() sees a shape and currentLoopCounter is set, it appends the current counter value to the name. So shape circle c1 in a loop from 0 to 2 becomes c1_0, c1_1, c1_2.

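Gotcha: Nested loops share the single currentLoopCounter property. With the code above, an inner loop overwrites the counter and then resets it to undefined when it finishes, so the outer loop's increment produces NaN and the outer loop stops after its first iteration. Shapes with the same name in both loops will also collide. This is a known limitation.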
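One way around the nested-loop limitation is to keep a stack of counters instead of a single property. This is a sketch under assumptions, not how the interpreter currently works: LoopNames, runLoop, and mangle are hypothetical names, and the c1_0_1 naming scheme is made up for illustration.

```javascript
// Hypothetical sketch: a stack of loop counters instead of a single property.
class LoopNames {
    constructor() {
        this.counters = [];   // innermost loop counter is last
    }

    // Run `body` once per counter value. A non-positive step is rejected
    // because the counter would never pass `to`.
    runLoop(from, to, step, body) {
        if (step <= 0) throw new Error('Loop step must be positive');
        this.counters.push(from);
        try {
            for (let i = from; i <= to; i += step) {
                this.counters[this.counters.length - 1] = i;
                body();
            }
        } finally {
            this.counters.pop();   // the outer loop's counter is visible again
        }
    }

    // Append every active counter, outermost first: c1 -> c1_0_1
    mangle(name) {
        return [name, ...this.counters].join('_');
    }
}

const loops = new LoopNames();
const names = [];
loops.runLoop(0, 1, 1, () => {
    loops.runLoop(0, 1, 1, () => names.push(loops.mangle('c1')));
});
console.log(names);   // [ 'c1_0_0', 'c1_0_1', 'c1_1_0', 'c1_1_1' ]
```

Because each loop pushes and pops its own entry, finishing an inner loop no longer disturbs the outer counter.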

Evaluating Boolean Operations

Boolean operations combine shapes:

evaluateBooleanOperation(node) {
    // Get shape names from the AST
    const shapeNames = node.shapes.map(s => {
        if (typeof s === 'string') {
            return s;
        } else if (s.shape) {
            // It's an object with op and shape
            return this.evaluateExpression(s.shape);
        } else {
            return this.evaluateExpression(s);
        }
    });

    // Get actual shape objects
    const shapes = shapeNames.map(name => {
        if (!this.env.shapes.has(name)) {
            throw new Error(`Shape not found: ${name}`);
        }
        return this.env.shapes.get(name);
    });

    // Perform the boolean operation (uses ClipperLib)
    const result = this.booleanOperator.perform(
        node.operation,
        shapes
    );

    // Mark original shapes as consumed
    shapes.forEach(shape => {
        shape._consumedByBoolean = true;
    });

    // Create result shape
    const resultShape = {
        type: node.operation,
        shapeType: node.operation,
        params: {
            operation: node.operation,
            shapes: shapeNames
        },
        _consumedByBoolean: false
    };

    // Store result
    this.env.shapes.set(node.name, resultShape);
    return resultShape;
}

Shape references: The AST might have shape names as strings, or as identifier expressions. We evaluate them to get the actual names.

Boolean operations: Use ClipperLib (loaded from CDN) to compute the actual geometry. This is expensive - polygon clipping is complex.

Marking consumed: Original shapes are marked _consumedByBoolean = true. The renderer skips them. We don't delete them because other code might reference them.
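To make the consumed flag concrete, here is a minimal sketch of how a renderer might skip consumed shapes. visibleShapes is a hypothetical helper; the only thing taken from the text above is the _consumedByBoolean property and the fact that shapes live in a Map keyed by name.

```javascript
// Hypothetical helper: collect the names of shapes a renderer should draw.
function visibleShapes(shapes) {
    const result = [];
    for (const [name, shape] of shapes) {
        if (shape._consumedByBoolean) continue;   // input to a boolean operation
        result.push(name);
    }
    return result;
}

const shapes = new Map([
    ['a', { type: 'circle', _consumedByBoolean: true }],
    ['b', { type: 'rectangle', _consumedByBoolean: true }],
    ['u', { type: 'union', _consumedByBoolean: false }],
    ['c', { type: 'circle' }]   // never used in a boolean operation
]);
console.log(visibleShapes(shapes));   // [ 'u', 'c' ]
```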

Evaluating Functions

Functions are stored separately from the environment:

evaluateFunctionDefinition(node) {
    this.functions.set(node.name, {
        parameters: node.parameters,
        body: node.body
    });
    return null;  // Function definitions don't return values
}

evaluateFunctionCall(node) {
    const func = this.functions.get(node.name);
    if (!func) {
        throw new Error(`Function not found: ${node.name}`);
    }

    // Save current parameter map
    const oldParams = new Map(this.env.parameters);

    // Bind arguments to parameters
    for (let i = 0; i < func.parameters.length; i++) {
        const argValue = this.evaluateExpression(node.arguments[i]);
        this.env.setParameter(func.parameters[i], argValue);
    }

    // Execute function body
    let result = null;
    this.currentFunctionContext = {
        name: node.name,
        callId: this.functionCallCounters.get(node.name) || 0
    };
    this.functionCallCounters.set(node.name, (this.functionCallCounters.get(node.name) || 0) + 1);

    for (const stmt of func.body) {
        result = this.evaluateNode(stmt);
        if (this.currentReturn !== null) {
            result = this.currentReturn;
            this.currentReturn = null;
            break;
        }
    }

    // Restore parameter map
    this.env.parameters = oldParams;
    this.currentFunctionContext = null;

    return result;
}

Scope management: Functions create a new scope. Parameters defined in the function don't leak out. Parameters from outside are still accessible (unless shadowed by function parameters).

Name mangling: Function calls also do name mangling. If a function creates a shape named c1 and you call it twice, you get c1_myFunc_0 and c1_myFunc_1. This prevents collisions.

Return values: Functions can return values. The return statement sets this.currentReturn, which we check after each statement.
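The save/bind/restore pattern above can be sketched in isolation. This version is an illustration rather than the interpreter's code: callWithScope is a hypothetical helper, and it adds two things the code above does not have, an argument count check and a try/finally so the caller's parameters are restored even when the body throws.

```javascript
// Hypothetical sketch: restore the caller's parameters even if the body throws.
function callWithScope(env, paramNames, argValues, body) {
    if (argValues.length !== paramNames.length) {
        throw new Error(`Expected ${paramNames.length} arguments, got ${argValues.length}`);
    }
    const saved = new Map(env.parameters);     // copy of the caller's scope
    try {
        paramNames.forEach((name, i) => env.parameters.set(name, argValues[i]));
        return body(env);
    } finally {
        env.parameters = saved;                // runs on return and on throw
    }
}

const env = { parameters: new Map([['size', 100]]) };
const area = callWithScope(env, ['w', 'h'], [3, 4], (e) =>
    e.parameters.get('w') * e.parameters.get('h') + e.parameters.get('size'));
console.log(area);                       // 112
console.log(env.parameters.has('w'));    // false
```

The body can read size from the outer scope, while w and h disappear once the call returns.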

Processing Shape Parameters

Shape parameters need special handling:

processShapeParameter(key, value) {
    // Handle position arrays
    if (key === 'position' && Array.isArray(value)) {
        return value;
    }

    // Handle color names
    if ((key === 'color' || key === 'fillColor' || key === 'strokeColor') && typeof value === 'string') {
        return this.resolveColorName(value);
    }

    // Everything else is passed through
    return value;
}

resolveColorName(colorName) {
    const colorMap = {
        'red': '#FF0000',
        'green': '#008000',
        'blue': '#0000FF',
        // ... etc
    };
    return colorMap[colorName.toLowerCase()] || colorName;
}

Color resolution: Named colors like "red" need to be converted to hex. We do this here.

Position arrays: Position is [x, y]. The code above passes any array through unchanged; it does not check that the array holds exactly two numbers, so add that check if you need it.
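If you want position values checked, a small validator could look like the following. This is a sketch: validatePosition is a hypothetical helper and the error wording is made up.

```javascript
// Hypothetical validator: accept only [x, y] with two finite numbers.
function validatePosition(value) {
    const ok = Array.isArray(value)
        && value.length === 2
        && value.every(n => typeof n === 'number' && Number.isFinite(n));
    if (!ok) {
        throw new Error(`position must be [x, y] with two finite numbers, got ${JSON.stringify(value)}`);
    }
    return value;
}

console.log(validatePosition([10, 20]));   // [ 10, 20 ]
```

processShapeParameter() could call such a helper in its position branch instead of returning the array directly.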

What Gets Returned

At the end, the interpreter returns everything:

return {
    parameters: this.env.parameters,  // All parameters
    shapes: this.env.shapes,           // All shapes (this is what renderer uses)
    layers: this.env.layers,           // All layers
    functions: this.functions,         // Function definitions
    constraints: this.constraints,     // Constraint definitions
    result: result                     // Last evaluated value
};

The renderer uses result.shapes: This is a Map of shape name → shape object. The renderer iterates over it and draws each shape.

Logging the returned object should show the parameters and shapes. If shapes aren't created, check:

Are you evaluating expressions correctly?
Are shapes being stored in the environment?
Are parameter lookups working?

Common Issues

Shapes not created:

Check that evaluateShape() is being called
Verify shapes are stored in env.shapes
Make sure shape names are unique

Parameters not found:

Check that parameters are stored before they're used
Verify parameter lookup in evaluateExpression()
Make sure parameter names match

Wrong values:

Check expression evaluation
Verify operator application
Make sure type coercion is working as expected

Name collisions:

Check name mangling in loops/functions
Verify shape names are unique
Make sure you're not overwriting shapes accidentally

The interpreter is the execution engine. Get this right, and your language works. The lexer and parser just prepare the data - the interpreter actually does things.

Shape Objects

Shapes are just JavaScript objects. They look like:

{
    type: 'circle',
    shapeType: 'circle',  // Sometimes both, for compatibility
    params: {
        radius: 50,
        x: 0,
        y: 0,
        fill: true,
        color: '#FF0000'
    },
    transform: {
        position: [0, 0],
        rotation: 0,
        scale: [1, 1]
    }
}

The renderer uses these objects to draw things. The interpreter's job is to create them.

The Main Loop

interpret(ast) {
    let result = null;
    for (const node of ast) {
        result = this.evaluateNode(node);
    }
    return {
        parameters: this.env.parameters,
        shapes: this.env.shapes,
        // ... other stuff
    };
}

Simple: walk through each AST node, evaluate it, return everything at the end.

Evaluating Nodes

evaluateNode() is a big switch statement that dispatches to specific evaluators:

evaluateNode(node) {
    switch (node.type) {
        case 'param':
            return this.evaluateParam(node);
        case 'shape':
            return this.evaluateShape(node);
        case 'boolean_operation':
            return this.evaluateBooleanOperation(node);
        // ... etc
    }
}

Each evaluator knows how to handle its specific node type.

Parameter Evaluation

When you see param size 100:

Evaluate the value expression (in this case, just 100)
Store it in env.parameters with key "size"

Later, when the interpreter sees param.size in an expression, it looks it up in the parameters map.

Gotcha: Parameters are evaluated eagerly. If you do param x 10 + 5, the value stored is 15, not the expression 10 + 5.

Shape Evaluation

This is where shapes get created:

Generate a unique name (handles loops/functions - more on that later)
Evaluate all the parameter expressions
Process and validate the parameters
Create the shape object
Store it in env.shapes

Important: Shape names need to be unique. If you're in a loop, the same shape name gets used multiple times, so we append the loop counter: c1_0, c1_1, etc.

Expression Evaluation

Expressions can be:

Literals (numbers, strings)
Identifiers (parameters, shape references)
Binary operations (+, -, *, /, etc.)
Function calls
Arrays

The evaluator recursively evaluates sub-expressions. For 10 + param.size, it:

Evaluates 10 → 10
Evaluates param.size → looks up "size" in parameters → 100
Applies + operator → 110
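The recursive walk above can be sketched on a tiny AST. The node shapes here ('number', 'param_ref', 'binary_op') are assumptions made for the example; the real parser may use different type names and fields.

```javascript
// Hypothetical node shapes; the real AST may use different type names.
function evaluateExpression(node, parameters) {
    switch (node.type) {
        case 'number':
            return node.value;
        case 'param_ref':
            if (!parameters.has(node.name)) {
                throw new Error(`Parameter not found: ${node.name}`);
            }
            return parameters.get(node.name);
        case 'binary_op': {
            // Evaluate both sides first, then apply the operator
            const left = evaluateExpression(node.left, parameters);
            const right = evaluateExpression(node.right, parameters);
            switch (node.operator) {
                case '+': return left + right;
                case '-': return left - right;
                case '*': return left * right;
                default: throw new Error(`Unknown operator: ${node.operator}`);
            }
        }
        default:
            throw new Error(`Unknown expression type: ${node.type}`);
    }
}

// AST for: 10 + param.size
const ast = {
    type: 'binary_op',
    operator: '+',
    left: { type: 'number', value: 10 },
    right: { type: 'param_ref', name: 'size' }
};
console.log(evaluateExpression(ast, new Map([['size', 100]])));   // 110
```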

Gotcha: Shape references in expressions return the shape name as a string, not the shape object. This is for boolean operations - you reference shapes by name.

Boolean Operations

Boolean operations (union, difference, intersection) are interesting:

Get the shape names from the AST
Look up the actual shape objects
Call the boolean operator (uses Vatti clipping algorithm)
Mark the original shapes as _consumedByBoolean = true (so they don't render)
Create a new result shape
Store it

The result shape has type set to the operation name ("union", "difference", etc.) and the renderer knows how to handle it.

Important: The original shapes are still in the map, but they're marked as consumed. The renderer skips them. This is simpler than deleting them, which would break references.

Control Flow

If statements: Evaluate the condition, if true evaluate the then body, else evaluate the else body (if it exists). Pretty standard.

For loops: This is where the name mangling happens. When you're in a loop, this.currentLoopCounter is set. When creating shapes, the name gets the counter appended: c1_0, c1_1, etc.

Gotcha: The loop counter is a property on the interpreter instance. If you have nested loops, the inner one overwrites the outer one. This is a known limitation - nested loops with the same shape names will conflict.

Functions

Functions are stored in this.functions (separate from the environment). When you call a function:

Save the current parameter map
Bind the arguments to parameter names
Execute the function body
Restore the old parameter map

This creates a new scope. Parameters defined in the function don't leak out, and parameters from outside are still accessible (unless shadowed).

Gotcha: Function calls also do name mangling. If a function creates a shape named c1 and you call it twice, you get c1_myFunc_0 and c1_myFunc_1. This prevents name collisions.

What Gets Returned

At the end, the interpreter returns an object with:

parameters - all the parameters
shapes - all the shapes (this is what the renderer uses)
layers - all the layers
functions - function definitions
constraints - constraint definitions
result - the last evaluated value

The renderer takes result.shapes and draws them.

Common Gotchas

Name Collisions

Shape names need to be unique. The interpreter handles this with name mangling in loops and functions, but if you manually create shapes with the same name, the later one overwrites the earlier one. No error is thrown - it just silently overwrites.

Parameter Lookup

When evaluating an identifier, the interpreter checks:

Is it a parameter? → return its value
Is it a shape? → return its name (as string)
Otherwise → error

This means a parameter and a shape can't usefully share a name: the identifier always resolves to the parameter, so the shape can no longer be referenced by name in expressions.
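The lookup order can be written down in a few lines. This is a sketch with a hypothetical resolveIdentifier function and a plain object standing in for the environment.

```javascript
// Hypothetical sketch of the lookup order: parameters first, then shapes.
function resolveIdentifier(name, env) {
    if (env.parameters.has(name)) {
        return env.parameters.get(name);   // parameter value
    }
    if (env.shapes.has(name)) {
        return name;                       // shape reference, returned as its name
    }
    throw new Error(`Undefined identifier: ${name}`);
}

const env = {
    parameters: new Map([['size', 100], ['c1', 5]]),
    shapes: new Map([['c1', { type: 'circle' }], ['r1', { type: 'rectangle' }]])
};
console.log(resolveIdentifier('size', env));   // 100
console.log(resolveIdentifier('r1', env));     // r1
console.log(resolveIdentifier('c1', env));     // 5 (the parameter shadows the shape)
```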

Expression Evaluation Order

Expressions are evaluated left-to-right, but operator precedence is respected. So 2 + 3 * 4 is 2 + (3 * 4) = 14, not (2 + 3) * 4 = 20.

Boolean Operations

After a boolean operation, the original shapes are still in the map but marked as consumed. Don't try to use them in another boolean operation - use the result shape instead.

Error Messages

All errors should include line and column numbers. The lexer tracks this, the parser passes it through, and the interpreter should preserve it. If you're adding new code, make sure errors are helpful.

How to Add Features

Adding a New Shape Type

The lexer and parser already handle shape <type> generically, so you usually don't need to change them. You need to:

Add shape creation logic (usually in Shapes.mjs or the interpreter)
Add rendering support (in the renderer)

The shape type is just a string - "circle", "rectangle", etc. The interpreter and renderer need to know what to do with it.

Adding a New Operator

Lexer: Add character handling (if it's a single char) or a parsing method (if it's multi-char)
Parser: Add to expression parsing with the right precedence
Interpreter: Add evaluation logic in applyBinaryOperator() or wherever makes sense

For example, to add ^ for exponentiation:

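Lexer: handle ^ character → POWER token
Parser: add to parseFactor() or create a new precedence level (exponentiation usually binds tighter than * and / and is right-associative, so 2 ^ 3 ^ 2 means 2 ^ (3 ^ 2))
Interpreter: case '^': return Math.pow(left, right) (match whatever operator string your parser stores in the AST node)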
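As an illustration of the parser and interpreter steps for an exponent operator, here is a small sketch that parses a flat token list. Everything in it is assumed for the example: the token shape { type, value }, the POWER token name, the node shapes, and the parsePower and evaluate functions. It only handles numbers and the power operator.

```javascript
// Hypothetical sketch: right-associative exponentiation over a token array.
function parsePower(tokens, pos) {
    const base = tokens[pos];
    if (!base || base.type !== 'NUMBER') {
        throw new Error('Expected a number');
    }
    let node = { type: 'number', value: base.value };
    let next = pos + 1;

    if (tokens[next] && tokens[next].type === 'POWER') {
        // Recurse on the right-hand side so 2 ^ 3 ^ 2 parses as 2 ^ (3 ^ 2)
        const right = parsePower(tokens, next + 1);
        node = { type: 'binary_op', operator: '^', left: node, right: right.node };
        next = right.next;
    }
    return { node, next };
}

function evaluate(node) {
    if (node.type === 'number') return node.value;
    return Math.pow(evaluate(node.left), evaluate(node.right));
}

// Tokens for: 2 ^ 3 ^ 2
const tokens = [
    { type: 'NUMBER', value: 2 },
    { type: 'POWER', value: '^' },
    { type: 'NUMBER', value: 3 },
    { type: 'POWER', value: '^' },
    { type: 'NUMBER', value: 2 }
];
console.log(evaluate(parsePower(tokens, 0).node));   // 512
```

A left-associative parse would give (2 ^ 3) ^ 2 = 64 instead, which is why the recursion is on the right operand.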

Adding a New Control Structure

Lexer: Add keyword
Parser: Add parseMyNewThing() method
Parser: Add case to parseStatement()
Interpreter: Add case to evaluateNode() and implement the logic

For example, to add while loops:

Lexer: 'while': 'WHILE'
Parser: parseWhileStatement() that parses condition and body
Interpreter: evaluateWhileStatement() that loops while condition is true
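The interpreter side of a while loop could look roughly like this. It is a sketch under assumptions: the node fields (condition, body) are hypothetical, the evaluator callbacks are passed in so the example runs on its own, and the iteration cap is an added safety net rather than something the text above requires.

```javascript
// Hypothetical sketch of a while-loop evaluator with an iteration cap.
function evaluateWhileStatement(node, evaluateExpression, evaluateNode, maxIterations = 10000) {
    let iterations = 0;
    while (evaluateExpression(node.condition)) {
        if (++iterations > maxIterations) {
            throw new Error(`while loop exceeded ${maxIterations} iterations`);
        }
        for (const stmt of node.body) {
            evaluateNode(stmt);
        }
    }
    return null;   // like if statements, loops don't produce a value
}

// Stand-ins for real AST nodes and evaluators
let counter = 0;
evaluateWhileStatement(
    { condition: 'counter < 3', body: ['increment'] },
    () => counter < 3,
    () => { counter++; }
);
console.log(counter);   // 3
```

Unlike the for loop, a while loop has no built-in counter, so shape names created in its body would need some other source of uniqueness.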

How to Build the Language System - Complete Step-by-Step Guide

This section provides a complete, step-by-step guide for building the entire language system (Lexer, Parser, Interpreter) from scratch.

Prerequisites

Before building the language system, you need:

Basic JavaScript knowledge
Understanding of tokenization, parsing, and interpretation concepts
A text editor and browser for testing

Part 1: Building the Lexer

Step 1.1: Create the Token Class

File: src/lexer.mjs

What You're Building: The Token class represents a single token in the source code. Every piece of code (keywords, identifiers, numbers, operators) becomes a Token object. This class stores the token's type, value, and position information.

Why This Class Exists: Tokens are the building blocks of parsing. Instead of working with raw characters, the lexer converts characters into tokens, which are easier for the parser to work with. The Token class provides a structured way to represent these tokens with all necessary information.

Understanding Each Property:

type: The category of token (e.g., 'IDENTIFIER', 'NUMBER', 'SHAPE', 'LBRACE'). This tells the parser what kind of token it is.
value: The actual content of the token (e.g., "circle" for an identifier, 50 for a number, "{" for a brace). This is the data the parser needs.
line: The line number where this token appears in the source code. Critical for error messages - users need to know where errors occurred.
column: The column number where this token starts. Also critical for precise error reporting.

Why Store Position: When the parser encounters an error, it needs to tell the user exactly where the problem is. "Error at line 5, column 12" is much more helpful than "Error somewhere in your code". The line and column information comes from the lexer's position tracking.

How to Build It:

Step 1.1.1: Create the Class Structure Start by creating a class that will hold all token information:

export class Token {
  constructor(type, value, line, column) {
    this.type = type;      // Token type: 'IDENTIFIER', 'NUMBER', etc.
    this.value = value;    // Token value: actual string/number
    this.line = line;      // Line number (for error messages)
    this.column = column;  // Column number (for error messages)
  }

Step 1.1.2: Add toString Method (Optional but Helpful) Add a method to convert the token to a string for debugging:

  toString() {
    return `Token(${this.type}, ${this.value}, ${this.line}:${this.column})`;
  }
}

Why toString(): This method is helpful for debugging. When you log a token, you'll see a readable representation like "Token(NUMBER, 50, 1:10)" instead of "[object Object]".

The Complete Token Class:

export class Token {
  constructor(type, value, line, column) {
    this.type = type;      // Token type: 'IDENTIFIER', 'NUMBER', etc.
    this.value = value;    // Token value: actual string/number
    this.line = line;      // Line number (for error messages)
    this.column = column;  // Column number (for error messages)
  }

  toString() {
    return `Token(${this.type}, ${this.value}, ${this.line}:${this.column})`;
  }
}

Building This Step by Step:

Create a new file src/lexer.mjs
Export a class called Token
Add constructor with four parameters: type, value, line, column
Store each parameter as an instance property (this.type, this.value, etc.)
Add optional toString() method for debugging
This class will be used throughout the lexer to create token objects

Test:

const token = new Token('NUMBER', 50, 1, 10);
console.log(token.toString()); // Token(NUMBER, 50, 1:10)

Step 1.2: Create the Basic Lexer Structure

What You're Building: The Lexer class is the foundation of the tokenization system. It reads characters from the source code one at a time and converts them into tokens. This step creates the basic structure with position tracking and helper methods.

Why This Structure: The lexer needs to track its position in the source code, know what character it's currently looking at, and be able to move forward. It also needs helper methods to peek ahead, advance position, and report errors with precise location information.

How to Build It Step by Step:

Step 1.2.1: Create the Lexer Class and Constructor Start with the class definition and constructor that initializes all tracking variables:

export class Lexer {
  constructor(input) {
    // Step 1.2.1.1: Store the input string
    // This is the entire source code that needs to be tokenized
    this.input = input;

    // Step 1.2.1.2: Initialize position tracking
    // Position is a zero-based index into the input string
    // We start at position 0 (first character)
    this.position = 0;

    // Step 1.2.1.3: Initialize line and column tracking
    // Line and column start at 1 (human-readable, not zero-based)
    // These are used for error messages
    this.line = 1;
    this.column = 1;

    // Step 1.2.1.4: Get the current character
    // If input is empty, currentChar will be null
    // Otherwise, it's the character at position 0
    this.currentChar = this.input[0] || null;
  }

Why These Properties:

input: Stores the entire source code. The lexer needs to read through this character by character.
position: Zero-based index tracking where we are in the string. Used to access characters via input[position].
line and column: Human-readable position (starting at 1). Essential for error messages that users can understand.
currentChar: The character we're currently examining. null means we've reached the end of input.

Step 1.2.2: Implement the advance() Method This method moves the lexer forward by one character:

  advance() {
    // Step 1.2.2.1: Move position forward
    // Increment the position index to point to the next character
    this.position++;

    // Step 1.2.2.2: Check if we've reached the end
    // If position is beyond the input length, we're at end of file
    if (this.position >= this.input.length) {
      this.currentChar = null;  // Signal end of input
    } else {
      // Step 1.2.2.3: Update current character
      // Get the character at the new position
      this.currentChar = this.input[this.position];

      // Step 1.2.2.4: Update column number
      // Moving forward horizontally increases the column
      this.column++;
    }
  }

Why This Method: Every time the lexer consumes a character, it needs to move forward. This method handles that movement and updates all tracking variables. It's called constantly throughout tokenization.

Important Note About Line Tracking: Notice that advance() doesn't update line. That's because line increments happen in skipWhitespace() when a newline character is encountered. This separation keeps the logic clear.

Step 1.2.3: Implement the peek() Method This method looks ahead at the next character without consuming it:

  peek() {
    // Step 1.2.3.1: Check if there's a next character
    // If position + 1 is beyond input length, there's nothing ahead
    if (this.position + 1 >= this.input.length) {
      return null;  // No next character
    }

    // Step 1.2.3.2: Return the next character
    // Return the character at position + 1 without moving position
    return this.input[this.position + 1];
  }

Why This Method: Sometimes you need to look ahead to decide what to do. For example, a single = could be assignment, while == is equality. By peeking ahead, you can check if the next character is also = before deciding which token to create. The key is that peek() doesn't call advance(), so it doesn't consume the character.
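Here is a small sketch of that lookahead decision. MiniLexer is a cut-down stand-in for the Lexer class, and readEquals with its EQUALS and ASSIGN return values are hypothetical names chosen for the example.

```javascript
// Minimal stand-in for the Lexer, with only what this example needs.
class MiniLexer {
  constructor(input) {
    this.input = input;
    this.position = 0;
    this.currentChar = input.length > 0 ? input[0] : null;
  }

  advance() {
    this.position++;
    this.currentChar = this.position < this.input.length
      ? this.input[this.position]
      : null;
  }

  peek() {
    const next = this.position + 1;
    return next < this.input.length ? this.input[next] : null;
  }

  // Hypothetical method: call when currentChar is '='
  readEquals() {
    if (this.peek() === '=') {
      this.advance();   // consume first '='
      this.advance();   // consume second '='
      return 'EQUALS';
    }
    this.advance();     // consume the single '='
    return 'ASSIGN';
  }
}

console.log(new MiniLexer('==').readEquals());   // EQUALS
console.log(new MiniLexer('=1').readEquals());   // ASSIGN
```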

Step 1.2.4: Implement the error() Method This method reports errors with precise location information:

  error(message) {
    // Step 1.2.4.1: Throw error with position information
    // Include line and column so users know exactly where the problem is
    throw new Error(`Lexer error at line ${this.line}, col ${this.column}: ${message}`);
  }
}

Why This Method: When the lexer encounters something it can't handle (like an unexpected character), it needs to report an error. Including line and column information makes debugging much easier. Users can go directly to the problematic location in their code.

The Complete Basic Structure:

export class Lexer {
  constructor(input) {
    this.input = input;           // Source code string
    this.position = 0;             // Current character position
    this.line = 1;                 // Current line number
    this.column = 1;               // Current column number
    this.currentChar = this.input[0] || null;  // Current character
  }

  advance() {
    this.position++;
    if (this.position >= this.input.length) {
      this.currentChar = null;  // End of input
    } else {
      this.currentChar = this.input[this.position];
      this.column++;
    }
  }

  peek() {
    // Look ahead one character without consuming it
    if (this.position + 1 >= this.input.length) {
      return null;
    }
    return this.input[this.position + 1];
  }

  error(message) {
    throw new Error(`Lexer error at line ${this.line}, col ${this.column}: ${message}`);
  }
}

Building This Step by Step:

Add the Lexer class to src/lexer.mjs (same file as Token class)
Create constructor that takes input parameter
Initialize input property with the source code string
Initialize position to 0 (start at beginning)
Initialize line to 1 (first line)
Initialize column to 1 (first column)
Initialize currentChar to first character (or null if empty)
Create advance() method that moves position forward
Update currentChar when advancing
Update column when advancing (but not line - that's handled elsewhere)
Set currentChar to null when reaching end of input
Create peek() method that returns next character without consuming
Return null if no next character exists
Create error() method that throws error with line/column information
This basic structure provides the foundation for all tokenization

Test:

const lexer = new Lexer('hello');
console.log(lexer.currentChar); // 'h'
lexer.advance();
console.log(lexer.currentChar); // 'e'

Step 1.3: Implement Whitespace and Comment Skipping

What You're Building: Methods to skip over whitespace characters and comments. These don't produce tokens - they're just ignored during tokenization. However, they're important for tracking line numbers correctly.

Why These Methods: Whitespace and comments are not meaningful tokens - they're just formatting. The lexer needs to skip over them without creating tokens. However, newlines in whitespace are important because they affect line number tracking.

How to Build It Step by Step:

Step 1.3.1: Implement skipWhitespace() Method This method consumes all consecutive whitespace characters:

skipWhitespace() {
  // Step 1.3.1.1: Loop while current character is whitespace
  // /\s/ matches any whitespace: space, tab, newline, etc.
  while (this.currentChar && /\s/.test(this.currentChar)) {
    // Step 1.3.1.2: Check for newline character
    // Newlines are special - they increment the line number
    if (this.currentChar === '\n') {
      this.line++;      // Move to next line
      this.column = 0;  // advance() below bumps this to 1 (start of new line)
    }

    // Step 1.3.1.3: Advance past the whitespace character
    // This consumes the character and moves to the next one
    this.advance();
  }
}

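Why Handle Newlines Separately: When a newline is encountered, we need to increment the line number and restart the column count so that the first character of the next line is column 1. Because the advance() call that consumes the newline also increments column, the reset has to account for that: set column to 0 before calling advance(), or set it to 1 after. Resetting to 1 before advance() would leave the first character of every later line at column 2.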

Step 1.3.2: Implement skipComment() Method This method consumes single-line comments (starting with //):

skipComment() {
  // Step 1.3.2.1: Skip the first forward slash
  // We know currentChar is '/' and peek() shows another '/'
  // So we consume the first one
  this.advance();  // Skip first /

  // Step 1.3.2.2: Skip the second forward slash
  // Now we're at the second '/', consume it too
  this.advance();  // Skip second /

  // Step 1.3.2.3: Skip all characters until newline
  // Comments continue until the end of the line
  // We loop until we hit a newline or end of file
  while (this.currentChar !== null && this.currentChar !== '\n') {
    this.advance();  // Skip until newline
  }
  // Note: The newline itself is NOT consumed here
  // It will be handled by skipWhitespace() if called next
}

Why This Approach: Comments start with // and continue until the end of the line. We consume both slashes, then skip all characters until we hit a newline. The newline itself is not consumed - it will be handled by skipWhitespace() if that's called next, which ensures line tracking works correctly.

The Complete Methods:

skipWhitespace() {
  while (this.currentChar && /\s/.test(this.currentChar)) {
    if (this.currentChar === '\n') {
      this.line++;
      this.column = 0;  // advance() below bumps this to 1 on the new line
    }
    this.advance();
  }
}

skipComment() {
  // Skip // comments
  this.advance();  // Skip first /
  this.advance();  // Skip second /
  while (this.currentChar !== null && this.currentChar !== '\n') {
    this.advance();  // Skip until newline
  }
}

Building This Step by Step:

Create skipWhitespace() method in the Lexer class
Add while loop that continues while current character is whitespace
Check if current character is newline
If newline, increment line and reset column so the next line starts at column 1
Call advance() to consume the whitespace character
Create skipComment() method
Call advance() twice to skip both forward slashes
Add while loop that continues until newline or end of file
Call advance() to skip each comment character
These methods ensure whitespace and comments don't create tokens
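The two methods are usually called together in a loop, since a comment can be followed by more whitespace and then another comment. The sketch below shows one way to combine them and checks the resulting line and column. MiniLexer and skipIgnored are hypothetical names; this cut-down advance() always increments column, and the newline branch resets column to 0 so that the following advance() lands on column 1.

```javascript
// Minimal stand-in for the Lexer, focused on skipping and position tracking.
class MiniLexer {
  constructor(input) {
    this.input = input;
    this.position = 0;
    this.line = 1;
    this.column = 1;
    this.currentChar = input.length > 0 ? input[0] : null;
  }

  advance() {
    this.position++;
    this.column++;
    this.currentChar = this.position < this.input.length
      ? this.input[this.position]
      : null;
  }

  peek() {
    const next = this.position + 1;
    return next < this.input.length ? this.input[next] : null;
  }

  skipWhitespace() {
    while (this.currentChar !== null && /\s/.test(this.currentChar)) {
      if (this.currentChar === '\n') {
        this.line++;
        this.column = 0;   // advance() below moves this to 1
      }
      this.advance();
    }
  }

  skipComment() {
    // Consume everything up to, but not including, the newline
    while (this.currentChar !== null && this.currentChar !== '\n') {
      this.advance();
    }
  }

  // Hypothetical helper: skip any mix of whitespace and // comments
  skipIgnored() {
    for (;;) {
      this.skipWhitespace();
      if (this.currentChar === '/' && this.peek() === '/') {
        this.skipComment();
      } else {
        return;
      }
    }
  }
}

const lexer = new MiniLexer('  // note\n  b');
lexer.skipIgnored();
console.log(lexer.currentChar, lexer.line, lexer.column);   // b 2 3
```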

Step 1.4: Implement Number Reading

What You're Building: A method that reads numeric literals from the source code. This handles both integers (like 50) and floating-point numbers (like 3.14). The method accumulates digits, optionally handles a decimal point, and converts the string to a number.

Why This Method: Numbers in source code are sequences of digits, possibly with a decimal point. The lexer needs to recognize these sequences and convert them into NUMBER tokens. This method handles the reading and conversion process.

How to Build It Step by Step:

Step 1.4.1: Initialize Result String and Read Integer Part Start by reading all consecutive digits:

number() {
  // Remember where the token starts
  // Token positions should point at the first character of the token,
  // so capture line and column before consuming anything
  const startLine = this.line;
  const startColumn = this.column;

  // Step 1.4.1.1: Initialize empty string to accumulate digits
  // We'll build the number string character by character
  let result = '';

  // Step 1.4.1.2: Read all consecutive digits
  // /\d/ matches any digit (0-9)
  // Continue reading as long as we have digits
  while (this.currentChar && /\d/.test(this.currentChar)) {
    result += this.currentChar;  // Add digit to result string
    this.advance();              // Move to next character
  }

Why Build String First: We accumulate digits into a string first, then convert to a number at the end. This is simpler than trying to build the number mathematically, and handles both integers and decimals uniformly.

Step 1.4.2: Handle Decimal Point (Optional) Check if there's a decimal point and read fractional digits:

  // Step 1.4.2.1: Check for decimal point
  // If the current character is '.', we have a floating-point number
  if (this.currentChar === '.') {
    result += '.';      // Add decimal point to result
    this.advance();     // Move past the decimal point

    // Step 1.4.2.2: Read fractional digits
    // After the decimal point, read all consecutive digits
    while (this.currentChar && /\d/.test(this.currentChar)) {
      result += this.currentChar;  // Add digit to result
      this.advance();              // Move to next character
    }
  }

Why Two-Stage Reading: We read the integer part first, then check for a decimal point. If there's a decimal point, we read the fractional part. This two-stage approach correctly handles both integers (no decimal point) and floats (with decimal point).

Step 1.4.3: Convert to Number and Return Token Convert the accumulated string to a number and create a token:

  // Step 1.4.3.1: Convert string to number
  // parseFloat() converts the string representation to an actual number
  // This handles both integers and floating-point numbers
  const numValue = parseFloat(result);

  // Step 1.4.3.2: Create and return NUMBER token
  // The token contains the numeric value, not the string,
  // and the position where the number started (not where it ended)
  return new Token('NUMBER', numValue, startLine, startColumn);
}

Why parseFloat(): parseFloat() converts the string to an actual JavaScript number. This ensures the token's value is a number type, not a string. The parser and interpreter can then use it in mathematical operations.

Important Note About Negative Numbers: This method doesn't handle negative numbers. -50 would be tokenized as two tokens: MINUS and NUMBER(50). The parser handles the negation. This keeps the lexer simple and follows common language design patterns.
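To show what handling negation in the parser can look like, here is a minimal sketch. The token shape { type, value }, the 'unary_op' node, and the parseUnary and evaluate functions are all assumptions for the example, not the project's parser.

```javascript
// Hypothetical sketch: the parser turns MINUS NUMBER into a unary node.
function parseUnary(tokens, pos) {
  const token = tokens[pos];
  if (token && token.type === 'MINUS') {
    // Recurse so that repeated minus signs also work
    const operand = parseUnary(tokens, pos + 1);
    return {
      node: { type: 'unary_op', operator: '-', operand: operand.node },
      next: operand.next
    };
  }
  if (token && token.type === 'NUMBER') {
    return { node: { type: 'number', value: token.value }, next: pos + 1 };
  }
  throw new Error('Expected a number or unary minus');
}

function evaluate(node) {
  return node.type === 'number' ? node.value : -evaluate(node.operand);
}

// Tokens the lexer would produce for: -50
const tokens = [
  { type: 'MINUS', value: '-' },
  { type: 'NUMBER', value: 50 }
];
console.log(evaluate(parseUnary(tokens, 0).node));   // -50
```

Keeping negation out of the lexer also avoids ambiguity in input like 10 -5, where the minus is a binary operator rather than part of the number.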

The Complete Method:

number() {
  // Remember where the token starts so its position points at the first digit
  const startLine = this.line;
  const startColumn = this.column;
  let result = '';

  // Read digits
  while (this.currentChar && /\d/.test(this.currentChar)) {
    result += this.currentChar;
    this.advance();
  }

  // Check for decimal point
  if (this.currentChar === '.') {
    result += '.';
    this.advance();

    // Read fractional digits
    while (this.currentChar && /\d/.test(this.currentChar)) {
      result += this.currentChar;
      this.advance();
    }
  }

  // Convert to number
  const numValue = parseFloat(result);
  return new Token('NUMBER', numValue, startLine, startColumn);
}

Building This Step by Step:

Create number() method in the Lexer class
Initialize empty result string
Add while loop to read consecutive digits
Append each digit to result and advance
Check if current character is decimal point
If yes, append decimal point and advance
Add while loop to read fractional digits
Append each fractional digit and advance
Convert result string to number using parseFloat()
Create and return NUMBER token with numeric value
This method correctly handles both integers and floating-point numbers

Test:

const lexer = new Lexer('123 45.67');
const token1 = lexer.number(); // NUMBER, 123
lexer.skipWhitespace();
const token2 = lexer.number(); // NUMBER, 45.67

Step 1.5: Implement Identifier and Keyword Reading

What You're Building: A method that reads identifiers (variable names, shape names) and keywords (reserved words like shape, param, if). Identifiers can contain letters, numbers, and underscores. Keywords are special identifiers that have meaning in the language.

Why This Method: Identifiers and keywords both start with a letter or underscore. The lexer reads the entire sequence first, then checks if it's a keyword. If it's a keyword, it returns a keyword token. Otherwise, it returns an IDENTIFIER token.

How to Build It Step by Step:

Step 1.5.1: Read Identifier Characters Start by reading all characters that can be part of an identifier:

identifier() {
  // Step 1.5.1.1: Initialize empty string to accumulate characters
  let result = '';

  // Step 1.5.1.2: Read identifier characters
  // Identifiers can contain: letters (a-z, A-Z), digits (0-9), underscores (_)
  // Continue reading as long as we have valid identifier characters
  while (this.currentChar && /[a-zA-Z0-9_]/.test(this.currentChar)) {
    result += this.currentChar;  // Add character to result
    this.advance();              // Move to next character
  }

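Why This Pattern: The regex /[a-zA-Z0-9_]/ matches any letter (uppercase or lowercase), any digit, or an underscore, which is the identifier character set used by many programming languages. Digits are allowed here because this loop also reads the characters after the first one; the first character is already known to be a letter or underscore, since getNextToken() only calls identifier() in that case. The loop continues until we hit a character that can't be part of an identifier (like a space, operator, or punctuation).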

Step 1.5.2: Check if It's a Keyword After reading the identifier, check if it matches a keyword:

  // Step 1.5.2.1: Define keywords object
  // This maps keyword strings (lowercase) to their token types
  const keywords = {
    'shape': 'SHAPE',
    'param': 'PARAM',
    'if': 'IF',
    'else': 'ELSE',
    'for': 'FOR',
    'from': 'FROM',
    'to': 'TO',
    'step': 'STEP',
    'union': 'UNION',
    'difference': 'DIFFERENCE',
    'intersection': 'INTERSECTION',
    'add': 'ADD',
    'subtract': 'SUBTRACT',
    'true': 'TRUE',
    'false': 'FALSE',
    'and': 'AND',
    'or': 'OR',
    'not': 'NOT'
  };

Why Keywords Object: This object maps keyword strings to their token types. When we read an identifier, we check if it matches a keyword. If it does, we return a keyword token. Otherwise, we return an IDENTIFIER token.

Step 1.5.3: Determine Token Type and Return Check if the identifier is a keyword, then create the appropriate token:

  // Step 1.5.3.1: Convert to lowercase for comparison
  // Keywords are case-insensitive, so we compare in lowercase
  const lowerResult = result.toLowerCase();

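  // Step 1.5.3.2: Check if it's a keyword
  // Look the word up as an own property of the keywords object, so that
  // identifiers such as "constructor" or "__proto__" are not mistaken for
  // members inherited from Object.prototype.
  // Otherwise, it's a regular identifier
  const tokenType = Object.hasOwn(keywords, lowerResult)
    ? keywords[lowerResult]
    : 'IDENTIFIER';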

  // Step 1.5.3.3: Create and return token
  // Use the original result (preserving case) as the value
  // This allows identifiers to be case-sensitive while keywords are case-insensitive
  return new Token(tokenType, result, this.line, this.column);
}

Why Case-Insensitive Keywords: Keywords are case-insensitive (shape, Shape, SHAPE all mean the same thing), but identifiers are case-sensitive (circle and Circle are different). We convert to lowercase for keyword comparison, but preserve the original case in the token value.
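The lookup rule can be tried in isolation. The following is a small sketch, not the tutorial's identifier() method: classifyWord and the shortened KEYWORDS table are hypothetical names used only here. It uses a Map rather than a plain object, which sidesteps accidental matches against inherited properties such as constructor.

```javascript
// Hypothetical helper: classify a word as a keyword or an identifier.
// A Map is used so that words such as "constructor" are not confused with
// properties inherited from Object.prototype.
const KEYWORDS = new Map([
  ['shape', 'SHAPE'],
  ['param', 'PARAM'],
  ['if', 'IF'],
]);

function classifyWord(word) {
  // Compare in lowercase, but keep the original spelling as the value.
  const type = KEYWORDS.get(word.toLowerCase()) ?? 'IDENTIFIER';
  return { type, value: word };
}

console.log(classifyWord('SHAPE'));       // { type: 'SHAPE', value: 'SHAPE' }
console.log(classifyWord('Circle'));      // { type: 'IDENTIFIER', value: 'Circle' }
console.log(classifyWord('constructor')); // { type: 'IDENTIFIER', value: 'constructor' }
```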

The Complete Method:

identifier() {
  let result = '';

  // Read identifier characters
  while (this.currentChar && /[a-zA-Z0-9_]/.test(this.currentChar)) {
    result += this.currentChar;
    this.advance();
  }

  // Check if it's a keyword
  const keywords = {
    'shape': 'SHAPE',
    'param': 'PARAM',
    'if': 'IF',
    'else': 'ELSE',
    'for': 'FOR',
    'from': 'FROM',
    'to': 'TO',
    'step': 'STEP',
    'union': 'UNION',
    'difference': 'DIFFERENCE',
    'intersection': 'INTERSECTION',
    'add': 'ADD',
    'subtract': 'SUBTRACT',
    'true': 'TRUE',
    'false': 'FALSE',
    'and': 'AND',
    'or': 'OR',
    'not': 'NOT'
  };

  const lowerResult = result.toLowerCase();
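  // Own-property check, so words like "constructor" stay identifiers
  const tokenType = Object.hasOwn(keywords, lowerResult)
    ? keywords[lowerResult]
    : 'IDENTIFIER';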

  return new Token(tokenType, result, this.line, this.column);
}

Building This Step by Step:

Create identifier() method in the Lexer class
Initialize empty result string
Add while loop to read identifier characters (letters, digits, underscores)
Append each character to result and advance
Define keywords object mapping strings to token types
Convert result to lowercase for keyword comparison
Check if lowercase result is in keywords object
If yes, use keyword token type; otherwise use 'IDENTIFIER'
Create and return token with determined type and original case value
This method correctly distinguishes keywords from identifiers

Step 1.6: Implement String Reading

What You're Building: A method that reads string literals enclosed in double quotes. This includes handling escape sequences like \n (newline), \t (tab), \" (quote), and \\ (backslash). The method extracts the string content and converts escape sequences to their actual characters.

Why This Method: Strings in source code are sequences of characters between quotes. The lexer needs to extract the string content, handle escape sequences, and create a STRING token. This is more complex than reading numbers or identifiers because of escape sequences.

How to Build It Step by Step:

Step 1.6.1: Initialize and Skip Opening Quote Start by skipping the opening double quote:

parseString() {
  // Step 1.6.1.1: Initialize empty string to accumulate characters
  let result = '';

  // Step 1.6.1.2: Skip opening quote
  // We know currentChar is '"' (that's how we got here)
  // Advance past it to start reading the string content
  this.advance();  // Skip opening quote

Why Skip Opening Quote: The opening quote is just a delimiter - it's not part of the string content. We skip it immediately so we can start reading the actual string characters.

Step 1.6.2: Read String Content with Escape Sequence Handling Loop through characters until we find the closing quote:

  // Step 1.6.2.1: Loop until closing quote or end of file
  // Continue reading as long as we haven't hit the closing quote
  while (this.currentChar !== null && this.currentChar !== '"') {
    // Step 1.6.2.2: Check for escape sequence
    // If current character is backslash, it's an escape sequence
    if (this.currentChar === '\\') {
      // Step 1.6.2.3: Skip the backslash
      this.advance();  // Skip backslash

      // Step 1.6.2.4: Handle escape sequences
      // The character after backslash determines what to escape to
      if (this.currentChar === 'n') {
        result += '\n';  // Newline character
      } else if (this.currentChar === 't') {
        result += '\t';  // Tab character
      } else if (this.currentChar === '"') {
        result += '"';   // Literal quote
      } else if (this.currentChar === '\\') {
        result += '\\';  // Literal backslash
      } else {
        // Step 1.6.2.5: Unknown escape sequence
        // If we don't recognize it, use the character as-is
        // This is lenient - some lexers would error here
        result += this.currentChar;
      }
      this.advance();  // Move past the escape character
    } else {
      // Step 1.6.2.6: Regular character (not escaped)
      // Just add it to the result string
      result += this.currentChar;
      this.advance();
    }
  }

Why Handle Escape Sequences: Escape sequences allow users to include special characters in strings. \n becomes a newline, \" becomes a literal quote (so you can have quotes inside strings), etc. Without escape sequences, you couldn't have quotes or newlines in strings.
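The same escape rules can be written as a lookup table. The sketch below is a hypothetical standalone function, unescapeString, that mirrors the branches in parseString() but works on a plain string instead of the lexer's currentChar; it is meant as an illustration of the rules, not a replacement for the method.

```javascript
// Hypothetical standalone function mirroring the escape handling in parseString().
// `raw` is the text between the quotes, exactly as it appears in the source.
const ESCAPES = new Map([
  ['n', '\n'],
  ['t', '\t'],
  ['"', '"'],
  ['\\', '\\'],
]);

function unescapeString(raw) {
  let result = '';
  for (let i = 0; i < raw.length; i++) {
    const ch = raw[i];
    if (ch === '\\' && i + 1 < raw.length) {
      const next = raw[i + 1];
      // Known escapes map to their character; unknown ones are kept as-is.
      result += ESCAPES.has(next) ? ESCAPES.get(next) : next;
      i++; // Skip the character that followed the backslash.
    } else {
      result += ch;
    }
  }
  return result;
}

// The JavaScript literal 'line1\\nline2' holds a backslash followed by "n".
console.log(JSON.stringify(unescapeString('line1\\nline2'))); // "line1\nline2"
```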

Step 1.6.3: Validate Closing Quote and Return Token Check that we found a closing quote (not end of file):

  // Step 1.6.3.1: Check if we found closing quote
  // If currentChar is '"', we successfully found the end
  if (this.currentChar === '"') {
    this.advance();  // Skip closing quote
  } else {
    // Step 1.6.3.2: Error - string never closed
    // If we reached end of file without finding closing quote, it's an error
    this.error('Unterminated string literal');
  }

  // Step 1.6.3.3: Create and return STRING token
  // The result string contains the fully parsed string (with escape sequences resolved)
  return new Token('STRING', result, this.line, this.column);
}

Why Validate Closing Quote: If we reach the end of the file without finding a closing quote, the string is unterminated - a syntax error. We need to report this error so the user knows their code is malformed.

The Complete Method:

parseString() {
  let result = '';
  this.advance();  // Skip opening quote

  while (this.currentChar !== null && this.currentChar !== '"') {
    if (this.currentChar === '\\') {
      // Escape sequence
      this.advance();  // Skip backslash
      if (this.currentChar === 'n') {
        result += '\n';
      } else if (this.currentChar === 't') {
        result += '\t';
      } else if (this.currentChar === '"') {
        result += '"';
      } else if (this.currentChar === '\\') {
        result += '\\';
      } else {
        result += this.currentChar;  // Unknown escape, use as-is
      }
      this.advance();
    } else {
      result += this.currentChar;
      this.advance();
    }
  }

  if (this.currentChar === '"') {
    this.advance();  // Skip closing quote
  } else {
    this.error('Unterminated string literal');
  }

  return new Token('STRING', result, this.line, this.column);
}

Building This Step by Step:

Create parseString() method in the Lexer class
Initialize empty result string
Call advance() to skip opening quote
Add while loop that continues until closing quote or end of file
Check if current character is backslash (escape sequence)
If backslash, advance past it and check next character
Handle known escape sequences (\n, \t, \", \\)
For unknown escapes, use character as-is
For regular characters, add to result
Advance after handling each character
After loop, check if we found closing quote
If yes, advance past it; if no, throw error
Create and return STRING token with parsed result
This method correctly handles strings with escape sequences

Step 1.7: Implement the Main Tokenization Loop

What You're Building: The main getNextToken() method that orchestrates the entire tokenization process. This method is called repeatedly to get the next token from the source code. It uses all the helper methods we've built to recognize different token types.

Why This Method: This is the heart of the lexer. Each call skips any whitespace and comments, recognizes what kind of token comes next, and delegates to the appropriate parsing method, returning exactly one token. The order of checks is important - some patterns need to be checked before others (e.g., == before =).

How to Build It Step by Step:

Step 1.7.1: Create the Main Loop Structure Start with a loop that continues until end of file:

getNextToken() {
  // Step 1.7.1.1: Main loop - continue until end of input
  // currentChar is null when we've reached the end
  while (this.currentChar !== null) {

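Why While Loop: Each call to getNextToken() returns exactly one token. The loop exists so that whitespace and comments can be skipped with continue before the next real token is found; as soon as a token is recognized, the method returns. If currentChar becomes null (end of file), the loop exits and the method falls through to return the EOF token.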

Step 1.7.2: Skip Whitespace and Comments First Handle whitespace and comments before checking for tokens:

    // Step 1.7.2.1: Skip whitespace
    // Whitespace doesn't produce tokens, so skip it and continue
    if (/\s/.test(this.currentChar)) {
      this.skipWhitespace();
      continue;  // Skip to next iteration
    }

    // Step 1.7.2.2: Skip comments
    // Comments also don't produce tokens
    // Check for '//' by looking at current char and next char
    if (this.currentChar === '/' && this.peek() === '/') {
      this.skipComment();
      continue;  // Skip to next iteration
    }

Why Check These First: Whitespace and comments don't produce tokens - they're just formatting. We check for them first and skip them immediately using continue. This keeps the main logic clean.

Step 1.7.3: Handle Numbers, Identifiers, and Strings Check for tokens that need special parsing:

    // Step 1.7.3.1: Numbers
    // If current character is a digit, it's the start of a number
    if (/\d/.test(this.currentChar)) {
      return this.number();  // Parse and return NUMBER token
    }

    // Step 1.7.3.2: Identifiers and keywords
    // If current character is a letter or underscore, it's an identifier/keyword
    if (/[a-zA-Z_]/.test(this.currentChar)) {
      return this.identifier();  // Parse and return IDENTIFIER or keyword token
    }

    // Step 1.7.3.3: Strings
    // If current character is a double quote, it's a string literal
    if (this.currentChar === '"') {
      return this.parseString();  // Parse and return STRING token
    }

    // Step 1.7.3.4: Hex colors
    // If current character is '#', it's a hex color
    if (this.currentChar === '#') {
      return this.parseHexColor();  // Parse and return HEXCOLOR token
    }

Why Return Immediately: These methods (number(), identifier(), parseString(), parseHexColor()) handle the entire token parsing and return a complete token. We return immediately because we've found and parsed a token.

Step 1.7.4: Handle Multi-Character Operators Check for operators that need lookahead:

    // Step 1.7.4.1: Check for == before =
    // Order matters! If we check = first, == becomes two ASSIGN tokens
    if (this.currentChar === '=' && this.peek() === '=') {
      this.advance();  // Skip first =
      this.advance();  // Skip second =
      return new Token('EQUALS', '==', this.line, this.column);
    }

    // Step 1.7.4.2: Single = (assignment)
    if (this.currentChar === '=') {
      this.advance();
      return new Token('ASSIGN', '=', this.line, this.column);
    }

Why Check == Before =: If we checked = first, == would be tokenized as two ASSIGN tokens instead of one EQUALS token. By checking == first, we correctly recognize the equality operator.
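The rule generalizes to "try the longest operator first". Here is a minimal sketch of that idea using two lookup tables. Only == and = come from this tutorial; the <=, >=, and != entries and their token names are hypothetical additions for illustration, and matchOperator is not part of the Lexer class.

```javascript
// Hypothetical sketch of "longest match first" for operators.
// Only '==' and '=' come from the tutorial; '<=', '>=', and '!=' are
// illustrative additions and may not exist in the actual language.
const TWO_CHAR_OPERATORS = new Map([
  ['==', 'EQUALS'],
  ['<=', 'LESS_EQUAL'],
  ['>=', 'GREATER_EQUAL'],
  ['!=', 'NOT_EQUAL'],
]);
const ONE_CHAR_OPERATORS = new Map([
  ['=', 'ASSIGN'],
  ['<', 'LESS'],
  ['>', 'GREATER'],
  ['!', 'NOT'],
]);

// Returns the operator token starting at `index`, or null if there is none.
function matchOperator(input, index) {
  // Try the two-character form first so '==' is never split into '=' '='.
  const two = input.slice(index, index + 2);
  if (TWO_CHAR_OPERATORS.has(two)) {
    return { type: TWO_CHAR_OPERATORS.get(two), value: two, length: 2 };
  }
  const one = input[index];
  if (ONE_CHAR_OPERATORS.has(one)) {
    return { type: ONE_CHAR_OPERATORS.get(one), value: one, length: 1 };
  }
  return null;
}

console.log(matchOperator('a == b', 2).type); // EQUALS
console.log(matchOperator('a = b', 2).type);  // ASSIGN
```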

Step 1.7.5: Handle Single-Character Tokens Use a lookup table for simple single-character tokens:

    // Step 1.7.5.1: Single character tokens
    // These are simple - one character, one token type
    const singleCharTokens = {
      '{': 'LBRACE',
      '}': 'RBRACE',
      '(': 'LPAREN',
      ')': 'RPAREN',
      '[': 'LBRACKET',
      ']': 'RBRACKET',
      ',': 'COMMA',
      ':': 'COLON',
      ';': 'SEMICOLON',
      '+': 'PLUS',
      '-': 'MINUS',
      '*': 'MULTIPLY',
      '/': 'DIVIDE',
      '%': 'MODULO',
      '<': 'LESS',
      '>': 'GREATER',
      '!': 'NOT'
    };

    // Step 1.7.5.2: Check if current character is a single-char token
    if (singleCharTokens[this.currentChar]) {
      const tokenType = singleCharTokens[this.currentChar];
      const value = this.currentChar;
      this.advance();  // Consume the character
      return new Token(tokenType, value, this.line, this.column);
    }

Why Lookup Table: A lookup table is cleaner than a long chain of if statements. It makes it easy to add new single-character tokens and keeps the code readable.

Step 1.7.6: Handle Unknown Characters and End of File Report errors for unknown characters and return EOF token:

    // Step 1.7.6.1: Unknown character
    // If we get here, we don't recognize this character
    // This is a syntax error
    this.error(`Unexpected character: ${this.currentChar}`);
  }

  // Step 1.7.6.2: End of file
  // If we exit the loop, we've reached the end of input
  // Return EOF token to signal no more tokens
  return new Token('EOF', null, this.line, this.column);
}

Why EOF Token: The parser needs to know when there are no more tokens. The EOF (End Of File) token signals this. It's returned when the loop exits (meaning currentChar is null).

The Complete Method:

getNextToken() {
  while (this.currentChar !== null) {
    // Skip whitespace
    if (/\s/.test(this.currentChar)) {
      this.skipWhitespace();
      continue;
    }

    // Skip comments
    if (this.currentChar === '/' && this.peek() === '/') {
      this.skipComment();
      continue;
    }

    // Numbers
    if (/\d/.test(this.currentChar)) {
      return this.number();
    }

    // Identifiers and keywords
    if (/[a-zA-Z_]/.test(this.currentChar)) {
      return this.identifier();
    }

    // Strings
    if (this.currentChar === '"') {
      return this.parseString();
    }

    // Hex colors
    if (this.currentChar === '#') {
      return this.parseHexColor();
    }

    // Operators and punctuation
    if (this.currentChar === '=' && this.peek() === '=') {
      this.advance();
      this.advance();
      return new Token('EQUALS', '==', this.line, this.column);
    }

    if (this.currentChar === '=') {
      this.advance();
      return new Token('ASSIGN', '=', this.line, this.column);
    }

    // Single character tokens
    const singleCharTokens = {
      '{': 'LBRACE',
      '}': 'RBRACE',
      '(': 'LPAREN',
      ')': 'RPAREN',
      '[': 'LBRACKET',
      ']': 'RBRACKET',
      ',': 'COMMA',
      ':': 'COLON',
      ';': 'SEMICOLON',
      '+': 'PLUS',
      '-': 'MINUS',
      '*': 'MULTIPLY',
      '/': 'DIVIDE',
      '%': 'MODULO',
      '<': 'LESS',
      '>': 'GREATER',
      '!': 'NOT'
    };

    if (singleCharTokens[this.currentChar]) {
      const tokenType = singleCharTokens[this.currentChar];
      const value = this.currentChar;
      this.advance();
      return new Token(tokenType, value, this.line, this.column);
    }

    // Unknown character
    this.error(`Unexpected character: ${this.currentChar}`);
  }

  // End of file
  return new Token('EOF', null, this.line, this.column);
}

Building This Step by Step:

Create getNextToken() method in the Lexer class
Add while loop that continues until currentChar is null
Check for whitespace first, skip it and continue
Check for comments, skip them and continue
Check for numbers (digits), return number() result
Check for identifiers/keywords (letters/underscore), return identifier() result
Check for strings (double quote), return parseString() result
Check for hex colors (hash), return parseHexColor() result
Check for multi-character operators (== before =)
Create single-character tokens lookup table
Check lookup table for single-character tokens
If found, create token and return
If unknown character, throw error
After loop, return EOF token
This method orchestrates the entire tokenization process

Test the complete lexer:

const code = 'shape circle c1 { radius: 50 }';
const lexer = new Lexer(code);
let token = lexer.getNextToken();
while (token.type !== 'EOF') {
  console.log(token);
  token = lexer.getNextToken();
}
// Should output: SHAPE, IDENTIFIER(circle), IDENTIFIER(c1), LBRACE, 
//                IDENTIFIER(radius), COLON, NUMBER(50), RBRACE
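If you want the tokens as an array (for example, to compare them in a test), the loop above can be wrapped in a helper. This is a sketch under the assumption that the lexer exposes getNextToken() and ends with an EOF token, as built in this part; collectTokens and makeStubLexer are hypothetical names, and the stub stands in for the real Lexer so the example runs on its own.

```javascript
// Hypothetical helper: drain any object that exposes getNextToken()
// into an array, stopping at (and excluding) the EOF token.
function collectTokens(lexer) {
  const tokens = [];
  let token = lexer.getNextToken();
  while (token.type !== 'EOF') {
    tokens.push(token);
    token = lexer.getNextToken();
  }
  return tokens;
}

// Stand-in for the real Lexer so the sketch runs on its own:
// it replays a fixed list and then returns EOF forever.
function makeStubLexer(tokenList) {
  let index = 0;
  return {
    getNextToken() {
      return index < tokenList.length
        ? tokenList[index++]
        : { type: 'EOF', value: null };
    },
  };
}

const stub = makeStubLexer([
  { type: 'SHAPE', value: 'shape' },
  { type: 'IDENTIFIER', value: 'circle' },
]);
console.log(collectTokens(stub).map((t) => t.type)); // [ 'SHAPE', 'IDENTIFIER' ]
```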

Part 2: Building the Parser

Step 2.1: Create the Basic Parser Structure

File: src/parser.mjs

What You're Building: The Parser class is the foundation of the parsing system. It takes tokens from the lexer and builds an Abstract Syntax Tree (AST). This step creates the basic structure with token lookahead, error handling, and the eat() method for consuming tokens.

Why This Structure: The parser uses a "lookahead" approach - it always has the next token ready in currentToken. This allows it to make decisions based on what's coming next. The eat() method ensures tokens are consumed in the correct order according to the grammar.

How to Build It Step by Step:

Step 2.1.1: Create the Parser Class and Constructor Start with the class definition and constructor:

import { Lexer } from './lexer.mjs';

export class Parser {
  constructor(lexer) {
    // Step 2.1.1.1: Store the lexer
    // The parser needs the lexer to get tokens
    this.lexer = lexer;

    // Step 2.1.1.2: Get the first token (lookahead)
    // We always keep one token ahead - this is called "lookahead"
    // This allows us to peek at the next token without consuming it
    this.currentToken = this.lexer.getNextToken();
  }

Why Lookahead: The parser needs to know what token is coming next to make parsing decisions. For example, to parse shape circle c1, we need to see the SHAPE token first, then the IDENTIFIER(circle), etc. By keeping currentToken always set to the next token, we can check it before consuming it.

Step 2.1.2: Implement the error() Method Create a method for reporting parsing errors:

  error(message) {
    // Step 2.1.2.1: Get position information from current token
    // If currentToken exists, use its line/column
    // Otherwise, use 'unknown' (shouldn't happen, but defensive)
    const line = this.currentToken ? this.currentToken.line : 'unknown';
    const column = this.currentToken ? this.currentToken.column : 'unknown';

    // Step 2.1.2.2: Throw error with position information
    throw new Error(`Parser error at line ${line}, col ${column}: ${message}`);
  }

Why This Error Method: Parsing errors need to include position information so users know where the problem is. We get the position from currentToken, which is the token that caused the error.

Step 2.1.3: Implement the eat() Method This is the core method for consuming tokens:

  eat(tokenType) {
    // Step 2.1.3.1: Check if current token matches expected type
    // The tokenType parameter is what we expect to see
    if (this.currentToken.type === tokenType) {
      // Step 2.1.3.2: Token matches - consume it
      // Store the token (we might need its value)
      const token = this.currentToken;

      // Step 2.1.3.3: Move to next token
      // Get the next token from the lexer and update currentToken
      this.currentToken = this.lexer.getNextToken();

      // Step 2.1.3.4: Return the consumed token
      // Sometimes we need the token's value, so we return it
      return token;
    } else {
      // Step 2.1.3.5: Token doesn't match - syntax error
      // This means the code doesn't match the grammar
      this.error(`Expected ${tokenType} but got ${this.currentToken.type}`);
    }
  }
}

Why This Method: The eat() method is the workhorse of the parser. It ensures tokens are consumed in the correct order. If the current token doesn't match what we expect, it's a syntax error. This enforces the grammar rules strictly.

The Complete Basic Structure:

import { Lexer } from './lexer.mjs';

export class Parser {
  constructor(lexer) {
    this.lexer = lexer;
    this.currentToken = this.lexer.getNextToken();  // Look ahead one token
  }

  error(message) {
    const line = this.currentToken ? this.currentToken.line : 'unknown';
    const column = this.currentToken ? this.currentToken.column : 'unknown';
    throw new Error(`Parser error at line ${line}, col ${column}: ${message}`);
  }

  eat(tokenType) {
    if (this.currentToken.type === tokenType) {
      const token = this.currentToken;
      this.currentToken = this.lexer.getNextToken();  // Move to next token
      return token;
    } else {
      this.error(`Expected ${tokenType} but got ${this.currentToken.type}`);
    }
  }
}

Building This Step by Step:

Create new file src/parser.mjs
Import Lexer class
Export Parser class
Create constructor that takes lexer parameter
Store lexer as instance property
Get first token and store in currentToken (lookahead)
Create error() method that throws error with position
Get line and column from currentToken
Create eat() method that consumes tokens
Check if currentToken matches expected type
If matches, store token, advance to next token, return token
If doesn't match, throw error
This basic structure provides the foundation for all parsing
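To see lookahead and eat() in action without the real lexer, here is a trimmed-down sketch. MiniParser is a hypothetical class that reads from a plain array of tokens instead of calling lexer.getNextToken(); the eat() logic follows the method above, with the error thrown directly rather than through error().

```javascript
// Hypothetical, trimmed-down parser used only to show how eat() behaves.
// The token source is an array that is replayed and then followed by EOF.
class MiniParser {
  constructor(tokens) {
    this.tokens = tokens;
    this.index = 0;
    this.currentToken = this.nextToken(); // One token of lookahead
  }

  nextToken() {
    return this.index < this.tokens.length
      ? this.tokens[this.index++]
      : { type: 'EOF', value: null };
  }

  eat(tokenType) {
    if (this.currentToken.type !== tokenType) {
      throw new Error(`Expected ${tokenType} but got ${this.currentToken.type}`);
    }
    const token = this.currentToken;
    this.currentToken = this.nextToken();
    return token;
  }
}

const demo = new MiniParser([
  { type: 'SHAPE', value: 'shape' },
  { type: 'IDENTIFIER', value: 'circle' },
]);
demo.eat('SHAPE');                    // Matches, so the parser advances
console.log(demo.currentToken.value); // circle
try {
  demo.eat('LBRACE');                 // Mismatch: currentToken is IDENTIFIER
} catch (err) {
  console.log(err.message);           // Expected LBRACE but got IDENTIFIER
}
```

Note that a failed eat() leaves currentToken unchanged, so the error message always names the token that did not fit the grammar.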

Step 2.2: Implement the Main Parse Method

What You're Building: The main parse() method that orchestrates the parsing of an entire program. It repeatedly calls parseStatement() to parse each statement in the program, building an array of AST nodes that represent the complete program structure.

Why This Method: A program is a sequence of statements. This method loops through all statements, parsing each one, and returns the complete AST. This is the entry point for parsing - you call parse() on a parser instance to get the AST for a program.

How to Build It Step by Step:

Step 2.2.1: Initialize Statements Array Start with an empty array to collect parsed statements:

parse() {
  // Step 2.2.1.1: Initialize statements array
  // This will hold all the AST nodes representing statements
  const statements = [];

Why Array: A program consists of multiple statements. We need an array to collect all of them. Each statement becomes an AST node in this array.

Step 2.2.2: Loop Through All Statements Parse statements until end of file:

  // Step 2.2.2.1: Loop until end of file
  // Continue parsing as long as we haven't reached EOF token
  while (this.currentToken.type !== 'EOF') {
    // Step 2.2.2.2: Parse one statement
    // parseStatement() will determine what type of statement it is
    // and call the appropriate parsing method
    statements.push(this.parseStatement());
  }

Why While Loop: We continue parsing statements until we reach the EOF token. Each call to parseStatement() parses one complete statement and advances the token stream. The loop continues until all statements are parsed.

Step 2.2.3: Return Complete AST Return the array of statement AST nodes:

  // Step 2.2.3.1: Return array of AST nodes
  // This represents the complete program structure
  return statements;  // Array of AST nodes
}

Why Return Array: The array of AST nodes represents the complete program. Each node is a statement (shape definition, parameter, etc.). The interpreter will use this AST to execute the program.

The Complete Method:

parse() {
  const statements = [];

  while (this.currentToken.type !== 'EOF') {
    statements.push(this.parseStatement());
  }

  return statements;  // Array of AST nodes
}

Building This Step by Step:

Create parse() method in the Parser class
Initialize empty statements array
Add while loop that continues until EOF token
Call parseStatement() to parse one statement
Push parsed statement AST node to array
After loop, return statements array
This method orchestrates the parsing of an entire program

Step 2.3: Implement Statement Parsing

What You're Building: A dispatcher method that determines which type of statement to parse based on the current token. This method routes to the appropriate parsing method for each statement type.

Why This Method: Different statements start with different tokens. param starts with PARAM, shape starts with SHAPE, etc. This method checks the current token and routes to the correct parsing method. This keeps the parsing logic organized and modular.

How to Build It Step by Step:

Step 2.3.1: Create the Dispatcher Method Use a switch statement to route based on token type:

parseStatement() {
  // Step 2.3.1.1: Check current token type
  // The token type tells us what kind of statement this is
  switch (this.currentToken.type) {
    // Step 2.3.1.2: Handle PARAM statements
    // param name = value
    case 'PARAM':
      return this.parseParam();

    // Step 2.3.1.3: Handle SHAPE statements
    // shape circle c1 { radius: 50 }
    case 'SHAPE':
      return this.parseShape();

    // Step 2.3.1.4: Handle boolean operations
    // union shape1 shape2
    case 'UNION':
    case 'DIFFERENCE':
    case 'INTERSECTION':
      return this.parseBooleanOperation();

    // Step 2.3.1.5: Handle IF statements
    // if condition { ... }
    case 'IF':
      return this.parseIfStatement();

    // Step 2.3.1.6: Handle FOR loops
    // for i from 1 to 10 { ... }
    case 'FOR':
      return this.parseForLoop();

    // Step 2.3.1.7: Unknown statement type
    // If we don't recognize the token type, it's a syntax error
    default:
      this.error(`Unexpected token: ${this.currentToken.type}`);
  }
}

Why Switch Statement: A switch statement is clean and efficient for routing based on token type. Each case handles a different statement type. The default case catches syntax errors (unexpected tokens).

Why Return Immediately: Each parsing method (parseParam(), parseShape(), etc.) parses the complete statement and returns an AST node. We return immediately because we've successfully parsed a statement.

The Complete Method:

parseStatement() {
  switch (this.currentToken.type) {
    case 'PARAM':
      return this.parseParam();
    case 'SHAPE':
      return this.parseShape();
    case 'UNION':
    case 'DIFFERENCE':
    case 'INTERSECTION':
      return this.parseBooleanOperation();
    case 'IF':
      return this.parseIfStatement();
    case 'FOR':
      return this.parseForLoop();
    default:
      this.error(`Unexpected token: ${this.currentToken.type}`);
  }
}

Building This Step by Step:

Create parseStatement() method in the Parser class
Add switch statement on currentToken.type
Add case for 'PARAM', call parseParam()
Add case for 'SHAPE', call parseShape()
Add cases for boolean operations, call parseBooleanOperation()
Add case for 'IF', call parseIfStatement()
Add case for 'FOR', call parseForLoop()
Add default case that throws error for unknown tokens
This method routes to the appropriate statement parser
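The sketch below shows how parse() and parseStatement() cooperate on a small token stream. It is only an illustration: StatementParser reads from an array instead of a lexer, handles just the PARAM case, and its parseParam() method, the assumed grammar (param name = number), and the returned node shape are hypothetical, since that method has not been built at this point.

```javascript
// Hypothetical mini parser: parse() loops, parseStatement() dispatches.
// parseParam() and its node shape are assumptions made for this sketch.
class StatementParser {
  constructor(tokens) {
    this.tokens = tokens;
    this.index = 0;
    this.currentToken = this.nextToken();
  }

  nextToken() {
    return this.index < this.tokens.length
      ? this.tokens[this.index++]
      : { type: 'EOF', value: null };
  }

  eat(tokenType) {
    if (this.currentToken.type !== tokenType) {
      throw new Error(`Expected ${tokenType} but got ${this.currentToken.type}`);
    }
    const token = this.currentToken;
    this.currentToken = this.nextToken();
    return token;
  }

  parse() {
    const statements = [];
    while (this.currentToken.type !== 'EOF') {
      statements.push(this.parseStatement());
    }
    return statements;
  }

  parseStatement() {
    switch (this.currentToken.type) {
      case 'PARAM':
        return this.parseParam();
      default:
        throw new Error(`Unexpected token: ${this.currentToken.type}`);
    }
  }

  // Assumed grammar for this sketch: param <name> = <number>
  parseParam() {
    this.eat('PARAM');
    const name = this.eat('IDENTIFIER').value;
    this.eat('ASSIGN');
    const value = this.eat('NUMBER').value;
    return { type: 'param', name, value };
  }
}

const ast = new StatementParser([
  { type: 'PARAM', value: 'param' },
  { type: 'IDENTIFIER', value: 'size' },
  { type: 'ASSIGN', value: '=' },
  { type: 'NUMBER', value: 10 },
  { type: 'PARAM', value: 'param' },
  { type: 'IDENTIFIER', value: 'gap' },
  { type: 'ASSIGN', value: '=' },
  { type: 'NUMBER', value: 2.5 },
]).parse();
console.log(ast.length);  // 2
console.log(ast[0].name); // size
```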

Step 2.4: Implement Shape Parsing

What You're Building: A method that parses shape definitions like shape circle c1 { radius: 50, x: 0 }. This method consumes tokens in a specific order, extracts the shape type, name, and parameters, and returns an AST node representing the shape.

Why This Method: Shape definitions have a specific grammar: shape keyword, shape type, name, opening brace, properties (key: value pairs), closing brace. This method enforces that grammar by consuming tokens in the correct order.

How to Build It Step by Step:

Step 2.4.1: Consume Shape Keyword and Extract Type Start by consuming the shape keyword and getting the shape type:

parseShape() {
  // Step 2.4.1.1: Consume 'shape' keyword
  // This ensures we're actually parsing a shape statement
  this.eat('SHAPE');

  // Step 2.4.1.2: Get shape type
  // The next token should be an identifier (like 'circle', 'rectangle')
  const shapeType = this.currentToken.value;

  // Step 2.4.1.3: Consume the shape type identifier
  this.eat('IDENTIFIER');

Why Get Value Before Eating: We need to capture the token's value before consuming it. Once we call eat(), the token is consumed and we move to the next token. So we get the value first, then consume.

Step 2.4.2: Extract Shape Name Get the shape name (the identifier after the shape type):

  // Step 2.4.2.1: Get shape name
  // The next token should be an identifier (like 'c1', 'r1')
  const name = this.currentToken.value;

  // Step 2.4.2.2: Consume the shape name identifier
  this.eat('IDENTIFIER');

Why Two Identifiers: The first identifier is the shape type (circle, rectangle). The second identifier is the shape name (c1, r1). Both are required by the grammar.

Step 2.4.3: Consume Opening Brace and Parse Properties Start parsing the parameter list:

  // Step 2.4.3.1: Consume opening brace
  // The '{' marks the start of the parameter list
  this.eat('LBRACE');

  // Step 2.4.3.2: Initialize parameters object
  // This will hold all the shape's properties
  const params = {};

  // Step 2.4.3.3: Loop through properties
  // Continue until we hit the closing brace
  while (this.currentToken.type !== 'RBRACE') {
    // Step 2.4.3.4: Get property key (name)
    const key = this.currentToken.value;
    this.eat('IDENTIFIER');  // Consume property name

    // Step 2.4.3.5: Consume colon
    // Properties use key: value format
    this.eat('COLON');

    // Step 2.4.3.6: Parse property value
    // Values can be expressions (numbers, identifiers, etc.)
    const value = this.parseExpression();

    // Step 2.4.3.7: Store property in params object
    params[key] = value;

    // Step 2.4.3.8: Optional comma
    // Properties can be separated by commas (optional)
    if (this.currentToken.type === 'COMMA') {
      this.eat('COMMA');
    }
  }

Why While Loop: Properties continue until the closing brace. We loop, parsing each property (key: value pair), until we hit RBRACE. The comma is optional - it's just for readability.

Step 2.4.4: Consume Closing Brace and Return AST Node Finish parsing and return the shape AST node:

  // Step 2.4.4.1: Consume closing brace
  // This marks the end of the parameter list
  this.eat('RBRACE');

  // Step 2.4.4.2: Return shape AST node
  // This node contains all the information about the shape
  return {
    type: 'shape',        // Node type
    shapeType: shapeType, // Shape type (circle, rectangle, etc.)
    name: name,           // Shape name (c1, r1, etc.)
    params: params        // Parameters object (radius: 50, etc.)
  };
}

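Why This AST Structure: The node carries everything the interpreter needs to create the shape: its type, its name, and its parameters. Each value in params is an unevaluated expression node (the result of parseExpression()), not a final number. The interpreter evaluates these expressions later, when it creates the actual shape object.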
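As an illustration, this is roughly what the node looks like for shape circle c1 { radius: 50 }. It is written by hand here rather than produced by the parser, and describeShape is a hypothetical helper used only to show how a consumer might walk the node.

```javascript
// Hand-written AST node for: shape circle c1 { radius: 50 }
const shapeNode = {
  type: 'shape',
  shapeType: 'circle',
  name: 'c1',
  params: {
    // The value is an expression node, not the number 50 itself.
    radius: { type: 'number', value: 50 },
  },
};

// Hypothetical helper: summarize a shape node without evaluating it.
function describeShape(node) {
  const parts = Object.entries(node.params).map(
    ([key, expr]) => `${key} -> ${expr.type} node`
  );
  return `${node.shapeType} ${node.name}: ${parts.join(', ')}`;
}

console.log(describeShape(shapeNode));
```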

The Complete Method:

parseShape() {
  this.eat('SHAPE');  // Consume 'shape' keyword

  const shapeType = this.currentToken.value;
  this.eat('IDENTIFIER');  // Consume shape type

  const name = this.currentToken.value;
  this.eat('IDENTIFIER');  // Consume shape name

  this.eat('LBRACE');  // Consume '{'

  // Parse properties
  const params = {};
  while (this.currentToken.type !== 'RBRACE') {
    const key = this.currentToken.value;
    this.eat('IDENTIFIER');  // Property name
    this.eat('COLON');       // Consume ':'
    const value = this.parseExpression();  // Parse value
    params[key] = value;

    // Optional comma
    if (this.currentToken.type === 'COMMA') {
      this.eat('COMMA');
    }
  }

  this.eat('RBRACE');  // Consume '}'

  return {
    type: 'shape',
    shapeType: shapeType,
    name: name,
    params: params
  };
}

Building This Step by Step:

Create parseShape() method in the Parser class
Call eat('SHAPE') to consume shape keyword
Get shape type from currentToken.value
Call eat('IDENTIFIER') to consume shape type
Get shape name from currentToken.value
Call eat('IDENTIFIER') to consume shape name
Call eat('LBRACE') to consume opening brace
Initialize empty params object
Add while loop that continues until RBRACE
Get property key from currentToken.value
Call eat('IDENTIFIER') to consume property name
Call eat('COLON') to consume colon
Call parseExpression() to parse property value
Store key-value pair in params object
Check for optional comma, consume if present
After loop, call eat('RBRACE') to consume closing brace
Return shape AST node with type, shapeType, name, and params
Result: parseShape() now parses shape definitions such as shape circle c1 { radius: 50 }

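Step 2.5: Implement Expression Parsing with Precedence

Precedence comes from the call chain. parseExpression() handles + and - and calls parseTerm() for its operands; parseTerm() handles * and / and calls parseFactor() for its operands. The deeper a method sits in the chain, the tighter its operators bind. The while loops make each operator left-associative, so 10 - 2 - 3 parses as (10 - 2) - 3.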

parseExpression() {
  let node = this.parseTerm();  // Start with terms (higher precedence)

  // Handle + and - (lowest precedence)
  while (this.currentToken.type === 'PLUS' || this.currentToken.type === 'MINUS') {
    const operator = this.currentToken.type;
    this.eat(operator);
    node = {
      type: 'binary_op',
      operator: operator.toLowerCase(),
      left: node,
      right: this.parseTerm()
    };
  }

  return node;
}

parseTerm() {
  let node = this.parseFactor();  // Start with factors (highest precedence)

  // Handle * and / (higher precedence than + and -)
  while (this.currentToken.type === 'MULTIPLY' || this.currentToken.type === 'DIVIDE') {
    const operator = this.currentToken.type;
    this.eat(operator);
    node = {
      type: 'binary_op',
      operator: operator.toLowerCase(),
      left: node,
      right: this.parseFactor()
    };
  }

  return node;
}

parseFactor() {
  const token = this.currentToken;

  // Numbers
  if (token.type === 'NUMBER') {
    this.eat('NUMBER');
    return { type: 'number', value: token.value };
  }

  // Identifiers (parameters, shape references)
  if (token.type === 'IDENTIFIER') {
    this.eat('IDENTIFIER');
    return { type: 'identifier', value: token.value };
  }

  // Strings
  if (token.type === 'STRING') {
    this.eat('STRING');
    return { type: 'string', value: token.value };
  }

  // Parentheses
  if (token.type === 'LPAREN') {
    this.eat('LPAREN');
    const expr = this.parseExpression();
    this.eat('RPAREN');
    return expr;
  }

  // Unary minus
  if (token.type === 'MINUS') {
    this.eat('MINUS');
    return {
      type: 'unary_op',
      operator: 'minus',
      operand: this.parseFactor()
    };
  }

  this.error(`Unexpected token in expression: ${token.type}`);
}

Test the parser:

const code = 'shape circle c1 { radius: 50 + 10 }';
const lexer = new Lexer(code);
const parser = new Parser(lexer);
const ast = parser.parse();
console.log(JSON.stringify(ast, null, 2));
// Should output AST with shape node containing binary_op expression
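To check precedence in isolation, the three methods can be exercised without a lexer. The sketch below copies their logic into a hypothetical ExpressionParser that reads a hand-built token array (the class name, the num and op helpers, and the EOF fallback are assumptions for illustration) and supports only numbers and parentheses.

```javascript
// Hypothetical, lexer-free copy of the expression methods.
class ExpressionParser {
  constructor(tokens) {
    this.tokens = tokens;
    this.index = 0;
    this.currentToken = tokens[0] ?? { type: 'EOF' };
  }

  eat(type) {
    if (this.currentToken.type !== type) {
      throw new Error(`Expected ${type}, got ${this.currentToken.type}`);
    }
    this.index += 1;
    this.currentToken = this.tokens[this.index] ?? { type: 'EOF' };
  }

  // Lowest precedence: + and -
  parseExpression() {
    let node = this.parseTerm();
    while (this.currentToken.type === 'PLUS' || this.currentToken.type === 'MINUS') {
      const operator = this.currentToken.type;
      this.eat(operator);
      node = { type: 'binary_op', operator: operator.toLowerCase(), left: node, right: this.parseTerm() };
    }
    return node;
  }

  // Higher precedence: * and /
  parseTerm() {
    let node = this.parseFactor();
    while (this.currentToken.type === 'MULTIPLY' || this.currentToken.type === 'DIVIDE') {
      const operator = this.currentToken.type;
      this.eat(operator);
      node = { type: 'binary_op', operator: operator.toLowerCase(), left: node, right: this.parseFactor() };
    }
    return node;
  }

  // Highest precedence: numbers and parenthesized expressions
  parseFactor() {
    const token = this.currentToken;
    if (token.type === 'NUMBER') {
      this.eat('NUMBER');
      return { type: 'number', value: token.value };
    }
    if (token.type === 'LPAREN') {
      this.eat('LPAREN');
      const expr = this.parseExpression();
      this.eat('RPAREN');
      return expr;
    }
    throw new Error(`Unexpected token in expression: ${token.type}`);
  }
}

const num = (value) => ({ type: 'NUMBER', value });
const op = (type) => ({ type });

// 2 + 3 * 4  ->  plus(2, multiply(3, 4))
const flat = new ExpressionParser(
  [num(2), op('PLUS'), num(3), op('MULTIPLY'), num(4)]
).parseExpression();

// (2 + 3) * 4  ->  multiply(plus(2, 3), 4)
const grouped = new ExpressionParser(
  [op('LPAREN'), num(2), op('PLUS'), num(3), op('RPAREN'), op('MULTIPLY'), num(4)]
).parseExpression();

console.log(flat.operator, flat.right.operator);       // plus multiply
console.log(grouped.operator, grouped.left.operator);  // multiply plus
```

In the first tree the multiplication ends up nested under the addition, so it is evaluated first. Parentheses reverse that nesting.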

Part 3: Building the Interpreter

Step 3.1: Create the Environment

File: src/environment.mjs

export class Environment {
  constructor() {
    this.parameters = new Map();  // Parameter name → value
    this.shapes = new Map();      // Shape name → shape object
    this.layers = new Map();       // Layer name → layer object
    this.functions = new Map();    // Function name → function definition
  }

  setParameter(name, value) {
    this.parameters.set(name, value);
  }

  getParameter(name) {
    if (!this.parameters.has(name)) {
      throw new Error(`Parameter not found: ${name}`);
    }
    return this.parameters.get(name);
  }

  createShapeWithName(type, name, params) {
    const shape = {
      type: type,
      shapeType: type,
      params: params,
      transform: {
        position: params.position || [params.x || 0, params.y || 0],
        rotation: params.rotation || 0,
        scale: [1, 1]
      }
    };
    this.shapes.set(name, shape);
    return shape;
  }
}
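A short usage sketch may help show how the environment behaves. The class below is a trimmed copy of the Environment above (layers and functions are omitted, and export is dropped) so that the example runs on its own. The shape and parameter values are made up for illustration.

```javascript
// Trimmed copy of Environment, kept only so this example is self-contained.
class Environment {
  constructor() {
    this.parameters = new Map();
    this.shapes = new Map();
  }

  setParameter(name, value) {
    this.parameters.set(name, value);
  }

  getParameter(name) {
    if (!this.parameters.has(name)) {
      throw new Error(`Parameter not found: ${name}`);
    }
    return this.parameters.get(name);
  }

  createShapeWithName(type, name, params) {
    const shape = {
      type: type,
      shapeType: type,
      params: params,
      transform: {
        position: params.position || [params.x || 0, params.y || 0],
        rotation: params.rotation || 0,
        scale: [1, 1],
      },
    };
    this.shapes.set(name, shape);
    return shape;
  }
}

const env = new Environment();
env.setParameter('size', 100);

// No position given: falls back to [x, y], with a missing coordinate as 0.
const c1 = env.createShapeWithName('circle', 'c1', {
  radius: env.getParameter('size'),
  x: 20,
});
console.log(c1.transform.position); // position is [20, 0]

// An explicit position takes priority over x and y.
const r1 = env.createShapeWithName('rectangle', 'r1', { position: [5, 5], x: 99 });
console.log(r1.transform.position); // position is [5, 5]

// Looking up an unknown parameter throws instead of returning undefined.
let lookupError = null;
try {
  env.getParameter('missing');
} catch (err) {
  lookupError = err.message;
}
console.log(lookupError); // Parameter not found: missing
```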

Step 3.2: Create the Basic Interpreter Structure

File: src/interpreter.mjs

import { Environment } from './environment.mjs';

export class Interpreter {
  constructor() {
    this.env = new Environment();
    this.constraints = [];
    this.currentLoopCounter = undefined;
  }

  interpret(ast) {
    for (const node of ast) {
      this.evaluateNode(node);
    }

    return {
      parameters: this.env.parameters,
      shapes: this.env.shapes,
      layers: this.env.layers,
      functions: this.env.functions,
      constraints: this.constraints
    };
  }

  evaluateNode(node) {
    switch (node.type) {
      case 'param':
        return this.evaluateParam(node);
      case 'shape':
        return this.evaluateShape(node);
      case 'if_statement':
        return this.evaluateIfStatement(node);
      case 'for_loop':
        return this.evaluateForLoop(node);
      default:
        throw new Error(`Unknown node type: ${node.type}`);
    }
  }
}

Step 3.3: Implement Parameter Evaluation

evaluateParam(node) {
  const value = this.evaluateExpression(node.value);
  this.env.setParameter(node.name, value);
  return value;
}

Step 3.4: Implement Expression Evaluation

evaluateExpression(node) {
  switch (node.type) {
    case 'number':
      return node.value;

    case 'string':
      return node.value;

    case 'identifier':
      // Check if it's a parameter
      if (this.env.parameters.has(node.value)) {
        return this.env.getParameter(node.value);
      }
      // Check if it's a shape reference
      if (this.env.shapes.has(node.value)) {
        return node.value;  // Return name as string for boolean ops
      }
      throw new Error(`Undefined identifier: ${node.value}`);

    case 'binary_op': {
      // Braces give each case its own scope for const declarations
      const left = this.evaluateExpression(node.left);
      const right = this.evaluateExpression(node.right);
      return this.applyBinaryOperator(node.operator, left, right);
    }

    case 'unary_op': {
      const operand = this.evaluateExpression(node.operand);
      if (node.operator === 'minus') {
        return -operand;
      }
      throw new Error(`Unknown unary operator: ${node.operator}`);
    }

    default:
      throw new Error(`Unknown expression type: ${node.type}`);
  }
}

applyBinaryOperator(op, left, right) {
  switch (op) {
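    // The parser in Step 2.5 emits lowercased token names ('plus', 'minus',
    // 'multiply', 'divide'), so accept those alongside the symbols.
    case '+': case 'plus': return left + right;
    case '-': case 'minus': return left - right;
    case '*': case 'multiply': return left * right;
    case '/': case 'divide':
      if (right === 0) throw new Error('Division by zero');
      return left / right;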
    case '%': return left % right;
    case '==': return left === right;
    case '!=': return left !== right;
    case '<': return left < right;
    case '<=': return left <= right;
    case '>': return left > right;
    case '>=': return left >= right;
    case 'and': return left && right;
    case 'or': return left || right;
    default:
      throw new Error(`Unknown operator: ${op}`);
  }
}

Step 3.5: Implement Shape Evaluation

evaluateShape(node) {
  // Generate unique name (handle loops)
  let shapeName = node.name;
  if (this.currentLoopCounter !== undefined) {
    shapeName = `${shapeName}_${this.currentLoopCounter}`;
  }

  // Evaluate all parameter expressions
  const params = {};
  for (const [key, expr] of Object.entries(node.params)) {
    const evaluatedValue = this.evaluateExpression(expr);
    params[key] = evaluatedValue;
  }

  // Create the shape
  const shape = this.env.createShapeWithName(node.shapeType, shapeName, params);
  return shape;
}
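The loop-counter suffix is easiest to understand by watching the names it generates. The sketch below isolates the naming logic in a hypothetical NamingDemo class. evaluateForLoop() is not shown in this section, so runLoop is only an assumption about how a loop might set and clear currentLoopCounter.

```javascript
// Hypothetical, stripped-down interpreter: only the naming logic of evaluateShape().
class NamingDemo {
  constructor() {
    this.shapes = new Map();
    this.currentLoopCounter = undefined;
  }

  evaluateShape(node) {
    let shapeName = node.name;
    if (this.currentLoopCounter !== undefined) {
      shapeName = `${shapeName}_${this.currentLoopCounter}`;
    }
    this.shapes.set(shapeName, { shapeType: node.shapeType });
    return shapeName;
  }

  // Assumption: a for loop sets the counter before each pass and clears it afterwards.
  runLoop(count, bodyNode) {
    for (let i = 0; i < count; i++) {
      this.currentLoopCounter = i;
      this.evaluateShape(bodyNode);
    }
    this.currentLoopCounter = undefined;
  }
}

const demo = new NamingDemo();
const dotNode = { type: 'shape', shapeType: 'circle', name: 'dot', params: {} };

demo.evaluateShape(dotNode); // outside a loop: stored as 'dot'
demo.runLoop(3, dotNode);    // inside a loop: 'dot_0', 'dot_1', 'dot_2'

const shapeNames = [...demo.shapes.keys()];
console.log(shapeNames.join(', ')); // dot, dot_0, dot_1, dot_2
```

Without the suffix, every pass through the loop would write to the same Map key and only the last shape would survive.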

Test the complete system:

const code = 'param size 100\nshape circle c1 { radius: size }';
const lexer = new Lexer(code);
const parser = new Parser(lexer);
const ast = parser.parse();
const interpreter = new Interpreter();
const result = interpreter.interpret(ast);

console.log('Parameters:', Array.from(result.parameters.entries()));
console.log('Shapes:', Array.from(result.shapes.entries()));
// Should show parameter 'size' = 100 and shape 'c1' with radius 100

Common Issues and Fixes

Issue: Lexer stops early

Check advance() is called after reading each character
Check loop conditions (tokenizing should continue until the current character is null, meaning end of input)

Issue: Parser doesn't handle precedence

Verify parseExpression() calls parseTerm(), which calls parseFactor()
Check each operator is handled at the correct level: + and - in parseExpression(), * and / in parseTerm()

Issue: Interpreter can't find parameters

Check parameters are stored before they're used
Check parameter lookup in evaluateExpression()
Verify parameter names match exactly
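The first and third checks can be reproduced with a minimal evaluator. The sketch below is a hypothetical reduction of the interpreter to param, number, and identifier nodes. The evaluate and run functions and the sample nodes are assumptions for illustration.

```javascript
// Hypothetical mini evaluator covering only param, number, and identifier nodes.
const parameters = new Map();

function evaluate(node) {
  switch (node.type) {
    case 'number':
      return node.value;
    case 'identifier':
      if (parameters.has(node.value)) {
        return parameters.get(node.value);
      }
      throw new Error(`Undefined identifier: ${node.value}`);
    case 'param': {
      const value = evaluate(node.value);
      parameters.set(node.name, value);
      return value;
    }
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
}

// Run a program from a clean state and report 'ok' or the error message.
function run(program) {
  parameters.clear();
  try {
    for (const node of program) {
      evaluate(node);
    }
    return 'ok';
  } catch (err) {
    return err.message;
  }
}

// param size 100
const defineSize = { type: 'param', name: 'size', value: { type: 'number', value: 100 } };
// param radius size
const useSize = { type: 'param', name: 'radius', value: { type: 'identifier', value: 'size' } };
// param radius Size   (wrong capitalization)
const useTypo = { type: 'param', name: 'radius', value: { type: 'identifier', value: 'Size' } };

const inOrder = run([defineSize, useSize]);
const outOfOrder = run([useSize, defineSize]);
const wrongCase = run([defineSize, useTypo]);

console.log(inOrder);    // ok
console.log(outOfOrder); // Undefined identifier: size
console.log(wrongCase);  // Undefined identifier: Size
```

Nodes are evaluated in source order, so a parameter must be defined on an earlier line than its first use. Map keys are case-sensitive, so size and Size are different names.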

Issue: Shapes not created

Check evaluateShape() is being called
Check shapes are stored in env.shapes
Verify shape names are unique (env.shapes is a Map keyed by name, so a repeated name silently replaces the earlier shape)
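The last check matters because of how Map.set() treats an existing key. The sketch below shows the overwrite and one possible guard. addShapeStrict is a hypothetical helper, not part of the tutorial's Environment.

```javascript
// Map.set() with an existing key replaces the old value without any warning.
const unguarded = new Map();
unguarded.set('c1', { radius: 50 });
unguarded.set('c1', { radius: 80 }); // replaces the first c1
console.log(unguarded.size, unguarded.get('c1').radius); // 1 80

// Hypothetical guard: refuse to register a name twice.
function addShapeStrict(shapes, name, shape) {
  if (shapes.has(name)) {
    throw new Error(`Duplicate shape name: ${name}`);
  }
  shapes.set(name, shape);
}

const guarded = new Map();
addShapeStrict(guarded, 'c1', { radius: 50 });

let duplicateError = null;
try {
  addShapeStrict(guarded, 'c1', { radius: 80 });
} catch (err) {
  duplicateError = err.message;
}
console.log(duplicateError); // Duplicate shape name: c1
```

Whether a duplicate should be an error or an intentional redefinition is a design choice for the language. The guard only makes the collision visible.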