The Language System
Building the Lexer From Scratch
The lexer's job is simple: read characters and group them into meaningful chunks called tokens. But let's actually build it step by step.
What It Does
You give it text, it gives you tokens. That's it.
For example, shape circle c1 { radius: 50 } becomes:
- SHAPE token (the keyword "shape")
- IDENTIFIER token with value "circle"
- IDENTIFIER token with value "c1"
- LBRACE token (the "{")
- IDENTIFIER token with value "radius"
- COLON token
- NUMBER token with value 50
- RBRACE token (the "}")
Building the Basic Structure From Scratch
To build a lexer from scratch, you need to track where you are in the input string and what character you're currently looking at. Here's how to set it up step by step:
Step 1: Create the Lexer Class
Start with a class that holds the input and tracks position:
export class Lexer {
constructor(input) {
// Step 1.1: Store the input string
// This is the source code we're going to tokenize.
// We store it as a property so all methods can access it.
this.input = input; // The source code string
// Step 1.2: Track position in the string
// position is the index into the input string (0-based).
// We start at 0 because we haven't read anything yet.
this.position = 0; // Current character position
// Step 1.3: Track line number for error messages
// When the lexer encounters an error, we need to tell the user
// where it happened. Line numbers start at 1 (not 0) because that's
// what users expect ("error on line 5" not "error on line 4").
this.line = 1; // Current line number
// Step 1.4: Track column number for error messages
// Column tells us which character on the line has the error.
// Like line, it starts at 1 (first character is column 1).
this.column = 1; // Current column number
// Step 1.5: Store the current character
// Instead of always accessing this.input[this.position], we store
// the current character in a property. This keeps the code cleaner,
// and advance() becomes the single place that keeps it in sync with position.
// If input is empty, this.input[0] is undefined, so we use || null
// to convert undefined to null (which we use to mean "end of input").
this.currentChar = this.input[0] || null; // The character we're looking at
}
}
Why Track Line/Column? When lexing or parsing fails, you need to tell the user "error on line 5, column 12". Without a location, an error message is nearly useless: users can't fix a mistake they can't find. Line/column tracking is what makes good error messages possible.
How It Works:
- input is the source code string (e.g., "shape circle c1 { radius: 50 }")
- position is the index into that string (0 = first character, 1 = second character, etc.)
- line and column track position for error messages
- currentChar is a convenience: instead of writing this.input[this.position] everywhere, we store it
Building This Step by Step:
- Create a new file lexer.mjs
- Export a class called Lexer
- Add a constructor that takes input as a parameter
- Store input as this.input
- Initialize position to 0
- Initialize line to 1
- Initialize column to 1
- Set currentChar to the first character (or null if input is empty)
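The lexer updates line and column incrementally as it moves. To make the relationship between the 0-based position and the 1-based line/column concrete, here is a small from-scratch helper that recomputes them by scanning the input. lineColumnAt is a hypothetical name, not part of the lexer; it can also serve as a slow but simple reference when testing the lexer's incremental tracking.

```javascript
// Hypothetical helper: convert a 0-based index into a 1-based line/column
// by scanning every character before it.
function lineColumnAt(input, position) {
  let line = 1;
  let column = 1;
  for (let i = 0; i < position; i++) {
    if (input[i] === '\n') {
      // A newline moves us to the first column of the next line
      line++;
      column = 1;
    } else {
      column++;
    }
  }
  return { line, column };
}

// "c1" starts at index 6, right after the newline at index 5
console.log(lineColumnAt('shape\nc1', 6)); // { line: 2, column: 1 }
```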
Building the Main Tokenization Loop From Scratch
The core method is getNextToken(). This is the heart of the lexer - it reads characters and returns tokens. Here's how to build it step by step:
Step 1: Create the Main Loop Structure
The loop continues until we reach the end of input (when currentChar is null):
getNextToken() {
// Step 1.1: Loop while there are characters to read
// currentChar is null when we've reached the end of the input string.
// We keep looping until we've processed everything.
while (this.currentChar !== null) {
// Inside the loop, we check what kind of character we're looking at
// and handle it appropriately. Each check is in order of likelihood
// (most common first) for performance.
// Step 1.2: Skip whitespace (spaces, tabs, newlines)
// Whitespace doesn't create tokens - it just separates other tokens.
// We check for whitespace first because it's very common.
// /\s/ is a regex that matches any whitespace character.
if (/\s/.test(this.currentChar)) {
this.skipWhitespace(); // Skip all consecutive whitespace
continue; // Go to next iteration (don't create a token)
// continue skips the rest of the loop body and starts the next
// iteration. This means we don't try to create a token from whitespace.
}
// Step 1.3: Skip comments (lines starting with //)
// Comments also don't create tokens. We check if current character
// is '/' and the next character (peek) is also '/'.
// peek() looks at the next character without consuming it.
if (this.currentChar === '/' && this.peek() === '/') {
this.skipComment(); // Skip the entire comment line
continue; // Go to next iteration (don't create a token)
}
// Step 1.4: Handle numbers (digits 0-9)
// If we see a digit, we know we're starting a number token.
// /\d/ matches any digit (0-9).
if (/\d/.test(this.currentChar)) {
return this.number(); // Read the number and return a NUMBER token
// We return immediately because number() handles reading all digits
// and returns a complete token.
}
// Step 1.5: Handle identifiers and keywords (letters and underscores)
// Identifiers are names like "circle", "c1", "radius".
// Keywords are special identifiers like "shape", "param".
// We check for letters or underscore first character.
// /[a-zA-Z_]/ matches any letter (upper or lower) or underscore.
if (/[a-zA-Z_]/.test(this.currentChar)) {
return this.identifier(); // Read the identifier and return a token
// identifier() will check if it's a keyword or regular identifier
// and return the appropriate token type.
}
// Step 1.6: Handle strings (text in quotes)
// Strings start with a double quote character.
if (this.currentChar === '"') {
return this.parseString(); // Read the string and return a STRING token
}
// Step 1.7: Handle hex colors (like #FF0000)
// Hex colors start with a '#' character.
if (this.currentChar === '#') {
return this.parseHexColor(); // Read the hex color and return a HEXCOLOR token
}
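// Step 1.8: Handle operators and punctuation
// This includes things like '{', '}', ':', ',', etc.
// We'll implement this next, but for now we'll handle the common ones.
// ... (we'll get to this in detail)
// If no check above matched, the character can't start any token.
// Report it; otherwise nothing would call advance() and the loop
// would spin forever on the same character.
this.error(`Unexpected character '${this.currentChar}'`);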
}
// Step 1.9: Return EOF token when input is exhausted
// When the loop exits (currentChar is null), we've read everything.
// Return an EOF (End Of File) token to signal we're done.
return new Token('EOF', null, this.line, this.column);
// EOF token has no value (null) but has line/column for consistency.
}
The Pattern Explained: The lexer follows a simple pattern:
- Check what kind of character we're looking at
- Call the appropriate method to read that token type
- Return the token immediately
- Skip whitespace/comments without creating tokens (use continue)
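Why This Order Matters:
- Whitespace is checked first because it's most common
- Comments must be checked before any '/' (division) operator; otherwise // would be read as two separate slash tokens
- Numbers, identifiers, and strings are checked in rough order of likelihood; since each starts with a different kind of character, this order affects speed, not correctness
- Operators come last, as the catch-all for the remaining single characters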
Building This Step by Step:
- Create the getNextToken() method in your Lexer class
- Add the while loop that continues while currentChar !== null
- Add whitespace check first (most common case)
- Add comment check second
- Add number check (if digit, call number())
- Add identifier check (if letter/underscore, call identifier())
- Add string check (if quote, call parseString())
- Add hex color check (if '#', call parseHexColor())
- Add EOF return at the end (when loop exits)
- Implement each helper method (skipWhitespace, number, identifier, etc.) one by one
Building Helper Methods From Scratch
You need several helper methods to make the lexer work. Here's how to build each one step by step:
Step 1: Build the advance() Method
This is the most important helper - it moves forward one character and updates all tracking:
advance() {
// Step 1.1: Move position forward by one
// This consumes the current character and moves to the next one.
this.position++;
// Step 1.2: Check if we've reached the end of input
// If position is >= input.length, we've read all characters.
if (this.position >= this.input.length) {
this.currentChar = null; // End of input - set to null to signal EOF
// We don't update column here because we're at EOF.
} else {
// Step 1.3: We're not at EOF, so get the next character
// Read the character at the new position.
this.currentChar = this.input[this.position];
// Step 1.4: Update column number
// Column tracks horizontal position on the current line.
// We increment it because we moved one character to the right.
this.column++;
}
}
Why advance() is Critical:
Every time you consume a character (read it and process it), you must call advance() to move forward. Without it, you'd be stuck reading the same character forever. The method also handles end-of-input detection by setting currentChar to null.
Step 2: Build the peek() Method
This looks ahead without consuming the character:
peek() {
// Step 2.1: Check if there's a next character
// We look at position + 1 (the next character) without moving.
// If position + 1 is >= input.length, there's no next character.
if (this.position + 1 >= this.input.length) {
return null; // No next character - return null
}
// Step 2.2: Return the next character without consuming it
// We read input[position + 1] but don't call advance().
// This lets us "look ahead" to decide what to do next.
return this.input[this.position + 1];
}
Why peek() is Useful:
Sometimes you need to check the next character before deciding what to do. For example, to distinguish = from ==, you peek at the next character. If it's also =, you have ==. If not, you have just =. This is called "lookahead" in parsing.
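The lookahead rule can be tried in isolation. The sketch below uses a stripped-down, hypothetical Cursor class that holds only input, position, and currentChar (mirroring the Lexer's fields), plus a hypothetical readEqualsOperator function. Feeding it === shows how one character of lookahead splits the input: the first two characters become EQUALS, and the last one becomes ASSIGN because peek() finds nothing after it.

```javascript
// Hypothetical minimal cursor with the same fields as the Lexer
class Cursor {
  constructor(input) {
    this.input = input;
    this.position = 0;
    this.currentChar = input.length > 0 ? input[0] : null;
  }

  // Look at the next character without consuming anything
  peek() {
    return this.position + 1 < this.input.length ? this.input[this.position + 1] : null;
  }

  // Consume the current character
  advance() {
    this.position++;
    this.currentChar = this.position < this.input.length ? this.input[this.position] : null;
  }
}

// Decide between '==' and '=' using one character of lookahead.
// Assumes currentChar is '=' when called.
function readEqualsOperator(cursor) {
  if (cursor.peek() === '=') {
    cursor.advance(); // first '='
    cursor.advance(); // second '='
    return 'EQUALS';
  }
  cursor.advance();
  return 'ASSIGN';
}

const cursor = new Cursor('===');
console.log(readEqualsOperator(cursor)); // EQUALS
console.log(readEqualsOperator(cursor)); // ASSIGN
console.log(cursor.currentChar); // null (end of input)
```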
Step 3: Build the skipWhitespace() Method
This skips all consecutive whitespace characters:
skipWhitespace() {
// Step 3.1: Loop while current character is whitespace
// /\s/ matches any whitespace: space, tab, newline, etc.
// We continue until we hit a non-whitespace character.
while (this.currentChar && /\s/.test(this.currentChar)) {
// Step 3.2: Handle newlines specially
// Newlines change both line and column.
if (this.currentChar === '\n') {
this.line++; // Move to next line
this.column = 0; // advance() below increments this to 1 for the first character of the new line
}
// Step 3.3: Move forward one character
// advance() updates position, currentChar, and column.
// For newlines, we already updated line above and set column so that advance() lands on column 1.
this.advance();
}
// When loop exits, currentChar is either null (EOF) or a non-whitespace character.
}
Why Skip Whitespace:
Whitespace doesn't create tokens - it just separates them. shape circle c1 has whitespace between tokens, but we don't want whitespace tokens. We skip all whitespace and continue to the next meaningful character.
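An alternative design worth knowing about: instead of special-casing newlines in skipWhitespace(), advance() itself can update line and column based on the character it consumes. Then every consumed newline is counted, including ones inside string literals, which the skipWhitespace()-only approach misses. PositionTracker below is a hypothetical, cut-down lexer used only to show this bookkeeping.

```javascript
// Hypothetical cut-down lexer showing only position bookkeeping
class PositionTracker {
  constructor(input) {
    this.input = input;
    this.position = 0;
    this.line = 1;
    this.column = 1;
    this.currentChar = input.length > 0 ? input[0] : null;
  }

  // Consume the current character, updating line/column based on what it was
  advance() {
    if (this.currentChar === '\n') {
      this.line++;     // the next character starts a new line...
      this.column = 1; // ...at column 1
    } else {
      this.column++;
    }
    this.position++;
    this.currentChar = this.position < this.input.length ? this.input[this.position] : null;
  }

  // No newline special case needed here any more
  skipWhitespace() {
    while (this.currentChar !== null && /\s/.test(this.currentChar)) {
      this.advance();
    }
  }
}

const tracker = new PositionTracker('  \n  shape');
tracker.skipWhitespace();
console.log(tracker.currentChar, tracker.line, tracker.column); // s 2 3
```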
Step 4: Build the skipComment() Method
This skips single-line comments (// style):
skipComment() {
// Step 4.1: Skip the first '/' character
// We already know currentChar is '/' (checked in getNextToken).
// We need to consume it.
this.advance(); // Skip first /
// Step 4.2: Skip the second '/' character
// peek() already confirmed the next char is '/'.
// Now we consume it.
this.advance(); // Skip second /
// Step 4.3: Skip everything until newline or EOF
// Comments run until the end of the line.
// We loop until we hit '\n' (newline) or null (EOF).
while (this.currentChar !== null && this.currentChar !== '\n') {
this.advance(); // Skip each character in the comment
}
// When loop exits, we're at the newline (or EOF).
// The newline will be handled by skipWhitespace() if called next.
}
Why Comments Need Special Handling: Comments are like whitespace - they don't create tokens. But they're more complex because they can span multiple characters. We need to skip everything until the end of the line.
Building These Methods Step by Step:
- Start with advance(): it's the foundation
- Add peek(): needed for lookahead
- Add skipWhitespace(): needed to skip spaces
- Add skipComment(): needed to skip comments
skipWhitespace() and skipComment() both call advance() to move forward; peek() only reads ahead and never moves.
Building the Identifier and Keyword Reader From Scratch
When you see a letter or underscore, you need to read until you hit something that can't be part of an identifier. This method handles both regular identifiers (like variable names) and keywords (like "shape", "param").
How to Build It Step by Step:
Step 1: Create the Method Structure
Start with an empty method that will accumulate characters:
identifier() {
let result = '';
// We'll build the identifier string character by character
// result starts empty and we'll append characters to it
}
Step 2: Read Valid Identifier Characters
Identifiers can contain letters (a-z, A-Z), digits (0-9), and underscores (_). Keep reading while the current character matches this pattern:
identifier() {
let result = '';
// Keep reading while it's a valid identifier character
// /[a-zA-Z0-9_]/ matches: letters (upper or lower), digits, or underscore
// The loop continues until we hit a character that's not valid for identifiers
// (like a space, operator, punctuation, etc.)
while (this.currentChar && /[a-zA-Z0-9_]/.test(this.currentChar)) {
result += this.currentChar; // Append current character to result
this.advance(); // Move to next character
}
// When loop exits, we've read the complete identifier
// currentChar is now something that can't be part of an identifier
}
Why This Pattern Works:
- We start with an empty string
- Each iteration adds one character and moves forward
- The loop stops when we hit an invalid character (space, operator, etc.)
- At the end, result contains the complete identifier
Step 3: Check if It's a Keyword
After reading the identifier, check if it matches a keyword. Keywords are special identifiers that have meaning in the language:
identifier() {
let result = '';
// Read the identifier (steps 1-2 above)
while (this.currentChar && /[a-zA-Z0-9_]/.test(this.currentChar)) {
result += this.currentChar;
this.advance();
}
// Check if it's a keyword
// Keywords are special identifiers that have language meaning
const keywords = {
'shape': 'SHAPE', // Keyword for creating shapes
'param': 'PARAM', // Keyword for defining parameters
'if': 'IF', // Keyword for conditional statements
'for': 'FOR', // Keyword for loops
// ... add all keywords your language supports
};
// Determine token type
// If it's a keyword, return the keyword token type
// Otherwise, it's a regular identifier
const lowerResult = result.toLowerCase(); // Convert to lowercase for comparison
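const tokenType = Object.hasOwn(keywords, lowerResult) ? keywords[lowerResult] : 'IDENTIFIER';
// If lowerResult is one of our own keys, use that token type.
// Otherwise, default to 'IDENTIFIER'. (Object.hasOwn ignores inherited
// keys, so a name like "constructor" can't match Object.prototype.)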
return new Token(tokenType, result, this.line, this.column);
}
Why Case-Insensitive for Keywords:
Users type SHAPE, Shape, shape - they should all work. The language should be forgiving about case for keywords. But regular identifiers like myShape vs myshape are different (case-sensitive). This gives flexibility: keywords work in any case, but variable names are case-sensitive.
The Complete Method:
identifier() {
// Record where the identifier starts; advance() moves past it below
const startLine = this.line;
const startColumn = this.column;
let result = '';
// Keep reading while it's a valid identifier character
while (this.currentChar && /[a-zA-Z0-9_]/.test(this.currentChar)) {
result += this.currentChar;
this.advance();
}
// Check if it's a keyword
const keywords = {
'shape': 'SHAPE',
'param': 'PARAM',
'if': 'IF',
'for': 'FOR',
// ... all keywords
};
// Case-insensitive check; Object.hasOwn ignores inherited keys like "constructor"
const lowerResult = result.toLowerCase();
const tokenType = Object.hasOwn(keywords, lowerResult) ? keywords[lowerResult] : 'IDENTIFIER';
return new Token(tokenType, result, startLine, startColumn);
}
Why Read First, Then Check: This approach is simpler than trying to match keywords as you go. By reading the whole identifier first, you can then do a simple dictionary lookup. If you tried to match keywords character-by-character, you'd need complex state machines and backtracking. This way is cleaner and easier to extend with new keywords.
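Here is the read-first-then-check idea on its own: a hypothetical classifyWord function that takes a complete word and does a single lookup. This sketch uses a Map rather than a plain object, which is a deliberate swap: a plain-object lookup such as keywords["constructor"] finds the inherited Object.prototype.constructor function instead of undefined, so a name like constructor or toString would produce a nonsense token type. Map.get only sees entries that were added explicitly.

```javascript
// Keyword table (keys lowercase, because lookups lowercase the word first)
const KEYWORDS = new Map([
  ['shape', 'SHAPE'],
  ['param', 'PARAM'],
  ['if', 'IF'],
  ['for', 'FOR'],
  // ... add all keywords your language supports
]);

// Classify a complete identifier: keyword lookup is case-insensitive,
// everything else is a plain IDENTIFIER.
function classifyWord(word) {
  return KEYWORDS.get(word.toLowerCase()) ?? 'IDENTIFIER';
}

console.log(classifyWord('Shape'));       // SHAPE
console.log(classifyWord('c1'));          // IDENTIFIER
console.log(classifyWord('constructor')); // IDENTIFIER
```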
Building This Method:
- Create identifier() method
- Add empty result string
- Add while loop that reads valid identifier characters
- Add keyword dictionary
- Add case-insensitive keyword lookup
- Return token with appropriate type
Building the Number Reader From Scratch
Numbers can be integers (50) or decimals (3.14). You need to read all digits, handle decimal points, and convert the string to an actual number.
How to Build It Step by Step:
Step 1: Create the Method and Initialize
Start with an empty string to accumulate digits:
number() {
let result = '';
// We'll build the number string digit by digit
// Then convert it to an actual number at the end
}
Step 2: Read Integer Part (Digits Before Decimal Point)
Read all consecutive digits. This handles both integers and the integer part of decimals:
number() {
let result = '';
// Read digits
// /\d/ matches any digit (0-9)
// Keep reading while we see digits
while (this.currentChar && /\d/.test(this.currentChar)) {
result += this.currentChar; // Append digit to result
this.advance(); // Move to next character
}
// When loop exits, we've read all consecutive digits
// currentChar is now either a decimal point, or something else
}
Why Read Digits First:
This approach handles both integers (50) and decimals (3.14) with the same initial logic. The first loop reads the whole number part (before decimal), then we check if there's more.
Step 3: Check for Decimal Point
After reading digits, check if there's a decimal point. If yes, read the fractional part:
number() {
let result = '';
// Read integer part (step 2)
while (this.currentChar && /\d/.test(this.currentChar)) {
result += this.currentChar;
this.advance();
}
// Check for decimal point
// If currentChar is '.', we have a decimal number
if (this.currentChar === '.') {
result += '.'; // Add decimal point to result
this.advance(); // Move past the decimal point
// Read fractional digits (after decimal point)
// Same pattern as integer part
while (this.currentChar && /\d/.test(this.currentChar)) {
result += this.currentChar;
this.advance();
}
// Now we have the complete decimal number in result
}
// If no decimal point, result already contains the integer
}
Why This Order Works:
- Read integer digits first (handles 50 and the 3 in 3.14)
- Then check for decimal point
- If decimal point exists, read fractional digits (the 14 in 3.14)
- This handles both cases with the same code structure
Step 4: Convert String to Number
After reading the number string, convert it to an actual JavaScript number:
number() {
let result = '';
// Read integer part
while (this.currentChar && /\d/.test(this.currentChar)) {
result += this.currentChar;
this.advance();
}
// Check for decimal point and read fractional part
if (this.currentChar === '.') {
result += '.';
this.advance();
while (this.currentChar && /\d/.test(this.currentChar)) {
result += this.currentChar;
this.advance();
}
}
// Convert to actual number
// parseFloat() converts string to number
// '50' becomes 50, '3.14' becomes 3.14
const numValue = parseFloat(result);
// Return token with number value (not string)
return new Token('NUMBER', numValue, this.line, this.column);
}
Why Convert to Number:
The token value must be a number (not a string) so the parser and interpreter can do math with it. parseFloat('50') returns the number 50, not the string '50'. This is essential for arithmetic operations later.
The Complete Method:
number() {
// Record where the number starts; advance() moves past it below
const startLine = this.line;
const startColumn = this.column;
let result = '';
// Read digits
while (this.currentChar && /\d/.test(this.currentChar)) {
result += this.currentChar;
this.advance();
}
// Check for decimal point
if (this.currentChar === '.') {
result += '.';
this.advance();
// Read more digits after decimal
while (this.currentChar && /\d/.test(this.currentChar)) {
result += this.currentChar;
this.advance();
}
}
// Convert to actual number
const numValue = parseFloat(result);
return new Token('NUMBER', numValue, startLine, startColumn);
}
Why This Pattern Works:
- We stop reading digits when we hit a non-digit character (space, operator, etc.)
- This naturally handles both integers and decimals
- parseFloat() handles the conversion automatically
- The token value is a number, ready for arithmetic operations
Building This Method:
- Create number() method with empty result string
- Add while loop to read integer digits
- Add check for decimal point
- If decimal point exists, add it to result and read fractional digits
- Convert result string to number using parseFloat()
- Return NUMBER token with numeric value
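Why not handle negative numbers here: Whether - is a negative sign or a subtraction operator depends on what came before it, and the lexer doesn't track that context. If it read -50 as one token, then x-5 would lex as IDENTIFIER followed by NUMBER(-5), and the subtraction would silently disappear. Instead, -50 becomes two tokens: MINUS followed by NUMBER(50). The parser then treats the MINUS as a unary operator when it appears where an operand is expected. This separation keeps the lexer simple: it just tokenizes, and leaves grammar decisions like unary vs. binary minus (and operator precedence) to the parser.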
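To see how the parser can take over, here is a small sketch that handles only minus. parseExpression and the node types negate, subtract, number, and identifier are hypothetical, not the parser built later in this document. A MINUS where an operand is expected becomes unary negation; a MINUS after a complete operand becomes subtraction. Token arrays are written by hand and, like the lexer's output, must end with an EOF token.

```javascript
// Hypothetical sketch: parse unary and binary minus from a token array.
//   expression := unary (MINUS unary)*
//   unary      := MINUS unary | NUMBER | IDENTIFIER
function parseExpression(tokens) {
  let pos = 0;
  const current = () => tokens[pos];

  function parseUnary() {
    const token = tokens[pos++];
    if (token.type === 'MINUS') {
      // MINUS where an operand is expected: unary negation
      return { type: 'negate', operand: parseUnary() };
    }
    if (token.type === 'NUMBER') return { type: 'number', value: token.value };
    if (token.type === 'IDENTIFIER') return { type: 'identifier', name: token.value };
    throw new Error(`Unexpected token ${token.type}`);
  }

  let left = parseUnary();
  // MINUS after a complete operand: binary subtraction
  while (current().type === 'MINUS') {
    pos++;
    left = { type: 'subtract', left, right: parseUnary() };
  }
  return left;
}

// -50 lexes as MINUS NUMBER(50)
const negative = [{ type: 'MINUS' }, { type: 'NUMBER', value: 50 }, { type: 'EOF' }];
console.log(JSON.stringify(parseExpression(negative)));
// {"type":"negate","operand":{"type":"number","value":50}}
```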
Building the String Parser From Scratch
Strings are text between double quotes and can contain escape sequences like \n for newline. You need to read everything between the quotes and handle escape sequences properly.
How to Build It Step by Step:
Step 1: Skip the Opening Quote
When you see a ", you know a string is starting. Skip past it:
parseString() {
let result = '';
this.advance(); // Skip opening quote
// We've already seen the opening quote in getNextToken()
// Now we need to read everything until the closing quote
}
Step 2: Read Characters Until Closing Quote
Loop through characters, stopping when you hit the closing quote or end of file:
parseString() {
let result = '';
this.advance(); // Skip opening quote
// Read characters until closing quote or EOF
// Loop continues while currentChar is not null (EOF) and not '"' (closing quote)
while (this.currentChar !== null && this.currentChar !== '"') {
// We'll handle the character here
result += this.currentChar;
this.advance();
}
// When loop exits, we've either hit the closing quote or EOF
}
Step 3: Handle Escape Sequences
When you see a backslash, the next character is special. Handle escape sequences:
parseString() {
let result = '';
this.advance(); // Skip opening quote
while (this.currentChar !== null && this.currentChar !== '"') {
// Check if this is an escape sequence
if (this.currentChar === '\\') {
this.advance(); // Skip the backslash
// Handle escape sequences
// The character after backslash tells us what to do
if (this.currentChar === 'n') {
result += '\n'; // Newline character
} else if (this.currentChar === 't') {
result += '\t'; // Tab character
} else if (this.currentChar === '"') {
result += '"'; // Escaped quote (literal quote in string)
} else if (this.currentChar === '\\') {
result += '\\'; // Escaped backslash (literal backslash)
} else {
// Unknown escape sequence - just use the character as-is
result += this.currentChar;
}
this.advance(); // Move past the escape sequence character
} else {
// Normal character - just add it
result += this.currentChar;
this.advance();
}
}
}
Why Escape Sequences Matter:
- \n becomes a newline character (ASCII 10), which allows multi-line strings
- \t becomes a tab character (ASCII 9), which allows indentation in strings
- \" becomes a literal quote, which allows quotes inside strings
- \\ becomes a literal backslash, which allows backslashes in strings
Step 4: Validate Closing Quote
After the loop, check that we actually found a closing quote:
parseString() {
let result = '';
this.advance(); // Skip opening quote
// Read characters and handle escapes (steps 2-3)
while (this.currentChar !== null && this.currentChar !== '"') {
if (this.currentChar === '\\') {
this.advance();
// Handle escape sequences...
if (this.currentChar === 'n') {
result += '\n';
} // ... etc
this.advance();
} else {
result += this.currentChar;
this.advance();
}
}
// Validate we found closing quote
if (this.currentChar === '"') {
this.advance(); // Skip closing quote
} else {
// We hit EOF before finding closing quote - error!
this.error('Unterminated string literal');
}
return new Token('STRING', result, this.line, this.column);
}
Why Check for Closing Quote:
The loop exits when currentChar is null (EOF) or '"' (closing quote). If it's null, the string was never closed - that's an error. If it's '"', we successfully found the closing quote and can continue.
The Complete Method:
parseString() {
// Record where the string starts (at the opening quote)
const startLine = this.line;
const startColumn = this.column;
let result = '';
this.advance(); // Skip opening quote
while (this.currentChar !== null && this.currentChar !== '"') {
if (this.currentChar === '\\') {
// Escape sequence
this.advance(); // Skip the backslash
if (this.currentChar === 'n') {
result += '\n';
} else if (this.currentChar === 't') {
result += '\t';
} else if (this.currentChar === '"') {
result += '"'; // Escaped quote
} else if (this.currentChar === '\\') {
result += '\\'; // Escaped backslash
} else {
result += this.currentChar; // Unknown escape, just use the char
}
this.advance();
} else {
result += this.currentChar;
this.advance();
}
}
if (this.currentChar === '"') {
this.advance(); // Skip closing quote
} else {
this.error('Unterminated string literal');
}
return new Token('STRING', result, startLine, startColumn);
}
How Escape Sequences Work:
When we see a backslash, we know the next character is special. We skip the backslash, check what follows, and append the character it stands for. The backslash acts as an escape character: it tells the lexer "the next character has special meaning, don't treat it literally." The loop stops at either null (end of input) or '"' (closing quote), and the check after the loop tells the two apart. If the current character is a quote, we consume it; if it's null, the input ended before the string was closed, so we report an unterminated string literal.
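The escape rules can also be written as a lookup table instead of an if/else chain, which turns adding a new escape into a one-line change. The sketch below is a hypothetical standalone function, decodeEscapes, that works on the text between the quotes rather than being a Lexer method. It keeps the same rules as parseString(): the four known escapes are translated, and an unknown escape keeps the character after the backslash. Rejecting a backslash at the very end with its own message is our addition.

```javascript
// Escape character -> the character it stands for
const ESCAPES = new Map([
  ['n', '\n'],
  ['t', '\t'],
  ['"', '"'],
  ['\\', '\\'],
]);

// Decode escape sequences in the body of a string literal (no surrounding quotes)
function decodeEscapes(body) {
  let result = '';
  for (let i = 0; i < body.length; i++) {
    if (body[i] !== '\\') {
      result += body[i]; // ordinary character
      continue;
    }
    i++; // skip the backslash
    if (i >= body.length) {
      throw new Error('Backslash at end of string literal');
    }
    // Known escape -> mapped character; unknown escape -> the character itself
    result += ESCAPES.get(body[i]) ?? body[i];
  }
  return result;
}

console.log(decodeEscapes('say \\"hi\\"')); // say "hi"
```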
Building This Method:
- Create parseString() method
- Skip opening quote with advance()
- Add while loop that continues until closing quote or EOF
- Inside loop, check for backslash (escape sequence)
- If backslash, handle escape sequences (\n, \t, \", \\)
- If normal character, add it to result
- After loop, validate closing quote exists
- Return STRING token with the parsed string value
Building the Hex Color Parser From Scratch
Hex colors start with # and can be 3, 4, 6, or 8 hex digits. You need to read the hex digits and validate the format.
How to Build It Step by Step:
Step 1: Start with the Hash Symbol
Hex colors always start with #. We've already seen it in getNextToken(), so skip past it:
parseHexColor() {
let result = '#';
this.advance(); // Skip the #
// We start result with '#' because hex colors include it
// Now we need to read the hex digits
}
Step 2: Read Hex Digits
Read all consecutive hexadecimal digits (0-9, a-f, A-F):
parseHexColor() {
let result = '#';
this.advance(); // Skip the #
// Read hex digits
// /[0-9a-fA-F]/ matches any hexadecimal digit
// Keep reading while we see valid hex characters
while (this.currentChar && /[0-9a-fA-F]/.test(this.currentChar)) {
result += this.currentChar; // Append hex digit to result
this.advance(); // Move to next character
}
// When loop exits, we've read all hex digits
// currentChar is now something that's not a hex digit
}
Why Hex Digits: Hexadecimal is base-16, so its digits are 0-9 and A-F (or a-f), each representing a value from 0 to 15. Two hex digits cover 0-255 (00-FF in hex), exactly the range of one 8-bit color channel, which is why each channel in a hex color takes two digits.
Step 3: Validate the Length
After reading digits, check that the length is valid. Hex colors must be 3, 4, 6, or 8 digits (not counting the #):
parseHexColor() {
let result = '#';
this.advance(); // Skip the #
// Read hex digits (step 2)
while (this.currentChar && /[0-9a-fA-F]/.test(this.currentChar)) {
result += this.currentChar;
this.advance();
}
// Validate length
// result includes the '#', so we subtract 1 to get digit count
const hexLength = result.length - 1; // Minus the #
// Valid lengths: 3, 4, 6, or 8 digits
if (hexLength === 3 || hexLength === 4 || hexLength === 6 || hexLength === 8) {
return new Token('HEXCOLOR', result, this.line, this.column);
} else {
// Invalid length - error!
this.error(`Invalid hex color format: ${result}`);
}
}
Why Validate Length:
- #FF (2 digits) is invalid: too short
- #FFF (3 digits) is valid: RGB shorthand for #FFFFFF
- #FFFF (4 digits) is valid: RGBA shorthand with alpha
- #FFFFFF (6 digits) is valid: full RGB
- #FFFFFFFF (8 digits) is valid: full RGBA with alpha
- Other lengths are invalid
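The text says #FFF expands to #FFFFFF but does not show where; presumably that happens after lexing, when a HEXCOLOR token's value is turned into an actual color. As a hypothetical sketch of that step (hexToRgba and the 0-255-per-channel representation are our assumptions), each shorthand digit is doubled and every pair of digits becomes one channel:

```javascript
// Hypothetical helper: turn a validated HEXCOLOR value into 0-255 channels
function hexToRgba(hex) {
  let digits = hex.slice(1); // drop the leading '#'
  if (digits.length === 3 || digits.length === 4) {
    // Shorthand: double each digit, so #F08 -> #FF0088
    digits = [...digits].map((d) => d + d).join('');
  }
  if (digits.length !== 6 && digits.length !== 8) {
    throw new Error(`Invalid hex color format: ${hex}`);
  }
  // Two hex digits per channel, parsed as base 16
  const channel = (i) => parseInt(digits.slice(i, i + 2), 16);
  return {
    r: channel(0),
    g: channel(2),
    b: channel(4),
    a: digits.length === 8 ? channel(6) : 255, // no alpha given: fully opaque
  };
}

console.log(hexToRgba('#F008')); // { r: 255, g: 0, b: 0, a: 136 }
```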
The Complete Method:
parseHexColor() {
// Record where the color starts (at the '#')
const startLine = this.line;
const startColumn = this.column;
let result = '#';
this.advance(); // Skip the #
// Read hex digits
while (this.currentChar && /[0-9a-fA-F]/.test(this.currentChar)) {
result += this.currentChar;
this.advance();
}
// Validate length
const hexLength = result.length - 1; // Minus the #
if (hexLength === 3 || hexLength === 4 || hexLength === 6 || hexLength === 8) {
return new Token('HEXCOLOR', result, startLine, startColumn);
} else {
this.error(`Invalid hex color format: ${result}`);
}
}
Why Validate:
Rejecting malformed colors such as #FF or #FFFFF in the lexer means the error is reported immediately, together with the line and column where the bad color appears.
Building This Method:
- Create parseHexColor() method
- Start result with '#' and skip past it
- Add while loop to read hex digits (0-9, a-f, A-F)
- Calculate hex length (result.length - 1, excluding the #)
- Validate length is 3, 4, 6, or 8
- Return HEXCOLOR token or error if invalid
Handling Operators and Punctuation
Single characters are straightforward:
// In getNextToken(), after checking for identifiers, numbers, etc.
// Record where the token starts: advance() moves past it before we return
const startLine = this.line;
const startColumn = this.column;
switch (this.currentChar) {
case '{':
this.advance();
return new Token('LBRACE', '{', startLine, startColumn);
case '}':
this.advance();
return new Token('RBRACE', '}', startLine, startColumn);
case ':':
this.advance();
return new Token('COLON', ':', startLine, startColumn);
case ',':
this.advance();
return new Token('COMMA', ',', startLine, startColumn);
// ... etc
}
Multi-character operators need special handling:
// Check for == before =
if (this.currentChar === '=' && this.peek() === '=') {
// Create the token first so it records where the first '=' is
const token = new Token('EQUALS', '==', this.line, this.column);
this.advance(); // Skip first =
this.advance(); // Skip second =
return token;
}
if (this.currentChar === '=') {
const token = new Token('ASSIGN', '=', this.line, this.column);
this.advance();
return token;
}
Order matters: Check == before =, otherwise == becomes two ASSIGN tokens.
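Once there are more than a few operators, a table can replace the hand-written checks. In the sketch below, OPERATORS and matchOperator are hypothetical names; the token types are the ones used in this section plus MINUS from the negative-number discussion. Sorting the table longest-first guarantees == is tried before =, which is the usual "longest match" (maximal munch) rule applied mechanically instead of by remembering to order the if statements.

```javascript
// Operator text -> token type, sorted so longer operators are tried first
const OPERATORS = [
  ['{', 'LBRACE'],
  ['}', 'RBRACE'],
  [':', 'COLON'],
  [',', 'COMMA'],
  ['=', 'ASSIGN'],
  ['==', 'EQUALS'],
  ['-', 'MINUS'],
].sort((a, b) => b[0].length - a[0].length);

// Return the operator starting at `position`, or null if there isn't one
function matchOperator(input, position) {
  for (const [text, type] of OPERATORS) {
    if (input.startsWith(text, position)) {
      return { type, value: text, length: text.length };
    }
  }
  return null;
}

console.log(matchOperator('a == b', 2)); // { type: 'EQUALS', value: '==', length: 2 }
```

In a lexer, the caller would record the start position, call advance() length times, and build the token from type and value.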
Error Handling
When something goes wrong, throw an error with position info:
error(message) {
throw new Error(`Lexer error at line ${this.line}, col ${this.column}: ${message}`);
}
Why include position? "Error: Unexpected character" is useless. "Error at line 5, col 12: Unexpected character '&'" is helpful.
Running the lexer on a small input and calling getNextToken() until it returns an EOF token should give you an array of tokens. If it doesn't, you've got a bug.
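For a quick end-to-end check, here is a compact, self-contained sketch of the lexer from this section, trimmed to what the running example needs (strings and hex colors are left out). Two simplifications are our own, not the structure built above: a hypothetical readWhile helper replaces the repeated while loops, and newline counting lives in advance(). The tokenize function keeps calling getNextToken() until EOF and returns the array.

```javascript
class Token {
  constructor(type, value, line, column) {
    this.type = type;
    this.value = value;
    this.line = line;
    this.column = column;
  }
}

const KEYWORDS = new Map([['shape', 'SHAPE'], ['param', 'PARAM'], ['if', 'IF'], ['for', 'FOR']]);
const PUNCTUATION = new Map([['{', 'LBRACE'], ['}', 'RBRACE'], [':', 'COLON'], [',', 'COMMA']]);

class Lexer {
  constructor(input) {
    this.input = input;
    this.position = 0;
    this.line = 1;
    this.column = 1;
    this.currentChar = input.length > 0 ? input[0] : null;
  }

  // Consume one character; newline bookkeeping happens here
  advance() {
    if (this.currentChar === '\n') {
      this.line++;
      this.column = 1;
    } else {
      this.column++;
    }
    this.position++;
    this.currentChar = this.position < this.input.length ? this.input[this.position] : null;
  }

  peek() {
    return this.position + 1 < this.input.length ? this.input[this.position + 1] : null;
  }

  error(message) {
    throw new Error(`Lexer error at line ${this.line}, col ${this.column}: ${message}`);
  }

  // Consume characters while `test` accepts them; return what was consumed
  readWhile(test) {
    let result = '';
    while (this.currentChar !== null && test(this.currentChar)) {
      result += this.currentChar;
      this.advance();
    }
    return result;
  }

  getNextToken() {
    while (this.currentChar !== null) {
      if (/\s/.test(this.currentChar)) {
        this.readWhile((c) => /\s/.test(c)); // whitespace: no token
        continue;
      }
      if (this.currentChar === '/' && this.peek() === '/') {
        this.readWhile((c) => c !== '\n'); // comment: skip to end of line
        continue;
      }
      // Remember where the token starts before consuming it
      const line = this.line;
      const column = this.column;
      if (/\d/.test(this.currentChar)) {
        let text = this.readWhile((c) => /\d/.test(c));
        if (this.currentChar === '.') {
          this.advance();
          text += '.' + this.readWhile((c) => /\d/.test(c));
        }
        return new Token('NUMBER', parseFloat(text), line, column);
      }
      if (/[a-zA-Z_]/.test(this.currentChar)) {
        const word = this.readWhile((c) => /[a-zA-Z0-9_]/.test(c));
        return new Token(KEYWORDS.get(word.toLowerCase()) ?? 'IDENTIFIER', word, line, column);
      }
      if (PUNCTUATION.has(this.currentChar)) {
        const symbol = this.currentChar;
        this.advance();
        return new Token(PUNCTUATION.get(symbol), symbol, line, column);
      }
      this.error(`Unexpected character '${this.currentChar}'`);
    }
    return new Token('EOF', null, this.line, this.column);
  }
}

// Call getNextToken() until EOF, collecting every token (EOF included)
function tokenize(input) {
  const lexer = new Lexer(input);
  const tokens = [];
  let token;
  do {
    token = lexer.getNextToken();
    tokens.push(token);
  } while (token.type !== 'EOF');
  return tokens;
}

console.log(tokenize('shape circle c1 { radius: 50 }').map((t) => t.type).join(' '));
// SHAPE IDENTIFIER IDENTIFIER LBRACE IDENTIFIER COLON NUMBER RBRACE EOF
```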
Common Issues
Tokenizing stops early (or hangs):
- Check that you're calling advance() after reading each character
- Make sure you're not skipping valid characters
Keywords not recognized:
- Check that the keys in the keywords object are lowercase (the lookup lowercases the word first)
- Verify the case-insensitive comparison works
Numbers parsed wrong:
- Make sure you're using parseFloat(), not keeping as string
- Check decimal point handling
Strings break:
- Verify escape sequence handling
- Check that you're consuming the closing quote
The lexer is the simplest part. Get this right, and the parser becomes much easier.
The Token Class
Every token has:
- type: what kind of token (IDENTIFIER, NUMBER, SHAPE, etc.)
- value: the actual content ("circle", 50, etc.)
- line and column: where it came from (for error messages)
The line/column tracking is crucial. When something breaks, you need to tell the user where.
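The lexer code calls new Token(type, value, line, column) throughout but never shows the class itself. A minimal version consistent with those calls could look like this; the toString() method is our own addition for debugging output, not something the text specifies.

```javascript
class Token {
  constructor(type, value, line, column) {
    this.type = type;     // e.g. 'IDENTIFIER', 'NUMBER', 'SHAPE'
    this.value = value;   // e.g. 'circle', 50 (null for EOF)
    this.line = line;     // 1-based line where the token appears
    this.column = column; // 1-based column where the token appears
  }

  // Readable form for debugging, e.g. NUMBER(50) at 1:27
  toString() {
    const shown = this.value === null ? '' : `(${JSON.stringify(this.value)})`;
    return `${this.type}${shown} at ${this.line}:${this.column}`;
  }
}

console.log(String(new Token('IDENTIFIER', 'circle', 1, 7))); // IDENTIFIER("circle") at 1:7
```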
Keywords
All the reserved words are in a big object in identifier(). When you see a letter, the lexer reads until it can't anymore, then checks that object. If it finds a match, it's a keyword token. Otherwise, it's an IDENTIFIER.
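Gotcha: The keyword check is case-insensitive (result.toLowerCase()), but the actual token value keeps the original casing: Shape produces a SHAPE token whose value is "Shape". Later stages should therefore compare token types, not token values, when looking for keywords.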
Numbers
Numbers are straightforward - read digits, maybe a decimal point, maybe more digits. The lexer converts the string to an actual number using parseFloat().
Gotcha: Negative numbers aren't handled in the lexer. -50 becomes two tokens: MINUS and NUMBER(50). The parser handles the negation. This is actually fine - it keeps the lexer simple.
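A sketch of the number reader as described: digits, an optional decimal point, more digits, then parseFloat(). The function name and its { value, end } return shape are assumptions for illustration; the real lexer advances its own position instead.

```javascript
const isDigit = (ch) => ch >= '0' && ch <= '9';

// Read a number starting at `start`; returns its value and the index just past it.
function readNumber(input, start) {
  let end = start;
  while (end < input.length && isDigit(input[end])) end++;
  if (input[end] === '.') {
    end++; // consume the decimal point
    while (end < input.length && isDigit(input[end])) end++;
  }
  return { value: parseFloat(input.slice(start, end)), end };
}

console.log(readNumber('radius: 12.5 }', 8)); // { value: 12.5, end: 12 }

// For '-50', the '-' is lexed as MINUS and readNumber() starts at the first digit.
console.log(readNumber('-50', 1)); // { value: 50, end: 3 }
```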
Strings and Colors
Strings are between double quotes. The lexer handles escape sequences (\n, \", etc.).
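A sketch of a string reader with escape handling. The supported escapes (\n, \t, \" and \\) are an assumption; any other character after a backslash is kept literally. It throws when the closing quote is missing, the failure listed under Common Issues.

```javascript
// Read a double-quoted string whose opening quote is at `start`.
function readString(input, start) {
  const escapes = { n: '\n', t: '\t', '"': '"', '\\': '\\' };
  let pos = start + 1; // skip the opening quote
  let result = '';
  while (pos < input.length && input[pos] !== '"') {
    if (input[pos] === '\\') {
      pos++; // skip the backslash
      if (pos >= input.length) break;
      result += escapes[input[pos]] ?? input[pos];
    } else {
      result += input[pos];
    }
    pos++;
  }
  if (input[pos] !== '"') {
    throw new Error(`Unterminated string starting at index ${start}`);
  }
  return { value: result, end: pos + 1 }; // end is just past the closing quote
}

console.log(readString('"say \\"hi\\""', 0)); // { value: 'say "hi"', end: 12 }
```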
Hex colors start with # and can be 3, 4, 6, or 8 hex digits (the 4 and 8 include alpha). The lexer validates the length - if it's wrong, it throws an error.
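A sketch of the length check. readHexColor() is a hypothetical helper and the error wording is illustrative.

```javascript
const isHexDigit = (ch) => /^[0-9a-fA-F]$/.test(ch);

// Read '#' plus hex digits starting at `start` and validate the digit count.
// 3 and 6 digits are RGB; 4 and 8 digits include an alpha channel.
function readHexColor(input, start) {
  let end = start + 1; // skip '#'
  while (end < input.length && isHexDigit(input[end])) end++;
  const digits = input.slice(start + 1, end);
  if (![3, 4, 6, 8].includes(digits.length)) {
    throw new Error(`Invalid hex color '#${digits}': expected 3, 4, 6, or 8 hex digits`);
  }
  return { value: '#' + digits, end };
}

console.log(readHexColor('#FF0000 }', 0)); // { value: '#FF0000', end: 7 }
```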
There's also a COLORNAME token type for named colors like "red", "blue", "gray" (or "grey" - we support both spellings because people are inconsistent).
Adding New Keywords
Want to add a new keyword? Three steps:
- Add it to the keywords object in identifier():
const keywords = {
// ... existing stuff
'mykeyword': 'MYKEYWORD',
};
- The parser needs to handle MYKEYWORD tokens (see parser section)
- The interpreter needs to do something with it (see interpreter section)
That's it. The lexer part is the easiest.
Building the Parser From Scratch
The parser takes tokens and builds an Abstract Syntax Tree (AST). This is where we figure out what the code actually means. Let's build it step by step.
What the Parser Does
The parser converts a flat list of tokens into a tree structure. For example, shape circle c1 { radius: 50 } becomes:
{
type: 'shape',
shapeType: 'circle',
name: 'c1',
params: {
radius: { type: 'number', value: 50 }
}
}
The interpreter doesn't care about the original text - it only works with AST nodes. The AST represents the structure of the program.
Building the Basic Parser Structure From Scratch
You need a class that holds the lexer and tracks the current token.
How to Build It Step by Step:
Step 1: Create the Parser Class Start with a class that holds the lexer and initializes the first token:
export class Parser {
constructor(lexer) {
// Step 1.1: Store the lexer reference
// The parser needs the lexer to get tokens
this.lexer = lexer;
// Step 1.2: Get the first token (look ahead one token)
// We need to start with one token already loaded
// This is called "lookahead" - we always have one token ahead
// This is LL(1) parsing - we only need to look at one token to decide what to do next
this.currentToken = this.lexer.getNextToken();
}
}
Why Load First Token: We need to know what token we're currently looking at before we can parse. By loading the first token in the constructor, we're ready to parse immediately. This is a common pattern in recursive descent parsers. LL(1) means "Left-to-right, Leftmost derivation, 1 token lookahead" - we only need to peek at one token to decide what to parse next.
Step 2: Build the eat() Method
This is the core method - it consumes a token if it matches what you expect:
eat(tokenType) {
// Step 2.1: Check if current token matches expected type
// If it matches, we can consume it
if (this.currentToken.type === tokenType) {
// Step 2.2: Save the token (in case caller needs it)
const token = this.currentToken;
// Step 2.3: Consume the token and get the next one
// Move forward by getting the next token from the lexer
this.currentToken = this.lexer.getNextToken(); // Move to next token
// Step 2.4: Return the consumed token
// Some parsing methods need the token value
return token;
} else {
// Step 2.5: Token doesn't match - error!
// This is a syntax error - the code doesn't match the grammar
this.error(`Expected ${tokenType} but got ${this.currentToken.type}`);
}
}
Why eat() is Essential:
This is the core of the parser. It checks if the current token matches what we expect, consumes it, and moves forward. If it doesn't match, we throw an error. This is called "consuming" a token. This pattern makes parsing code cleaner - instead of checking and advancing everywhere, you just call eat('SHAPE') and it handles both the check and the advancement.
Step 3: Build the Error Method When parsing fails, you need to report an error with position information:
error(message) {
// Throw an error with position information
// Include line and column from the current token for better error messages
throw new Error(`Parser error at line ${this.currentToken.line}, col ${this.currentToken.column}: ${message}`);
}
Why Include Position: Error messages are much more helpful when they include where the error occurred. Users can quickly find and fix the problem.
The Complete Basic Structure:
export class Parser {
constructor(lexer) {
this.lexer = lexer;
this.currentToken = this.lexer.getNextToken(); // Look ahead one token
}
error(message) {
throw new Error(`Parser error at line ${this.currentToken.line}, col ${this.currentToken.column}: ${message}`);
}
eat(tokenType) {
if (this.currentToken.type === tokenType) {
const token = this.currentToken;
this.currentToken = this.lexer.getNextToken(); // Move to next token
return token;
} else {
this.error(`Expected ${tokenType} but got ${this.currentToken.type}`);
}
}
}
How It Works:
- The constructor stores the lexer and loads the first token (lookahead)
- eat() consumes expected tokens and errors on unexpected ones
- error() throws parsing errors with position information
- This foundation supports all parsing methods
Why Look Ahead:
We always have currentToken set to the next token we're about to process. This is LL(1) parsing - we only need to look at one token to decide what to do next. This makes the parser simple and efficient.
Building This Step by Step:
- Create the Parser class with a constructor
- Store the lexer reference in the constructor
- Load the first token in the constructor (this.currentToken = this.lexer.getNextToken())
- Add an eat() method that checks the token type, saves the token, advances, and returns the token
- Add an error() method for error reporting with position information
- This structure is the foundation for all parsing methods
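To try the basic structure before the real lexer is finished, you can feed the parser a prepared token list. ArrayLexer is a hypothetical stand-in; the Parser below mirrors the code above, with eat() checking for the mismatch first.

```javascript
// Hypothetical stand-in lexer that replays a token array, then returns EOF.
class ArrayLexer {
  constructor(tokens) {
    this.tokens = tokens;
    this.index = 0;
  }
  getNextToken() {
    if (this.index < this.tokens.length) return this.tokens[this.index++];
    return { type: 'EOF', value: null, line: 0, column: 0 };
  }
}

class Parser {
  constructor(lexer) {
    this.lexer = lexer;
    this.currentToken = this.lexer.getNextToken(); // one-token lookahead
  }
  error(message) {
    throw new Error(`Parser error at line ${this.currentToken.line}, col ${this.currentToken.column}: ${message}`);
  }
  eat(tokenType) {
    if (this.currentToken.type !== tokenType) {
      this.error(`Expected ${tokenType} but got ${this.currentToken.type}`); // throws
    }
    const token = this.currentToken;
    this.currentToken = this.lexer.getNextToken(); // advance the lookahead
    return token;
  }
}

const tok = (type, value, column) => ({ type, value, line: 1, column });
const parser = new Parser(new ArrayLexer([tok('SHAPE', 'shape', 1), tok('IDENTIFIER', 'circle', 7)]));
console.log(parser.eat('SHAPE').value); // shape
console.log(parser.currentToken.type);  // IDENTIFIER
try {
  parser.eat('LBRACE');
} catch (e) {
  console.log(e.message); // Parser error at line 1, col 7: Expected LBRACE but got IDENTIFIER
}
```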
The Main Parse Method
The entry point parses a whole program:
parse() {
const statements = [];
while (this.currentToken.type !== 'EOF') {
statements.push(this.parseStatement());
}
return statements; // Array of AST nodes
}
A program is just a list of statements. We parse each one until we hit End-Of-File.
Parsing Statements
The parseStatement() method dispatches based on the current token:
parseStatement() {
switch (this.currentToken.type) {
case 'PARAM':
return this.parseParam();
case 'SHAPE':
return this.parseShape();
case 'UNION':
case 'DIFFERENCE':
case 'INTERSECTION':
return this.parseBooleanOperation();
case 'IF':
return this.parseIfStatement();
case 'FOR':
return this.parseForLoop();
case 'DEF':
return this.parseFunctionDefinition();
default:
this.error(`Unexpected token: ${this.currentToken.type}`);
}
}
How it works: Look at the current token type, call the appropriate parsing method. Each method knows how to parse its specific construct.
Gotcha: The order of the cases doesn't matter, because each token type is distinct. What matters is that the lexer emits keywords like if as their own token types (IF). If if came through as an IDENTIFIER, it would fall into the default case and you'd get an "Unexpected token" error.
Parsing a Shape Statement
Let's parse shape circle c1 { radius: 50, x: 0 }:
parseShape() {
this.eat('SHAPE'); // Consume 'shape' keyword
const shapeType = this.currentToken.value;
this.eat('IDENTIFIER'); // Consume shape type ('circle')
const name = this.currentToken.value;
this.eat('IDENTIFIER'); // Consume shape name ('c1')
this.eat('LBRACE'); // Consume '{'
// Parse properties until we see '}'
const params = {};
while (this.currentToken.type !== 'RBRACE') {
const key = this.currentToken.value;
this.eat('IDENTIFIER'); // Property name ('radius')
this.eat('COLON'); // Consume ':'
const value = this.parseExpression(); // Parse the value
params[key] = value;
// Optional comma (we allow trailing commas)
if (this.currentToken.type === 'COMMA') {
this.eat('COMMA');
}
}
this.eat('RBRACE'); // Consume '}'
return {
type: 'shape',
shapeType: shapeType,
name: name,
params: params
};
}
Step by step:
- Eat the SHAPE token (verify it's there)
- Get the shape type (next identifier)
- Get the shape name (next identifier)
- Eat the opening brace
- Parse properties in a loop until we see the closing brace
- Each property is key: value
- Eat the closing brace
- Return the AST node
The loop: We keep parsing properties until we hit RBRACE. This handles zero or more properties. The while loop naturally handles the "zero or more" part of the grammar.
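Here is parseShape() running end to end on the tokens for shape circle c1 { radius: 50, x: 0 }. This is a sketch: ArrayLexer is a hypothetical stand-in lexer, parseExpression() is cut down to numbers and identifiers, and the values come from eat()'s return value instead of being read from currentToken before eating.

```javascript
// Hypothetical stand-in lexer that replays a prepared token array.
class ArrayLexer {
  constructor(tokens) { this.tokens = tokens; this.index = 0; }
  getNextToken() {
    return this.index < this.tokens.length
      ? this.tokens[this.index++]
      : { type: 'EOF', value: null, line: 1, column: 0 };
  }
}

class ShapeParser {
  constructor(lexer) {
    this.lexer = lexer;
    this.currentToken = this.lexer.getNextToken();
  }
  eat(tokenType) {
    if (this.currentToken.type !== tokenType) {
      throw new Error(`Expected ${tokenType} but got ${this.currentToken.type}`);
    }
    const token = this.currentToken;
    this.currentToken = this.lexer.getNextToken();
    return token;
  }
  // Simplified: only numbers and identifiers, enough for this demo.
  parseExpression() {
    const token = this.currentToken;
    if (token.type === 'NUMBER') { this.eat('NUMBER'); return { type: 'number', value: token.value }; }
    if (token.type === 'IDENTIFIER') { this.eat('IDENTIFIER'); return { type: 'identifier', value: token.value }; }
    throw new Error(`Unexpected token in expression: ${token.type}`);
  }
  parseShape() {
    this.eat('SHAPE');
    const shapeType = this.eat('IDENTIFIER').value; // e.g. 'circle'
    const name = this.eat('IDENTIFIER').value;      // e.g. 'c1'
    this.eat('LBRACE');
    const params = {};
    while (this.currentToken.type !== 'RBRACE') {
      const key = this.eat('IDENTIFIER').value;
      this.eat('COLON');
      params[key] = this.parseExpression();
      if (this.currentToken.type === 'COMMA') this.eat('COMMA'); // optional/trailing comma
    }
    this.eat('RBRACE');
    return { type: 'shape', shapeType, name, params };
  }
}

// Tokens for: shape circle c1 { radius: 50, x: 0 }
const t = (type, value = null) => ({ type, value, line: 1, column: 0 });
const ast = new ShapeParser(new ArrayLexer([
  t('SHAPE', 'shape'), t('IDENTIFIER', 'circle'), t('IDENTIFIER', 'c1'), t('LBRACE'),
  t('IDENTIFIER', 'radius'), t('COLON'), t('NUMBER', 50), t('COMMA'),
  t('IDENTIFIER', 'x'), t('COLON'), t('NUMBER', 0),
  t('RBRACE'),
])).parseShape();
console.log(JSON.stringify(ast));
// {"type":"shape","shapeType":"circle","name":"c1","params":{"radius":{"type":"number","value":50},"x":{"type":"number","value":0}}}
```

Because every step goes through eat(), a missing closing brace ends in an "Expected IDENTIFIER but got EOF" error instead of an infinite loop.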
Parsing Expressions
Expressions are where it gets interesting. You need to handle operator precedence. 2 + 3 * 4 should be 2 + (3 * 4), not (2 + 3) * 4.
The trick is to use separate methods for each precedence level:
parseExpression() {
let node = this.parseTerm(); // Start with terms (higher precedence)
// Handle + and - (lowest precedence)
while (this.currentToken.type === 'PLUS' || this.currentToken.type === 'MINUS') {
const operator = this.currentToken.type;
this.eat(operator);
node = {
type: 'binary_op',
operator: operator === 'PLUS' ? '+' : '-', // Store the symbol that applyBinaryOperator() switches on
left: node,
right: this.parseTerm() // Right side is also a term
};
}
return node;
}
parseTerm() {
let node = this.parseFactor(); // Start with factors (highest precedence)
// Handle * and / (higher precedence than + and -)
while (this.currentToken.type === 'MULTIPLY' || this.currentToken.type === 'DIVIDE') {
const operator = this.currentToken.type;
this.eat(operator);
node = {
type: 'binary_op',
operator: operator === 'MULTIPLY' ? '*' : '/', // Store the symbol that applyBinaryOperator() switches on
left: node,
right: this.parseFactor() // Right side is also a factor
};
}
return node;
}
parseFactor() {
const token = this.currentToken;
// Numbers
if (token.type === 'NUMBER') {
this.eat('NUMBER');
return { type: 'number', value: token.value };
}
// Identifiers (parameters, shape references)
if (token.type === 'IDENTIFIER') {
this.eat('IDENTIFIER');
return { type: 'identifier', value: token.value };
}
// Strings
if (token.type === 'STRING') {
this.eat('STRING');
return { type: 'string', value: token.value };
}
// Parentheses
if (token.type === 'LPAREN') {
this.eat('LPAREN');
const expr = this.parseExpression();
this.eat('RPAREN');
return expr;
}
// Unary minus (negative numbers)
if (token.type === 'MINUS') {
this.eat('MINUS');
return {
type: 'unary_op',
operator: 'minus',
operand: this.parseFactor()
};
}
this.error(`Unexpected token in expression: ${token.type}`);
}
How precedence works:
- parseExpression() calls parseTerm(), so terms (multiplication/division) are grouped first
- parseTerm() calls parseFactor(), so factors (numbers, identifiers, parentheses) bind tightest
- For 2 + 3 * 4: parseExpression() gets 2 from parseTerm(), sees +, and calls parseTerm() again for the right side, which returns 3 * 4
- Inside that call, parseTerm() sees *, so it calls parseFactor() for the left (3) and the right (4)
- Result: 2 + (3 * 4) - correct!
The while loops: They handle left-associativity. 10 - 5 - 2 becomes (10 - 5) - 2, not 10 - (5 - 2).
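A self-contained sketch of the same three-level scheme, small enough to run on its own. tokenize(), ExprParser, and evaluate() are hypothetical demo code that supports only numbers, + - * / and parentheses. Operators are stored as symbols like '+', matching the binary operation node shown under AST Node Structure.

```javascript
// Demo-only tokenizer: numbers, + - * / ( ) and spaces.
function tokenize(src) {
  const types = { '+': 'PLUS', '-': 'MINUS', '*': 'MULTIPLY', '/': 'DIVIDE', '(': 'LPAREN', ')': 'RPAREN' };
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (ch === ' ') { i++; continue; }
    if (/[0-9]/.test(ch)) {
      let j = i;
      while (j < src.length && /[0-9.]/.test(src[j])) j++;
      tokens.push({ type: 'NUMBER', value: parseFloat(src.slice(i, j)) });
      i = j;
      continue;
    }
    if (types[ch]) { tokens.push({ type: types[ch], value: ch }); i++; continue; }
    throw new Error(`Unexpected character '${ch}'`);
  }
  tokens.push({ type: 'EOF', value: null });
  return tokens;
}

class ExprParser {
  constructor(tokens) { this.tokens = tokens; this.pos = 0; }
  get current() { return this.tokens[this.pos]; }
  eat(type) {
    if (this.current.type !== type) throw new Error(`Expected ${type} but got ${this.current.type}`);
    return this.tokens[this.pos++];
  }
  // expression := term (('+' | '-') term)*
  parseExpression() {
    let node = this.parseTerm();
    while (this.current.type === 'PLUS' || this.current.type === 'MINUS') {
      const op = this.eat(this.current.type).value; // '+' or '-'
      node = { type: 'binary_op', operator: op, left: node, right: this.parseTerm() };
    }
    return node;
  }
  // term := factor (('*' | '/') factor)*
  parseTerm() {
    let node = this.parseFactor();
    while (this.current.type === 'MULTIPLY' || this.current.type === 'DIVIDE') {
      const op = this.eat(this.current.type).value; // '*' or '/'
      node = { type: 'binary_op', operator: op, left: node, right: this.parseFactor() };
    }
    return node;
  }
  // factor := NUMBER | '(' expression ')' | '-' factor
  parseFactor() {
    const token = this.current;
    if (token.type === 'NUMBER') { this.eat('NUMBER'); return { type: 'number', value: token.value }; }
    if (token.type === 'LPAREN') {
      this.eat('LPAREN');
      const expr = this.parseExpression();
      this.eat('RPAREN');
      return expr;
    }
    if (token.type === 'MINUS') {
      this.eat('MINUS');
      return { type: 'unary_op', operator: 'minus', operand: this.parseFactor() };
    }
    throw new Error(`Unexpected token in expression: ${token.type}`);
  }
}

function evaluate(node) {
  switch (node.type) {
    case 'number': return node.value;
    case 'unary_op': return -evaluate(node.operand);
    case 'binary_op': {
      const l = evaluate(node.left);
      const r = evaluate(node.right);
      switch (node.operator) {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        case '/': return l / r;
      }
    }
  }
  throw new Error(`Cannot evaluate node: ${JSON.stringify(node)}`);
}

function calc(src) {
  const parser = new ExprParser(tokenize(src));
  const ast = parser.parseExpression();
  parser.eat('EOF'); // reject trailing tokens such as "2 3"
  return evaluate(ast);
}

console.log(calc('2 + 3 * 4'));   // 14
console.log(calc('10 - 5 - 2'));  // 3
console.log(calc('(2 + 3) * 4')); // 20
```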
Parsing Parameters
Parameters are simple: param size 100
parseParam() {
this.eat('PARAM');
const name = this.currentToken.value;
this.eat('IDENTIFIER');
const value = this.parseExpression();
return {
type: 'param',
name: name,
value: value
};
}
The value can be any expression - a number, a calculation, a function call, etc.
Parsing If Statements
If statements: if condition { ... } else { ... }
parseIfStatement() {
this.eat('IF');
const condition = this.parseCondition(); // Parse the condition
this.eat('LBRACE');
const thenBody = [];
while (this.currentToken.type !== 'RBRACE') {
thenBody.push(this.parseStatement());
}
this.eat('RBRACE');
let elseBody = null;
if (this.currentToken.type === 'ELSE') {
this.eat('ELSE');
this.eat('LBRACE');
elseBody = [];
while (this.currentToken.type !== 'RBRACE') {
elseBody.push(this.parseStatement());
}
this.eat('RBRACE');
}
return {
type: 'if_statement',
condition: condition,
thenBody: thenBody,
elseBody: elseBody
};
}
The else clause: It's optional. Check if the next token is ELSE, and only parse it if it is.
Parsing the body: The body is a list of statements. We parse them in a loop until we hit the closing brace.
Parsing For Loops
For loops: for i from 0 to 10 step 2 { ... }
parseForLoop() {
this.eat('FOR');
const variable = this.currentToken.value;
this.eat('IDENTIFIER');
this.eat('FROM');
const from = this.parseExpression();
this.eat('TO');
const to = this.parseExpression();
let step = null;
if (this.currentToken.type === 'STEP') {
this.eat('STEP');
step = this.parseExpression();
}
this.eat('LBRACE');
const body = [];
while (this.currentToken.type !== 'RBRACE') {
body.push(this.parseStatement());
}
this.eat('RBRACE');
return {
type: 'for_loop',
variable: variable,
from: from,
to: to,
step: step,
body: body
};
}
The step is optional: If there's no STEP keyword, we use null and the interpreter defaults to 1.
Parsing Boolean Operations
Boolean operations: union u1 { add c1, add r1 }
parseBooleanOperation() {
const operation = this.currentToken.type.toLowerCase(); // 'union', 'difference', etc.
this.eat(this.currentToken.type);
const name = this.currentToken.value;
this.eat('IDENTIFIER');
this.eat('LBRACE');
const shapes = [];
while (this.currentToken.type !== 'RBRACE') {
if (this.currentToken.type === 'ADD' || this.currentToken.type === 'SUBTRACT') {
const op = this.currentToken.type.toLowerCase();
this.eat(this.currentToken.type);
const shapeRef = this.parseExpression(); // Could be identifier or expression
shapes.push({ op: op, shape: shapeRef });
} else {
this.error(`Expected 'add' or 'subtract' in boolean operation`);
}
}
this.eat('RBRACE');
return {
type: 'boolean_operation',
operation: operation,
name: name,
shapes: shapes
};
}
The shapes array: Each entry has an operation (add or subtract) and a shape reference. The shape reference is usually an identifier, but we parse it as an expression to be flexible.
Error Recovery
When parsing fails, you want helpful errors:
error(message) {
const line = this.currentToken ? this.currentToken.line : 'unknown';
const column = this.currentToken ? this.currentToken.column : 'unknown';
const tokenType = this.currentToken ? this.currentToken.type : 'EOF';
throw new Error(`Parser error at line ${line}, col ${column}: ${message}. Got token: ${tokenType}`);
}
Include context: Tell the user what token you expected and what you got. This makes debugging much easier.
Common Issues
Parser stops early:
- Check that you're eating all required tokens
- Verify you're not skipping tokens accidentally
Wrong precedence:
- Make sure parseExpression() calls parseTerm(), which calls parseFactor()
- Check that operators are handled at the right level
Infinite loops:
- Make sure loops have exit conditions
- Verify you're consuming tokens in loops (not just checking them)
AST structure wrong:
- Check that you're returning objects with type fields
- Verify nested structures match what the interpreter expects
The parser is the most complex part. Get the expression parsing right, and the rest follows naturally.
Statement Parsing
The parseStatement() method is a big switch statement. It looks at the current token type and calls the appropriate parsing method:
- PARAM → parseParam()
- SHAPE → parseShape()
- UNION/DIFFERENCE/INTERSECTION → parseBooleanOperation()
- IF → parseIfStatement()
- etc.
Gotcha: The case order doesn't matter; what matters is that keywords like if reach the parser as keyword tokens (IF), not as IDENTIFIER. Otherwise they hit the default case and produce confusing errors.
Shape Parsing
Shapes follow the pattern: shape <type> <name> { <properties> }
The parser:
- Eats the SHAPE token
- Gets the shape type (identifier like "circle")
- Gets the shape name (another identifier)
- Eats the {
- Parses properties until it sees }
- Eats the }
Properties are key: value pairs. The key is an identifier, then a colon, then an expression (which can be a number, string, parameter reference, math expression, etc.).
Expression Parsing
This is where it gets interesting. Expressions need to respect operator precedence - 2 + 3 * 4 should be 2 + (3 * 4), not (2 + 3) * 4.
The parser handles this with separate methods:
- parseExpression() handles + and -
- parseTerm() handles * and /
- parseFactor() handles the base cases (numbers, identifiers, parentheses, etc.)
The trick is that parseExpression() calls parseTerm(), which calls parseFactor(). This creates the right precedence automatically.
How it works: When you see 2 + 3 * 4, parseExpression() sees the +, so it:
- Takes what's on the left (2, from parseTerm())
- Eats the +
- Takes what's on the right (3 * 4, from parseTerm(), which handles the *)
So the * gets grouped before the +, which is what we want.
AST Node Structure
Different node types have different structures, but they all have a type field. Here are the common ones:
Shape node:
{
type: 'shape',
shapeType: 'circle',
name: 'c1',
params: { radius: { type: 'number', value: 50 } }
}
Parameter node:
{
type: 'param',
name: 'size',
value: { type: 'number', value: 100 }
}
Binary operation:
{
type: 'binary_op',
operator: '+',
left: { type: 'number', value: 10 },
right: { type: 'number', value: 20 }
}
If statement:
{
type: 'if_statement',
condition: { /* expression */ },
thenBody: [ /* statements */ ],
elseBody: [ /* statements */ ] // null when there's no else clause
}
The structure is pretty straightforward - it mirrors the code structure.
Adding New Syntax
To add a new language construct:
- Add the keyword to the lexer (if it's a keyword)
- Add a parsing method like parseMyNewThing()
- Add a case to parseStatement() that calls it
- Make sure it returns an AST node with a type field
- Add interpreter support (see below)
The parser is pretty modular - adding new constructs is usually straightforward.
Building the Interpreter From Scratch
The interpreter is where things actually happen. It walks through the AST and creates shapes, sets parameters, runs loops, etc. Let's build it step by step.
The Environment
The interpreter needs somewhere to store runtime state. That's the Environment:
export class Environment {
constructor() {
this.parameters = new Map(); // Parameter name → value
this.shapes = new Map(); // Shape name → shape object
this.layers = new Map(); // Layer name → layer object
this.functions = new Map(); // Function name → function definition
}
setParameter(name, value) {
this.parameters.set(name, value);
}
getParameter(name) {
if (!this.parameters.has(name)) {
throw new Error(`Parameter not found: ${name}`);
}
return this.parameters.get(name);
}
createShapeWithName(type, name, params) {
const shape = {
type: type,
shapeType: type,
params: params,
transform: {
position: params.position || [0, 0],
rotation: params.rotation || 0,
scale: [1, 1]
}
};
this.shapes.set(name, shape);
return shape;
}
}
Why Maps? Fast lookups. When you reference param.size, we need to find it quickly, and Map lookups are O(1) on average.
The shape structure: Shapes have type, params, and transform. The renderer uses this structure. Don't change it without updating the renderer.
Basic Interpreter Structure
Start with a class that holds the environment:
export class Interpreter {
constructor() {
this.env = new Environment();
this.functions = new Map();
this.constraints = [];
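this.currentLoopCounter = undefined; // For loop name mangling
this.currentFunctionContext = null; // For function name mangling
this.functionCallCounters = new Map(); // Function name → calls so far (used by evaluateFunctionCall)
this.currentReturn = null; // Set by a return statement inside a function body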
}
interpret(ast) {
let result = null;
for (const node of ast) {
result = this.evaluateNode(node);
}
return {
parameters: this.env.parameters,
shapes: this.env.shapes,
layers: this.env.layers,
functions: this.functions,
constraints: this.constraints,
result: result
};
}
}
The main loop: Walk through each AST node, evaluate it, return everything at the end. The result object contains all the runtime state that other systems need.
Evaluating Nodes
The evaluateNode() method dispatches to specific evaluators:
evaluateNode(node) {
// Handle name mangling for loops
if (node.type === 'shape' && this.currentLoopCounter !== undefined) {
node = {
...node,
name: `${node.name}_${this.currentLoopCounter}`
};
}
switch (node.type) {
case 'param':
return this.evaluateParam(node);
case 'shape':
return this.evaluateShape(node);
case 'boolean_operation':
return this.evaluateBooleanOperation(node);
case 'if_statement':
return this.evaluateIfStatement(node);
case 'for_loop':
return this.evaluateForLoop(node);
case 'function_definition':
return this.evaluateFunctionDefinition(node);
case 'function_call':
return this.evaluateFunctionCall(node);
default:
throw new Error(`Unknown node type: ${node.type}`);
}
}
Name mangling: If we're in a loop, we append the loop counter to shape names. This prevents name collisions when the same shape name is used in multiple loop iterations.
Evaluating Parameters
Parameters are simple: store the value in the environment.
evaluateParam(node) {
const value = this.evaluateExpression(node.value);
this.env.setParameter(node.name, value);
return value;
}
The value is an expression: It could be 100, or 50 + 50, or param.otherParam * 2. We evaluate it first, then store the result.
Evaluating Shapes
This is where shapes get created:
evaluateShape(node) {
// Generate unique name
let shapeName = node.name;
if (this.currentFunctionContext) {
shapeName = `${shapeName}_${this.currentFunctionContext.name}_${this.currentFunctionContext.callId}`;
}
// No loop branch here: evaluateNode() has already appended the loop
// counter, so appending it again would produce names like c1_0_0
// Evaluate all parameter expressions
const params = {};
for (const [key, expr] of Object.entries(node.params)) {
const evaluatedValue = this.evaluateExpression(expr);
params[key] = this.processShapeParameter(key, evaluatedValue);
}
// Apply defaults and process special parameters
this.processShapeFillParameters(node.shapeType, params);
// Create the shape
const shape = this.env.createShapeWithName(node.shapeType, shapeName, params);
return shape;
}
Step by step:
- Generate a unique name (handles loops/functions)
- Evaluate each parameter expression
- Process and validate parameters
- Apply shape-specific defaults
- Create the shape object
- Store it in the environment
- Return it
Parameter evaluation: Each property value is an expression. radius: 50 is easy, but radius: param.size * 2 needs evaluation. We evaluate each one.
Evaluating Expressions
Expressions can be literals, identifiers, binary operations, function calls, etc.:
evaluateExpression(node) {
switch (node.type) {
case 'number':
return node.value;
case 'string':
return node.value;
case 'identifier':
// Check if it's a parameter
if (this.env.parameters.has(node.value)) {
return this.env.getParameter(node.value);
}
// Check if it's a shape reference
if (this.env.shapes.has(node.value)) {
return node.value; // Return name as string for boolean ops
}
throw new Error(`Undefined identifier: ${node.value}`);
case 'binary_op': {
const left = this.evaluateExpression(node.left);
const right = this.evaluateExpression(node.right);
return this.applyBinaryOperator(node.operator, left, right);
}
case 'unary_op': {
const operand = this.evaluateExpression(node.operand);
if (node.operator === 'minus') {
return -operand;
}
throw new Error(`Unknown unary operator: ${node.operator}`);
}
case 'array':
return node.elements.map(el => this.evaluateExpression(el));
case 'function_call':
return this.evaluateFunctionCall(node);
default:
throw new Error(`Unknown expression type: ${node.type}`);
}
}
Recursive evaluation: Expressions can contain other expressions. 10 + param.size has a binary operation with a number and an identifier. We evaluate recursively.
Identifier lookup: First check parameters, then shapes. If neither matches, it's an error. If a parameter and a shape share a name, the parameter wins and the shape can't be referenced by that name, so keep the names distinct.
Binary Operators
Apply operators to evaluated operands:
applyBinaryOperator(op, left, right) {
switch (op) {
case '+': return left + right;
case '-': return left - right;
case '*': return left * right;
case '/':
if (right === 0) throw new Error('Division by zero');
return left / right;
case '%': return left % right;
case '==': return left === right;
case '!=': return left !== right;
case '<': return left < right;
case '<=': return left <= right;
case '>': return left > right;
case '>=': return left >= right;
case 'and': return left && right;
case 'or': return left || right;
default:
throw new Error(`Unknown operator: ${op}`);
}
}
Type coercion: JavaScript does this automatically. "5" + 3 becomes "53" (string concatenation), "5" * 3 becomes 15 (numeric multiplication). This is usually what you want, but be aware of it.
Division by zero: Check for this explicitly. JavaScript doesn't throw; it returns Infinity (or -Infinity, or NaN for 0 / 0), which is probably not what you want.
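A quick check of those behaviors, plus a sketch that also guards %. The applyBinaryOperator() code above only checks /, and in JavaScript x % 0 is NaN, so the same silent failure can happen there. checkedArithmetic() is a hypothetical helper, not part of the interpreter.

```javascript
// What JavaScript does without an explicit check:
console.log(1 / 0);   // Infinity
console.log(-1 / 0);  // -Infinity
console.log(0 / 0);   // NaN
console.log(5 % 0);   // NaN
console.log('5' + 3); // 53 (string concatenation)
console.log('5' * 3); // 15 (numeric multiplication)

// Reject zero divisors for both / and %.
function checkedArithmetic(op, left, right) {
  if ((op === '/' || op === '%') && right === 0) {
    throw new Error(op === '/' ? 'Division by zero' : 'Modulo by zero');
  }
  switch (op) {
    case '/': return left / right;
    case '%': return left % right;
    default: throw new Error(`Unsupported operator in this sketch: ${op}`);
  }
}

console.log(checkedArithmetic('/', 10, 4)); // 2.5
```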
Evaluating If Statements
If statements evaluate conditionally:
evaluateIfStatement(node) {
const condition = this.evaluateExpression(node.condition);
if (condition) {
// Evaluate then body
for (const stmt of node.thenBody) {
this.evaluateNode(stmt);
}
} else if (node.elseBody) {
// Evaluate else body
for (const stmt of node.elseBody) {
this.evaluateNode(stmt);
}
}
return null; // If statements don't return values
}
The condition: Can be any expression that evaluates to a truthy/falsy value. param.size > 50, true, param.enabled and param.visible, etc.
The bodies: Arrays of statements. We evaluate each one in order.
Evaluating For Loops
For loops create multiple shapes:
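evaluateForLoop(node) {
const from = this.evaluateExpression(node.from);
const to = this.evaluateExpression(node.to);
const step = node.step ? this.evaluateExpression(node.step) : 1;
// A zero or negative step would never reach `to`, so fail fast instead of hanging
if (step <= 0) {
throw new Error(`Loop step must be positive, got ${step}`);
}
// Remember an enclosing loop's counter; resetting it to undefined would end the outer loop early
const outerCounter = this.currentLoopCounter;
for (let i = from; i <= to; i += step) {
this.currentLoopCounter = i; // Read by evaluateNode() for name mangling
this.env.setParameter(node.variable, i); // Make the loop variable (e.g. i) visible to expressions
for (const stmt of node.body) {
this.evaluateNode(stmt);
}
}
// Restore the enclosing loop's counter (undefined at top level)
this.currentLoopCounter = outerCounter;
return null;
}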
How it works:
- Evaluate the range (from, to, step)
- Set currentLoopCounter to the starting value
- Loop while counter <= to
- Evaluate the body (shapes created here get the counter appended to their names)
- Increment counter
- Clear the counter when done
Name mangling: When evaluateNode() sees a shape and currentLoopCounter is set, it appends the counter to the name. So shape circle c1 in a loop becomes c1_0, c1_1, c1_2, etc.
Gotcha: Nested loops overwrite currentLoopCounter. If you have nested loops with the same shape names, they'll conflict. This is a known limitation.
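A minimal sketch of loop name mangling and of that limitation. MiniInterpreter is hypothetical and only understands 'shape' nodes and 'for_loop' nodes with plain numeric from/to/step. It saves and restores the enclosing loop's counter so the outer loop keeps running after an inner one finishes.

```javascript
class MiniInterpreter {
  constructor() {
    this.shapes = new Map();
    this.currentLoopCounter = undefined;
  }
  evaluateNode(node) {
    if (node.type === 'shape') {
      // Same rule as evaluateNode(): append the loop counter inside loops.
      const name = this.currentLoopCounter !== undefined
        ? `${node.name}_${this.currentLoopCounter}`
        : node.name;
      this.shapes.set(name, { type: node.shapeType, params: node.params });
      return;
    }
    if (node.type === 'for_loop') {
      const step = node.step ?? 1; // null step defaults to 1
      const outer = this.currentLoopCounter;
      for (let i = node.from; i <= node.to; i += step) {
        this.currentLoopCounter = i;
        node.body.forEach(stmt => this.evaluateNode(stmt));
      }
      this.currentLoopCounter = outer;
      return;
    }
    throw new Error(`Unknown node type: ${node.type}`);
  }
}

const circle = { type: 'shape', shapeType: 'circle', name: 'c1', params: { radius: 10 } };

const interp = new MiniInterpreter();
interp.evaluateNode({ type: 'for_loop', variable: 'i', from: 0, to: 4, step: 2, body: [circle] });
console.log([...interp.shapes.keys()]); // [ 'c1_0', 'c1_2', 'c1_4' ]

// Nested loops: four circles are created, but only the inner counter is used.
const nested = new MiniInterpreter();
nested.evaluateNode({
  type: 'for_loop', variable: 'i', from: 0, to: 1, step: null,
  body: [{ type: 'for_loop', variable: 'j', from: 0, to: 1, step: null, body: [circle] }],
});
console.log([...nested.shapes.keys()]); // [ 'c1_0', 'c1_1' ]
```

The nested run shows the collision: the second outer iteration overwrites c1_0 and c1_1 from the first.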
Evaluating Boolean Operations
Boolean operations combine shapes:
evaluateBooleanOperation(node) {
// Get shape names from the AST
const shapeNames = node.shapes.map(s => {
if (typeof s === 'string') {
return s;
} else if (s.shape) {
// It's an object with op and shape
return this.evaluateExpression(s.shape);
} else {
return this.evaluateExpression(s);
}
});
// Get actual shape objects
const shapes = shapeNames.map(name => {
if (!this.env.shapes.has(name)) {
throw new Error(`Shape not found: ${name}`);
}
return this.env.shapes.get(name);
});
// Perform the boolean operation (uses ClipperLib)
const result = this.booleanOperator.perform(
node.operation,
shapes
);
// Mark original shapes as consumed
shapes.forEach(shape => {
shape._consumedByBoolean = true;
});
// Create result shape
const resultShape = {
type: node.operation,
shapeType: node.operation,
params: {
operation: node.operation,
shapes: shapeNames
},
_consumedByBoolean: false
};
// Store result
this.env.shapes.set(node.name, resultShape);
return resultShape;
}
Shape references: The AST might have shape names as strings, or as identifier expressions. We evaluate them to get the actual names.
Boolean operations: Use ClipperLib (loaded from CDN) to compute the actual geometry. This is expensive - polygon clipping is complex.
Marking consumed: Original shapes are marked _consumedByBoolean = true. The renderer skips them. We don't delete them because other code might reference them.
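A sketch of how a renderer might skip consumed shapes. visibleShapes() is hypothetical, and the rectangle's width/height params are illustrative; the objects otherwise mirror the ones built above.

```javascript
// Return the names of shapes that should actually be drawn.
function visibleShapes(shapes) {
  const visible = [];
  for (const [name, shape] of shapes) {
    if (!shape._consumedByBoolean) {
      visible.push(name);
    }
  }
  return visible;
}

const shapes = new Map([
  ['c1', { type: 'circle', params: { radius: 50 }, _consumedByBoolean: true }],
  ['r1', { type: 'rectangle', params: { width: 80, height: 40 }, _consumedByBoolean: true }],
  ['u1', { type: 'union', params: { operation: 'union', shapes: ['c1', 'r1'] }, _consumedByBoolean: false }],
  ['c2', { type: 'circle', params: { radius: 10 } }],
]);

console.log(visibleShapes(shapes)); // [ 'u1', 'c2' ]
// The consumed originals are still in the Map, so references to c1 keep working.
console.log(shapes.has('c1'));      // true
```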
Evaluating Functions
Functions are stored separately from the environment:
evaluateFunctionDefinition(node) {
this.functions.set(node.name, {
parameters: node.parameters,
body: node.body
});
return null; // Function definitions don't return values
}
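evaluateFunctionCall(node) {
const func = this.functions.get(node.name);
if (!func) {
throw new Error(`Function not found: ${node.name}`);
}
const args = node.arguments || [];
if (args.length !== func.parameters.length) {
throw new Error(`Function ${node.name} expects ${func.parameters.length} argument(s), got ${args.length}`);
}
// Evaluate all arguments in the caller's scope before binding any of them
const argValues = args.map(arg => this.evaluateExpression(arg));
// Save the current parameter map and function context (calls can nest)
const oldParams = new Map(this.env.parameters);
const oldContext = this.currentFunctionContext;
// Bind arguments to parameters
func.parameters.forEach((paramName, i) => this.env.setParameter(paramName, argValues[i]));
// Give this call a unique id for name mangling: c1_myFunc_0, c1_myFunc_1, ...
const callId = this.functionCallCounters.get(node.name) || 0;
this.functionCallCounters.set(node.name, callId + 1);
this.currentFunctionContext = { name: node.name, callId: callId };
// Execute the function body, stopping early if a return statement fires
let result = null;
try {
for (const stmt of func.body) {
result = this.evaluateNode(stmt);
if (this.currentReturn !== null) {
result = this.currentReturn;
this.currentReturn = null;
break;
}
}
} finally {
// Restore the caller's parameters and context, even if the body throws
this.env.parameters = oldParams;
this.currentFunctionContext = oldContext;
}
return result;
}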
Scope management: Functions create a new scope. Parameters defined in the function don't leak out. Parameters from outside are still accessible (unless shadowed by function parameters).
Name mangling: Function calls also do name mangling. If a function creates a shape named c1 and you call it twice, you get c1_myFunc_0 and c1_myFunc_1. This prevents collisions.
Return values: Functions can return values. The return statement sets this.currentReturn, which we check after each statement.
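A sketch of per-call name mangling and scope restore together. FunctionRunner is a hypothetical stand-in: a "function body" here is just a list of shape names to create, and each shape records the parameters visible at the time.

```javascript
class FunctionRunner {
  constructor() {
    this.parameters = new Map();
    this.shapes = new Map();
    this.functions = new Map();
    this.functionCallCounters = new Map();
  }
  define(name, parameters, shapeNames) {
    this.functions.set(name, { parameters, shapeNames });
  }
  call(name, argValues) {
    const func = this.functions.get(name);
    if (!func) throw new Error(`Function not found: ${name}`);
    const saved = new Map(this.parameters); // save the caller's scope
    func.parameters.forEach((p, i) => this.parameters.set(p, argValues[i]));
    const callId = this.functionCallCounters.get(name) || 0;
    this.functionCallCounters.set(name, callId + 1);
    try {
      for (const shapeName of func.shapeNames) {
        // c1 inside myFunc becomes c1_myFunc_0, then c1_myFunc_1, ...
        const mangled = `${shapeName}_${name}_${callId}`;
        this.shapes.set(mangled, { params: Object.fromEntries(this.parameters) });
      }
    } finally {
      this.parameters = saved; // function parameters don't leak out
    }
  }
}

const runner = new FunctionRunner();
runner.parameters.set('size', 100);
runner.define('myFunc', ['r'], ['c1']);
runner.call('myFunc', [10]);
runner.call('myFunc', [20]);
console.log([...runner.shapes.keys()]);               // [ 'c1_myFunc_0', 'c1_myFunc_1' ]
console.log(runner.parameters.has('r'));              // false
console.log(runner.shapes.get('c1_myFunc_1').params); // { size: 100, r: 20 }
```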
Processing Shape Parameters
Shape parameters need special handling:
processShapeParameter(key, value) {
// Handle position arrays
if (key === 'position' && Array.isArray(value)) {
return value;
}
// Handle color names
if ((key === 'color' || key === 'fillColor' || key === 'strokeColor') && typeof value === 'string') {
return this.resolveColorName(value);
}
// Everything else is passed through
return value;
}
resolveColorName(colorName) {
const colorMap = {
'red': '#FF0000',
'green': '#008000',
'blue': '#0000FF',
// ... etc
};
return colorMap[colorName.toLowerCase()] || colorName;
}
Color resolution: Named colors like "red" need to be converted to hex. We do this here.
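Position arrays: Position is [x, y]. The code above passes any array through unchanged; it does not check that the array holds exactly two numbers, so add that validation if you need it.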
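A sketch of processShapeParameter() with the [x, y] check added. This is stricter than the code above, which lets non-array positions through. COLOR_MAP is a hypothetical subset using CSS named-color values, with both gray and grey mapped to #808080.

```javascript
// Hypothetical subset of the color table (CSS named-color values).
const COLOR_MAP = new Map([
  ['red', '#FF0000'],
  ['green', '#008000'],
  ['blue', '#0000FF'],
  ['gray', '#808080'],
  ['grey', '#808080'],
]);
const COLOR_KEYS = new Set(['color', 'fillColor', 'strokeColor']);

function processShapeParameter(key, value) {
  if (key === 'position') {
    const ok = Array.isArray(value) && value.length === 2 &&
      value.every(n => typeof n === 'number' && Number.isFinite(n));
    if (!ok) {
      throw new Error(`position must be [x, y] with two numbers, got ${JSON.stringify(value)}`);
    }
    return value;
  }
  if (COLOR_KEYS.has(key) && typeof value === 'string') {
    // Unknown names (and hex strings) pass through unchanged.
    return COLOR_MAP.get(value.toLowerCase()) ?? value;
  }
  return value; // everything else is passed through
}

console.log(processShapeParameter('color', 'Grey'));      // #808080
console.log(processShapeParameter('position', [10, 20])); // [ 10, 20 ]
console.log(processShapeParameter('radius', 50));         // 50
```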
What Gets Returned
At the end, the interpreter returns everything:
return {
parameters: this.env.parameters, // All parameters
shapes: this.env.shapes, // All shapes (this is what renderer uses)
layers: this.env.layers, // All layers
functions: this.functions, // Function definitions
constraints: this.constraints, // Constraint definitions
result: result // Last evaluated value
};
The renderer uses result.shapes: This is a Map of shape name → shape object. The renderer iterates over it and draws each shape.
Running interpret() on a parsed program should give you the parameters and shapes. If shapes aren't created, check:
- Are you evaluating expressions correctly?
- Are shapes being stored in the environment?
- Are parameter lookups working?
Common Issues
Shapes not created:
- Check that evaluateShape() is being called
- Verify shapes are stored in env.shapes
- Make sure shape names are unique
Parameters not found:
- Check that parameters are stored before they're used
- Verify parameter lookup in evaluateExpression()
- Make sure parameter names match
Wrong values:
- Check expression evaluation
- Verify operator application
- Make sure type coercion is working as expected
Name collisions:
- Check name mangling in loops/functions
- Verify shape names are unique
- Make sure you're not overwriting shapes accidentally
The interpreter is the execution engine. Get this right, and your language works. The lexer and parser just prepare the data - the interpreter actually does things.
Shape Objects
Shapes are just JavaScript objects. They look like:
{
type: 'circle',
shapeType: 'circle', // Sometimes both, for compatibility
params: {
radius: 50,
x: 0,
y: 0,
fill: true,
color: '#FF0000'
},
transform: {
position: [0, 0],
rotation: 0,
scale: [1, 1]
}
}
The renderer uses these objects to draw things. The interpreter's job is to create them.
The Main Loop
interpret(ast) {
let result = null;
for (const node of ast) {
result = this.evaluateNode(node);
}
return {
parameters: this.env.parameters,
shapes: this.env.shapes,
// ... other stuff
};
}
Simple: walk through each AST node, evaluate it, return everything at the end.
Evaluating Nodes
evaluateNode() is a big switch statement that dispatches to specific evaluators:
evaluateNode(node) {
switch (node.type) {
case 'param':
return this.evaluateParam(node);
case 'shape':
return this.evaluateShape(node);
case 'boolean_operation':
return this.evaluateBooleanOperation(node);
// ... etc
}
}
Each evaluator knows how to handle its specific node type.
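Here is a minimal runnable sketch of the loop and the dispatch working together. It assumes a simplified AST in which every node is a plain object with a type field plus line and column, and it stores parameters and shapes in Maps. The node shapes, the MiniInterpreter name, and the default branch that throws are illustrative assumptions, not the project's actual interpreter.

```javascript
// Hypothetical miniature interpreter: only 'param' and 'number' nodes are supported.
class MiniInterpreter {
  constructor() {
    // Parameters and shapes live in Maps, mirroring env.parameters / env.shapes.
    this.env = { parameters: new Map(), shapes: new Map() };
  }

  interpret(ast) {
    let result = null;
    for (const node of ast) {
      result = this.evaluateNode(node); // the last evaluated value is kept
    }
    return { parameters: this.env.parameters, shapes: this.env.shapes, result };
  }

  evaluateNode(node) {
    switch (node.type) {
      case 'param':
        return this.evaluateParam(node);
      case 'number':
        return node.value;
      default:
        // Unknown node types fail loudly and report where they came from.
        throw new Error(`Unknown node type '${node.type}' at line ${node.line}, col ${node.column}`);
    }
  }

  evaluateParam(node) {
    const value = this.evaluateNode(node.value); // evaluated eagerly, then stored
    this.env.parameters.set(node.name, value);
    return value;
  }
}

const out = new MiniInterpreter().interpret([
  { type: 'param', name: 'size', value: { type: 'number', value: 100 }, line: 1, column: 1 },
]);
console.log(out.parameters.get('size')); // 100
```

The default branch is a design choice: a silent fall-through would make a typo in a node type look like "nothing happened" instead of an error with a position.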
Parameter Evaluation
When you see param size 100:
- Evaluate the value expression (in this case, just 100)
- Store it in env.parameters with key "size"
Later, when the interpreter sees param.size in an expression, it looks it up in the parameters map.
Gotcha: Parameters are evaluated eagerly. If you do param x 10 + 5, the value stored is 15, not the expression 10 + 5.
Shape Evaluation
This is where shapes get created:
- Generate a unique name (handles loops/functions - more on that later)
- Evaluate all the parameter expressions
- Process and validate the parameters
- Create the shape object
- Store it in env.shapes
Important: Shape names need to be unique. If you're in a loop, the same shape name gets used multiple times, so we append the loop counter: c1_0, c1_1, etc.
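The naming rule can be sketched as a small helper. The function name uniqueShapeName and the convention that the counter is null outside loops are assumptions for illustration; only the c1_0, c1_1 output format comes from the text above.

```javascript
// Hypothetical helper: builds the stored name for a shape.
// loopCounter is null (or undefined) outside loops; inside a loop it is the iteration index.
function uniqueShapeName(baseName, loopCounter) {
  return loopCounter === null || loopCounter === undefined
    ? baseName
    : `${baseName}_${loopCounter}`;
}

const shapes = new Map();
for (let i = 0; i < 3; i++) {
  const name = uniqueShapeName('c1', i);
  shapes.set(name, { type: 'circle', params: { radius: 10 * (i + 1) } });
}
console.log([...shapes.keys()].join(', ')); // c1_0, c1_1, c1_2
```

The explicit null check matters: the first iteration's counter is 0, which is falsy, so a plain if (loopCounter) test would leave that shape unmangled.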
Expression Evaluation
Expressions can be:
- Literals (numbers, strings)
- Identifiers (parameters, shape references)
- Binary operations (+, -, *, /, etc.)
- Function calls
- Arrays
The evaluator recursively evaluates sub-expressions. For 10 + param.size, it:
- Evaluates 10 → 10
- Evaluates param.size → looks up "size" in parameters → 100
- Applies the + operator → 110
Gotcha: Shape references in expressions return the shape name as a string, not the shape object. This is for boolean operations - you reference shapes by name.
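A recursive evaluator covering literals, identifiers, and binary operations might look like the sketch below. The node shapes are assumptions, and for simplicity param.size is represented as an identifier node named 'size'. The lookup order (parameter first, then shape name, then error) follows the description in this section.

```javascript
// Hypothetical expression evaluator. Node shapes are assumptions:
//   { type: 'number', value }            literal
//   { type: 'identifier', name }         parameter or shape name
//   { type: 'binary', op, left, right }  arithmetic
function evaluateExpression(node, env) {
  switch (node.type) {
    case 'number':
      return node.value;
    case 'identifier':
      // Parameters are checked first, so a parameter shadows a shape with the same name.
      if (env.parameters.has(node.name)) return env.parameters.get(node.name);
      // Shape references evaluate to the shape's name (a string), not the object.
      if (env.shapes.has(node.name)) return node.name;
      throw new Error(`Undefined identifier '${node.name}'`);
    case 'binary': {
      // Evaluate both sub-expressions first, then apply the operator.
      const left = evaluateExpression(node.left, env);
      const right = evaluateExpression(node.right, env);
      switch (node.op) {
        case '+': return left + right;
        case '-': return left - right;
        case '*': return left * right;
        case '/': return left / right;
        default: throw new Error(`Unknown operator '${node.op}'`);
      }
    }
    default:
      throw new Error(`Cannot evaluate node type '${node.type}'`);
  }
}

const env = { parameters: new Map([['size', 100]]), shapes: new Map([['c1', {}]]) };
// 10 + param.size
console.log(evaluateExpression(
  { type: 'binary', op: '+', left: { type: 'number', value: 10 }, right: { type: 'identifier', name: 'size' } },
  env)); // 110
console.log(evaluateExpression({ type: 'identifier', name: 'c1' }, env)); // c1
```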
Boolean Operations
Boolean operations (union, difference, intersection) are interesting:
- Get the shape names from the AST
- Look up the actual shape objects
- Call the boolean operator (uses Vatti clipping algorithm)
- Mark the original shapes as _consumedByBoolean = true (so they don't render)
- Create a new result shape
- Store it
The result shape has type set to the operation name ("union", "difference", etc.) and the renderer knows how to handle it.
Important: The original shapes are still in the map, but they're marked as consumed. The renderer skips them. This is simpler than deleting them, which would break references.
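The bookkeeping around a boolean operation, and the renderer's matching skip, can be sketched as follows. The clipping itself is replaced by a stub (clip), since the Vatti algorithm is out of scope here, and the resultName parameter is a hypothetical stand-in for however the interpreter names the result shape.

```javascript
// Stub standing in for the real clipping step: it only records its inputs.
function clip(operation, shapeA, shapeB) {
  return { type: operation, operands: [shapeA, shapeB] }; // placeholder geometry
}

// Hypothetical evaluator: look up operands, clip, mark originals, store the result.
function evaluateBooleanOperation(shapes, operation, nameA, nameB, resultName) {
  const a = shapes.get(nameA);
  const b = shapes.get(nameB);
  if (!a || !b) throw new Error(`Unknown shape in ${operation}: ${!a ? nameA : nameB}`);
  const result = clip(operation, a, b);
  // Originals stay in the map (references keep working) but are hidden from rendering.
  a._consumedByBoolean = true;
  b._consumedByBoolean = true;
  shapes.set(resultName, result);
  return result;
}

// Renderer side: collect the names of shapes that were not consumed.
function shapesToDraw(shapes) {
  return [...shapes]
    .filter(([, shape]) => !shape._consumedByBoolean)
    .map(([name]) => name);
}

const shapes = new Map([['c1', { type: 'circle' }], ['r1', { type: 'rectangle' }]]);
evaluateBooleanOperation(shapes, 'difference', 'c1', 'r1', 'd1');
console.log(shapesToDraw(shapes).join(', ')); // d1
```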
Control Flow
If statements: Evaluate the condition, if true evaluate the then body, else evaluate the else body (if it exists). Pretty standard.
For loops: This is where the name mangling happens. When you're in a loop, this.currentLoopCounter is set. When creating shapes, the name gets the counter appended: c1_0, c1_1, etc.
Gotcha: The loop counter is a property on the interpreter instance. If you have nested loops, the inner one overwrites the outer one. This is a known limitation - nested loops with the same shape names will conflict.
Functions
Functions are stored in this.functions (separate from the environment). When you call a function:
- Save the current parameter map
- Bind the arguments to parameter names
- Execute the function body
- Restore the old parameter map
This creates a new scope. Parameters defined in the function don't leak out, and parameters from outside are still accessible (unless shadowed).
Gotcha: Function calls also do name mangling. If a function creates a shape named c1 and you call it twice, you get c1_myFunc_0 and c1_myFunc_1. This prevents name collisions.
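The save, bind, execute, restore sequence can be sketched as below. It assumes env.parameters is a Map and, for brevity, represents a function as { params, body } where body is a plain JavaScript callback; both are simplifications, not the project's actual data structures.

```javascript
// Hypothetical function call: save, bind, execute, restore.
// fn = { params: [...names], body: (interpreter) => value }
function callFunction(interpreter, fn, args) {
  const saved = interpreter.env.parameters;                 // 1. save the current map
  const scope = new Map(saved);                             //    copy, so outer params stay visible
  fn.params.forEach((name, i) => scope.set(name, args[i])); // 2. bind arguments (may shadow)
  interpreter.env.parameters = scope;
  try {
    return fn.body(interpreter);                            // 3. execute the body
  } finally {
    interpreter.env.parameters = saved;                     // 4. restore, even if the body throws
  }
}

const interp = { env: { parameters: new Map([['size', 100]]) } };
const scale = {
  params: ['factor'],
  body: (it) => it.env.parameters.get('size') * it.env.parameters.get('factor'),
};
console.log(callFunction(interp, scale, [2]));      // 200
console.log(interp.env.parameters.has('factor'));  // false
```

Restoring inside finally is a design choice: if the body throws, the caller's parameter map is still put back instead of leaking the function's scope.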
What Gets Returned
At the end, the interpreter returns an object with:
- parameters - all the parameters
- shapes - all the shapes (this is what the renderer uses)
- layers - all the layers
- functions - function definitions
- constraints - constraint definitions
- result - the last evaluated value
The renderer takes result.shapes and draws them.
Common Gotchas
Name Collisions
Shape names need to be unique. The interpreter handles this with name mangling in loops and functions, but if you manually create shapes with the same name, the later one overwrites the earlier one. No error is thrown - it just silently overwrites.
Parameter Lookup
When evaluating an identifier, the interpreter checks:
- Is it a parameter? → return its value
- Is it a shape? → return its name (as string)
- Otherwise → error
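If a parameter and a shape share a name, the parameter wins: the identifier always resolves to the parameter's value, so that shape can't be referenced by name in expressions. Avoid reusing names across the two.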
Expression Evaluation Order
Expressions are evaluated left-to-right, but operator precedence is respected. So 2 + 3 * 4 is 2 + (3 * 4) = 14, not (2 + 3) * 4 = 20.
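Precedence comes from the parser's structure rather than from the interpreter. The sketch below assumes one recursive-descent function per precedence level (the section on adding operators mentions parseFactor(); the expression and term names here are hypothetical) and evaluates directly over a pre-tokenized array instead of building an AST.

```javascript
// Minimal recursive-descent sketch over tokens such as [2, '+', 3, '*', 4].
// Lower-precedence levels call higher-precedence ones.
function parseAndEvaluate(tokens) {
  let pos = 0;

  // expression := term (('+' | '-') term)*
  function expression() {
    let value = term();
    while (tokens[pos] === '+' || tokens[pos] === '-') {
      const op = tokens[pos++];
      const right = term();
      value = op === '+' ? value + right : value - right; // the loop gives left-to-right grouping
    }
    return value;
  }

  // term := factor (('*' | '/') factor)*
  function term() {
    let value = factor();
    while (tokens[pos] === '*' || tokens[pos] === '/') {
      const op = tokens[pos++];
      const right = factor();
      value = op === '*' ? value * right : value / right;
    }
    return value;
  }

  // factor := number
  function factor() {
    const tok = tokens[pos++];
    if (typeof tok !== 'number') throw new Error(`Expected number, got ${tok}`);
    return tok;
  }

  const result = expression();
  if (pos !== tokens.length) throw new Error(`Unexpected token ${tokens[pos]}`);
  return result;
}

console.log(parseAndEvaluate([2, '+', 3, '*', 4])); // 14
```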
Boolean Operations
After a boolean operation, the original shapes are still in the map but marked as consumed. Don't try to use them in another boolean operation - use the result shape instead.
Error Messages
All errors should include line and column numbers. The lexer tracks this, the parser passes it through, and the interpreter should preserve it. If you're adding new code, make sure errors are helpful.
How to Add Features
Adding a New Shape Type
The lexer and parser already handle shape <type> generically, so you usually don't need to change them. You need to:
- Add shape creation logic (usually in Shapes.mjs or the interpreter)
- Add rendering support (in the renderer)
The shape type is just a string - "circle", "rectangle", etc. The interpreter and renderer need to know what to do with it.
Adding a New Operator
- Lexer: Add character handling (if it's a single char) or a parsing method (if it's multi-char)
- Parser: Add to expression parsing with the right precedence
- Interpreter: Add evaluation logic in applyBinaryOperator() or wherever makes sense
For example, to add ^ for exponentiation:
- Lexer: handle the ^ character → POWER token
- Parser: add to parseFactor() or create a new precedence level
- Interpreter: case 'power': return Math.pow(left, right)
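A sketch of the interpreter and parser sides of ^ follows. Only the 'power' case comes from the text; the other operator names in applyBinaryOperator are hypothetical placeholders. The parser fragment assumes you give ^ its own precedence level and make it right-associative, which matches mathematical convention and JavaScript's own ** operator: 2 ^ 3 ^ 2 means 2 ^ (3 ^ 2).

```javascript
// Hypothetical applyBinaryOperator with the new 'power' case.
function applyBinaryOperator(op, left, right) {
  switch (op) {
    case 'plus': return left + right;
    case 'minus': return left - right;
    case 'multiply': return left * right;
    case 'divide': return left / right;
    case 'power': return Math.pow(left, right);
    default: throw new Error(`Unknown operator: ${op}`);
  }
}

// power := number ('^' power)?
// Recursing on the right-hand side makes ^ right-associative.
function parsePower(tokens) {
  let pos = 0;
  function power() {
    const base = tokens[pos++];
    if (tokens[pos] === '^') {
      pos++; // consume '^'
      return applyBinaryOperator('power', base, power());
    }
    return base;
  }
  return power();
}

console.log(parsePower([2, '^', 3, '^', 2])); // 512, i.e. 2 ^ 9 rather than 8 ^ 2
```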
Adding a New Control Structure
- Lexer: Add keyword
- Parser: Add a parseMyNewThing() method
- Parser: Add a case to parseStatement()
- Interpreter: Add a case to evaluateNode() and implement the logic
For example, to add while loops:
- Lexer: add 'while': 'WHILE' to the keywords object
- Parser: parseWhileStatement() that parses condition and body
- Interpreter: evaluateWhileStatement() that loops while condition is true
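An evaluateWhileStatement() could look like the sketch below. It assumes the node has the shape { condition, body, line, column } and that the interpreter exposes evaluateExpression() and evaluateNode(). The iteration cap is an added assumption, not part of the original design; it keeps a condition that never becomes false from hanging the page.

```javascript
// Hypothetical safety limit; pick a value that suits your programs.
const MAX_WHILE_ITERATIONS = 10000;

// node = { condition, body: [...statements], line, column }
function evaluateWhileStatement(interp, node) {
  let result = null;
  let iterations = 0;
  while (interp.evaluateExpression(node.condition)) {
    if (++iterations > MAX_WHILE_ITERATIONS) {
      throw new Error(
        `while loop exceeded ${MAX_WHILE_ITERATIONS} iterations at line ${node.line}, col ${node.column}`);
    }
    for (const statement of node.body) {
      result = interp.evaluateNode(statement); // keep the last evaluated value
    }
  }
  return result;
}

// Stub interpreter for demonstration: expressions and statements are plain callbacks.
const state = { i: 0 };
const stub = { evaluateExpression: (expr) => expr(), evaluateNode: (stmt) => stmt() };
evaluateWhileStatement(stub, { condition: () => state.i < 3, body: [() => ++state.i], line: 1, column: 1 });
console.log(state.i); // 3
```

If you also want shapes created inside the loop body to get unique names, the loop would need to set the counter used for name mangling, as the for-loop implementation does.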
How to Build the Language System - Complete Step-by-Step Guide
This section provides a complete, step-by-step guide for building the entire language system (Lexer, Parser, Interpreter) from scratch.
Prerequisites
Before building the language system, you need:
- Basic JavaScript knowledge
- Understanding of tokenization, parsing, and interpretation concepts
- A text editor and browser for testing
Part 1: Building the Lexer
Step 1.1: Create the Token Class
File: src/lexer.mjs
What You're Building: The Token class represents a single token in the source code. Every piece of code (keywords, identifiers, numbers, operators) becomes a Token object. This class stores the token's type, value, and position information.
Why This Class Exists: Tokens are the building blocks of parsing. Instead of working with raw characters, the lexer converts characters into tokens, which are easier for the parser to work with. The Token class provides a structured way to represent these tokens with all necessary information.
Understanding Each Property:
- type: The category of token (e.g., 'IDENTIFIER', 'NUMBER', 'SHAPE', 'LBRACE'). This tells the parser what kind of token it is.
- value: The actual content of the token (e.g., "circle" for an identifier, 50 for a number, "{" for a brace). This is the data the parser needs.
- line: The line number where this token appears in the source code. Critical for error messages - users need to know where errors occurred.
- column: The column number where this token starts. Also critical for precise error reporting.
Why Store Position: When the parser encounters an error, it needs to tell the user exactly where the problem is. "Error at line 5, column 12" is much more helpful than "Error somewhere in your code". The line and column information comes from the lexer's position tracking.
How to Build It:
Step 1.1.1: Create the Class Structure
Start by creating a class that will hold all token information:
export class Token {
constructor(type, value, line, column) {
this.type = type; // Token type: 'IDENTIFIER', 'NUMBER', etc.
this.value = value; // Token value: actual string/number
this.line = line; // Line number (for error messages)
this.column = column; // Column number (for error messages)
}
Step 1.1.2: Add a toString Method (Optional but Helpful)
Add a method to convert the token to a string for debugging:
toString() {
return `Token(${this.type}, ${this.value}, ${this.line}:${this.column})`;
}
}
Why toString(): This method is helpful for debugging. When you log a token, you'll see a readable representation like "Token(NUMBER, 50, 1:10)" instead of "[object Object]".
The Complete Token Class:
export class Token {
constructor(type, value, line, column) {
this.type = type; // Token type: 'IDENTIFIER', 'NUMBER', etc.
this.value = value; // Token value: actual string/number
this.line = line; // Line number (for error messages)
this.column = column; // Column number (for error messages)
}
toString() {
return `Token(${this.type}, ${this.value}, ${this.line}:${this.column})`;
}
}
Building This Step by Step:
- Create a new file src/lexer.mjs
- Export a class called Token
- Add a constructor with four parameters: type, value, line, column
- Store each parameter as an instance property (this.type, this.value, etc.)
- Add an optional toString() method for debugging
- This class will be used throughout the lexer to create token objects
Test:
const token = new Token('NUMBER', 50, 1, 10);
console.log(token.toString()); // Token(NUMBER, 50, 1:10)
Step 1.2: Create the Basic Lexer Structure
What You're Building: The Lexer class is the foundation of the tokenization system. It reads characters from the source code one at a time and converts them into tokens. This step creates the basic structure with position tracking and helper methods.
Why This Structure: The lexer needs to track its position in the source code, know what character it's currently looking at, and be able to move forward. It also needs helper methods to peek ahead, advance position, and report errors with precise location information.
How to Build It Step by Step:
Step 1.2.1: Create the Lexer Class and Constructor
Start with the class definition and a constructor that initializes all tracking variables:
export class Lexer {
constructor(input) {
// Step 1.2.1.1: Store the input string
// This is the entire source code that needs to be tokenized
this.input = input;
// Step 1.2.1.2: Initialize position tracking
// Position is a zero-based index into the input string
// We start at position 0 (first character)
this.position = 0;
// Step 1.2.1.3: Initialize line and column tracking
// Line and column start at 1 (human-readable, not zero-based)
// These are used for error messages
this.line = 1;
this.column = 1;
// Step 1.2.1.4: Get the current character
// If input is empty, currentChar will be null
// Otherwise, it's the character at position 0
this.currentChar = this.input[0] || null;
}
Why These Properties:
- input: Stores the entire source code. The lexer needs to read through this character by character.
- position: Zero-based index tracking where we are in the string. Used to access characters via input[position].
- line and column: Human-readable position (starting at 1). Essential for error messages that users can understand.
- currentChar: The character we're currently examining. null means we've reached the end of input.
Step 1.2.2: Implement the advance() Method
This method moves the lexer forward by one character:
advance() {
// Step 1.2.2.1: Move position forward
// Increment the position index to point to the next character
this.position++;
// Step 1.2.2.2: Check if we've reached the end
// If position is beyond the input length, we're at end of file
if (this.position >= this.input.length) {
this.currentChar = null; // Signal end of input
} else {
// Step 1.2.2.3: Update current character
// Get the character at the new position
this.currentChar = this.input[this.position];
// Step 1.2.2.4: Update column number
// Moving forward horizontally increases the column
this.column++;
}
}
Why This Method: Every time the lexer consumes a character, it needs to move forward. This method handles that movement and updates all tracking variables. It's called constantly throughout tokenization.
Important Note About Line Tracking:
Notice that advance() doesn't update line. That's because line increments happen in skipWhitespace() when a newline character is encountered. This separation keeps the logic clear.
Step 1.2.3: Implement the peek() Method
This method looks ahead at the next character without consuming it:
peek() {
// Step 1.2.3.1: Check if there's a next character
// If position + 1 is beyond input length, there's nothing ahead
if (this.position + 1 >= this.input.length) {
return null; // No next character
}
// Step 1.2.3.2: Return the next character
// Return the character at position + 1 without moving position
return this.input[this.position + 1];
}
Why This Method:
Sometimes you need to look ahead to decide what to do. For example, = could be assignment or == could be equality. By peeking ahead, you can check if the next character is also = before deciding which token to create. The key is that peek() doesn't call advance(), so it doesn't consume the character.
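Here is a self-contained sketch of that = versus == decision. The OperatorLexer class and the ASSIGN and EQ token type names are hypothetical; advance() and peek() mirror the methods in this step, without line and column tracking.

```javascript
// Hypothetical fragment showing how peek() picks between '=' and '=='.
class OperatorLexer {
  constructor(input) {
    this.input = input;
    this.position = 0;
    this.currentChar = input[0] ?? null;
  }
  advance() {
    this.position++;
    this.currentChar = this.position < this.input.length ? this.input[this.position] : null;
  }
  peek() {
    // Look at the next character without moving position.
    return this.position + 1 < this.input.length ? this.input[this.position + 1] : null;
  }
  // Reads '=' or '==' starting at currentChar and returns the token type.
  equalsOperator() {
    if (this.currentChar === '=' && this.peek() === '=') {
      this.advance(); // consume first '='
      this.advance(); // consume second '='
      return 'EQ';
    }
    this.advance(); // consume the single '='
    return 'ASSIGN';
  }
}

const lx = new OperatorLexer('==');
console.log(lx.equalsOperator()); // EQ
```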
Step 1.2.4: Implement the error() Method
This method reports errors with precise location information:
error(message) {
// Step 1.2.4.1: Throw error with position information
// Include line and column so users know exactly where the problem is
throw new Error(`Lexer error at line ${this.line}, col ${this.column}: ${message}`);
}
}
Why This Method: When the lexer encounters something it can't handle (like an unexpected character), it needs to report an error. Including line and column information makes debugging much easier. Users can go directly to the problematic location in their code.
The Complete Basic Structure:
export class Lexer {
constructor(input) {
this.input = input; // Source code string
this.position = 0; // Current character position
this.line = 1; // Current line number
this.column = 1; // Current column number
this.currentChar = this.input[0] || null; // Current character
}
advance() {
this.position++;
if (this.position >= this.input.length) {
this.currentChar = null; // End of input
} else {
this.currentChar = this.input[this.position];
this.column++;
}
}
peek() {
// Look ahead one character without consuming it
if (this.position + 1 >= this.input.length) {
return null;
}
return this.input[this.position + 1];
}
error(message) {
throw new Error(`Lexer error at line ${this.line}, col ${this.column}: ${message}`);
}
}
Building This Step by Step:
- Add the Lexer class to src/lexer.mjs (same file as the Token class)
- Create a constructor that takes an input parameter
- Initialize the input property with the source code string
- Initialize position to 0 (start at beginning)
- Initialize line to 1 (first line)
- Initialize column to 1 (first column)
- Initialize currentChar to the first character (or null if empty)
- Create an advance() method that moves position forward
- Update currentChar when advancing
- Update column when advancing (but not line - that's handled elsewhere)
- Set currentChar to null when reaching end of input
- Create a peek() method that returns the next character without consuming it
- Return null if no next character exists
- Create an error() method that throws an error with line/column information
- This basic structure provides the foundation for all tokenization
Test:
const lexer = new Lexer('hello');
console.log(lexer.currentChar); // 'h'
lexer.advance();
console.log(lexer.currentChar); // 'e'
Step 1.3: Implement Whitespace and Comment Skipping
What You're Building: Methods to skip over whitespace characters and comments. These don't produce tokens - they're just ignored during tokenization. However, they're important for tracking line numbers correctly.
Why These Methods: Whitespace and comments are not meaningful tokens - they're just formatting. The lexer needs to skip over them without creating tokens. However, newlines in whitespace are important because they affect line number tracking.
How to Build It Step by Step:
Step 1.3.1: Implement skipWhitespace() Method
This method consumes all consecutive whitespace characters:
skipWhitespace() {
// Step 1.3.1.1: Loop while current character is whitespace
// /\s/ matches any whitespace: space, tab, newline, etc.
while (this.currentChar && /\s/.test(this.currentChar)) {
// Step 1.3.1.2: Check for newline character
// Newlines are special - they increment the line number
if (this.currentChar === '\n') {
this.line++; // Move to next line
this.column = 0; // advance() below bumps this to 1 for the first character of the new line
}
// Step 1.3.1.3: Advance past the whitespace character
// This consumes the character and moves to the next one
this.advance();
}
}
Why Handle Newlines Separately:
When a newline is encountered, we need to increment the line number and reset the column so that the first character of the next line is reported at column 1. Because advance() always increments column when it moves onto a character, we set column to 0 here; the advance() call that consumes the newline then brings it to 1. Setting it to 1 directly would report the first character of every line after the first at column 2.
Step 1.3.2: Implement skipComment() Method
This method consumes single-line comments (starting with //):
skipComment() {
// Step 1.3.2.1: Skip the first forward slash
// We know currentChar is '/' and peek() shows another '/'
// So we consume the first one
this.advance(); // Skip first /
// Step 1.3.2.2: Skip the second forward slash
// Now we're at the second '/', consume it too
this.advance(); // Skip second /
// Step 1.3.2.3: Skip all characters until newline
// Comments continue until the end of the line
// We loop until we hit a newline or end of file
while (this.currentChar !== null && this.currentChar !== '\n') {
this.advance(); // Skip until newline
}
// Note: The newline itself is NOT consumed here
// It will be handled by skipWhitespace() if called next
}
Why This Approach:
Comments start with // and continue until the end of the line. We consume both slashes, then skip all characters until we hit a newline. The newline itself is not consumed - it will be handled by skipWhitespace() if that's called next, which ensures line tracking works correctly.
The Complete Methods:
skipWhitespace() {
while (this.currentChar && /\s/.test(this.currentChar)) {
if (this.currentChar === '\n') {
this.line++;
this.column = 0; // advance() below moves this to 1
}
this.advance();
}
}
skipComment() {
// Skip // comments
this.advance(); // Skip first /
this.advance(); // Skip second /
while (this.currentChar !== null && this.currentChar !== '\n') {
this.advance(); // Skip until newline
}
}
Building This Step by Step:
- Create a skipWhitespace() method in the Lexer class
- Add a while loop that continues while the current character is whitespace
- Check if the current character is a newline
- If newline, increment line and reset column so the next line starts at column 1
- Call advance() to consume the whitespace character
- Create a skipComment() method
- Call advance() twice to skip both forward slashes
- Add a while loop that continues until newline or end of file
- Call advance() to skip each comment character
- These methods ensure whitespace and comments don't create tokens
Step 1.4: Implement Number Reading
What You're Building:
A method that reads numeric literals from the source code. This handles both integers (like 50) and floating-point numbers (like 3.14). The method accumulates digits, optionally handles a decimal point, and converts the string to a number.
Why This Method: Numbers in source code are sequences of digits, possibly with a decimal point. The lexer needs to recognize these sequences and convert them into NUMBER tokens. This method handles the reading and conversion process.
How to Build It Step by Step:
Step 1.4.1: Initialize Result String and Read Integer Part
Start by reading all consecutive digits:
number() {
// Step 1.4.1.1: Initialize empty string to accumulate digits
// We'll build the number string character by character
let result = '';
// Step 1.4.1.2: Read all consecutive digits
// /\d/ matches any digit (0-9)
// Continue reading as long as we have digits
while (this.currentChar && /\d/.test(this.currentChar)) {
result += this.currentChar; // Add digit to result string
this.advance(); // Move to next character
}
Why Build String First: We accumulate digits into a string first, then convert to a number at the end. This is simpler than trying to build the number mathematically, and handles both integers and decimals uniformly.
Step 1.4.2: Handle Decimal Point (Optional)
Check if there's a decimal point and read fractional digits:
// Step 1.4.2.1: Check for decimal point
// If the next character is '.', we have a floating-point number
if (this.currentChar === '.') {
result += '.'; // Add decimal point to result
this.advance(); // Move past the decimal point
// Step 1.4.2.2: Read fractional digits
// After the decimal point, read all consecutive digits
while (this.currentChar && /\d/.test(this.currentChar)) {
result += this.currentChar; // Add digit to result
this.advance(); // Move to next character
}
}
Why Two-Stage Reading: We read the integer part first, then check for a decimal point. If there's a decimal point, we read the fractional part. This two-stage approach correctly handles both integers (no decimal point) and floats (with decimal point).
Step 1.4.3: Convert to Number and Return Token
Convert the accumulated string to a number and create a token:
// Step 1.4.3.1: Convert string to number
// parseFloat() converts the string representation to an actual number
// This handles both integers and floating-point numbers
const numValue = parseFloat(result);
// Step 1.4.3.2: Create and return NUMBER token
// The token contains the numeric value, not the string
return new Token('NUMBER', numValue, this.line, this.column);
}
Why parseFloat():
parseFloat() converts the string to an actual JavaScript number. This ensures the token's value is a number type, not a string. The parser and interpreter can then use it in mathematical operations.
Important Note About Negative Numbers:
This method doesn't handle negative numbers. -50 would be tokenized as two tokens: MINUS and NUMBER(50). The parser handles the negation. This keeps the lexer simple and follows common language design patterns.
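On the parser side, handling the MINUS token can be as small as the sketch below. The token objects, the { type: 'unary' } node, and the evaluateSimple helper are hypothetical; the point is that a factor starting with MINUS wraps the following factor in a negation node.

```javascript
// Hypothetical parser fragment: a factor may start with MINUS.
// state.pos is the index of the next unread token.
function parseFactor(tokens, state) {
  const tok = tokens[state.pos];
  if (tok && tok.type === 'MINUS') {
    state.pos++;
    // Recursing here also accepts repeated signs such as --5.
    return { type: 'unary', op: '-', operand: parseFactor(tokens, state) };
  }
  if (tok && tok.type === 'NUMBER') {
    state.pos++;
    return { type: 'number', value: tok.value };
  }
  throw new Error(`Unexpected token ${tok ? tok.type : 'EOF'}`);
}

// Tiny evaluator for the two node types produced above.
function evaluateSimple(node) {
  return node.type === 'unary' ? -evaluateSimple(node.operand) : node.value;
}

const ast = parseFactor([{ type: 'MINUS' }, { type: 'NUMBER', value: 50 }], { pos: 0 });
console.log(evaluateSimple(ast)); // -50
```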
The Complete Method:
number() {
// Record where the token starts, before advance() moves past it
const startLine = this.line;
const startColumn = this.column;
let result = '';
// Read digits
while (this.currentChar && /\d/.test(this.currentChar)) {
result += this.currentChar;
this.advance();
}
// Check for decimal point
if (this.currentChar === '.') {
result += '.';
this.advance();
// Read fractional digits
while (this.currentChar && /\d/.test(this.currentChar)) {
result += this.currentChar;
this.advance();
}
}
// Convert to number
const numValue = parseFloat(result);
return new Token('NUMBER', numValue, startLine, startColumn);
}
Building This Step by Step:
- Create a number() method in the Lexer class
- Initialize an empty result string
- Add a while loop to read consecutive digits
- Append each digit to result and advance
- Check if the current character is a decimal point
- If yes, append the decimal point and advance
- Add a while loop to read fractional digits
- Append each fractional digit and advance
- Convert the result string to a number using parseFloat()
- Create and return a NUMBER token with the numeric value
- This method correctly handles both integers and floating-point numbers
Test:
const lexer = new Lexer('123 45.67');
const token1 = lexer.number(); // NUMBER, 123
lexer.skipWhitespace();
const token2 = lexer.number(); // NUMBER, 45.67
Step 1.5: Implement Identifier and Keyword Reading
What You're Building:
A method that reads identifiers (variable names, shape names) and keywords (reserved words like shape, param, if). Identifiers can contain letters, numbers, and underscores. Keywords are special identifiers that have meaning in the language.
Why This Method: Identifiers and keywords both start with a letter or underscore. The lexer reads the entire sequence first, then checks if it's a keyword. If it's a keyword, it returns a keyword token. Otherwise, it returns an IDENTIFIER token.
How to Build It Step by Step:
Step 1.5.1: Read Identifier Characters
Start by reading all characters that can be part of an identifier:
identifier() {
// Step 1.5.1.1: Initialize empty string to accumulate characters
let result = '';
// Step 1.5.1.2: Read identifier characters
// Identifiers can contain: letters (a-z, A-Z), digits (0-9), underscores (_)
// Continue reading as long as we have valid identifier characters
while (this.currentChar && /[a-zA-Z0-9_]/.test(this.currentChar)) {
result += this.currentChar; // Add character to result
this.advance(); // Move to next character
}
Why This Pattern:
The regex /[a-zA-Z0-9_]/ matches any letter (uppercase or lowercase), any digit, or an underscore. This is the standard pattern for identifiers in most programming languages. The loop continues until we hit a character that can't be part of an identifier (like a space, operator, or punctuation).
Step 1.5.2: Check if It's a Keyword
After reading the identifier, check if it matches a keyword:
// Step 1.5.2.1: Define keywords object
// This maps keyword strings (lowercase) to their token types
const keywords = {
'shape': 'SHAPE',
'param': 'PARAM',
'if': 'IF',
'else': 'ELSE',
'for': 'FOR',
'from': 'FROM',
'to': 'TO',
'step': 'STEP',
'union': 'UNION',
'difference': 'DIFFERENCE',
'intersection': 'INTERSECTION',
'add': 'ADD',
'subtract': 'SUBTRACT',
'true': 'TRUE',
'false': 'FALSE',
'and': 'AND',
'or': 'OR',
'not': 'NOT'
};
Why Keywords Object: This object maps keyword strings to their token types. When we read an identifier, we check if it matches a keyword. If it does, we return a keyword token. Otherwise, we return an IDENTIFIER token.
Step 1.5.3: Determine Token Type and Return
Check if the identifier is a keyword, then create the appropriate token:
// Step 1.5.3.1: Convert to lowercase for comparison
// Keywords are case-insensitive, so we compare in lowercase
const lowerResult = result.toLowerCase();
// Step 1.5.3.2: Check if it's a keyword
// If it's in the keywords object, use the keyword token type
// Otherwise, it's a regular identifier
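// Object.hasOwn() ignores inherited properties, so names like 'constructor' stay identifiers
const tokenType = Object.hasOwn(keywords, lowerResult) ? keywords[lowerResult] : 'IDENTIFIER';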
// Step 1.5.3.3: Create and return token
// Use the original result (preserving case) as the value
// This allows identifiers to be case-sensitive while keywords are case-insensitive
return new Token(tokenType, result, this.line, this.column);
}
Why Case-Insensitive Keywords:
Keywords are case-insensitive (shape, Shape, SHAPE all mean the same thing), but identifiers are case-sensitive (circle and Circle are different). We convert to lowercase for keyword comparison, but preserve the original case in the token value.
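The keyword check can be tried on its own with the sketch below (only a few table entries are shown). It uses Object.hasOwn() rather than a bare keywords[lower] lookup because a plain object literal inherits from Object.prototype: a bare lookup of 'constructor' returns the Object function, which is truthy and would be mistaken for a keyword token type.

```javascript
// Sketch of the keyword check; only a few entries of the table are shown.
const KEYWORDS = { shape: 'SHAPE', param: 'PARAM', if: 'IF', else: 'ELSE' };

function classifyWord(word) {
  const lower = word.toLowerCase(); // keywords are case-insensitive
  // Own properties only: inherited members such as 'constructor' are ignored.
  return Object.hasOwn(KEYWORDS, lower) ? KEYWORDS[lower] : 'IDENTIFIER';
}

console.log(classifyWord('Shape'));       // SHAPE
console.log(classifyWord('Circle'));      // IDENTIFIER
console.log(classifyWord('constructor')); // IDENTIFIER
```

Object.hasOwn() needs a reasonably modern runtime (Node.js 16.9 or later, current browsers); Object.prototype.hasOwnProperty.call(KEYWORDS, lower) is the older equivalent.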
The Complete Method:
identifier() {
// Record where the token starts, before advance() moves past it
const startLine = this.line;
const startColumn = this.column;
let result = '';
// Read identifier characters
while (this.currentChar && /[a-zA-Z0-9_]/.test(this.currentChar)) {
result += this.currentChar;
this.advance();
}
// Check if it's a keyword
const keywords = {
'shape': 'SHAPE',
'param': 'PARAM',
'if': 'IF',
'else': 'ELSE',
'for': 'FOR',
'from': 'FROM',
'to': 'TO',
'step': 'STEP',
'union': 'UNION',
'difference': 'DIFFERENCE',
'intersection': 'INTERSECTION',
'add': 'ADD',
'subtract': 'SUBTRACT',
'true': 'TRUE',
'false': 'FALSE',
'and': 'AND',
'or': 'OR',
'not': 'NOT'
};
const lowerResult = result.toLowerCase();
// Own properties only, so 'constructor' or '__proto__' can't match an inherited member
const tokenType = Object.hasOwn(keywords, lowerResult) ? keywords[lowerResult] : 'IDENTIFIER';
return new Token(tokenType, result, startLine, startColumn);
}
Building This Step by Step:
- Create an identifier() method in the Lexer class
- Initialize an empty result string
- Add a while loop to read identifier characters (letters, digits, underscores)
- Append each character to result and advance
- Define a keywords object mapping strings to token types
- Convert result to lowercase for keyword comparison
- Check if the lowercase result is in the keywords object
- If yes, use the keyword token type; otherwise use 'IDENTIFIER'
- Create and return a token with the determined type and the original-case value
- This method correctly distinguishes keywords from identifiers
Step 1.6: Implement String Reading
What You're Building:
A method that reads string literals enclosed in double quotes. This includes handling escape sequences like \n (newline), \t (tab), \" (quote), and \\ (backslash). The method extracts the string content and converts escape sequences to their actual characters.
Why This Method: Strings in source code are sequences of characters between quotes. The lexer needs to extract the string content, handle escape sequences, and create a STRING token. This is more complex than reading numbers or identifiers because of escape sequences.
How to Build It Step by Step:
Step 1.6.1: Initialize and Skip Opening Quote
Start by skipping the opening double quote:
parseString() {
// Step 1.6.1.1: Initialize empty string to accumulate characters
let result = '';
// Step 1.6.1.2: Skip opening quote
// We know currentChar is '"' (that's how we got here)
// Advance past it to start reading the string content
this.advance(); // Skip opening quote
Why Skip Opening Quote: The opening quote is just a delimiter - it's not part of the string content. We skip it immediately so we can start reading the actual string characters.
Step 1.6.2: Read String Content with Escape Sequence Handling
Loop through characters until we find the closing quote:
// Step 1.6.2.1: Loop until closing quote or end of file
// Continue reading as long as we haven't hit the closing quote
while (this.currentChar !== null && this.currentChar !== '"') {
// Step 1.6.2.2: Check for escape sequence
// If current character is backslash, it's an escape sequence
if (this.currentChar === '\\') {
// Step 1.6.2.3: Skip the backslash
this.advance(); // Skip backslash
// Step 1.6.2.4: Handle escape sequences
// The character after backslash determines what to escape to
if (this.currentChar === 'n') {
result += '\n'; // Newline character
} else if (this.currentChar === 't') {
result += '\t'; // Tab character
} else if (this.currentChar === '"') {
result += '"'; // Literal quote
} else if (this.currentChar === '\\') {
result += '\\'; // Literal backslash
} else {
// Step 1.6.2.5: Unknown escape sequence
// If we don't recognize it, use the character as-is
// This is lenient - some lexers would error here
result += this.currentChar;
}
this.advance(); // Move past the escape character
} else {
// Step 1.6.2.6: Regular character (not escaped)
// Just add it to the result string
result += this.currentChar;
this.advance();
}
}
Why Handle Escape Sequences:
Escape sequences allow users to include special characters in strings. \n becomes a newline, \" becomes a literal quote (so you can have quotes inside strings), etc. Without escape sequences, you couldn't have quotes or newlines in strings.
Step 1.6.3: Validate Closing Quote and Return Token
Check that we found a closing quote (not end of file):
// Step 1.6.3.1: Check if we found closing quote
// If currentChar is '"', we successfully found the end
if (this.currentChar === '"') {
this.advance(); // Skip closing quote
} else {
// Step 1.6.3.2: Error - string never closed
// If we reached end of file without finding closing quote, it's an error
this.error('Unterminated string literal');
}
// Step 1.6.3.3: Create and return STRING token
// The result string contains the fully parsed string (with escape sequences resolved)
return new Token('STRING', result, this.line, this.column);
}
Why Validate Closing Quote: If we reach the end of the file without finding a closing quote, the string is unterminated - a syntax error. We need to report this error so the user knows their code is malformed.
The Complete Method:
parseString() {
// Record where the token starts (at the opening quote)
const startLine = this.line;
const startColumn = this.column;
let result = '';
this.advance(); // Skip opening quote
while (this.currentChar !== null && this.currentChar !== '"') {
if (this.currentChar === '\\') {
// Escape sequence
this.advance(); // Skip backslash
if (this.currentChar === 'n') {
result += '\n';
} else if (this.currentChar === 't') {
result += '\t';
} else if (this.currentChar === '"') {
result += '"';
} else if (this.currentChar === '\\') {
result += '\\';
} else {
result += this.currentChar; // Unknown escape, use as-is
}
this.advance();
} else {
result += this.currentChar;
this.advance();
}
}
if (this.currentChar === '"') {
this.advance(); // Skip closing quote
} else {
this.error('Unterminated string literal');
}
return new Token('STRING', result, startLine, startColumn);
}
Building This Step by Step:
- Create the parseString() method in the Lexer class
- Initialize an empty result string
- Call advance() to skip the opening quote
- Add a while loop that continues until the closing quote or end of file
- Check whether the current character is a backslash (escape sequence)
- If it is, advance past it and check the next character
- Handle the known escape sequences (\n, \t, \", \\)
- For unknown escapes, use the character as-is
- For regular characters, add them to result
- Advance after handling each character
- After the loop, check whether we found the closing quote
- If yes, advance past it; if no, throw an error
- Create and return a STRING token with the parsed result
- The finished method handles strings containing escape sequences
Step 1.7: Implement the Main Tokenization Loop
What You're Building:
The main getNextToken() method that orchestrates the entire tokenization process. This method is called repeatedly to get the next token from the source code. It uses all the helper methods we've built to recognize different token types.
Why This Method:
This is the heart of the lexer. Each call skips any whitespace and comments, then recognizes the next token and delegates to the appropriate parsing method. The order of checks is important: some patterns must be checked before others (e.g., == before =).
How to Build It Step by Step:
Step 1.7.1: Create the Main Loop Structure
Start with a loop that continues until end of file:
getNextToken() {
// Step 1.7.1.1: Main loop - continue until end of input
// currentChar is null when we've reached the end
while (this.currentChar !== null) {
Why While Loop:
We loop until we either find a token or run out of input. Each call to getNextToken() returns exactly one token; the loop only goes around again when it has skipped whitespace or a comment. When currentChar becomes null (end of file), the loop exits.
Step 1.7.2: Skip Whitespace and Comments First
Handle whitespace and comments before checking for tokens:
// Step 1.7.2.1: Skip whitespace
// Whitespace doesn't produce tokens, so skip it and continue
if (/\s/.test(this.currentChar)) {
this.skipWhitespace();
continue; // Skip to next iteration
}
// Step 1.7.2.2: Skip comments
// Comments also don't produce tokens
// Check for '//' by looking at current char and next char
if (this.currentChar === '/' && this.peek() === '/') {
this.skipComment();
continue; // Skip to next iteration
}
Why Check These First:
Whitespace and comments don't produce tokens - they're just formatting. We check for them first and skip them immediately using continue. This keeps the main logic clean.
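The skipping logic can be sketched on its own. skipTrivia() below is a hypothetical helper, not the lexer's skipWhitespace()/skipComment() methods; it assumes comments run from // to the end of the line and returns the index of the next character that can start a token:

```javascript
// Hypothetical helper: returns the index of the next character that can start a token,
// skipping whitespace and '//' line comments (the two things getNextToken() skips).
function skipTrivia(src, pos) {
  while (pos < src.length) {
    if (/\s/.test(src[pos])) {
      pos++; // whitespace produces no token
    } else if (src[pos] === '/' && src[pos + 1] === '/') {
      // Line comment: stop at the newline; the whitespace branch then consumes it
      while (pos < src.length && src[pos] !== '\n') pos++;
    } else {
      break; // significant character found
    }
  }
  return pos;
}

const src = '  // comment\n  shape';
console.log(skipTrivia(src, 0)); // 15 (the index of the "s" in "shape")
```

A single / is left alone, so the division operator still reaches the single-character token table.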
Step 1.7.3: Handle Numbers, Identifiers, and Strings
Check for tokens that need special parsing:
// Step 1.7.3.1: Numbers
// If current character is a digit, it's the start of a number
if (/\d/.test(this.currentChar)) {
return this.number(); // Parse and return NUMBER token
}
// Step 1.7.3.2: Identifiers and keywords
// If current character is a letter or underscore, it's an identifier/keyword
if (/[a-zA-Z_]/.test(this.currentChar)) {
return this.identifier(); // Parse and return IDENTIFIER or keyword token
}
// Step 1.7.3.3: Strings
// If current character is a double quote, it's a string literal
if (this.currentChar === '"') {
return this.parseString(); // Parse and return STRING token
}
// Step 1.7.3.4: Hex colors
// If current character is '#', it's a hex color
if (this.currentChar === '#') {
return this.parseHexColor(); // Parse and return HEXCOLOR token
}
Why Return Immediately:
These methods (number(), identifier(), parseString(), parseHexColor()) handle the entire token parsing and return a complete token. We return immediately because we've found and parsed a token.
Step 1.7.4: Handle Multi-Character Operators
Check for operators that need lookahead:
// Step 1.7.4.1: Check for == before =
// Order matters! If we check = first, == becomes two ASSIGN tokens
if (this.currentChar === '=' && this.peek() === '=') {
const startColumn = this.column; // Column of the first '=' (advance() moves this.column)
this.advance(); // Skip first =
this.advance(); // Skip second =
return new Token('EQUALS', '==', this.line, startColumn);
}
// Step 1.7.4.2: Single = (assignment)
if (this.currentChar === '=') {
const startColumn = this.column;
this.advance();
return new Token('ASSIGN', '=', this.line, startColumn);
}
Why Check == Before =:
If we checked = first, == would be tokenized as two ASSIGN tokens instead of one EQUALS token. By checking == first, we correctly recognize the equality operator.
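The effect of the ordering is easy to see in isolation. The two hypothetical scanners below handle only = and == and differ solely in which check runs first:

```javascript
// Hypothetical scanner that checks '=' first: the '==' branch can never match.
function scanEqualsNaive(src) {
  const types = [];
  let i = 0;
  while (i < src.length) {
    if (src[i] === '=') { types.push('ASSIGN'); i += 1; }
    else if (src.startsWith('==', i)) { types.push('EQUALS'); i += 2; }
    else i += 1; // ignore everything else in this demo
  }
  return types;
}

// Hypothetical scanner that checks the longer operator first.
function scanEqualsCorrect(src) {
  const types = [];
  let i = 0;
  while (i < src.length) {
    if (src.startsWith('==', i)) { types.push('EQUALS'); i += 2; }
    else if (src[i] === '=') { types.push('ASSIGN'); i += 1; }
    else i += 1;
  }
  return types;
}

console.log(scanEqualsNaive('a == b'));   // [ 'ASSIGN', 'ASSIGN' ]
console.log(scanEqualsCorrect('a == b')); // [ 'EQUALS' ]
```

Checking the longer operator first is the usual longest-match (maximal munch) rule; any other two-character operator added later would need the same treatment.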
Step 1.7.5: Handle Single-Character Tokens
Use a lookup table for simple single-character tokens:
// Step 1.7.5.1: Single character tokens
// These are simple - one character, one token type
const singleCharTokens = {
'{': 'LBRACE',
'}': 'RBRACE',
'(': 'LPAREN',
')': 'RPAREN',
'[': 'LBRACKET',
']': 'RBRACKET',
',': 'COMMA',
':': 'COLON',
';': 'SEMICOLON',
'+': 'PLUS',
'-': 'MINUS',
'*': 'MULTIPLY',
'/': 'DIVIDE',
'%': 'MODULO',
'<': 'LESS',
'>': 'GREATER',
'!': 'NOT'
};
// Step 1.7.5.2: Check if current character is a single-char token
if (singleCharTokens[this.currentChar]) {
const tokenType = singleCharTokens[this.currentChar];
const value = this.currentChar;
const startColumn = this.column; // Record the column before advance() moves it
this.advance(); // Consume the character
return new Token(tokenType, value, this.line, startColumn);
}
Why Lookup Table:
A lookup table is cleaner than a long chain of if statements. It makes it easy to add new single-character tokens and keeps the code readable.
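As written, getNextToken() rebuilds the singleCharTokens object on every call. A possible refinement, sketched below with hypothetical names (SINGLE_CHAR_TOKENS, singleCharTokenType), builds the table once as a Map at module level:

```javascript
// Hypothetical module-level table: built once instead of on every getNextToken() call.
const SINGLE_CHAR_TOKENS = new Map([
  ['{', 'LBRACE'], ['}', 'RBRACE'], ['(', 'LPAREN'], [')', 'RPAREN'],
  ['[', 'LBRACKET'], [']', 'RBRACKET'], [',', 'COMMA'], [':', 'COLON'],
  [';', 'SEMICOLON'], ['+', 'PLUS'], ['-', 'MINUS'], ['*', 'MULTIPLY'],
  ['/', 'DIVIDE'], ['%', 'MODULO'], ['<', 'LESS'], ['>', 'GREATER'], ['!', 'NOT'],
]);

// Returns the token type for a character, or null if it is not a single-character token.
function singleCharTokenType(ch) {
  return SINGLE_CHAR_TOKENS.get(ch) ?? null;
}

console.log(singleCharTokenType('{')); // LBRACE
console.log(singleCharTokenType('@')); // null
```

Inside getNextToken(), the lookup would become a call to singleCharTokenType(this.currentChar), with null meaning "not a single-character token". A Map also guarantees that lookups never fall through to properties inherited from Object.prototype.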
Step 1.7.6: Handle Unknown Characters and End of File
Report errors for unknown characters and return the EOF token:
// Step 1.7.6.1: Unknown character
// If we get here, we don't recognize this character
// This is a syntax error
this.error(`Unexpected character: ${this.currentChar}`);
}
// Step 1.7.6.2: End of file
// If we exit the loop, we've reached the end of input
// Return EOF token to signal no more tokens
return new Token('EOF', null, this.line, this.column);
}
Why EOF Token:
The parser needs to know when there are no more tokens. The EOF (End Of File) token signals this. It's returned when the loop exits (meaning currentChar is null).
The Complete Method:
getNextToken() {
while (this.currentChar !== null) {
// Skip whitespace
if (/\s/.test(this.currentChar)) {
this.skipWhitespace();
continue;
}
// Skip comments
if (this.currentChar === '/' && this.peek() === '/') {
this.skipComment();
continue;
}
// Numbers
if (/\d/.test(this.currentChar)) {
return this.number();
}
// Identifiers and keywords
if (/[a-zA-Z_]/.test(this.currentChar)) {
return this.identifier();
}
// Strings
if (this.currentChar === '"') {
return this.parseString();
}
// Hex colors
if (this.currentChar === '#') {
return this.parseHexColor();
}
// Operators and punctuation
// Remember where the token starts; advance() moves this.column forward
const startColumn = this.column;
if (this.currentChar === '=' && this.peek() === '=') {
this.advance();
this.advance();
return new Token('EQUALS', '==', this.line, startColumn);
}
if (this.currentChar === '=') {
this.advance();
return new Token('ASSIGN', '=', this.line, startColumn);
}
// Single character tokens
const singleCharTokens = {
'{': 'LBRACE',
'}': 'RBRACE',
'(': 'LPAREN',
')': 'RPAREN',
'[': 'LBRACKET',
']': 'RBRACKET',
',': 'COMMA',
':': 'COLON',
';': 'SEMICOLON',
'+': 'PLUS',
'-': 'MINUS',
'*': 'MULTIPLY',
'/': 'DIVIDE',
'%': 'MODULO',
'<': 'LESS',
'>': 'GREATER',
'!': 'NOT'
};
if (singleCharTokens[this.currentChar]) {
const tokenType = singleCharTokens[this.currentChar];
const value = this.currentChar;
this.advance();
return new Token(tokenType, value, this.line, startColumn);
}
// Unknown character
this.error(`Unexpected character: ${this.currentChar}`);
}
// End of file
return new Token('EOF', null, this.line, this.column);
}
Building This Step by Step:
- Create the getNextToken() method in the Lexer class
- Add a while loop that continues until currentChar is null
- Check for whitespace first; skip it and continue
- Check for comments; skip them and continue
- Check for numbers (digits); return the number() result
- Check for identifiers/keywords (letters or underscore); return the identifier() result
- Check for strings (double quote); return the parseString() result
- Check for hex colors (hash); return the parseHexColor() result
- Check for multi-character operators (== before =)
- Create the single-character token lookup table
- Check the lookup table for single-character tokens
- If found, create the token and return it
- If the character is unknown, throw an error
- After the loop, return the EOF token
- This method orchestrates the entire tokenization process
Test the complete lexer:
const code = 'shape circle c1 { radius: 50 }';
const lexer = new Lexer(code);
let token = lexer.getNextToken();
while (token.type !== 'EOF') {
console.log(token);
token = lexer.getNextToken();
}
// Should output: SHAPE, IDENTIFIER(circle), IDENTIFIER(c1), LBRACE,
// IDENTIFIER(radius), COLON, NUMBER(50), RBRACE
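If the output differs, it can help to compare against a reference. The sketch below is a condensed, self-contained version of the same design (MiniLexer, readWhile and tokenize are hypothetical names). The KEYWORDS table is an assumption inferred from the token types parseStatement() checks for, and strings, hex colors and == are omitted to keep it short. Unlike the step-by-step code, it records each token's line and column before consuming it.

```javascript
// Condensed sketch for checking the token stream; not the full lexer from this guide.
class Token {
  constructor(type, value, line, column) {
    this.type = type;
    this.value = value;
    this.line = line;
    this.column = column;
  }
}

// Hypothetical keyword table, inferred from the token types parseStatement() checks for
const KEYWORDS = new Map([
  ['shape', 'SHAPE'], ['param', 'PARAM'], ['if', 'IF'], ['for', 'FOR'],
  ['union', 'UNION'], ['difference', 'DIFFERENCE'], ['intersection', 'INTERSECTION'],
]);
const SINGLE_CHAR = new Map([
  ['{', 'LBRACE'], ['}', 'RBRACE'], ['(', 'LPAREN'], [')', 'RPAREN'], [':', 'COLON'],
  [',', 'COMMA'], ['+', 'PLUS'], ['-', 'MINUS'], ['*', 'MULTIPLY'], ['/', 'DIVIDE'],
]);

class MiniLexer {
  constructor(input) {
    this.input = input;
    this.position = 0;
    this.line = 1;
    this.column = 1;
    this.currentChar = input.length > 0 ? input[0] : null;
  }

  advance() {
    if (this.currentChar === '\n') {
      this.line++;
      this.column = 1;
    } else {
      this.column++;
    }
    this.position++;
    this.currentChar = this.position < this.input.length ? this.input[this.position] : null;
  }

  peek() {
    const next = this.position + 1;
    return next < this.input.length ? this.input[next] : null;
  }

  // Consumes characters while test(ch) holds and returns them as one string
  readWhile(test) {
    let text = '';
    while (this.currentChar !== null && test(this.currentChar)) {
      text += this.currentChar;
      this.advance();
    }
    return text;
  }

  getNextToken() {
    while (this.currentChar !== null) {
      if (/\s/.test(this.currentChar)) { this.advance(); continue; }
      if (this.currentChar === '/' && this.peek() === '/') {
        this.readWhile((ch) => ch !== '\n'); // comment text; the newline is skipped as whitespace
        continue;
      }
      const line = this.line; // start position of the token
      const column = this.column;
      if (/\d/.test(this.currentChar)) {
        return new Token('NUMBER', parseFloat(this.readWhile((ch) => /[\d.]/.test(ch))), line, column);
      }
      if (/[a-zA-Z_]/.test(this.currentChar)) {
        const word = this.readWhile((ch) => /\w/.test(ch));
        return new Token(KEYWORDS.get(word) ?? 'IDENTIFIER', word, line, column);
      }
      if (SINGLE_CHAR.has(this.currentChar)) {
        const ch = this.currentChar;
        this.advance();
        return new Token(SINGLE_CHAR.get(ch), ch, line, column);
      }
      throw new Error(`Unexpected character: ${this.currentChar} at line ${line}, col ${column}`);
    }
    return new Token('EOF', null, this.line, this.column);
  }
}

// Collects every token up to (not including) EOF
function tokenize(input) {
  const lexer = new MiniLexer(input);
  const tokens = [];
  for (let token = lexer.getNextToken(); token.type !== 'EOF'; token = lexer.getNextToken()) {
    tokens.push(token);
  }
  return tokens;
}

const tokens = tokenize('shape circle c1 { radius: 50 }');
console.log(tokens.map((t) => (t.type === 'IDENTIFIER' || t.type === 'NUMBER' ? `${t.type}(${t.value})` : t.type)).join(', '));
// SHAPE, IDENTIFIER(circle), IDENTIFIER(c1), LBRACE, IDENTIFIER(radius), COLON, NUMBER(50), RBRACE
```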
Part 2: Building the Parser
Step 2.1: Create the Basic Parser Structure
File: src/parser.mjs
What You're Building:
The Parser class is the foundation of the parsing system. It takes tokens from the lexer and builds an Abstract Syntax Tree (AST). This step creates the basic structure with token lookahead, error handling, and the eat() method for consuming tokens.
Why This Structure:
The parser uses a "lookahead" approach - it always has the next token ready in currentToken. This allows it to make decisions based on what's coming next. The eat() method ensures tokens are consumed in the correct order according to the grammar.
How to Build It Step by Step:
Step 2.1.1: Create the Parser Class and Constructor
Start with the class definition and constructor:
import { Lexer } from './lexer.mjs';
export class Parser {
constructor(lexer) {
// Step 2.1.1.1: Store the lexer
// The parser needs the lexer to get tokens
this.lexer = lexer;
// Step 2.1.1.2: Get the first token (lookahead)
// We always keep one token ahead - this is called "lookahead"
// This allows us to peek at the next token without consuming it
this.currentToken = this.lexer.getNextToken();
}
Why Lookahead:
The parser needs to know what token is coming next to make parsing decisions. For example, to parse shape circle c1, we need to see the SHAPE token first, then the IDENTIFIER(circle), etc. By keeping currentToken always set to the next token, we can check it before consuming it.
Step 2.1.2: Implement the error() Method
Create a method for reporting parsing errors:
error(message) {
// Step 2.1.2.1: Get position information from current token
// If currentToken exists, use its line/column
// Otherwise, use 'unknown' (shouldn't happen, but defensive)
const line = this.currentToken ? this.currentToken.line : 'unknown';
const column = this.currentToken ? this.currentToken.column : 'unknown';
// Step 2.1.2.2: Throw error with position information
throw new Error(`Parser error at line ${line}, col ${column}: ${message}`);
}
Why This Error Method:
Parsing errors need to include position information so users know where the problem is. We get the position from currentToken, which is the token that caused the error.
Step 2.1.3: Implement the eat() Method
This is the core method for consuming tokens:
eat(tokenType) {
// Step 2.1.3.1: Check if current token matches expected type
// The tokenType parameter is what we expect to see
if (this.currentToken.type === tokenType) {
// Step 2.1.3.2: Token matches - consume it
// Store the token (we might need its value)
const token = this.currentToken;
// Step 2.1.3.3: Move to next token
// Get the next token from the lexer and update currentToken
this.currentToken = this.lexer.getNextToken();
// Step 2.1.3.4: Return the consumed token
// Sometimes we need the token's value, so we return it
return token;
} else {
// Step 2.1.3.5: Token doesn't match - syntax error
// This means the code doesn't match the grammar
this.error(`Expected ${tokenType} but got ${this.currentToken.type}`);
}
}
}
Why This Method:
The eat() method is the workhorse of the parser. It ensures tokens are consumed in the correct order. If the current token doesn't match what we expect, it's a syntax error. This enforces the grammar rules strictly.
The Complete Basic Structure:
import { Lexer } from './lexer.mjs';
export class Parser {
constructor(lexer) {
this.lexer = lexer;
this.currentToken = this.lexer.getNextToken(); // Look ahead one token
}
error(message) {
const line = this.currentToken ? this.currentToken.line : 'unknown';
const column = this.currentToken ? this.currentToken.column : 'unknown';
throw new Error(`Parser error at line ${line}, col ${column}: ${message}`);
}
eat(tokenType) {
if (this.currentToken.type === tokenType) {
const token = this.currentToken;
this.currentToken = this.lexer.getNextToken(); // Move to next token
return token;
} else {
this.error(`Expected ${tokenType} but got ${this.currentToken.type}`);
}
}
}
Building This Step by Step:
- Create a new file src/parser.mjs
- Import the Lexer class
- Export the Parser class
- Create a constructor that takes a lexer parameter
- Store the lexer as an instance property
- Get the first token and store it in currentToken (lookahead)
- Create an error() method that throws an error with position information
- Get the line and column from currentToken
- Create an eat() method that consumes tokens
- Check whether currentToken matches the expected type
- If it matches, store the token, advance to the next token, and return the stored token
- If it doesn't match, throw an error
- This basic structure provides the foundation for all parsing
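Because the parser only ever calls lexer.getNextToken(), it can be exercised without a real lexer. The sketch below pairs a copy of the basic Parser structure with a hypothetical ArrayLexer stub that replays hand-written tokens; the stub is not part of this guide's code.

```javascript
// Hypothetical stub lexer: replays a fixed token list, then returns EOF forever.
class ArrayLexer {
  constructor(tokens) {
    this.tokens = tokens;
    this.index = 0;
  }
  getNextToken() {
    if (this.index < this.tokens.length) {
      return this.tokens[this.index++];
    }
    // Give EOF the last token's position so error messages still have a location
    const last = this.tokens[this.tokens.length - 1];
    return { type: 'EOF', value: null, line: last ? last.line : 1, column: last ? last.column : 1 };
  }
}

// Same basic structure as the Parser above (without the import/export)
class Parser {
  constructor(lexer) {
    this.lexer = lexer;
    this.currentToken = this.lexer.getNextToken(); // one-token lookahead
  }
  error(message) {
    const line = this.currentToken ? this.currentToken.line : 'unknown';
    const column = this.currentToken ? this.currentToken.column : 'unknown';
    throw new Error(`Parser error at line ${line}, col ${column}: ${message}`);
  }
  eat(tokenType) {
    if (this.currentToken.type === tokenType) {
      const token = this.currentToken;
      this.currentToken = this.lexer.getNextToken();
      return token;
    }
    this.error(`Expected ${tokenType} but got ${this.currentToken.type}`);
  }
}

const parser = new Parser(new ArrayLexer([
  { type: 'SHAPE', value: 'shape', line: 1, column: 1 },
  { type: 'IDENTIFIER', value: 'circle', line: 1, column: 7 },
]));
console.log(parser.eat('SHAPE').value); // shape
try {
  parser.eat('LBRACE'); // currentToken is IDENTIFIER(circle), so this fails
} catch (err) {
  console.log(err.message); // Parser error at line 1, col 7: Expected LBRACE but got IDENTIFIER
}
```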
Step 2.2: Implement the Main Parse Method
What You're Building:
The main parse() method that orchestrates the parsing of an entire program. It repeatedly calls parseStatement() to parse each statement in the program, building an array of AST nodes that represent the complete program structure.
Why This Method:
A program is a sequence of statements. This method loops through all statements, parsing each one, and returns the complete AST. This is the entry point for parsing - you call parse() on a parser instance to get the AST for a program.
How to Build It Step by Step:
Step 2.2.1: Initialize Statements Array
Start with an empty array to collect parsed statements:
parse() {
// Step 2.2.1.1: Initialize statements array
// This will hold all the AST nodes representing statements
const statements = [];
Why Array:
A program consists of multiple statements. We need an array to collect all of them. Each statement becomes an AST node in this array.
Step 2.2.2: Loop Through All Statements
Parse statements until end of file:
// Step 2.2.2.1: Loop until end of file
// Continue parsing as long as we haven't reached EOF token
while (this.currentToken.type !== 'EOF') {
// Step 2.2.2.2: Parse one statement
// parseStatement() will determine what type of statement it is
// and call the appropriate parsing method
statements.push(this.parseStatement());
}
Why While Loop:
We continue parsing statements until we reach the EOF token. Each call to parseStatement() parses one complete statement and advances the token stream. The loop continues until all statements are parsed.
Step 2.2.3: Return Complete AST
Return the array of statement AST nodes:
// Step 2.2.3.1: Return array of AST nodes
// This represents the complete program structure
return statements; // Array of AST nodes
}
Why Return Array:
The array of AST nodes represents the complete program. Each node is a statement (shape definition, parameter, etc.). The interpreter will use this AST to execute the program.
The Complete Method:
parse() {
const statements = [];
while (this.currentToken.type !== 'EOF') {
statements.push(this.parseStatement());
}
return statements; // Array of AST nodes
}
Building This Step by Step:
- Create the parse() method in the Parser class
- Initialize an empty statements array
- Add a while loop that continues until the EOF token
- Call parseStatement() to parse one statement
- Push the parsed statement's AST node onto the array
- After the loop, return the statements array
- This method orchestrates the parsing of an entire program
Step 2.3: Implement Statement Parsing
What You're Building:
A dispatcher method that determines which type of statement to parse based on the current token. This method routes to the appropriate parsing method for each statement type.
Why This Method:
Different statements start with different tokens. param starts with PARAM, shape starts with SHAPE, etc. This method checks the current token and routes to the correct parsing method. This keeps the parsing logic organized and modular.
How to Build It Step by Step:
Step 2.3.1: Create the Dispatcher Method
Use a switch statement to route based on token type:
parseStatement() {
// Step 2.3.1.1: Check current token type
// The token type tells us what kind of statement this is
switch (this.currentToken.type) {
// Step 2.3.1.2: Handle PARAM statements
// param name = value
case 'PARAM':
return this.parseParam();
// Step 2.3.1.3: Handle SHAPE statements
// shape circle c1 { radius: 50 }
case 'SHAPE':
return this.parseShape();
// Step 2.3.1.4: Handle boolean operations
// union shape1 shape2
case 'UNION':
case 'DIFFERENCE':
case 'INTERSECTION':
return this.parseBooleanOperation();
// Step 2.3.1.5: Handle IF statements
// if condition { ... }
case 'IF':
return this.parseIfStatement();
// Step 2.3.1.6: Handle FOR loops
// for i from 1 to 10 { ... }
case 'FOR':
return this.parseForLoop();
// Step 2.3.1.7: Unknown statement type
// If we don't recognize the token type, it's a syntax error
default:
this.error(`Unexpected token: ${this.currentToken.type}`);
}
}
Why Switch Statement:
A switch statement is clean and efficient for routing based on token type. Each case handles a different statement type. The default case catches syntax errors (unexpected tokens).
Why Return Immediately:
Each parsing method (parseParam(), parseShape(), etc.) parses the complete statement and returns an AST node. We return immediately because we've successfully parsed a statement.
The Complete Method:
parseStatement() {
switch (this.currentToken.type) {
case 'PARAM':
return this.parseParam();
case 'SHAPE':
return this.parseShape();
case 'UNION':
case 'DIFFERENCE':
case 'INTERSECTION':
return this.parseBooleanOperation();
case 'IF':
return this.parseIfStatement();
case 'FOR':
return this.parseForLoop();
default:
this.error(`Unexpected token: ${this.currentToken.type}`);
}
}
Building This Step by Step:
- Create the parseStatement() method in the Parser class
- Add a switch statement on currentToken.type
- Add a case for 'PARAM' that calls parseParam()
- Add a case for 'SHAPE' that calls parseShape()
- Add cases for the boolean operations ('UNION', 'DIFFERENCE', 'INTERSECTION') that call parseBooleanOperation()
- Add a case for 'IF' that calls parseIfStatement()
- Add a case for 'FOR' that calls parseForLoop()
- Add a default case that throws an error for unknown tokens
- This method routes each statement to the appropriate parser
Step 2.4: Implement Shape Parsing
What You're Building:
A method that parses shape definitions like shape circle c1 { radius: 50, x: 0 }. This method consumes tokens in a specific order, extracts the shape type, name, and parameters, and returns an AST node representing the shape.
Why This Method:
Shape definitions have a specific grammar: shape keyword, shape type, name, opening brace, properties (key: value pairs), closing brace. This method enforces that grammar by consuming tokens in the correct order.
How to Build It Step by Step:
Step 2.4.1: Consume Shape Keyword and Extract Type
Start by consuming the shape keyword and getting the shape type:
parseShape() {
// Step 2.4.1.1: Consume 'shape' keyword
// This ensures we're actually parsing a shape statement
this.eat('SHAPE');
// Step 2.4.1.2: Get shape type
// The next token should be an identifier (like 'circle', 'rectangle')
const shapeType = this.currentToken.value;
// Step 2.4.1.3: Consume the shape type identifier
this.eat('IDENTIFIER');
Why Get Value Before Eating:
We need to capture the token's value before consuming it. Once we call eat(), the token is consumed and we move to the next token. So we get the value first, then consume.
Step 2.4.2: Extract Shape Name
Get the shape name (the identifier after the shape type):
// Step 2.4.2.1: Get shape name
// The next token should be an identifier (like 'c1', 'r1')
const name = this.currentToken.value;
// Step 2.4.2.2: Consume the shape name identifier
this.eat('IDENTIFIER');
Why Two Identifiers:
The first identifier is the shape type (circle, rectangle). The second identifier is the shape name (c1, r1). Both are required by the grammar.
Step 2.4.3: Consume Opening Brace and Parse Properties
Start parsing the parameter list:
// Step 2.4.3.1: Consume opening brace
// The '{' marks the start of the parameter list
this.eat('LBRACE');
// Step 2.4.3.2: Initialize parameters object
// This will hold all the shape's properties
const params = {};
// Step 2.4.3.3: Loop through properties
// Continue until we hit the closing brace
while (this.currentToken.type !== 'RBRACE') {
// Step 2.4.3.4: Get property key (name)
const key = this.currentToken.value;
this.eat('IDENTIFIER'); // Consume property name
// Step 2.4.3.5: Consume colon
// Properties use key: value format
this.eat('COLON');
// Step 2.4.3.6: Parse property value
// Values can be expressions (numbers, identifiers, etc.)
const value = this.parseExpression();
// Step 2.4.3.7: Store property in params object
params[key] = value;
// Step 2.4.3.8: Optional comma
// Properties can be separated by commas (optional)
if (this.currentToken.type === 'COMMA') {
this.eat('COMMA');
}
}
Why While Loop:
Properties continue until the closing brace. We loop, parsing each property (key: value pair), until we hit RBRACE. The comma is optional - it's just for readability.
Step 2.4.4: Consume Closing Brace and Return AST Node
Finish parsing and return the shape AST node:
// Step 2.4.4.1: Consume closing brace
// This marks the end of the parameter list
this.eat('RBRACE');
// Step 2.4.4.2: Return shape AST node
// This node contains all the information about the shape
return {
type: 'shape', // Node type
shapeType: shapeType, // Shape type (circle, rectangle, etc.)
name: name, // Shape name (c1, r1, etc.)
params: params // Parameters object (radius: 50, etc.)
};
}
Why This AST Structure:
The AST node contains all the information needed to create the shape. The interpreter will use this node to create the actual shape object. The structure is clear and easy to work with.
The Complete Method:
parseShape() {
this.eat('SHAPE'); // Consume 'shape' keyword
const shapeType = this.currentToken.value;
this.eat('IDENTIFIER'); // Consume shape type
const name = this.currentToken.value;
this.eat('IDENTIFIER'); // Consume shape name
this.eat('LBRACE'); // Consume '{'
// Parse properties
const params = {};
while (this.currentToken.type !== 'RBRACE') {
const key = this.currentToken.value;
this.eat('IDENTIFIER'); // Property name
this.eat('COLON'); // Consume ':'
const value = this.parseExpression(); // Parse value
params[key] = value;
// Optional comma
if (this.currentToken.type === 'COMMA') {
this.eat('COMMA');
}
}
this.eat('RBRACE'); // Consume '}'
return {
type: 'shape',
shapeType: shapeType,
name: name,
params: params
};
}
Building This Step by Step:
- Create the parseShape() method in the Parser class
- Call eat('SHAPE') to consume the shape keyword
- Read the shape type from currentToken.value
- Call eat('IDENTIFIER') to consume the shape type
- Read the shape name from currentToken.value
- Call eat('IDENTIFIER') to consume the shape name
- Call eat('LBRACE') to consume the opening brace
- Initialize an empty params object
- Add a while loop that continues until RBRACE
- Read the property key from currentToken.value
- Call eat('IDENTIFIER') to consume the property name
- Call eat('COLON') to consume the colon
- Call parseExpression() to parse the property value
- Store the key-value pair in the params object
- Check for an optional comma and consume it if present
- After the loop, call eat('RBRACE') to consume the closing brace
- Return a shape AST node with type, shapeType, name, and params
- This method parses shape definitions
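The property loop can be tried in isolation. parsePropertyList() below is a hypothetical, simplified stand-in: it works on a plain array of tokens and accepts only NUMBER values in place of parseExpression(), but the brace, colon and optional-comma handling mirrors parseShape():

```javascript
// Hypothetical simplified version of the property loop in parseShape().
function parsePropertyList(tokens) {
  let i = 0;
  // Consumes one token of the expected type, like eat()
  const expect = (type) => {
    const token = tokens[i];
    if (!token || token.type !== type) {
      throw new Error(`Expected ${type} but got ${token ? token.type : 'end of input'}`);
    }
    i++;
    return token;
  };

  expect('LBRACE');
  const params = {};
  while (tokens[i] && tokens[i].type !== 'RBRACE') {
    const key = expect('IDENTIFIER').value;
    expect('COLON');
    params[key] = expect('NUMBER').value; // stand-in for parseExpression()
    if (tokens[i] && tokens[i].type === 'COMMA') i++; // optional separator
  }
  expect('RBRACE');
  return params;
}

const t = (type, value) => ({ type, value });
const withCommas = [t('LBRACE'), t('IDENTIFIER', 'x'), t('COLON'), t('NUMBER', 1), t('COMMA'),
  t('IDENTIFIER', 'y'), t('COLON'), t('NUMBER', 2), t('RBRACE')];
const withoutCommas = withCommas.filter((tok) => tok.type !== 'COMMA');
console.log(parsePropertyList(withCommas));    // { x: 1, y: 2 }
console.log(parsePropertyList(withoutCommas)); // { x: 1, y: 2 }
```

Because the comma is optional, { x: 1 y: 2 } is accepted too. A stricter grammar would require either a comma or the closing brace after each value.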
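Step 2.5: Implement Expression Parsing with Precedence
Operator precedence comes from splitting expression parsing into three methods, one per level: parseExpression() handles addition and subtraction, parseTerm() handles the multiplicative operators, and parseFactor() handles numbers, identifiers, strings, parenthesized expressions, and unary minus. Each method calls the next-higher level to read its operands, so in 2 + 3 * 4 the 3 * 4 is grouped first; the while loops build the tree from left to right, so 10 - 4 - 3 means (10 - 4) - 3.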
parseExpression() {
let node = this.parseTerm(); // Operands are terms (higher precedence)
// Handle + and - (lowest precedence)
while (this.currentToken.type === 'PLUS' || this.currentToken.type === 'MINUS') {
const operator = this.currentToken.type;
this.eat(operator);
node = {
type: 'binary_op',
// Store the symbol ('+' or '-') because applyBinaryOperator() switches on symbols
operator: operator === 'PLUS' ? '+' : '-',
left: node,
right: this.parseTerm()
};
}
return node;
}
parseTerm() {
let node = this.parseFactor(); // Operands are factors (highest precedence)
// Handle *, / and % (higher precedence than + and -), mapping token types to symbols
const symbols = { MULTIPLY: '*', DIVIDE: '/', MODULO: '%' };
while (Object.hasOwn(symbols, this.currentToken.type)) {
const operator = this.currentToken.type;
this.eat(operator);
node = {
type: 'binary_op',
operator: symbols[operator],
left: node,
right: this.parseFactor()
};
}
return node;
}
parseFactor() {
const token = this.currentToken;
// Numbers
if (token.type === 'NUMBER') {
this.eat('NUMBER');
return { type: 'number', value: token.value };
}
// Identifiers (parameters, shape references)
if (token.type === 'IDENTIFIER') {
this.eat('IDENTIFIER');
return { type: 'identifier', value: token.value };
}
// Strings
if (token.type === 'STRING') {
this.eat('STRING');
return { type: 'string', value: token.value };
}
// Parentheses
if (token.type === 'LPAREN') {
this.eat('LPAREN');
const expr = this.parseExpression();
this.eat('RPAREN');
return expr;
}
// Unary minus
if (token.type === 'MINUS') {
this.eat('MINUS');
return {
type: 'unary_op',
operator: 'minus',
operand: this.parseFactor()
};
}
this.error(`Unexpected token in expression: ${token.type}`);
}
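It can help to see the grouping this structure produces. The standalone sketch below (parseToString is a hypothetical name, and it tokenizes with a regular expression instead of the Lexer) implements the same three levels but builds a fully parenthesized string instead of AST nodes:

```javascript
// Hypothetical standalone version of the three precedence levels.
function parseToString(source) {
  // Numbers, operators and parentheses; any other non-space character becomes its own token
  const tokens = source.match(/\d+|[-+*\/()]|\S/g) ?? [];
  let i = 0;
  const peek = () => tokens[i];
  const next = () => tokens[i++];

  function expression() { // lowest precedence: + and -
    let left = term();
    while (peek() === '+' || peek() === '-') {
      const op = next();
      left = `(${left} ${op} ${term()})`;
    }
    return left;
  }
  function term() { // higher precedence: * and /
    let left = factor();
    while (peek() === '*' || peek() === '/') {
      const op = next();
      left = `(${left} ${op} ${factor()})`;
    }
    return left;
  }
  function factor() { // numbers, parentheses, unary minus
    const token = next();
    if (token === '(') {
      const inner = expression();
      if (next() !== ')') throw new Error('Expected )');
      return inner;
    }
    if (token === '-') return `(-${factor()})`;
    if (/^\d+$/.test(token ?? '')) return token;
    throw new Error(`Unexpected token in expression: ${token}`);
  }

  const result = expression();
  if (i < tokens.length) throw new Error(`Unexpected token: ${tokens[i]}`);
  return result;
}

console.log(parseToString('2 + 3 * 4'));   // (2 + (3 * 4))
console.log(parseToString('10 - 4 - 3'));  // ((10 - 4) - 3)
console.log(parseToString('(2 + 3) * 4')); // ((2 + 3) * 4)
```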
Test the parser:
const code = 'shape circle c1 { radius: 50 + 10 }';
const lexer = new Lexer(code);
const parser = new Parser(lexer);
const ast = parser.parse();
console.log(JSON.stringify(ast, null, 2));
// Should output AST with shape node containing binary_op expression
Part 3: Building the Interpreter
Step 3.1: Create the Environment
File: src/environment.mjs
export class Environment {
constructor() {
this.parameters = new Map(); // Parameter name → value
this.shapes = new Map(); // Shape name → shape object
this.layers = new Map(); // Layer name → layer object
this.functions = new Map(); // Function name → function definition
}
setParameter(name, value) {
this.parameters.set(name, value);
}
getParameter(name) {
if (!this.parameters.has(name)) {
throw new Error(`Parameter not found: ${name}`);
}
return this.parameters.get(name);
}
createShapeWithName(type, name, params) {
const shape = {
type: type,
shapeType: type,
params: params,
transform: {
position: params.position || [params.x || 0, params.y || 0],
rotation: params.rotation || 0,
scale: [1, 1]
}
};
this.shapes.set(name, shape);
return shape;
}
}
Step 3.2: Create the Basic Interpreter Structure
File: src/interpreter.mjs
import { Environment } from './environment.mjs';
export class Interpreter {
constructor() {
this.env = new Environment();
this.constraints = [];
this.currentLoopCounter = undefined;
}
interpret(ast) {
for (const node of ast) {
this.evaluateNode(node);
}
return {
parameters: this.env.parameters,
shapes: this.env.shapes,
layers: this.env.layers,
functions: this.env.functions,
constraints: this.constraints
};
}
evaluateNode(node) {
switch (node.type) {
case 'param':
return this.evaluateParam(node);
case 'shape':
return this.evaluateShape(node);
case 'if_statement':
return this.evaluateIfStatement(node);
case 'for_loop':
return this.evaluateForLoop(node);
default:
throw new Error(`Unknown node type: ${node.type}`);
}
}
}
Step 3.3: Implement Parameter Evaluation
evaluateParam(node) {
const value = this.evaluateExpression(node.value);
this.env.setParameter(node.name, value);
return value;
}
Step 3.4: Implement Expression Evaluation
evaluateExpression(node) {
switch (node.type) {
case 'number':
return node.value;
case 'string':
return node.value;
case 'identifier':
// Check if it's a parameter
if (this.env.parameters.has(node.value)) {
return this.env.getParameter(node.value);
}
// Check if it's a shape reference
if (this.env.shapes.has(node.value)) {
return node.value; // Return name as string for boolean ops
}
throw new Error(`Undefined identifier: ${node.value}`);
case 'binary_op': {
// Braces give these const declarations their own block scope
const left = this.evaluateExpression(node.left);
const right = this.evaluateExpression(node.right);
return this.applyBinaryOperator(node.operator, left, right);
}
case 'unary_op': {
const operand = this.evaluateExpression(node.operand);
if (node.operator === 'minus') {
return -operand;
}
throw new Error(`Unknown unary operator: ${node.operator}`);
}
default:
throw new Error(`Unknown expression type: ${node.type}`);
}
}
applyBinaryOperator(op, left, right) {
switch (op) {
case '+': return left + right;
case '-': return left - right;
case '*': return left * right;
case '/':
if (right === 0) throw new Error('Division by zero');
return left / right;
case '%': return left % right;
case '==': return left === right;
case '!=': return left !== right;
case '<': return left < right;
case '<=': return left <= right;
case '>': return left > right;
case '>=': return left >= right;
case 'and': return left && right;
case 'or': return left || right;
default:
throw new Error(`Unknown operator: ${op}`);
}
}
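The evaluation is a post-order tree walk: both children of a binary_op node are evaluated before the operator is applied. The standalone sketch below (evaluate and OPERATORS are hypothetical names) uses a plain Map for parameters and, like applyBinaryOperator(), expects binary_op nodes to carry operator symbols such as '+' and '*':

```javascript
// Hypothetical standalone evaluator for the expression node types above.
const OPERATORS = new Map([
  ['+', (a, b) => a + b],
  ['-', (a, b) => a - b],
  ['*', (a, b) => a * b],
  ['/', (a, b) => {
    if (b === 0) throw new Error('Division by zero');
    return a / b;
  }],
  ['%', (a, b) => a % b],
]);

function evaluate(node, parameters) {
  switch (node.type) {
    case 'number':
    case 'string':
      return node.value;
    case 'identifier':
      if (!parameters.has(node.value)) throw new Error(`Undefined identifier: ${node.value}`);
      return parameters.get(node.value);
    case 'binary_op': {
      const apply = OPERATORS.get(node.operator);
      if (!apply) throw new Error(`Unknown operator: ${node.operator}`);
      // Post-order: evaluate both children first, then combine them
      return apply(evaluate(node.left, parameters), evaluate(node.right, parameters));
    }
    case 'unary_op':
      if (node.operator === 'minus') return -evaluate(node.operand, parameters);
      throw new Error(`Unknown unary operator: ${node.operator}`);
    default:
      throw new Error(`Unknown expression type: ${node.type}`);
  }
}

// size * 2 + 1, grouped as (size * 2) + 1
const ast = {
  type: 'binary_op', operator: '+',
  left: {
    type: 'binary_op', operator: '*',
    left: { type: 'identifier', value: 'size' },
    right: { type: 'number', value: 2 },
  },
  right: { type: 'number', value: 1 },
};
console.log(evaluate(ast, new Map([['size', 100]]))); // 201
```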
Step 3.5: Implement Shape Evaluation
evaluateShape(node) {
// Generate unique name (handle loops)
let shapeName = node.name;
if (this.currentLoopCounter !== undefined) {
shapeName = `${shapeName}_${this.currentLoopCounter}`;
}
// Evaluate all parameter expressions
const params = {};
for (const [key, expr] of Object.entries(node.params)) {
const evaluatedValue = this.evaluateExpression(expr);
params[key] = evaluatedValue;
}
// Create the shape
const shape = this.env.createShapeWithName(node.shapeType, shapeName, params);
return shape;
}
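The loop-counter suffix is what keeps repeated shapes apart. The sketch below uses a hypothetical helper, uniqueShapeName(), and simulates three iterations of a loop body that defines a shape named c:

```javascript
// Hypothetical helper mirroring the naming rule in evaluateShape():
// with a loop counter set, the name gets a _<counter> suffix.
function uniqueShapeName(baseName, loopCounter) {
  return loopCounter === undefined ? baseName : `${baseName}_${loopCounter}`;
}

const shapes = new Map();
// Simulates a loop whose body defines "shape circle c { ... }" on each iteration
for (let i = 1; i <= 3; i++) {
  shapes.set(uniqueShapeName('c', i), { shapeType: 'circle', params: { x: i * 10 } });
}
// Outside any loop the name is used unchanged
shapes.set(uniqueShapeName('solo', undefined), { shapeType: 'circle', params: {} });
console.log([...shapes.keys()]); // [ 'c_1', 'c_2', 'c_3', 'solo' ]
```

Without the suffix, each iteration would call set() with the same key and only the last shape would remain. Note that currentLoopCounter holds a single value, so if loops can be nested and evaluateForLoop() sets it to the innermost counter, shapes created in different outer iterations with the same inner counter would still share a name.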
Test the complete system:
const code = 'param size 100\nshape circle c1 { radius: size }';
const lexer = new Lexer(code);
const parser = new Parser(lexer);
const ast = parser.parse();
const interpreter = new Interpreter();
const result = interpreter.interpret(ast);
console.log('Parameters:', Array.from(result.parameters.entries()));
console.log('Shapes:', Array.from(result.shapes.entries()));
// Should show parameter 'size' = 100 and shape 'c1' with radius 100
Common Issues and Fixes
Issue: Lexer stops early
- Check that advance() is called after reading each character
- Check loop conditions (loops should continue until currentChar is null)
Issue: Parser doesn't handle precedence
- Verify that parseExpression() calls parseTerm(), which calls parseFactor()
- Check that each operator is handled at the correct precedence level
Issue: Interpreter can't find parameters
- Check that parameters are stored before they're used
- Check the parameter lookup in evaluateExpression()
- Verify that parameter names match exactly
Issue: Shapes not created
- Check that evaluateShape() is being called
- Check that shapes are stored in env.shapes
- Verify that shape names are unique (env.shapes is a Map, so a repeated name silently replaces the earlier shape)