CSE 40243 Fall 2025

Compilers and Language Design Course at the University of Notre Dame

B-Minor 2025 Language Overview

Note: The B-Minor language used in this class changes each year in order to provide new challenges and opportunities. This document differs from the one in the textbook in the following ways:

Overview

This is an informal specification of B-Minor 2025, a C-like language for use in an undergraduate compilers class. B-minor includes expressions, basic control flow, recursive functions, and strict type checking. It is object-code compatible with ordinary C and thus can take advantage of the standard C library, within its defined types. It is similar enough to C to feel familiar, but different enough to give you some sense of alternatives.

This document is deliberately informal: the most precise specification of a language is the compiler itself, and it is your job to write the compiler! It is your job to read the document carefully and extract a formal specification. You will certainly find elements that are unclear or incompletely specified, and you are encouraged to raise questions in class.

Whitespace and Comments

In B-minor, whitespace is any combination of the following characters: tabs, spaces, linefeed (\n), and carriage return (\r). The placement of whitespace is not significant in B-minor. Both C-style and C++-style comments are valid in B-minor:

/* A C-style comment */
a=5; // A C++ style comment

Identifiers and Keywords

Identifiers (i.e. variable and function names) may contain letters, numbers, and underscores. Identifiers must begin with a letter or an underscore and may be up to 255 characters long. These are examples of valid identifiers:

i x mystr fog123 BigLongName55

The following strings are B-minor keywords (or reserved for future expansion) and may not be used as identifiers:

array auto boolean carray char else false float double for function if integer print return string true void while

Types

B-minor has five atomic types: integers, doubles, booleans, characters, and strings. A variable is declared as a name followed by a colon, then a type and an optional initializer. For example:

x: integer;
y: integer = 123;
f: double = 45.67;
b: boolean = false;
c: char    = 'q';
s: string  = "hello bminor\n";

Both char and string literals may contain any ASCII value between 32 and 127. Values outside that range can be represented by the following backslash codes:

Code Value Meaning
\a 7 Bell
\b 8 Backspace
\e 27 Escape
\f 12 Form Feed (clear)
\n 10 Newline
\r 13 Carriage Return
\t 9 Tab
\v 11 Vertical Tab
\\ 92 Backslash
\' 39 Single Quote
\" 34 Double Quote
\0xHH HH The byte value given by two characters HH in hexadecimal.

Any other character following a backslash represents exactly that character.

Arrays

B-minor supports local and global arrays of a fixed size. They may be declared with no value, which causes them to contain all zeros:

a: array [5] integer;

Or, the entire array may be given specific values:

a: array [5] integer = {1,2,3,4,5};

The length of an array is given by the # operator:

l: integer = # a;

Array accesses are bounds-checked at runtime. If a program attempts to access an illegal array element, the program will be aborted with a suitable error message, thus avoiding undefined behavior. For example, the following program will attempt to run off the end of the array, and be aborted at runtime:

for( i=0; i<=#a; i++ ) {
        print a[i];
}

To allow for compatibility with C and other languages that do not perform bounds checking, the type carray provides simple arrays that do not support the # operator, do not perform bounds checking, and are therefore unsafe. For example, this example runs off the end of the array and will have undefined behavior:

c: carray [5] integer = {1,2,3,4,5};
for( i=0; i<=5; i++ ) {
     print c[i];
}

Expressions

B-minor has many of the arithmetic operators found in C, along with a few additions, following a similar order of precedence:

Symbol Meaning
() [] f() grouping, array subscript, function call
++ -- postfix increment, decrement
# unary array length
- ! unary negation, logical not
^ exponentiation
* / % multiplication, division, remainder
+ - addition, subtraction
< <= > >= == != comparison
&& logical and
|| logical or
= assignment

B-minor is strictly typed. This means that you may only assign a value to a variable (or function parameter) if the types match exactly. You cannot perform many of the fast-and-loose conversions found in C.

Following are examples of some (but not all) type errors:

x: integer = 65;
y: char = 'A';
if(x>y) ... // error: x and y are of different types!

f: integer = 0;
if(f) ...      // error: f is not a boolean!

writechar: function void ( c: char );
a: integer = 65;
writechar(a);  // error: a is not a char!

b: array [2] boolean = {true,false};
x: integer = 0;
x = b[0];      // error: x is not a boolean!

x: integer = 0;
y: double = x;  // error: integer types can not be implicitly converted to double

Following are some (but not all) examples of correct type assignments:

b: boolean;
x: integer = 3;
y: integer = 5;
b = x<y;     // ok: the expression x<y is boolean

f: integer = 0;
if(f==0) ...    // ok: f==0 is a boolean expression

c: char = 'a';
if(c=='a') ...  // ok: c and 'a' are both chars

The auto type is used to declare a variable whose (fixed) type is to be inferred at compile-time:

s: auto = "hello"; // s will have type string

t: auto = x < y;   // t will have type boolean

f: function auto ( x: integer, y: integer ) =
{
	return x + y;  // f will have return type integer
}

Declarations and Statements

In B-minor, you may declare global variables with optional constant initializers, function prototypes, and function definitions. Within functions, you may declare local variables (including arrays) with optional initialization expressions. Scoping rules are identical to C. Function definitions may not be nested.

Within functions, basic statements may be arithmetic expressions, return statements, print statements, if and if-else statements, for loops, or code within inner { } groups:

// An arithmetic expression statement.
y = m*x + b;

// An if-else statement.
if( temp>100 ) {
    print "It's really hot!\n";
} else if( temp>70 ) {
    print "It's pretty warm.\n";
} else {
    print "It's not too bad.\n";
}

// A for loop statement.
for( i=0; i<100; i++ ) {
    print i;
}

// A return statement.
return (f-32)*5/9;

B-minor does not have switch statements, while-loops, or do-while loops. (But you could consider adding them as a little extra project.)

The print statement is a little unusual because it is a statement and not a function call like printf is in C. print takes a list of expressions separated by commas, and prints each out to the console, like this:

print "The temperature is: ", temp, " degrees\n";

Note that each expression in the list may have a distinct type. print will determine the proper type and print accordingly.

Functions

Functions are declared in the same way as variables, except giving a type of function followed by the return type, arguments, and code:

square: function integer ( x: integer ) = {
	return x^2;
}

The return type must be one of the four atomic types, auto to indicate inferred type, or void to indicate no return value. Function arguments may be of any type except auto or void. integer, boolean, and char arguments are passed by value, while string and array arguments are passed by reference. When used as a function parameter, the length of an array is omitted, and can be obtained at runtime with the # operator.

printarray: function void ( a: array [] integer ) = {
	i: integer;
	for( i=0;i<#a;i++) {
		print a[i], "\n";
	}
}

A function prototype may be given, which states the existence and type of the function, but includes no code. This must be done if the user wishes to call an external function linked by another library. For example, to invoke the C function puts:

puts: function void ( s: string );

main: function integer () = {
	puts("hello world");
}

A complete program must have a main function that returns an integer. The arguments to main have the same meaning as argc and argv in the C language: argc indicates the number of arguments, and argv is an (unsafe) array of strings:

main: function integer ( argc: integer, argv: carray [] string ) = {
        puts("hello world");
}

Standards Committee Clarifications

Working out all the corner cases of a language from a specification is a surprising amount of work. Even a good informal statement of a language can leave a number of details uncertain. At various points in the semester, we will hold “standards committee” meetings to discuss and vote on any ambiguous aspects of the specification.

Scanning (1.X)

A: Yes, two double quotes represents an empty string consisting only of the null terminator.

A: No, a newline in a string needs to be escaped, like this: "hello\nworld"

A: No, they are not part of B-minor.

A: Yes, it can have a leading sign. However, we recommend that you treat this as two separate tokens like TOKEN_MINUS TOKEN_INTEGER so as to avoid some parsing problems later on.

A: Decided: Hexadecimal numbers may use either upper or lower case. (Sep 22, 2025)

A: Decided: Hexadecimal numbers must be represented as 0x1234abcd. The x must be lower case, while the digits may be either case. Likewise, binary numbers must be represented as 0b010101, and the b must be lower case. (These were changed to be more consistent with escape codes.) (Sep 22, 2025)

A: Decided: A quotation mark within a string must be escaped, like "\"" (Sep 22, 2025)

Parsing (2.X)

A: Yes, it means to print out nothing.

A: Yes, it indicates a return with no value in a void function.

for(i=0;i<10,j<10;i++) { ... }

A: No, commas may only be used in print statements, function calls, function prototypes, and array expressions.

A: Yes, the following are valid statements, just as in C and C++:

for(i=0;i<10;i++) print i;
if(a) x=y; else z=w;

A: No.

A: No - An array must be declared with a positive length.

A: No - An initializer must either match the length of the array, or be omitted. It cannot be empty. (It also avoids the case of an empty initializer {} begin confused with an empty statement block {}.

A: Yes - An empty file should parse, typecheck, and compile to an empty assembly program. However, it won’t be a complete linkable program without a main function.

Name Resolution and Type Checking

A: No - The following statements are invalid:

if(x<10) a: integer = 5;
else     b: integer = 10;

for(i=0;i<10;i++) c: integer = 30;

You may detect this problem at any stage of the compiler where it is most convenient: parsing, name resolution, or typechecking.