Compilers and Language Design Course at the University of Notre Dame
Note: The B-Minor language used in this class changes each year in order to provide new challenges and opportunities. This document differs from the one in the textbook in the following ways:
# operator.auto indicates a variable type to be inferred from context.This is an informal specification of B-Minor 2025, a C-like language for use in an undergraduate compilers class. B-minor includes expressions, basic control flow, recursive functions, and strict type checking. It is object-code compatible with ordinary C and thus can take advantage of the standard C library, within its defined types. It is similar enough to C to feel familiar, but different enough to give you some sense of alternatives.
This document is deliberately informal: the most precise specification of a language is the compiler itself, and it is your job to write the compiler! It is your job to read the document carefully and extract a formal specification. You will certainly find elements that are unclear or incompletely specified, and you are encouraged to raise questions in class.
In B-minor, whitespace is any combination of the following characters: tabs, spaces, linefeed (\n), and carriage return (\r). The placement of whitespace is not significant in B-minor. Both C-style and C++-style comments are valid in B-minor:
/* A C-style comment */
a=5; // A C++ style comment
Identifiers (i.e. variable and function names) may contain letters, numbers, and underscores. Identifiers must begin with a letter or an underscore and may be up to 255 characters long. These are examples of valid identifiers:
i x mystr fog123 BigLongName55
The following strings are B-minor keywords (or reserved for future expansion) and may not be used as identifiers:
array auto boolean carray char else false float double for function if integer print return string true void while
B-minor has five atomic types: integers, doubles, booleans, characters, and strings. A variable is declared as a name followed by a colon, then a type and an optional initializer. For example:
x: integer;
y: integer = 123;
f: double = 45.67;
b: boolean = false;
c: char = 'q';
s: string = "hello bminor\n";
integer is a signed 64-bit value. An integer literal can be represented as a decimal (123), a hexidecimal value with an `H` suffix (`104A3FH`), or a binary value with a `B` suffix (`0101110101B`), a hexadecimal value like 0x104A3F, or a binary value like 0b0101110101. All three forms may omit leading zeroes as needed.
double is a floating point number that follows the IEEE 754 double-precision standard. double values can be represented in various ways. One method is to place the integer component, a period, and the fractional component all in a row, such as in 12.34. Another valid method is to use the scientific representation of the value, which involves having a floating point value as described above followed by an exponent (represented as either e or E). The exponent value may be preceded by an optional plus or minus sign. Some examples of this would be 5.67E1 which has the value of 56.7, or 89e-2 which has the value of .89.
boolean can take the literal values true or false.
char is a single 8-bit ASCII character enclosed in single quotes.
string is a double-quoted constant string of ASCII characters that is immutable and (internally) null-terminated. Strings may contain up to 255 characters. (Backslash codes noted below each indicate one character.)
Both char and string literals may contain any ASCII value between 32 and 127. Values outside that range can be represented by the following backslash codes:
| Code | Value | Meaning |
|---|---|---|
\a |
7 | Bell |
\b |
8 | Backspace |
\e |
27 | Escape |
\f |
12 | Form Feed (clear) |
\n |
10 | Newline |
\r |
13 | Carriage Return |
\t |
9 | Tab |
\v |
11 | Vertical Tab |
\\ |
92 | Backslash |
\' |
39 | Single Quote |
\" |
34 | Double Quote |
\0xHH |
HH | The byte value given by two characters HH in hexadecimal. |
Any other character following a backslash represents exactly that character.
B-minor supports local and global arrays of a fixed size. They may be declared with no value, which causes them to contain all zeros:
a: array [5] integer;
Or, the entire array may be given specific values:
a: array [5] integer = {1,2,3,4,5};
The length of an array is given by the # operator:
l: integer = # a;
Array accesses are bounds-checked at runtime. If a program attempts to access an illegal array element, the program will be aborted with a suitable error message, thus avoiding undefined behavior. For example, the following program will attempt to run off the end of the array, and be aborted at runtime:
for( i=0; i<=#a; i++ ) {
print a[i];
}
To allow for compatibility with C and other languages that do not perform bounds checking,
the type carray provides simple arrays that do not support the # operator,
do not perform bounds checking, and are therefore unsafe. For example, this example
runs off the end of the array and will have undefined behavior:
c: carray [5] integer = {1,2,3,4,5};
for( i=0; i<=5; i++ ) {
print c[i];
}
B-minor has many of the arithmetic operators found in C, along with a few additions, following a similar order of precedence:
| Symbol | Meaning |
|---|---|
() [] f() |
grouping, array subscript, function call |
++ -- |
postfix increment, decrement |
# |
unary array length |
- ! |
unary negation, logical not |
^ |
exponentiation |
* / % |
multiplication, division, remainder |
+ - |
addition, subtraction |
< <= > >= == != |
comparison |
&& |
logical and |
|| |
logical or |
= |
assignment |
B-minor is strictly typed. This means that you may only assign a value to a variable (or function parameter) if the types match exactly. You cannot perform many of the fast-and-loose conversions found in C.
Following are examples of some (but not all) type errors:
x: integer = 65;
y: char = 'A';
if(x>y) ... // error: x and y are of different types!
f: integer = 0;
if(f) ... // error: f is not a boolean!
writechar: function void ( c: char );
a: integer = 65;
writechar(a); // error: a is not a char!
b: array [2] boolean = {true,false};
x: integer = 0;
x = b[0]; // error: x is not a boolean!
x: integer = 0;
y: double = x; // error: integer types can not be implicitly converted to double
Following are some (but not all) examples of correct type assignments:
b: boolean;
x: integer = 3;
y: integer = 5;
b = x<y; // ok: the expression x<y is boolean
f: integer = 0;
if(f==0) ... // ok: f==0 is a boolean expression
c: char = 'a';
if(c=='a') ... // ok: c and 'a' are both chars
The auto type is used to declare a variable whose (fixed) type is to be inferred at compile-time:
s: auto = "hello"; // s will have type string
t: auto = x < y; // t will have type boolean
f: function auto ( x: integer, y: integer ) =
{
return x + y; // f will have return type integer
}
In B-minor, you may declare global variables with optional constant initializers, function prototypes, and function definitions. Within functions, you may declare local variables (including arrays) with optional initialization expressions. Scoping rules are identical to C. Function definitions may not be nested.
Within functions, basic statements may be arithmetic expressions, return statements, print statements, if and if-else statements, for loops, or code within inner { } groups:
// An arithmetic expression statement.
y = m*x + b;
// An if-else statement.
if( temp>100 ) {
print "It's really hot!\n";
} else if( temp>70 ) {
print "It's pretty warm.\n";
} else {
print "It's not too bad.\n";
}
// A for loop statement.
for( i=0; i<100; i++ ) {
print i;
}
// A return statement.
return (f-32)*5/9;
B-minor does not have switch statements, while-loops, or do-while loops. (But you could consider adding them as a little extra project.)
The print statement is a little unusual
because it is a statement and not a function call like printf is in C.
print takes a list of expressions separated by commas,
and prints each out to the console, like this:
print "The temperature is: ", temp, " degrees\n";
Note that each expression in the list may have a distinct type.
print will determine the proper type and print accordingly.
Functions are declared in the same way as variables,
except giving a type of function followed by
the return type, arguments, and code:
square: function integer ( x: integer ) = {
return x^2;
}
The return type must be one of the four atomic types,
auto to indicate inferred type, or void to indicate no return value.
Function arguments may be of any type except auto or void.
integer, boolean, and char arguments are passed by value, while
string and array arguments are passed by reference.
When used as a function parameter, the length of an array is omitted,
and can be obtained at runtime with the # operator.
printarray: function void ( a: array [] integer ) = {
i: integer;
for( i=0;i<#a;i++) {
print a[i], "\n";
}
}
A function prototype may be given, which states the existence
and type of the function, but includes no code. This must
be done if the user wishes to call an external function linked
by another library. For example, to invoke the C function puts:
puts: function void ( s: string );
main: function integer () = {
puts("hello world");
}
A complete program must have a main function that returns an integer.
The arguments to main have the same meaning as argc and argv
in the C language: argc indicates the number of arguments, and
argv is an (unsafe) array of strings:
main: function integer ( argc: integer, argv: carray [] string ) = {
puts("hello world");
}
Working out all the corner cases of a language from a specification is a surprising amount of work. Even a good informal statement of a language can leave a number of details uncertain. At various points in the semester, we will hold “standards committee” meetings to discuss and vote on any ambiguous aspects of the specification.
"" a valid string literal?A: Yes, two double quotes represents an empty string consisting only of the null terminator.
"hello
world"
A: No, a newline in a string needs to be escaped, like this: "hello\nworld"
A: No, they are not part of B-minor.
A: Yes, it can have a leading sign. However, we recommend that you
treat this as two separate tokens like TOKEN_MINUS TOKEN_INTEGER
so as to avoid some parsing problems later on.
af35h or F6AC92H or 1234deB5H?A: Decided: Hexadecimal numbers may use either upper or lower case. (Sep 22, 2025)
A123H be treated as an identifier or a hexadecimal constant?A: Decided: Hexadecimal numbers must be represented as 0x1234abcd. The x must be lower case, while the digits may be either case. Likewise, binary numbers must be represented as 0b010101, and the b must be lower case. (These were changed to be more consistent with escape codes.) (Sep 22, 2025)
"""?A: Decided: A quotation mark within a string must be escaped, like "\"" (Sep 22, 2025)
print; a valid statement?A: Yes, it means to print out nothing.
return; a valid statement?A: Yes, it indicates a return with no value in a void function.
for(i=0;i<10,j<10;i++) { ... }
A: No, commas may only be used in print statements,
function calls, function prototypes, and array expressions.
A: Yes, the following are valid statements, just as in C and C++:
for(i=0;i<10;i++) print i;
if(a) x=y; else z=w;
A: No.
A: No - An array must be declared with a positive length.
A: No - An initializer must either match the length of the array, or be omitted. It cannot be empty.
(It also avoids the case of an empty initializer {} begin confused with an
empty statement block {}.
A: Yes - An empty file should parse, typecheck, and compile to an empty assembly program. However, it won’t be a complete linkable program without a main function.
if or for statement?A: No - The following statements are invalid:
if(x<10) a: integer = 5;
else b: integer = 10;
for(i=0;i<10;i++) c: integer = 30;
You may detect this problem at any stage of the compiler where it is most convenient: parsing, name resolution, or typechecking.