Project Goals
The goals of this project are:
- to implement the grammar for our "simple" programming language
- to get familiar with front-end generators such as flex and bison
Administrative Information
This is an individual project.
The project is due on Thursday, November 2, 2023, 23:59:59 PST.
Project Introduction
We want you to build the first part of your compiler: the
scanner and parser. To do this, we will be using flex and
bison, common tools for generating scanners and LR parsers. For the rest of
this class, we will be focusing on a new language, which we call
CSimple. You will be building a compiler for this new
programming language. The language manual can be
found here.
This manual is going to be used for the rest of this quarter
(and might be updated frequently). Always use this as the first
and last authority on what your grammar should be.
Tour of the Code
Again, we provide code that you should use as a starting point
for your project. You can find the
files here.
- Makefile - Your make file. You don't need to edit this file.
- main.cpp - The main C++ file. You don't need to edit this file.
- parser.ypp - The bison file that contains your
grammar rules/productions. At this point, it contains the
grammar from Project 1. You will need to edit it and take
that out. Replace that grammar with your own grammar for
the language that we have defined for you in the manual.
- lexer.l - The flex file that contains the regular
expressions for recognizing your tokens. It now contains the
tokens from Project 1. Once again, you will have to edit this
file and replace these expressions with your own.
- test.good.calc - A test file just so you can compile
and run this project as it is (this is not a valid CSimple
program; add your own test cases as you go along).
Steps to Solve the Challenge
- READ the manual for the language. Understand this language
and its specifications.
- Go over the small example we have included. Make sure you
understand how flex and bison work together in the example. To
familiarize yourself with these tools, you should read up on
them. Here is a decent tutorial
to get started. Also, use Google and read the man
pages and the official documentation.
- From the language specification, create a grammar which
accepts all valid programs for our language. This is the
crucial part. You must get the grammar correct here in this
first part of the core compiler project. You will be building
on top of this project, and your grammar must be correct. Test
this thoroughly.
- Implement the scanner (in flex) and make sure you account for
all of the lexical patterns. Ensure that your scanner gives an
error for dangling comments (comments not terminated before
the EOF is reached), and make sure that it handles characters and strings correctly.
- Implement your grammar (in bison). Save time for this part, since
you will likely have to correct errors iteratively.
- TEST: You can find some test
files here. You
will want to test your parser using these good and bad files
thoroughly. You must also create your own test files. Make
them as complete and complex as possible.
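The dangling-comment requirement from the scanner step above can be sketched in plain C++ to show the logic your flex rules need to reproduce: remember where a comment opened, and report an error if the end of input arrives before the closing delimiter. This is only an illustration, not the required implementation. The helper `check_comments` and the struct `CommentCheck` are hypothetical names, and the delimiters are passed in as parameters because the real ones come from the language manual. The sketch also ignores comment openers that appear inside string literals, which a real scanner has to handle.

```cpp
#include <cstddef>
#include <string>

// Outcome of the check: ok is false if a comment is still open at end of input.
struct CommentCheck {
    bool ok;         // true if every comment was terminated
    int  open_line;  // line where the unterminated comment began (0 if ok)
};

// Hypothetical helper. `open` and `close` are placeholder delimiters;
// take the real ones from the language manual.
CommentCheck check_comments(const std::string& src,
                            const std::string& open,
                            const std::string& close) {
    if (open.empty() || close.empty()) return {true, 0};  // nothing to check

    int line = 1;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src.compare(i, open.size(), open) == 0) {
            // Entered a comment: remember where it started.
            const int start_line = line;
            i += open.size();
            bool closed = false;
            while (i < src.size()) {
                if (src.compare(i, close.size(), close) == 0) {
                    i += close.size();
                    closed = true;
                    break;
                }
                if (src[i] == '\n') ++line;
                ++i;
            }
            // End of input reached inside the comment: dangling comment.
            if (!closed) return {false, start_line};
        } else {
            if (src[i] == '\n') ++line;
            ++i;
        }
    }
    return {true, 0};
}
```

In flex itself, this "inside a comment" state is commonly modeled with an exclusive start condition (`%x`) plus an `<<EOF>>` rule for that condition that reports the error.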
To test your Lexer, put printf statements before each
return. This will show you the last token your scanner
recognized before it stopped working.
To test your Parser, put printf statements in the action of each rule.
This will make it easier for you to trace what your parser is doing.
To run your program use:
./csimple < test.lang
where test.lang is a test file.
If you run bison with the -v flag, it will write a report file named after your grammar
file (parser.output for parser.ypp, or y.output when bison runs in yacc-compatible mode
with -y). It contains a readable description of the parsing tables (more specifically,
a description of the parser states and the items they contain). In addition, it will
report where the conflicts or problems in the grammar appear.
Make sure you get the Lexer working perfectly first! Flex allows you to execute C
code when it matches a rule (AFTER it matches the rule). Simply print to stdout
like you did for the previous project. You should get a stream of tokens.
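As a rough illustration of the token tracing described above, here is a small C++ helper you could call from a flex action right before the `return`. Both function names (`format_token`, `trace_token`) and the output format are assumptions made for this sketch; they are not part of the provided starter code.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds one trace line for a matched token.
std::string format_token(const std::string& name,
                         const std::string& text,
                         int line) {
    return "line " + std::to_string(line) + ": " + name + " '" + text + "'";
}

// Prints the trace line to stdout. Intended to be called from a flex
// action just before returning the token to the parser.
void trace_token(const char* name, const char* text, int line) {
    std::printf("%s\n", format_token(name, text, line).c_str());
}
```

Inside a flex action, the matched text is available as `yytext`, and the current line is available as `yylineno` if `%option yylineno` is enabled. A call might look like `trace_token("T_ID", yytext, yylineno);` followed by the `return`, where `T_ID` is a made-up token name.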
What Your Parser Has to Do!
- Your parser should be able to parse any valid input file from our language.
- You will need to catch ALL syntax errors.
- You will need to catch ALL program structure errors. By this I mean that
your parser has to know that the keyword "procedure" ALWAYS precedes a
procedure_id in a procedure declaration.
- You will NOT have to check that procedures and variables have been declared
before you use them.
- You will NOT have to check that there is one and only one Main(). Remember
that Main() is just a special procedure. At this point we don't care that
it is special.
- You will NOT have to check whether procedure_ids and variable_ids are declared multiple
times. So you could declare variable A multiple times and it would be okay
at this point.
- You will NOT have to check the return types of procedures.
- In a nutshell, your parser checks only the syntactic structure of the program. It does not
have global knowledge of variables or procedures ... yet.
- When your parser encounters an error, please make sure that it
calls yyerror(). It should already do that, so don't make any
changes there. The reason is that the auto-grader is looking
for the exit code 1 and the proper error message (with line
number) printed to stderr. If you make changes, we might not
correctly process your submission.
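To make the split between syntax errors and the checks you do NOT have to perform more concrete, here is a small hand-written C++ sketch. It uses a toy declaration syntax (`var NAME ;`) that is invented for this example and is NOT CSimple. In the project, bison generates the equivalent logic from your grammar rules. The point is only that a purely syntactic check rejects a misplaced keyword but happily accepts a duplicate declaration.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// True if `tok` is a letter followed by letters or digits, and not the keyword.
bool is_identifier(const std::string& tok) {
    if (tok.empty() || tok == "var") return false;
    if (!std::isalpha(static_cast<unsigned char>(tok[0]))) return false;
    for (char c : tok) {
        if (!std::isalnum(static_cast<unsigned char>(c))) return false;
    }
    return true;
}

// Toy grammar (hypothetical, not CSimple):
//   decls -> ( "var" IDENT ";" )*
// Tokens are separated by whitespace to keep the sketch short.
bool accepts_declarations(const std::string& src) {
    std::istringstream in(src);
    std::vector<std::string> tokens;
    for (std::string t; in >> t; ) tokens.push_back(t);

    // Every declaration is exactly three tokens long.
    if (tokens.size() % 3 != 0) return false;
    for (std::size_t i = 0; i < tokens.size(); i += 3) {
        if (tokens[i] != "var") return false;             // keyword must come first
        if (!is_identifier(tokens[i + 1])) return false;  // then the name
        if (tokens[i + 2] != ";") return false;           // then the terminator
    }
    return true;
}
```

With this sketch, `accepts_declarations("var A ; var A ;")` succeeds even though `A` is declared twice, while `accepts_declarations("A var ;")` fails because the keyword does not precede the identifier.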
Deliverables
As with the previous project, we are using Gradescope (and its
auto-grader feature) to grade this assignment and your submissions.
- Once you are done with your scanner/parser, go to the second assignment and submit your code.
- For this project, please just submit your "lexer.l" and "parser.ypp" files. We supply the rest and build your project.
- We do not show you the test cases and the expected output, but you should get some feedback about the types of tests that your submission passes and where it fails.
- You can make a new submission once every hour. Make sure you thoroughly test your program locally, and don't (ab)use the auto-grader as a test harness.