CS160: Project 2 - Scanner/Parser for CSimple (20% of project score)

Project Goals

The goals of this project are:

to implement the grammar for our "simple" programming language
to get familiar with front-end generators such as flex and bison

Administrative Information

This is an individual project.

The project is due on Thursday, November 2, 2023, 23:59:59 PST.

Project Introduction

We want you to build the first part of your compiler: the scanner and parser. To do this, we will be using flex and bison, common tools for generating scanners and LR parsers, respectively. For the rest of this class, we will be focusing on a new language, which we call CSimple. You will be building a compiler for this new programming language. The language manual can be found here. This manual will be used for the rest of this quarter (and might be updated frequently). Always use it as the first and last authority on what your grammar should be.

Tour of the Code

Again, we provide code that you should use as a starting point for your project. You can find the files here.

Makefile - Your make file. You don't need to edit this file.
main.cpp - The main C++ file. You don't need to edit this file.
parser.ypp - The bison file that contains your grammar rules/productions. At this point, it contains the grammar from Project 1. You will need to edit it and take that out. Replace that grammar with your own grammar for the language that we have defined for you in the manual.
lexer.l - The flex file that contains the regular expressions for recognizing your tokens. It now contains the tokens from Project 1. Once again, you will have to edit this file and replace these expressions with your own.
test.good.calc - A test file so that you can compile and run this project as it is. This is not a valid CSimple program; add your own test cases as you go along.

Steps to Solve the Challenge

READ the manual for the language. Understand this language and its specifications.
Go over the small example we have included. Make sure you understand how flex and bison work together in the example. To familiarize yourself with these tools, you should read up on them. Here is a decent tutorial to get started. Also, use Google and read the man pages and the official documentation.
From the language specification, create a grammar which accepts all valid programs for our language. This is the crucial part. You must get the grammar correct here in this first part of the core compiler project. You will be building on top of this project, and your grammar must be correct. Test this thoroughly.
Implement the scanner (in flex) and make sure you account for all of the lexical patterns. Ensure that your scanner gives an error for dangling comments (comments not terminated before the EOF is reached), and make sure that it handles characters and strings correctly.
Implement your grammar (in bison). Save time for this part, since you will likely have to correct errors iteratively.
TEST: You can find some test files here. You will want to test your parser using these good and bad files thoroughly. You must also create your own test files. Make them as complete and complex as possible. To test your Lexer, put printf statements before you return something. This will tell you where your scanner stopped working AND which token you just failed in parsing. To test your Parser, put printf statements after each rule. This will make it easier for you to trace what your parser is doing. To run your program use:
```
./csimple < test.lang
```
where test.lang is a test file. If you run bison with the -v flag, it will write a report file (y.output in yacc-compatibility mode; otherwise the name is derived from the grammar file, for example parser.output). It contains a readable description of the parsing tables (more specifically, a description of the parser states and the items they contain). In addition, it reports where the conflicts or problems in the grammar appear. Make sure you get the Lexer working perfectly first! Flex allows you to execute C code when it matches a rule (AFTER it matches the rule). Simply print to stdout like you did for the previous project. You should get a stream of tokens.
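
The scanner step above requires an error for dangling comments. As a rough illustration of the logic involved, here is a minimal standalone C++ sketch, not part of the provided code. The function name unterminated_comment_line is hypothetical, and the comment delimiters are passed in as parameters because the real ones must be taken from the language manual. In flex, the same idea is commonly expressed with an exclusive start condition for the comment body plus an `<<EOF>>` rule inside that condition; this sketch only shows what such rules need to detect.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the 1-based line on which an unterminated comment starts,
// or 0 if every comment in `src` is closed before the end of input.
// `open` and `close` are the comment delimiters. They are parameters
// here because this sketch does not assume CSimple's actual syntax.
// Note: this sketch does not skip string or character literals, so a
// delimiter inside a literal would be treated as a comment delimiter.
int unterminated_comment_line(const std::string& src,
                              const std::string& open,
                              const std::string& close) {
    // Precondition: empty delimiters make no sense for this scan.
    assert(!open.empty() && !close.empty());

    int line = 1;                // current line number
    int comment_start_line = 0;  // line where the open comment began
    bool in_comment = false;
    std::size_t i = 0;

    while (i < src.size()) {
        if (!in_comment && src.compare(i, open.size(), open) == 0) {
            // Entering a comment: remember where it started.
            in_comment = true;
            comment_start_line = line;
            i += open.size();
        } else if (in_comment && src.compare(i, close.size(), close) == 0) {
            // Leaving the comment.
            in_comment = false;
            i += close.size();
        } else {
            // Ordinary character: keep the line count up to date.
            if (src[i] == '\n') ++line;
            ++i;
        }
    }
    // Still inside a comment at end of input: that is the dangling case.
    return in_comment ? comment_start_line : 0;
}
```

Called with placeholder delimiters such as "/%" and "%/", the function returns 0 for input whose comments are all closed and the starting line of the open comment otherwise. Tracking the starting line matters because the error should point at where the comment began, not at the end of the file.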

What Your Parser Has to Do!

Your parser should be able to parse any valid input file from our language.
You will need to catch ALL syntax errors.
You will need to catch ALL program structure errors. By this I mean that your parser has to know that the keyword "procedure" ALWAYS precedes a procedure_id in a procedure declaration.
You will NOT have to check that procedures and variables have been declared before you use them.
You will NOT have to check that there is one and only one Main(). Remember that Main() is just a special procedure. At this point we don't care that it is special.
You will NOT have to check whether procedure_ids and variable_ids are declared multiple times. So you could declare variable A multiple times and it would be okay at this point.
You will NOT have to check the return types of procedures.
In a nutshell, your parser looks at each line of code individually. It does not have global knowledge of variables or procedures ... yet.
When your parser encounters an error, please make sure that it calls yyerror(). It should already do that, so don't make any changes there. The reason is that the auto-grader is looking for the error code 1 and the proper error message (with line number) printed to stderr. If you make changes, we might not correctly process your submission.

Deliverables

As with the previous project, we are using Gradescope (and its auto-grader feature) to grade this assignment and your submissions.

Once you are done with your scanner/parser, go to the second assignment and submit your code.
For this project, please just submit your "lexer.l" and "parser.ypp" files. We supply the rest and build your project.
We do not show you the test cases and the expected output, but you should get some feedback about the types of tests that your submission passes and where it fails.
You can make a new submission once every hour. Make sure you thoroughly test your program locally, and don't (ab)use the auto-grader as a test harness.

Created by Christopher Kruegel (© 2008, using Apache Cocoon).