Using Java CUP
Tuesday, March 16th, 2010The following is a tutorial on using java CUP with JFlex for the project 2 assignment in CS4413/CS5413 at Lakehead University
Install and Setup
It is assumed that you have java and jflex (see http://hackerdan.com/programing/using-jflex/) installed and working.
Download CUP from http://www2.cs.tum.edu/projects/cup/ (direct link: http://www2.cs.tum.edu/projects/cup/java-cup-11a.jar)
Move java-cup-11a.jar onto your class path or add the directory java-cup-11a.jar is in to your class path.
For example, if java-cup-11a.jar is in /home/dan/comp_class/java-cup-11a, add /home/dan/comp_class/java-cup-11a to your class path. See http://hackerdan.com/programing/using-jflex/ for details on changing environment variables.
Create a directory named parser (this is where your code will go), and copy all the files included with the project from WebCT. The following files should be included:
lexer.flex
Driver.java
LEXSymbol.java
parser.cup
SrcLoc.java
test2.cmm
test1.cmm
Note that lexer.flex is different from project 1 and contains a complete lexer for use with this project. Also note that the Token class has been replaced by LEXSymbol.java which functions differently. You can use your solution from project 1 in place of lexer.flex but it will likely need some heavy modifications.
If you have JFlex installed from project 1 and a working java compiler everything should be ready to go.
Using Java CUP
First run JFlex on the lexer.java file to generate Lexer.java (see http://hackerdan.com/programing/using-jflex/ for details).
Add your changes to parser.cup (fill out the nonterminals, precedences and grammar sections, see the next section for tips) and run:
java -jar /home/dan/comp_class/java-cup-11a.jar parser.cup
This should generate parser.java and sym.java and have output similar to:
------- CUP v0.11a beta 20060608 Parser Generation Summary -------
0 errors and 0 warnings
42 terminals, 18 non-terminals, and 90 productions declared,
producing 192 unique parse states.
0 terminals declared but not used.
0 non-terminals declared but not used.
0 productions never reduced.
0 conflicts detected (0 expected).
Code written to "parser.java", and "sym.java".
---------------------------------------------------- (v0.11a beta 20060608)
If you get Errors that look like the following:
Error: Syntax error @ Symbol: LBRACK (unknown:82/19 - unknown:82/20)
Error : Illegal use of reserved word
......
It means there is a syntax error on the given line (in this case 82).
If you get a warning such as:
Warning : Scanner at 122(10): Unrecognized character '{' -- ignored
It means there is an illegal character in your .cup file (many of the common regex and EBNF characters are not supported in Java CUP).
If you get a warning such as:
Warning : Terminal "UMINUS" was declared but never used
It means one of your terminals or none terminal which was declared in your .cup was never used in your grammar. In some cases this is ok.
If you get a warning such as:
Warning : *** Reduce/Reduce conflict found in state #12
between lvalue ::= ID (*)
and type ::= ID (*)
under symbols: {ID}
Resolved in favor of the first production.
It means a conflict was found in your grammar where two rules have identical right parts and the parser is basically guessing which one to use. For example:
nterm1 ::= ID;
nterm2 ::= ID;
Will cause this error as the rules are the same.
If you get a warning such as:
Warning : *** Shift/Reduce conflict found in state #12
between lvalue ::= ID (*)
under symbol ID
Resolved in favor of shifting.
It means there is a conflict in your grammar where after shifting two rules are identical, for example:
nterm1 ::= TERM '+' nterm3;
nterm2 ::= TERM '+' ID;
nterm3 ::= ID;
In this case nterm1 and nterm2 are identical after ID is substituted in for nterm3.
Although grammar conflicts are warnings, they should not be ignored as they indicate there is likely a problem with your grammar despite the syntax being valid.
Warnings such as:
Warning : *** Production "formal ::= type ID " never reduced
mean that rule can never be reached from the given “start with” rule. This is a problem if you expect that rule to be part of your parser.
Keep fixing any warnings and errors until generation succeeds with no errors. Next we need to compile the generated code:
Compile LEXSymbol.java (from WebCT):
javac -classpath .:/home/dan/comp_class/java-cup-11a/java-cup-11a.jar parser/LEXSymbol.java
Replace /home/dan/comp_class/java-cup-11a/java-cup-11a.jar with the full path to java-cup-11a.jar on your system in the above and following examples (or set up your classpath so the -classpath argument is not needed).
Compile sym.java (generated by Java CUP):
javac -classpath .:/home/dan/comp_class/java-cup-11a/java-cup-11a.jar parser/sym.java
Compile parser.java (generated by Java CUP):
javac -classpath .:/home/dan/comp_class/java-cup-11a/java-cup-11a.jar parser/parser.java
Compile Lexer.java (generated by JFlex):
javac -classpath .:/home/dan/comp_class/java-cup-11a/java-cup-11a.jar parser/Lexer.java
Compile Driver.java (from WebCT):
javac -classpath .:/home/dan/comp_class/java-cup-11a/java-cup-11a.jar parser/Driver.java
If there were any errors or warnings while compiling, there is a issue with one of your specification files (.cup or .flex), something wrong with your environment variables (class path issue), or an issue relating to how your java package is set up.
Once everything is compiled you may run your parser like so:
java -classpath .:/home/dan/comp_class/java-cup-11a/java-cup-11a.jar parser.Driver parser/test.cmm
where the second argument is the path to your .cmm code file.
If there was an issue with your grammar or the code in the .cmm file is not valid, you will get error such as:
Syntax error parser/test.cmm:22:2 Got: 'else'
if the .cmm is valid you should go back to your .cup file and try to fix any issues and regenerate and recompile everything.
A successful run (assuming the code file is valid) will look like:
parsing [parser/test.cmm]
Lexeme for comment [/* Foo Class */]
Lexeme for comment [/* Main Class */]
Parsing Completed. Top symbol = #0
At this point you should have a parser capable of validating a C– file, which is all that is required for the project. However if you would like to use Java CUP to make a C– interpreter capable of running simple C– code there are a few more steps needed which you can read about in the PDFs on the following link: http://www.cs.auckland.ac.nz/~bruce-h/lectures/330ChaptersPDF/ (particularly chapter 9).
Parser Specification
Java CUP’s specification file has five main sections:
Preliminaries
Any settings or emended code that will affect how the parser.java file is generated. In our case this section is provided for us and should not need any changes.
Code between “parser code {:” and “:}” is placed in the parser class in the generated parser.java file.
Terminals
This section lists all terminals that may be used in the parsers’ grammar. These terminals are the tokens from our JFlex scanner and in this case are provided for us.
The format of the terminal line is as follows:
terminal [TYPE] TERM [, TERM]*;
Where the optional TYPE parameter specifies what type the terminal objects listed are of and the TERM parameters list the terminals that may be used in the grammar section.
Non Terminals
Non terminals are the rules of the grammar and follow the same format as in the terminal section. Each terminal must be defined before it may be used and as with the terminals the type parameter is optional.
Example:
nonterminal classList, classDed, deck, formals, statement, assignment;
Note that both terminal and non terminal names cannot be CUP reserved words (”code”, “action”, “parser”, “terminal”, “non”, “nonterminal”, etc).
Precedences
Often grammar is ambiguous and this section lets us state the precedences and associatively of terminals outside of our grammar (allowing us to keep the grammar simple, and still resolve ambiguity). This section is optional if your grammar has no ambiguity.
The format is as follows:
precedence (left | right | nonassoc) terminal [, terminal]*;
These lines set the order of precedence, from highest to lowest, from bottom to top. For example:
precedence left ADD, SUBTRACT;
precedence left TIMES, DIVIDE;
Resolves the ambiguity in grammar such as:
A ::= A ADD A | A SUBTRACT A | A DIVIDE A | A TIMES A | a
with ADD and SUBTRACT having a lower precedence than TIMES and DIVIDE (multiplication and division will be performed before the addition and subtraction).
Any terminal not listed will be given the lowest precedence. See http://www2.cs.tum.edu/projects/cup/manual.html for more information.
Grammar
Finally the grammar section allows the specification of the parsers’ grammar in a BNF like format. Note that this format is not the same as the EBNF format that the grammar for C– is given in, and some rules and changes will need to be added and made.
The starting rule is specified with the “start with” tag as shown below:
start with classList;
An example of a simple grammar is shown below:
start with A;
A ::= A ADD A
| A SUBTRACT A
| A DIVIDE A
| A TIMES A
| a;
This is a simplistic explanation of the grammars in CUP, for more info see: http://www2.cs.tum.edu/projects/cup/manual.html.
Project Hints
- Don’t modify the code in Driver.java, SrcLoc.java, or LEXSymbol.java. There should be no need for this project, however, looking over how they work will help in creating your parser and understanding how it works.
- Don’t modify the code in parser.java or Lexer.java manually, they should be generated via Java CUP and JFlex respectively. You can embed code in the specification file that will be placed in the generated .java file, however, there should be no need for this project.
- Make sure you have your class path settings right or give the full path to the Java CUP jar like in the examples given.
- Make sure you have the java package set up correctly. All class and java files should be in the parser package and in a directory with the same name.
- Don’t ignore warnings from Java CUP, they almost always indicate that something is wrong and needs fixing.
- If you use the given .jflex file the only file you need to submit is the .cup file, however, you may also include example outputs, test cases and screenshots.
- If you don’t use the given .jflex make sure to include yours and any .java files it uses. Also ensure that you note any changes you made. Your lexer from project 1 may need some changes to work with Java CUP, use the given .jflex files as an example.
- The grammar given in the project for C– will need changes to work with Java CUP, but should be equivalent.
- If you make any improvements to the given grammar, make sure to note them and why they were added so they are not marked as incorrect.
- Make sure to put your name and student number on everything (you would be surprised how many students forget this).
- Use Java CUP 11a beta 20060608 executable standalone Jar-package (this is what I will use to mark your project)
Resources
Java CUP
- Java CUP: http://www2.cs.tum.edu/projects/cup
- Java CUP Manual: http://www2.cs.tum.edu/projects/cup/manual.html
- Java CUP Download: http://www2.cs.tum.edu/projects/cup/java-cup-11a.jar
JFlex
- JFlex: http://jflex.de/
- JFlex Download: http://jflex.de/jflex-1.4.3.zip
- JFlex Manual: http://jflex.de/manual.html
Lectures/Tutorials
- Bruce Hutton’s Lecture Notes: http://www.cs.auckland.ac.nz/~bruce-h/lectures/330ChaptersPDF/
Examples
- CUP grammar for the Java programming language: http://www2.cs.tum.edu/projects/cup/javagrm.tar.gz
- CUP/JFLex interoperability: http://www2.cs.tum.edu/projects/cup/minimal.tar.gz
- CUP/JFLex sample project: http://www2.cs.tum.edu/projects/cup/demo.zip