CSE 3302 Programming Languages Syntax (cont.) Chengkai Li Fall 2007 Lecture 3 - Syntax, Fall 2007 CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 1 Review of Lecture 3 Scanning and Parsing Lexical Structure: The structure of tokens (words) Syntactical Structure: The structure of programs regular expressio n charact er stream
Lecture 3 - Syntax, Fall 2007 scanner (lexical analysis) grammar token strea m parser (syntax analysis) CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 pars e tree 2 Automatic Tools
regular expression description of the tokens (lex/flex) scanner of a language Example: Figure 4.1 (page 82) Context-Free Grammar (Yacc/Bison) parser Example: Figure 4.13 (page 112) Lecture 3 - Syntax, Fall 2007 CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 3 Review of Lecture 3 Tokens: Reserved words (keywords) Literals/constants Special symbols Identifiers Principle of Longest Substring The longest possible string of characters is collected into a single
token. The principle requires that tokens are separated by white spaces. Lecture 3 - Syntax, Fall 2007 CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 4 Review of Lecture 3 Context-Free Grammar (Backus-Naur Forms (BNF)) Grammar rules (productions): left-hand side: single nonterminal nonterminals (structured names) terminals (tokens) start symbol Language: the set of token strings that can be produced by derivation from the start symbol Derivation: number number digit number digit digit digit digit digit 2 digit digit 23 digit 234 Lecture 3 - Syntax, Fall 2007
CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 5 Review of Lecture 3 Derivations: different derivations for the same syntactic structure. Parse trees: capture intrinsic structure, more intuitive, still tedious. Abstract Syntax Tress (Syntax Trees): Remove ``unnecessary symbols, meanwhile completely determine syntactic structure. number number number digit digit 3 4 3
2 page 91 2 Lecture 3 - Syntax, Fall 2007 4 digit CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 6 Topics Today Topics: Ambiguities in Grammar Parsing Techniques Intuitive analysis and conclusion No formal theorems and rigorous proofs More details: compilers, automata theory Lecture 3 - Syntax, Fall 2007
CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 7 What is Parsing? Given a grammar and a token string: determine if the grammar can generate the token string? i.e., is the string a legal program in the language? In other words, to construct a parse tree for the token string. Lecture 3 - Syntax, Fall 2007 CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 8 Whats significant about parse tree? A parse tree gives a unique syntactic structure Leftmost, rightmost derivation
There is only one leftmost derivation for a parse tree, and symmetrically only one rightmost derivation for a parse tree. Lecture 3 - Syntax, Fall 2007 CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 9 Example expr expr + expr | expr expr | ( expr ) | number number number digit | digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 parse tree expr expr leftmost derivation expr + number
expr digit number 3 * expr expr + expr number expr number
digit number digit digit 4 5 Lecture 3 - Syntax, Fall 2007 expr 3 CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 * expr number
digit digit 4 5 10 Whats significant about parse tree? A parse tree has a unique meaning, thus provides basis for semantic analysis. (Syntax-directed semantics: semantics are attached to syntactic structure.) expr expr1 Lecture 3 - Syntax, Fall 2007 + expr.val = expr1.val + expr2.val
expr2 CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 11 Relationship among language, grammar, parser Chomsky Hierarchy http://en.wikipedia.org/wiki/Chomsky_hierarchy A language can be described by multiple grammars, while a grammar defines one language. A grammar can be parsed by multiple parsers, while a parser accepts one grammar, thus one language. Should design a language that allows simple grammar and efficient parser For a language, we should construct a grammar that allows fast parsing Given a grammar, we should build an efficient parser Lecture 3 - Syntax, Fall 2007 CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 12
Ambiguity Ambiguous grammar: There can be multiple parse trees for the same sentence (program) In other words, multiple leftmost derivations. Why it is bad? Multiple meaning Multiple derivation can be ok, but multiple leftmost derivation is the same as multiple parse tree. Lecture 3 - Syntax, Fall 2007 CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 13 Ambiguity Was this ambiguous? number number digit | digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 How about this? expr expr + expr | number
Lecture 3 - Syntax, Fall 2007 CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 14 Deal with Ambiguity disambiguating rules: Use semantics in determining which parse tree to construct or unambiguous grammar Rewrite the grammar to avoid ambiguity. Lecture 3 - Syntax, Fall 2007 CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 15 Example of Ambiguity: Precedence expr expr + expr | expr expr | ( expr ) | 0 | 1 | 2 | 3 | 4 |
5|6|7|8|9 Two different parse tress for expression 3+4*5 parse tree 2 parse tree 1 expr expr expr 3 expr + * expr expr 5
expr 4 Lecture 3 - Syntax, Fall 2007 * expr 5 expr + 3 CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 expr 4 16 Eliminating Ambiguity for
Precedence Establish precedence cascade: using different structured names for different constructs, adding grammar rules. Higher precedence : lower in cascade expr expr + expr | expr expr | ( expr ) | number expr expr + expr | term term term term | ( expr ) | number Lecture 3 - Syntax, Fall 2007 CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 17 Example of Ambiguity: Associativity expr expr - expr | ( expr ) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |9 Two different parse tress for expression parse tree 1 expr expr 5
5-2-1 parse tree 2 expr expr.val = 2 expr.val = 4 expr - - expr expr 1 expr 2 - expr 1 - is right-associative, which is against
common practice in integer arithmetic Lecture 3 - Syntax, Fall 2007 expr - expr 5 2 - is left-associative, which is what we usually assume CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 18 Associativity Left-Associative: + - * / Right-Associative: = What is meant by a=b=c=1? Lecture 3 - Syntax, Fall 2007
CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 19 Eliminating Ambiguity for Associativity left-associativity: left-recursion expr expr - expr | ( expr ) | number expr expr - term | term term (expr) | number right-associativity: right-recursion expr expr= expr | a | b | c ? Lecture 3 - Syntax, Fall 2007 CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 20 Putting Together
expr expr + expr | expr expr | ( expr ) | number number number digit | digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 We want to make + left-associative, and * has higher precedence than + ? Lecture 3 - Syntax, Fall 2007 CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 21 Example of Ambiguity: Dangling-Else stmt if( expr ) stmt | if( expr ) stmt else stmt | other-stmt Two different parse trees for if( expr ) if( expr ) stmt else parse tree 1 parse tree 2 stmt stmt stmt
if if if (expr) (expr) Lecture 3 - Syntax, Fall 2007 stmt else (expr) stmt stmt stmt else stmt if
(expr) CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 stmt 22 Eliminating Dangling-Else stmt matched_stmt | unmatched_stmt matched_stmt if( expr ) matched_stmt else matched_stmt | other-stmt unmatched_stmt if( expr ) stmt | if( expr ) matched_stmt else unmatched_stmt Lecture 3 - Syntax, Fall 2007 CSE3302 Programming Languages, UT-Arlington Chengkai Li, 2007 23