SI 413 Fall 2023 / Homeworks


This is the archived website of SI 413 from the Fall 2023 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

Homework 4: Scanning

1 Homographs and Synonyms

Pick a pair of two programming languages that you know, and come up with an example of each of the following in your two languages. As always, you can work together, but everyone must turn in unique examples.

  1. A homograph is a code fragment that is the same syntactically between the two languages, but has different semantics in each.

  2. A synonym is a code fragment that is the same semantically between the two languages, but has different syntax.

2 Scanner DFA

C++ and Java support a few different kinds of numerical constants, or “literals”. The most basic are regular ints that you know and love like 15, 256, or 32. There are also floating-point numbers like 3.7 or .0684.

For this problem, consider an INT token to be any sequence of 1 or more digits [0-9], and a FLOAT token to be any sequence of 1 or more digits which contains exactly one decimal point [.].

Draw the DFA for a scanner that accepts FLOAT and INT tokens. Be sure to label each accepting state with the type of token, and put characters or character ranges on each transition.

3 Bigger Scanner DFA

  1. Modify your scanner DFA from the previous problem so that it also accepts an additional type of token, a HEX constant such as 0x3a5 or 0x7.

    For this problem, a HEX token contains the symbols 0x followed by zero or more digits or letters in the range a through f.

  1. Note that the previous definition allows for the string 0x by itself to be considered a HEX token. What problem would there be if we disallowed this, so that 0x is not a valid token but, for example, 0x3 is valid?

4 Ambiguous Grammar

Write a grammar that is ambiguous, and then show that it is ambiguous by coming up with a series of tokens that could be parsed in two different ways according to your grammar.