# Homework 4: Scanning

**Due Date**: Monday, September 25

# 1 Homographs and Synonyms

Pick two programming languages that you know, and come up with an example of each of the following in your two languages. As always, you can work together, but everyone must turn in their own unique examples.

- A **homograph** is a code fragment that is the same *syntactically* between the two languages, but has different *semantics* in each.
- A **synonym** is a code fragment that is the same *semantically* between the two languages, but has different *syntax*.

# 2 Scanner DFA

C++ and Java support a few different kinds of numerical constants, or “literals”. The most basic are regular ints that you know and love like `15`, `256`, or `32`. There are also floating-point numbers like `3.7` or `.0684`.

For this problem, consider an `INT` token to be any sequence of 1 or more digits `[0-9]`, and a `FLOAT` token to be any sequence of 1 or more digits which contains exactly one decimal point `[.]`.
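
To make the two token definitions concrete, here is a minimal Python sketch that checks whole strings against them using regular expressions. The names `INT_RE`, `FLOAT_RE`, and `classify` are hypothetical, and the sketch assumes one reading of the definition: a `FLOAT` needs at least one digit, and its single decimal point may come first, last, or in the middle. It is not the DFA the problem asks for, only a way to test which strings should be accepted.

```python
import re

# Assumed reading of the definitions above (not stated more precisely
# in the assignment): at least one digit overall, and for FLOAT exactly
# one '.' that may appear before, after, or between the digits.
INT_RE = re.compile(r"[0-9]+")
FLOAT_RE = re.compile(r"(?:[0-9]+\.[0-9]*|\.[0-9]+)")


def classify(text):
    """Return 'INT', 'FLOAT', or None for an entire input string."""
    if INT_RE.fullmatch(text):
        return "INT"
    if FLOAT_RE.fullmatch(text):
        return "FLOAT"
    return None


if __name__ == "__main__":
    for sample in ["15", "256", "3.7", ".0684", "3.", ".", "1.2.3"]:
        print(repr(sample), classify(sample))
```

Strings such as `.` (no digits) and `1.2.3` (two decimal points) fall outside both definitions under this reading, so `classify` returns `None` for them.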

Draw the DFA for a scanner that accepts `FLOAT` and `INT` tokens. Be sure to label each accepting state with the type of token, and put characters or character ranges on each transition.

# 3 Bigger Scanner DFA

Modify your scanner DFA from the previous problem so that it also accepts an additional type of token, a `HEX` constant such as `0x3a5` or `0x7`.

For this problem, a `HEX` token contains the symbols `0x` followed by zero or more digits or letters in the range `a` through `f`.
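
The `HEX` definition can be written down as a regular expression, which may help when checking candidate strings against your DFA. The sketch below is in Python; `HEX_RE` and `is_hex` are hypothetical names, and it assumes only lowercase `a` through `f` and a lowercase `x` are allowed, since the definition does not mention uppercase.

```python
import re

# "0x" followed by zero or more digits or letters a-f.
# Assumption: lowercase only, as the definition names no uppercase letters.
HEX_RE = re.compile(r"0x[0-9a-f]*")


def is_hex(text):
    """Return True if the entire string fits the HEX definition."""
    return HEX_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["0x3a5", "0x7", "0x", "0xg", "x3"]:
        print(repr(sample), is_hex(sample))
```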

- Note that the previous definition allows for the string `0x` by itself to be considered a `HEX` token. What problem would there be if we disallowed this, so that `0x` is not a valid token but, for example, `0x3` is valid?

# 4 Ambiguous Grammar

Write a grammar that is ambiguous, and then show that it is ambiguous by coming up with a series of tokens that could be parsed in two different ways according to your grammar.