## Goal

Check if the word $w=baaba$ can be generated by the CNF grammar $G$:

We will fill the following table:

b | a | a | b | a | |
---|---|---|---|---|---|

1 |
|||||

2 |
x | ||||

3 |
x | x | |||

4 |
x | x | x | ||

5 |
x | x | x | x |

Each cell will tell us what variables can generate a substring, with length from 1 to $|w| = n$.

On the first row we have substrings of length 1 (letters themselves). On the second row, substrings of length 2: `ba`

, `aa`

, `ab`

, `ba`

. The next to last row is for strings of length $n-1$: `baab`

and `aaba`

. The last row is for the substring of length $n$, that is the whole word.

If $S$ is among the variables on the last row, it means $S$ can generate the whole word, ie the grammar can generate $w$.

## First row

Under each letter, write the variables that can directly generate it. `b`

can be generated by $B$ and `a`

can be generated by $A$ or $C$:

b | a | a | b | a | |
---|---|---|---|---|---|

1 |
B | A, C | A, C | B | A, C |

2 |
x |

## Second row

The first cell contains the variables that can generate the substring `ba`

, the second cell contains variables that can generate `aa`

.

From the first row, we know what can generate individual letters. In order to generate `ba`

, we need a variable that can either generate $BA$ or $BC$. The productions $S \rightarrow BC$ and $A \rightarrow BA$ are the one that have them on the right-hand side so we write $S, A$.

Variables in the second cell must generate any combination of $A, C$ and $A, C$ that is $AA, AC, CA, AC$. We fill $B$ because of the production $B \rightarrow CC$.

The third cell is for variables generating combinations of $A, C$ and $B$, meaning we look for productions with $AB$ or $CB$ on the right-hand side. $S$ and $C$ both generate $AB$.

The last cell we already computed (same as the first one) — $S, A$.

b | a | a | b | a | |
---|---|---|---|---|---|

1 |
B | A, C | A, C | B | A, C |

2 |
S, A | B | S, C | S, A | x |

3 |
x | x |

## Third row

The first cell will contain variables generating the substring `baa`

. It can be split into:

`ba`

,`a`

`b`

,`aa`

The substring `ba`

, having length 2 can be generated by $S$ or $A$ (from the second row). The substring `a`

can be generated by $A$ or $C$ (from the first row). So we look for a variable that can generate combinations of $S, A$ and $A, C$. No productions have $SA, SC, AA$ or $CC$ in their right-hand side.

Same for the alternative split, `b`

and `aa`

: variables that generate a combination of $B$ (row 1) and $B$ (row 2), that is $BB$ — none. We mark the cell as unfillable `-`

.

The second cell corresponds to `aab`

, which can be split into:

`aa`

,`b`

($B$ row 2 and $B$ row 1)`a`

,`ab`

($A, C$ row 1 and $S, C$ row 2)

Only $B$ can generate one of them, namely $B \rightarrow CC$.

Similarly for the last cell, string `aba`

can be generated by $B$.

b | a | a | b | a | |
---|---|---|---|---|---|

1 |
B | A, C | A, C | B | A, C |

2 |
S, A | B | S, C | S, A | x |

3 |
- | B | B | x | x |

4 |
x | x | x |

## Fourth row

Same as the third row, just that substrings of length 4 can be split into more ways:

The first cell, `baab`

:

`b`

,`aab`

`ba`

,`ab`

`baa`

,`b`

(the dash`-`

means`baa`

cannot be generated so this split is invalid)

b | a | a | b | a | |
---|---|---|---|---|---|

1 |
B | A, C | A, C | B | A, C |

2 |
S, A | B | S, C | S, A | x |

3 |
- | B | B | x | x |

4 |
- | S, A, C | x | x | x |

5 |
x | x | x | x |

# Final row

For the whole word, splits:

`b`

,`aaba`

(combinations of $B$ and $S, A, C$) — productions $A \rightarrow BA$ and $S \rightarrow BC$`ba`

,`aba`

(combinations of $S, A$ and $B$) — production $C \rightarrow AB$`baa`

,`ba`

(the dash`-`

on row 3 means`baa`

cannot be generated — split is invalid)`baab`

,`a`

(the dash`-`

on row 4 means`baab`

cannot be generated — invalid split)

b | a | a | b | a | |
---|---|---|---|---|---|

1 |
B | A, C | A, C | B | A, C |

2 |
S, A | B | S, C | S, A | x |

3 |
- | B | B | x | x |

4 |
- | S, A, C | x | x | x |

5 |
S, A, C | x | x | x | x |

The final cell contains $S, A, C$ which means the whole word can be generated either from $S$, from $A$ or from $C$. Since the starting symbol $S$ is among them, we know $G$ can generate $w$.

# IO

## Input `Grammar.in`

Each row contains one production.

```
S -> AB | BC
A -> BA | a
B -> CC | b
C -> AB | a
```

## Tests `words.txt`

and output

Each row (left side) contains one word to be checked. The corresponding row (right side) indicates whether the word can be generated by the grammar.

```
baaba
baaa
aaba
aba
bbb
```

```
1
1
1
0
0
```

Words to test on the converted automaton.

# Another Grammar

## Input `Grammar.in`

```
S -> XC
X -> aXbb | X
C -> Cc | c
```

## Tests `words.txt`

and output

```
abbc
aabbbc
abbccc
aaabbbbbbccccccc
cab
abc
aabbcc
ab
```

```
1
1
1
1
0
0
0
0
```

# Extended Example

Extend the algorithm presented above to not only tell whether the word $w$ can be generated, but also show the derivation tree. It should start with $S$ and end in $w$, showing what production was applied at each step:

```
S
BC (S -> BC)
bC (B -> b)
bAB (C -> AB)
baB (A -> a)
baCC (B -> CC)
baABC (C -> AB)
baaBC (A -> a)
baabC (B -> b)
baaba (C -> a)
```