Goal
Check if the word $w=baaba$ can be generated by the CNF grammar $G$:
We will fill the following table:
b | a | a | b | a | |
---|---|---|---|---|---|
1 | |||||
2 | x | ||||
3 | x | x | |||
4 | x | x | x | ||
5 | x | x | x | x |
Each cell will tell us what variables can generate a substring, with length from 1 to $|w| = n$.
On the first row we have substrings of length 1 (letters themselves). On the second row, substrings of length 2: ba
, aa
, ab
, ba
. The next to last row is for strings of length $n-1$: baab
and aaba
. The last row is for the substring of length $n$, that is the whole word.
If $S$ is among the variables on the last row, it means $S$ can generate the whole word, ie the grammar can generate $w$.
First row
Under each letter, write the variables that can directly generate it. b
can be generated by $B$ and a
can be generated by $A$ or $C$:
b | a | a | b | a | |
---|---|---|---|---|---|
1 | B | A, C | A, C | B | A, C |
2 | x |
Second row
The first cell contains the variables that can generate the substring ba
, the second cell contains variables that can generate aa
.
From the first row, we know what can generate individual letters. In order to generate ba
, we need a variable that can either generate $BA$ or $BC$. The productions $S \rightarrow BC$ and $A \rightarrow BA$ are the one that have them on the right-hand side so we write $S, A$.
Variables in the second cell must generate any combination of $A, C$ and $A, C$ that is $AA, AC, CA, AC$. We fill $B$ because of the production $B \rightarrow CC$.
The third cell is for variables generating combinations of $A, C$ and $B$, meaning we look for productions with $AB$ or $CB$ on the right-hand side. $S$ and $C$ both generate $AB$.
The last cell we already computed (same as the first one) — $S, A$.
b | a | a | b | a | |
---|---|---|---|---|---|
1 | B | A, C | A, C | B | A, C |
2 | S, A | B | S, C | S, A | x |
3 | x | x |
Third row
The first cell will contain variables generating the substring baa
. It can be split into:
ba
,a
b
,aa
The substring ba
, having length 2 can be generated by $S$ or $A$ (from the second row). The substring a
can be generated by $A$ or $C$ (from the first row). So we look for a variable that can generate combinations of $S, A$ and $A, C$. No productions have $SA, SC, AA$ or $CC$ in their right-hand side.
Same for the alternative split, b
and aa
: variables that generate a combination of $B$ (row 1) and $B$ (row 2), that is $BB$ — none. We mark the cell as unfillable -
.
The second cell corresponds to aab
, which can be split into:
aa
,b
($B$ row 2 and $B$ row 1)a
,ab
($A, C$ row 1 and $S, C$ row 2)
Only $B$ can generate one of them, namely $B \rightarrow CC$.
Similarly for the last cell, string aba
can be generated by $B$.
b | a | a | b | a | |
---|---|---|---|---|---|
1 | B | A, C | A, C | B | A, C |
2 | S, A | B | S, C | S, A | x |
3 | - | B | B | x | x |
4 | x | x | x |
Fourth row
Same as the third row, just that substrings of length 4 can be split into more ways:
The first cell, baab
:
b
,aab
ba
,ab
baa
,b
(the dash-
meansbaa
cannot be generated so this split is invalid)
b | a | a | b | a | |
---|---|---|---|---|---|
1 | B | A, C | A, C | B | A, C |
2 | S, A | B | S, C | S, A | x |
3 | - | B | B | x | x |
4 | - | S, A, C | x | x | x |
5 | x | x | x | x |
Final row
For the whole word, splits:
b
,aaba
(combinations of $B$ and $S, A, C$) — productions $A \rightarrow BA$ and $S \rightarrow BC$ba
,aba
(combinations of $S, A$ and $B$) — production $C \rightarrow AB$baa
,ba
(the dash-
on row 3 meansbaa
cannot be generated — split is invalid)baab
,a
(the dash-
on row 4 meansbaab
cannot be generated — invalid split)
b | a | a | b | a | |
---|---|---|---|---|---|
1 | B | A, C | A, C | B | A, C |
2 | S, A | B | S, C | S, A | x |
3 | - | B | B | x | x |
4 | - | S, A, C | x | x | x |
5 | S, A, C | x | x | x | x |
The final cell contains $S, A, C$ which means the whole word can be generated either from $S$, from $A$ or from $C$. Since the starting symbol $S$ is among them, we know $G$ can generate $w$.
IO
Input Grammar.in
Each row contains one production.
S -> AB | BC
A -> BA | a
B -> CC | b
C -> AB | a
Tests words.txt
and output
Each row (left side) contains one word to be checked. The corresponding row (right side) indicates whether the word can be generated by the grammar.
baaba
baaa
aaba
aba
bbb
1
1
1
0
0
Words to test on the converted automaton.
Another Grammar
Input Grammar.in
S -> XC
X -> aXbb | X
C -> Cc | c
Tests words.txt
and output
abbc
aabbbc
abbccc
aaabbbbbbccccccc
cab
abc
aabbcc
ab
1
1
1
1
0
0
0
0
Extended Example
Extend the algorithm presented above to not only tell whether the word $w$ can be generated, but also show the derivation tree. It should start with $S$ and end in $w$, showing what production was applied at each step:
S
BC (S -> BC)
bC (B -> b)
bAB (C -> AB)
baB (A -> a)
baCC (B -> CC)
baABC (C -> AB)
baaBC (A -> a)
baabC (B -> b)
baaba (C -> a)