I tried the examples and decided to run them on the Thirukkural text in Thamizh, which I have been trying to use for most of my assignments. The Thirukkural couplets have a definite structure, and I wanted to see what Markov chains could arrive at, both phonetically and in meaning. I tried both examples with a small sample set of kurals in Thamizh, loaded from a text file.
In Thamizh, வாழ்வார் (pronounced vaazhvaar) is split into வா ழ் வா ர் (vaa-zh-vaa-r) and not v-aa-zh-v-aa-r (try highlighting just வ in the word above).
But வ on its own is also a character. So வ and வா are treated differently, which works out well for us because splitting வா into two might not make sense. This is because Indian languages have 'matras', a smaller measure or quantity of a character, which help with the syllables.
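To make that grouping concrete, here is a minimal sketch in Python of one way to get it. It is not the code from the examples I used. The helper name split_clusters is made up for illustration, and the rule it applies (attach every Unicode combining mark to the letter before it) is a simplification of full Unicode grapheme clustering.

```python
import unicodedata

def split_clusters(text):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        # Unicode categories Mn and Mc cover the Tamil vowel signs (matras)
        # and the pulli; attach them to the preceding base letter.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

print(split_clusters("வாழ்வார்"))  # four clusters: vaa, zh, vaa, r
```

Feeding these clusters to the Markov chain instead of single code points keeps வா together as one unit, while a bare வ still counts as its own unit.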
I found myself tweaking the N-gram length and the maximum length of the generated text the most to come up with different outputs. I was wondering whether there is a formulaic way to find the right combination based on the structure of the language and the text we give as input. For example, if we were to give haikus as input and generate haikus, the maximum length (or somehow the syllable count) could be defined up front. This is something I want to explore more. I think this will be similar to the Markov mixer example. There could be a slider that tries to maintain a ratio or relationship between the two parameters (N-gram length and maximum length of generated text) so that the output always stays at 5-7-5 syllables in 3 lines.
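Here is a rough sketch in Python of how the two parameters interact and how a fixed form could pin one of them down. It is not the example code I was tweaking. The names build_model, generate and split_lines are hypothetical, and it assumes every token counts as one syllable, which a real haiku generator would have to replace with an actual syllable counter.

```python
import random
from collections import defaultdict

def build_model(tokens, n):
    """Map every n-token window to the tokens that follow it in the source."""
    model = defaultdict(list)
    for i in range(len(tokens) - n):
        model[tuple(tokens[i:i + n])].append(tokens[i + n])
    return model

def generate(tokens, n, max_len, rng=None):
    """Walk the chain from the opening n-gram until max_len tokens or a dead end."""
    rng = rng or random.Random()
    model = build_model(tokens, n)
    out = list(tokens[:min(n, max_len)])
    while len(out) < max_len:
        followers = model.get(tuple(out[-n:]))
        if not followers:
            break  # this n-gram only occurs at the very end of the source
        out.append(rng.choice(followers))
    return out

def split_lines(tokens, pattern=(5, 7, 5)):
    """Cut a token list into lines whose sizes follow the pattern."""
    lines, start = [], 0
    for size in pattern:
        lines.append(tokens[start:start + size])
        start += size
    return lines

# Assumption: one token = one syllable, so the form fixes max_len at 5 + 7 + 5.
source = "the quick brown fox jumps over the lazy dog and the quick cat".split()
pattern = (5, 7, 5)
poem = generate(source, n=2, max_len=sum(pattern), rng=random.Random(1))
for line in split_lines(poem, pattern):
    print(" ".join(line))
```

In this sketch the form only determines the maximum length. The N-gram length is still a free choice: a larger N copies longer stretches of the source and hits dead ends sooner, so the output can fall short of the full 17 tokens.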
I clearly understood how Markov chains work from this article by Victor Powell and Lewis Lehe.
I also felt good reading Emily Martinez's questions because they allowed me to reflect and think more deeply about something I had just discovered, something that had been invisible to me even though I had been using it all this while.
I also tried to use the Thirukkural API in the thesis code to see what I would get, but ran into errors.