Automated checking for metrical errors: Prakrit gāhās

For a few of my projects it would be useful to ensure the metrical correctness (which often co-varies with grammatical correctness) of a text that I’ve typed up. I was inspired by Shreevatsa’s metrical analysis tool to come up with a very simple Python script for validating a text that has been composed in Prakrit gāhās. The code is below, pasted from this GitHub gist.

Unlike Shreevatsa’s tool, which is used primarily for identification, I use this script for proofreading. The input file follows a certain set of conventions:

  • It has to be in ISO 15919 transliteration. Hence ṁ for anusvāra.
  • The vowels e and o should be written as plain e and o when they are metrically short, and as ē and ō when they are metrically long.
  • Similarly, when a following anusvāra nasalizes a vowel without making the syllable metrically heavy, the vowel should be written with a tilde (ĩ, mostly in the endings -āĩ and -ēhĩ) rather than with ṁ.
  • The labels for each verse are expected to be in Arabic numerals and placed, after a hyphen and a space, at the end of the second verse line.
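
To make these conventions concrete, here is a minimal sketch of how a line written this way could be scanned into light (L) and heavy (G) syllables. It is not the code from the gist. The function names (split_label, tokenize, scan) are my own, and the weight rules are assumptions: a syllable is heavy if its vowel is long, if it is followed by ṁ, or if it is followed by two or more consonants, and the last syllable of a line always counts as heavy (as the sample output further down suggests).

```python
import re
import unicodedata


def nfc(s):
    """Normalize to precomposed characters so that e.g. ē is one code point."""
    return unicodedata.normalize("NFC", s)


# Assumed inventories for ISO 15919 Prakrit following the conventions above.
SHORT_VOWELS = set(nfc("aiueoãĩũẽõïü"))  # plain, nasalized (tilde), hiatus (diaeresis)
LONG_VOWELS = set(nfc("āīūēō"))
ANUSVARA = nfc("ṁ")
ASPIRABLE = set(nfc("kgcjṭḍtdpb"))  # consonant + h counts as one aspirate


def split_label(line):
    """Split 'text - 12' into ('text', 12); return (text, None) if unlabelled."""
    m = re.match(r"^(.*?)\s*-\s+(\d+)\s*$", line)
    if m:
        return m.group(1), int(m.group(2))
    return line.strip(), None


def tokenize(text):
    """Turn a line into (kind, letters) pairs: V vowel, C consonant, M anusvāra."""
    text = nfc(text.lower())
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in SHORT_VOWELS or ch in LONG_VOWELS:
            tokens.append(("V", ch))
        elif ch == ANUSVARA:
            tokens.append(("M", ch))
        elif ch.isalpha():
            if ch in ASPIRABLE and i + 1 < len(text) and text[i + 1] == "h":
                tokens.append(("C", ch + "h"))
                i += 1  # skip the h that belongs to the aspirate
            else:
                tokens.append(("C", ch))
        # spaces and punctuation are ignored: weight carries across word breaks
        i += 1
    return tokens


def scan(text):
    """Return one L or G per syllable, e.g. 'LLG...'."""
    tokens = tokenize(text)
    vowels = [i for i, (kind, _) in enumerate(tokens) if kind == "V"]
    weights = []
    for n, pos in enumerate(vowels):
        is_last = n == len(vowels) - 1
        end = len(tokens) if is_last else vowels[n + 1]
        following = tokens[pos + 1:end]
        heavy = (
            tokens[pos][1] in LONG_VOWELS
            or any(kind == "M" for kind, _ in following)
            or sum(kind == "C" for kind, _ in following) >= 2
            or is_last  # assumption: line-final syllable is always heavy
        )
        weights.append("G" if heavy else "L")
    return "".join(weights)
```

Because e/ē, o/ō and ĩ/iṁ are kept apart in the input, the sketch never has to guess a vowel's length: the spelling alone decides the weight, which is exactly why a misspelling surfaces as a metrical error.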

Some of these requirements will seem a bit “fussy,” both to most readers and from a technical point of view. The script should, for example, be able to parse a line without knowing in advance whether the signs for e and o represent long or short sounds in any given case. But I want to end up with a file in which these distinctions are in fact marked, so I want the script to tell me when the letters as I have written them don’t produce a completely metrical text. I’m shooting for a representation of the text in which the metrical structure (and hence the phonological structure) is perfectly represented by the writing. I’ll have more to say about the orthography of Prakrit later.

For each line, the output file prints the parsed text if there are no errors; if there is an error, it says where in the line the error occurs. Here is the input for the first verse of Dhanapāla’s Pāiyalacchī:

namiūṇa paramapurisaṁ purisuttamanābhisaṁbhavaṁ dēvaṁ
vucchaṁ pāiyalacchi tti nāmamālaṁ nisāmēha - 1

And here is the output:

Verse 1, line 1: namiūṇa paramapurisaṁ purisuttamanābhisaṁbhavaṁ dēvaṁ
[namiū][ṇaparama][purisaṁ][purisut][tamanā][bhisaṁbha][vaṁdē][vaṁ]
[LLG][LLLL][LLG][LLG][LLG][LGL][GG][G]
Verse 1, line 2: vucchaṁ pāiyalacchi tti nāmamālaṁ nisāmēha
[vucchaṁ][pāiya][lacchit][tināma][mālaṁ][ni][sāmē][ha]
[GG][GLL][GG][LGL][GG][L][GG][G]
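
The bracketed groups in this output are gaṇas. As a rough sketch of how such a grouping can be produced (again, not the gist's code), the function below takes a string of L and G for one line and cuts it greedily into gaṇas. The name group_ganas and the mora targets are assumptions based on the usual description of the gāhā and on the output above: seven gaṇas of four morae plus a final single heavy syllable, except that the sixth gaṇa of the second line is a single light syllable.

```python
MORAE = {"L": 1, "G": 2}


def group_ganas(weights, line_no):
    """Cut a string such as 'LLGLLLL...' into gaṇas for line 1 or line 2.

    Assumed targets: 4 morae per gaṇa, 1 mora for gaṇa 6 of line 2,
    and 2 morae (one heavy syllable) for gaṇa 8.
    """
    targets = [4, 4, 4, 4, 4, 4 if line_no == 1 else 1, 4, 2]
    ganas = []
    i = 0
    for target in targets:
        if i >= len(weights):
            break  # the line ran out of syllables early
        start = i
        morae = 0
        # Take syllables until the target is reached; a heavy syllable may
        # overshoot it, which leaves a malformed gaṇa for a later check.
        while i < len(weights) and morae < target:
            morae += MORAE[weights[i]]
            i += 1
        ganas.append(weights[start:i])
    if i < len(weights):
        ganas.append(weights[i:])  # leftover syllables beyond gaṇa 8
    return ganas


if __name__ == "__main__":
    line1 = "LLG" + "LLLL" + "LLG" + "LLG" + "LLG" + "LGL" + "GG" + "G"
    print("".join(f"[{g}]" for g in group_ganas(line1, 1)))
```

With this greedy approach a gaṇa that straddles a heavy syllable comes out with five morae instead of four, so a slip in the text shows up as an impossible form rather than silently shifting every later gaṇa.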

The script showed me an error for verse 223:

Verse 223, line 1: uya piccha dharaï jīvaï duccaṁ duattaṇaṁ disā āsā
ERROR NUMBER 20:
Gaṇa number 5 in line 1 of verse 223 has the incorrect form LGL.

When I consulted Bühler’s edition again, I saw that I had to correct duattaṇaṁ to dūattaṇaṁ.

More interesting was the error that the script showed me for verse 175:

Verse 175, line 2: avarillam uttarijjaṁ uyaṭṭhī uccaō nīvī
ERROR NUMBER 9:
Gaṇa number 4 in line 2 of verse 175 has the incorrect form LGG.

When I checked on this, it turned out that uyaṭṭhī was indeed the reading of Bühler’s edition, although he gave ujjattō (which would be metrically fine). Uyaṭṭhī is also given in the Pāiyasaddamahaṇṇavō for nīvī. I didn’t have the temerity to change the text, but something is clearly wrong here.
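
Both errors can be reproduced with a small check on gaṇa forms. The sketch below is an illustration under stated assumptions, not the gist's code: the names allowed_forms and first_error are hypothetical, and the rules are the usual textbook ones for the gāhā (odd-numbered gaṇas may not be LGL; the sixth gaṇa is LGL or LLLL in the first line and a single L in the second; the eighth is a single heavy syllable).

```python
FOUR_MORA_FORMS = {"GG", "GLL", "LGL", "LLG", "LLLL"}


def allowed_forms(gana_no, line_no):
    """Forms assumed to be legal for a given gaṇa (1-8) in line 1 or 2."""
    if gana_no == 8:
        return {"G"}
    if gana_no == 6:
        return {"LGL", "LLLL"} if line_no == 1 else {"L"}
    if gana_no % 2 == 1:
        return FOUR_MORA_FORMS - {"LGL"}  # no LGL in odd positions
    return FOUR_MORA_FORMS


def first_error(ganas, line_no):
    """Return (gaṇa number, form) for the first bad gaṇa, or None if all pass.

    Only the first error is reported, since everything after a malformed
    gaṇa is grouped unreliably.
    """
    for n, form in enumerate(ganas, start=1):
        if n > 8 or form not in allowed_forms(n, line_no):
            return n, form
    if len(ganas) < 8:
        return len(ganas) + 1, ""  # the line is too short
    return None


if __name__ == "__main__":
    # Gaṇas of verse 223, line 1, up to the reported error.
    print(first_error(["LLG", "LLLL", "GLL", "GG", "LGL"], 1))
    # Gaṇas of verse 175, line 2, up to the reported error.
    print(first_error(["LLG", "LGL", "GG", "LGG"], 2))
```

The two cases fail for different reasons. In verse 223 the fifth gaṇa has the right number of morae but a form that is barred from odd positions, while in verse 175 the fourth gaṇa has five morae and so matches no legal form at all.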

Note that I’ve written the script for Prakrit, and it won’t work for Sanskrit āryās, because of Sanskrit’s greater phonological complexity.

Here is the gist: