Parsing Tamil verse: some observations

As I am learning more about Tamil meter, I have been very interested in Kevin Ryan’s suggestion (Phonology 34.3 [2017]: 581–613) that the metrical units of Tamil verse consist of a strong position and a weak position which are subject to weight-mapping of different strictness. The Tamil metrical tradition doesn’t distinguish between a strong and a weak position within a cīr, as far as I know, but it might not need to: it may have been taken for granted that the first acai in a cīr was “strong,” and the second “weak,” in some sense.

I tried writing another Python script to automatically parse a Tamil text into metrical units (syllables, acai, cīr, and lines). The script can be found in this GitHub gist. The data I used was the electronic text of the Kuṟuntokai posted on GRETIL by Project Madurai (evidently from 2001!).

Luckily, that version of the Kuṟuntokai is typed according to the “traditional” system of putting spaces between cīr rather than between words. (I don’t know how traditional it is, only that modern editions have started to favor putting spaces between words rather than between cīr.) Hence all I had to do was parse each cīr into syllables, and then try to match these syllables to patterns of acai (this second part was the hardest). The result is a JSON file that represents every cīr as an array of two acai, each of which is labelled according to the type (nēr, nirai, nērpu, and niraipu) and quantity (G or L for nēr, LG or LL for nirai, etc.). I wasn’t able to parse every syllable — either because of errors in the electronic text, or more likely, my own lack of understanding of the intricacies of Tamil meter — but I think I got 97% of them. Hence the following statistics should be more or less right, at least for the Kuṟuntokai.
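To make the two steps concrete, here is a minimal sketch of how such a parse might work. It is not the script from the gist. It assumes romanized (ISO 15919-style) input with one cīr per string, treats a syllable as heavy (G) if it has a long vowel or diphthong or is closed by a consonant, and groups syllables greedily: a light syllable followed by any other syllable becomes a nirai, and anything else becomes a nēr. The function names (`syllable_weights`, `group_acai`, `parse_cir`) and the JSON keys are hypothetical, and nērpu, niraipu, and shortened vowels are not handled at all.

```python
import json
import re
import unicodedata

# Hypothetical labels and helpers; a sketch under stated assumptions,
# not the script from the gist.
NER = "nēr"
NIRAI = "nirai"
LONG_VOWELS = "āīūēō"
VOWEL_RE = re.compile(r"ai|au|[āīūēō]|[aiueo]")


def syllable_weights(cir: str) -> str:
    """Return one letter per syllable: G (heavy) or L (light).

    Assumes clean romanized input containing only letters.
    """
    text = unicodedata.normalize("NFC", cir.lower())
    vowels = list(VOWEL_RE.finditer(text))
    weights = []
    for i, m in enumerate(vowels):
        is_last = i + 1 == len(vowels)
        next_start = len(text) if is_last else vowels[i + 1].start()
        coda_span = text[m.end():next_start]  # consonants before the next vowel
        long_vowel = len(m.group()) == 2 or m.group() in LONG_VOWELS
        # Closed if two or more consonants follow, or if any consonant ends the cīr.
        closed = len(coda_span) >= 2 or (is_last and len(coda_span) >= 1)
        weights.append("G" if long_vowel or closed else "L")
    return "".join(weights)


def group_acai(weights: str) -> list:
    """Greedily group a G/L string into nēr and nirai acai."""
    acai = []
    i = 0
    while i < len(weights):
        if weights[i] == "L" and i + 1 < len(weights):
            # A light syllable plus the following syllable: nirai (LL or LG).
            acai.append({"type": NIRAI, "quantity": weights[i:i + 2]})
            i += 2
        else:
            # A heavy syllable, or a light one with nothing after it: nēr.
            acai.append({"type": NER, "quantity": weights[i]})
            i += 1
    return acai


def parse_cir(cir: str) -> list:
    """Parse one cīr into a list of labelled acai."""
    return group_acai(syllable_weights(cir))


print(json.dumps(parse_cir("kiyarō"), ensure_ascii=False))
```

A cīr for which `parse_cir` returns more than two acai would be one of the cases that a two-acai limit cannot accommodate, so a caller could set those aside for manual inspection.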

One way to think about positional asymmetry (i.e., strong and weak positions being regulated by somewhat different rules or constraints) is in terms of conditional probability. What is the chance that a nēr-acai will occur in a strong position? What is the chance that this nēr-acai will be constituted by a heavy rather than a light syllable? What is the chance that it will be followed by another nēr-acai in the weak position? And what is the chance that this nēr-acai will itself be constituted by a heavy syllable? And so on. These questions will require much more research, and asking intelligent questions of the data, but here are a few initial observations:

  • Nēr-acai are almost never light in the strong position. I only counted 18 instances, in contrast to 5,322 instances of a heavy nēr-acai.
  • Whether a nēr-acai or a nirai-acai occurs in the strong position, there is still about a 65% chance that a nēr-acai will occur in the weak position.
  • When a nirai-acai occurs in the strong position, if a nēr-acai follows in the weak position, it is more likely that the nēr-acai will be heavy if the final syllable of the preceding nirai-acai is light (71% as opposed to 58%). This might simply be due to the different frequencies of word shapes in the lexicon.
  • Among the possibilities for a nirai-acai, LL predominates over LG (60% to 39%) in the strong position, but they are more closely matched in the weak position (53% for LL following a nēr-acai, and 49% for LL following a nirai-acai).
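
Percentages like these are conditional relative frequencies, and they can be computed along the following lines. This is only a sketch: it assumes each cīr is stored as a list of two acai with hypothetical keys "type" and "quantity", the first being the strong position and the second the weak one. The helper names are mine, and the five cīr in `toy` are invented for illustration. They are not Kuṟuntokai data.

```python
# Hypothetical labels and data layout; the toy cīr below are made up.
NER = "nēr"
NIRAI = "nirai"


def acai(kind, quantity):
    """Build one acai record in the assumed layout."""
    return {"type": kind, "quantity": quantity}


def conditional_prob(cirs, given, event):
    """Estimate P(event | given) as a relative frequency over two-acai cīr.

    Returns None when no cīr satisfies the condition.
    """
    matching = [c for c in cirs if len(c) == 2 and given(c)]
    if not matching:
        return None
    return sum(1 for c in matching if event(c)) / len(matching)


toy = [
    [acai(NER, "G"), acai(NER, "G")],
    [acai(NIRAI, "LL"), acai(NER, "G")],
    [acai(NIRAI, "LG"), acai(NIRAI, "LL")],
    [acai(NER, "G"), acai(NIRAI, "LG")],
    [acai(NIRAI, "LL"), acai(NER, "L")],
]


def strong_is_nirai(c):
    return c[0]["type"] == NIRAI


def weak_is_ner(c):
    return c[1]["type"] == NER


def strong_is_ll(c):
    return c[0]["quantity"] == "LL"


# Chance of a nēr in the weak position, given a nirai in the strong position.
p_weak_ner = conditional_prob(toy, strong_is_nirai, weak_is_ner)

# Share of LL among nirai in the strong position.
p_strong_ll = conditional_prob(toy, strong_is_nirai, strong_is_ll)

print(p_weak_ner, p_strong_ll)
```

Passing the condition and the event as separate functions makes it easy to ask each of the questions above by swapping in a different pair.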

A few caveats: I have basically ignored nērpu and niraipu in these numbers, since they usually don’t rise above a single percentage point. Also, I am taking the traditional system more or less at its word. It might be that alternative ways of parsing Tamil verse (based on foot structure, for example) will reveal other patterns and generalizations. Finally, I only allowed cīr to have a maximum of two acai. I don’t know if this is correct for the meter of the Kuṟuntokai, and some of the errors look like they might involve cīr with three acai, but I’ll have to wait until I understand the system a bit better.