Na'vi collation order?

Started by Yawne Zize’ite, August 03, 2011, 11:27:28 PM

Yawne Zize’ite

#20
I didn't lift a finger towards getting an ISO 639 code; Taronyu put in the application last year, after the deadline as it turned out, and it's on this year's list of proposed changes.  We'll find out sometime between January and May of next year when the results are published.  The proposed code is "nvi".

By "digraphs" I mean those units of the Na'vi alphabet written with two letters: kx, ll, ng, px, rr, tx, and ts are universally agreed to be digraphs, and the Na'viteri alphabet adds aw, ay, ew, and ey.  My current sort proposal sorts the digraphs where they fall in the Na'viteri order: aw and ay between a and ä, ew and ey between e and f, kx between k and l, ll between l and m, ng between n and o, px between p and r, rr between r and s, tx between t and ts, and ts between tx and u.  Several European languages use digraphs in their official alphabets, so support for multi-letter "letters" has already been devised.  Non-Na'vi letters are sorted where they stand in the regular order, except for c and g: c is sorted as identical to ts, and g is sorted as identical to ng.
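
To make the proposal concrete, here is a minimal Python sketch of a sort key along those lines. It is not the actual sort file. The names ALPHABET, letters, and sort_key are made up for illustration, and the list assumes that the tìftang sorts first, that ì follows i, and that the foreign letters b, d, j, q, and x sit at their English positions. Digraphs are matched greedily from left to right, which is also an assumption.

```python
# Assumed order: Na'viteri alphabet plus foreign letters at their English positions.
ALPHABET = [
    "'", "a", "aw", "ay", "ä", "b", "d", "e", "ew", "ey", "f", "h", "i", "ì",
    "j", "k", "kx", "l", "ll", "m", "n", "ng", "o", "p", "px", "q", "r", "rr",
    "s", "t", "tx", "ts", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: position for position, letter in enumerate(ALPHABET)}
DIGRAPHS = {letter for letter in ALPHABET if len(letter) == 2}
ALIASES = {"c": "ts", "g": "ng"}  # old spellings sorted as identical to ts and ng


def letters(word):
    """Split a word into alphabet units, matching digraphs greedily."""
    word = word.lower().replace("\u2019", "'")  # treat a curly apostrophe as tìftang
    units = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            units.append(pair)
            i += 2
        else:
            units.append(ALIASES.get(word[i], word[i]))
            i += 1
    return units


def sort_key(word):
    """Map each unit to its rank; unknown characters sort after everything else."""
    return [RANK.get(unit, len(ALPHABET)) for unit in letters(word)]


words = ["tute", "txe'lan", "tsmukan", "ta", "ayoe", "awnga",
         "ätxäle", "atan", "ngenga", "nari", "omum"]
print(sorted(words, key=sort_key))
```

With this key, atan comes before awnga, ayoe, and ätxäle, and tute comes before txe'lan and tsmukan, whereas a plain code point sort puts ätxäle last and tsmukan ahead of tute.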

The name you're looking for is probably Windows-1252, which is a tweak of ISO 8859-1, better known as Latin-1.  Possibly Latin-9.  Not to be confused with Latin-2 (which lacks ì), Latin-3 (before Unicode, in order to see Esperanto online you had to install and use special Latin-3 fonts to display Esperanto letters), Latin-6, Latin-7, DOS code page 437, Windows code page 932, MacRoman, etc.
It wasn't all that many years ago that I had to change my computer's code page to get Japanese to display in older programs, which meant that any accented letters used for Spanish - or Na'vi had it existed at the time - fused with the next letter into Japanese characters.  Just this year I tried to open up a file a few years old that I'd downloaded off the Internet, and all the diacritics were wrong; it had been prepared using Latin-2, and MS Word didn't give me the option to change character sets.  (I found that WordPad does recognize Latin-2.)
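
The code page problems described above can be reproduced with Python's built-in codecs. This is only an illustrative sketch: the sample words are arbitrary, and it assumes the text was originally saved as Latin-1.

```python
# Bytes as an older Latin-1 program would have written them.
raw_tiftang = "tìftang".encode("latin-1")
raw_kamakto = "kämakto".encode("latin-1")

# Read back as Latin-2, the byte for ì turns into a different letter.
as_latin2 = raw_tiftang.decode("iso8859-2")
print(as_latin2)

# Under the Japanese code page 932, the byte for ä is a lead byte, so it
# fuses with the following letter into a single double-byte character.
as_cp932 = raw_kamakto.decode("cp932")
print(len("kämakto"), len(as_cp932))

# UTF-8 round-trips cleanly because the encoding is unambiguous.
assert "kämakto".encode("utf-8").decode("utf-8") == "kämakto"
```

The Latin-2 reading keeps the same number of letters but swaps ì for ě, while the code page 932 reading comes out one character shorter because two bytes were merged.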

You see why I like Unicode a lot.

Edit: Remembered the file was Latin-2, not MacRoman.

Tirea Aean

um. wow. okay I think this should be fine if it's based on naviteri order and g=ng and ts=c. though really I don't see the big deal of c and g considering no one ever uses c and g. except to mess with people and joke around at how apparently ts and ng used to be written c and g at some time while the language was in a supremely early state. it is what it is, no harm done whether they are in or out.

Quote
By the way, what program is used for the PDF dictionary?  If it's TeX, it should be possible (although not something I know how to do right now) to slip in a Na'viteri sort order file.

LaTeX and perl if I recall correctly. I think the alphabet and sorting is in Tuiq's domain, not that of Markì or me. that stuff is hard-coded in the backend. I think it would be possible to re-order the sort, but as it is, there's no huge problem with the way it is. Keep in mind that the dictionary has been around since the time of the first posts and of course has existed before the alphabet was mentioned on naviteri.org :3

Yawne Zize’ite

I put in c and g for compatibility, since it's plausible they might turn up in Na'vi text and should definitely be sorted as ts and ng if they do.  To me, that felt more important than accommodating foreign words. If I were hand-sorting, I would place c and g in foreign words (e.g. a tawtute named Cam) in their position in the English alphabet but c and g in Na'vi words (e.g. cam "war") as ts and ng.  However, if there's a way to link a sort file to a dictionary to do that, I'm not aware of it.

Thinking of compatibility, I found a bug in my sort order file; while it does place á ahead of aw (as it should), it doesn't recognize accented digraphs (áw, áy, éw, éy) as digraphs but sorts them as independent letters. Guess it's time for some more code.
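
One possible way around this, sketched in Python rather than in the sort file's own syntax, is to strip the stress accent before looking for digraphs. The sketch assumes the stress mark is always the acute accent and that ä and ì must keep their marks. The names strip_stress and units are hypothetical, and only the four vowel digraphs are handled, to keep it short.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent, assumed to be the only stress mark
VOWEL_DIGRAPHS = {"aw", "ay", "ew", "ey"}


def strip_stress(word):
    """Remove acute accents while leaving ä (diaeresis) and ì (grave) alone."""
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace(ACUTE, ""))


def units(word):
    """Split into units after stress removal, so that áw is read as aw."""
    plain = strip_stress(word.lower())
    out = []
    i = 0
    while i < len(plain):
        if plain[i:i + 2] in VOWEL_DIGRAPHS:
            out.append(plain[i:i + 2])
            i += 2
        else:
            out.append(plain[i])
            i += 1
    return out


print(units("táwtute"))
print(strip_stress("tìftang"))
```

A full sort would still need a tie-break so that the accented and unaccented spellings of the same word come out in a stable order; that part is left out here.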

I've run into two problems with these locale files; one is the simple lack of some of the data needed in Na'vi, and the other is differences in sawtute usage that locale files are designed to accommodate.  Part of the minimum data required for a locale submitted to CLDR is translations for durations of time such as "hours" and "minutes," which just can't be said in Na'vi.  Another part of the minimum data is the preferred type of quotation marks.  I like «», since they can't be confused with tìftang, but I'm not the one typing up a novel in Na'vi!  And so on.

`Eylan Ayfalulukanä

First for Tirea Aean:
I remember Taronyu mentioning LaTeX on numerous occasions. I do not remember anything being said about Perl, but that doesn't mean that Perl wasn't/isn't used.

Now, for Yawne Zize'ite: So I get the impression that you feel that the diphthongs should be represented by digraphs as well. Normally (especially for English), I wouldn't feel this way. But in Na`vi, the diphthongs are so distinct, that representing them with digraphs is not a bad idea at all. That said, there are a very few cases where a combination of {a,e} with {w,y} is not a diphthong (but I can't think of any right now). I wonder how you would differentiate these? As for c and g, I would design things so that these would get converted to ts and ng in the sorting process. Or else, flagged as illegal text.
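
As a rough idea of what flagging could look like, here is a small Python sketch of a checker that is separate from any sort file. The function name flag_non_navi and the letter inventory are assumptions for illustration. It checks one word at a time and would also flag stress-accented vowels such as á.

```python
DIGRAPHS = ("kx", "ll", "ng", "px", "rr", "tx", "ts", "aw", "ay", "ew", "ey")
SINGLES = set("'aäefhiìklmnoprstuvwyz")  # assumed single-letter inventory


def flag_non_navi(word):
    """Return (position, character) pairs for anything outside the alphabet."""
    word = word.lower().replace("\u2019", "'")  # treat a curly apostrophe as tìftang
    flagged = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            i += 2
        elif word[i] in SINGLES:
            i += 1
        else:
            flagged.append((i, word[i]))
            i += 1
    return flagged


print(flag_non_navi("cam"))      # c is not a Na'vi letter on its own
print(flag_non_navi("ngenga"))   # g only appears inside ng, so nothing is flagged
```

A checker like this could either report the flagged letters or rewrite c and g to ts and ng before the word is handed to the sorter.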

For both of you: A lot of things have changed in Na`vi since 'the time of the first posts' (does this mean there have really been six Tirea Aeans??  :D ), and these changes and clarifications have resulted in all of us changing the way we do things. Although I would agree the dictionary is functionally fine the way it is, rearranging it into the 'Na`viteri word order' has plenty of merit. It reflects the way change has been handled in other areas in the past. (Now, we have to get Tuiq to fix Eana Eltu, or else use a post-filter fix. And last I heard, Tuiq wants to off-load the project.) So, although I would not mind things staying as they are, I would not object to having the dictionary put in Na`viteri format going forward.

Yawey ngahu!
pamrel si ro [email protected]

Tirea Aean

Quote from: `Eylan Ayfalulukanä on August 11, 2011, 08:58:39 PM
First for Tirea Aean:
I remember Taronyu mentioning LaTeX on numerous occasions. I do not remember anything being said about Perl, but that doesn't mean that Perl wasn't/isn't used.

That's the part I'm not sure of. The dictionary editing interface pages all end with .pl, which is a file extension for Perl, so it's my assumption. Either way, switching this around will have to be Tuiq's doing. He is the ONLY one with full direct access to the backend of Eana Eltu.

Quote
Now, for Yawne Zize'ite: So I get the impression that you feel that the diphthongs should be represented by digraphs as well. Normally (especially for English), I wouldn't feel this way. But in Na`vi, the diphthongs are so distinct, that representing them with digraphs is not a bad idea at all. That said, there are a very few cases where a combination of {a,e} with {w,y} is not a diphthong (but I can't think of any right now). I wonder how you would differentiate these? As for c and g, I would design things so that these would get converted to ts and ng in the sorting process. Or else, flagged as illegal text.

I can agree with all this, for what it's worth.

Quote
For both of you: A lot of things have changed in Na`vi since 'the time of the first posts' (does this mean there have really been six Tirea Aeans??  :D ), and these changes and clarifications have resulted in all of us changing the way we do things. Although I would agree the dictionary is functionally fine the way it is, rearranging it into the 'Na`viteri word order' has plenty of merit. It reflects the way change has been handled in other areas in the past. (Now, we have to get Tuiq to fix Eana Eltu, or else use a post-filter fix. And last I heard, Tuiq wants to off-load the project.) So, although I would not mind things staying as they are, I would not object to having the dictionary put in Na`viteri format going forward.

LOL REALLY?? ;D (Tirea Aean != Toruk Makto) Other than that, all this is true. I wouldn't mind asking for possible collation reform of the Dictionary, but really that rests on Tuiq at this point. I'm also fine with it as it is. kinda torn between, but easily swayed.

Yawne Zize’ite

Quote from: `Eylan Ayfalulukanä on August 11, 2011, 08:58:39 PM
Now, for Yawne Zize'ite: So I get the impression that you feel that the diphthongs should be represented by digraphs as well. Normally (especially for English), I wouldn't feel this way. But in Na`vi, the diphthongs are so distinct, that representing them with digraphs is not a bad idea at all. That said, there are a very few cases where a combination of {a,e} with {w,y} is not a diphthong (but I can't think of any right now). I wonder how you would differentiate these? As for c and g, I would design things so that these would get converted to ts and ng in the sorting process. Or else, flagged as illegal text.
To be pedantic, we already use digraphs for the diphthongs; if we use the Na'viteri alphabet they should be treated as independent letters of the alphabet like Hungarian cs or Croatian nj or pre-1994 Spanish ch.  English, on the other hand, has many digraphs like "th" but does not treat them as letters in their own right.  I'm not sure how to programmatically handle cases where those letters don't form a diphthong; I'll have to look at some Hungarian sort files to see if there's a reasonably simple solution.

There's no way for a simple sort file to flag illegal text; it sounds like what you're looking for is the beginnings of a Na'vi word processor!  (Sort files of this sort tweak the Unicode Collation Algorithm, so if you can type it the sorter will take it.  Put "عن النآفية", "Na'viteri", and "ナヴィ語とは" into a modern program, and it will spit them back out sorted.  Oddly, Excel doesn't use the Unicode Collation Algorithm since it sorts Japanese before Arabic...a mystery for another time.)

Quote
For both of you: A lot of things have changed in Na`vi since 'the time of the first posts' (does this mean there have really been six Tirea Aeans??  :D ), and these changes and clarifications have resulted in all of us changing the way we do things. Although I would agree the dictionary is functionally fine the way it is, rearranging it into the 'Na`viteri word order' has plenty of merit. It reflects the way change has been handled in other areas in the past. (Now, we have to get Tuiq to fix Eana Eltu, or else use a post-filter fix. And last I heard, Tuiq wants to off-load the project.) So, although I would not mind things staying as they are, I would not object to having the dictionary put in Na`viteri format going forward.
I do like the idea of converging on a standard order (otherwise why this thread?) but it sounds like the backend that handles the wordlist given to Tirea Aean to typeset is written in a language I don't know.  I could possibly rearrange the sort order with a week of spare time and a Perl reference book, but I'm sure there is someone who could do it much better.