Na'vi collation order?

Started by Yawne Zize’ite, August 03, 2011, 11:27:28 PM

Yawne Zize’ite

Kaltxì ma smuk,

As an extension of my original idea of a Na'vi locale, a way to build Na'vi collation order into a locale would be useful; press a sort button and everything appears in order!  There is one problem, though - what is alphabetical order in Na'vi?  The dictionary sorts ' before a, ä after a, ì after i, kx after k, ng after n, px after p, ts after t, and tx after ts.  The unusual thing is sorting ' as the first letter of the alphabet, instead of the last as most additional letters are (cf. Lakhota collation order), which suggests an initial sort order influenced by ASCII code values that sort punctuation before letters.

Do I have alphabetical order right?

Lance R. Casey


// Lance R. Casey

Yawne Zize’ite

Thank you; I didn't know that alphabet was official.  I assume that c and g would sort as ts and ng, and since the sort order is explicitly influenced by ASCII conventions (placing the apostrophe first rather than last as it is in pre-1990s alphabets) I assume that non-Na'vi letters (b, d, j, q, x) follow English alphabetical order.  Also following English sort conventions (not ASCII!) é should sort weakly after e, so tute comes before tuté in a dictionary.

' a aw ay ä b d e (é) ew ey f h i ì j k kx l ll m n [g=ng] o p px q r rr s t tx [c=ts] u v w x y z

Another question along those lines - how would people expect to see the letters used for listing items?  Which of these looks best?




') ['uo]
A) ['uo]
AW) ['uo]
AY) ['uo]
Ä) ['uo]

') ['uo]
A) ['uo]
Ä) ['uo]
E) ['uo]
F) ['uo]

A) ['uo]
B) ['uo]
C) ['uo]
D) ['uo]
E) ['uo]

Tirea Aean

Quote from: Yawne Zize'ite on August 04, 2011, 12:25:57 PM

Another question along those lines - how would people expect to see the letters used for listing items?  Which of these looks best?




') ['uo]
A) ['uo]
AW) ['uo]
AY) ['uo]
Ä) ['uo]

') ['uo]
A) ['uo]
Ä) ['uo]
E) ['uo]
F) ['uo]

A) ['uo]
B) ['uo]
C) ['uo]
D) ['uo]
E) ['uo]

I'm not sure I understand the question. ??? :(

Yawne Zize’ite

In English - at least US English - you see standardized tests that give you the answers A), B), C), and D).  I'm wondering if a Na'vi standardized test (in a sawtute classroom) would offer answers '), A), AW), and AY) or '), A), Ä), and E) or maybe answers A), B), C), and D).  It's not unusual, in languages that have extra letters, to not use some letters for counting, or conversely to use some letters like Q for counting that they don't use for writing words.

Ftxavanga Txe′lan

Oh, I see what you mean now! It's impossible that the Na'vi would have A), B), C) and D) because the consonants B, C and D don't exist in Na'vi; but as for other alternatives it's difficult to tell. :o

Maybe AW), AY), EW) and EY) would work? They're all two-character vowels, which I guess would be at least somewhat coherent. :) Or perhaps A), Ä), E) and F)? I think it could work out as our equivalent of A), B), C) and D) in the sense that they are the first four characters of the Na'vi alphabet (unless I made a mistake there). :D

Yawne Zize’ite

AW), AY), EW), and EY) would work for the special case of a test with four answer choices, but what about something longer?  In English we can use letters as numbers from A to Z (and then, when they run out as they sometimes do, start over with AA).

I'm asking because I'm working on Na'vi locale files - the piece that tells software that Na'vi is a language needing special language settings - and they need me to enter the Na'vi alphabet, Na'vi alphabetical order, and the list of letters used by Na'vi for numbering things - and yes, there are languages where all three are functionally different!
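
The "letters used for numbering" part can be sketched as bijective counting over whatever list of letters ends up being chosen. This is only a minimal Python sketch: the function name `list_label` and the four-letter Na'vi list below are hypothetical (the list is just one of the suggestions from this thread), not anything a locale format prescribes.

```python
import string

def list_label(index, letters):
    """Return the label for the zero-based item `index`.

    Labels run through `letters` once (A, B, ...), then continue with
    two-letter labels (AA, AB, ...), like spreadsheet column names.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    n = len(letters)
    parts = []
    index += 1  # bijective numeration is easiest to compute 1-based
    while index > 0:
        index, rem = divmod(index - 1, n)
        parts.append(letters[rem])
    return "".join(reversed(parts))

ENGLISH = list(string.ascii_uppercase)
# Hypothetical numbering list: one suggestion from this thread, not attested.
NAVI_GUESS = ["A", "Ä", "E", "F"]

print([list_label(i, ENGLISH) for i in (0, 25, 26, 27)])
print([list_label(i, NAVI_GUESS) for i in range(6)])
```

If multi-character letters such as AW or KX were allowed in the list, two-letter labels would become ambiguous when written without a separator, which is one practical argument for a numbering list that differs from the alphabet.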

Ftxavanga Txe′lan

Quote from: Yawne Zize'ite on August 06, 2011, 08:19:56 PM
AW), AY), EW), and EY) would work for the special case of a test with four answer choices, but what about something longer?  In English we can use letters as numbers from A to Z (and then, when they run out as they sometimes do, start over with AA).

I'm asking because I'm working on Na'vi locale files - the piece that tells software that Na'vi is a language needing special language settings - and they need me to enter the Na'vi alphabet, Na'vi alphabetical order, and the list of letters used by Na'vi for numbering things - and yes, there are languages where all three are functionally different!

I guess the only person who could tell us is Karyu Pawl. :) Normally no one dares to do anything that hasn't been attested by him. hrh :D Unless something has been said on the matter before, which I'm not aware of? Well, I do find the matter really interesting. :D Maybe we can make some guesses of our own for the time being and see what other people have to say about that? :)

You mentioned not all languages have numbering lists that go in alphabetical order - maybe it would be nice that the Na'vi numbering goes from the last alphabet letter to the first? Something like this:

', Z, Y, W, V, U, TX, TS, T, S, RR, R, PX, P, O, NG, N, M, LL, L, KX, K, Ì, I, H, F, EY, EW, E, AY, AW, Ä, A

Yawne Zize’ite

We have the alphabet linked upthread; since Na'vi was first reduced to writing by humans, apparently English-speakers with similar computer technology to the modern-day version and no experience with Native American alphabets, there's no reason for it to be in an unusual order.  If A, B.... was good enough for the Phoenicians it's good enough for the Na'vi.  :D

Ftxavanga Txe′lan

I guess I was simply trying to make it not too similar to the Human system ahah. :D
But I do understand your point, and you're probably right. ;)

hemmond

IMHO it'd be best if we use the letters which don't sound almost the same... If you're not very good at hearing the difference between ä and a, you can easily swap these two letters... And I would rather use only one-character-long letters. Longer ones can mix up the formatting... So my idea is this:

A, E, F, H, I, K, L, M, N, O, P, R, S, T, U, V, W, Y, Z...

http://twitter.com/hemmondssandbox

If it's change in you, then the world is changing too.
--22nd World Scout Jamboree anthem.

Yawne Zize’ite

But what about the sounds represented by ä, ì, kx, px, and tx?  Drop them altogether?  That doesn't sound like a very good idea.  You also forgot the letters c and g.

I do generally like spelling sounds with only one letter, but I'm hard-pressed to think of something better than the x-system, especially since Na'vi does distinguish -p'- and -px-.  Dots are used for languages of Ethiopia, but if people complain about typing ä, ì, and ʼ, they'd rather spell wrong than type ṗ, ṭ, and ḳ.

`Eylan Ayfalulukanä

Quote from: Yawne Zize'ite on August 10, 2011, 12:51:16 PM
But what about the sounds represented by ä, ì, kx, px, and tx?  Drop them altogether?  That doesn't sound like a very good idea.  You also forgot the letters c and g.

I do generally like spelling sounds with only one letter, but I'm hard-pressed to think of something better than the x-system, especially since Na'vi does distinguish -p'- and -px-.  Dots are used for languages of Ethiopia, but if people complain about typing ä, ì, and ʼ, they'd rather spell wrong than type ṗ, ṭ, and ḳ.

I personally like how things are now. But one thing you could do, borrowing an idea from Klingon, is to use capital letters for the ejectives. Although capitalization is frequently used in writing Na`vi, there is no hard and fast rule. So, drop the use of capitalization, and reserve the capitals for the ejectives. The same could work for rr and ll, R and L, respectively. ä and ì would continue to be used, as these are in all the common non-unicode character sets.
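
A minimal Python sketch of that capital-letter idea, assuming the mapping kx→K, px→P, tx→T, ll→L, rr→R; the table and the function name `to_single_letters` are hypothetical illustrations, not an established convention. Input is lowercased first, since case would no longer be available for ordinary capitalization.

```python
import re

# Hypothetical mapping for the proposal above: ejectives and syllabic
# consonants become single capital letters.
DIGRAPH_TO_CAPITAL = {"kx": "K", "px": "P", "tx": "T", "ll": "L", "rr": "R"}
_DIGRAPH_PATTERN = re.compile("|".join(DIGRAPH_TO_CAPITAL))

def to_single_letters(text):
    """Lowercase `text`, then replace each mapped digraph with its capital."""
    lowered = text.lower()
    return _DIGRAPH_PATTERN.sub(
        lambda match: DIGRAPH_TO_CAPITAL[match.group(0)], lowered
    )

for word in ["txep", "Kxetse", "plltxe", "tsap'alute"]:
    print(word, "->", to_single_letters(word))
```

Note that p' is left untouched, so the -p'- versus -px- distinction survives. The sketch does not handle ts and ng, which would need their own single letters, and it cannot tell a true ll from two adjacent l's across a syllable boundary.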

Yawey ngahu!
pamrel si ro [email protected]

Tirea Aean

Quote from: Yawne Zize'ite on August 10, 2011, 12:51:16 PM
Na'vi does distinguish -p'- and -px-.  Dots are used for languages of Ethiopia, but if people complain about typing ä, ì, and ʼ, they'd rather spell wrong than type ṗ, ṭ, and ḳ.

Yes it does. p' and px are different. The IPA for p' (which can be seen in the word tsap'alute) would be [pʔ] (a voiceless unaspirated, possibly unreleased, bilabial plosive followed by a glottal stop), whereas px is the voiceless bilabial ejective [pʼ].

Yawne Zize’ite

Quote from: `Eylan Ayfalulukanä on August 10, 2011, 02:56:33 PM
ä and ì would continue to be used, as these are in all the common non-unicode character sets.

If by "all the common non-unicode character sets" you mean "all 8-bit sets designed for Western European languages plus Turkish and VISCII," that is a true statement. ;) Ì is replaced in character sets aimed at Northern and Eastern Europe, to say nothing of sets that support another script; it's not actively used east of Italy, except in Vietnamese which uses incompatible character sets.
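
A claim like this is easy to spot-check. Here is a minimal Python sketch that asks a handful of legacy codecs whether they can encode ä and ì; the codec list is just an illustrative sample of names the Python standard library accepts, and VISCII is left out because I am not sure a standard codec for it is available.

```python
NAVI_EXTRA_LETTERS = "äì"

# Illustrative sample of legacy encodings; not an exhaustive survey.
CODECS = ["ascii", "latin_1", "cp1252", "iso8859_2", "iso8859_9",
          "cp437", "shift_jis"]

def encodable_letters(codec, letters=NAVI_EXTRA_LETTERS):
    """Return the subset of `letters` that `codec` can represent."""
    supported = []
    for ch in letters:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            continue  # this codec has no slot for the letter
        supported.append(ch)
    return "".join(supported)

for codec in CODECS:
    print(f"{codec:10} {encodable_letters(codec) or '(neither)'}")
```

ISO 8859-2 (Central and Eastern European) is a case where ä survives and ì does not.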

I still run across a lot of data in Shift-JIS, although my perspective is heavily skewed by only being able to read Arabic (a little) and Japanese out of non-roman-script languages and Arabic encoding guessing working well enough that I don't have to manually choose it.

I'm with you in liking things as they are now.  I never liked mixed-case Klingon; it's annoying to handle and the only distinction with any functional weight is q/Q, which could be handled as k/q or q/qh.  It's too entrenched to change now, no matter how much better I think "Tah pag tahbe'" looks and how much of a pain it's going to be learning if there is a way to force no case mapping for Klingon text.  Something like "fIaTKeyA qeveg" could be useful in an environment that doesn't allow typing ä, ì, and ' (but preserves case), or if you need a quick glance at how many actual sounds you're using.

Tirea Aean

I think with respect to the OP the question was answered with the naviteri blog post kefyak? and for what it's worth, I have the LearnNa'vi iPhone app and it sorts according to that standard. The dictionary however sorts aw and ay inside of a and sorts ew and ey inside of e, where they would be expected to be. Does that call for a convention change in the dictionary? If so I will raise the question in the Dictionary thread as well.

naviteri: ( ' ) tìFtang, A, AW, AY, Ä, E, EW, EY, Fä, Hä, I, Ì, KeK, KxeKx, LeL, 'Ll, MeM, NeN, NgeNg, O, PeP, PxePx, ReR, 'Rr, Sä, TeT, TxeTx, Tsä, U, Vä, Wä, Yä, Zä

iOS app: ' A Aw Ay Ä E Ew Ey F H I Ì K Kx L M N Ng O P Px R S T Ts Tx U V W Y Z

Dictionary: ʔ A Ä E F H I Ì K Kx L M N Ng O P Px R S T Ts Tx U V W Y Z

Note that LL and RR are not there in iOS and Dictionary because no Na'vi word or syllable can start with ll or rr (which is why their names have the glottal stop in front, i.e. 'll and 'rr, in naviteri).
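
To see how much the orders actually differ in practice, here is a minimal Python sketch of a sort key that splits a word into letters of the alphabet (longest match first, so "tx" is one letter) and ranks them by position in a given order. The two lists are transcribed from the naviteri and Dictionary lines above; the function name, the apostrophe normalization, and the fallback for unknown characters are my own assumptions.

```python
NAVITERI_ORDER = ["'", "a", "aw", "ay", "ä", "e", "ew", "ey", "f", "h", "i",
                  "ì", "k", "kx", "l", "ll", "m", "n", "ng", "o", "p", "px",
                  "r", "rr", "s", "t", "tx", "ts", "u", "v", "w", "y", "z"]
DICTIONARY_ORDER = ["'", "a", "ä", "e", "f", "h", "i", "ì", "k", "kx", "l",
                    "m", "n", "ng", "o", "p", "px", "r", "s", "t", "ts", "tx",
                    "u", "v", "w", "y", "z"]

# Assumption: treat common apostrophe look-alikes as the glottal stop.
APOSTROPHE_VARIANTS = "’‘ʼ`′"

def make_sort_key(order):
    """Build a key function that ranks words by the given letter order."""
    rank = {letter: i for i, letter in enumerate(order)}
    longest = max(len(letter) for letter in order)

    def key(word):
        text = word.lower()
        for variant in APOSTROPHE_VARIANTS:
            text = text.replace(variant, "'")
        ranks, i = [], 0
        while i < len(text):
            # Greedy: try the longest possible letter first.
            for size in range(longest, 0, -1):
                chunk = text[i:i + size]
                if chunk in rank:
                    ranks.append(rank[chunk])
                    i += len(chunk)
                    break
            else:
                # Unknown character: sort it after every known letter.
                ranks.append(len(order) + ord(text[i]))
                i += 1
        return ranks

    return key

words = ["txep", "tute", "tsam", "taw", "'eylan"]
print(sorted(words, key=make_sort_key(NAVITERI_ORDER)))
print(sorted(words, key=make_sort_key(DICTIONARY_ORDER)))
print(sorted(words))  # plain code-point order, for comparison
```

For this word list the only visible difference between the two orders is the relative position of ts and tx. Greedy matching is a simplification: it cannot tell a diphthong such as aw from a followed by w across a syllable boundary.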

Yawne Zize’ite

Yes, I believe there should be one standard order.  I don't have any feelings on which order it should be, but there should be only one alphabetical order for Na'vi in use.  Na'vi doesn't have the complications that have led to multiple alphabetical orders in other languages.

`Eylan Ayfalulukanä

I would think the Na`viteri listing to be the authoritative listing, because it is from K. Pawl, or at least 'blessed' by him. As a practical collation order though, it has some very challenging aspects to it. Thus, you have the iOS and the dictionary collation orders. The iOS order is very close to Na`viteri, but may not work well for other operating environments. We are all used to the dictionary order, and it has no serious drawbacks, but it is not quite 'official'.

I question if the Na`vi are even aware of such a thing as 'collation order'. Speech is taught by use, and probably only a few Na`vi Karyu care much about the sounds of the language. And maybe they don't even care about that, as their history is encoded in song.

All of the collation orders discussed work fine for practical Na`vi learning by sawtute who are used to a sound-symbolic written language. I bet most people probably don't really notice the slight differences.

In the end, I don't see any reason to change any of the traditional learning tools, unless someone wanted to go to the considerable trouble to make the changes, and make them 'stick'.

And Yawne Zize'ite, I think you are getting the idea, more or less. By sticking to things in the extended ASCII character set (even with the differences you mention), you are going to get something that will 'pass through' just about any 8 bit channel out there. Unicode solves a lot of problems, but it isn't completely universal yet. The idea of using the Klingon-like upper case letters was to get things to a point where the letters we currently represent with two characters could be represented with one. Between this, and sticking to extended (8 bit) ASCII, we have a character set that will work on most any computer system in the world. If this is not a requirement, I think the use of unicode is to be encouraged, as this is where the digital world is going (and we should continue to represent Na`vi as we have been doing, as that is what everyone is familiar with).

As far as Klingon goes, I will be getting a mega-dose of Klingon next week when I attend the Klingon Language Institute's annual qepa', which will be held right here in sleepy little Reno, just prior to the Worldcon Science Fiction convention.


Yawne Zize’ite

Why would Na'vi people have a native collation order?  That presupposes a tradition of teaching literacy, which they don't have.  Collation order is spread along with a script, so they'd use some variant on the "ABGD" order unless they reanalyzed the script, which doesn't seem very likely.  The ones who would, in-universe, have a collation order are the sawtute specialists like Grace.  IRL linguists spend some time and effort (more than I've put into it) on coming up with a best-fit Latin script for previously unwritten languages, including what counts as a letter and the order of the new letters.

The reason I want to settle on just one order is that I'm writing up sort order files that, when this and other problems are ironed out and we get our ISO 639-3 code, I'd like to submit to OpenOffice, LibreOffice, and similar projects.  They can be pretty much whatever we want - if z should be the first letter of the alphabet I can put that in - but we only get one without a very good reason.  "Software designed to process only English isn't aware of digraphs" is not a very good reason.

I attached the collation file I have now, which uses the Na'viteri order.  A file in this format is suitable for submission to *Office or use with demo collators: you can try one out here.  It will never be included in MS Word (which doesn't even include Esperanto), but I don't see insurmountable barriers to a built-in Na'vi sort order in *Office a year from now or so.  I've got the files to build a Na'vi-aware version of *Office now (which could then be enhanced with a few tweaks to the spellchecker dictionary), but neither the skill nor the hard drive space.

By the way, what program is used for the PDF dictionary?  If it's TeX, it should be possible (although not something I know how to do right now) to slip in a Na'viteri sort order file.

As for encodings, he who equates UTF-8 with Latin-1 is he who has not dealt with a malfunctioning Python 2.* program. :)
Of course we should use Unicode; I wasn't aware that was even a question in the year 2011, since Na'vi uses no unencoded characters.  I seem to be the only one who thought of using ṗ, ṭ, and ḳ, and I haven't noticed any support for that idea.  The main problem with using case to expand the alphabet is that remaining environments that don't allow letters past 7F are often environments that don't preserve case.
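
For anyone who hasn't run into it, the UTF-8 versus Latin-1 mix-up looks like this. A minimal Python 3 sketch; the word tìftang is taken from the letter names quoted upthread, and the variable names are arbitrary.

```python
word = "tìftang"

utf8_bytes = word.encode("utf-8")      # ì becomes two bytes in UTF-8
latin1_bytes = word.encode("latin_1")  # ì is a single byte in Latin-1

# Reading UTF-8 bytes as if they were Latin-1 silently produces mojibake.
misread = utf8_bytes.decode("latin_1")
print(len(utf8_bytes), len(latin1_bytes))
print(misread)

# The opposite mistake fails loudly instead of silently.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as error:
    print("not valid UTF-8:", error.reason)
```

The silent direction is the dangerous one, because the damaged text can be saved and passed along without any error.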

I've never heard of any such thing as "8-bit ASCII." There's 7-bit ASCII, and a few dozen mutually incompatible ways to use an 8th bit (a minority of which include letters used in Na'vi), but no 8-bit ASCII.

qep'a'Daq Qapla'!

`Eylan Ayfalulukanä

Noble gestures indeed, ma Yawne Zize'ite!

Getting both ISO 639-3 status, and better support in *office are both good pursuits, thus I now understand your interest in this.

You might want to explain how you support digraphs in either of these areas. Would these digraphs include the diphthongs? I am also not sure I completely understand what you want to do with p, t and k. Although they are (supposed to be) unaspirated, they seem to fit fine where they are.

As for '8 bit ASCII', what I mean by that is the extended character set (and I am sure there is a more official name for it) used on the PC platform and (I am sure) other computers. Despite the ubiquity of Unicode these days, there are still applications out there that don't even properly recognize extended ASCII. They get really interesting when you need an ä or an ì.
