OpenOffice Na'vi Locale - Help Needed!

Yawne Zize’ite · August 05, 2011, 04:44:28 PM

I devoted altogether too many hours last night to creating a Na'vi locale file for OpenOffice, and it is done: every string a locale file requires has been translated, the formats are entered, and there is even a custom Na'vi sort order (in the attached text file).

However, it is missing one critical item - community consensus.

This is where I need help. If you could set your word processor to Na'vi mode, what do you want it to default to? How should it do automatic quotation marks? Should it show English or metric units? 99,999.88 or 99.999,88 or 99 999.88 or 99 999,88?* How to format a long date (e.g. trrpuve, 5vea trr volvea vospxìyä 2011vea zìsìtä)? How to format a short date (e.g. 11/8/5, 5/8/11, 8/5/11)? 5:28 Kx.M. or 17:28? $10 or 10 $? And so on.
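To make the number-format question concrete, here is a minimal Python sketch that prints the four candidates side by side. It only illustrates the choice being asked about and is not part of the locale file; the helper name format_number is made up for this example.

```python
def format_number(value, thousands=",", decimal="."):
    """Render value with two decimals and the given separators (hypothetical helper)."""
    # Start from Python's built-in "99,999.88" style, then swap separators.
    # A placeholder keeps the two replacements from clobbering each other.
    text = f"{value:,.2f}"
    return text.replace(",", "\0").replace(".", decimal).replace("\0", thousands)


# The four candidate conventions from the question above.
for thousands, decimal in [(",", "."), (".", ","), (" ", "."), (" ", ",")]:
    print(format_number(99999.88, thousands, decimal))
```

Whichever pair wins would become the default in the locale; users could still override it per document.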

*Yes, using 8 and 9 was intentional. Getting numbers to display in octal instead of decimal is far, far more than a simple locale file can do.

Before submitting this locale file, I want to make sure that I'm sending in settings that the lìfya'olo leNa'vi is content with. I can't expect to please everyone, since I have to settle on one default, but I don't want to send in something we can't agree on.

The first poll is about quotation marks. " san " and " sìk " as the quotation marks themselves are not supported - OpenOffice only allows quotation marks one character long - so we're starting out with a second-best option. What sort of quotation marks do you think look best with Na'vi?

Yawne Zize’ite · August 05, 2011, 10:37:43 PM

After finding Taronyu's pending request for the ISO 639-3 code nvi, I plan to switch this locale to that code. This means it may not be accepted before we actually get the code, but on the bright side that's some extra time for testing.

Yawne Zize’ite · June 26, 2012, 12:36:44 AM

This project took off like a lead balloon.

A year later, I've learned how to implement a basic Naʼvi sort customization for LibreOffice. It works as expected; if you give it yaymak, yän, and *yazekwä, it will sort them in the order yazekwä, yaymak, yän. Because of how LibreOffice is designed, it comes attached to a Naʼvi locale file, which displays ʼRrtan dates in Naʼvi and allows you to select Naʼvi as your default language for documents. (If you have smart quotes enabled, it also turns all your single quotes into U+2019 ’.) I have added an experimental collation for Romanized Quenya, a collation for Romanized Klingon that currently has a serious bug, and support for tagging documents as Dothraki or Sindarin. The next development in this line would be to add customized sorting for Dothraki and Sindarin, then full script support, and then this little project will be done; there's still not much you can do with a locale file, but the sort should be useful.
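For readers who want to see why yazekwä lands before yaymak, here is a rough Python sketch of the idea: treat digraphs such as ay as single letters and compare words letter by letter. This is not the sort tailoring from the patch. The letter order below is an assumption chosen to match the example above (the real order is in the attached file), and the names NAVI_ALPHABET and sort_key are invented for the illustration.

```python
import unicodedata

# ASSUMPTION: illustrative letter order only, not the order from the patch.
NAVI_ALPHABET = [
    "ʼ", "a", "aw", "ay", "ä", "e", "ew", "ey", "f", "h", "i", "ì",
    "k", "kx", "l", "ll", "m", "n", "ng", "o", "p", "px", "r", "rr",
    "s", "t", "ts", "tx", "u", "v", "w", "y", "z",
]
RANK = {unicodedata.normalize("NFC", letter): i
        for i, letter in enumerate(NAVI_ALPHABET)}
MAX_LEN = max(len(letter) for letter in RANK)


def sort_key(word):
    """Turn a word into a list of letter ranks, preferring the longest match."""
    word = unicodedata.normalize("NFC", word.lower())
    key = []
    i = 0
    while i < len(word):
        for size in range(MAX_LEN, 0, -1):
            chunk = word[i:i + size]
            if chunk in RANK:
                key.append(RANK[chunk])
                i += len(chunk)
                break
        else:
            # Unknown characters sort after every known letter.
            key.append(len(NAVI_ALPHABET) + ord(word[i]))
            i += 1
    return key


print(sorted(["yaymak", "yän", "yazekwä"], key=sort_key))
# ['yazekwä', 'yaymak', 'yän']
```

The point is that a sorts before ay, which sorts before ä, so the third letter alone decides the order of all three words.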

Unfortunately, at the moment what I have is a source code patch. To use it, you have to download your own copy of the LibreOffice source code, patch it, and then compile and install it. Since all I altered were internationalization files, it should work on any upcoming version of LibreOffice.

The other attached file is a slight modification of ʼeylan naʼviyä's Hunspell dictionary for Naʼvi with the language code changed from da to nvi, which will now be recognized as a dictionary for Naʼvi. Install it after you have installed a version of LibreOffice that acknowledges Naʼvi, and you should have automatic spellcheck.

`Eylan Ayfalulukanä · June 26, 2012, 02:19:09 AM

As a heavy OOo user, I will have to download this and compile it into LibreOffice on my Linux machines. I cannot do this on my Windows machines at work, though.

I will be anxious to see what you have done for Dothraki. Dothraki does not use any unusual characters, although it does have a number of digraphs. The rest of its collation is straightforward, and the current dictionary does not recognize the digraphs in any special way.

Klingon is a nice addition as well. I hope you can fix the bug, perhaps in time for the Qepa' in late August.

Yawne Zize’ite · June 26, 2012, 03:42:24 AM

I have done zero for Dothraki. It's in the language list, assigned the private use code "qdk", and that's the end of Dothraki support. I wouldn't mind adding more Dothraki support, but that would require knowing Dothraki. To make a nice locale file, it's good to know conventions for separators (e.g. do you use . or , as the decimal point?), quotation marks, date formats, the words for AM and PM, the days of the week and their order, the months of the year in the relevant cases, quarters of the year, BC and AD, "true" and "false/not true", some currency unit, and alphabetical order. Details can be pulled in from another file if there's simply nothing to work with; I pulled everything in the Klingon locale but the alphabet and sort order from English locales, since there's just not enough information to make a locale file but I needed one to attach a sort tailoring file to. And there's extrapolation; I used a lot of that for the Na'vi file (AM and PM coming out as S.K. and Kx.M., translating the eras as "before Jesus" and "after Jesus", numbering the months from 01 to 014, etc.), so I'm sure there's something wrong or overlooked in the text.

It's the wrong time of night to get immediate support with the Klingon bug. In a nutshell, I put two different instructions in the sort tailoring file, either one of which should have sorted the apostrophe U+0027 after the letter z, and neither was respected. I tested them on the ICU's Locale Explorer, so I don't think I miswrote the rule. If you can type the Modifier Letter Apostrophe U+02BC, that will sort after z by default. (U+02BC will sort after other apostrophe-like characters and before a in Na'vi, due to the same bug.) There's a nonzero chance of this being deemed worth fixing in the main branch, since this problem would affect many American languages.

Be very careful with the sort; if given any chance Calc will revert to the default language's sort, which is not going to be Na'vi or Klingon unless you set one of those as default. Even tagging the cells with the correct language doesn't guarantee the correct collation.

`Eylan Ayfalulukanä · June 27, 2012, 02:43:30 AM

A lot more development needs to be done on Dothraki before there will be enough information to answer those questions! (I did see your PM, though.)

Yawne Zize’ite · June 27, 2012, 05:18:00 AM

As I did for Klingon, I can put in a shell of a locale file that pulls all its data from another locale (en_US, unless something else is more appropriate) except for the alphabet and sort order -- but I'll still need the alphabet and sort order.
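The "shell" idea can be pictured with a small Python sketch: a locale that supplies only its alphabet and falls back to a base locale for everything else. This is an analogy, not how the actual locale files are written; the dictionary keys and the names EN_US and KLINGON_SHELL are invented for the example.

```python
from collections import ChainMap

# ASSUMPTION: toy stand-in for locale data; the key names are invented
# and a real locale file holds many more fields.
EN_US = {
    "decimal_separator": ".",
    "thousands_separator": ",",
    "am": "AM",
    "pm": "PM",
    "measurement": "US",
}

# The shell only supplies what differs from the base locale.
# Klingon letter order as commonly listed, with the apostrophe last.
KLINGON_SHELL = {
    "alphabet": [
        "a", "b", "ch", "D", "e", "gh", "H", "I", "j", "l", "m", "n", "ng",
        "o", "p", "q", "Q", "r", "S", "t", "tlh", "u", "v", "w", "y", "'",
    ],
}

# Lookups try the shell first and fall back to en_US.
klingon = ChainMap(KLINGON_SHELL, EN_US)

print(klingon["decimal_separator"])  # inherited from EN_US
print(klingon["alphabet"][-1])       # supplied by the shell
```

A Dothraki shell would look the same once the alphabet and sort order are known.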

`Eylan Ayfalulukanä · June 27, 2012, 03:23:48 PM

en_US would be a good place to start. If I have time tonight, I'll see what I can put together for those basic items.

Irayo

Yawne Zize’ite · June 29, 2012, 04:46:57 PM

Well, I have good news and bad news.

The bad news first: not only is the Klingon bug still there (my earlier plan for getting rid of it relied on ICU 4.8, and LO uses 4.4.2, so it didn't work), but I've also found a new bug in Quenya indexing. I intended th to index as s, but while th sorts as s, I can't find a way to force it to index as s (although þ indexes as s without any problems, and þ should generally be preferred in Quenya text). For the moment it is being sorted as Th between S and T.

It's amusing that both bugs are in code I put in to compensate for the end user entering letters that, strictly speaking, shouldn't be there; <th> is rare in Tolkien's Quenya writings (he used <þ>), and ' is a punctuation mark, not the letter ʼ. So hopefully they won't be crippling.

The good news: Dothraki and Sindarin now have "full" support, including support for Dothraki contracted geminate digraphs (kkh, etc.) in sorting. Sindarin has a full locale file; Dothraki, due to less information, draws all but its sorting and indexing from en_US (US English). This does include the US measurement system.
Quenya and Sindarin locales have been changed to use Imperial instead of US measurements. (I didn't know it was even an option before.)
Sindarin collation has problems similar to Naʼvi collation; since there is no non-lexical way to tell the difference between true digraphs and coincidental pairs of letters, words such as "Edhelharn" and "gaurhoth" will be missorted as if they contained "lh" and "rh".
I have also made various tweaks and improvements (well, I think they're improvements) to Klingon, Naʼvi, and Quenya.
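To illustrate the Sindarin digraph problem mentioned above, here is a small Python sketch of a purely letter-based splitter. It has no way of knowing that the l and h in "Edhelharn" belong to different parts of the word, so it reports a digraph that isn't really there. The digraph list is an assumption and covers only what the example needs; tokenize is a made-up helper, not code from the patch.

```python
# ASSUMPTION: partial digraph list, just enough for the illustration.
DIGRAPHS = ("ch", "dh", "lh", "rh", "th")


def tokenize(word):
    """Split a word into letters, greedily treating known pairs as digraphs."""
    word = word.lower()
    letters = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            letters.append(pair)
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


print(tokenize("Edhelharn"))  # ['e', 'dh', 'e', 'lh', 'a', 'r', 'n']
print(tokenize("gaurhoth"))   # ['g', 'a', 'u', 'rh', 'o', 'th']
```

Telling the true digraphs from the coincidental pairs would take a word list, which is more than a collation rule can carry.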

As always, complaints are welcome! Once I can squash those bugs, my next projects are working on a Windows binary patch (the goal being to make something that can be used to patch an installed copy on Windows; thanks to the difficulties of Cygwin this is not going very fast) and script support for pIqaD. What's the "standard" Unicode pIqaD font at the moment?

Yawne Zize’ite · June 30, 2012, 08:50:29 PM

Good news: the Klingon bug is squashed! PEBKAC error; I thought that if I escaped out the ' and the ' with a \, it would work, but I needed to quote it with straight quotes. This refinement has been added to the Naʼvi collator as well. (While Dothraki, Quenya, and Sindarin use apostrophes, they use them as punctuation not letters.)

No progress on the other goals, but that Klingon bug was a show-stopper.

Human No More · July 02, 2012, 05:57:29 PM

Nice

Quote from: Yawne Zize'ite on August 05, 2011, 04:44:28 PM
Should it show English or metric units?

I think you mean Imperial units. We haven't used them for decades, thank you very much.

IMHO: Decimal should be dot; thousand should be comma, as those are the standards. Short date should be either big endian ([2012|12]/07/16 - Japanese style) or little endian (16/07/[2012|12] - rest of the world other than US) - just not middle endian (US style) for ease of parsing.
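A quick Python sketch of the three date orders, using the same example date. It is only an illustration of the terms; the variable names are arbitrary. It also shows one practical argument for big endian: the strings sort chronologically as plain text.

```python
from datetime import date

d = date(2012, 7, 16)

big_endian = d.strftime("%Y/%m/%d")     # year, month, day
little_endian = d.strftime("%d/%m/%Y")  # day, month, year
middle_endian = d.strftime("%m/%d/%Y")  # month, day, year (US style)

print(big_endian, little_endian, middle_endian)
# 2012/07/16 16/07/2012 07/16/2012

# Big-endian strings sort in date order without any parsing.
dates = [date(2012, 7, 16), date(2011, 8, 5), date(2012, 6, 26)]
as_text = sorted(x.strftime("%Y/%m/%d") for x in dates)
print(as_text)
# ['2011/08/05', '2012/06/26', '2012/07/16']
```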

Yawne Zize’ite · July 02, 2012, 09:54:24 PM

I set metric units, a slight preference for big-endian numerical dates but little-endian dates with words (since the grammar came out better that way), and 99 999.88 for the number separators (not the Continental European-style 99.999,88 or the fully Anglosphere 99,999.88; the currently recommended practice for thousands in international contexts is to use a space). They seemed like the safest choices for a property that was created in the US but popular worldwide.

Would you believe that people in the US speak in middle-endian dates? Except for "the Fourth of July," which specifically means the American Independence Day holiday, we really say "July second" or "September fifteenth, two thousand nine". We didn't start writing dates that way just to annoy the rest of the world.

Strangely, in the US we do call them "English" units, not "US" units, even though our units aren't the same as Imperial. "US" units makes sense, seeing as how they're restricted to the US, but we never call them that except in self-consciously international or technical contexts.

I set the Quenya and Sindarin locales to Imperial; since Prof. Tolkien died in 1973 and didn't use metric in his books, that felt best. I even tried an experimental middle dot (·) as the decimal separator but

Yawne Zize’ite · November 20, 2012, 07:29:36 PM

An update. This version has no new functionality, but will work with LO 3.6 and 3.7, and changes the Dothraki code from my qdk to the qdo listed in the Conlang Code Registry.

I was working on adding some calendars for use with Quenya and Sindarin, since that looked less intimidating than figuring out how to hack new "Unicode ranges" out of the Private Use Area for scripts, but the next step in Naʼvi support would be more support tools; a hyphenator, a dictionary, etc.

Trying to build LO on Windows machines proved formidably difficult; by the end I'd gotten it to build a bare-bones experimental build of LO that wasn't close at all to the main build, which kept crashing for unknown reasons. I plan to get back to it sometime, but maybe after the semester is through.

`Eylan Ayfalulukanä · November 21, 2012, 03:27:54 AM

Ma Yawne Zize'ite,
I'm glad to see you are sticking with this project. I have become so busy that I have precious little time left to conlang, but I do as much conlang stuff as I have time for.

I plan to do some updating to my computers soon. That might be a good time to try applying your patch to LO.
