Na'vi Frequency Dictionary - Info and Suggestions

Started by Tìtstewan, April 25, 2014, 08:40:00 PM

Tìtstewan

I have no knowledge of NLTK... but it's worth playing with.
Hmm, it looks like only a few steps are needed to get a Na'vi-English translator (looking at Na'vilator :P)

-| Na'vi Vocab + Audio | Na'viteri as one HTML file | FAQ | Useful Links for Beginners |-
-| Kem si fu kem rä'ä si, ke lu tìfmi. |-

Kame Ayyo’koti

Quote from: Wllìm on June 08, 2014, 09:01:28 AM
Clearly, creating a complete grammar of Na'vi this way would be an extremely large job :( I never experimented with NLTK's machine-learning features, though, and I guess that would require much less work ;D

I have had a plan for quite some time, about trying to write a Na'vi parser that can analyse words and sentences. However, due to me being busy, that will have to wait one month more, when I have holiday :D
Yeah, whichever way we try to do it, it is going to take some work to make it understand Na'vi. If you get it to work, I would definitely love to see it in action! :)

Irayo nìtxan for posting your code, too.
"Your work is to discover your world, and then with all your heart give yourself to it."

Tìtstewan

A little update,

First, over the last few weeks I had some problems with the software I use for this project. Oeru txoa livu. :-[ Troubleshooting was annoying, but I have reinstalled everything, so it should all work perfectly now.
The next thing is verification. At this point I can say that no purely script-based approach will succeed. There are many old posts that were written without the grammar knowledge we have today. I would really like to make this project as authoritative as possible, which means fixing mistakes, and that means human work.
Just a simple example: nume was once used transitively, which was wrong. That means such sentences must be corrected from *Oel nume holpxayt to Holpxayri oe nume.

However, most of the work is done, and I should be finished soon. :)

I also noted the mistakes I saw. They could be useful for the Karyu as well as for the Numeyu:
Common mistakes:

- forgotten or wrong case endings
- overuse of the topic case
- wrongly placed infixes: infixes not placed in the si-element
- wrong case endings with transitive/intransitive verbs, especially with "nume"
- wrong use of the si-element (mostly used without its noun part)
- si-element used as "lu"
- general problems with si-verbs - best thing I saw: "pamrel tìsusìri" o___O
- *moeng, oeng without a + case ending
- wrong genitive on pronouns (*poyä, *fkoä, *oengyä, *oengä etc.)
- generally wrong use of the genitive (*C-yä, *u-yä, *o-yä)
- non-modal verbs used as modal verbs in modal constructions
- wrongly used modal verbs in general, plus wrong word order
- general confusion between "kop" and "nìteng"
- forgotten apostrophes
- case endings used on verbs
- unproductive affixes used
- wrongly placed "ke"
- forgotten "ke" in double negation
- diacritics used...
- forgotten subjunctive after "fte"/"fteke"
- forgotten lenition
- adpositions used as prefixes
- "futa" used instead of "tsnì" and vice versa
- words used from non-canon dictionaries
....

Tìtstewan

Update:

Verifying the words is now complete. :D  I now have a file that is 1.5 MB in size and contains over 250'000 words! (WOU)

Tirea Aean

Tewti! :D I suppose that's the file that gets scanned for frequencies of each word? I guess now we are just about to see the completion of this project! :D

Tìtstewan

Yes, this is the file that will be used for the word frequencies. :)
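
For readers wondering what scanning a word file for frequencies can look like, here is a minimal Python sketch. It is not the process used for this project (that was done by hand and in a spreadsheet); the function name `count_words` and the tokenizing rule are assumptions made only for illustration. The one Na'vi-specific point is that the apostrophe is a letter and must stay inside the word.

```python
import re
from collections import Counter

# Assumption for this sketch: a "word" is any run of letters and straight
# apostrophes. The apostrophe is kept because it is a consonant in Na'vi.
TOKEN_RE = re.compile(r"(?:[^\W\d_]|')+")

def count_words(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    tokens = TOKEN_RE.findall(text.lower())
    # Drop tokens that consist only of apostrophes (stray quote marks).
    return Counter(t for t in tokens if t.strip("'"))

if __name__ == "__main__":
    sample = "Oel ngati kameie. Kaltxì ma olo'! Oel ngati kameie."
    for word, n in count_words(sample).most_common():
        print(word, n)
```

A known weakness of this rule is that English-style single quotes around a word would be counted as part of the word, so real input would need cleaning first.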

Tirea Aean

Will these files be available somewhere when this is all done, to look at all the work that's been done on this historical project? I think it would be cool to save these files in the Gallery or something. :)

Tìtstewan

The final file will be available in the gallery, of course. :) (there is already a folder for that :P )
As for the raw materials, I think they will be added to the gallery too.

阿波

Seysonìltsan ma Titstewan. Ngeyä tìkangkem leiu apxa srung awngeyä olo'fpi.

Tìtstewan

Irayo nìtxan ma EzyRyder!

-----
Now I finally have the data for the frequency of the words. Getting to this step was very difficult. An apostrophe bug in Excel's word sorting dropped most of the apostrophes. Fixing it took the whole night, but I found a weird solution: replacing the different types of apostrophe characters with a single letter that Na'vi does not use (q). Now I'm going to write the final document.

Do the smuk want a document with the words sorted by frequency, sorted alphabetically, or both (two documents)? :-\
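
For anyone who wants to reproduce the apostrophe workaround outside of Excel, it can be sketched in Python roughly like this. The list of apostrophe variants is an assumption (the post does not say which characters were involved), and the function names are made up for this example. It relies on "q" never occurring in the Na'vi input.

```python
# Assumed variants: right/left single quote, modifier apostrophe, backtick, acute accent.
APOSTROPHE_VARIANTS = "\u2019\u2018\u02bc\u0060\u00b4"
PLACEHOLDER = "q"  # a letter Na'vi does not use

def protect_apostrophes(word):
    """Unify all apostrophe variants, then hide them behind the placeholder letter."""
    table = str.maketrans({ch: "'" for ch in APOSTROPHE_VARIANTS})
    return word.translate(table).replace("'", PLACEHOLDER)

def restore_apostrophes(word):
    """Turn the placeholder back into a straight apostrophe after sorting."""
    return word.replace(PLACEHOLDER, "'")

if __name__ == "__main__":
    words = ["olo\u2019", "Na'vi", "tse\u2018a"]
    protected = sorted(protect_apostrophes(w) for w in words)
    print([restore_apostrophes(w) for w in protected])
```

Restoring is only safe when the text really contains no "q", so any English words mixed into the list would have to be handled separately.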

阿波

I think, at the very least, sorted by frequency. That's the main point of frequency dictionaries, the ability to learn the most common and useful words before the... less so. But unless it's a trouble, perhaps having both wouldn't hurt too, although I don't see much use for it.

Tìtstewan

I'm still writing the dictionary. I use a special table, and if I do it correctly, it should be easy to create both files. :)

阿波

It would also be nice if it could be one line per entry, so that it could be easily imported to a spreadsheet, so that Anki users (if there are any other than me :) ) could suspend their decks, and un-suspend unknown words in order of frequency.
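
To make "one line per entry" concrete, here is a small Python sketch that writes tab-separated lines, a format that spreadsheets can open and that, as far as I know, Anki's text import accepts. The field layout (word, definition, count), the function name `to_tsv`, and the counts are invented for illustration and are not the real dictionary data.

```python
import csv
import io

def to_tsv(entries):
    """Return tab-separated text, one entry per line, most frequent word first.

    entries: iterable of (word, definition, count) tuples (hypothetical layout).
    """
    # Sort by descending count; ties are broken by the word itself.
    ranked = sorted(entries, key=lambda e: (-e[2], e[0]))
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for word, definition, count in ranked:
        writer.writerow([word, definition, count])
    return buffer.getvalue()

if __name__ == "__main__":
    sample = [
        ("fahew", "n. smell", 3),                # counts are made up
        ("fa", "adp. with, by means of", 10),
    ]
    print(to_tsv(sample), end="")
```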

Tirea Aean


阿波

Oh, and another useful feature could be either percent of the corpus a word comprises, or the number of occurrences of each word, if you'll also give the precise number of words in the corpus.

Tìtstewan

Quote from: Tirea Aean on July 27, 2014, 08:51:04 AM
Frequency
^ ;D
Stay tuned over the coming days. :D Writing a dictionary is a bit more difficult than I originally thought...

Quote from: EzyRyder on July 27, 2014, 08:12:50 AM
It would also be nice if it could be one line per entry, so that it could be easily imported to a spreadsheet, so that Anki users (if there are any other than me :) ) could suspend their decks, and un-suspend unknown words in order of frequency.
What file type does Anki use? .txt ?

Quote from: EzyRyder on July 27, 2014, 01:56:46 PM
Oh, and another useful feature could be either percent of the corpus a word comprises, or the number of occurrences of each word, if you'll also give the precise number of words in the corpus.
??? :-\ something like this?
Na'vi    IPA          English                   Amount   Percentage
fa       [fa]         adp. with, by means of    XXX      XXX.XXX%
fahew    [fa.ˈhɛw]    n. smell                  XXX      XXX.XXX%
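
The "Amount" and "Percentage" columns could be filled in along these lines. This is a sketch in Python, assuming the counts are already available as a word-to-count mapping; `add_percentages` is a hypothetical name, and the three decimal places simply mirror the XXX.XXX% placeholder.

```python
def add_percentages(counts):
    """Return (total, rows), where each row is (word, count, percentage string)."""
    total = sum(counts.values())
    rows = []
    # Most frequent words first; ties are broken by the word itself.
    for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        rows.append((word, n, f"{100 * n / total:.3f}%"))
    return total, rows

if __name__ == "__main__":
    total, rows = add_percentages({"fa": 3, "fahew": 1})  # made-up counts
    print("corpus size:", total)
    for word, n, pct in rows:
        print(word, n, pct)
```

Giving the corpus total alongside the rows lets readers convert between counts and percentages themselves.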



阿波


Tìtstewan

The dictionary is now written. I only have to add the values and calculate the percentages, then it's done :) :) :)

Tìtstewan

Well, I was originally going to finish the project tomorrow, but real life got in the way...
I ONLY have to work through 13'000 words (that is, to catch all attached adpositions and all affixes)
(meh, I really hate it when something unexpected like this happens :( :'()

baritone

Quote from: Tìtstewan on July 27, 2014, 02:05:15 PM
Quote from: Tirea Aean on July 27, 2014, 08:51:04 AM
Frequency
^ ;D
Stay tuned over the coming days. :D Writing a dictionary is a bit more difficult than I originally thought...

Quote from: EzyRyder on July 27, 2014, 08:12:50 AM
It would also be nice if it could be one line per entry, so that it could be easily imported to a spreadsheet, so that Anki users (if there are any other than me :) ) could suspend their decks, and un-suspend unknown words in order of frequency.
What file type does Anki use? .txt ?

Quote from: EzyRyder on July 27, 2014, 01:56:46 PM
Oh, and another useful feature could be either percent of the corpus a word comprises, or the number of occurrences of each word, if you'll also give the precise number of words in the corpus.
??? :-\ something like this?
Na'vi    IPA          English                   Amount   Percentage
fa       [fa]         adp. with, by means of    XXX      XXX.XXX%
fahew    [fa.ˈhɛw]    n. smell                  XXX      XXX.XXX%
I already know how to use it.
The Na'vi language contains about 200 prepositions, conjunctions and pronouns.
The dictionary should be divided into flashcard sets of 200 words each, so that after learning each set the student knows what percentage of the frequently used words they have learned.
I look forward to when your frequency dictionary is ready.
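
The idea of 200-word flashcard sets with a known coverage can be sketched in Python like this. It assumes a word-to-count mapping is available; the function name `coverage_by_set` is hypothetical, and "coverage" here means the share of all word occurrences in the corpus accounted for by the sets learned so far.

```python
def coverage_by_set(counts, set_size=200):
    """Split words into sets by descending frequency.

    Returns a list of (words_in_set, cumulative_coverage_percent) tuples.
    """
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    result = []
    covered = 0
    for start in range(0, len(ranked), set_size):
        chunk = ranked[start:start + set_size]
        covered += sum(n for _, n in chunk)
        result.append(([w for w, _ in chunk], 100 * covered / total))
    return result

if __name__ == "__main__":
    demo = {"a": 5, "b": 3, "c": 2}  # made-up counts, tiny set size for the demo
    for words, pct in coverage_by_set(demo, set_size=2):
        print(words, f"{pct:.1f}%")
```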