Na'vi Frequency Dictionary - Info and Suggestions

Tìtstewan · June 08, 2014, 09:09:11 AM

I have no knowledge of this NLTK thing... but it's worth playing with.
Hmm, it looks like only a few steps are needed to have a Na'vi-English translator (looking at Na'vilator).

Kame Ayyo’koti · June 08, 2014, 09:45:50 AM

Quote from: Wllìm on June 08, 2014, 09:01:28 AM
Clearly, creating a complete grammar of Na'vi this way would be an extremely large job. I never experimented with NLTK's machine-learning features, and I guess that would require much less work, however.

I have had a plan for quite some time to try writing a Na'vi parser that can analyse words and sentences. However, since I am busy, that will have to wait one more month, until my holiday.

Yeah, whichever way we try to do it, it is going to take some work to make it understand Na'vi. If you get it to work, I would definitely love to see it in action!

Irayo nìtxan for posting your code, too.

Tìtstewan · July 07, 2014, 03:56:08 PM

A little update,

First, over the last few weeks I had some problems with the software I use for this project. Oeru txoa livu.

Troubleshooting was annoying, but I have re-installed everything, so it should all work perfectly now.
The next thing is the verification. At this point I can say that no script-based approach will be successful. There are many old posts which were written without the grammar knowledge that we have today. I would really like to make this project as authoritative as possible, which means fixing mistakes, and that means human work.
Just a simple example: nume was once used transitively, which was wrong. That means such sentences must be corrected from *Oel nume holpxayt to Holpxayri oe nume.

However, most of the work is done, and I should be finished soon.

I also noted the mistakes I saw. They could be useful for the Karyu as well as for the Numeyu:
Common mistakes:

- forgotten/wrong case endings
- overuse of topic case
- wrongly placed infixes: infixes not placed in the si-element
- wrong endings with transitive/intransitive verbs, especially with "nume"
- wrong use of the si-element (mostly used without its noun part)
- si-element used as "lu"
- general problems with si-verbs - best thing I saw: "pamrel tìsusìri" o___O
- *moeng, oeng without a + case ending
- wrong genitive on pronouns (*poyä, *fkoä, *oengyä, *oengä etc.)
- generally wrong use of genitive (*C-yä, *u-yä, *o-yä)
- non-modal verbs used as modal verbs in modal constructions
- wrongly used modal verbs in general, plus wrong word order
- general confusion with "kop" and "nìteng"
- forgotten apostrophes
- case endings used on verbs
- unproductive affixes used
- wrongly placed "ke"
- forgotten "ke" in double negation
- diacritics used...
- forgotten subjunctive after "fte"/"fteke"
- forgotten lenition
- adpositions used as prefixes
- "futa" used instead of "tsnì" and vice versa
- words used from non-canon dictionaries
....

Tìtstewan · July 24, 2014, 06:49:42 PM

Update:

The word verification is now complete.

Now I have a file that is 1.5 MB in size and contains over 250'000 words! (WOU)

Tirea Aean · July 24, 2014, 07:27:50 PM

Tewti!

I suppose that's the file that gets scanned for frequencies of each word? I guess now we are just about to see the completion of this project!

Tìtstewan · July 24, 2014, 07:30:48 PM

Yes, this is the file which will be used for the word frequencies.

Tirea Aean · July 24, 2014, 07:34:51 PM

Will these files be available somewhere when this is all done, to look at all the work that's been done on this historical project? I think it would be cool to save these files in the Gallery or something.

Tìtstewan · July 24, 2014, 07:49:39 PM

That final file will be available in the gallery, of course.

(there is already a folder for that)
As for the raw materials, I think they will be added to the gallery too.

阿波 · July 25, 2014, 06:41:41 AM

Seysonìltsan ma Titstewan. Ngeyä tìkangkem leiu apxa srung awngeyä olo'fpi.

Tìtstewan · July 27, 2014, 07:10:15 AM

Irayo nìtxan ma EzyRyder!

-----
Now I finally have the data for the word frequencies. Getting to this step was very difficult. An apostrophe bug in Excel's word sorting dropped most of the apostrophes. Fixing it ate the whole night, but I found a weird solution: replacing the different types of apostrophe signs with one letter that is uncommon in Na'vi (q). Now I'm going to write the final document.
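
For readers who want to try the same placeholder trick outside Excel, here is a minimal Python sketch. It is not the procedure actually used for this project: the list of apostrophe variants, the helper names and the sample words are assumptions made for illustration, and it only works if no word in the list already contains the letter q.

```python
# Sketch of the "replace apostrophes with q" workaround.
# Assumptions (not from the original post): the exact set of apostrophe
# variants, the helper names, and that no real word contains "q".
APOSTROPHE_VARIANTS = "'\u2019\u2018\u02bc`\u00b4"
PLACEHOLDER = "q"

# Translation table mapping every apostrophe variant to the placeholder.
_TABLE = str.maketrans({ch: PLACEHOLDER for ch in APOSTROPHE_VARIANTS})


def protect_apostrophes(word):
    """Replace every apostrophe variant with the placeholder letter."""
    return word.translate(_TABLE)


def restore_apostrophes(word):
    """Turn the placeholder back into a plain ASCII apostrophe."""
    return word.replace(PLACEHOLDER, "'")


# Sample words with two different apostrophe characters.
words = ["\u2019awve", "'eylan", "fa"]

# Sort on the protected form, then restore a single apostrophe style.
protected = sorted(protect_apostrophes(w) for w in words)
restored = [restore_apostrophes(w) for w in protected]
print(restored)
```

Sorting on the protected form keeps words with a leading apostrophe together, and restoring afterwards leaves one consistent apostrophe character in the output.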

Do the smuk want a document with the words sorted by frequency, sorted alphabetically, or both (two documents)? :-\

阿波 · July 27, 2014, 07:57:43 AM

I think, at the very least, sorted by frequency. That's the main point of frequency dictionaries: the ability to learn the most common and useful words before the... less so. But unless it's too much trouble, perhaps having both wouldn't hurt either, although I don't see much use for it.

Tìtstewan · July 27, 2014, 08:01:12 AM

I am still writing the dictionary. I use a special table, and if I do it correctly, it should be easy to create both files.

阿波 · July 27, 2014, 08:12:50 AM

It would also be nice if it could be one line per entry, so that it could be easily imported to a spreadsheet, so that Anki users (if there are any other than me) could suspend their decks, and un-suspend unknown words in order of frequency.

Tirea Aean · July 27, 2014, 08:51:04 AM

Frequency

阿波 · July 27, 2014, 01:56:46 PM

Oh, and another useful feature could be either percent of the corpus a word comprises, or the number of occurrences of each word, if you'll also give the precise number of words in the corpus.

Tìtstewan · July 27, 2014, 02:05:15 PM

Quote from: Tirea Aean on July 27, 2014, 08:51:04 AM
Frequency

^

Stay tuned for the upcoming days.

Writing a dictionary is a bit more difficult than I originally thought...

Quote from: EzyRyder on July 27, 2014, 08:12:50 AM
It would also be nice if it could be one line per entry, so that it could be easily imported to a spreadsheet, so that Anki users (if there are any other than me) could suspend their decks, and un-suspend unknown words in order of frequency.

What file type does Anki use? .txt?

Quote from: EzyRyder on July 27, 2014, 01:56:46 PM
Oh, and another useful feature could be either percent of the corpus a word comprises, or the number of occurrences of each word, if you'll also give the precise number of words in the corpus.

:-\ Something like this?

Na'vi	IPA	English	Amount	Percentage
fa	[fa] adp.	with, by means of	XXXX	XX.XXX%
fahew	[fa.ˈhɛw] n.	smell	XXXX	XX.XXX%
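
The Amount and Percentage columns could be computed along these lines. This is only a sketch, not the spreadsheet workflow used for the dictionary: the function names are hypothetical, the token list is a toy input rather than real corpus data, and the IPA and English columns are left out because they come from the dictionary itself.

```python
from collections import Counter


def frequency_rows(tokens):
    """Return (word, amount, percentage) rows, most frequent first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # Sort by descending count, then alphabetically to break ties.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, amount, 100.0 * amount / total) for word, amount in ordered]


def to_tsv(rows):
    """Format rows as tab-separated text, one entry per line."""
    lines = ["Na'vi\tAmount\tPercentage"]
    for word, amount, pct in rows:
        lines.append(f"{word}\t{amount}\t{pct:.3f}%")
    return "\n".join(lines)


# Toy token list for illustration only; not taken from the real corpus.
sample_tokens = "oe nume fa fahew oe nume oe".split()
rows = frequency_rows(sample_tokens)
print(to_tsv(rows))
```

Because every entry sits on one tab-separated line, the output should open directly in a spreadsheet and can be imported as a text file.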

阿波 · July 27, 2014, 02:09:59 PM

Yup, sounds fine. .txt should work.

Tìtstewan · August 03, 2014, 03:50:45 PM

Now that the dictionary is written, I only have to add the values and calculate the percentages, then it's done.

Tìtstewan · August 16, 2014, 04:36:13 AM

Well, I was originally going to finish the project tomorrow, but real life got in the way...
I ONLY have to work on 13'000 words (that is, to get all attached adpositions and all affixes)
(meh, I really hate when something unexpected like this happens)

baritone · August 30, 2014, 12:37:00 AM

Quote from: Tìtstewan on July 27, 2014, 02:05:15 PM
Quote from: Tirea Aean on July 27, 2014, 08:51:04 AM
Frequency
^
Stay tuned for the upcoming days. Writing a dictionary is a bit more difficult than I originally thought...

Quote from: EzyRyder on July 27, 2014, 08:12:50 AM
It would also be nice if it could be one line per entry, so that it could be easily imported to a spreadsheet, so that Anki users (if there are any other than me) could suspend their decks, and un-suspend unknown words in order of frequency.
What file type does Anki use? .txt?

Quote from: EzyRyder on July 27, 2014, 01:56:46 PM
Oh, and another useful feature could be either percent of the corpus a word comprises, or the number of occurrences of each word, if you'll also give the precise number of words in the corpus.
:-\ Something like this?
Na'vi	IPA	English	Amount	Percentage
fa	[fa] adp.	with, by means of	XXXX	XX.XXX%
fahew	[fa.ˈhɛw] n.	smell	XXXX	XX.XXX%

I already know how to use it.
The Na'vi language contains about 200 prepositions, conjunctions and pronouns.
The dictionary should be divided into sets of flashcards of 200 words each; then, after learning each set, the student will know what percentage of the frequently used words he has learned.
I look forward to your frequency dictionary being ready.
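
The idea of 200-word sets with a known coverage after each set could be sketched like this in Python. The function name and the toy counts are hypothetical, and the sketch assumes the occurrence counts are already sorted from most to least frequent.

```python
def coverage_by_set(amounts, set_size=200):
    """Cumulative corpus coverage (in percent) after each flashcard set.

    `amounts` holds occurrence counts, sorted from most to least frequent.
    """
    total = sum(amounts)
    covered = 0
    coverage = []
    for start in range(0, len(amounts), set_size):
        # Add the occurrences of every word in this set.
        covered += sum(amounts[start:start + set_size])
        coverage.append(100.0 * covered / total)
    return coverage


# Toy counts for illustration only; a small set size keeps it readable.
toy_amounts = [50, 30, 10, 5, 3, 2]
toy_coverage = coverage_by_set(toy_amounts, set_size=2)
print(toy_coverage)
```

With real counts and the default set size of 200, each value would tell the student what share of all word occurrences in the corpus the sets learned so far account for.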