Complete word derivation list

Started by EyeOfPython, January 17, 2015, 12:18:36 PM

EyeOfPython

So I created a project that reads NaviDictionary.pdf, builds word trees according to the "derivated from" entries in the dictionary, and orders the words with roots first, then derived words, and so on. This is very useful when trying to learn from scratch.

The script that generated the words (a bit messy though, don't judge me by this :C):
https://github.com/EyeOfPython/Navi-Vocab-Generator

The output file is navi-derivation.txt (actually a *.csv file), which I converted into *.xls.
Each line in the *.txt starts with a space, because I had a lot of problems importing it into Excel ...

The first column is the term, and the remaining columns are the words from which the term is derived.
However, this list is badly flawed: it contains many false positives, and a lot of derivations are missing, because not all parent words are listed in the dictionary.
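A minimal sketch of reading such a file into a term-to-parents mapping. The tab delimiter and the function name `parse_derivations` are assumptions for illustration, not taken from the actual script; adjust the delimiter to whatever navi-derivation.txt really uses.

```python
import csv
import io


def parse_derivations(text, delimiter="\t"):
    """Map each term to the list of words it is derived from.

    Assumes one term per row, with the parents in the remaining columns.
    The delimiter is a guess; the leading space on each line is stripped.
    """
    parents = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        cells = [cell.strip() for cell in row]
        if not cells or not cells[0]:
            continue  # skip blank lines
        term, rest = cells[0], cells[1:]
        parents[term] = [p for p in rest if p]  # drop empty columns
    return parents


# Hypothetical sample in the assumed layout (leading space, tab-separated).
sample = " spaw\n tsukspaw\tspaw\n ketsukspaw\ttsukspaw\n"
print(parse_derivations(sample))
```

A root word simply ends up with an empty parent list, which makes it easy to spot later.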

Now my request to you is that you, skilled Na'vi community, help me, unskilled Na'vi fan, to correct the list with the appropriate parent derivatives, so that I can create an optimal ordering/sorting for the Na'vi words, based upon "derivation depth" and the (coming) frequency table.
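One possible reading of "derivation depth", sketched under assumptions: a root has depth 0, and a derived word sits one level below its deepest parent. The function names are made up for illustration, and since the frequency table does not exist yet, ties are broken alphabetically as a placeholder.

```python
def derivation_depth(term, parents, _seen=()):
    """Depth 0 for roots, otherwise 1 + the depth of the deepest parent.

    Parents that are not listed as terms themselves count as roots.
    """
    if term in _seen:
        raise ValueError(f"cycle in derivations at {term!r}")
    listed = parents.get(term, [])
    if not listed:
        return 0
    return 1 + max(derivation_depth(p, parents, _seen + (term,)) for p in listed)


def learning_order(parents):
    """Sort terms by depth first, then alphabetically (placeholder tie-break)."""
    return sorted(parents, key=lambda t: (derivation_depth(t, parents), t))


parents = {"ketsukspaw": ["tsukspaw"], "tsukspaw": ["spaw"], "spaw": []}
print(learning_order(parents))  # ['spaw', 'tsukspaw', 'ketsukspaw']
```

Once a frequency table is available, its value could replace the alphabetical part of the sort key.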

When you start working on a chunk, first claim your words with a post, then correct the words you claimed, and then post a *.txt, *.csv or *.xls (preferably the last one) with only the rows you changed. That way we can split the work better.

It would be really nice to see an Anki package with an optimal ordering of the words, not just plain alphabetical.
If I make errors, don't hesitate to correct me.

EyeOfPython

What is important is that the result is actually a tree. Do not add a parent if a child of that parent is already present.
Example:
tsukspaw spaw
ketsukspaw tsukspaw -> NOT spaw
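The rule above can be checked mechanically. A rough sketch, with hypothetical helper names: a listed parent is flagged as redundant when it is already an ancestor of another parent listed in the same row.

```python
def ancestors(term, parents, seen=None):
    """Collect every word reachable by following parent links from term."""
    seen = set() if seen is None else seen
    for p in parents.get(term, []):
        if p not in seen:
            seen.add(p)
            ancestors(p, parents, seen)
    return seen


def redundant_parents(parents):
    """Return {term: [parents already implied by another listed parent]}."""
    redundant = {}
    for term, listed in parents.items():
        extra = [
            p for p in listed
            if any(p in ancestors(q, parents) for q in listed if q != p)
        ]
        if extra:
            redundant[term] = extra
    return redundant


# The second row of the example, deliberately written the wrong way.
parents = {"spaw": [], "tsukspaw": ["spaw"], "ketsukspaw": ["tsukspaw", "spaw"]}
print(redundant_parents(parents))  # {'ketsukspaw': ['spaw']}
```

Running this over a corrected chunk before posting it would catch rows that list both a word and that word's own parent.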

阿波

I'll try to do the first 100 entries today, just to see how long this is gonna take. I'm gonna do this during study breaks, so be patient.

阿波

Approximately half an hour per 100, so far. Not bad. Just skim through to check if it's all right.

Plumps

Sounds interesting. Are you looking for productive derivational parts, or for whatever the word is composed of? In a (very very very) quick look through the original .xls I noticed that 'awlo was listed as something like 'aw, alo, -lo and -o. Since -lo is not a derivational morpheme (it doesn't have a meaning on its own), it's just 'aw and alo.

Another thing to consider: in this example, what is the parent? What is the derivational morpheme? It's just an elision of 'awa alo. There are a number of words that work like that.

Unfortunately I'm not well-versed in programming languages at all and don't think that I will be of much help :(

阿波

As far as I understand, the purpose of this project is to make the order of learning easier, by learning derived words after the words they are derived from. So the proper etymology isn't particularly important.

EyeOfPython

First of all: Thanks for your contribution! :)

As I said, this data is constructed with a script, which isn't even fully developed. Because of that, stuff like (your example) 'aw, alo, -lo and -o can be emitted, which includes two false positives. I think it's easier to remove false positives than to add missing ones, so I see this as more useful than harmful.

And indeed, I want to use this to build Anki or Memrise vocab in an order optimized for learning.

Regards,
Tobi

EyeOfPython

EzyRyder, you still got two things "wrong", as I see it (more a different interpretation than mine):
nì'awtu   nì'aw   tute -> this should be -tu, because there is a suffix in the table for -tu
way a plltxe   way   -a-   plltxe -> this might seem like hair-splitting, but it should be "a" instead of "-a-".
The rest seems fine.

Tìtstewan

I don't know enough about PHP, C++ and stuff. :( But I still like the idea of a word derivation list.

Quote from: EzyRyder on January 17, 2015, 12:53:46 PM
Approximately half an hour per 100, so far. Not bad. Just skim through, to check if it's all right.
Cool, that file reminds me very much of my frequency project. ;D

-| Kem si fu kem rä'ä si, ke lu tìfmi. |-