Na’vi Word Usage Reference (WIP)

Kame Ayyo’koti · October 22, 2014, 08:47:27 PM

This is a little something I'm putting together for myself, and I thought I'd share.

I wanted to have a reference that listed all examples for a particular word, so using my scripting wizardry, I threw together something that took a bunch of sentences from Paul's website, parsed them, and produced this PDF. I need to add more content, and it needs a LOT more polish (and fixes), but all in all I think it works great for its purpose.

Here is what I have so far: Preview Example

The script is written in Python. My plan is to make it flexible enough to accept and output different formats. For example, it could read in example sentences with translations (like the example) from a spreadsheet, or just plain Na'vi text. You could have it create LaTeX files (for compilation to PDF), HTML pages, maybe other things. There would be some customization. If I get that far, I've been thinking of adding a little Natural Language Toolkit code, to hopefully get better information, but that would require some work.

I imagine it could be used to produce learning resources. For example, a story written nìNa'vi could be run through it, and the resulting PDF would have the story and the data (word counts and all that). Perhaps this information could help students learn to read the story more quickly.

I could release the code as it is right now, but to be honest, it's mostly a disaster lol. I was in a hurry to get it working because I was so excited by the idea, so it needs a LOT of cleaning up and documentation. But if anyone else would like to have a look and maybe make something of their own from it, I'd be happy to share it.

Tirea Aean · October 22, 2014, 09:15:50 PM

+1

NICE. Can I see the source? <3 Python

Wllìm · October 23, 2014, 01:34:37 AM

WOW! +1

So is the parsing / stemming all done automatically? That is completely awesome!

I'd love to see the code; please don't worry if it's not the neatest code ever, since if you compare it to my usual code style it will be great.

Plumps · October 23, 2014, 04:12:08 AM

Looks amazing!

To me that almost looks like a frequency dictionary...

:-\

I'd be more interested in what sources you used. You say a bunch of sentences from Naviteri, so it's not the whole blog, am I right?

Tìtstewan · October 23, 2014, 09:24:48 AM

Quote from: Plumps on October 23, 2014, 04:12:08 AM
Looks amazing!
To me that almost looks like a frequency dictionary... :-\

I'd be more interested in what sources you used. You say a bunch of sentences from Naviteri, so it's not the whole blog, am I right?

This.
I guess the sources are the Na'vi sentences and texts Pawl has used.

Kame Ayyo’koti · October 23, 2014, 12:21:58 PM

Quote from: Plumps on October 23, 2014, 04:12:08 AM
Looks amazing!
To me that almost looks like a frequency dictionary... :-\

I'd be more interested in what sources you used. You say a bunch of sentences from Naviteri, so it's not the whole blog, am I right?

The sentences were taken directly from several of Naviteri's earliest posts. It's not the entire blog. I only processed enough to create a decent example of what this script produces. Furthermore, there are mistakes I made in the parsing, and some of Paul's examples even have mistakes (one example uses *soaiayä!), so the document I posted definitely isn't finished.

I'm planning to process all of Naviteri so I can use it as my own learning reference. I will post my copy of this document when I'm done so others can use it.

In its current form, the program is meant to provide a reference document containing (for the set of text you give it):

Frequency of each word the entire input contains.
Frequencies of words, sorted by Part of Speech.
Words used sorted by Part of Speech (just for convenience).
A basic "dictionary" that provides the word's meaning, and a list of all sentences where it is found, with the word highlighted. (With the sentence's translation beside it.)

So yes, it is a frequency dictionary for the set of text you gave it. But it's also a "word-usage" reference, since you can view all the ways that word has been used in the input text.
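For anyone curious how the counting and the "all sentences where it is found" part could fit together, here is a minimal sketch. It is not the actual script: the names `tokenize`, `build_reference` and `highlight` and the sample sentences are made up for illustration, and it does no stemming or Part of Speech lookup at all, so inflected forms such as "kame" and "kameie" are counted as separate words.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample input: (Na'vi sentence, English translation) pairs.
SENTENCES = [
    ("Oel ngati kameie", "I See you"),
    ("Ngaru lu fpom srak?", "Are you well?"),
    ("Oel ngati kame", "I See you"),
]

# A word is a run of letters plus the apostrophes used for the glottal stop.
TOKEN = re.compile(r"[\w'’]+")


def tokenize(sentence):
    """Split a sentence into lowercase word forms (no stemming)."""
    return [t.lower() for t in TOKEN.findall(sentence)]


def build_reference(pairs):
    """Return (counts, usage): word frequencies and, for each word,
    the list of (sentence, translation) pairs it appears in."""
    counts = Counter()
    usage = defaultdict(list)
    for navi, english in pairs:
        seen = set()
        for word in tokenize(navi):
            counts[word] += 1
            if word not in seen:  # list each sentence once per word
                seen.add(word)
                usage[word].append((navi, english))
    return counts, usage


def highlight(sentence, word, left="[", right="]"):
    """Wrap every occurrence of `word` in the sentence with markers."""
    def mark(match):
        text = match.group(0)
        return left + text + right if text.lower() == word else text
    return TOKEN.sub(mark, sentence)


counts, usage = build_reference(SENTENCES)
for navi, english in usage["ngati"]:
    print(highlight(navi, "ngati"), "=", english)
```

Swapping the `left` and `right` markers for bold tags or a LaTeX macro would give the highlighted examples in whatever output format is wanted.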

While right now it provides a complete "reference document," I intend for it to be as flexible as possible. It could provide whatever you want (within its capabilities: word count, count sorted by PoS, example sentences highlighted, etc.) in whatever output format you want (PDF, CSV, JSON, XML, plaintext, plaintext with BBCode (for posting to the forum), sl.). It could accept different forms of input as well: sentences with translations (as in this example), sentences without translations, blocks/chunks of text (like stories or forum posts), etc.
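One way the "whatever output format you want" idea could be organized is a small table of writer functions keyed by format name. This is only a sketch under assumptions: `render`, `WRITERS` and the `to_*` functions are hypothetical names, only three of the formats mentioned are shown, and the BBCode writer assumes the forum supports `[table]`, `[tr]` and `[td]` tags.

```python
import csv
import io
import json


def _sorted_items(counts):
    """Most frequent first, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def to_json(counts):
    return json.dumps(dict(counts), ensure_ascii=False, indent=2, sort_keys=True)


def to_csv(counts):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["word", "count"])
    for word, n in _sorted_items(counts):
        writer.writerow([word, n])
    return buf.getvalue()


def to_bbcode(counts):
    # Assumes the forum software understands [table]/[tr]/[td].
    rows = [f"[tr][td]{w}[/td][td]{n}[/td][/tr]" for w, n in _sorted_items(counts)]
    return "[table]\n" + "\n".join(rows) + "\n[/table]"


# Adding a new output format means adding one function and one entry here.
WRITERS = {"json": to_json, "csv": to_csv, "bbcode": to_bbcode}


def render(counts, fmt):
    """Render word counts in the requested format name."""
    try:
        writer = WRITERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return writer(counts)


print(render({"ngati": 2, "oel": 2, "lu": 1}, "csv"))
```

The same dispatch-table approach would work on the input side, with one reader per input type (spreadsheet rows, plain text, and so on).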

In fact, this program could maybe be used for any language at all. (Although I imagine very agglutinating languages would be problematic.)

Quote from: Wllìm on October 23, 2014, 01:34:37 AM
So is the parsing / stemming all done automatically?

This program builds directly on the word-counting method I used in the script I made a while ago. (Never posted the code for that.) I just added a "dictionary" to it in order to provide word definitions for the document.

Unfortunately parsing as it works now cannot be fully automatic, since the method I used is very simple and language is complex. I'm not intending to provide any miracles here. It does however have a certain amount of "memory," so once you provide it with information you shouldn't have to provide that information ever again. From then on, it will only require your help if a word could have two or more possible definitions. (For example, "tute" could be "tute: person" or "tuté: woman.") I already have some ideas for further improvement in this area.
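The "memory" behaviour described above could be sketched roughly like this. It is an illustration, not the released code: the `Lexicon` class, its JSON file layout and the `ask` callback are all assumptions. Known words with a single definition are answered silently, unknown words are asked about once and then remembered, and words with several stored definitions always go back to the user.

```python
import json
from pathlib import Path


class Lexicon:
    """Hypothetical persistent store mapping a written form to its definitions."""

    def __init__(self, path):
        self.path = Path(path)
        self.entries = {}
        if self.path.exists():
            self.entries = json.loads(self.path.read_text(encoding="utf-8"))

    def save(self):
        text = json.dumps(self.entries, ensure_ascii=False, indent=2)
        self.path.write_text(text, encoding="utf-8")

    def add(self, surface, definition):
        """Remember a definition for a written form and persist it."""
        senses = self.entries.setdefault(surface, [])
        if definition not in senses:
            senses.append(definition)
            self.save()

    def resolve(self, surface, sentence, ask):
        """Return a definition, calling ask(surface, sentence, options)
        only when the word is unknown or ambiguous."""
        senses = self.entries.get(surface, [])
        if len(senses) == 1:
            return senses[0]  # remembered, no help needed
        if not senses:
            definition = ask(surface, sentence, [])
            self.add(surface, definition)  # never ask about this again
            return definition
        return ask(surface, sentence, senses)  # e.g. tute vs. tuté
```

In a command-line tool, `ask` could simply wrap `input()` and print the sentence for context; a GUI could show a pick list instead.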

I will see about throwing together a download for the program today, with a little explanation of how it works.

Kame Ayyo’koti · October 23, 2014, 09:06:05 PM

As promised, here is the program(s). It comes with a small tutorial.

Booklet Maker

I'm releasing it under the GPL, so if anyone wants to make something of it, you're welcome to. No need to ask beforehand.

I can answer questions, but since this isn't (yet) really meant for public consumption I'm not going to promise everyone will be able to get it to work. It's really meant for anyone who wants to look at the code.

As long as life doesn't conspire against me (which it mostly has this year...), I will flesh it out and produce a full-featured program with a GUI that will be as user-friendly as possible. I'll also make sure it can run on Windows.
