Computational linguistics & Na'vi

Started by Our Lady of Toast, January 18, 2010, 07:18:37 PM

Our Lady of Toast

I know "computers" and "Na'vi" seem like a strange match, but bear with me here.   ;)

I'm doing my PhD in linguistics, and work with (and write!) a lot of code for dealing with language.  Right now I'm putting together a parser - basically, a program that checks that Na'vi sentences are grammatical (based on the rules we know).  Parsers can also be used as parts of larger systems, such as translation programs or chatbots; I'm hoping to end up with something other people in the community could enjoy.

Is anyone else here working on a similar project?  (If you are, btw, I'm using the SFST package for morphological parsing, and will probably use NLTK for the rest ... since those are the tools that'll be used by the class I'm TAing this semester.)
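
To make "checks that Na'vi sentences are grammatical" concrete, here is a minimal sketch in plain Python (not SFST or NLTK). Everything in it is an assumption for illustration: the tiny `LEXICON`, the tag names, and the single rule that a transitive clause needs exactly one agentive noun, one patientive noun, and one verb in any order. It is nowhere near a real grammar of Na'vi.

```python
# Hypothetical mini-lexicon: surface form -> role tag.
# AGT = agentive noun (-l), PAT = patientive noun (-ti), V = verb.
LEXICON = {
    "oel": "AGT",
    "ngal": "AGT",
    "oeti": "PAT",
    "ngati": "PAT",
    "kameie": "V",
}


def is_grammatical(sentence):
    """Accept a clause with exactly one AGT, one PAT and one V, in any order.

    This is a toy rule standing in for a real grammar; unknown words
    make the sentence fail outright.
    """
    words = sentence.lower().split()
    tags = [LEXICON.get(word) for word in words]
    if None in tags:
        return False
    return sorted(tags) == ["AGT", "PAT", "V"]
```

Because the rule only counts roles, `is_grammatical("Oel ngati kameie")` and `is_grammatical("ngati kameie oel")` are treated the same, which is the point of relying on case marking rather than word order. A real parser would derive the tags from morphology instead of listing whole word forms.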

suomichris

Sounds like a cool project!  I imagine it will be hard to deal with infixes using a parser like that, but then again, I know next to nothing about natural language processing.... ;)
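
One naive way to cope with infixes, sketched in plain Python: delete candidate infixes from the surface form and see whether what is left is a known root. The `ROOTS` and `INFIXES` tables below are small illustrative samples with glosses given from memory, and `strip_infixes` is a hypothetical helper, not part of any existing tool. A finite-state transducer would do this far more cleanly.

```python
# Illustrative samples only; a real system would load these from the dictionary.
ROOTS = {"taron": "hunt", "kame": "see"}
INFIXES = {
    "ol": "perfective",
    "er": "imperfective",
    "am": "past",
    "ay": "future",
    "ei": "positive attitude",
}


def strip_infixes(surface):
    """Return every (root, infixes) analysis found by deleting infixes.

    The search removes one infix occurrence at a time and recurses, so
    forms carrying more than one infix are covered. It terminates because
    each deletion makes the form shorter.
    """
    analyses = set()

    def search(form, found):
        if form in ROOTS:
            analyses.add((form, tuple(sorted(found))))
        for infix in INFIXES:
            start = form.find(infix)
            while start != -1:
                remainder = form[:start] + form[start + len(infix):]
                search(remainder, found + [infix])
                start = form.find(infix, start + 1)

    search(surface, [])
    return sorted(analyses)
```

Note that the search also tries stripping the "am" inside "kame" and simply finds no root that way, so dead ends are cheap here. With a full lexicon, spurious matches become a real problem, which is one reason a proper morphological analyzer constrains where infixes may sit.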

Na'rìng

I thought out how I would program the parser, but right after that I thought: it is better to be relatively fluent in the Na'vi language first, before I write up the program.

Although an error-correction system would be nice for Open Office! I can imagine my teacher's response when I hand in an essay written in Na'vi... That would be priceless! And 100% grammatically correct, too!

That's my two cents :P.
Eywa ngahu ma smukan.
Eywa'evengä yawne lu oeru.


It kraon. XD!!! (Speak it the Na'vi way)

dcb

I've been entering Na'vi into FieldWorks Language Explorer (FLEx), so I have a Na'vi project database with 460 or so lexemes, including the prefixes, infixes, and suffixes. FLEx contains two parsers, one based on XAmple and the other a phonological parser. I haven't got to the stage yet where it can parse Na'vi, but it shouldn't be too hard to do.

I'm not sure that this would be much use for a computational linguistics course, since there is nothing to program.
It could be useful for many other things, though. Since I can export to a web-based dictionary program (Lexique Pro), it could be a nice way to present the information that we have about Na'vi words on this site.

All the best,
dcb

suomichris

Quote from: dcb on January 19, 2010, 04:29:40 AM
I've been entering Na'vi into FieldWorks Language Explorer (FLEx), so I have a Na'vi project database with 460 or so lexemes, including the prefixes, infixes, and suffixes. FLEx contains two parsers, one based on XAmple and the other a phonological parser. I haven't got to the stage yet where it can parse Na'vi, but it shouldn't be too hard to do.

I'm not sure that this would be much use for a computational linguistics course, since there is nothing to program.
It could be useful for many other things, though. Since I can export to a web-based dictionary program (Lexique Pro), it could be a nice way to present the information that we have about Na'vi words on this site.

All the best,
dcb
Cool!  I've been using FieldWorks for my dissertation; do you think you could forward me your project file so I could take a look at it in FieldWorks?

Our Lady of Toast

Quote from: Na'rìng on January 18, 2010, 09:29:24 PM
I thought out how I would program the parser, but right after that I thought: it is better to be relatively fluent in the Na'vi language first, before I write up the program.

This is very much a "your mileage may vary" area.  I've found that writing parsers gives me a lot of practice that helps me memorize grammatical rules (especially if I'm pairing it with other kinds of study).  Plus, since it forces you to precisely specify your grammar, it sometimes teaches you that you don't know as much as you thought you did.

On the other hand, it's not much help for learning vocabulary (at least while you're getting it working) and you can end up spending a lot of time on obscure things and not so much on common things.

Swok Txon

wow this will be good

i hope you do a great job
I'm rooting for ya!

I had the same idea to do it in C++, but I realized how complicated it would be, and with my tiny knowledge of Na'vi I just couldn't do it lol

Surprise me!


Na'rìng

When I get around to it I'm going to write it all in C#. And you're right: when you are incorporating languages within other languages, you really do learn more about them! Even right now as I'm learning Na'vi, it is also helping me learn more about my main language. It really is amazing! It is also helping me know more about the basis of all human languages... I love it!

dcb

@suomichris

Yes I'll be happy to send you the FieldWorks project that I have put together so far. I'm afraid that I have used it for software testing, and it is far from complete. Still it may be a start.

In the Phonemes area, the phonological features of all the sounds have not been completed.
I have two Natural Classes defined by phonological features. It would probably be safest to delete those and add Consonants and Vowels classes defined by a list of phonemes.

If I can obtain the dictionary data in a spreadsheet format then I could import it. I'd probably start the project from scratch if I were to do that. The advantage would be that the parts of speech for the words would be included.
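
For anyone working outside FLEx, the idea of natural classes defined by a plain list of phonemes can be sketched in a few lines of Python. The inventory below is typed from memory of the usual Na'vi orthography and should be checked against the dictionary. It deliberately leaves out diphthongs and the syllabic "ll" and "rr", and the names `tokenize` and `cv_skeleton` are hypothetical.

```python
# Assumed inventory (verify against the dictionary before relying on it).
VOWELS = ["a", "ä", "e", "i", "ì", "o", "u"]
CONSONANTS = [
    "px", "tx", "kx", "ts", "ng",
    "p", "t", "k", "'", "f", "v", "s", "z", "h",
    "m", "n", "r", "l", "w", "y",
]

# Each natural class is just a set of phonemes, as suggested above.
NATURAL_CLASSES = {"C": set(CONSONANTS), "V": set(VOWELS)}

# Try digraphs before single letters so "tx" is not read as "t" + "x".
PHONEMES_LONGEST_FIRST = sorted(VOWELS + CONSONANTS, key=len, reverse=True)


def tokenize(word):
    """Split a word into phonemes using greedy longest-match."""
    word = word.lower()
    phonemes = []
    pos = 0
    while pos < len(word):
        for phoneme in PHONEMES_LONGEST_FIRST:
            if word.startswith(phoneme, pos):
                phonemes.append(phoneme)
                pos += len(phoneme)
                break
        else:
            raise ValueError(f"no phoneme matches {word[pos:]!r}")
    return phonemes


def cv_skeleton(word):
    """Map each phoneme to the name of the natural class containing it."""
    skeleton = []
    for phoneme in tokenize(word):
        for name, members in NATURAL_CLASSES.items():
            if phoneme in members:
                skeleton.append(name)
                break
    return "".join(skeleton)
```

For example, `tokenize("txon")` treats "tx" as a single consonant, and `cv_skeleton("Na'vi")` counts the glottal stop as a consonant. Greedy matching always prefers "ts" over "t" followed by "s", which is a simplification a real phonological parser would need to revisit.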


Toruk Makto

Member Tuiq has a robot client on the IRC server that is doing some simple parsing and translating. I think he wrote the thing in Java, or Whips-N-Chains, or something equally intuitive, so you may want to see what he is up to. It would be totally magical to have a really good parser/translator on IRC.

Lì'fyari leNa'vi 'Rrtamì, vay set 'almong a fra'u zera'u ta ngrrpongu
Na'vi Dictionary: http://files.learnnavi.org/dicts/NaviDictionary.pdf

Seze

Sounds like a great project.  It would be really nice if you make your project open-source for others in the community to use.  I've received quite a few requests for a translation section in my iPhone App, and I really dislike reinventing the wheel when somebody else already has one that rolls well.  I wish you the best of luck on this...


Learn Na'vi Mobile App - Now Available

Our Lady of Toast

I've been putting in a lot of time on this without much to show, but I hope I've just had a breakthrough.

I was working with the SFST finite-state transducer, and after a lot of work, discovered that the documentation was written by ayskxawng and doesn't match what the program actually does.  When I switched to XFST, I immediately started making a lot more progress.

XFST is free for non-commercial use, so I'm hoping to end up with something that could be used as a module for other larger projects as well.  I will post more as I have more to show.