Tsim Apiak - number converter, translator

Started by Muzer, July 06, 2010, 05:08:17 PM



Muzer

Sh4rK has kindly made an online Na'vi number to octal and decimal converter (only that way around for now), which I am hosting on my server. There is a strong possibility of us making other projects together under the same name, including a Na'vi to English translator (like Eana Eltu but open source - hence the name Tsim Apiak ;)), so stay tuned! (That's why I put "translator" in the title.) In fact, we have already started work on this.

The number converter can be found at http://navi.tim32.org/number
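Since Na'vi counts in base 8, the converter's core job is mapping number words onto place values. The sketch below is a minimal illustration, not Sh4rK's code: the word tables are simplified assumptions covering only the units and the vol (8s) place, so values 1 to 63. Higher places such as zam (64) are deliberately left out.

```python
from __future__ import annotations

# Simplified, assumed word tables (illustration only, not the Tsim Apiak source).
UNITS = {"'aw": 1, "mune": 2, "pxey": 3, "tsìng": 4, "mrr": 5, "pukap": 6, "kinä": 7}

# Multiplier prefixes for the 8s place ("" means a single vol = 8).
VOL_PREFIXES = {"": 1, "me": 2, "pxe": 3, "tsì": 4, "mrr": 5, "pu": 6, "kin": 7}

# Unit endings after the 8s place: vol + aw -> volaw,
# but vol drops its l before a consonant: vo + mun -> vomun.
UNIT_ENDINGS = {"aw": 1, "mun": 2, "pey": 3, "sìng": 4, "mrr": 5, "fu": 6, "hin": 7}


def navi_to_int(word: str) -> int:
    """Convert a Na'vi number word in the range 1-63 to an integer."""
    word = word.strip().lower()
    if word in UNITS:
        return UNITS[word]
    for prefix, eights in VOL_PREFIXES.items():
        for stem in ("vol", "vo"):
            head = prefix + stem
            if not word.startswith(head):
                continue
            rest = word[len(head):]
            if rest == "" and stem == "vol":
                return eights * 8
            if rest in UNIT_ENDINGS:
                return eights * 8 + UNIT_ENDINGS[rest]
    raise ValueError(f"unrecognised number word: {word!r}")


def convert(word: str) -> tuple[str, str]:
    """Return (octal, decimal) strings for a Na'vi number word."""
    n = navi_to_int(word)
    return format(n, "o"), str(n)
```

For example, `convert("mevolaw")` gives `("21", "17")`: two eights plus one. Because octal is Na'vi's native base, the octal output is simply the digits read off the word.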

If anyone wishes to help with this project, get in touch in this thread or on the LN IRC server. It is at a very early stage at the moment and changing far too rapidly, so the source code is not currently available to the general public - but rest assured, once we've sorted out our repository and brought the code into a reasonably stable state, we will make the source fully available to everyone. The code itself is licensed under the GPL3, with the addition of clause 4 of the Apache Licence version 2 (where the two conflict, the GPL trumps any opposing statements in the Apache Licence), in order to ensure proper credit is given to each and every person who contributes. The code will be mostly, if not entirely, Python 2.6.



When the translator itself appears, it will almost certainly use the fantastic SQL database generated by Eana Eltu, and of course give credit to it. Keep up the great work, Tuiq! I've not yet decided whether we'll use your infix data or generate our own, however. Update: we are now using the infix data from Eana Eltu, as it is correct.
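Looking up a root in an SQL dictionary could look roughly like the sketch below. The thread does not describe the Eana Eltu database layout, so the table name `words` and its columns are hypothetical placeholders, and an in-memory SQLite database stands in for the real dump.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for the dictionary (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE words (navi TEXT NOT NULL, english TEXT NOT NULL, pos TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO words (navi, english, pos) VALUES (?, ?, ?)",
        [("kame", "see", "v."), ("tute", "person", "n."), ("plltxe", "speak", "v.")],
    )
    return conn


def lookup_root(conn, root):
    """Return (english, part_of_speech) rows for a root.

    The ? placeholder lets sqlite3 quote the value, so user input cannot
    inject SQL.
    """
    cur = conn.execute("SELECT english, pos FROM words WHERE navi = ?", (root,))
    return cur.fetchall()
```

The parser's job is to reduce an inflected word to a root first. Only that root is looked up, so the database never needs to contain inflected forms.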



EDIT: Here is the link for the latest parser (but not yet a translator): http://navi.tim32.org/parse



EDIT2: Here is the translator: http://navi.tim32.org/translate
[21:42:56] <@Muzer> Apple products used to be good, if expensive
[21:42:59] <@Muzer> now they are just expensive

'eylan na'viyä

About the number converter:
it's good that you made this one first. There is already a converter for the other direction, but this one was missing.
http://www.learnnavi.org/navi-vocabulary/#numbers

PS: Python is a good choice ;)

Sh4rK

I'm glad you like it!

But it's just the beginning, as we are working on a complete translator (as Muzer wrote)!

P.A.'li makto

 Ma tsmukan!
It's cute! I might need it and use it in the future! Worth a karma!

facebook: soaia leNa`vi

Seze

Glad to see another open-source project coming to the community. I did the same thing with the Mobile App when I started it: I kept the code closed at first to get a good base layer in place, then opened it up for everyone. Best of luck!


Learn Na'vi Mobile App - Now Available

'eylan na'viyä

Maybe this saves a bit of work (for the translator): this is what I used for the spell-checking dictionaries. In the script that generates all the infix combinations included in the dictionary, I used these combinations:
eyk|äp|-   ,   am|ìm|ìy|ay|er|ol|arm|ìrm|*ìry|*ary|*alm|*ìlm|*ìly|*aly|asy|ìsy|iv|imv|iyev|ìyev|irv|ilv|-  ,  äng|ei|uy|ats|-     + awn + us
*: not attested
I think this covers at least 99% of daily usage. Checking every possible combination (beyond this selection) would have made the file too big, and they aren't attested anyway.
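The slot list above is effectively a Cartesian product of three positions: pre-first, first and second, where "-" means "no infix". A minimal Python sketch of generating it is below. It is not the PHP script mentioned in this post. The verb template notation `t<1>ar<2>on` is a hypothetical way of marking the two infix positions, and treating awn and us as standalone additions is an assumption about what "+ awn + us" means.

```python
from itertools import product

# The three slots as listed above; "" stands for the "-" (empty) option.
PRE_FIRST = ["eyk", "äp", ""]
FIRST = ["am", "ìm", "ìy", "ay", "er", "ol", "arm", "ìrm", "ìry", "ary",
         "alm", "ìlm", "ìly", "aly", "asy", "ìsy", "iv", "imv", "iyev",
         "ìyev", "irv", "ilv", ""]
SECOND = ["äng", "ei", "uy", "ats", ""]
# Assumption: the participle infixes awn and us are added on their own,
# in the first position, rather than combined with the other slots.
PARTICIPLES = ["awn", "us"]


def infix_combinations():
    """Yield every (pre_first, first, second) triple the list describes."""
    yield from product(PRE_FIRST, FIRST, SECOND)
    for participle in PARTICIPLES:
        yield ("", participle, "")


def apply_infixes(template, pre_first, first, second):
    """Fill a hypothetical template such as "t<1>ar<2>on".

    <1> receives the pre-first infix followed by the first-position infix;
    <2> receives the second-position infix.
    """
    return template.replace("<1>", pre_first + first).replace("<2>", second)
```

This yields 3 × 23 × 5 = 345 combinations plus the two participles. With the template above, `apply_infixes("t<1>ar<2>on", "", "ol", "ei")` gives `"tolareion"`. Adding a missing infix is a one-item change to the relevant slot list.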

So if you are going to use a different set of infix combinations, please let me know. I think it would be best if both scripts shared the same grammar rules.

The dictionary-generating PHP script (you need to create a "cache" folder with read/write permissions in the same directory):

Sh4rK

P.A.'li makto:
Thanks for the karma ;)

Seze:
Thanks!

'eylan na'viyä:
Thanks, we will definitely use that! (it misses "ìmìy" actually, add that to your script :))

To everyone:
The translator is being developed every day and is getting close to release! :D Check back from time to time for updates!

Muzer

I've finished something that can extract infixes from words. Currently it doesn't work with words like tìtìng (tries and fails to parse it as t<ìt>ìng) but I will fix this - and it DOES work with tìsusiti (tì-s<us>i-ti). Prefixes and suffixes I'll attempt today or tomorrow. Sh4rK, meanwhile, has written something to extract the infix positions from IPA, which is (currently) more accurate than Eana Eltu's data.
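One way to extract affixes is to turn each root's infix positions into an anchored regular expression whose optional groups accept only known affixes. The sketch below takes that approach; it is not necessarily how Muzer's code works. The affix inventories are partial, and the templates (`s<1><2>i` for si, `t<1>ì<2>ng` for tìng) are hypothetical notation for the infix positions. Because only listed infixes can match, a bogus split like t<ìt>ìng cannot be produced.

```python
import re

# Partial, illustrative affix inventories (not a complete grammar).
PREFIXES = ["tì", "ay", "me", "pxe"]
SUFFIXES = ["ti", "it", "l", "ri", "ru", "yä"]
FIRST_INFIXES = ["us", "awn", "am", "ìm", "ol", "er", "iv"]
SECOND_INFIXES = ["äng", "ei", "uy", "ats"]


def _alternation(options):
    # Longest options first, so e.g. "awn" is tried before shorter infixes.
    return "|".join(re.escape(o) for o in sorted(options, key=len, reverse=True))


def template_to_regex(template):
    """Turn a template like "s<1><2>i" into an anchored regex with named groups."""
    body = []
    for part in re.split(r"(<1>|<2>)", template):
        if part == "<1>":
            body.append(f"(?P<first>{_alternation(FIRST_INFIXES)})?")
        elif part == "<2>":
            body.append(f"(?P<second>{_alternation(SECOND_INFIXES)})?")
        else:
            body.append(re.escape(part))
    return re.compile(
        f"^(?P<prefix>{_alternation(PREFIXES)})?"
        + "".join(body)
        + f"(?P<suffix>{_alternation(SUFFIXES)})?$"
    )


def parse(word, roots):
    """Return every analysis of word against a dict mapping root -> template."""
    analyses = []
    for root, template in roots.items():
        match = template_to_regex(template).match(word)
        if match:
            found = {k: v for k, v in match.groupdict().items() if v}
            analyses.append({"root": root, **found})
    return analyses
```

With these tables, `parse("tìsusiti", {"si": "s<1><2>i"})` returns `[{"root": "si", "prefix": "tì", "first": "us", "suffix": "ti"}]`, matching tì-s<us>i-ti.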

'eylan na'viyä

Quote from: szabot on July 08, 2010, 11:36:33 AM
'eylan na'viyä:
Thanks, we will definitely use that! (it misses "ìmìy" actually, add that to your script :))
Ah, right! That was one of the things I did not include. There isn't really a reason against it, but it raises a lot of questions:
Is there also an ìyìm, amìy, ayìy, ... or even combinations of combined infixes like iyev, alm, ... (the latter is a bit unlikely in daily conversation, but still possible in theory)?
Anyhow, I'm going to add this infix the next time I make a change.

Sh4rK

Quote from: 'eylan na'viyä on July 08, 2010, 02:32:45 PM
Ah, right! That was one of the things I did not include. There isn't really a reason against it, but it raises a lot of questions:
Is there also an ìyìm, amìy, ayìy, ... or even combinations of combined infixes like iyev, alm, ... (the latter is a bit unlikely in daily conversation, but still possible in theory)?
Anyhow, I'm going to add this infix the next time I make a change.

I don't know if the others are correct, but ìmìy is in the movie. Check the wiki.

Sh4rK

Tuiq made an update, so we no longer need to calculate the infix positions ourselves: Eana Eltu's infix data is now correct.

One less thing to worry about :D

Muzer

Right. While I was busy over the past few days, Sh4rK has rewritten most of my code and implemented prefixes and suffixes perfectly, as well as ordinal numbers and partial support for cardinal numbers. It is quite slow, but that's mainly because it's running on a 9-year-old laptop ;). But I think he deserves quite the round of applause!

The only things left are:

* Optimising - DONE by Sh4rK
* Adding translation into the actual target language (relatively straightforward now that we've extracted the Na'vi root and all of the pre/suf/infixes) - DONE - new link for translator, see first post.
* Testing
* The following extra features/bugfixes (EE being Eana Eltu)

TODO for Tsim Apiak

* Cardinal Numbers - not really in EE - DONE by Sh4rK
* Poltxe (rather than pollltxe) - not in EE - DONE by Muzer by adding the ability to add special-case words
* Ignore prefixes in dictionary when they stand alone (which is theoretically, at least, illegal - eg tsa) - not in EE - DONE by Sh4rK
* Tsa- prefix - in EE - DONE by Muzer (this was quite a nasty bug that prevented one or two prefixes/suffixes from working; hopefully I've sorted it)
* Pak, to, possibly some more - not in EE - to added as a suffix/adposition, but not yet as a word in its own right. We might work on our own mini word list for things like this that aren't in the SQL. Taronyu moved pak to a different section, so it's now in EE. - DONE by Muzer by adding the ability to add special-case words
* Ordinal numbers - not really in EE - DONE by Sh4rK.
* ftxey... fuke special case - not in EE - half-implemented by Taronyu - fuke now works as it is a word in its own right, but ftxey is still always "choose".
* Fix "a" on its own - in EE - DONE by Muzer
* Lenition of prefixes - not in EE - DONE by Muzer
* Implement more readable, multi-language terms for affixes - Imperfective rather than IMPF., for example
* Add multi-language support for the hardcoded wordlist
* Make it more strict when checking affixes (so noun affixes can only exist on a noun, for instance, and that mutually exclusive groups of affixes (eg case endings) can't be applied too many times)
* Tsar, etc.



Here is the new link: http://navi.tim32.org/parse



Please post any bugs whatsoever here and I'll add them to the TODO list.

Muzer

Just fixed the link, and Sh4rK fixed another bug.


EDIT: Oops, I "fixed" the wrong link and actually broke it. Changing right now...

Muzer

The translator now works! Sh4rK and I have been working pretty hard to implement some more features and fix some more bugs, and this is the result! It still isn't anywhere near final, however - it's still painfully slow (admittedly, mostly because the server itself is slow), and there are still plenty of things we need to fix (see two posts above), though this list is shrinking by the day ;)


http://navi.tim32.org/translate



Please test, leave any feedback (good/bad/whatever). I hope Sh4rK will agree that around now would be the best time to release the source - next time I see him, I'll discuss it and hopefully we can get it all out. Then maybe someone else with a faster server can host it! [In my dreams :P]

And remember - when we do release the source, the whole point is that people can improve it. However, it's more fruitful, and in everybody's best interest, to send any changes back to us that you make, rather than creating a fork - then we can all be working on the same code base, which is always a good plan. Of course, there won't be anything to stop you from making a fork and hosting it yourself, as long as you also release the source, but just consider sending your changes to us first. After all, if we like your changes, we'll probably just take them out of your fork and put them in ours anyway, so forking it would be twice as much work for you with pretty much the same outcome. Of course, offering to mirror it and forking it are two different things - mirrors are more than welcome!

Muzer

Source code!

Checkout here:

svn://tim32.org/navi

Browse here:

http://websvn.tim32.org/


(If you're wondering why there are so many revisions: the way we have it set up, we have to commit something in order to test it.)



If you want to be able to commit code to the svn, let me know and I'll give you an account.




EDIT: Sh4rK has made it many many many times faster!

Sh4rK

Did you notice that sometimes you call me Sh4rK, sometimes szabot?

lol

EDIT:
Quote from: Muzer on July 19, 2010, 05:28:23 PM
Sh4rK has made it many many many times faster!

With your idea :P

EDIT 2:
I am now officially Sh4rK, so please correct every occurrence of "szabot" to "Sh4rK".

Muzer

We just added a few things to the translator, crossing off all but one of the original items on the TODO list. I have added more, however, so we still have work to do ;)


We now have a hardcoded list of extra words that may be needed - Na'vi, tawtute and to are the only full words on it so far. If there are any more words not in Eana Eltu that you want implemented, let us know and we'll add them!
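A hardcoded word list works naturally as a first lookup stage that is checked before the main dictionary. The same mechanism also covers special cases like poltxe (p<ol>lltxe, which would otherwise come out as pollltxe). The sketch below is a hypothetical shape for such a list, mapping each word to (lemma, infixes, gloss). It is not the project's actual data structure, and the main dictionary is passed in as a plain function.

```python
# Hypothetical shape for the hardcoded list: word -> (lemma, infixes, gloss).
EXTRA_WORDS = {
    "na'vi": ("Na'vi", [], "the People"),
    "tawtute": ("tawtute", [], "skyperson"),
    "to": ("to", [], "than"),
    # Special case: p<ol>lltxe is written poltxe, not pollltxe.
    "poltxe": ("plltxe", ["ol"], "speak"),
}


def lookup(word, dictionary_lookup):
    """Check the hardcoded list first, then fall back to the main dictionary."""
    key = word.lower()
    if key in EXTRA_WORDS:
        return EXTRA_WORDS[key]
    return dictionary_lookup(key)
```

Checking the extras first means a special case always wins over whatever the regular parser would produce for that word.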

Carborundum

I'd like to know what the long and short-term goals for this project are, so that I can provide meaningful testing.
Is it ultimately supposed to be a full-scale translation engine, like Google Translate and whatnot? Or is it merely meant to find and extract affixes?
As you know, it currently accepts affixes that it shouldn't, like aykameie --> see, see into, understand, know (spiritual sense)-LAUD.-PL.. Is this considered a known bug, or do you need more examples of it doing it?

Anyhow, great work so far!
We learn from our mistakes only if we are made aware of them.
If I make a mistake, please bring it to my attention for karma.

Tsamsiyu92

Your number translator is having problems with words like zamevol.

Muzer

@Carborundum: Eventually I am hoping to be able to translate into fluent English - but that is very much a long-term goal. As for your bug - yes, it is known, and yes, it will be fixed - but if you find any valid words that are incorrectly interpreted due to the bug, let me know and I'll add a temporary workaround.
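The aykameie bug is the TODO item about stricter affix checking. The plural prefix ay- is accepted on the verb kame alongside <ei>. A fix can be sketched as a validation pass after parsing: each affix records which word class it attaches to and which mutually exclusive group it belongs to. The affix table and group names below are illustrative assumptions, not the project's data.

```python
from collections import Counter

# Illustrative table: affix -> (word class it attaches to, exclusive group).
AFFIXES = {
    "ay-": ("noun", "number"),
    "me-": ("noun", "number"),
    "pxe-": ("noun", "number"),
    "-l": ("noun", "case"),
    "-ti": ("noun", "case"),
    "-ru": ("noun", "case"),
    "-yä": ("noun", "case"),
    "<ei>": ("verb", "attitude"),
    "<äng>": ("verb", "attitude"),
    "<ol>": ("verb", "aspect"),
    "<er>": ("verb", "aspect"),
}


def is_valid(part_of_speech, affixes):
    """Reject an analysis that uses an affix on the wrong word class,
    or more than one affix from the same exclusive group."""
    group_counts = Counter()
    for affix in affixes:
        required_pos, group = AFFIXES[affix]
        if required_pos != part_of_speech:
            return False
        group_counts[group] += 1
        if group_counts[group] > 1:
            return False
    return True
```

Under this table, `is_valid("verb", ["ay-", "<ei>"])` is False, so the ay-k<am><ei>e reading would be discarded. A noun with one plural prefix and one case ending still passes.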

@Tsamsiyu92: Interesting - I'll have a look when I get home. The number stuff is Sh4rK's, but I might be able to figure it out.