Eana Eltu: Translator, Dictionary, API and putxìng.

Started by Tuiq, January 07, 2010, 04:20:17 PM


Tireapamtseo

This is such a wonderful tool!  I will most definitely be using this, especially once I better understand grammar (and can tell it more accurately what/how to translate, etc.). 
Irayo!!!
Ewya, srung si oeru fte tsun kivame a fì'u ke tok kea keyey mì sìngop ngengayä,
Tírey srereu nì'aw.

Alìm Tsamsiyu

This thing is fricking awesome.  Good job dude.

One thing that needs to be changed that I noticed is that the (newer) different forms of the suffixes (like the dative -r and -ur along with the already known -ru) do not translate, even in advanced mode.

I also noticed that lenited forms are sometimes missed: I tried "eylanä", for example, and it could not translate it. It also doesn't translate "eylan" as "PL-friend (lenited)" as it should.

It seems to have problems with shortened lenited forms in general.
Oeyä ayswizawri tswayon alìm ulte takuk nìngay.
My arrows fly far and strike true.

Tuiq

It doesn't. It's caused by the out-of-dateness caused by my laziness. However, those times are over. I've coded a 'server' for EanaEltu; the website and the bot now query the same database. They will always be up to date, unlike until now, when the bot was newer. Also, it shouldn't have any problems with eylanä anymore (although that's not a bug, it's just not implemented: -LENITED does not automagically mean -PLURAL, because the lenition could also come from a preceding mì, which would need a check).

Also, added the possibility to translate to Dutch, thanks to leofox. leofox, taronyu and I are working on a better system to translate his dictionary; until then I won't accept any more languages (but you can, of course, begin to translate, if it doesn't bother you that you may have to redo everything if you want your language to be included in EanaEltu).

Anybody interested in JSON-ing the EanaEltu server directly? Yes, it's possible. You could use the EanaEltu engine and code your own interface around it.
Eana Eltu: PDF/TSV/jMemorize

Tuiq

I am glad to announce the newest feature on EanaEltu: TRANSLATIONS. It will be possible to translate taronyu's dictionary to the language of your choice. Keeping it up to date will be very easy since it'll display which of your inputs are "outdated".

As of now, translations for Dutch and Hungarian are in progress. I'm trying to do it for German (test and "haha lol" purposes). I'm looking forward to releasing it this weekend or next week. If you are interested in translating it, send me a PM and I'll send you more information. You really need a lot of time to keep this up to date. Please make sure you are capable of doing this.

Tuiq

#44
Long time since I've written something. Anyway, the short version:
- The translation system is working. Translations for Hungarian, Brazilian Portuguese, Dutch and German are in progress. See post above for more information.
- Added -uy- (-CER, honorific/ceremonial infix)
- Fixed a bug with all adpositions/prepositions containing ì, ä or '
- Fixed stupid "luyu is lu-yu" bug - instead it's "l<uy>u" now.
- "Nightly" generated PDF, TSV and CSV. Any important format I'm missing? If you use one of these in your applications, please give credit to me and Taronyu (the translators should be honoured too if you're using a localized version).
- navi_translate.pl has nicer strings now.
- Using short English terms according to the dictionary.

If somebody wants a different format for their program or API access (asking EanaEltu in realtime for translations, word lookups or Na'vi numbers), PM me.

Ftiafpi

Thanks for all your hard work on this, I (and obviously many others) use this every day.

Tuiq

#46
Thank you.

The EanaEltu API has finally opened its gates. Attached is a (hopefully working) version of ShoutNavi.pm, the Perl implementation for communicating with the API. In short: POST to http://dvi.clonk2c.ch/navi_api.pl with a field data=JSONDATA, where JSONDATA is the JSON string. For implementation details take a look at the Perl file; it's not documented, but it's quite small and the design is really easy to understand.

Also attached is ShoutNaviTest.pl, which allows you to take a look at a sample sentence, 'Eywa ngahu, ma smukan sì smuke'. You cannot change this sentence; it is hard-coded in the API. If you like the API and want to implement Eltu in your program, PM me for a free unlimited API key.


So, yes, I hope people can use it.

Update

I lowered the restrictions for the public key a bit. It's now possible to use askLookup along with askTranslation with the beta key. Also, introduced askLanguages to get the (available) language codes.

And, now I'm going to explain /how/ it works.

Every request to http://dvi.clonk2c.ch/navi_api.pl has the same format:
It's a POST with a field called data, containing the JSON string.

JSON:
The whole thing is a HASH.

  • key: This is the key you've got from me (or the demo). It's to identify the application. STRING.
  • request: This is the request you want to do (lookup / translate / num / lcs). STRING.
  • data: Not required for lcs. This is the input for the request. STRING.
    For num, this is the number. It's assumed to be a decimal number unless it starts with 0 or 0o ("zero", "zero o").
    For translate and lookup, this is the word/sentence to look up.
  • langs: Only required for lookup. Array containing the language codes (2 to 5 chars, eng, nav, nl, ptbr, ru, de... use lcs to get a list). ARRAY
  • exact: Only required for lookup. Tells Eana to look *exactly* for this word. 0/1.

The answer is always a hash. It has a field called 'status' which is either 'Success' or 'Failure'. If it has failed there should be a 'message' field saying what went wrong (but don't rely on it). The data field may contain any type: for example, translate and lookup return arrays, whereas lcs returns a hash and num a simple string. I recommend simply playing around with it. It's not /that/ hard even if the names are a bit cryptic. They ALL make sense.
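
For anyone who would rather not read the Perl file, the request format described above can be sketched in Python. This is only a sketch under assumptions: the thread says "POST with a field called data", so ordinary form encoding is assumed here, and the function names (build_payload, encode_body, parse_answer, ask) are made up for illustration and are not part of ShoutNavi.pm.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "http://dvi.clonk2c.ch/navi_api.pl"  # URL as given in the post


def build_payload(key, request, data=None, langs=None, exact=None):
    """Build the request hash: key, request, and the optional fields."""
    payload = {"key": key, "request": request}
    if data is not None:       # not required for lcs
        payload["data"] = data
    if langs is not None:      # only required for lookup
        payload["langs"] = list(langs)
    if exact is not None:      # only required for lookup, sent as 0/1
        payload["exact"] = 1 if exact else 0
    return payload


def encode_body(payload):
    """Form-encode a single field named 'data' holding the JSON string."""
    return urlencode({"data": json.dumps(payload)}).encode("ascii")


def parse_answer(raw):
    """Decode the answer hash and raise if status is not 'Success'."""
    answer = json.loads(raw)
    if answer.get("status") != "Success":
        # 'message' should be there on failure, but is not guaranteed
        raise RuntimeError(answer.get("message", "request failed, no message given"))
    return answer.get("data")


def ask(key, request, **fields):
    """Send one request to the live server (assumes the server is reachable)."""
    body = encode_body(build_payload(key, request, **fields))
    with urlopen(Request(API_URL, data=body)) as resp:
        return parse_answer(resp.read().decode("utf-8"))
```

With a valid key, a lookup would then be something like ask("YOUR-KEY", "lookup", data="srung", langs=["eng"], exact=True), and the list of language codes ask("YOUR-KEY", "lcs").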

Tuiq

Update. New navi_translate.pl, navi_dict.pl and some work on the API. navi_dict.pl looks a bit.. uh. complex. untidy. Any comments on that?


Tuiq

Because something is badly messed up in the daemon's code, I had to shut down the server (including the bot, the API and everything else). I apologize and will try to solve this problem this afternoon.

Mithcoriel

Oooooooooh... I wish I'd found this thread sooner. XD I wanted to make a translator too! Well, guess that's redundant now..
My respect, as I know how hard it is to program this stuff. O_O

By the way, is there a downloadable version of this?

Anyway, I played around testing some words. Here's some possible buggies:
- English to Na'vi: doesn't recognize the word "interpret", and also not "translate" (should translate to "ralpeng")
- I would recommend making it tolerant to i=ì and à=a and the like
- English to Na'vi: doesn't recognize the word "people". (should translate as "na'vi" or maybe "aysute")
- Na'vi to English: doesn't recognize the word "na'vi"
- You should also teach it animal words, like Palulukan, Toruk, Ikran...
- I entered "prrte" (was something Taronyu said, and I got only half the word. Forgot how it went.) and the result was: porrte (po) (Contradicted. Dative. Dative. Accusative. Feminime.) I would make sure it can't take the Dative twice. (Also, a feminime particle after a dative particle?). I think I would have solved that with some boolean (Taken Dative yet?true or false. Taken Plural? true or false) (same thing with ayayayaypun = pxun = arms)
(-what does pivängko mean? Someone here said it, but the machine doesn't recognize it.)
-srayung, which should be translated as the future form of "help", is translated as "sirayung (si) (Contradicted. Dative. Adjective attributed. Derived nominative agent noun. Pronounal clusivity.)"
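
The "taken Dative yet?" idea from the list above could look roughly like this in Python. It is a sketch only: the tag names and the groups of mutually exclusive tags are assumptions made for illustration, and nothing here reflects how Eana Eltu stores its analyses internally.

```python
# Hypothetical tag groups: at most one tag from each group may appear
# in a single analysis of a word.
EXCLUSIVE_GROUPS = [
    {"DAT", "ACC"},    # assumed: a word carries at most one case marker
    {"MASC", "FEM"},   # assumed: masculine and feminine exclude each other
]


def is_plausible(tags):
    """Return False if a tag repeats or two exclusive tags occur together."""
    if len(tags) != len(set(tags)):
        return False                      # e.g. Dative taken twice
    for group in EXCLUSIVE_GROUPS:
        if len(group.intersection(tags)) > 1:
            return False                  # e.g. Dative plus Accusative
    return True
```

An analysis like ["DAT", "DAT", "ACC", "FEM"] (the "porrte" case) would be rejected, while ["DAT", "FEM"] would pass.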

I might be testing this thing a bit more often. ^^ Since I'm already in the mindset of optimizing-translators etc.


Ayoe lu aysamsiyu a plltxe "Ni" !
Aytìhawnu ayli'uyä aswok: "Ni", "Peng", si "Niiiew-wom" !

Tuiq

- English to Na'vi is really, really, really outdated and not supported anymore.
- It was tolerant once; that caused many problems. So no, not really. There's a difference between ì and i: see si and sì, for example. "helo" is not interpreted by computers as "hello", is it?
- Proper names are not included anymore. (Na'vi, Nantang)
- Giving it some kind of "intelligence" to avoid such things (like -MASC-FEM, or, like you said, DAT twice) is on the "do it some day when I get bored" to-do list. Please let the implementation be my problem. It's quite confusing anyway. :D
- p<iv><äng>ko, would be my guess - actually, pko would be an invalid word (imho), so it can just be one of these magic words (like kìyevame) which are not included in the dictionary yet.
- Actually, it's "srung si", so you'd have to say srung sayi (or something like that). It's the second word, Eana can not handle "torn apart" verbs yet (although it DOES recognize that 'srung si' itself is a compound verb).

And no, there's no downloadable version of this, only the API. The first post explains why I won't release the source.

Mithcoriel

Quote: There's a difference between ì and i, see si and sì for example. "helo" is not interpreted by computers as "hello", is it?

I know there is a difference. But point is, it's very likely that people will confuse the two. Someone might misspell a word in a forum post, writing it with i instead of ì, and then another person wanting to translate it will be unsuccessful.
Filtering out "helo" as "hello" is difficult cause it requires the computer to be intelligent and figuring out there's an extra letter there. I'd say that's why computers don't filter it out, not cause the makers assumed it wasn't necessary for it to be filtered out. But i and ì are more easily implemented.

Tuiq

It's not really. And it really DOES cause problems. Believe me, I had this implemented some time ago. It just replaced [ìîíï] with i, and did the same with ä. It caused nothing but problems. You also have to imagine what happens if you throw brbttbtbasbwrbnasd into the translator and it says it's "Hello, my son": people won't learn anything. If the translator can't translate it, they may ask for feedback on what's wrong. And THIS is what produces a learning effect.

Mithcoriel

Oh, sure, if it replaced multiple ì with i, then I can understand why you removed it. I just meant if there was a way to replace single ì, then it would be useful. (And that's strange...cause there must be some way to do that.)

Quote: You also have to imagine what's happening if you threw brbttbtbasbwrbnasd into the translator and it says it's "Hello, my son"

That's why I was suggesting the "no multiple datives" and similar suggestions. That's exactly what causes that kind of thing.
Kaltxi, on the other hand, is a na'vi word to me.

Tuiq

I just think, there's a reason you write words in the way you do it. For example, "Apfel" and "Äpfel" are not the same in German - one is singular, the other is plural. There are more stupid examples in French, for example.

Look, tell me a way to handle this. Adding an extra checkbox? What should it do then? And for the best example, what if somebody types "si". What do I expect?

Notorious

Irayo! Very nice!
Adding a comment to find my way back ^^

Mithcoriel

Quote from: Tuiq on March 06, 2010, 12:25:06 PM
I just think, there's a reason you write words in the way you do it. For example, "Apfel" and "Äpfel" are not the same in German - one is singular, the other is plural. There are more stupid examples in French, for example.

I know it's very different in German. But for English-speaking people writing Na'vi, an a and an ä and an à are probably close to the same thing.
It's true, maybe the ä really is a stronger case, so maybe the translator should be tolerant of ì/i, but not ä/a.

Quote from: Tuiq on March 06, 2010, 12:25:06 PM
Look, tell me a way to handle this. Adding an extra checkbox? What should it do then?

You mean how to implement the i/ì tolerance? The way I did it: in the function that compares words ("does the input match this word in the wordlist" or whatever), it first replaces every ì in the input word with i (in a temporary variable, of course), and does the same thing with the word in the vocab list.
(just like you do when you want it to ignore case, a.toLowerCase() == b.toLowerCase(), only here it's a.replace("ì","i") == b.replace("ì","i") )

Quote from: Tuiq on March 06, 2010, 12:25:06 PM
And for the best example, what if somebody types "si". What do I expect?

An excellent point. I thought there were no words where replacing an ì with i changes the meaning. But "si" is. (hopefully the only one?)  I, personally, would just tell it to be tolerant unless the word is si/sì.
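
Spelled out in Python, the "tolerant unless the word is si/sì" idea might look like the sketch below. The function names and the STRICT_WORDS set are made up for illustration; only the si/sì pair comes from this thread. The ambiguous() helper is an extra that checks a word list for other pairs that would collide, since si/sì may not be the only one.

```python
import unicodedata

# Words that must be matched exactly because folding ì to i would
# merge two different words. Only si/sì is known from the thread.
STRICT_WORDS = {"si", "sì"}


def normalize(word):
    """Lowercase and use composed characters, so 'i' + grave accent equals 'ì'."""
    return unicodedata.normalize("NFC", word).lower()


def fold(word):
    """Normalize, then treat ì as i (the tolerant form used for comparison)."""
    return normalize(word).replace("ì", "i")


def matches(user_input, entry):
    """Compare input with a dictionary entry, tolerant of i/ì except for strict words."""
    a, b = normalize(user_input), normalize(entry)
    if a in STRICT_WORDS or b in STRICT_WORDS:
        return a == b
    return fold(a) == fold(b)


def ambiguous(words):
    """Group words that become identical once ì is folded to i."""
    groups = {}
    for w in words:
        groups.setdefault(fold(w), set()).add(normalize(w))
    return {k: v for k, v in groups.items() if len(v) > 1}
```

Running ambiguous() over the full dictionary would show whether the strict list needs more entries than si/sì before the tolerance is switched on.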

Seze

I'm impressed with the progress on this.  I've been using the latex files Taronyu was pushing out, but I've recently found that they have been deprecated and that this project is the replacement.  One thing that would really help me out in using this for my iPhone App is to separate the part of speech as its own field in the CSV/TSV files.  I am pretty sure that I can easily modify the files myself, but it would be really nice to have this in the files to begin with.  Also, is there any way to check the version number of the CSV/TSV files?  One of the features that was requested for the iPhone App was a way to download updates to the dictionary without creating new versions of the App.  Perhaps either through your API or a basic text file that has the version number in it. 


Learn Na'vi Mobile App - Now Available

Tuiq

You can use the power of HTTP itself to get the "version":
Last-Modified: Mon, 01 Mar 2010 11:28:27 GMT


Check out the HTTP Headers. If you HEAD the file it sends you all the headers, but not the file itself. So, my way to do it:

- HEAD the .csv
- If the saved Last-Modified != this Last-Modified: GET it.

This way you're saving quite a bit of traffic.
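
The two steps above could be sketched in Python like this, using only the standard library. The function names are made up for illustration, and the URL of the .csv is left as a parameter because it is not given in this thread.

```python
from urllib.request import Request, urlopen


def fetch_last_modified(url):
    """HEAD the file: the server sends the headers but not the body."""
    with urlopen(Request(url, method="HEAD")) as resp:
        return resp.headers.get("Last-Modified")


def needs_download(saved, current):
    """Download if the header changed, or if the server sent none at all."""
    return current is None or saved != current


def sync(url, saved):
    """Return (new Last-Modified value, file bytes or None if unchanged)."""
    current = fetch_last_modified(url)
    if not needs_download(saved, current):
        return saved, None
    with urlopen(url) as resp:   # plain GET
        return current, resp.read()
```

The app would store the returned Last-Modified string next to its copy of the dictionary and pass it back in as saved on the next check.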


To be honest, I don't understand what you mean by "part of speech". Aren't (v.), (n.) already included in the English/localized column? It should be quite easy to extract them when you're parsing the file, imho.