Eana Eltu: Translator, Dictionary, API and putxìng.

Started by Tuiq, January 07, 2010, 04:20:17 PM

Kä'eng

I found some apparently incorrect infixing in the .sql. Several words have multiple syllables after the <3> position:

fkarut, ftia, ska'a, sngä'i, spe'e, steftxaw, tsranten, tswayon, fkxake, sleyku, spule, srese'a, steyki, tsngawvìk, tsre'i, stä'nì

Also, the words kllfro' and kllkxem have <1> and <2> in the first syllable when they belong in the second (see http://forum.learnnavi.org/language-updates/misc-answers/)
Ma evi, ke'u ke lu prrte' to fwa sim tuteot ayawne.
Slä txo tuteo fmi 'ivampi ngat ro seng, fu nìfya'o, a 'eykefu ngati vä', tsakem ke lu sìltsan.
Tsaw lu ngeyä tokx! Kawtu ke tsun nìmuiä 'ivampi ngat txo ngal ke new tsakemit.
Ha kempe si nga? Nì'awve, nga plltxe san kehe. Tsakrr, ngal tsatsengti hum!

Tuiq

There is no such thing as "incorrect infixing in the .sql". Either it's in the whole system or it's nowhere.

kllfro' and kllkxem are "irregular" in the sense that they have a root position which is not recognized by Eana Eltu as such. The compound verb rule only applies to THING + verb, and since kll is not in the dictionary, it can't be detected correctly. I doubt that it will be in the dictionary again because Taronyu has recently thrown out the roots, so this will stay invalid for quite a while.

The problem with all these words is most likely that they are not split into syllables correctly. For Eana, each one is one big-ass syllable. I'd be glad if you could list all these words with their syllables.

Edit:
I kind of "fixed" it. It's now tsw<1><2>ay<3>on for tswayon. I have no idea if that is correct (tsw?). If it is, I'll update the SQL files (they might be updated with that format soon anyway).
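For anyone unfamiliar with the notation, here is a minimal sketch of how a template such as tsw<1><2>ay<3>on could be filled in. The function name `apply_infixes` and the filler string are made up for illustration; this is not Eana Eltu's actual code, just one way to read the <1>/<2>/<3> markers.

```python
import re

def apply_infixes(template: str, first: str = "", second: str = "", third: str = "") -> str:
    """Fill the <1>, <2> and <3> slots of an infix template.

    Empty strings leave a slot unused, so calling this with no
    infixes returns the plain word.
    """
    fillers = {"1": first, "2": second, "3": third}
    # Replace each <n> marker with the filler chosen for that slot.
    return re.sub(r"<([123])>", lambda m: fillers[m.group(1)], template)

print(apply_infixes("tsw<1><2>ay<3>on"))               # tswayon
print(apply_infixes("tsw<1><2>ay<3>on", second="ol"))  # tswolayon
```

Whether the template itself is right (the "tsw?" question) is a separate matter; the sketch only shows the mechanical substitution.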
Eana Eltu: PDF/TSV/jMemorize

Taronyu

If you send me a list of those, as well, I may be able to ask Frommer if he could make them official words.

Tuiq

I think all these verbs you listed are now working properly. The SQL file is updated. About these kll-things, there's not really anything I can do. I do not want to include extra data for irregular verbs because it has worked without it so far.

Tuiq

Please note that the SQL file is now under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. The URL has changed: the file is now NaviData.sql, not NaviDictionary.sql.

Tuiq

With a few bugs left, Eana is now capable of things like eltu si.

omängum fra'uti

Quote from: Tuiq on June 26, 2010, 06:22:39 AM
I think all these verbs you listed are now working properly. The SQL file is updated. About these kll-things, there's not really anything I can do. I do not want to include extra data for irregular verbs because it worked without as of yet.
If you don't consider being correct important, why have the information there at all?  Finding the infix positions for most verbs is easy; the marking matters most for irregular verbs, which is exactly where you seem to feel it isn't worth dealing with.

THIS is the reason having multiple dictionaries can be a good thing.  If people go to wikibooks or the LN wiki, they will be able to find or figure out the correct infix positions.
Ftxey lu nga tokx ftxey lu nga tirea? Lu oe tìkeftxo.
Listen to my Na'vi Lessons podcast!

Taronyu

Quote from: omängum fra'uti on June 30, 2010, 05:29:38 PM
Quote from: Tuiq on June 26, 2010, 06:22:39 AM
I think all these verbs you listed are now working properly. The SQL file is updated. About these kll-things, there's not really anything I can do. I do not want to include extra data for irregular verbs because it worked without as of yet.
If you don't consider being correct important, why have the information there at all?  Finding the infix positions for most verbs is easy, where it's important having the marking is for irregular verbs, which is exactly where you seem to feel it isn't worth dealing with.

THIS is the reason having multiple dictionaries can be a good thing.  If people go to wikibooks or the LN wiki, they will be able to find or figure out the correct infix positions.

I have those marked. You know that, right?

omängum fra'uti

Quote from: Taronyu on June 30, 2010, 06:43:39 PM
I have those marked. You know that, right?
I wasn't commenting on your stuff there, but rather on Tuiq's generated stuff.  And given that you DO have it marked (and have for a while, yes?  Just recently changing HOW it was marked), it seems like it would be simple to use that in an automated fashion to generate correct infix positions.  Since all the irregular forms from compound verbs have the infixes in the last syllable, it would be a simple matter of counting the marks in the IPA and using that to determine whether to mark one or two locations.  That is certainly a lot easier than trying to compare every word against every other word to find compounds.

Muzer

Woah, woah, easy there - counting the marks in the IPA? You'd essentially have to write some Na'vi TTS software (minus the actual voice synthesis) to figure out how it would be pronounced before you can really start to parse the IPA. Unless Tuiq has some ingenious scheme in mind...


(And if someone DOES write that, I'd be happy to take the code and turn it into a full-blown speech synthesis thing ;))
[21:42:56] <@Muzer> Apple products used to be good, if expensive
[21:42:59] <@Muzer> now they are just expensive

Taronyu

Quote from: Muzer on July 01, 2010, 08:27:41 AM
Woah, woah, easy there - counting the marks in the IPA? You'd essentially have to write some Na'vi TTS software (minus the actual voice synthesis) to figure out how it would be pronounced before you can really start to parse the IPA. Unless Tuiq has some ingenious scheme in mind...


(And if someone DOES write that, I'd be happy to take the code and turn it into a full-blown speech synthesis thing ;))

No? I have the IPA written in the dictionary.

Tuiq

Quote from: omängum fra'uti on June 30, 2010, 05:29:38 PM
Quote from: Tuiq on June 26, 2010, 06:22:39 AM
I think all these verbs you listed are now working properly. The SQL file is updated. About these kll-things, there's not really anything I can do. I do not want to include extra data for irregular verbs because it worked without as of yet.
If you don't consider being correct important, why have the information there at all?  Finding the infix positions for most verbs is easy, where it's important having the marking is for irregular verbs, which is exactly where you seem to feel it isn't worth dealing with.

THIS is the reason having multiple dictionaries can be a good thing.  If people go to wikibooks or the LN wiki, they will be able to find or figure out the correct infix positions.

I never said anywhere that my data is correct. As you may have noticed, Eana Eltu is not a dictionary. Everybody is free to use whatever they want. I do not force people to use Eana Eltu's SQL Data or anything else.

Quote from: omängum fra'uti on June 30, 2010, 07:29:04 PM
Quote from: Taronyu on June 30, 2010, 06:43:39 PM
I have those marked. You know that, right?
I wasn't commenting on your stuff there, rather Tuiq's generated stuff.  And given that you DO have it marked (And have for awhile, yes?  Just recently changing HOW it was marked), it seems like that would be simple to use that in an automated fashion to generate correct infix positions.  Since all the irregular forms from compound verbs have the infixes in the last syllable, it would be a simple matter of counting the marks in the IPA and using that to determine whether to mark one or two locations.  That is certainly a lot easier than trying to compare every word against every other words to find compounds.

It's not marked properly. It's marked just by adding "(foo)" to the verb. I can hardly count that as "it's marked", because that would only lead to exceptions (wherever something is put in brackets: any comment, unrelated to infix stuff). It was and is way too much effort for me to parse that. It's simply not worth it.

Also, I have no idea about IPA. Or TeX. And I'm not going to learn anything more about them than is required to keep things running in their current state. By treating them as compound verbs, it's much easier to handle them. And it works for 99% of the verbs. Your way is not a lot easier. It's way easier to just for-each over all words again, once at initialization, and check whether each part is in the dictionary. Doing IPA IPA stuff is bad. IPA IPA!
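To make the compound approach concrete, here is a rough sketch of the "for-each all words and check" idea. Everything in it is an assumption for illustration: the function name `find_compounds`, the split-point search, and the toy word lists (which are not real dictionary data). It is not the actual Eana Eltu code.

```python
def find_compounds(words, verbs):
    """Map each verb to (prefix, base_verb) when it splits into two known entries.

    `words` is the set of all dictionary entries, `verbs` the subset that
    are verbs. A verb counts as a compound only if some split point gives
    a prefix found in `words` and a remainder found in `verbs`.
    """
    compounds = {}
    for verb in verbs:
        # Try every split point from left to right and stop at the first hit.
        for i in range(1, len(verb)):
            prefix, base = verb[:i], verb[i:]
            if prefix in words and base in verbs:
                compounds[verb] = (prefix, base)
                break
    return compounds

# Toy data: "kll" is deliberately missing, as described earlier in the thread.
verbs = {"yom", "tìng", "yomtìng", "kxem", "kllkxem"}
words = set(verbs)
print(find_compounds(words, verbs))  # {'yomtìng': ('yom', 'tìng')}
```

With "kll" absent from the word list, kllkxem is not recognized as a compound, which is the limitation discussed above.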

To be honest, I have other projects now. Older projects, way more interesting projects. So please, the next time you say that something here is not correct and that you don't like it, remember WHY EE was created. Eana Eltu was created to show Tsepxor how easy it would be to have Na'vi->English and English->Na'vi word lookup (something nobody uses Eana for anymore). Having had a rush of interest in parsing and creating languages at the beginning of this year, I extended it a little bit. The SQL extension and the "eltu si" thing were most likely the last big changes.

Muzer

Quote from: Taronyu on July 01, 2010, 09:28:45 AM
Quote from: Muzer on July 01, 2010, 08:27:41 AM
Woah, woah, easy there - counting the marks in the IPA? You'd essentially have to write some Na'vi TTS software (minus the actual voice synthesis) to figure out how it would be pronounced before you can really start to parse the IPA. Unless Tuiq has some ingenious scheme in mind...


(And if someone DOES write that, I'd be happy to take the code and turn it into a full-blown speech synthesis thing ;))

No? I have the IPA written in the dictionary.

Yes, but figuring out which parts of the word translate to which parts of the IPA sounds, to me, to be non-trivial.

omängum fra'uti

Quote from: Muzer on July 01, 2010, 11:38:15 AM
Quote from: Taronyu on July 01, 2010, 09:28:45 AM
Quote from: Muzer on July 01, 2010, 08:27:41 AM
Woah, woah, easy there - counting the marks in the IPA? You'd essentially have to write some Na'vi TTS software (minus the actual voice synthesis) to figure out how it would be pronounced before you can really start to parse the IPA. Unless Tuiq has some ingenious scheme in mind...


(And if someone DOES write that, I'd be happy to take the code and turn it into a full-blown speech synthesis thing ;))

No? I have the IPA written in the dictionary.

Yes, but figuring out which parts of the word translate to which parts of the IPA sounds, to me, to be non-trivial.
But you don't need to.  It's a specific character that Taronyu uses to represent the infix position (currently two actually, but always either one or the other).  Simply finding the occurrences of that character tells you whether it appears once or twice.  If it appears once, then in 100% of the cases it appears in the last syllable of the word it's in.  (Cases like tìng nari need it in the first word, not the second, but it's still in the final syllable of that word, and you can figure out which word it is in by looking at the spacing.)
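Roughly, the counting could look like the sketch below. The marker character "·" is an assumption (the actual character in the dictionary may differ, so it is a parameter), the function name `infix_locations` is made up, and the sample strings are simplified stand-ins rather than entries copied from the dictionary.

```python
def infix_locations(ipa, markers="·"):
    """Return (word_index, mark_count) for the word that carries infix marks.

    The IPA string is split on spaces, and the first word containing any
    marker character is taken as the one that receives infixes. A count
    of 1 means all infixes go in that word's final syllable; a count of
    2 means two separate positions. Returns None if no mark is found.
    """
    for index, word in enumerate(ipa.split()):
        count = sum(ch in markers for ch in word)
        if count:
            return index, count
    return None

print(infix_locations("t·ar·on"))     # (0, 2)
print(infix_locations("kll.kx·em"))   # (0, 1)
print(infix_locations("t·ìng nari"))  # (0, 1)
```

No parsing of the IPA itself is needed for this; only the marker characters and the spaces are inspected.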

And BTW, quick half attempt at a check....

There are 5 verbs that you have incorrectly marked as 1 infix position
There are 16 verbs that you have incorrectly marked as 2 infix positions

In total from the word list I got that from, there are 214 verbs.  That means that around 10% of the verbs are wrongly marked for that data.
BUT...  Of those 214 verbs, only 118 are multisyllabic.  For monosyllabic verbs there's no possible way for it to be wrong, so looking at JUST the multisyllabic verbs, that's roughly 18% of the verbs with any sort of question wrongly marked.
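For anyone who wants to check the arithmetic, here it is spelled out using only the counts given in this post (the variable names are mine):

```python
wrong_one_position = 5    # verbs incorrectly marked with 1 infix position
wrong_two_positions = 16  # verbs incorrectly marked with 2 infix positions
total_verbs = 214
multisyllabic_verbs = 118

wrong = wrong_one_position + wrong_two_positions
share_all = wrong / total_verbs
share_multi = wrong / multisyllabic_verbs

print(f"{wrong} wrong: {share_all:.1%} of all verbs, "
      f"{share_multi:.1%} of multisyllabic verbs")
# 21 wrong: 9.8% of all verbs, 17.8% of multisyllabic verbs
```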

I'd hardly call that "working for 99% of the verbs".

So now you have something you know is incorrect (though you seem to dismiss the scale of the error), no interest in correcting it, and no interest in allowing others to correct it.  And I could probably write code to do it correctly by the method I outlined above in less time than I spent writing up this response.  It's not like it's a huge undertaking.

And I'm not talking about anything EanaEltu does as far as parsing.  I'm talking about the data you export.  People will take that for a reference either directly or indirectly from inclusion in other projects, and the result is an incorrect reference.

Tuiq

Writing something is not wasting my time. Also, the exported data is parsed by Eana Eltu first, so if the data is wrong, the whole system is as well. Feel free to code your own projects; I'm the last one who is going to stop you. I'd be glad to be able to do something completely different. I'm not interested in the language at all, nor in IPA. And it's not the only thing not working right now; for example, the newest IPA has some TeX in it. Like I said before, I'm not going to release any source. Using the data given in the SQL you can do your own thing: the words are included, the IPA is included, so do your own infix thing. At the moment, there are far more important things than an SQL file for about 50 people.

Muzer

But that's rather difficult to do, considering your project is not open-source - so unless you make it so, you are essentially the only person who can write something that is guaranteed to slot in to the current system easily without a lot of fiddling.

Tuiq

No. I said I'm not going to open it, and that's the way it will stay. Besides, you could do exactly nothing with the source code, since the database will never be open source (I mean, there's a reason we don't print TeX anymore, hm?); you would have to work with debug data or something. Also, the code isn't documented at all. The main parsing unit is a huge file: we're speaking of a 44 KB file containing 1084 lines of code. The daemon, the connection to the daemon and all that stuff has 500 more. And the actual dictionary creation (which also creates the SQL) has 820 lines.

That's about 2400 lines of Perl. Does anybody here even speak Perl? Test yourself:

Code (Perl):
$INFIXES{s0} = $INFIXES1 = join '|', map { quotemeta($_->{inf}) } grep { $_->{publish} } @{$INFIXES{0}};
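For readers who don't speak Perl, a rough Python reading of that line: take the list stored under key 0, keep the entries whose publish flag is set, regex-escape each entry's inf string, and join the results with "|" into one alternation pattern. The function name `build_alternation` and the sample entries are made up for illustration; only the keys inf and publish come from the line above.

```python
import re

def build_alternation(infixes):
    """Join the regex-escaped 'inf' values of published entries with '|'."""
    return "|".join(
        re.escape(entry["inf"])   # like Perl's quotemeta
        for entry in infixes
        if entry["publish"]       # like the grep { $_->{publish} } filter
    )

# Hypothetical sample data, shaped like the list the Perl line iterates over.
sample = [
    {"inf": "ol", "publish": True},
    {"inf": "er", "publish": True},
    {"inf": "zz", "publish": False},
]
print(build_alternation(sample))  # ol|er
```

The Perl version additionally stores the resulting string in two places at once ($INFIXES{s0} and $INFIXES1).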

Muzer

Well, if you're not going to make it free software, you can't use the "write it yourself" excuse. It's just impossible.

Tuiq

I can. You can write a better infix script using the LaTeX and nav. Both are in the SQL.

Muzer

Yes, but if someone does that, the chances of it being included in EanaEltu are much slimmer as the code is likely to be pretty incompatible, or even in a different language if they didn't think it would ever be implemented.