Eana Eltu: Translator, Dictionary, API and putxìng.

Tuiq · May 28, 2010, 05:22:37 AM

I have been asked many times whether there could be a better interface for providing data to offline applications. The API provided almost all the information you would probably ever need, but it was only usable with an active internet connection; the CSV/TSV files worked offline but did not provide nearly as much. Until now!

It took over 15 years, 15 billion US Dollars, a few cookies and some little animals but now it's here: The SQLinator. It contains almost everything the API could tell you, but it's easily usable even offline! The file is generated every time the PDFs are.

Also, I'm looking forward to next week when I'm going to implement even MORE hats! Stay tuned.

omängum fra'uti · May 28, 2010, 05:27:16 AM

Not everyone is pro hat you know....


Tuiq · May 28, 2010, 05:33:57 AM

You're crazy. Everybody wants hats.

Toruk Makto · May 29, 2010, 04:51:03 PM

...and safety dance!

Sіr. Ηaxalot · May 30, 2010, 07:17:53 AM

I have to report a bug.

Some of the IPA seems to be double encoded, e.g. the IPA for za'ärìp is "zaËÊ.æÉ¾.ÉªpÌ", while most of the IPA looks fine.

Tuiq · June 01, 2010, 03:43:07 PM

Not sure, but should be fixed.

Muzer · June 03, 2010, 05:30:22 AM

Hmm - why does the translator want "salivew" rather than (what I'm pretty sure is correct) sivalew?

Tuiq · June 03, 2010, 06:37:59 AM

It defines salew as sal<1><2><3>ew, composed of sa and lew, and applies the compound verb rules. I don't know much about Na'vi, so I need somebody to confirm or deny that behaviour.
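To illustrate the difference, here is a minimal Python sketch of template-based infixation. The `fill` helper is hypothetical and is not Eana Eltu's actual code; the second template, `s<1><2>al<3>ew`, is only an assumption derived from the form sivalew that Muzer expects.

```python
import re

def fill(template, infixes):
    """Replace each <n> marker with the infix for slot n, or drop it if unused."""
    return re.sub(r"<(\d)>", lambda m: infixes.get(int(m.group(1)), ""), template)

# Template as described in this post: all three slots sit between "sal" and "ew".
print(fill("sal<1><2><3>ew", {2: "iv"}))   # salivew

# Assumed alternative template that would yield the form Muzer expects.
print(fill("s<1><2>al<3>ew", {2: "iv"}))   # sivalew
```

Whichever template is right, the output depends only on where the slot markers sit in the stored template, so fixing a wrong form means fixing the template, not the infix rules.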

omängum fra'uti · June 03, 2010, 06:48:09 AM

Lew is an adjective, not a verb, and sa isn't even a word, so there are two reasons that's wrong right there. The meaning of lew has little to do with salew.

'eylan na'viyä · June 08, 2010, 02:45:30 PM

Hi Tuiq,
Some time ago I made a spell-checking dictionary based on hunspell, which is used by many applications.
http://forum.learnnavi.org/your-projects-other-resources/navify-software/
But hunspell has a problem: it does not support infixes. So I'd like to ask whether you could make a script that exports a hunspell .dic file from the database, including the most common conjugations of the verbs.
The file format is easy. Here is the current dictionary file:
http://forum.learnnavi.org/your-projects-other-resources/navify-software/?action=dlattach;attach=4950

The letter after the / defines the word type. I'd work out the replacement rules for all the types.

It would be cool if you could implement this export function.

Tuiq · June 08, 2010, 02:58:23 PM

"Most common" is kind of hard to tell. I mean, for example, which are the most used verb tenses in English? Also, what's that number at the beginning? Size?

Just thinking: there are, I guess, about 2*3*4 combinations per verb (and Eana does not even support smashed forms yet), which would create 24 entries for one verb. That's an awful lot: with about 210 verbs, 210 * 24 = 5040 entries, which is more than all the words we have so far (~1000).

'eylan na'viyä · June 08, 2010, 05:45:57 PM

Actually, I don't know what the number at the beginning is. I think it's an identification code and can be a random number. I just copied it from another file and it worked in every application I used it with.
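For reference, the hunspell documentation describes the first line of a .dic file as the approximate number of entries (it is used to size the hash table), so "Size" is close. A minimal Python sketch of writing such a file follows; the `build_dic` helper and the sample entries are hypothetical, and the flag letters are placeholders.

```python
def build_dic(entries):
    """Build .dic text: entry count on the first line, then word/FLAGS lines."""
    lines = [str(len(entries))]
    for word, flags in entries:
        # A word without flags is written on its own, without the slash.
        lines.append(f"{word}/{flags}" if flags else word)
    return "\n".join(lines) + "\n"

# Hypothetical sample entries; the flag letters are placeholders.
sample = [("lew", "A"), ("salew", "V")]
print(build_dic(sample), end="")
```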

I think these infixes would cover 99% of daily usage (there are more, but most are mere assumptions):

position 0,1,2
eyk|äp|awn|us|- , am|ìm|ìy|ay|er|ol|arm|ìrm|*ìry|*ary|*alm|*ìlm|*ìly|*aly|asy|ìsy|iv|imv|iyev|ìyev|irv|ilv|- , äng|ei|uy|ats|-
*=not in the pdf; awn&us need to be treated as adjectives

5*23*5 *210=575*210=120 750

That's an incredibly large number, but the file only needs to be machine-readable and writable, and that is still the case. E.g. the Belarusian dictionary has 1 570 000 lines.

But I think at the moment it would be enough if only 1 infix per word were recognized.
That would be (4+22+4+1)*210=31*210=6 510 entries. Still a lot, but it would result in quite a small file for a dictionary.
Or are there other limitations than filesize?
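These counts can be checked with a short Python sketch. The infix lists are copied from the list above (with "" standing for the "-" entry, i.e. no infix), and 210 is the rough verb count quoted earlier in the thread; the variable names are made up for illustration.

```python
from itertools import product

# Infixes per position as listed above; "" stands for "no infix" (the "-" entry).
POS0 = ["eyk", "äp", "awn", "us", ""]
POS1 = ["am", "ìm", "ìy", "ay", "er", "ol", "arm", "ìrm", "ìry", "ary",
        "alm", "ìlm", "ìly", "aly", "asy", "ìsy", "iv", "imv", "iyev",
        "ìyev", "irv", "ilv", ""]
POS2 = ["äng", "ei", "uy", "ats", ""]
VERBS = 210  # rough verb count quoted in the thread

# Every combination of one choice per position.
all_combos = list(product(POS0, POS1, POS2))

# Combinations with at most one non-empty infix (includes the bare verb).
single = [c for c in all_combos if sum(1 for x in c if x) <= 1]

print(len(all_combos), len(all_combos) * VERBS)  # 575 120750
print(len(single), len(single) * VERBS)          # 31 6510
```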

Tuiq · June 09, 2010, 04:55:37 AM

It's the time. It already takes up to 30 seconds to generate the SQL, the dictionaries, CSV and TSV. If I add many more outputs (or one big one...), this could easily slow the whole thing down a lot. In that case I'd have to switch the system: dictionaries would be automagically generated at midnight, which may break all add-ons. I'll have to think about that. It's quite complicated.

However, are there any other people/apps that could profit from that format?

'eylan na'viyä · June 09, 2010, 10:48:46 AM

30 seconds is quite a lot for such a small file. Do you know what exactly takes that long? If it's only the database access, I think this wouldn't take much longer, because each verb still has to be loaded only once.

At the moment I don't know of other uses for the resulting file, but a wordlist is a very basic thing which could be useful for things nobody is thinking about at the moment.

Tuiq · June 09, 2010, 01:47:14 PM

"small file?" You're kidding, right? NaviDictionary.pdf, NaviCatDictionary.pdf, DictionaryNavi.pdf - three pdfs have to be created out of .tex. This isn't working at super sonic speed. Then there is the giant sql file, the jMemorize file, a CSV file, a TSV file - this takes some time.

'eylan na'viyä · June 09, 2010, 03:57:05 PM

The PDFs are big, that's true. I misread and thought it was 30 seconds per file. The dictionary that I made is only 7k. Compared to creating 3 PDFs, inflecting some verbs should not take that long, I guess.

Edit: I made the replacement table now:

Quote
adj.          A
adj., adv.    A & D
adj., intj.   A & J
adj., n.      A & N
adp.          Z
adv.          D
adv., intj.   D & J
conj.         C
dem.          N
dem., pn.     N
inter.        I
intj.         J
n.            N
n., adv.      N & D
n., intj.     N & J
num.          A
part.         C
phrase        J
pn.           N
pn., adv.     N & D
prefix        Z
v.            V
v., intj.     V & J
""            J

Maybe it's easier and better to split the type by "," than to handle each of the combinations like "n., adv." individually.
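The splitting approach could be sketched in Python like this. The single-type flags are taken from the table above; the `FLAGS` and `flags_for` names are hypothetical, and joining several flag letters into one string assumes hunspell's default single-character flags.

```python
# Single-type flags taken from the table above; combined types such as
# "n., adv." are handled by splitting on "," instead of listing each pair.
FLAGS = {
    "adj.": "A", "adp.": "Z", "adv.": "D", "conj.": "C", "dem.": "N",
    "inter.": "I", "intj.": "J", "n.": "N", "num.": "A", "part.": "C",
    "phrase": "J", "pn.": "N", "prefix": "Z", "v.": "V", "": "J",
}

def flags_for(word_type):
    """Return the flag letters for a type string, without duplicates."""
    flags = []
    for part in word_type.split(","):
        flag = FLAGS[part.strip()]
        if flag not in flags:
            flags.append(flag)
    return "".join(flags)

print(flags_for("n., adv."))   # ND
print(flags_for("dem., pn."))  # N
print(flags_for(""))           # J
```

With this, only the 15 single types need a rule, and any new combination of known types works without touching the table.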

I'm also making a script that can generate dictionary packages with installers for every supported application automatically.

Tuiq · June 11, 2010, 04:55:56 AM

It's just that, as of now, the files are FTP'ed to learnnavi.org, which takes the most time. One more transfer will require more time, and it's at a point where you can really feel every second you have to wait. Not to mention that aborted scripts could trigger hell.

'eylan na'viyä · June 11, 2010, 06:02:31 AM

You would not need to copy the file anywhere. You only need to trigger this script and it will download the file and generate the packages. It's almost complete now.

Tuiq · June 13, 2010, 05:03:06 AM

All Eana services will be down tomorrow for a few hours/days. The already generated PDF, TSV and SQL files hosted on eanaeltu.learnnavi.org are not affected.

Tuiq · June 22, 2010, 03:14:23 PM

Changed the jMemorize format from .csv to .tsv. The same change applies to the filename; jm.csv won't be updated anymore.