Eana Eltu: Translator, Dictionary, API and putxìng.

Started by Tuiq, January 07, 2010, 04:20:17 PM


Tuiq

#100
I have often been asked whether there could be a better interface for providing data to offline applications. The API provides almost all the information you will probably ever need, but it is only usable with an active internet connection. The CSV/TSV files work offline, but they do not contain everything. Until now!



It took over 15 years, 15 billion US Dollars, a few cookies and some little animals but now it's here: The SQLinator. It contains almost everything the API could tell you, but it's easily usable even offline! The file is generated every time the PDFs are.

Also, I'm looking forward to next week when I'm going to implement even MORE hats! Stay tuned.
Eana Eltu: PDF/TSV/jMemorize

omängum fra'uti

Ftxey lu nga tokx ftxey lu nga tirea? Lu oe tìkeftxo.
Listen to my Na'vi Lessons podcast!

Tuiq


Toruk Makto


Lì'fyari leNa'vi 'Rrtamì, vay set 'almong a fra'u zera'u ta ngrrpongu
Na'vi Dictionary: http://files.learnnavi.org/dicts/NaviDictionary.pdf

Sіr. Ηaxalot

I have to report a bug.

Some of the IPA seems to be double encoded, e.g. the IPA for za'ärìp is "zaˈʔ.æɾ.ɪpÌ", while most of the IPA looks fine.
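
A pattern like this typically appears when UTF-8 bytes are read as Latin-1 and then stored again. The following is only a minimal sketch, assuming the intended final character is the combining mark U+031A. That is an inference from the stray "Ì": U+031A is encoded in UTF-8 as the bytes CC 9A, and CC is "Ì" in Latin-1. The function names are illustrative and not part of Eana Eltu.

```python
def double_encode(text: str) -> str:
    """Simulate the bug: UTF-8 bytes misread as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo one round of double encoding; return the input unchanged if it does not apply."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Assumed intended ending: "p" followed by U+031A.
intended = "p\u031a"
broken = double_encode(intended)

# Show the code points of the broken text, then check that the repair round-trips.
print([hex(ord(ch)) for ch in broken])
print(repair(broken) == intended)
```

Note that `repair` only helps when a whole string was double encoded. A string that mixes correct and broken characters, like the one reported, cannot be encoded as Latin-1 and is returned unchanged, so a real fix would have to happen where the data is first mis-decoded.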

Tuiq


Muzer

Hmm - why does the translator want "salivew" rather than (what I'm pretty sure is correct) sivalew?
[21:42:56] <@Muzer> Apple products used to be good, if expensive
[21:42:59] <@Muzer> now they are just expensive

Tuiq

It defines salew as sal<1><2><3>ew - composed of sa and lew - and applies the compound verb rules. I don't know much about Na'vi, so I need somebody to confirm or deny that behaviour.
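
The difference between the two forms comes down to where the infix slots sit in the template. Here is a small sketch in Python, assuming that <1>, <2> and <3> mark the three infix positions in order and that empty slots simply vanish. The function `fill_template` and the second template are hypothetical; the second one is only what would produce the form "sivalew" mentioned above.

```python
import re


def fill_template(template: str, pos0: str = "", pos1: str = "", pos2: str = "") -> str:
    """Replace the slot markers <1>, <2>, <3> with the given infixes (empty = no infix)."""
    slots = {"1": pos0, "2": pos1, "3": pos2}
    return re.sub(r"<([123])>", lambda m: slots[m.group(1)], template)


# Template as described in the post: all three slots after "sal".
compound_template = "sal<1><2><3>ew"
# Hypothetical template treating salew as a single two-syllable verb.
simple_template = "s<1><2>al<3>ew"

print(fill_template(compound_template, pos1="iv"))  # salivew
print(fill_template(simple_template, pos1="iv"))    # sivalew
```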

omängum fra'uti

Lew is an adjective, not a verb, and sa isn't even a word, so there are two reasons that's wrong right there. The meaning of lew has little to do with salew.

'eylan na'viyä

Hi Tuiq,
Some time ago I made a spell-checking dictionary based on hunspell, which is used by many applications.
http://forum.learnnavi.org/your-projects-other-resources/navify-software/
But hunspell has a problem: it does not support infixes. So I'd like to ask whether you could write a script that exports a hunspell .dic file from the database, including the most common conjugations of the verbs.
The file format is easy. Here is the current dictionary file:
http://forum.learnnavi.org/your-projects-other-resources/navify-software/?action=dlattach;attach=4950

The letter after the / defines the word type. I'd work out the replacement rules for all the types.

It would be cool if you could implement this export function.
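
For reference, a hunspell .dic file is plain text: the first line holds the approximate number of entries, and each following line is a word, optionally followed by "/" and its flags. A minimal export sketch in Python could look like this. The function name `build_dic` and the sample entries are made up for illustration, and the flag letters are placeholders for word types; a matching .aff file is still needed and is not shown.

```python
def build_dic(entries: list[tuple[str, str]]) -> str:
    """Return .dic text: an entry count on line one, then one 'word/FLAGS' line per entry."""
    lines = [str(len(entries))]
    for word, flags in entries:
        # Words without flags are written on their own.
        lines.append(f"{word}/{flags}" if flags else word)
    return "\n".join(lines) + "\n"


# Illustrative entries only.
sample = [("salew", "V"), ("lew", "A")]
print(build_dic(sample), end="")
```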

Tuiq

"most common" is kind of hard to tell - I mean, for example, which are the most used verb times in English? Also, what's that number at the beginning? Size?

Just thinking: there are, I guess, about 2*3*4 infix combinations per verb (and Eana does not even support smashed forms yet), which would create 24 entries for one verb. With about 210 verbs, that's 210 * 24 = 5040 entries, more than all the words we have so far (~1000). That's an awful lot.

'eylan na'viyä

#111
Actually I don't know what the number at the beginning is. I think it's an identification code and can be a random number. I just copied it from another file and it worked in every application I used it with.

I think these infixes would cover 99% of daily usage (there are more, but most are mere assumptions):

position 0: eyk|äp|awn|us|-
position 1: am|ìm|ìy|ay|er|ol|arm|ìrm|*ìry|*ary|*alm|*ìlm|*ìly|*aly|asy|ìsy|iv|imv|iyev|ìyev|irv|ilv|-
position 2: äng|ei|uy|ats|-
* = not in the PDF; awn and us need to be treated as adjectives

5*23*5*210 = 575*210 = 120 750

That's an incredibly large number, but the file only needs to be machine-readable and machine-writable, and it still would be. E.g. the Belarusian dictionary has 1 570 000 lines.

But I think at the moment it would be enough if only 1 infix per word were recognized.
That would be (4+22+4+1)*210 = 31*210 = 6 510 entries. Still a lot, but it would result in quite a small file for a dictionary.
Or are there other limitations than filesize?
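
The counts can be checked by simply enumerating the combinations. A short Python sketch, using the infix lists exactly as posted (with "" standing for the "-" entry, i.e. no infix) and the rough figure of 210 verbs from this thread:

```python
from itertools import product

# Infix lists copied from the post; "" stands for "no infix" (the "-" entry).
POS0 = ["eyk", "äp", "awn", "us", ""]
POS1 = ["am", "ìm", "ìy", "ay", "er", "ol", "arm", "ìrm", "ìry", "ary", "alm",
        "ìlm", "ìly", "aly", "asy", "ìsy", "iv", "imv", "iyev", "ìyev", "irv",
        "ilv", ""]
POS2 = ["äng", "ei", "uy", "ats", ""]
VERB_COUNT = 210  # approximate number of verbs mentioned in the thread

# Every combination of one choice per position.
all_combinations = list(product(POS0, POS1, POS2))

# Combinations with at most one non-empty infix (including the bare verb).
single_infix = [c for c in all_combinations if sum(1 for x in c if x) <= 1]

print(len(all_combinations), len(all_combinations) * VERB_COUNT)
print(len(single_infix), len(single_infix) * VERB_COUNT)
```

This gives 575 combinations per verb (120 750 entries in total) for the full set, and 31 per verb (6 510 entries) when at most one infix is allowed.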

Tuiq

It's the time. It's already taking up to 30 seconds to generate the SQL, the dictionaries, CSV and TSV. If I add many more outputs (or one big...), this could very easily slow the whole thing down a lot. If that's the case I'll have to switch the system: dictionaries would be automagically generated at midnight. This may break all add-ons, so I'll have to think about that. It's quite complicated.

However, are there any other people/apps that could profit from that format?

'eylan na'viyä

30 seconds is quite a lot for such a small file. Do you know what exactly takes that long? If it's only the database access, I think this wouldn't take much longer, because each verb still has to be loaded only once.

At the moment I don't know of other uses for the resulting file, but a word list is a very basic thing that could be useful for things nobody has thought about yet.

Tuiq

"small file?" You're kidding, right? NaviDictionary.pdf, NaviCatDictionary.pdf, DictionaryNavi.pdf - three pdfs have to be created out of .tex. This isn't working at super sonic speed. Then there is the giant sql file, the jMemorize file, a CSV file, a TSV file - this takes some time.

'eylan na'viyä

#115
The PDFs are big, that's true. I misread and thought it was 30 seconds per file. The dictionary that I made is only 7k. Compared to creating 3 PDFs, inflecting some verbs should not take that long, I guess.

Edit: I've made the replacement table now:
Quote
adj.          A
adj., adv.    A & D
adj., intj.   A & J
adj., n.      A & N
adp.          Z
adv.          D
adv., intj.   D & J
conj.         C
dem.          N
dem., pn.     N
inter.        I
intj.         J
n.            N
n., adv.      N & D
n., intj.     N & J
num.          A
part.         C
phrase        J
pn.           N
pn., adv.     N & D
prefix        Z
v.            V
v., intj.     V & J
""            J

Maybe it's easier and better to split the type on "," than to handle each of the combinations like "n., adv." individually.
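
Splitting on "," could be sketched like this in Python. The mapping holds only the single-type rows of the table, reading the first row as "adj." (an assumption about a typo), and the name `flags_for` is purely illustrative. Duplicate flags are dropped, so "dem., pn." still yields just N.

```python
# Base word types and their flags, taken from the single-type rows of the table.
# Assumption: the first table row is read as "adj.".
BASE_FLAGS = {
    "adj.": "A", "adp.": "Z", "adv.": "D", "conj.": "C", "dem.": "N",
    "inter.": "I", "intj.": "J", "n.": "N", "num.": "A", "part.": "C",
    "phrase": "J", "pn.": "N", "prefix": "Z", "v.": "V", "": "J",
}


def flags_for(word_type: str) -> list[str]:
    """Split a type string on ',' and return the distinct flags in order."""
    flags: list[str] = []
    for part in word_type.split(","):
        flag = BASE_FLAGS[part.strip()]  # raises KeyError for unknown types
        if flag not in flags:
            flags.append(flag)
    return flags


print(flags_for("n., adv."))
print(flags_for("dem., pn."))
```

An unknown type raises a KeyError here, which makes new word types in the database visible instead of silently skipping them.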

I'm also making a script that can generate dictionary packages with installers for every supported application automatically.

Tuiq

It's just that, as of now, the files are FTP'ed to learnnavi.org, which takes the most time. Transferring one more file will take even longer, and this happens at a point where you can really feel every second you have to wait. Not to mention that aborted scripts could trigger hell.

'eylan na'viyä

#117
You would not need to copy the file anywhere. You only need to trigger this script and it will download the file and generate the packages. It's almost complete now.

Tuiq

All Eana services will be down tomorrow for a few hours/days. The already generated PDF, TSV and SQL files hosted on eanaeltu.learnnavi.org are not affected.

Tuiq

Changed the jMemorize format from .csv to .tsv. The same change applies to the filename; jm.csv won't be updated anymore.