Navi global dictionary

Started by Sіr. Ηaxalot, February 13, 2010, 02:28:52 PM

Sіr. Ηaxalot

Kaltxì!

As you might have seen, I have been working on a dictionary application. I have realized that I need to completely remake my Editor application, which is far too glitchy at the moment. I was thinking of building the new editing system with MySQL and PHP, to make editing the database easier. That would give me a web-based editing GUI.

With a system like that, however, it would be possible to write converters to (almost) any file format (CSV, TSV, BEF, JSON requests, maybe even PDF, which I've never done before). I figured other people would probably be interested in a solution like this as well.
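To illustrate the converter idea, here is a minimal Python sketch that takes rows as they might come out of the database and writes them as CSV, TSV or JSON. The column names (`navi`, `phonetic`, `english`, `type`) are assumptions for illustration, not the actual table layout.

```python
import csv
import io
import json

# Hypothetical column names; the real table layout may differ.
COLUMNS = ["navi", "phonetic", "english", "type"]

def to_delimited(rows: list[dict], delimiter: str = ",") -> str:
    """Serialize rows as CSV by default, or as TSV when given a tab delimiter."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter=delimiter,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_json(rows: list[dict]) -> str:
    """Serialize rows as a JSON array, keeping letters like 'ì' unescaped."""
    return json.dumps(rows, ensure_ascii=False)

rows = [{"navi": "Kaltxì", "phonetic": "kal.'t'ɪ", "english": "Hello", "type": "n."}]
print(to_delimited(rows, "\t"))
print(to_json(rows))
```

Each output format is just another small function over the same rows, so adding a new export does not touch the editing side at all.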

A global solution like this would, however, require people who know the language better to contribute to editing and maintaining the database, and of course, solid planning of how to store the data and retrieve it from the database.

So far, I have made an example database and an example row in it.

Table: [image not preserved]

Row: [image not preserved]

The idea behind the columns is that the navi field contains the word with all of its possible grammatical variants, stored within () and separated with |, which is plain regex syntax. That makes it possible to match the input string against the navi field with a simple regex. I have also planned to implement "shortcuts" for entering words, so you won't need to type the (|ìm|ay|er|ol|ìrm) part for every word. Most of the other fields are pretty self-explanatory. The phonetic field could also contain underlining information.
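A minimal sketch of the matching and shortcut ideas in Python, assuming the navi field holds a regex exactly as described. The `{v}` shortcut token is made up for illustration; the alternation group is copied from the post, and fpìl (from later in this thread) is used purely as test data.

```python
import re

# Hypothetical editor shortcut: typing "{v}" after a word expands to the
# alternation group quoted in the post. The token name is invented.
SHORTCUTS = {"{v}": "(|ìm|ay|er|ol|ìrm)"}

def expand_shortcuts(entry: str) -> str:
    """Replace each shortcut token in an editor entry with its regex group."""
    for token, group in SHORTCUTS.items():
        entry = entry.replace(token, group)
    return entry

def matches(navi_field: str, user_input: str) -> bool:
    """Return True if the whole input equals one of the variants in navi_field."""
    # fullmatch anchors both ends, so "fpìlx" does not match "fpìl(...)".
    return re.fullmatch(navi_field, user_input, flags=re.IGNORECASE) is not None

def lookup(rows: list[dict], user_input: str) -> list[dict]:
    """Return every row whose navi pattern matches the input."""
    return [row for row in rows if matches(row["navi"], user_input)]

field = expand_shortcuts("fpìl{v}")
print(field)                      # fpìl(|ìm|ay|er|ol|ìrm)
print(matches(field, "fpìlìrm"))  # True
print(matches(field, "fpìlx"))    # False
```

Using `fullmatch` rather than `search` matters here: without anchoring, any input that merely contains the stem would count as a hit.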

To do something like this, I need control over who is inserting and editing words, to keep spam out of the database. To solve this, I plan to have a few trusted users who can approve words submitted by other users.
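The approval workflow can be sketched as a simple moderation queue. This is a hypothetical in-memory model (class and field names are illustrative); a real version would keep the pending and published entries in database tables.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Submission:
    navi: str
    english: str
    submitted_by: str
    approved_by: str | None = None  # set once a trusted user approves it

@dataclass
class Dictionary:
    trusted_users: set[str]
    pending: list[Submission] = field(default_factory=list)
    published: list[Submission] = field(default_factory=list)

    def submit(self, sub: Submission) -> None:
        """Any user may submit; the entry waits in the queue until approved."""
        self.pending.append(sub)

    def approve(self, index: int, reviewer: str) -> Submission:
        """Move a pending entry to the published list (trusted users only)."""
        if reviewer not in self.trusted_users:
            raise PermissionError(f"{reviewer} is not a trusted user")
        sub = self.pending.pop(index)
        sub.approved_by = reviewer
        self.published.append(sub)
        return sub
```

Only `published` would be exported to applications, so spam never reaches them even if it lands in the queue.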

My thought was that this might come in handy while the language is still developing, especially for people who are writing applications that depend on a dictionary: instead of everyone creating their own dictionaries, they could just update from this one.

wm.annis

I would strongly recommend you add another table for examples.  Good dictionaries don't just give you a few words for definition, but give phrases and sentences to show you how the word is really used.  Take a look at this, for a venerable example.

'eylan na'viyä

Another table for related words or synonyms would be cool, and maybe one for "false friends" (words that beginners easily mix up).

And if this database should handle media too, a spoken audio link (or multiple links, so that you can hear the word in two different voices) and maybe image link(s) would be nice.

Just an idea of which fields might be useful; filling them is another story.
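The extra tables suggested above could look roughly like this. The sketch uses Python's built-in `sqlite3` so it runs on its own; the thread's plan was MySQL, and every table and column name here is an assumption.

```python
import sqlite3

# Hypothetical schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE words (
    id       INTEGER PRIMARY KEY,
    navi     TEXT NOT NULL,   -- regex pattern with grammatical variants
    phonetic TEXT,
    english  TEXT NOT NULL,
    type     TEXT             -- part of speech, e.g. 'n.' or 'v.'
);
CREATE TABLE examples (
    id          INTEGER PRIMARY KEY,
    word_id     INTEGER NOT NULL REFERENCES words(id),
    sentence    TEXT NOT NULL,
    translation TEXT
);
-- One table covers synonyms, related words and "false friends".
CREATE TABLE relations (
    word_id    INTEGER NOT NULL REFERENCES words(id),
    related_id INTEGER NOT NULL REFERENCES words(id),
    kind       TEXT NOT NULL CHECK (kind IN ('synonym', 'related', 'false_friend')),
    PRIMARY KEY (word_id, related_id, kind)
);
-- Several rows per word allow recordings by different speakers.
CREATE TABLE media (
    id      INTEGER PRIMARY KEY,
    word_id INTEGER NOT NULL REFERENCES words(id),
    kind    TEXT NOT NULL CHECK (kind IN ('audio', 'image')),
    url     TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO words (id, navi, phonetic, english, type) VALUES (1, ?, ?, ?, ?)",
    ("Kaltxì", "kal.'t'ɪ", "Hello", "n."),
)
conn.execute("INSERT INTO examples (word_id, sentence) VALUES (1, ?)", ("Kaltxì, ma tsmukan",))

rows = conn.execute(
    "SELECT w.english, e.sentence FROM words w JOIN examples e ON e.word_id = w.id"
).fetchall()
print(rows)  # [('Hello', 'Kaltxì, ma tsmukan')]
```

Keeping examples, relations and media in separate tables means a word can have any number of each without changing the `words` table.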

Tuiq

EanaEltu does something quite similar. It stores its data in an SQL database (5 tables, excluding the user tables and the normal forum system), which is used for the translation service, the ("new", "beta" and WIP) Dictionary, and for creating all these PDFs. Actually, I could easily generate TSV/CSV for jMemorize or whatever; I'd just have to talk with Taronyu about that. Internally, the websites and the bot communicate via JSON over a UNIX socket with the "eana eltuyä leeylana vrrtep" (a Perl daemon). I could open the API over HTTP if there is any need (nobody has asked so far, so I assumed it's not necessary).
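For readers unfamiliar with the setup, a client talking JSON over a UNIX socket could look like the Python sketch below. The daemon's actual message format is not described here, so the newline-delimited framing, the socket path and the request fields are all hypothetical.

```python
import json
import socket

def send_request(sock: socket.socket, request: dict) -> dict:
    """Send one newline-terminated JSON request and read one JSON reply line.

    Assumes newline-delimited JSON framing; the real daemon may differ.
    """
    sock.sendall(json.dumps(request, ensure_ascii=False).encode("utf-8") + b"\n")
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("daemon closed the connection")
        buf += chunk
    return json.loads(buf.decode("utf-8"))

def query_daemon(path: str, request: dict) -> dict:
    """Connect to a UNIX domain socket at `path` and send a single request."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        return send_request(sock, request)

# Usage (hypothetical path and fields):
# query_daemon("/tmp/eanaeltu.sock", {"action": "translate", "lang": "eng", "word": "think"})
```

Opening the same API over HTTP would mostly mean putting a thin web handler in front of this request/reply exchange.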

Edit: Whoops, links were wrong.
Eana Eltu: PDF/TSV/jMemorize

Miguel APG

Does anyone know how to say "think" in Na'vi?

irayo, eywa ngahu
ìlä ikran, oe uniltìranyu lu toruk makto

Tuiq

[09:42:08] <Tuiq> !eng think
[09:42:09] <EanaEltu> Tuiq: think (v.) - fpìl

Sіr. Ηaxalot

Quote from: Tuiq on February 14, 2010, 01:27:09 AM
EanaEltu does something quite similar. It stores its data in an SQL database (5 tables, excluding the user tables and the normal forum system), which is used for the translation service, the ("new", "beta" and WIP) Dictionary, and for creating all these PDFs. Actually, I could easily generate TSV/CSV for jMemorize or whatever; I'd just have to talk with Taronyu about that. Internally, the websites and the bot communicate via JSON over a UNIX socket with the "eana eltuyä leeylana vrrtep" (a Perl daemon). I could open the API over HTTP if there is any need (nobody has asked so far, so I assumed it's not necessary).

Edit: Whoops, links were wrong.

Oh, I didn't actually know that. Maybe it would be better to use the system that already exists instead. If I knew his database structure, I would be able to write a converter that generates files for my dictionary application (which uses a custom format).

It might be good to keep my focus on the old app; I feel like I'm just abandoning it and starting another project.

Tuiq

If you give me the format of your database, I'll take a look. Since all of EanaEltu is written in Perl, it's best if I add or change anything myself.

By the way: I got Taronyu's permission to create .csv/.tsv versions of the database. So, stay tuned.

Tuiq

CSV and TSV are exported, and there's also an API for Eana now.

Seze

Quote from: Tuiq on February 15, 2010, 09:57:11 PM
CSV and TSV are exported, and there's also an API for Eana now.

Where can one find these files (CSV, TSV)?


Learn Na'vi Mobile App - Now Available

Sіr. Ηaxalot

My data files use an encoding that I've borrowed from the BitTorrent specification, called BEncoding.

Each word is defined in a list, which contains strings for the word in Na'vi, the phonetic transcription, the word in English and the part of speech. I also plan to add a list of example sentences.

Example without example sentences:
l6:Kaltxì8:kal.'t'ɪ5:Hello2:n.lee

List {
  String(6) "Kaltxì"
  String(8) "kal.'t'ɪ"
  String(5) "Hello"
  String(2) "n."
  List {} // The examples, to be filled...
}

Below is the same entry, with example sentences filled in:

l6:Kaltxì8:kal.'t'ɪ5:Hello2:n.l18:Kaltxì, ma tsmukan17:Kaltxì, ma tsmukeee

List {
  String(6) "Kaltxì"
  String(8) "kal.'t'ɪ"
  String(5) "Hello"
  String(2) "n."
  List {
    String(18) "Kaltxì, ma tsmukan"
    String(17) "Kaltxì, ma tsmuke"
  }
}


These lists are concatenated one after another, with no separators between them.
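A small Python encoder and decoder for the format as shown in the two examples. One detail worth flagging: the examples count string lengths in characters ("Kaltxì" is 6:, though it is 7 bytes in UTF-8), whereas BitTorrent's bencoding counts bytes, so this sketch follows the examples and counts characters.

```python
def encode(value) -> str:
    """Encode a str or (nested) list in the format shown above.

    Lengths are counted in characters, as the Kaltxì examples imply;
    BitTorrent's bencoding counts bytes instead.
    """
    if isinstance(value, str):
        return f"{len(value)}:{value}"
    if isinstance(value, list):
        return "l" + "".join(encode(item) for item in value) + "e"
    raise TypeError(f"unsupported type: {type(value).__name__}")

def decode_all(data: str) -> list:
    """Decode a stream of back-to-back top-level lists into Python lists."""
    pos = 0

    def parse():
        nonlocal pos
        if data[pos] == "l":
            pos += 1
            items = []
            while data[pos] != "e":
                items.append(parse())
            pos += 1  # skip the closing "e"
            return items
        # Otherwise a string: "<length>:<characters>"
        colon = data.index(":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        pos = start + length
        return data[start:pos]

    entries = []
    while pos < len(data):
        entries.append(parse())
    return entries

entry = ["Kaltxì", "kal.'t'ɪ", "Hello", "n.", []]
print(encode(entry))  # l6:Kaltxì8:kal.'t'ɪ5:Hello2:n.lee
```

Because every string carries its own length, the decoder never needs separators between entries, which is why the lists can simply be stacked.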

Hope that made some sense ;) Otherwise, it's not too much of a problem for me to write a CSV converter for my file format (but I would like the phonetic transcriptions, if you have them stored).

Tuiq

Well, this isn't much of a problem. It seems to be quite easy to write, although I'd have to convert the IPA (unless Taronyu and I "upgrade" it to UTF-8 soon). Not possible. :effort:. As for the examples, well, we could talk about that. It would sure be nice to add some examples to the normal EanaEltu translator as well (I also thought about adding explanations to the translator, for example "What does dative mean?" and that kind of thing). I'll add the possibility of adding example sentences to EanaEltu (though the translators would have to translate them too; poor guys). Nope, we can't. No examples. So, I'm afraid you'll have to stick to the CSV/TSV files.

CSV and TSV files go here! Fast, there's a spy creeping around.

Sіr. Ηaxalot

Quote from: Tuiq on February 16, 2010, 12:37:38 PM
Well, this isn't much of a problem. It seems to be quite easy to write, although I'd have to convert the IPA (unless Taronyu and I "upgrade" it to UTF-8 soon). Not possible. :effort:. As for the examples, well, we could talk about that. It would sure be nice to add some examples to the normal EanaEltu translator as well (I also thought about adding explanations to the translator, for example "What does dative mean?" and that kind of thing). I'll add the possibility of adding example sentences to EanaEltu (though the translators would have to translate them too; poor guys). Nope, we can't. No examples. So, I'm afraid you'll have to stick to the CSV/TSV files.

CSV and TSV files go here! Fast, there's a spy creeping around.

All right, so I'll have to parse the CSV then. Just one question: what exactly do the * and -- stand for in the CSV? I'm guessing they have something to do with positioning next to a word (-- = another word?).
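Parsing the export into the list-per-word structure described earlier might look like this Python sketch. The column order is an assumption (check it against the actual export), and markers such as * and -- are passed through unchanged, since their meaning is defined in the dictionary's introduction rather than here.

```python
import csv
import io

# Hypothetical column order; verify against the real CSV/TSV export.
FIELDS = ["navi", "ipa", "english", "type"]

def csv_to_entries(text: str, delimiter: str = ",", skip_header: bool = False) -> list[list]:
    """Turn CSV/TSV rows into [navi, phonetic, english, type, examples] lists.

    Markers like "*" and "--" are kept verbatim; no interpretation is applied.
    """
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS, delimiter=delimiter)
    if skip_header:
        next(reader, None)  # drop the export's own header row, if it has one
    entries = []
    for row in reader:
        # Missing trailing columns come back as None; normalise them to "".
        values = [row[name] or "" for name in FIELDS]
        entries.append(values + [[]])  # empty example list, as in the file format
    return entries
```

The resulting lists have the same shape as the entries in the BEncoded files, so they can be handed straight to an encoder for that format.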

Tuiq

Take a look at the dictionary; everything is explained in its introduction.