Topic: "Dictionary" Generator


Offline Tìtstewan

  • LearnNavi Zeykoyu
  • Toruk Makto
  • Palulukan Makto
  • *****
  • *
  • *
  • Posts: 9763
  • de Germany
  • Karma: 319
  • Ke lu oeru kea krr krrtalun!
    • My YouTube Channel
"Dictionary" Generator
« on: March 15, 2016, 03:50:19 pm »
Kaltxì ma smuk,

A few days ago I wrote a PHP script that uses NaviData.sql to generate a dictionary page like the one on the LN Vocab page. :)

It is a single PHP file, available in my GitHub repository:
https://github.com/Titstewan/DictionaryGenerator

By default the generator outputs an HTML page. The relevant line is:
Code:
echo '<span style="font-weight: bold; margin-left: -0.7em;">', $data2['navi'], '</span> [', $data2['ipa'] ,'] <em>', $data2['partOfSpeech'],'</em> ', $data2['localized'], '<br />';
With a small change to that line, it can generate XML output instead:
Code:
echo '<entry><word>', $data2['navi'],'</word><pro>[', $data2['ipa'] ,']</pro><source>PF</source><pos>', $data2['partOfSpeech'],'</pos><def>', $data2['localized'], '</def></entry>';
Just put the echo inside a foreach loop, as follows:
Code:
foreach ($vocab as $data2)
{
    // Echo one <entry> element per vocabulary row
    echo '<entry><word>', $data2['navi'],'</word><pro>[', $data2['ipa'] ,']</pro><source>PF</source><pos>', $data2['partOfSpeech'],'</pos><def>', $data2['localized'], '</def></entry>';
}
echo '</dictionary> -->';
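
One thing the XML echo above does not do is escape the values, so a definition containing & or < would produce invalid XML. Here is a minimal sketch of the same entry line with escaping, assuming the row keys shown in the post. The function name entryToXml and the sample row are made up for illustration and are not part of the actual generator.

```php
<?php
// Hypothetical helper: builds one <entry> element from a vocabulary row.
// Assumes the row has the keys used in the post: navi, ipa, partOfSpeech, localized.
function entryToXml(array $row)
{
    // Escape &, < and > for XML element content; quotes are left alone
    // so that Na'vi apostrophes stay readable.
    $esc = function ($value) {
        return htmlspecialchars((string) $value, ENT_XML1 | ENT_NOQUOTES, 'UTF-8');
    };

    return '<entry>'
        . '<word>' . $esc($row['navi']) . '</word>'
        . '<pro>[' . $esc($row['ipa']) . ']</pro>'
        . '<source>PF</source>'
        . '<pos>' . $esc($row['partOfSpeech']) . '</pos>'
        . '<def>' . $esc($row['localized']) . '</def>'
        . '</entry>';
}

// Made-up sample row, only to show the escaping.
$sampleRow = [
    'navi'         => 'example',
    'ipa'          => 'ipa',
    'partOfSpeech' => 'n.',
    'localized'    => 'this & <that>',
];

echo entryToXml($sampleRow), "\n";
```

Inside the foreach loop, the echo line could then become echo entryToXml($data2);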

To run this, you need MySQLi enabled. The script was written in a PHP 5.5 environment. It may work under older PHP versions, but I haven't tested that.

If you have problems getting it running, just let me know. :)

-| Dict-Na'vi.com | Na'viteri Files | FAQ | LM | Puk Pxaw 'Rrta | Kem si fu kem rä'ä si, ke lu tìfmi. |-

Offline Wllìm

  • Taronyu
  • ****
  • Posts: 516
  • nl Netherlands
  • Karma: 47
    • Wimiso (weptsenge oeyä)
Re: "Dictionary" Generator
« Reply #1 on: March 17, 2016, 04:41:06 am »
Nice, this should be able to replace my manual find-and-replace process for updating the grammar tools on my website. +1 :D

(I'll still need to apply manual fixes on the infix positions, though, and remove derived verbs...  :( I've been thinking for a long time about making a special dictionary containing - instead of the meaning - grammatical information about words...)
Stress practice | Noun declensions | Verb infixes • Weather forecasts in Na'vi | KDE nìNa'vi | My Na'vi blog
Seykxel sì nitram! Ngal rolun fì'upxaret aketsuktse'a! :D

Offline Tìtstewan

Re: "Dictionary" Generator
« Reply #2 on: March 17, 2016, 04:55:24 am »
I am glad you like it! :)
The EE database also contains the infix information, like '<1><2>ak<3>u. Just add the variable $data2['infixes'] to the main echo line. :)

EDIT: One could write a function that scans $data2['infixes'] for <1>, <2> and <3> and replaces them with the corresponding infixes (using an array). That way, one could generate a list of verbs together with their infixed forms.
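
A rough sketch of such a function, assuming the infixes field uses exactly the markers <1>, <2> and <3> as in the example above. The name applyInfixes and the sample template t<1><2>ar<3>on are illustrative assumptions, not taken from the generator or the database.

```php
<?php
// Hypothetical helper: fills the <1>/<2>/<3> markers of an infix template.
// $infixes maps a position number (1, 2 or 3) to the infix to insert there.
function applyInfixes($template, array $infixes)
{
    // Positions without a supplied infix are replaced by the empty string.
    $map = ['<1>' => '', '<2>' => '', '<3>' => ''];
    foreach ($infixes as $position => $infix) {
        $map['<' . $position . '>'] = $infix;
    }

    // strtr() replaces all markers in a single pass.
    return strtr($template, $map);
}

echo applyInfixes('t<1><2>ar<3>on', [2 => 'ol']), "\n";
echo applyInfixes("'<1><2>ak<3>u", []), "\n";
```

Calling it with an empty array simply strips the markers, which gives back the plain word.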
« Last Edit: March 17, 2016, 05:02:39 am by Tìtstewan »


Offline Wllìm

Re: "Dictionary" Generator
« Reply #3 on: March 17, 2016, 12:41:58 pm »
Well, I know that there is an infixes column, but it contains mistakes as it is filled by a script (see this post). The IPA data is correct, but it is a lot of work to parse it. Also some other information is missing... I think while the current dictionary is great for humans, it is not easy to use the data for grammar analyzers and so on ;)

Offline Tìtstewan

Re: "Dictionary" Generator
« Reply #4 on: March 17, 2016, 01:15:30 pm »
Ah, yes... I forgot about that post. :-[ So, never mind. :)
Yeah, the issue with Eana Eltu... We would need to create a completely new environment that
A) actually supports UTF-8 characters (use my script and switch to Russian, and you'll get a lot of question marks)
B) offers more flexibility, like adding sentences
C) gets rid of that LaTeX thing, which is indeed powerful but has problems with character encoding (why on earth don't they add full UTF-8mb4 support, and in all packages?)
and some other stuff I forgot...

Also (no joke), I started to derp around with a fresh SMF installation, trying to create a "dictionary system" modification. But the thing is, I am not a PHP dev, so I have to lurk A LOT in various documentation.
Just see the attachment; this is what I've got so far...


Offline Wllìm

Re: "Dictionary" Generator
« Reply #5 on: March 17, 2016, 03:56:37 pm »
That looks good. I see both advantages and disadvantages to having the dictionary integrated with the forum software. Advantage would be that it looks nice to have the dictionary integrated with the forum. Disadvantage would be that it may be harder to develop and maintain...

About LaTeX: if you compile with XeLaTeX instead of PDFLaTeX, you get full UTF-8 support. It should even work with languages with complicated scripts like Chinese (I never tried it though - I don't speak Chinese ;D). If you have questions about LaTeX stuff: I use it often, so just ask :D

I think that it would be best to have the database decoupled from whatever program is used to produce the output. So if you want a PDF, you could use some program that invokes XeLaTeX; if you want HTML, you can use something like your PHP script; and so on :)
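
To make the decoupling idea a bit more concrete, here is a small sketch in which the same entry data is passed to interchangeable renderers. All names (EntryRenderer, HtmlRenderer, LatexRenderer, renderAll) and the sample row are invented for illustration. The row keys are borrowed from the PHP script earlier in the thread, and the LaTeX line is only a guess at a layout, not the format of the existing PDF.

```php
<?php
// Hypothetical interface: one renderer per output format.
interface EntryRenderer
{
    public function render(array $entry);
}

class HtmlRenderer implements EntryRenderer
{
    public function render(array $entry)
    {
        $h = function ($s) {
            return htmlspecialchars((string) $s, ENT_NOQUOTES, 'UTF-8');
        };
        return '<strong>' . $h($entry['navi']) . '</strong> [' . $h($entry['ipa']) . '] <em>'
            . $h($entry['partOfSpeech']) . '</em> ' . $h($entry['localized']) . '<br />';
    }
}

class LatexRenderer implements EntryRenderer
{
    public function render(array $entry)
    {
        // Escape the characters LaTeX treats specially in running text.
        $t = function ($s) {
            return strtr((string) $s, [
                '\\' => '\textbackslash{}', '&' => '\&', '%' => '\%', '$' => '\$',
                '#'  => '\#', '_' => '\_', '{' => '\{', '}' => '\}',
            ]);
        };
        return '\textbf{' . $t($entry['navi']) . '} [' . $t($entry['ipa']) . '] \textit{'
            . $t($entry['partOfSpeech']) . '} ' . $t($entry['localized']) . ' \\\\';
    }
}

// The data source only has to deliver rows; it never knows about the format.
function renderAll(array $entries, EntryRenderer $renderer)
{
    return implode("\n", array_map([$renderer, 'render'], $entries));
}

// Made-up sample row.
$entries = [
    ['navi' => 'example', 'ipa' => 'ipa', 'partOfSpeech' => 'n.', 'localized' => 'this & that'],
];

echo renderAll($entries, new HtmlRenderer()), "\n";
echo renderAll($entries, new LatexRenderer()), "\n";
```

Adding another output (XML, plain text for a parser, and so on) would then mean writing one more renderer class, without touching the database side.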

Sentences would be great! Also maybe the Frequency Dictionary could be integrated... And maybe etymology information for each word? Okay, I'm getting a bit too enthusiastic here ;D

I think I am going to develop some prototype this weekend... :-\

Offline Tìtstewan

Re: "Dictionary" Generator
« Reply #6 on: March 17, 2016, 04:36:23 pm »
That mod is definitely not supposed to be installed on *this* forum. EanaEltu is also "hacked" forum software: that way, one does not have to create a permission, login, or session system as well; only the dictionary part has to be done. I just took SMF because I mostly understand how it works. The EE forum is based on Perl, and I know absolutely nothing about Perl.

LaTeX is interesting and in some respects very powerful (I actually use it for creating a new Na'vi reference, Horen amuve). It works well, but TeXStudio, for example, shows errors about font faces because of the IPA stuff, and I had difficulties with \href{}{} because it still used T1 or OT1 encoding. I "fixed" an apostrophe by writing \%E2\%80\%99 ...this is not how I really want to deal with links that have non-ASCII characters. O___o
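
The \%E2\%80\%99 workaround is the percent-encoded UTF-8 form of the typographic apostrophe (U+2019), with each % escaped for LaTeX. If the links come out of a script anyway, that encoding could be generated rather than typed by hand. This is a small sketch under that assumption; the function name latexUrlSegment is made up, and it is meant for a single path segment, not a whole URL, because rawurlencode() would also encode the : and / characters.

```php
<?php
// Hypothetical helper: percent-encodes one URL path segment and escapes
// the resulting % signs so the string can be pasted into \href{...}{...}.
function latexUrlSegment($segment)
{
    // rawurlencode() encodes every byte outside the RFC 3986 unreserved set.
    return str_replace('%', '\%', rawurlencode($segment));
}

// "\u{2019}" is the right single quotation mark (needs PHP 7 or newer).
echo latexUrlSegment("Na\u{2019}vi"), "\n";
```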

There should be a "database that rules them all", which could be the origin for all other DBs, including a version that uses LaTeX to generate the PDF (which has to keep its current form for various reasons, btw). Kop, I (and some other people for sure) would really love to get the dictionary that Plumps has created automated. And finally, what I would also wish to add is a kind of word management system for the LEP, because I fear that the LEP word list is becoming bigger and more complicated.

I am not sure what you mean by "develop some prototype", but if you are considering creating such a system, I should let you know that there was a group in the past that planned to create something like that but stopped further development. I would suggest teaming up and developing such a system together, because I doubt that one single person can just create such a thing (I am not saying that it is impossible, btw :)). So, some of us have GitHub and stuff... shall we create a new thread about it?


Offline Tìtstewan

Re: "Dictionary" Generator
« Reply #7 on: March 19, 2016, 09:31:51 pm »
That dictionary generator could totally fit in the "tools" area. :)


Offline `Eylan Ayfalulukanä

  • Palulukan Makto
  • *****
  • *
  • *
  • Posts: 4744
  • us United States
  • Karma: 44
  • Palulukan alu Kenya 06/23/1996 - 01/15/2017
    • The Lionlamb website
Re: "Dictionary" Generator
« Reply #8 on: March 21, 2016, 04:03:33 pm »
Attempts to get EE to work well beyond Na'vi and Dothraki have not been successful. High Valyrian, another language that lurks on this server and is not talked about much, uses some special and somewhat unusual diacritics on vowels. They give EE a fit because it uses PDFTeX. You can get PDFTeX to generate the correct characters, but the escape codes end up in the database and make things that parse the database throw up when they are encountered.

I'd like to see something developed that would work with a lot of different languages. It should be easy to add and edit word entries, be flexible in its formatting, make a nice-looking dictionary, and have a database that can be universally understood. You can come close to all these things at once, but to perfect them will take some real work. I am sure there are commercial programs out there that will do this, and it would be understood why they are not inexpensive.

Yawey ngahu!
pamrel si ro [email protected]

Offline Tìtstewan

Re: "Dictionary" Generator
« Reply #9 on: March 21, 2016, 04:25:36 pm »
Quote from `Eylan Ayfalulukanä: "I am sure there are commercial programs out there that will do this, and it would be understood why they are not inexpensive."
I haven't found a web application that could cover all the necessary stuff we will need. Perhaps it just has not been written yet. That's why I'll try to create such a dictionary system that will use UTF-8mb4. I am very worried about how to convert the stuff from the database into a PDF file that has to look very, very close to the current Na'vi dictionary...


Offline Tìtstewan

Re: "Dictionary" Generator
« Reply #10 on: January 07, 2017, 12:30:26 pm »
So, yeah...
I simplified all the code of the generator. I still wonder why on Earth I haven't done it earlier... :-[


Offline Tirea Aean

  • The Blue One
  • Olo'eyktan Anawm
  • Palulukan Makto
  • *****
  • *
  • *
  • *
  • Posts: 9766
  • nv Eywa'eveng
  • Karma: 241
  • Oeri ran lu srung
    • Tirea Aean
Re: "Dictionary" Generator
« Reply #11 on: February 27, 2017, 07:12:09 pm »
Quote from Wllìm: "Well, I know that there is an infixes column, but it contains mistakes as it is filled by a script (see this post). The IPA data is correct, but it is a lot of work to parse it. Also some other information is missing... I think while the current dictionary is great for humans, it is not easy to use the data for grammar analyzers and so on ;)"

I've had no problems with our database in Fwew or Vrrtep thus far. Then again, these use plaintext dumps of the tables.

I have created a "fwew data file manufacturing/editing suite" that is automated. It is a set of scripts I run on every PDF update to do the following to get the update out for Fwew

  1. Download NaviData.sql
  2. drop local database; replace it with what's in NaviData.sql
  3. select the tables into outfiles
  4. shell script to fix the broken infix location data of compound verbs
  5. probably some other stuff I forgot to list here
  6. scp the data files to dictionarydata folder on my website

I just got done (mostly) working on getting Infix parsing to work in fwew. What I ended up doing:

Since the <1><2><3> "infixes" field of the data file is now reliable thanks to my hax.sh from step 4 above...

  • make a regex string for each infix position. Like: "(äp)?(eyk)?"
  • grab the infix location data from the infixes field of the data file
  • replace the things that look like <1> with such strings, make this the regex to match the input against
  • compile regex, call a Match All String function, to see what infixes the word has
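
The four steps above can be sketched roughly as follows. This is not fwew's actual code. It is written in PHP only to match the generator from the first post, and it assumes PHP 7 or newer. The names buildInfixRegex and findInfixes are made up, the template t<1><2>ar<3>on is an assumed example, and the infix lists for positions 2 and 3 are deliberately incomplete placeholders. Only the position 1 pattern "(äp)?(eyk)?" comes from the post.

```php
<?php
// One optional-group regex fragment per infix position.
// Position 1 is the pattern quoted in the post; positions 2 and 3 are
// illustrative, incomplete lists (assumption).
const INFIX_PATTERNS = [
    '<1>' => '(äp)?(eyk)?',
    '<2>' => '(am|ìm|ìy|ay|er|ol|iv)?',
    '<3>' => '(äng|ei|ats|uy)?',
];

// Turns an infix template such as "t<1><2>ar<3>on" into a regex.
function buildInfixRegex(string $infixField): string
{
    // Split into literal chunks and <n> markers, keeping the markers.
    $parts = preg_split('/(<[123]>)/', $infixField, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $regex = '';
    foreach ($parts as $part) {
        // Markers become infix groups; everything else is matched literally.
        $regex .= INFIX_PATTERNS[$part] ?? preg_quote($part, '/');
    }
    return '/^' . $regex . '$/u';
}

// Returns the infixes found in $word, or null if $word is not a form of the verb.
function findInfixes(string $infixField, string $word): ?array
{
    if (preg_match(buildInfixRegex($infixField), $word, $m) !== 1) {
        return null;
    }
    // Drop the full match, then drop groups that matched nothing.
    $groups = array_slice($m, 1);
    return array_values(array_filter($groups, function ($s) {
        return $s !== '';
    }));
}

print_r(findInfixes('t<1><2>ar<3>on', 'tolaron'));
print_r(findInfixes('t<1><2>ar<3>on', 'teykareion'));
```

The /u modifier matters here because the infixes contain non-ASCII letters, so the script file itself has to be saved as UTF-8.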

IPA just works. Unless you're on Windows.


About EE, indeed I do remember the previous efforts to replace Eana Eltu. It for some reason never took off.

I really don't see why it's so impossible to do this. EE is literally just a pile of Perl hax. We're probably just overthinking it.

The mentioned effort was actually in support of redoing EVERYTHING. A brand new Database layout, a brand new PDF generator, AND brand new graphic interface for users to edit the dictionary. Yeah, that's a lot of work. But it can be done.  A few people on GitHub and a lot of dedication can pull it off. Even in such a way that the PDF output looks identical. (We would need to study the source code of the PDF to know exactly how to reproduce the layout and style in order to make the new product identical)

Yeah. It would be cool to have a universal dictionary system with full UTF-8 support and all that stuff.



Learn Na'vi Discord Chat: https://discord.gg/WF6qcmv

Offline Yawne Zize’ite

  • Uniltìranyu
  • **
  • Posts: 176
  • Karma: 4
Re: "Dictionary" Generator
« Reply #12 on: March 04, 2017, 02:53:58 pm »
Is using PDFTeX an absolute necessity? XeLaTeX and LuaLaTeX both natively use UTF-8.

Offline Tìtstewan

Re: "Dictionary" Generator
« Reply #13 on: March 04, 2017, 02:55:45 pm »
As far as I know, it is needed to create the pdf file.


 
