Eana Eltu: Translator, Dictionary, API and putxìng.

Started by Tuiq, January 07, 2010, 04:20:17 PM

`Eylan Ayfalulukanä

Another question: At what point in history did EE come online and replace the manually edited dictionary document?

Yawey ngahu!
pamrel si ro [email protected]

Tuiq

January 2010. The beginning of this topic marked that pretty closely.
Eana Eltu: PDF/TSV/jMemorize

baritone

Is it possible to give translators access to the *.tex files after the dictionaries are created?

This would be useful because:
1) In some cases it is much easier to look for errors in the database by viewing the *.tex source files.
2) The hyperref package leaves the list of appendices in the Russian dictionary empty despite the \hypersetup{unicode=true} instruction. This instruction worked fine on my home machine.
3) The ä and ì letters in the Russian dictionary are each composed of two symbols (¨ plus a, and ` plus i, respectively), which makes it impossible to find some words containing these letters in the dictionary. This can be solved with xelatex, which is part of the newer TeX Live distribution and is incompatible with the obsolete teTeX distribution used on the LN server.

If translators had access to the *.tex files, I could maintain the translation database on Eana Eltu and compile high-quality Russian dictionaries at home while waiting for the LaTeX software to be updated.

`Eylan Ayfalulukanä

I would find this useful as well, for troubleshooting, especially for Valyarian.

Tuiq

Quote from: baritone on October 29, 2013, 11:21:14 AM
Is it possible to give translators access to the *.tex files after the dictionaries are created?

This would be useful because:
1) In some cases it is much easier to look for errors in the database by viewing the *.tex source files.
2) The hyperref package leaves the list of appendices in the Russian dictionary empty despite the \hypersetup{unicode=true} instruction. This instruction worked fine on my home machine.
3) The ä and ì letters in the Russian dictionary are each composed of two symbols (¨ plus a, and ` plus i, respectively), which makes it impossible to find some words containing these letters in the dictionary. This can be solved with xelatex, which is part of the newer TeX Live distribution and is incompatible with the obsolete teTeX distribution used on the LN server.

If translators had access to the *.tex files, I could maintain the translation database on Eana Eltu and compile high-quality Russian dictionaries at home while waiting for the LaTeX software to be updated.

In the past, access to the dictionaries was denied for... reasons beyond my knowledge right now. There's a workaround, though: if your dictionary doesn't compile, it should show you the whole LaTeX, including line numbers. Generally, I'm against free access to the TeX. It causes more trouble than it's worth.

As for different TeX things, the system is quite open for that, but you'd have to annoy Markì for that.

baritone

Quote from: Tuiq on October 29, 2013, 05:05:58 PM
There's a workaround, though: if your dictionary doesn't compile, it should show you the whole LaTeX, including line numbers. Generally, I'm against free access to the TeX. It causes more trouble than it's worth.

As for different TeX things, the system is quite open for that, but you'd have to annoy Markì for that.
What could go wrong if the *.tex files were moved to the same location as the *.pdf files after compilation? What issues could arise if someone reads the *.tex files?
Reading the *.log file is quite useless without access to the *.tex files, especially with the T2[ABCD] font encodings used in Russian texts.

Tuiq

Because the .tex files were not meant to be public by Taronyu, if I remember correctly. I'm not one hundred percent sure, but I think we agreed on that back then. From my POV, following the principle of information hiding, you don't give your clients access to everything, just what they need. Because the TeX can, and will, change at random, I do not want people to parse it instead of the original database. It's an unstable, unreliable source of information.

Looking through the code, it seems that only original authors (people with meta access, i.e. non-translators) can access the source on create_dict, either by triggering an error in pdflatex or by specifying check=1 in the URL (?check=1, or ?lc=foo;check=1 for other dictionaries).

baritone

Quote from: Tuiq on October 30, 2013, 06:32:49 AM
Because the .tex files were not meant to be public by Taronyu, if I remember correctly. I'm not one hundred percent sure, but I think we agreed on that back then.
Of course, searching the database for errors does not always help, because errors can appear in the *.tex file even when the database has none (as happened recently with the Russian dictionary; see: teTeX on the LN server). But the author's wishes are sacred.

Tuiq

I do understand and completely agree that the current situation is not ideal. However, having to browse a four-thousand-line TeX file for a missing bracket isn't exactly ideal either.

But as I said before, without explicit permission from Taronyu and the other dictionary editors, I'm not going to release the TeX files live. I'm not even sure whether I would legally be allowed to, and while I'm sure nobody would cause much drama about this, I would like to avoid it altogether.

`Eylan Ayfalulukanä

You certainly have permission from this dictionary editor!

Tuiq

The problem here is that I'd need an option to enable it for certain dictionaries (all of them currently use the very same code, which saves a lot of maintenance). If the need arises for VL, we can talk about it. As I've said before, if you append ?check=1 to the create_dict URL, you should get a dump of the LaTeX as the main author. For example, /create_dict.pl?lc=de;check=1 or simply /create_dict.pl?check=1 for the main dictionary. So, if you want to give somebody a copy of their language's TeX, you are free to do so. You've got the ability. Don't worry if the text is red; I'm just pretending an error happened so that it dumps everything.
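
The URL patterns above can be sketched in Python. This is only an illustration: the base URL is a placeholder (the real EE address is not given here) and the helper name check_url is made up. Only create_dict.pl, lc and check=1 come from the post.

```python
# Placeholder host: the real Eana Eltu base URL is an assumption, not a fact.
BASE_URL = "https://example.invalid/create_dict.pl"


def check_url(lc=None, base=BASE_URL):
    """Build a create_dict URL that requests the LaTeX dump (check=1).

    lc is the language code of a translated dictionary (e.g. "de");
    leave it as None for the main dictionary.
    """
    params = []
    if lc:
        params.append("lc=" + lc)
    params.append("check=1")
    # The examples in the post join parameters with ";" rather than "&".
    return base + "?" + ";".join(params)


print(check_url())      # main dictionary
print(check_url("de"))  # German dictionary
```

As described above, the dump is only shown to users with main-author access, so the same URL opened by a translator would not be expected to return the TeX.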

In case of new/not working dictionaries, we usually give out the TeX so you can fiddle around on your local machine and then adapt it on EE/tell us what is necessary to do. I think I've sent baritone a copy of the TeX (did I forget?). But as I've said, newer TeX installations and all that stuff are out of my jurisdiction. I'm maintaining the system (i.e. the web front and backend), not the tools used to create the dictionary itself.

Toruk Makto

Part of the reason I have not made a huge effort to modernize the typesetting apps on LN is that aside from adding vocab, the English/Na'vi dictionary really needs to have an unchanged look and feel for a while and I don't want to horribly break anything right now.

-Markì

Lì'fyari leNa'vi 'Rrtamì, vay set 'almong a fra'u zera'u ta ngrrpongu
Na'vi Dictionary: http://files.learnnavi.org/dicts/NaviDictionary.pdf

Tuiq

At least on my end, it is completely irrelevant. I could rewrite EE in .NET and it would produce the exact same LaTeX, while offering newer, better interfaces and options for future generations of the dictionary.

`Eylan Ayfalulukanä

Irayo, ma Tuiq! I did not know about the 'check = 1' flag for compiling. I will have to try it, as I want to run the Valyrian dictionaries through some different PDF generators, like xetex. I can also use the well-established Dothraki dictionary as a test to see if there are any typesetting changes that occur with the different PDF generators.

Now, if I can find the time to work on this... !

baritone

Quote from: Toruk Makto on October 31, 2013, 04:37:03 PM
Part of the reason I have not made a huge effort to modernize the typesetting apps on LN is that aside from adding vocab, the English/Na'vi dictionary really needs to have an unchanged look and feel for a while and I don't want to horribly break anything right now.

-Markì
Yes, it was clear that changing the LaTeX setup is risky right now. But when I asked for the *.tex source files, I was also looking for a workaround for the problem with the letters ä and ì in the Russian dictionary. With the current LaTeX, this problem can only be solved by making fonts with a special encoding. That would be wasted work, because sooner or later a new TeX distribution with Unicode font support will be installed.

But until that is done, I could compile the Russian dictionary on my home machine and place it on learnnavi.org. This would solve the problem of searching for words with the letters ä and ì, and would produce dictionaries with a properly generated list of appendices.

Well, since I am only a translator and not an editor, I cannot get the source *.tex files automatically. But maybe I could get them simply by asking after each dictionary change, and then pass the compiled *.pdf files back to someone who can upload them to the server? The problem with the letters ä and ì causes a lot of pain for users of the Russian dictionary, and if it can be avoided, it is desirable to do so.

Tuiq

I don't quite understand the issue, to be honest. What does "not being able to search" mean? As an example, the German dictionary, which also features ö and ü, isn't searchable with Adobe (for example, searching for "äus" as in "Geräusch" returns all possible "aus" words). However, with Firefox's internal PDF reader it works flawlessly.

I'm inclined to believe that this is simply an issue with the reader, not necessarily the generated PDF?

Nobody should edit the TeX by hand all the time, because there is, besides EE, nothing that really tracks the changes (except voluntarily led lists, which might not include all changes). EE itself doesn't keep a list per se either, but simply a timestamp when a thing got edited. So you would, necessarily, have to compare the dictionary all the time. It's a lot of useless hassle.

baritone

Quote from: Tuiq on November 01, 2013, 11:19:19 AM
I don't quite understand the issue, to be honest. What does "not being able to search" mean? As an example, the German dictionary, which also features ö and ü, isn't searchable with Adobe (for example, searching for "äus" as in "Geräusch" returns all possible "aus" words). However, with Firefox's internal PDF reader it works flawlessly.

I'm inclined to believe that this is simply an issue with the reader, not necessarily the generated PDF?

Nobody should edit the TeX by hand all the time, because there is, besides EE, nothing that really tracks the changes (except voluntarily led lists, which might not include all changes). EE itself doesn't keep a list per se either, but simply a timestamp when a thing got edited. So you would, necessarily, have to compare the dictionary all the time. It's a lot of useless hassle.
The inability to find some words in the Russian dictionary is caused by the Russian 8-bit font encodings in TeX. The T2[ABCD] font encodings do not include the ä and ì characters, so when the PDF (or DVI) file is generated, each of these two characters is replaced by two glyphs. I have already written about this. It does not matter for typography. These fonts and encodings were created by Olga Lapko for her publishing house. The German dictionary does not use the T2 font encoding.

I want to take the *.tex files generated by EE, and I am not going to change anything in them except the font selection commands (to select a Unicode font, TrueType or similar) so that I can use them with xetex. When xelatex generates the PDF with Unicode fonts, the ä and ì characters are not replaced by two glyphs.
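
To illustrate why the split letters break searching, here is a small Python sketch. It assumes that the text layer of the PDF contains a spacing accent followed by the base letter (for example ¨ then a). The sample string and the function name are made up, and this is not a description of how any particular PDF reader works.

```python
import unicodedata

# Assumption: the accent is extracted as a separate spacing character
# placed before the base letter. Map it to the matching combining mark.
SPACING_TO_COMBINING = {
    "\u00a8": "\u0308",  # DIAERESIS -> COMBINING DIAERESIS
    "`": "\u0300",       # GRAVE ACCENT -> COMBINING GRAVE ACCENT
}


def repair_split_accents(text):
    """Rejoin 'accent + letter' pairs into single precomposed characters."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in SPACING_TO_COMBINING and nxt.isalpha():
            # Combining marks follow their base letter in Unicode.
            out.append(nxt + SPACING_TO_COMBINING[ch])
            i += 2
        else:
            out.append(ch)
            i += 1
    # NFC composes base letter + combining mark into one code point.
    return unicodedata.normalize("NFC", "".join(out))


extracted = "t`ing"                  # made-up sample of extracted PDF text
print("tìng" in extracted)           # the search term does not match
print(repair_split_accents(extracted) == "tìng")
```

A repair like this is naive: it would also rewrite a legitimate ` that happens to precede a letter, so it only demonstrates the mismatch and is not a fix for the dictionary itself.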

I'm going to keep the Russian dictionary in EE up to date, and I'm going to ask for the *.tex files every time the dictionary changes, so that I can compile high-quality dictionaries on my home machine, since this cannot be done on the server. I'm ready to do this until xelatex is installed on the LN server.

`Eylan Ayfalulukanä

In my case, letters with macrons, as used in High Valyrian, cause pdftex to 'choke', as they are more 'extended' Unicode than most vowel markings. There is a LaTeX escape code (for lack of a better term) that causes the macrons to print properly in pdftex. But I have discovered that they don't create a record that can be searched on if the database is used to build other tools. I also seem to recall that the finished PDF was not searchable if macrons were involved.
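
As a rough sketch of how such escape codes could be turned back into searchable text, the Python below assumes the escape in question is the standard LaTeX macron accent command \= (written as \=a or \={a}). Which escape EE actually stores is not stated here, and the function name and sample string are made up.

```python
import re
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON

# Matches \={a} or \=a, assuming the standard LaTeX macron accent command.
_MACRON_CMD = re.compile(r"\\=(?:\{([A-Za-z])\}|([A-Za-z]))")


def latex_macrons_to_unicode(text):
    """Replace LaTeX macron escapes with precomposed Unicode letters."""
    def repl(match):
        letter = match.group(1) or match.group(2)
        # NFC turns letter + combining macron into one code point
        # where Unicode defines one (e.g. a + macron -> U+0101).
        return unicodedata.normalize("NFC", letter + MACRON)
    return _MACRON_CMD.sub(repl, text)


sample = r"b\={a}r \=y"  # made-up sample, not a real dictionary entry
print(latex_macrons_to_unicode(sample))
```

Storing the Unicode form in the database and generating the escape only when writing the TeX would keep the records searchable by other tools.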

Due to extenuating circumstances in my personal life, this isn't something I have visited in a few months, and I need to go back and take a good hard look at it.

Tuiq

Quote from: `Eylan Ayfalulukanä on November 01, 2013, 03:18:52 PM
There is a LaTeX escape code (for lack of a better term) that causes the macrons to print properly in pdftex. But I have discovered that they don't create a record that can be searched on if the database is used to build other tools.

If you mean that in the SQL/whatever there's LaTeX instead of the actual word, I'm sure that can be changed. We had `i and 'a for a while, I believe before we switched to UTF8.

However, UTF8 itself *should* cover everything.

baritone

Quote from: Tuiq on November 01, 2013, 04:12:19 PM
However, UTF8 itself *should* cover everything.
There is no UTF-8 in pdflatex in the teTeX distribution. It is an 8-bit application in every part. Unicode support appeared only in xetex, which is included in TeX Live.

P.S. Unicode support in (pdf)latex is like that of the Linux text console in UTF mode: pdflatex translates UTF-8 into an 8-bit font encoding and works with bytes thereafter. And with xelatex, the babel package should be replaced by polyglossia in order to avoid translating Unicode into 8-bit font encodings.
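
The byte-level point can be illustrated with a short Python sketch. Python has no codec for the T2A font encoding, so cp1251 is used here purely as a stand-in for "an 8-bit Cyrillic encoding"; that substitution is an assumption for illustration and the two encodings are not identical.

```python
def fits_8bit(ch, codec="cp1251"):
    """Return True if ch maps to exactly one byte in the given 8-bit codec.

    cp1251 is only a stand-in for an 8-bit Cyrillic encoding such as T2A.
    """
    try:
        return len(ch.encode(codec)) == 1
    except UnicodeEncodeError:
        return False


for ch in ["д", "ä", "ì"]:
    utf8 = ch.encode("utf-8")
    # In UTF-8 each of these letters takes more than one byte; an 8-bit
    # encoding either has a single slot for the letter or none at all.
    print(repr(ch), "UTF-8 bytes:", list(utf8), "fits 8-bit:", fits_8bit(ch))
```

A letter with no slot in the 8-bit encoding has to be built from an accent and a base letter, which matches the split ä and ì described earlier in the thread.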