Translation of the official dictionary / Překlad oficiálního slovníku (discussion in English)

Started by Taronyu, July 19, 2010, 10:21:11 AM

Taronyu

Sorry, I don't speak Czech.

Do you want to translate the dictionary through the system that Tuiq set up? It would mean that you'd have a Czech dictionary like these:

http://eanaeltu.learnnavi.org/dicts/NaviDictionary_sv.pdf
http://eanaeltu.learnnavi.org/dicts/NaviDictionary_hu.pdf
http://eanaeltu.learnnavi.org/dicts/NaviDictionary_pbtr.pdf
http://eanaeltu.learnnavi.org/dicts/NaviDictionary_de.pdf

It's fairly easy, especially as I've seen you've done all the work, and it would mean that you'd be able to see what I update instead of having to use the changelog. Contact me or Tuiq for details.

Tawtakuk

Hi, Richard, thanks for your offer and for creating the dictionary in the first place!

As for using this translation method, that depends on other members and their willingness to copy-paste, but we have (had?) three reasons for keeping a separate dictionary:


  • It was done way before Tuiq's system was created - back when you were writing the TeX files yourself. There was no way to do it other than writing the dictionary from scratch.
  • My original decision was opposite to yours ("Call me old-fashioned, but I like documents."), using an on-line form so that only the up-to-date version is available and people don't stockpile old versions on their hard drives. Again, this was in the early days of guessing word roots from the ASG, when newer versions often rendered earlier dictionary entries completely incorrect. This is no longer the case as newer versions now only add more stuff.
  • It allows somewhat longer and more detailed entries where the meaning is dubious (i.e. only one English equivalent and no usage, which could have only some of the multiple meanings of that English word - and those meanings are often carried by distinct Czech words)

Well, thanks again, I'll leave the decision to our community :)
01000011 01011010 00100000 01001101 01101111 01100100 01100101 01110010 01100001 01110100 01101111 01110010 00000000

Taronyu

Thanks, Tawtakuk. The way I see it though, the first and second of those reasons are now obsolete, and it's simply a matter of time to translate the thing - and there's no real time limit on translation, you guys can take a while to do it. And as for the third reason, you can pretty much translate it with whole extra paragraphs if you want to. That's not really a constraint.

But yeah, think it over. :)

Tawtakuk

I'm not hiding the obsoleteness of the reasons. They were mere explanations of the existence of our "stand-alone" dictionary ;)

However, point 2 still remains somewhat valid as people might prefer an online, always-up-to-date version instead of having to download a PDF over and over again. Maybe Tuiq could write a different script to output his database as an HTML/CSS document, not unlike what Google Docs gives, to allow both approaches. That would really be cool :D

Taronyu

Some browsers (like Firefox) let you view the PDF in a tab. Maybe try that?

Will talk to Tuiq about it. But from a purely kxeyey-driven point, one dictionary with translations is probably better. And it's useful to have a document when the internet is down, in my opinion. But we know where I stand.

Tawtakuk

A PDF in a tab is still a PDF - a cumbersome, slow-to-render monster even when typeset by TeX, plus searching in Adobe Reader (plug-in or not) has never been quite as fast as web-page searching. And general accessibility is better - phone web browsers, even the older ones, have no problems with the simple HTML that Google Docs generates, whereas attempting to download and view a PDF on such a phone is doomed from the beginning. And even with a PDF-enabled phone, you're stuck with the rigid document formatting, making quick word lookups rather unpleasant.

I agree about the error problem, the "maintenance cost" of our current solution certainly is higher :) And that's the reason why I'm discussing this instead of dismissing the idea. And I'm beginning to think that should Tuiq create the simple web-page output, I'm definitely in - there will be no more advantages in having an isolated dictionary.

Tuiq

And here we begin the flamewar. It's not possible to convert TeX to HTML (well, of course it is, but not satisfactorily). I agree that the Acrobat Reader is a bit bloated and all, but after all, PDF is a nice and well-known (as well as accepted) standard for documents. HTML is a real pain - and I know what I am talking about, I have designed a few websites. Some browsers just want to be silly about something, and then you get to debug it again. Let me tell you: I'm not a guy who respects any browser that does what it shouldn't. If it looks messed up in the Internet Explorer (most common case), then it will look messed up. I will put *no effort* into fixing it.

Also, we're not talking about simple HTML with a few <table> and numbers. We are talking about more or less complex formatting, including IPA (which is now UTF8 compatible, but then again, you have to know that the user's font HAS all the characters necessary and is UTF8 ready). Of course you can do it as &entity;, but I'm not sure that solves the font problem.
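
To illustrate the &entity; point: here is a minimal Python sketch (the function name and the sample string are made up for illustration, they are not part of Eana Eltu) that rewrites non-ASCII IPA characters as numeric character references. The references only change how the characters are written in the HTML source; the browser decodes them back to the same code points, so it still needs a font that contains those glyphs.

```python
import html


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with an HTML numeric character reference."""
    # 'xmlcharrefreplace' is a standard Python error handler for encoding.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Hypothetical IPA fragment, used only to show the mechanics.
sample = "\u02c8l\u025b"  # stress mark + "l" + open-mid front vowel

encoded = to_char_refs(sample)
print(encoded)

# The browser (here: html.unescape) turns the references back into the same
# characters, so a missing glyph stays missing either way.
print(html.unescape(encoded) == sample)
```

So entities would help with a page served in the wrong encoding, but probably not with a font that lacks the IPA characters.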

The dictionary itself is not really meant to be used as a look-up tool (well, it is, but it does not work as smoothly as it could) - and I'm not sure whether or not HTML will fix that. Now you can say "Okay, searching for some things is broken because Adobe messed up"; after that you'd have to say "Okay, searching for (anything|something) is messed up because $browser does not support that and that and this and BLAH". There is an online HTML-only lookup which is sadly never used, so I didn't improve it in any way. Of course, that could be done. Also, there would be the API, which would allow you to create your own lookup thing, designed for your needs. And if you want to create your own list, there's the SQL which contains all enabled languages, IPA, Na'vi and infixes.


And, to show you what it is like to "translate" the dictionary, I'll just post a picture of the translation system (blue is the actual field your cursor is in - since a lot of people messed up the right ordering).
Eana Eltu: PDF/TSV/jMemorize

Tawtakuk

Hmm, the problem may be more complicated than I've thought.

I had the idea that your system is more or less just a database of word-translation pairs, with links to a containing section (two or three SQL tables). Given the screenshot, I now see that there is probably TeX markup in the entries and that it is rigidly constructed around replacing variables in pre-formatted strings...

What I had in mind by "quick lookup" is not a server-side search, which involves sending a request each time and is highly impractical for repeatedly looking up multiple words, for example when you are translating someone's forum post, and even more impractical for mobile users, given mobile connection latency. My usage scenario is: Encounter a Na'vi text - open the dictionary in a new tab - quickly look up all you need, switching between tabs, then keep the dictionary open should you need it more - close the entire browser when you're done. It only involves one HTTP request for all the known words, and using any browser's in-page search is always blazing fast, and it allows you to jump between occurrences back and forth (very important for reverse usage, when translating X to Na'vi).

As for IPA, we don't need it if we mark stress by underlining and mention any exceptions to the otherwise pretty unambiguous Latin transcription.
And as for not having a UTF-8 ready font, that's not going to be the case for any non-English dictionary readers. If there is a problem, then OK, they will see gibberish in the pronunciation brackets, but they can download the PDF if necessary or solve their own font issues. Who cares - we're not creating a paid-for service.

As for markup, format and browsers messing up, you'd be absolutely right if we were to reproduce the original document look using HTML and CSS. But again, that is not the goal, the web variant should be minimalistic, with only the most basic markup - headings, lists, bold, italic, tables and that's it! Take a look at the web-published Google Docs version of our current dictionary - ignore that it's in Czech and just look at the simplistic but functional and accessible formatting: http://docs.google.com/View?id=dfrz8dmn_1gv7mc2dm
I use this on my 3-year-old phone and it loads fast and is quickly searchable. A PDF is neither of those. All we in fact need is something like this, a simple and always-up-to-date on-line dictionary, as an alternative to the otherwise very nice PDF document.

So, without further rambling - is it possible to output a super-simple HTML from your DB or not? The only real problem I see is the stress, which cannot be easily extracted from IPA to underlining. But having UTF-8 IPA for those who can display it would be pretty much enough.
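
To make the stress problem concrete, here is a rough Python sketch. It assumes (my assumption, not something confirmed about the database) that the IPA strings separate syllables with "." and put the primary stress mark ˈ in front of the stressed syllable. Finding the stressed syllable is then easy; the hard part is mapping IPA syllables back onto the Latin spelling, so the sketch simply takes the spelling already split into syllables. The function names and the sample word are made up and are not a dictionary entry.

```python
from typing import List, Optional

STRESS_MARK = "\u02c8"  # IPA primary stress mark


def stressed_syllable_index(ipa: str) -> Optional[int]:
    """Return the 0-based index of the syllable carrying the stress mark, if any."""
    syllables = ipa.strip("[]/ ").split(".")
    for index, syllable in enumerate(syllables):
        if STRESS_MARK in syllable:
            return index
    return None


def underline_stress(spelling_syllables: List[str], ipa: str) -> str:
    """Wrap the stressed syllable of the spelling in <u>...</u>.

    Falls back to the plain word when no stress mark is found or the
    syllable counts do not line up.
    """
    index = stressed_syllable_index(ipa)
    if index is None or index >= len(spelling_syllables):
        return "".join(spelling_syllables)
    return "".join(
        "<u>{}</u>".format(s) if i == index else s
        for i, s in enumerate(spelling_syllables)
    )


# Made-up example word, for illustration only.
print(underline_stress(["ka", "lu", "ma"], "ka.\u02c8lu.ma"))
```

Wherever the spelling cannot be split automatically, someone would still have to mark the syllables by hand, which is the real cost here.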

Thanks for commenting on this issue, Tuiq :)
I may move this discussion to a separate thread later on...

Tuiq

Quote from: Tawtakuk on July 22, 2010, 12:50:23 PM
Hmm, the problem may be more complicated than I've thought.

I had the idea that your system is more or less just a database of word-translation pairs, with links to a containing section (two or three SQL tables). Given the screenshot, I now see that there is probably TeX markup in the entries and that it is rigidly constructed around replacing variables in pre-formatted strings...


That's completely right. It was designed to allow Taronyu to change words easily on the fly; later, other people came along to translate it more or less easily. After all, it's not a lookup or anything, it's a simple LaTeX-creating machine. However, Eana Eltuyä vrrtep, the demon running the translation system for the website and the IRC, is parsing this data (it's also generating the SQL you may have already seen, which has exactly two tables, one for Na'vi, one localized).
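
For anyone who hasn't looked at the SQL, a two-table layout like that can be sketched roughly as follows, using Python's built-in sqlite3. The table names, column names and sample rows are hypothetical and only illustrate the idea; the real dump may be organised differently.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one Na'vi table and one localized table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE navi (id INTEGER PRIMARY KEY, word TEXT NOT NULL);
        CREATE TABLE localized (
            navi_id INTEGER NOT NULL REFERENCES navi(id),
            lang TEXT NOT NULL,
            translation TEXT NOT NULL
        );
        """
    )
    # Illustrative sample rows only.
    conn.execute("INSERT INTO navi (id, word) VALUES (1, 'kaltxì')")
    conn.executemany(
        "INSERT INTO localized (navi_id, lang, translation) VALUES (?, ?, ?)",
        [(1, "en", "hello"), (1, "cs", "ahoj")],
    )
    return conn


def lookup(conn: sqlite3.Connection, word: str, lang: str) -> list:
    """Return all translations of a Na'vi word in the given language."""
    rows = conn.execute(
        "SELECT l.translation FROM navi AS n "
        "JOIN localized AS l ON l.navi_id = n.id "
        "WHERE n.word = ? AND l.lang = ? ORDER BY l.translation",
        (word, lang),
    )
    return [row[0] for row in rows]


conn = build_demo_db()
print(lookup(conn, "kaltxì", "cs"))
```

With a layout like this, adding a language means adding rows to the localized table, and the Na'vi table stays untouched.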

Quote from: Tawtakuk on July 22, 2010, 12:50:23 PM
What I had in mind by "quick lookup" is not a server-side search, which involves sending a request each time and is highly impractical for repeatedly looking up multiple words, for example when you are translating someone's forum post, and even more impractical for mobile users, given mobile connection latency. My usage scenario is: Encounter a Na'vi text - open the dictionary in a new tab - quickly look up all you need, switching between tabs, then keep the dictionary open should you need it more - close the entire browser when you're done. It only involves one HTTP request for all the known words, and using any browser's in-page search is always blazing fast, and it allows you to jump between occurrences back and forth (very important for reverse usage, when translating X to Na'vi).
(Don't read this: To create the SQL file, the demon is capable of fetching *all* available words including all translations and stuff. It's a pretty big query, but since it's using UNIX sockets, it doesn't matter. Damn, you read it, didn't you.)

Quote from: Tawtakuk on July 22, 2010, 12:50:23 PM
As for IPA, we don't need it if we mark stress by underlining and mention any exceptions to the otherwise pretty unambiguous Latin transcription.
And as for not having a UTF-8 ready font, that's not going to be the case for any non-English dictionary readers. If there is a problem, then OK, they will see gibberish in the pronunciation brackets, but they can download the PDF if necessary or solve their own font issues. Who cares - we're not creating a paid-for service.

As for markup, format and browsers messing up, you'd be absolutely right if we were to reproduce the original document look using HTML and CSS. But again, that is not the goal, the web variant should be minimalistic, with only the most basic markup - headings, lists, bold, italic, tables and that's it!
Well, that would be very well possible - even an index with hyperlinks and all.

Quote from: Tawtakuk on July 22, 2010, 12:50:23 PM
Take a look at the web-published Google Docs version of our current dictionary - ignore that it's in Czech and just look at the simplistic but functional and accessible formatting: http://docs.google.com/View?id=dfrz8dmn_1gv7mc2dm
I use this on my 3-year-old phone and it loads fast and is quickly searchable. A PDF is neither of those. All we in fact need is something like this, a simple and always-up-to-date on-line dictionary, as an alternative to the otherwise very nice PDF document.

So, without further rambling - is it possible to output a super-simple HTML from your DB or not? The only real problem I see is the stress, which cannot be easily extracted from IPA to underlining. But having UTF-8 IPA for those who can display it would be pretty much enough.

Thanks for commenting on this issue, Tuiq :)
I may move this discussion to a separate thread later on...

Actually, I just updated navi_dict.pl - if you take a look at it, you can now display the IPA. I'm not a professional, but I'd say it's about right, using only UTF8.

It is possible, more or less, to create this simple HTML. I'll see what I can do and post if I have something to show.
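
Just to show the kind of output meant here, a rough Python sketch of a single-page, table-only HTML export. It assumes the entries are already available as (word, IPA, translation) tuples; the function name and the placeholder rows are invented for the example and say nothing about how the real generator works.

```python
import html
from typing import Iterable, Tuple


def render_dictionary(entries: Iterable[Tuple[str, str, str]], title: str) -> str:
    """Render entries as one minimal UTF-8 HTML page with a single table."""
    rows = []
    for word, ipa, translation in sorted(entries):
        # html.escape keeps stray <, > and & in the data from breaking the page.
        rows.append(
            "<tr><td><b>{}</b></td><td>[{}]</td><td>{}</td></tr>".format(
                html.escape(word), html.escape(ipa), html.escape(translation)
            )
        )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        "<title>{0}</title></head>\n"
        "<body><h1>{0}</h1>\n<table>\n{1}\n</table></body></html>\n"
    ).format(html.escape(title), "\n".join(rows))


# Placeholder rows, not real dictionary data.
page = render_dictionary(
    [("b-word", "ipa-b", "second"), ("a-word", "ipa-a", "first & <only>")],
    "Dictionary",
)
print(page)
```

Everything ends up in one document, so the whole word list arrives in a single request and the browser's in-page search does the rest.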

Tawtakuk

Yeah, the IPA output looks OK on everything I've tried. Thanks for the heads-up, I'm looking forward to seeing the result :)