Eana Eltu: Translator, Dictionary, API and putxìng.

Started by Tuiq, January 07, 2010, 04:20:17 PM


`Eylan Ayfalulukanä

Unicode has characters for almost all IPA symbols. I'll look up what the IPA is for that dental symbol, and set it up as a dead key on my keyboard (like I do for macrons now). There are also Unicode characters for the IPA primary and secondary stress symbols. I believe the primary stress symbol is U+02C8 and the secondary stress symbol is U+02CC. You would make both Marki and me very happy if we didn't have to type 'texprimstress' for this symbol every time we need it!
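For reference, a minimal Python sketch of the two code points mentioned above. The mapping is an assumption for illustration: the keys follow the TIPA command names `\textprimstress` and `\textsecstress`, and `replace_escapes` is a hypothetical helper, not part of Eana Eltu.

```python
import unicodedata

# Hypothetical mapping from LaTeX-style escape names to Unicode characters.
STRESS_MARKS = {
    "textprimstress": "\u02c8",  # IPA primary stress
    "textsecstress": "\u02cc",   # IPA secondary stress
}


def replace_escapes(text: str) -> str:
    """Replace each backslash-prefixed escape name with its Unicode character."""
    for name, char in STRESS_MARKS.items():
        text = text.replace("\\" + name, char)
    return text


# Print the code point and official Unicode name of each mark.
for char in STRESS_MARKS.values():
    print(f"U+{ord(char):04X}", unicodedata.name(char))
```

With something like this in place, the stress mark could be typed once as a dead key or pasted directly, instead of being spelled out as an escape every time.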

Yawey ngahu!
pamrel si ro [email protected]

Tuiq

Like I've said, it would be very possible to have some sort of chart available. I imagine some kind of table with tabs: one for "Na'vi characters", another for IPA, one for Dothraki; you get the idea. Perhaps Na'vi and Dothraki can share one, or we put everything in one table instead of using tabs. We'll have to see.

But in the worst case, I suppose I could offer something from which you could simply copy (Ctrl+C) the characters.
Eana Eltu: PDF/TSV/jMemorize

`Eylan Ayfalulukanä

Are you talking about something like a spreadsheet?

I compiled a list of Unicode special characters this evening. It is complete for Dothraki, and mostly complete for the Valyrian languages. There is quite a bit of Na'vi there as well, but since I don't have any access to the Na'vi instance of EE, I don't know what I might be missing.


Tuiq

Something like this, yes.



We'll have to figure out the details, of course. I'll need to know which characters (and how many) there are, if we want to group them (for example, here I just grouped 6 characters together; alphabetically sorted). Here it's one toolbar going across the whole page, but I'll look into columns, which will be required anyway I believe.

So, if you have the list, you might want to group them somehow (by category? I'm no linguist, but I guess that could make sense?)

Tuiq

So far, so good.



This would be my idea of the new interface. Looks much prettier than what we have now, doesn't it?

It has some pretty nifty features too; changes are saved per field, with a date. That means two people can edit the same word at the same time, and unless they also edit the same field (let's say I update Source and you update IPA), both changes will be saved without any loss. Of course, the system will tell you that something has changed since you last opened it, but that's about it. It's going to be saved.
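A minimal Python sketch of how per-field saving with dates could behave. Everything here is an assumption for illustration (the class names, the `save` method, and the choice that the last write wins when two people do edit the same field); it is not the actual EE code.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class FieldValue:
    value: str
    changed_at: datetime  # when this field was last saved


@dataclass
class Word:
    fields: Dict[str, FieldValue] = field(default_factory=dict)

    def save(self, edits: Dict[str, str], opened_at: datetime, now: datetime) -> List[str]:
        """Save edited fields; return the fields changed since opened_at."""
        changed_by_others = sorted(
            name for name, fv in self.fields.items() if fv.changed_at > opened_at
        )
        for name, new_value in edits.items():
            current = self.fields.get(name)
            # Only fields whose value actually differs get a new date.
            if current is None or current.value != new_value:
                self.fields[name] = FieldValue(new_value, now)
        return changed_by_others


created = datetime(2013, 11, 12, 12, 0)
opened = datetime(2013, 11, 12, 12, 5)  # both editors open the word here
word = Word({"Source": FieldValue("DP", created), "IPA": FieldValue("lex", created)})

# Editor A changes Source, editor B changes IPA; neither overwrites the other.
notice_a = word.save({"Source": "DP, wiki"}, opened, datetime(2013, 11, 12, 12, 6))
notice_b = word.save({"IPA": "lex doTRaki"}, opened, datetime(2013, 11, 12, 12, 7))
```

Here the second editor is told that Source changed in the meantime, but both edits end up stored.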

The translation interface looks very similar, and also much prettier.



It's clearly visible which fields are outdated (red is outdated, green is OK, yellow is "not touched yet" and white is non-changeable). This was the most annoying part of the whole operation, I believe. It's done. I still have to add race-condition protection for when a maintainer changes a word you are currently translating, but that should be a walk in the park.
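The colour scheme could be derived from the per-field dates roughly like this. This is a sketch under assumptions: the names `Status` and `field_status` are hypothetical, and comparing the translation date against the source date is only one plausible way to decide "outdated".

```python
from datetime import datetime
from enum import Enum
from typing import Optional


class Status(Enum):
    OUTDATED = "red"
    OK = "green"
    UNTOUCHED = "yellow"
    READ_ONLY = "white"


def field_status(
    source_changed: datetime,
    translated: Optional[datetime],
    editable: bool = True,
) -> Status:
    """Pick the display colour for one field of a translation."""
    if not editable:
        return Status.READ_ONLY
    if translated is None:
        return Status.UNTOUCHED  # translator has not filled this field yet
    if translated < source_changed:
        return Status.OUTDATED   # the source changed after the translation
    return Status.OK
```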

Excited.

Tuiq



Virtual internet points for the first person to guess what the checkbox is for!

`Eylan Ayfalulukanä

So basically, what you want is a table showing all possible IPA values (and their Unicode equivalents) that can be used for a given language, including the non-special characters? This is not difficult to do, but will have to wait until this evening.

I like the new interface. It is certainly a lot cleaner!

I have no idea what the checkbox is for. To alert the maintainer you have changed something?


Tuiq

Quote from: `Eylan Ayfalulukanä on November 12, 2013, 03:39:22 PM
So basically, what you want is a table showing all possible IPA values (and their Unicode equivalents) that can be used for a given language, including the non-special characters? This is not difficult to do, but will have to wait until this evening.

Yes-ish. Perhaps implement each character as a button, which will either A) copy it to your clipboard or B) insert it into the text field you have currently selected (think of it as some sort of on-screen keyboard).

Quote from: `Eylan Ayfalulukanä on November 12, 2013, 03:39:22 PM
I like the new interface. It is certainly a lot cleaner!

That was my goal. It's also making stuff much easier for exporters, because I can now say "Export all Dothraki, IPA and Definition into a .sql file".

Quote from: `Eylan Ayfalulukanä on November 12, 2013, 03:39:22 PM
I have no idea what the checkbox is for. To alert the maintainer you have changed something?

You haven't translated a dictionary yet, so you can't know. In the current EE system, there are cases where a word is changed, but your translation doesn't have to be. For example, "Definition" contained a typo in English that you obviously did not carry over. You would still be forced to update your entry, even though you don't have anything to update. This is the reason you see all those "% update date" entries in the comment fields: to circumvent this system.

This checkbox is introduced to get rid of this. By checking it, you state "I want you to update this entry no matter what." By default, unchanged entries are never updated in the database. This checkbox overrides that.

So if a field reports "Hey, I was changed" and you see that your translation is still up-to-date, you can check the checkbox and the system will think you have updated the entry.
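In code, the rule could look something like the sketch below. The entry layout (a dict with `value` and `updated_at`) and the name `save_entry` are assumptions made for the example, not the real schema.

```python
from datetime import datetime


def save_entry(entry: dict, new_value: str, now: datetime, force_update: bool = False) -> bool:
    """Write the entry and bump its date; return True if it was written."""
    unchanged = new_value == entry["value"]
    if unchanged and not force_update:
        return False  # default: unchanged entries never touch the database
    entry["value"] = new_value
    entry["updated_at"] = now  # marks the translation as up to date
    return True


entry = {"value": "lekh dothraki", "updated_at": datetime(2013, 11, 1)}
later = datetime(2013, 11, 13)

skipped = save_entry(entry, "lekh dothraki", later)                    # checkbox off
forced = save_entry(entry, "lekh dothraki", later, force_update=True)  # checkbox on
```

The forced save changes nothing but the date, which is what the "% update date" comments were working around.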

A few difficult tasks lie ahead of me:

A) Import Dothraki data
B) Create a search-in-dictionary function
C) Create the good old "List of outdated and untranslated words" list. This one can be a bit tricky.
D) Try to purify Dothraki data (for example, introduce a "Noun" class that will be used for all words that have "n." as part of speech).
E) Write the exporter stuff.
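Task D could start out as a simple lookup. A minimal sketch: only the "n." to "Noun" pair comes from the list above; the "v." entry and the names `POS_CLASSES` and `classes_for` are hypothetical.

```python
from typing import List

# Abbreviation -> class name. Only "n." is taken from the post; "v." is a
# hypothetical second entry to show how the table would grow.
POS_CLASSES = {
    "n.": "Noun",
    "v.": "Verb",
}


def classes_for(part_of_speech: str) -> List[str]:
    """Return the classes for a part-of-speech string such as "n., v."."""
    tokens = part_of_speech.replace(",", " ").split()
    return sorted({POS_CLASSES[t] for t in tokens if t in POS_CLASSES})
```

Abbreviations that are not in the table (for example "ni") simply yield no class, so they can be reviewed by hand later.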

Fun stuff to do, I assume. Let's get to work.

`Eylan Ayfalulukanä

Quote from: Tuiq on November 13, 2013, 12:19:33 AM
Quote from: `Eylan Ayfalulukanä on November 12, 2013, 03:39:22 PM
So basically, what you want is a table showing all possible IPA values (and their Unicode equivalents) that can be used for a given language, including the non-special characters? This is not difficult to do, but will have to wait until this evening.

Yes-ish. Perhaps implement each character as a button, which will either A) copy it to your clipboard or B) insert it into the text field you have currently selected (think of it as some sort of on-screen keyboard).

So what you are saying here isn't so much that you need a list of code points, but that I need to be set up to generate the Unicode characters I need, as I need them, instead of using LaTeX escape codes?

And I am translating a dictionary-- Dothraki into Na'vi. I just have had precious little time, like Marki, to work on it lately.



Tuiq

That's right, I think. I need "ä", for example, not "¨a".
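To illustrate the difference, here is a small Python sketch that turns a spacing diaeresis followed by a letter into the single precomposed character. The mapping table and the name `compose_accents` are assumptions for the example; only the "ä" case comes from the post.

```python
import re
import unicodedata

# Spacing accent -> combining accent. Only the diaeresis is needed for "ä".
SPACING_TO_COMBINING = {
    "\u00a8": "\u0308",  # DIAERESIS -> COMBINING DIAERESIS
}

_ACCENTS = "".join(re.escape(c) for c in SPACING_TO_COMBINING)
_PATTERN = re.compile("([" + _ACCENTS + r"])(\w)")


def compose_accents(text: str) -> str:
    """Rewrite accent-then-letter pairs as one precomposed character (NFC)."""
    def fix(match):
        combining = SPACING_TO_COMBINING[match.group(1)]
        return unicodedata.normalize("NFC", match.group(2) + combining)

    return _PATTERN.sub(fix, text)
```

For instance, `compose_accents("k\u00a8a")` yields "kä" with the single code point U+00E4 for the vowel.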

Tuiq

The good news: I successfully imported all Dothraki words, at least the English ones.

The mediocre news: The processing ran at 2 words/second, taking a whole 20 minutes in total. I'm not sure if this is caused by unoptimized code (I'm still quite new to this whole Entity Framework stuff), the fact that it is likely built as debug, my overly greedy validation procedures or MS SQL being slow as hell. I'll have to profile it if this turns out to be an issue when generating the dictionary, searching, or whatever else comes up.

It looks kinda good so far though.

Tuiq



We'll have to argue about the page size, I guess, but 10 words/page seems OK for me. The current formatting is a bit weird, especially for translators, I guess, but it has a search function. A fancy search function, mind you.

The next thing would be to implement some sort of "You might want to take a look at this word" thing, which could become a bit difficult. It's not really well implemented in current EE versions as it is.

`Eylan Ayfalulukanä

Zhey Tuiq, this looks like an interesting start. I am assuming this is the listing interface. If done online, there need not be a limitation to page size, especially for search results.

The record is kind of confusing to read, as everything is the same size/typeface. Is it possible to put the field description text in a light face as it is now, and the values in a bold typeface? Example:

word   Dothraki: lekh dothraki IPA: lex \|[doTRaki Definition: Dothraki Part of speech: ni Source: DP comment: (language)

This format also works, especially for translators:

Dothraki: lekh dothraki
Template: word
IPA: lex \|[doTRaki
Part of speech: ni
Definition: Dothraki
Source: DP
comment: (language)

Notice I put the Dothraki word first? When visually scanning the list to find something, the first item always stands out.
This could be interesting, seeing how many templates Naʼvi (and the Valyrians) have.

Although it would probably be hard to do, showing the definition as it actually appears in the dictionary would be nice.







Tuiq

Quote from: `Eylan Ayfalulukanä on November 28, 2013, 03:16:39 AM
Zhey Tuiq, this looks like an interesting start. I am assuming this is the listing interface. If done online, there need not be a limitation to page size, especially for search results.

I disagree. With Na'vi, you would have 3800 entries on one page. Dothraki has quite a few too. The page is huge and loads quite slowly because the processing isn't as cheap anymore. What reason could there be to have all words on one page?
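The page-size argument can be put in numbers. A minimal sketch, assuming the 10 words/page figure mentioned earlier and the 3800 Na'vi entries; `paginate` and `page_count` are hypothetical helper names, not EE code.

```python
import math
from typing import List, Sequence


def page_count(total: int, page_size: int = 10) -> int:
    """Number of pages needed to show `total` entries."""
    return math.ceil(total / page_size)


def paginate(items: Sequence, page: int, page_size: int = 10) -> List:
    """Return the entries on a 1-based page number."""
    start = (page - 1) * page_size
    return list(items[start:start + page_size])


entries = [f"word{i}" for i in range(3800)]
pages = page_count(len(entries))   # 380 pages of 10 words each
last_page = paginate(entries, pages)
```

Only one slice of the list has to be rendered per request, which is where the saving over a single 3800-entry page comes from.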

Quote from: `Eylan Ayfalulukanä on November 28, 2013, 03:16:39 AM
The record is kind of confusing to read, as everything is the same size/typeface. Is it possible to put the field description text in a light face as it is now, and the values in a bold typeface? Example:

word   Dothraki: lekh dothraki IPA: lex \|[doTRaki Definition: Dothraki Part of speech: ni Source: DP comment: (language)

Done.



Quote from: `Eylan Ayfalulukanä on November 28, 2013, 03:16:39 AM
This format also works, especially for translators:

Dothraki: lekh dothraki
Template: word
IPA: lex \|[doTRaki
Part of speech: ni
Definition: Dothraki
Source: DP
comment: (language)

Notice I put the Dothraki word first? When visually scanning the list to find something, the first item always stands out.
This could be interesting, seeing how many templates Naʼvi (and the Valyrians) have.

This format could work, but it would blow up the page quite a bit. Perhaps I'll need to arrange these things in columns? If we had two columns per word, it would help a lot with the vertical size. We're wasting quite a bit of horizontal space.



Now imagine that for 3800 words!

We could also think of employing new options for these. For example, have a checkbox that says "This field will not be shown in the overview" (for example, IPA, Part of Speech or Source do not really need to be on the overview list?). We could also have rendering plugins that would tell how to format those, depending on the type. So you could get different formatting for those lines. We could probably do both.
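Both ideas fit in a few lines. A sketch under assumptions: `FieldDef`, its `show_in_overview` flag, the `kind` attribute and the `RENDERERS` table are all hypothetical names, and the bracketed IPA formatting is just an example of a type-specific renderer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FieldDef:
    name: str
    show_in_overview: bool = True  # the proposed checkbox
    kind: str = "text"             # selects a rendering plugin


# Rendering plugins: one formatting function per field kind.
RENDERERS: Dict[str, Callable[[str], str]] = {
    "text": lambda value: value,
    "ipa": lambda value: f"[{value}]",
}


def render_overview(fields: List[FieldDef], values: Dict[str, str]) -> List[str]:
    """Render one word for the overview list, skipping hidden fields."""
    lines = []
    for f in fields:
        if not f.show_in_overview:
            continue
        render = RENDERERS.get(f.kind, RENDERERS["text"])
        lines.append(f"{f.name}: {render(values.get(f.name, ''))}")
    return lines


fields = [
    FieldDef("Dothraki"),
    FieldDef("IPA", kind="ipa"),
    FieldDef("Source", show_in_overview=False),
]
overview = render_overview(fields, {"Dothraki": "lekh dothraki", "IPA": "lex", "Source": "DP"})
```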

Also, I somewhat agree, the template would not need to be included in this list.

Quote from: `Eylan Ayfalulukanä on November 28, 2013, 03:16:39 AM
Although it would probably be hard to do, showing the definition as it actually appears in the dictionary would be nice.

I think it's the order you defined the fields for the template, meaning that the oldest field is on top.

`Eylan Ayfalulukanä

Ah, now, we are getting somewhere! Both of these are improvements on the previous post.
Two columns works if you keep the Dothraki word in the top field of the left column.
I listed items by the order they appear in a definition. Is there some constraint in the order items could be listed?
I really like the check box idea. Could this be expanded to the extra fields I have always wanted, such as 'canonic citation', example sentence, etc.?
As far as the entries being too long, keep in mind that for editing you don't generally work with a large group of words at once. IMHO, having to do some scrolling is a fair price to pay for a clean, ordered, easy to work with UI.


Tuiq

Quote from: `Eylan Ayfalulukanä on November 29, 2013, 02:40:46 AM
Ah, now, we are getting somewhere! Both of these are improvements on the previous post.

I agree.

Quote from: `Eylan Ayfalulukanä on November 29, 2013, 02:40:46 AM
Two columns works if you keep the Dothraki word in the top field of the left column.

It would likely be "The first half of defined fields is in the left column, the other half is in the other." Which means if you've defined Dothraki first, Dothraki would be in the top-left.

Quote from: `Eylan Ayfalulukanä on November 29, 2013, 02:40:46 AM
I listed items by the order they appear in a definition. Is there some constraint in the order items could be listed?

Hm. Perhaps it would be possible to have an additional attribute on template fields that allows you to "number" them, i.e. giving Dothraki 0 and every other item 1 (or whatever) would ensure that it is always in front. This could be done relatively easily, I think. It just adds to the template overlay.
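A minimal sketch of that numbering, assuming each template field is a (name, order number) pair; `order_fields` is a hypothetical helper. Python's `sorted` is stable, so fields that share a number keep the order in which they were defined.

```python
from typing import List, Tuple


def order_fields(fields: List[Tuple[str, int]]) -> List[str]:
    """Sort field names by order number; ties keep definition order."""
    return [name for name, _ in sorted(fields, key=lambda f: f[1])]


template = [("Template", 1), ("Dothraki", 0), ("IPA", 1), ("Definition", 1)]
ordered = order_fields(template)
```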

Quote from: `Eylan Ayfalulukanä on November 29, 2013, 02:40:46 AM
I really like the check box idea. Could this be expanded to the extra fields I have always wanted, such as 'canonic citation', example sentence, etc.?

The beauty of the system is that all fields are equal. If we were to introduce this "skip this field in the overview" flag, it could be applied to every field you want, including Dothraki, for example. While this doesn't make much sense, it would be possible.

Right now, I believe it is not possible to add new fields to existing templates, nor to modify templates in general as long as they are in use. I think I've worked out the basics to avoid any complications while doing that, but I want to get rendering in order before I mess around with modifications to existing templates.

Quote from: `Eylan Ayfalulukanä on November 29, 2013, 02:40:46 AM
As far as the entries being too long, keep in mind that for editing you don't generally work with a large group of words at once. IMHO, having to do some scrolling is a fair price to pay for a clean, ordered, easy to work with UI.

Yes, but the complete list of words is quite long. The search function has been vastly improved: it searches all fields in your translation, the parent (if one exists) and the root (if it is not the same as your dictionary) and returns all words that match. It's quite a powerful tool, so there's little reason to browse a huge list of words at once.
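Roughly, such a search could work as sketched below. The data layout is an assumption: each word is a dict with "translation", "parent" and "root" layers, each a field-to-value dict or None when absent; the case-insensitive substring match is also just one possible matching rule.

```python
from typing import Dict, List, Optional

Layer = Optional[Dict[str, str]]


def search(words: List[Dict[str, Layer]], query: str) -> List[Dict[str, Layer]]:
    """Return every word with a match in any field of any layer."""
    needle = query.casefold()
    hits = []
    for word in words:
        # Missing layers (no parent, or root identical to the dictionary) are skipped.
        layers = [word.get(key) or {} for key in ("translation", "parent", "root")]
        if any(needle in value.casefold() for layer in layers for value in layer.values()):
            hits.append(word)
    return hits


words = [
    {"translation": {"Definition": "Sprache"}, "parent": None, "root": {"Definition": "language"}},
    {"translation": {"Definition": "Pferd"}, "parent": None, "root": {"Definition": "horse"}},
]
found = search(words, "LANG")
```

A translator can thus find a word by its root definition even before the translation itself contains anything searchable.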

Of course, upon initially translating it, you will have to go through a bunch of words - for this reason I could imagine providing a redirect immediately after translating a word back to the list of words to be translated if that's where you just came from. So the process would be click on "Fill", fill out stuff, submit and repeat.

Tuiq



Looks quite okay. The numbering could be a bit odd though, I'm not sure on this. As you can see, it's simply "the first half of the items in column #1, the second half in column #2".

Alternatively, I guess, we could do column 1 / column 2 / column 1 / column 2, which could work too. I'm not sure, we'll have to see about how we manage the ordering itself. But I think it's fine for now, and a bit of trial & error won't hurt.
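The two options side by side, as a small Python sketch. The function names are hypothetical, and putting the extra field in the left column when the count is odd is an assumption.

```python
from typing import List, Tuple


def split_halves(fields: List[str]) -> Tuple[List[str], List[str]]:
    """First half in column 1, second half in column 2."""
    mid = (len(fields) + 1) // 2  # odd counts give the extra field to column 1
    return fields[:mid], fields[mid:]


def split_alternating(fields: List[str]) -> Tuple[List[str], List[str]]:
    """Column 1 / column 2 / column 1 / column 2 ..."""
    return fields[0::2], fields[1::2]


fields = ["Dothraki", "Template", "IPA", "Part of speech", "Definition", "Source", "Comment"]
halves = split_halves(fields)
alternating = split_alternating(fields)
```

With halves, reading down column 1 follows the defined order; with alternating, reading row by row does.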

`Eylan Ayfalulukanä

Ma Tuiq, I really like your latest iteration. I like the items with the potential of being long at the bottom of the columns. I would say more, but I am competing with 2 cats for keyboard access tonight....


Tìtstewan

I'm lurking over this thread and I must say that I like that new layout much more than the current thing!

-| Na'vi Vocab + Audio | Na'viteri as one HTML file | FAQ | Useful Links for Beginners |-
-| Kem si fu kem rä'ä si, ke lu tìfmi. |-

Tuiq

I thought about allowing a third-ish column, or rather a second row, where ultra-long entries could be stored. For example, all fields but "Comment" would be in those two columns, with "Comment" being below and stretched across both columns. This would certainly look nicer, but the question is whether this is useful or not - also, it would make fields more complex I think. Instead of having just an order number (which we are going to do..?), you would also need to have a checkbox for "This is a two-column-field" or something and filter it out at the end.
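The extra complexity would be modest. A sketch, assuming each field carries an order number plus a hypothetical `full_width` flag (the "two-column field" checkbox); `layout` is an illustrative name.

```python
from typing import Dict, List, Tuple


def layout(fields: List[Dict]) -> Tuple[List[str], List[str], List[str]]:
    """Return (left column, right column, full-width rows) of field names."""
    ordered = sorted(fields, key=lambda f: f["order"])
    # Filter the two-column fields out first, then split the rest in halves.
    wide = [f["name"] for f in ordered if f.get("full_width")]
    normal = [f["name"] for f in ordered if not f.get("full_width")]
    mid = (len(normal) + 1) // 2
    return normal[:mid], normal[mid:], wide


fields = [
    {"name": "Comment", "order": 1, "full_width": True},
    {"name": "Dothraki", "order": 0},
    {"name": "IPA", "order": 1},
    {"name": "Definition", "order": 1},
    {"name": "Source", "order": 1},
]
left, right, below = layout(fields)
```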