Eana Eltu: Translator, Dictionary, API and putxìng.

Started by Tuiq, January 07, 2010, 04:20:17 PM


Tuiq

Code (Perl):

# This is somewhere in the loading process, where @words is the complete list of words.
for my $word (@words) {
    my @types = split ', ', $word->{type};
    # chomp returns the number of characters removed, so hand back $_ explicitly.
    my @parts = map { chomp; $_ } (split / +/, $word->{nav});
    if ((scalar grep { $_ =~ /^s?v(?:tr|in)?\./ || $_ =~ /verb/ } @types) || $word->{nav} eq 'si' && scalar @parts <= 3) {
        #~ print "Treating '$word->{nav}' | '$word->{eng}' as verb.\n";
        # Go through all words again. Ignore non-verbs.
        for my $sword (@words) {
            my @stype = split ', ', $sword->{type};
            next if $sword->{nav} eq $word->{nav} || ((!scalar grep { $_ =~ /^(?:a?tr(?:in)?)?v\./ || $_ =~ /verb/ } @stype) && $sword->{nav} ne 'si');
            #~ my $seword = quotemeta($sword->{nav});
            #~ print "$word->{nav} =~ /$sword->{qnav}\$/i\n";
            if ($word->{nav} =~ /$sword->{qnav}$/i) {
                #~ print "'$word->{nav}' is a compound of '$sword->{qnav}' and something else.\n";
                my ($nav) = $word->{nav} =~ /^(.*?)$sword->{qnav}$/i;
                # Has someone already done the work?
                if (defined $sword->{vnav}) {
                    die "Undefined svnav for $sword->{nav}!" if !exists $sword->{svnav} || !$sword->{svnav};
                    $word->{vnav} = $nav . $sword->{vnav};
                    $word->{svnav} = $nav . $sword->{svnav};
                    $word->{composed} = $sword->{nav};
                    #~ print "Composed1: '$nav'-'$sword->{nav}' | $word->{eng} + $sword->{eng}\n";
                    last;
                }
                else {
                    my @psylls = desyll($sword->{nav});

                    if (scalar @psylls >= 2) {
                        $psylls[scalar @psylls != 2] =~ s/($VOWELS)/<1><2>$1/o;
                        $psylls[$#psylls] =~ s/($VOWELS)/<3>$1/o;
                    } else {
                        $psylls[0] =~ s/($VOWELS)/<1><2><3>$1/o;
                    }

                    $sword->{svnav} = join '', @psylls;
                    $sword->{vnav} = svnavToVNAV($sword->{svnav});

                    $word->{vnav} = $nav . $sword->{vnav};
                    $word->{svnav} = $nav . $sword->{svnav};
                    $word->{composed} = $sword->{nav};
                    #~ print "Composed2: '$nav'-'$sword->{nav}' | $word->{eng} + $sword->{eng}\n";
                    last;
                }
            }
        }

        if (!defined $word->{vnav}) {
            #~ print "Composed3: '$word->{nav}' is not composed.\n";
            my @sylls = SpeakNavi::desyll($word->{nav});
            if (scalar @sylls >= 2) {
                $sylls[scalar @sylls != 2] =~ s/($VOWELS)/<1><2>$1/o;
                $sylls[$#sylls] =~ s/($VOWELS)/<3>$1/o;
            } else {
                $sylls[0] =~ s/($VOWELS)/<1><2><3>$1/o;
            }

            $word->{svnav} = join '', @sylls;
            $word->{vnav} = svnavToVNAV($word->{svnav});
            #~ print "Not composed: '$word->{nav}' | $word->{eng}\n";
        }
    }
    # Further changes continue here (such as shortened pronouns).
}
# Now to the funcs used:
sub svnavToVNAV {
    my ($text) = @_;
    my ($I1, $I2, $I3) = ($SpeakNavi::INFIXES1, $SpeakNavi::INFIXES2, $SpeakNavi::INFIXES3);
    $text =~ s/<1>/($I1)?/o;
    $text =~ s/<2>/($I2)?/o;
    # VERY SPECIAL EXCEPTION
    if ($text =~ /<3>i/) {
        $I3 =~ s/i\|/iy\|/go;
        $I3 =~ s/i$/iy/o;
    }
    $text =~ s/<3>/($I3)?/o;
    return $text;
}


{
    my $REGEX = undef;
    # Splits a word into syllables
    sub desyll {
        my ($text) = @_;
        my @syllables;
        $REGEX = "^((?:$FRICATIVES|$CONSONANTS)?(?:$VOWELS)(?:$DIPHTONGS)?)" if !defined $REGEX;

        #~ print "PW: $REGEX\n";
        while ($text =~ s/$REGEX//io) {
            #~ print "W\n";
            my $syl = $1;
            if ($text !~ /$REGEX/i) {
                #~ print "1\n";
                if ((my ($c) = $text =~ /^($CONSONANTS)/i) && $text !~ /^(?:$FRICATIVES)/i && $syl !~ /(?:$DIPHTONGS)$/i) {
                    $syl .= $c;
                    $text =~ s/^(?:$CONSONANTS)//io;
                }
            }

            push @syllables, $syl;
            if ($text !~ /$VOWELS/i) {
                #~ print "2\n";
                if (length $text) {
                    #~ print "2.1\n";
                    $syllables[$#syllables] .= $text;
                    $text = '';
                    last;
                }
            }
            #~ print "EW\n";
        }
        #~ print "PW\n";
        # If there is still text, it's in the last syllable
        if (length $text) {
            #~ print "3\n";
            push @syllables, $text;
        }
        return @syllables;
    }
}



^- There you go. That's the code that processes the words and does the infix magic. The hash key for the IPA transcription is called "ipa". Feel free to modify it.
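
To make the placeholder idea concrete: a stored template such as t<1><2>ar<3>on becomes a regex once each <n> marker is swapped for an optional group of infixes. Below is a minimal sketch of that step, not the actual svnavToVNAV. The infix lists are abbreviated and purely illustrative (the real ones live in $SpeakNavi::INFIXES1 to $SpeakNavi::INFIXES3 and are not shown in this thread), template_to_regex is a hypothetical name, and the <3>i special case is left out.

```perl
use strict;
use warnings;
use utf8;

# ASSUMPTION: abbreviated, illustrative infix lists; the real lists are
# stored in $SpeakNavi::INFIXES1..3 and are not part of this thread.
my %INFIXES = (
    1 => 'äp|eyk',
    2 => 'am|ay|er|iv|ol',
    3 => 'ei|äng',
);

# Hypothetical helper: turn a template like "t<1><2>ar<3>on" into an
# anchored regex where every placeholder is an optional infix group.
sub template_to_regex {
    my ($template) = @_;
    my $pattern = '';
    # split with a capturing group keeps the placeholders in the list.
    for my $piece (split /(<[123]>)/, $template) {
        if ($piece =~ /^<([123])>$/) {
            $pattern .= "(?:$INFIXES{$1})?";
        } else {
            $pattern .= quotemeta $piece;   # literal part of the word
        }
    }
    return qr/^$pattern$/;
}

# The first three forms match, "tarolon" does not (<2> infix in slot <3>).
my $re = template_to_regex('t<1><2>ar<3>on');
print "$_: ", ($_ =~ $re ? "match" : "no match"), "\n"
    for qw(taron tolaron tareion tarolon);
```

Unlike the posted function, this sketch uses non-capturing groups, so it only answers "is this an inflected form of the verb?" and does not tell you which infixes were found.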
Eana Eltu: PDF/TSV/jMemorize

Tuiq

As I expected. Opening the source does not change a thing. Therefore, I do not see why I should give anybody the /whole/ source. That topic is closed for me, then. Don't ask again unless you can handle it.

And remember: open source does not change the world. It doesn't change applications. It can. But most of the time, it doesn't.

Muzer

Right, so because nobody has replied in two days that must mean nobody is working on it? Talk about impatience! I was actually planning on having a look next weekend.

And if opening the source to something is so diabolical, why do so many other individuals and large companies do it?
[21:42:56] <@Muzer> Apple products used to be good, if expensive
[21:42:59] <@Muzer> now they are just expensive

Tuiq

I assumed the error would be very easy to fix, since it's "much more complicated to check for compound words". And well, it's just 120 lines of code, and only 60 of them need to change. It's not that big a deal, it seems.

Since you want me to talk about impatience, I'll gladly do so.

Impatience ^= !Patience;

Done.


And no, I'm not saying it's diabolical. It's useful where appreciated; here, it's not. There are far too few people who would actually be able to do anything with it.

Seze

Quote from: Tuiq on July 05, 2010, 09:50:44 AM
And no, I'm not saying it's diabolical. It's useful where appreciated; here, it's not. There are far too few people who would actually be able to do anything with it.

I've had very few people help out on the iPhone side of the Learn Na'vi Mobile App, but I still think making it open source has been appreciated by others who are interested in learning how the App works. That's been the whole foundation of the project I started; it's a learning project for me to learn how to develop for the iOS platform. I made it open source so that if others wanted to help, they could, or if others just wanted to see how it works under the hood, they could do that as well. Just my thoughts on the matter...


Learn Na'vi Mobile App - Now Available

Tuiq

Well yes, but there's still the "biggest" problem: the database itself. Tell me, how do you want to code or test your (major) modifications with no data available?

Muzer

I would have contributed to the Android LN app ages ago, but I can't get my dev environment working (I've never done anything Java-related before, so I'm not really sure where the problem is). There should be no such issue with Perl code, as I've used that many times before.

Kä'eng

My quick attempt at implementing the Omängum Fra'uti algorithm.


for my $word (@words) {
    my @types = split ', ', $word->{type};
    my @parts = split / /, $word->{nav};
    if ((scalar grep { $_ =~ /^s?v(?:tr|in)?\./ || $_ =~ /verb/ } @types) || $word->{nav} eq 'si' && scalar @parts <= 3) {
        my $ipa = $word->{ipa};
        $ipa =~ s/\].*//; # Remove alternative pronunciations, just take first
        my @ipaparts = split / /, $ipa;
        die "Mismatched number of parts for $word->{nav}" if @parts != @ipaparts;
        for my $i (0..$#parts) {
            my @sylls = SpeakNavi::desyll($parts[$i]);
            my $infixcount = () = $ipaparts[$i] =~ /\$\\cdot\$/g;
            if ($infixcount == 2) {
                $sylls[-2] =~ s/($VOWELS)/<1><2>$1/o;
                $sylls[-1] =~ s/($VOWELS)/<3>$1/o;
            } elsif ($infixcount == 1) {
                $sylls[-1] =~ s/($VOWELS)/<1><2><3>$1/o;
            }
            $parts[$i] = join '', @sylls;
        }
        $word->{svnav} = join ' ', @parts;
        $word->{vnav} = svnavToVNAV($word->{svnav});
    }
}


It seems to work; here's the result (with @words and $FRICATIVES/etc filled in appropriately):
Ma evi, ke'u ke lu prrte' to fwa sim tuteot ayawne.
Slä txo tuteo fmi 'ivampi ngat ro seng, fu nìfya'o, a 'eykefu ngati vä', tsakem ke lu sìltsan.
Tsaw lu ngeyä tokx! Kawtu ke tsun nìmuiä 'ivampi ngat txo ngal ke new tsakemit.
Ha kempe si nga? Nì'awve, nga plltxe san kehe. Tsakrr, ngal tsatsengti hum!

Tuiq

It breaks for multiple words, for example


use utf8;
my @words = (
    {
        nav => 'fyawìntxu',
        ipa => "fja.w\$\\cdot\$\x{26a}n.\x{2c8}t'\$\\cdot\$u"
    });
.

@parts = qw(fyawìntxu), @ipaparts = ("fja.w\$\\cdot\$\x{26a}n.\x{2c8}t'\$\\cdot\$u"), @sylls = ("fyawìntxu"). I'm not sure if desyll is failing right now, although it shouldn't.
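
For anyone reproducing this: the IPA string in the test case carries two `$\cdot$` markers, so it takes the `$infixcount == 2` branch, which needs at least two entries in @sylls. A small sketch of just the counting step, using the same list-assignment idiom as the posted code; count_infix_markers is a hypothetical helper name, not part of the real code base.

```perl
use strict;
use warnings;

# Hypothetical helper: for each space-separated part of an IPA string,
# count the "$\cdot$" markers that flag infix positions.
sub count_infix_markers {
    my ($ipa) = @_;
    return map {
        # "() = ..." forces list context; assigning that to a scalar
        # yields the number of matches.
        my $n = () = /\$\\cdot\$/g;
        $n;
    } split / /, $ipa;
}

my @counts = count_infix_markers("fja.w\$\\cdot\$\x{26a}n.\x{2c8}t'\$\\cdot\$u");
print "@counts\n";   # prints "2"
```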



Kä'eng

Maybe desyll is having trouble with the fya syllable? I had set up $CONSONANTS so that it could match the legal consonant clusters:

$FRICATIVES = "f|h|s|ts|v|z";
$CONSONANTS = "'|(f|s|ts)?(kx?|l|m|n|ng|px?|r|tx?|w|y)";
$VOWELS = "a|ä|e|i|ì|o|u|rr|ll";
$DIPHTONGS = "w|y";


Actually, splitting into syllables is kind of overkill - you could just replace
my @sylls = SpeakNavi::desyll($parts[$i]);
with
my @sylls = split /(?=$VOWELS)/o, $parts[$i];
since for placing infixes, all that matters is the locations of vowels, not the exact syllables.
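
A minimal sketch of that vowel-split shortcut, assuming the $VOWELS alternation posted above. place_infix_markers is a hypothetical helper name, and the `@chunks >= 2` guard is an addition for the sketch, not something in the posted code.

```perl
use strict;
use warnings;
use utf8;

# Vowel alternation as posted above.
my $VOWELS = "a|ä|e|i|ì|o|u|rr|ll";

# Hypothetical helper: insert the <1><2><3> placeholders into one word,
# given how many infix positions its IPA transcription marks.
sub place_infix_markers {
    my ($word, $infixcount) = @_;
    # Zero-width lookahead: cut in front of every vowel, so each chunk
    # after the first starts with exactly one vowel.
    my @chunks = split /(?=$VOWELS)/, $word;
    if ($infixcount == 2 && @chunks >= 2) {
        $chunks[-2] =~ s/($VOWELS)/<1><2>$1/;
        $chunks[-1] =~ s/($VOWELS)/<3>$1/;
    } elsif ($infixcount == 1) {
        $chunks[-1] =~ s/($VOWELS)/<1><2><3>$1/;
    }
    return join '', @chunks;
}

binmode STDOUT, ':encoding(UTF-8)';
print place_infix_markers('fyawìntxu', 2), "\n";   # fyaw<1><2>ìntx<3>u
```

The chunks here are fy, aw, ìntx and u, which are not real syllables, but the placeholders still end up in front of the last two vowels, matching the two markers in the IPA string.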

Tuiq

In fact I'm using

our $VOWELS = "[eouìiaä]|ll|rr";
our $CONSONANTS = "[ptk]x|ng|[pmwtnrlkh']";
our $DIPHTONGS = "[ae][wy](?:$CONSONANTS)?";
our $FRICATIVES = "(?:(?:[fvszh]|ts)(?:[ptk]x?|rr|ll|[mnjw]|ng)?)";


That's what I got from reading Taronyu's grammar pdf.

Kä'eng

Quote from: Tuiq on July 05, 2010, 05:02:58 PM
In fact I'm using

our $VOWELS = "[eouìiaä]|ll|rr";
our $CONSONANTS = "[ptk]x|ng|[pmwtnrlkh']";
our $DIPHTONGS = "[ae][wy](?:$CONSONANTS)?";
our $FRICATIVES = "(?:(?:[fvszh]|ts)(?:[ptk]x?|rr|ll|[mnjw]|ng)?)";

That's what I got from reading Taronyu's grammar pdf.

A few minor changes are necessary to be able to match all legal syllables:
in $CONSONANTS, y should be added
in $FRICATIVES, rr/ll/j should be replaced with r/l/y respectively
in $REGEX, (?:$VOWELS)(?:$DIPHTONGS)? should be replaced with (?:$DIPHTONGS|$VOWELS), since a syllable has one or the other, not both

I think that should give a syllabification for any legal word, which is enough to make infix placement work. (Still doesn't guarantee giving the correct syllabification, which is impossible to know from a word's spelling alone - but for infixes, this is irrelevant as all that matters is the vowels)
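
A rough sketch of what those three changes do to the onset-plus-nucleus regex, built from the patterns quoted above. The helper names are hypothetical, and the "adjusted" patterns are one reading of the suggested changes rather than anyone's final code.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: build the leading-syllable regex from the four
# pattern strings; $merged selects the (?:$DIPHTONGS|$VOWELS) nucleus.
sub syllable_regex {
    my ($vowels, $consonants, $diphthongs, $fricatives, $merged) = @_;
    my $nucleus = $merged
        ? "(?:$diphthongs|$vowels)"
        : "(?:$vowels)(?:$diphthongs)?";
    my $onset = "(?:$fricatives|$consonants)?";
    return qr/^($onset$nucleus)/i;
}

# Patterns exactly as quoted above.
sub posted_regex {
    my $v = "[eouìiaä]|ll|rr";
    my $c = "[ptk]x|ng|[pmwtnrlkh']";
    my $d = "[ae][wy](?:$c)?";
    my $f = "(?:(?:[fvszh]|ts)(?:[ptk]x?|rr|ll|[mnjw]|ng)?)";
    return syllable_regex($v, $c, $d, $f, 0);
}

# ASSUMPTION: my reading of the three changes (y added to the
# consonants, r/l/y in the clusters, diphthong OR vowel as nucleus).
sub adjusted_regex {
    my $v = "[eouìiaä]|ll|rr";
    my $c = "[ptk]x|ng|[pmwtnrlkhy']";
    my $d = "[ae][wy](?:$c)?";
    my $f = "(?:(?:[fvszh]|ts)(?:[ptk]x?|r|l|[mnyw]|ng)?)";
    return syllable_regex($v, $c, $d, $f, 1);
}

# Returns the matched leading chunk, or undef if nothing matches.
sub leading_match {
    my ($re, $word) = @_;
    return $word =~ $re ? $1 : undef;
}

for my $word ('fyawìntxu', 'ya') {
    my $old = leading_match(posted_regex(), $word);
    my $new = leading_match(adjusted_regex(), $word);
    printf "%s: posted=%s adjusted=%s\n", $word,
        defined $old ? 'match' : 'no match',
        defined $new ? 'match' : 'no match';
}
```

With the posted patterns neither word matches at all, which would be consistent with desyll handing back fyawìntxu as a single chunk. With the adjusted patterns the first chunk of fyawìntxu comes out as fyaw: not the real syllable boundary, but the vowels are where they need to be for infix placement.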

Muzer

J could be in there to make the loanword "jakesully" happy (which IS in the dictionary - look it up :P).

Tuiq

After some painful debugging, the new infix system finally works. Thanks to Kä'eng once again.

Sh4rK


Tuiq

If you have noticed that the SQL file has become noticeably bigger, it's because we now offer Hungarian as a new localized language. Thanks to Kifkeyä Nari and his team.

(Also, while I was at it, I removed allofixes. No idea why they were in there in the first place.)

Sh4rK


Tuiq

Postfixes, Prefixes. Things that change the meaning or type of a word.

Tuiq

Changed the opensearch-xml-thing to be dynamic and fixed si. Once again.

Tuiq

Update. The noun inflections are now (like the infixes) read from the dictionary directly => really up to date now. Also, it's localized, which means that if you translate to German you'll get German descriptions of the modifiers.

For the API, this means that translate now has more "rmods" fields, or rather "rmods$lc" fields.