VrrtepCLI

Started by Tirea Aean, May 22, 2011, 03:40:58 PM

Tirea Aean

#400
Quote from: Blue Elf on April 23, 2012, 07:48:24 AM
How to discover what word is considered illegal? Can you detect all affixes?
[root@fedora ~]# vrrtepcli -g -sent="Fko tsun wivan pxaya 'ut mì fìlì'u"
Vrrtep Analytics

words:
[1]POS=e.           WORD=Fko 
[2]POS=vim.         WORD=tsun
[3]POS=vtr.         WORD=wivan
[4]POS=adj.         WORD=pxaya
[5]POS=n.           WORD='ut 
[6]POS=adp.         WORD=mì 
[7]POS=n.           WORD=fìlì'u

clauses:
[<=7]                Fko tsun wivan pxaya 'ut mì fìlì'u

INVALID: illegal word(s) used.


The word it flags as illegal is the one whose part of speech is "e.", for error. It thinks Fko is an error because it only handles lowercase input. I will fix that soon.
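A minimal sketch of the lowercase fix described above, assuming the lookup is a plain table keyed by lowercase words. `POS_TABLE`, `tag_word` and `tag_sentence` are hypothetical names, not taken from vrrtepcli; only the POS tags are copied from the output in this post. It is written for Python 3, where `str.lower()` also handles letters such as Ì.

```python
# Hypothetical stand-in for the real dictionary lookup.
# The tags are copied from the -g output shown in this post.
POS_TABLE = {
    "fko": "pn.",
    "tsun": "vim.",
    "wivan": "vtr.",
    "pxaya": "adj.",
    "'ut": "n.",
    "mì": "adp.",
    "fìlì'u": "n.",
}


def tag_word(word):
    """Return the part of speech for a word, or "e." (error) if it is unknown.

    The word is lowercased first, so "Fko" and "fko" get the same tag.
    """
    return POS_TABLE.get(word.lower(), "e.")


def tag_sentence(sentence):
    """Tag every whitespace-separated word, keeping the original spelling."""
    return [(word, tag_word(word)) for word in sentence.split()]
```

With this, `tag_sentence("Fko tsun wivan pxaya 'ut mì fìlì'u")` tags `Fko` as `pn.` instead of `e.`, while a word missing from the table still comes back as `e.`.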

EDIT: Also, it handles all affixes EXCEPT the newest set of prefixes/suffixes from the blog. I will add those soon.

DOUBLE EDIT: the output I get is:


coreys1@eee ~ $ vrrtepcli -g -sent="fko tsun wivan pxaya 'ut mì fìlì'u"
Vrrtep Analytics

words:
[1]POS=pn.          WORD=fko 
[2]POS=vim.         WORD=tsun
[3]POS=vtr.         WORD=wivan
[4]POS=adj.         WORD=pxaya
[5]POS=n.           WORD='ut 
[6]POS=adp.         WORD=mì 
[7]POS=n.           WORD=fìlì'u

clauses:
[<=7]                fko tsun wivan pxaya 'ut mì fìlì'u

transitive validation:
[1] clause has 0 agent(s) and 1 patient(s)
valid.
verb/modal order validation:

coreys1@eee ~ $


As you can see, this module of the program is by far the buggiest, and it is also disgustingly inefficient at runtime once it gets to the validation portion. I need to do a lot of work on the -g option.

Blue Elf

Today the quiz game crashed on me (1.94.2 running on Windows XP):
Traceback (most recent call last):
  File "vrrtepcli.py", line 407, in <module>
  File "vrrtepcli.py", line 393, in main
  File "scramble.pyc", line 117, in game
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 0: invalid continuation byte

Not sure what is wrong, maybe the data files? Character 0xC3 in CP437 is "├", which looks strange...
Oe lu skxawng skxakep. Slä oe nerume mi.
"Oe tasyätxaw ulte koren za'u oehu" (Limonádový Joe)


Tirea Aean

Please, when posting bug reports, include the entire output, not just the Python stack trace, exactly like this:

Quote from: Blue Elf on April 23, 2012, 07:48:24 AM
[root@fedora ~]# vrrtepcli -g -sent="Fko tsun wivan pxaya 'ut mì fìlì'u"
Vrrtep Analytics

words:
[1]POS=e.           WORD=Fko 
[2]POS=vim.         WORD=tsun
[3]POS=vtr.         WORD=wivan
[4]POS=adj.         WORD=pxaya
[5]POS=n.           WORD='ut 
[6]POS=adp.         WORD=mì 
[7]POS=n.           WORD=fìlì'u

clauses:
[<=7]                Fko tsun wivan pxaya 'ut mì fìlì'u

INVALID: illegal word(s) used.


This way, I know exactly how to replicate the bug to confirm it, and I know not just what happened, but also more about when and how.

irayo :)



Tirea Aean

I think 0xc3 is only half of a Unicode character: it is the first byte of a two-byte UTF-8 sequence.

\xc3\xac is "ì"
\xc3\xa4 is "ä"
\xc3\x8c is "Ì"

in Python.

Your issue here on Windows may be related to Unicode parsing. We have seen in this thread before that Windows has a problem with the capital grave I (Ì). I am still not entirely sure what exactly the issue is here.
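The byte pairs above can be checked directly. This is a small Python 3 sketch; `PAIRS` and `describe` are names made up for the illustration. It shows that each pair decodes to one character, while 0xc3 followed by a byte that is not a valid continuation byte makes the decoder fail, which is the kind of failure the traceback reports.

```python
# Each of these letters is encoded in UTF-8 as 0xc3 followed by a second byte.
PAIRS = {
    "ì": b"\xc3\xac",
    "ä": b"\xc3\xa4",
    "Ì": b"\xc3\x8c",
}


def describe(raw):
    """Decode raw bytes as UTF-8, or return the decoder's complaint as text."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return "UnicodeDecodeError: " + err.reason


# A complete two-byte sequence decodes to a single character.
decoded = {char: describe(raw) for char, raw in PAIRS.items()}

# 0xc3 followed by something that is not a continuation byte (here a space)
# cannot be decoded.
broken = describe(b"\xc3 ")
```

Here every value in `decoded` equals its key, and `broken` holds an error description rather than a character.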

Swoka Ikran

Quote from: Tirea Aean on April 25, 2012, 09:28:55 AM
Your issue here in windows may be related to unicode parsing. We have seen in this thread before that windows has a problem with capital grave i.
Specifically, CP437 doesn't have the capital Ì. Unicode errors from not being able to display a character are different, so that's likely not the issue.

As TA said, this error is because you have half of a character. The second byte is missing/invalid. A corrupt dictionary is probably to blame.
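One way to test the corrupt-dictionary theory is to scan the data files byte by byte for lines that are not valid UTF-8. The sketch below assumes the files are meant to be UTF-8 text; `find_bad_lines` is a hypothetical helper, not part of vrrtepcli.

```python
def find_bad_lines(path):
    """Return (line_number, raw_bytes) for every line that is not valid UTF-8.

    The file is opened in binary mode so that a damaged byte sequence
    cannot raise an error before we get to look at it.
    """
    bad = []
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad.append((number, raw))
    return bad
```

Running it on each data file (for example `find_bad_lines("metaWords.txt")`) and getting an empty list back would suggest the files are intact and the problem lies elsewhere.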
2010 was the year of the Na'vi.Vivar 'ivong Na'vi!


 

Blue Elf

Quote from: Tirea Aean on April 25, 2012, 09:28:55 AM
I think 0xc3 is only half of a unicode character.

\xc3\xac is "ì"
\xc3\xa4 is "ä"
\xc3\x8c is "Ì"

in Python.

Your issue here in windows may be related to unicode parsing. We have seen in this thread before that windows has a problem with capital grave i. I am still not entirely sure what the issue is exactly here.
Ok - it could be caused by 'Ìnglìsì, but IMO it was already fixed??
C:\WINDOWS>vrrtepcli -l English
Vrrtep CLI v1.94.2 by Tirea Aean
Windows version by Swoka Ikran
Standalone version

Query matches:
n. 'ìnglìsì

No crash.... So from this point of view it really looks like datafile corruption ???


Blue Elf

And another small problem - "lok" as an adposition is not found...
C:\Windows\System32>vrrtepcli -l
Vrrtep CLI v1.94.2 by Tirea Aean
Windows version by Swoka Ikran
Standalone version

vrrtep:> close to
Query matches:
adp. lok

C:\Windows\System32>vrrtepcli lok
Vrrtep CLI v1.94.2 by Tirea Aean
Windows version by Swoka Ikran
Standalone version

v. approach

Pelun?


Tirea Aean


coreys1@eee ~/base $ grep ";close to;" localizedWords.txt
4024;eng;close to;adp.
coreys1@eee ~/base $ grep ";approach;" localizedWords.txt
1008;eng;approach;v.
coreys1@eee ~/base $ grep ";lok;" metaWords.txt
1008;lok;l·ok̚;l<1><2><3>ok;v.
4024;lok;l·ok̚;\N;adp.


As you can see, lok is in there twice, with a different ID and definition each time. That's the problem.

When Vrrtep translates from nav->eng, it finds the first entry that matches and returns it. When it goes from eng->nav, it loops and collects a list of everything that seems to match.
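The difference between the two lookup styles can be sketched with the grep output above as data. This is an illustration, not the actual vrrtepcli code: the function names are hypothetical, and the meaning of the third and fourth columns of metaWords.txt (pronunciation and infix positions) is an assumption.

```python
# Lines copied from the grep output above.
META_WORDS = [
    r"1008;lok;l·ok̚;l<1><2><3>ok;v.",
    r"4024;lok;l·ok̚;\N;adp.",
]
LOCALIZED_WORDS = [
    "4024;eng;close to;adp.",
    "1008;eng;approach;v.",
]


def parse_meta(lines):
    """Parse metaWords-style lines: id;word;pronunciation;infixes;pos (assumed)."""
    entries = []
    for line in lines:
        word_id, word, _pron, _infixes, pos = line.split(";")
        entries.append({"id": word_id, "word": word, "pos": pos})
    return entries


def parse_localized(lines):
    """Map (id, language) to the definition from localizedWords-style lines."""
    definitions = {}
    for line in lines:
        word_id, lang, definition, _pos = line.split(";")
        definitions[(word_id, lang)] = definition
    return definitions


def lookup_first(word, entries, definitions, lang="eng"):
    """nav->eng as described above: stop at the first matching entry."""
    for entry in entries:
        if entry["word"] == word:
            return [(entry["pos"], definitions[(entry["id"], lang)])]
    return []


def lookup_all(word, entries, definitions, lang="eng"):
    """Possible fix: collect every matching entry instead of only the first."""
    return [
        (entry["pos"], definitions[(entry["id"], lang)])
        for entry in entries
        if entry["word"] == word
    ]


ENTRIES = parse_meta(META_WORDS)
DEFINITIONS = parse_localized(LOCALIZED_WORDS)
```

With this data, `lookup_first("lok", ENTRIES, DEFINITIONS)` returns only the verb sense, while `lookup_all("lok", ENTRIES, DEFINITIONS)` returns both the verb and the adposition.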

Blue Elf

I see... can it be changed? IMHO there may be more words like "lok"; the only one I can think of right now is "ftxey".


Tirea Aean

Ftxey is listed twice? O.o Being listed twice should be rare, if not a mistake. A word with two uses/definitions is still one word. IMO both definitions should be listed under one entry. That'll also solve Vrrtep's problem. Maybe I'll raise this in the dict thread.

Blue Elf

Words are probably listed more than once because they belong to different word types (v., adp.), so it is easier for the reader to notice the difference. But a discussion about this would be useful.


Tirea Aean

We already have more instances of multiple typing in single entries than double entries, I think.

Blue Elf

A question: the dictionary was regenerated and the vrrtepcli data updated to 12.83. However, new words from the latest Naviteri posts are not present. Srake tsun nga tìng tsaru nari?


Tirea Aean

I know, I noticed that immediately after updating. It seems the dictionary was edited but the database had not regenerated its SQL file yet. I need to do it again. I'll do it today.

Tirea Aean

New code committed, DICTIONARIES UPDATED! :D

Blue Elf

Quote from: Tirea Aean on October 22, 2012, 09:37:16 AM
New code committed, DICTIONARIES UPDATED! :D
Approved :) Only the words from the very newest Naviteri post are missing (but they haven't been merged into the dict yet...)


`Eylan Ayfalulukanä

Just as a note, ftxey has been listed twice for a very long time. AFAIK, this is the only word listed in this manner.

Yawey ngahu!

Tirea Aean

Quote from: `Eylan Ayfalulukanä on November 02, 2012, 03:01:18 PM
Just as a note, ftxey has been listed twice for a very long time. AFAIK, this is the only word listed in this manner.

a and talun are other words listed twice. I think I see the issue here: it chooses the first one listed every time, which isn't really good. But it is very rare for a word to be listed twice. I'm considering a fix. Thanks for bringing it up. :)
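To see how rare double listings really are, the word column can be scanned for repeats. A rough sketch, assuming the word sits in the second field of each metaWords.txt line as in the grep output earlier in the thread; `find_duplicates` is a hypothetical helper.

```python
from collections import defaultdict


def find_duplicates(lines, sep=";"):
    """Return {word: [ids]} for every word that appears on more than one line.

    Assumes each line starts with "id<sep>word<sep>...".
    """
    ids_by_word = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        ids_by_word[fields[1]].append(fields[0])
    return {word: ids for word, ids in ids_by_word.items() if len(ids) > 1}
```

Feeding it the lines of metaWords.txt should list lok, ftxey, a, talun and any other word that has more than one entry.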

Tirea Aean

IMPORTANT NEWS UPDATE:

Semicolons have been spotted in the database's definition column. This means that VrrtepCLI should no longer use dumps with fields terminated by ";". I have CHANGED the latest set of dictionaries to dumps with fields terminated by "\t".
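Why the separator matters, as a short sketch. The definition text "close to; near" is a made-up example of a definition containing a semicolon, and `parse_rows` is a hypothetical helper rather than the code in vrrtepcli.py. It uses Python's standard `csv` module with quoting disabled, so apostrophes and quotation marks in Na'vi words are left alone.

```python
import csv
import io


def parse_rows(text, delimiter):
    """Split dump text into rows of fields using the given delimiter."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    )
    return list(reader)


# Hypothetical row whose definition contains a semicolon.
SEMICOLON_DUMP = "4024;eng;close to; near;adp.\n"
TAB_DUMP = "4024\teng\tclose to; near\tadp.\n"

# With ";" as the separator the definition is cut in two (5 fields, not 4).
broken_row = parse_rows(SEMICOLON_DUMP, ";")[0]

# With "\t" as the separator the definition survives intact (4 fields).
good_row = parse_rows(TAB_DUMP, "\t")[0]
```

The tab-separated form only works as long as no definition contains a tab character, which is far less likely than a semicolon.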

Your copies of vrrtepcli.py and grammar.py must be updated, unless you have not yet updated your dictionaries with -u.

The Windows Standalone version will need to be recompiled by Swoka Ikran after I commit my changes. Because this is a new system and I do not have Eclipse installed, and I want to get this update out IMMEDIATELY, I am attaching the scripts to this post. I will make the commits to the repository when I can.

Thank you all for using vrrtep. :D

Blue Elf

Quote from: Tirea Aean on December 08, 2012, 12:11:21 PM
Windows Standalone version will need to be recompiled by Swoka Ikran after I commit my changes.
Rutxe, do it SFSZ: I updated the data files, and now vrrtepcli on Windows doesn't work (as the data format changed..... :()