Porting Learn Na'vi to Android

hawnuyuna'viyä · June 03, 2010, 04:01:48 PM

I was considering making some changes to the search algorithm, so that you no longer have to tell it whether to search nì'Ìnglìsì or niNa'vi, but rather that it does both searches and then ranks each result using a form of the Levenshtein distance (how similar it is to the search request).

What are your thoughts on that?

omängum fra'uti · June 03, 2010, 11:12:50 PM

Quote from: hawnuyu na'viyä on June 03, 2010, 04:01:48 PM
I was considering making some changes to the search algorithm, so that you no longer have to tell it whether to search nì'Ìnglìsì or niNa'vi, but rather that it does both searches and then ranks each result using a form of the Levenshtein distance (how similar it is to the search request).

What are your thoughts on that?

How efficient would such a query be? I'm not familiar with such algorithms let alone their implementation against a SQL back-end. Admittedly the word list is quite small right now, but at some point it might be nice to expand the search to include all possible forms of the words after various affixes and other changes.

hawnuyuna'viyä · June 04, 2010, 04:01:51 PM

Quote from: omängum fra'uti on June 03, 2010, 11:12:50 PM
How efficient would such a query be? I'm not familiar with such algorithms let alone their implementation against a SQL back-end. Admittedly the word list is quite small right now, but at some point it might be nice to expand the search to include all possible forms of the words after various affixes and other changes.

I don't actually know how efficient it would be, the worst case scenario would be in O(M*N) (M & N are the lengths) which will need to be done once per result. If in the worst case someone searches for a single letter, which could easily return around 100 results per language, we are looking at ~200 to calculate. Since all of those will be O(M*1) it would actually be fairly quick to calculate.

Ultimately though, I don't know how much that will require of an embedded system. The only way is to try it and find out. I have started working on it on my local clone of the repository.

omängum fra'uti · June 04, 2010, 04:07:33 PM

Does it have to perform the O(MN) on every possible word, or does it narrow down the list somehow first?

Perhaps it might be a good idea to implement it for the full search, but keep the suggestion search simple since it basically runs every time you type a letter.

For the searching English vs Na'vi... My main concern is with potential confusion, especially on words like "new" - did you want "mip - new" or "new - want"... If I'm searching for a word, I know if it's a Na'vi or an English word I want to find, and switching modes is fairly simple. I actually had it search both originally (And the function to do so still exists) but I didn't like the result.

Edit: Oh yeah, would such an algorithm be suitable for rating words by total distance, and picking one at random weighted by distance? (Use case: Picking wrong alternate words for a multiple choice quiz.)

hawnuyuna'viyä · June 04, 2010, 05:11:55 PM

Quote from: omängum fra'uti on June 04, 2010, 04:07:33 PM
Does it have to perform the O(MN) on every possible word, or does it narrow down the list somehow first?

Perhaps it might be a good idea to implement it for the full search, but keep the suggestion search simple since it basically runs every time you type a letter.

For the searching English vs Na'vi... My main concern is with potential confusion, especially on words like "new" - did you want "mip - new" or "new - want"... If I'm searching for a word, I know if it's a Na'vi or an English word I want to find, and switching modes is fairly simple. I actually had it search both originally (And the function to do so still exists) but I didn't like the result.

Edit: Oh yeah, would such an algorithm be suitable for rating words by total distance, and picking one at random weighted by distance? (Use case: Picking wrong alternate words for a multiple choice quiz.)

It would have to perform it on the whole list.
The confusion between English or Na'vi words is only a case for a very small subset of the total words. In those cases, both words would show up. so we either have the returned results in two categories, or we use the same system with the larger text being Na'vi and the smaller English throughout.
The algorithm is designed to weight by similarity, so could be used for finding wrong alternate words (would just have a slightly higher distance than the correct word).

omängum fra'uti · June 04, 2010, 05:55:31 PM

I like the idea of the search methodology, but the cost and scalability on a mobile device concern me. Perhaps its not to bad, since it already has to search the entire list on a query. And I like that, theoretically, it would be able to find infixed words. I can't wait to try out what you come up with!

Muzer · June 04, 2010, 06:31:02 PM

If it turns out to be too much for a mobile device, couldn't we move all the processing to a server? We'd still keep something like what we've got now so that if you have no signal you can still use the dictionary in its most basic form, but for more complex requests we could query a server with much more processing power/storage space, and it will send back results.

omängum fra'uti · June 04, 2010, 06:43:32 PM

The network latency of the typical mobile device doesn't lend itself well to a fulfilling interactive experience. Even if its not efficient, it's likely to be faster than a mobile network.

On a phone, efficiency also translates to battery life. And network traffic drains the battery too.

Sіr. Ηaxalot · June 04, 2010, 06:48:32 PM

The biggest problem will probably be the devices which doesn't receive the 2.2 Froyo update. Which also tends to be slower devices. And such algorithms is a excellent example of where the Dalvik engine currently used by Android sucks.

omängum fra'uti · June 04, 2010, 06:51:53 PM

Well I've got a relatively low end phone which is currently stuck at 1.5 as my test bed (Allegedly to be upgraded to 2.1 Q3 this year, so probably sometime late 2011) so if it runs great on my phone, it should be sweet.

However, there is always the possibility of implementing select algorithms in native code and making native calls. The Android port of SCUMMVM took that to the extreme and implemented the entire thing as a JNI library, with a think java wrapper to the Android SDK. It would just have to be judiciously designed so as to reduce the java to native transitions.

Muzer · June 05, 2010, 07:56:56 AM

HTC Hero for me - 1.5 I think

hawnuyuna'viyä · June 05, 2010, 10:43:57 AM

Quote from: omängum fra'uti on June 04, 2010, 05:55:31 PM
I like the idea of the search methodology, but the cost and scalability on a mobile device concern me. Perhaps its not to bad, since it already has to search the entire list on a query. And I like that, theoretically, it would be able to find infixed words. I can't wait to try out what you come up with!

Finding infixed words would be a very simple -and useful- way to use the algorithm.
Unfortunately, for that to be feasable, we can't do a query on the sqlite database and then perform the distance calculations, because the query on the database, would not return the word.
The only way to do it would be to perform the distance calculations on effectively the whole database (twice!). Although it wouldn't have to be that bad, it would work just as well if it was done on all the words that start with the same letter.
E.g. to find tolaron: query the database to get all words in both english and na'vi that start with t. then do distance calculations. The likely root (taron) would have a distance of 2deletions (so if weighted, so that other operations are 'more difficult', that would stand out).

Having started work, I have noticed that another problem is words such as kxeyey, which have 2 possible translations. This requires breaking down the 2 translations around the comma, trimming the whitespace and then doing a distance for both.

Quote
However, there is always the possibility of implementing select algorithms in native code and making native calls. The Android port of SCUMMVM took that to the extreme and implemented the entire thing as a JNI library, with a think java wrapper to the Android SDK. It would just have to be judiciously designed so as to reduce the java to native transitions.

Creating the distance function in a native library could be something to look into if the JVM is too slow.

Sіr. Ηaxalot · June 05, 2010, 11:12:42 AM

I tried the algoritm on my old HTC Diamond, with a 528Mhz CPU.

When I compared on random string to 1000 other random strings, it took ~800 millisecounds.

This is with the .NET CF Platform which should have similar performance with the new Dalvik JIT engine.

So I guess that the performance on "mid-end" devices will be somewhere around 2000ms for a list of 1000 words. This is a pure guessing and it might take less or even more time. As said, a native library for this might be a good idea.

hawnuyuna'viyä · June 05, 2010, 03:51:44 PM

Preliminary testing suggests it will take ~1s to calculate.
Here is the signed build I tested on my 400MHz HTC Kaiser.
NOTE: This is not a release build, is not optimized and has some feature regressions.

~~Please test on your own devices.~~ Don't. This implementation doesn't work correctly.
This is a JVM implementation - going native may be worth it.

The number attached to the definitions is the distance.
The weight for substitution is 2, for all other actions: it is 1.
It only shows items with a threshold < 4, yet still does their calculations.
These numbers *will* need tweaking to make it work effectively.

omängum fra'uti · June 06, 2010, 07:16:08 AM

It seems a touch faster than 1s on my 528mhz backflip, snappy enough I'd call it usable, but it might be worth mocking up a huge (10k word) dictionary and seeing how it scales.

The one thing that worries me a little is the lack of substring search. For the Na'vi that may or may not be a problem... It certainly can help people searching for part of a word when they aren't sure of the root. For English with the way it's currently laid out, that wouldn't work well. From my understanding of the algorithm that may or may not be easy to fix. The results that come back also seem odd to me. If it came back sorted by distance it might make more sense, but since the list is sorted alphabetically, not so much.

I'm not sure off the top of my head how that can be improved though.

hawnuyuna'viyä · June 06, 2010, 07:27:41 AM

Quote from: omängum fra'uti on June 06, 2010, 07:16:08 AM
It seems a touch faster than 1s on my 528mhz backflip, snappy enough I'd call it usable, but it might be worth mocking up a huge (10k word) dictionary and seeing how it scales.

The one thing that worries me a little is the lack of substring search. For the Na'vi that may or may not be a problem... It certainly can help people searching for part of a word when they aren't sure of the root. For English with the way it's currently laid out, that wouldn't work well. From my understanding of the algorithm that may or may not be easy to fix. The results that come back also seem odd to me. If it came back sorted by distance it might make more sense, but since the list is sorted alphabetically, not so much.

I'm not sure off the top of my head how that can be improved though.

1) Just realised that my implementation doesn't properly work.

"tivaron" and ""taron return a distance of 10. Should be 2!
Fixing now.
2) Do you want to mock up a larger dictionary to test it against then?
3) Look at the little numbers, it is sorted by distance. It seems a little strange the ordering at first because: used to the old system, shorter words rank higher.
4) What do you mean by lack of substring search? "day" not returning yesterday?

omängum fra'uti · June 06, 2010, 07:31:49 AM

When you get it fixed I'll play around on an emulator (It's too much of a pain to install off market apps on my actual phone, and functionality testing doesn't matter for actual hardware). It might be worthwhile to create a branch for working on testing search changes as well, so the trunk can stay clean until it's ready, but the code can still be checked in.

However when I searched for anything shortish starting with "a", it would give "a" as the first option. When I put in part of a word, if it wasn't a significant enough part, it didn't seem to find the whole word.

omängum fra'uti · June 06, 2010, 07:32:49 AM

BTW on a different topic, I got to poke around the iPhone app finally today and made a small cosmetic tweak to the animations, as well as setting an option which should DRASTICALLY smooth them out. I've got another small change or two I may do as well.

hawnuyuna'viyä · June 06, 2010, 07:52:26 AM

Quote from: omängum fra'uti on June 06, 2010, 07:31:49 AM
When you get it fixed I'll play around on an emulator (It's too much of a pain to install off market apps on my actual phone, and functionality testing doesn't matter for actual hardware). It might be worthwhile to create a branch for working on testing search changes as well, so the trunk can stay clean until it's ready, but the code can still be checked in.

However when I searched for anything shortish starting with "a", it would give "a" as the first option. When I put in part of a word, if it wasn't a significant enough part, it didn't seem to find the whole word.

Ok. here is a debug signed one.
It isn't a problem for me, I can connect to mine via adb and deploy, just as I would to an emulator.

But since we know it has acceptable performance, using the emulators is fine.
The problem was my implementation of weights which completely destroyed the algorithm. Removed those and it now works. Try "tivaron".
I will check it into a new branch later. I really should be revising for my GCSEs tomorrow!

Seze · June 06, 2010, 10:43:28 AM

Quote from: omängum fra'uti on June 06, 2010, 07:32:49 AM
BTW on a different topic, I got to poke around the iPhone app finally today and made a small cosmetic tweak to the animations, as well as setting an option which should DRASTICALLY smooth them out. I've got another small change or two I may do as well.

Are you changing the iPhone version or changing the Android version to better match? I didn't see any changes in the iPhone trunk is why I ask.