Topic: Corpus Word Frequency

Offline Erimeyz

  • Taronyu
  • ****
  • Posts: 555
  • Karma: 33
Corpus Word Frequency
« on: February 05, 2010, 06:26:57 pm »
I was curious, so for no good reason I did this:

     16 oe
     13 oel
      6 oeyä
      4 oeru
      2 oer
      1 oeti
      1 oeri
      1 oeng
      1 oehu
      1 oheru
      1 ohel

      1 awngeyä
      1 awngar
      1 ayoer(u)
      1 ayoeng

      5 nga
      5 ngahu
      4 ngati
      3 ngaru
      1 ngeyä
      1 ngari
      1 ngal

      5 ayngaru
      2 ayngeyä
      1 ayngati
      1 ayngari
      1 ayngar
      1 ayngal
      1 aynga
      1 nìaynga

     11 lu
      1 livu
      1 layu
      1 längu

      9 a

      3 lena'vi
      2 nìna'vi
      1 na'viyä
      1 na'viru
      1 na'vi

      5 kameie
      3 kìyevame

      7 new

      2 lì'fyayä
      1 lì'fyari
      1 aylì'ufa
      1 fya'ot
      1 tìfyawìntxuri
      1 fyawivìntxu

      2 sengi
      2 seiyi
      1 sivi
      1 si

      5 ulte

      5 tsun

      5 ma

      4 ta

      4 kivä

      4 eywa

      3 pivlltxe
      1 plltxe

      2 lefpom
      2 fpom

      2 fìtsenge
      2 fìtseng

      2 aylì'ut
      2 aylì'u

      2 pivängkxo
      1 tirea
      1 tireapivängkxo

      3 zene

      3 srak

      3 slä

      3 nìwotx

      3 kaltxì

      3 futa

      2 perey
      1 pivey

      1 tìyìng
      1 tivìng
      1 tayìng

      2 ye'rìn

      2 tsap'alute

      2 tsakrr

      2 sì

      2 mì

      2 kxanì

      2 ke

      2 fpole'

      2 fayvrrtep

      2 eylan

      2 'upxaret

      2 'awsiteng

      2 fì'u

      2 prrte'

      1 zamolunge
      1 zamivunge

      1 uniltìrantokxit
      1 uniltaron

      1 tì'eyngit
      1 tì'eyng

      1 tolaron
      1 taronyu

      1 slu
      1 slivu

      1 kxeyeyri
      1 kxeyey

      1 kifkeyit
      1 kifkeyä

      1 ayftxozä
      1 ayftozä

      1 'eylanä
      1 'eylan

      1 trrit
      1 letrra

      1 sänume
      1 nume

      1 tìrey
      1 rivey

      1 sìltsana
      1 nìltsan

      1 pìyähem
      1 pamähängem

      1 zera'u
      1 za'u
      1 yerikit
      1 yawne
      1 vitrautral
      1 vay
      1 txo
      1 txana
      1 tsnì
      1 tsmukan
      1 tolel
      1 tokx
      1 tìyawnit
      1 tìkawng
      1 tìkangkem
      1 tìftia
      1 teya
      1 tewti
      1 te'lan
      1 tawsìp
      1 sweya
      1 stolawm
      1 spivaw
      1 sngeltseng
      1 skiva'a
      1 sìlpey
      1 sìfmetokit
      1 set
      1 sawtute
      1 saleu
      1 rutxe
      1 pxìm
      1 pxasìk
      1 prrton
      1 pohu
      1 peyä
      1 payeng
      1 paye'un
      1 pawl
      1 ontu
      1 omum
      1 nulnivew
      1 nìtxan
      1 nìteng
      1 nìftxavang
      1 nìawnomum
      1 nì'ul
      1 na
      1 mun'i
      1 mivakto
      1 krr
      1 kelutralti
      1 kea
      1 kawtu
      1 karyusì
      1 ireiyo
      1 irayo
      1 ikran
      1 hu
      1 horentisì
      1 hivum
      1 herahaw
      1 hapxì
      1 fte
      1 frapor
      1 foru
      1 fol
      1 fohu
      1 fmawn
      1 fìtxan
      1 fìskxawngìri
      1 fahew
      1 eywa'evengä
      1 eytukan
      1 eylanur
      1 eo
      1 emzola'u
      1 ayskxe
      1 ayeylanur
      1 ätxäle
      1 äo
      1 akewong
      1 a'ewan
      1 'ivong
      1 'ì'awn
      1 'efu

The source was the Corpus page, which doesn't include all of the Na'vi from the Frommer emails, and of course doesn't include any of the movie dialog (except The Jake Page) or the ASG songs etc etc.  Plus I probably made some mistakes.  Take it for what it is, which isn't much, except that I found it kind of interesting.
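
For anyone who wants to reproduce a count like this, here is a minimal sketch in Python. It is not necessarily how the list above was produced. The tokenizing rule, the function names `word_frequencies` and `format_counts`, and the sample sentence are all assumptions made for illustration; the sample is not a quotation from the Corpus page.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count lowercase word forms in a block of Na'vi text."""
    # Assumption: a word is a run of letters (accented ones included) and
    # apostrophes, since the apostrophe spells the glottal stop in Na'vi.
    tokens = re.findall(r"(?:[^\W\d_]|')+", text.lower())
    # Drop stray apostrophes that are not attached to any letters.
    tokens = [t for t in tokens if t.strip("'")]
    return Counter(tokens)

def format_counts(counts):
    """Render counts as right-aligned 'count word' lines, most frequent first."""
    return "\n".join(f"{n:7d} {word}" for word, n in counts.most_common())

# Made-up sample built from words that appear in the list above.
sample = "Oel ngati kameie, ma tsmukan. Oel ngati kameie, ma 'eylan."
counts = word_frequencies(sample)
print(format_counts(counts))
```

Inflected forms are counted separately here, just as in the list, so "oe" and "oel" end up as two different entries.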

Stay tuned for something slightly more interesting.

  - Eri

Offline omängum fra'uti

  • Moderator Emeritus
  • Palulukan Makto
  • *****
  • *
  • Posts: 3804
  • Karma: 127
  • Na'vi's first grammar nazi
    • Pronounced Na'vi words
Re: Corpus Word Frequency
« Reply #1 on: February 05, 2010, 07:04:01 pm »
More interesting would be to do the same thing counting the words w/o inflections, just going off the roots.
Ftxey lu nga tokx ftxey lu nga tirea? Lu oe tìkeftxo.
Listen to my Na'vi Lessons podcast!

Offline Erimeyz

  • Taronyu
  • ****
  • Posts: 555
  • Karma: 33
Re: Corpus Word Frequency
« Reply #2 on: February 05, 2010, 08:41:18 pm »
Quote
More interesting work would be to do the same thing counting the words w/o inflections, just going off the roots.

Fixed that for ya.

  - Eri


(... or, since I've already grouped them by roots (kinda, mostly, close enough) you could just add 'em up by eyeball.  For example, "oe" is, um, lemme see... carry the 2... ... 47!  Not so hard after all!)
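
The adding-up can also be sketched in a few lines of Python. The counts below are copied from the first post; the form-to-root mapping simply follows the grouping used there (which is only approximate, as noted), and the function name `counts_by_root` is a hypothetical helper, not part of any existing tool.

```python
from collections import Counter

def counts_by_root(form_counts, form_to_root):
    """Sum the counts of inflected forms under their root."""
    totals = Counter()
    for form, n in form_counts.items():
        # Forms missing from the mapping are counted under themselves.
        totals[form_to_root.get(form, form)] += n
    return totals

# Counts copied from the first two groups of the list in the first post.
oe_forms = {"oe": 16, "oel": 13, "oeyä": 6, "oeru": 4, "oer": 2, "oeti": 1,
            "oeri": 1, "oeng": 1, "oehu": 1, "oheru": 1, "ohel": 1}
lu_forms = {"lu": 11, "livu": 1, "layu": 1, "längu": 1}

form_counts = {**oe_forms, **lu_forms}
# Hand-built mapping: every form in a group points at that group's root.
form_to_root = {form: "oe" for form in oe_forms}
form_to_root.update({form: "lu" for form in lu_forms})

totals = counts_by_root(form_counts, form_to_root)
print(totals["oe"], totals["lu"])
```

The hard part is the mapping itself, which has to be built by hand or by something that understands Na'vi affixes; the summing is the easy bit.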

Offline Erimeyz

  • Taronyu
  • ****
  • Posts: 555
  • Karma: 33
Re: Corpus Word Frequency
« Reply #3 on: February 06, 2010, 01:11:20 am »
And now the slightly more interesting part.  Same thing as before, but with a different source.  I used the Na'vi Only forum as my corpus - all the Na'vi in all the posts in all the threads in that board.

There were ~8,000 words (total, not unique).  Of those, ~5,000 (about 63%) are accounted for by just seventy-one root words and their various inflected/derived/compounded forms.  And here they are:

    532 oe
      1 moe
      9 ayoe
      1 oeng
     51 ayoeng
     31 awnga
    173 nga
      6 menga
     38 aynga
     87 po, frapo, 'awpo, kepo, etc.
     24 fo, ayfo

    413 lu
    259 lì'u
    240 ke
    204 a
    160 plltxe
    158 san
    146 sìk
    131 futa, fì'u
    111 slä
    105 ulte
     98 tsun
     98 Na'vi
     94 krr, tsakrr, frakrr, etc.
     92 si
     87 tseng, fìtseng, tsatseng, etc.
     76 tslam
     68 sì
     64 fko
     62 sìltsan, nìltsan, etc.
     60 mì
     59 ma
     57 txo
     56 ta
     54 ral
     53 srak
     52 zene
     52 new
     49 tok
     47 fu
     46 tìng
     42 set
     41 fpìl
     40 omum
     39 nì'aw
     37 nìtxan
     37 irayo
     36 rutxe
     35 fpi
     33 nì'ul
     33 ha
     33 fte
     31 ne
     30 nìwotx
     30 lahe
     28 nìteng
     28 muiä
     28 kxawm
     27 fa
     25 tsnì
     25 kop
     23 prrte'
     22 vay
     21 na
     20 nìmun
     19 nìhawng
     18 fìfya
     17 txoa
     17 srane
     17 prrton
      1 buzzlightyear


That means that someone who learns those seventy-one words (plus or minus two :) ) and who has a reasonable understanding of Na'vi grammar will be able to read about two-thirds of the Na'vi Only forum - said forum being the foremost location on this planet for conversation in Na'vi, with participation by the penultimate human Na'vi speakers.  Two-thirds is enough to start gleaning the rest from context, or at least to cut way down on the number of dictionary lookups you have to do.
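
The coverage figure is just the running total of the root counts divided by the total word count. A minimal sketch in Python, using four counts copied from the top of the list and the approximate total of 8,000 words given above; the function name `cumulative_coverage` is hypothetical, and each count is assumed to cover the root together with its inflected forms, as the post describes.

```python
def cumulative_coverage(root_counts, total_words):
    """Return (root, running share of all words), most frequent root first."""
    ordered = sorted(root_counts.items(), key=lambda kv: kv[1], reverse=True)
    running = 0
    rows = []
    for root, n in ordered:
        running += n
        rows.append((root, running / total_words))
    return rows

# Four entries copied from the list above; the full list has seventy-one.
top_roots = {"oe": 532, "lu": 413, "ke": 240, "a": 204}
TOTAL_WORDS = 8000  # approximate figure from the post

rows = cumulative_coverage(top_roots, TOTAL_WORDS)
for root, share in rows:
    print(f"{root:>6} {share:6.1%}")

# Overall coverage using the rounded figures from the post.
overall = 5000 / 8000
print(f"overall {overall:.1%}")
```

With the rounded figures, 5,000 / 8,000 comes to 62.5%, which agrees with "about 63%" once you allow for both numbers being approximate.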

So the next time a beginner asks "what words should I learn first?" - the answer is those words right there.

  - Eri

Offline AuLekye'ung

  • Uniltìranyu
  • **
  • Posts: 181
  • Karma: 9
  • Insane Drum
Re: Corpus Word Frequency
« Reply #4 on: February 06, 2010, 01:21:35 am »
Fascinating.  That is actually a very interesting bit of information.

EDIT:  Also, you do know penultimate means next to last, right?
Quote
.... said forum being the foremost location on this planet for conversation in Na'vi, with participation by the penultimate human Na'vi speakers.
« Last Edit: February 06, 2010, 01:35:39 am by Keye'unga Au »
Txo *fìzìsìst*it oel ke lu, kxawm oel tutet lepamtseo lu.  Oe pxìm fpìl nìpamtseo, oel rey letrra ayunil oeyä nìpamtseo.

- Älpert Aynstayn

Offline Eyaye Tskxe

  • Tawtute
  • *
  • Posts: 90
  • Karma: 10
  • Still learning...
Re: Corpus Word Frequency
« Reply #5 on: February 06, 2010, 01:47:52 am »
Quote
      1 buzzlightyear
...*snip*...
So the next time a beginner asks "what words should I learn first?" - the answer is those words right there.

Yes... A very important word to know. Now if only we knew Woody, that way we could dub over Toy Story. :P

Offline Erimeyz

  • Taronyu
  • ****
  • Posts: 555
  • Karma: 33
Re: Corpus Word Frequency
« Reply #6 on: February 06, 2010, 01:52:29 am »
Quote
Fascinating.  That is actually a very interesting bit of information.

Seiyi oe irayo!

Quote
EDIT:  Also, you do know penultimate means next to last, right?

But of course.  Surely you know who the ultimate speaker is, yes?  Thus the best of the rest of us must be, at best, the penultimate speakers.

  - Eri


(not to be confused with the pre-antepenultimate ones... the ones before the ones before the ones before the very last ones :) )

Offline Kaltxì Palulukan!

  • Palulukan Makto
  • *****
  • Posts: 1391
  • Karma: 182
  • My job is to teach 100000 people Na'vi-Wanna help?
    • AdvancedTarotSecrets.com
Re: Corpus Word Frequency
« Reply #7 on: February 06, 2010, 02:06:36 am »
Cookie to you Erimeyz.

I did something similar (letter frequency count) in Na'vi (by hand! Ugh!). This was very early on, and like a dope I made x its own letter. The point is I feel your work load. I got crapped on too for not going back and doing the whole of it from scratch, but it was an amusing piece of information. I have no idea what value any of it will be, but it is good to know. When I say "thanks for the effort," I seriously mean that. Researching arcane tidbits of information about this (growing) language is time-consuming, but strangely addictive.
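
A letter count that avoids the "x as its own letter" problem has to read the two-character spellings as single letters. Here is a rough sketch in Python, not a reconstruction of the original chart. The digraph set, the function names `navi_letters` and `letter_frequencies`, and the sample words (taken from the lists in this thread) are assumptions for illustration.

```python
from collections import Counter

# Assumption: each of these two-character spellings counts as one letter.
DIGRAPHS = ("kx", "px", "tx", "ts", "ng", "ll", "rr")

def navi_letters(word):
    """Split a word into letters, keeping digraphs such as 'kx' together."""
    word = word.lower()
    letters = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            letters.append(pair)
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters

def letter_frequencies(words):
    """Count letters across a list of words."""
    counts = Counter()
    for w in words:
        counts.update(navi_letters(w))
    return counts

freq = letter_frequencies(["kxanì", "txo", "pxìm", "tsun", "prrte'"])
print(freq.most_common())
```

Scanning left to right and always preferring the two-character match is a simplification; it would mis-split any word where, say, a "t" and an "s" meet by accident across a syllable boundary.
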
World's first na'vi podcast is here: http://media.podcastingmanager.com/9/0/7/3/4/253192-243709/Media/ATA-1.mp3

New Na'vi FUN activity book is here: Please click here to download your own (free) copy! I help you omum Na'vi! :-)

LOVE YOUR VEGGIES! Don't EAT them!     ----     Before Apollo there was Gaia.

Offline AuLekye'ung

  • Uniltìranyu
  • **
  • Posts: 181
  • Karma: 9
  • Insane Drum
Re: Corpus Word Frequency
« Reply #8 on: February 06, 2010, 02:22:48 am »
Quote
EDIT:  Also, you do know penultimate means next to last, right?

But of course.  Surely you know who the ultimate speaker is, yes?  Thus the best of the rest of us must be, at best, the penultimate speakers.

  - Eri


(not to be confused with the pre-antepenultimate ones... the ones before the ones before the ones before the very last ones :) )

Well, what definition of "ultimate" are you using?  It could be Frommer, as he is the best speaker, or it could be people that haven't even seen the movie, so they would know nothing.  Best, or final.  Penultimate uses the "final" definition of ultimate.  Next to last.  So therefore, the "penultimate speakers" would be the people that have perhaps just started learning.  The best of us would therefore be the, rather boringly, second-best.

Interestingly, I've never found words for "first" or "second" or "last" in Na'vi.  Have I just missed them somehow?

I'm going to stop here as this is the learnNa'vi site and not the argueoverEnglish site, but I'm really bad about this sort of thing.  Oe tsap'alute si.

Offline Erimeyz

  • Taronyu
  • ****
  • Posts: 555
  • Karma: 33
Re: Corpus Word Frequency
« Reply #9 on: February 06, 2010, 02:56:55 am »
Quote
Well, what definition of "ultimate" are you using?

Well, yes, there's a bit of wordplay going on here - "ultimate" in the sense of "best" (a somewhat colloquial corruption of the original meaning) versus "ultimate" in the sense of "last of a series", with "penultimate" generally only being applicable to the latter sense, probably because the term is obscure enough that it never underwent the colloquial corruption that its shorter relative did... and hence arises the whimsy in my description of the Learn Na'vi Na'vi learners as "penultimate", as it (quite deliberately, if perhaps not quite transparently) invokes both senses at once.

Ah, well, it's not the first time my little jokes have fallen flat.  At least the buzzlightyear inclusion got a giggle out of someone. :)

Quote
Oe tsap'alute si.

Po leke'u lu.

  - Eri

Offline Erimeyz

  • Taronyu
  • ****
  • Posts: 555
  • Karma: 33
Re: Corpus Word Frequency
« Reply #10 on: February 06, 2010, 03:01:27 am »
Quote
Interestingly, I've never found words for "first" or "second" or "last" in Na'vi.  Have I just missed them somehow?

I think we all miss them, dearly.

We've got words for numerals, but we don't actually know how to use them, either as ordinals or cardinals.  We can say "five", but we can't say "five spears", let alone "the fifth spear".  There's reasonable guesses for both, but they're just guesses... and your guess is as good as anybody else's.

  - Eri

Offline Erimeyz

  • Taronyu
  • ****
  • Posts: 555
  • Karma: 33
Re: Corpus Word Frequency
« Reply #11 on: February 06, 2010, 03:06:58 am »
Quote
Cookie to you Erimeyz.

Yum!  Thanks!

(Yom! Irayo!)

Quote
I did something similar (letter frequency count) in Na'vi (by hand! Ugh!). This was very early on, and like a dope I made x its own letter. The point is I feel your work load. I got crapped on too for not going back and doing the whole of it from scratch, but it was an amusing piece of information. I have no idea what value any of it will be, but it is good to know. When I say "thanks for the effort," I seriously mean that. Researching arcane tidbits of information about this (growing) language is time-consuming, but strangely addictive.

I remember your letter frequency chart!  I thought it was very, very cool.

I got lucky (or lazy) with this word frequency thing... I only had to do part of it by hand. :)  Like your chart, it was an obsessive-compulsive response to an idle curiosity, er, a labor of love, and I had fun doing it.  Thanks for the kind words!

  - Eri

Offline Nume fpi sänume

  • Moderator Emeritus
  • Palulukan Makto
  • *****
  • Posts: 1487
  • Karma: 64
  • Like a Boss.
    • Project One FM
Re: Corpus Word Frequency
« Reply #12 on: February 06, 2010, 03:49:26 am »
Yep, awesome list. I also don't know what purpose it will serve, but I'm sure at some point it will come in handy for some reason. Thanks for your work on this :)

Offline omängum fra'uti

  • Moderator Emeritus
  • Palulukan Makto
  • *****
  • *
  • Posts: 3804
  • Karma: 127
  • Na'vi's first grammar nazi
    • Pronounced Na'vi words
Re: Corpus Word Frequency
« Reply #13 on: February 06, 2010, 06:09:07 am »
Quote
I got crapped on too for not going back and doing the whole of it from scratch
Ayoeru txoa livu if you felt like that from the comments.  They were meant purely as suggestions for how it could be a more accurate reflection.

Offline Kaltxì Palulukan!

  • Palulukan Makto
  • *****
  • Posts: 1391
  • Karma: 182
  • My job is to teach 100000 people Na'vi-Wanna help?
    • AdvancedTarotSecrets.com
Re: Corpus Word Frequency
« Reply #14 on: February 06, 2010, 11:16:50 am »
Quote
I got crapped on too for not going back and doing the whole of it from scratch
Ayoeru txoa livu if you felt like that from the comments.  They were meant purely as suggestions for how it could be a more accurate reflection.

I'm probably a whiny baby. Too much work. (6 hours last night on the book--and probably another 8 today.) I have been at this stupid activity guide for a month now and the more pages I complete, the more I realize how much there is that needs to be spelled out. If I had three months to really immerse myself in Na'vi affixes, grammar, and terminology I can't even pronounce, I *might* be able to produce a guide to learning (the basics of--what we know of) Na'vi that was fun to read and easy to accomplish mastery in. I need to toughen up and stop complaining. Thanks for all of the suggestions. I get so frustrated at this language sometimes. I can help people really get a solid grasp of "part a" (let's say one third of the equation) and no sooner am I examining it for errors than I see gaping holes where "it could be improved by a massive lecture on infixes, tenses, particle matter, prepositional propositions, adposition positioning, and who can forget Gramma's grammar etiquette?" I swear... just a few more days (and 20 more pages), and I am done with (I can't even call it teaching--more like sharing root knowledge). I have something much more fun and less stressful I am working on that will hopefully bring Avatar fans together (here). I am happy that several people are attempting some kind of "Na'vi for beginners" projects. They can have at it!
« Last Edit: February 06, 2010, 11:19:24 am by Kaltxì Palulukan! »

Offline Erimeyz

  • Taronyu
  • ****
  • Posts: 555
  • Karma: 33
Re: Corpus Word Frequency
« Reply #15 on: February 07, 2010, 08:07:21 am »
Quote
I also don't know what purpose it will serve, but I'm sure at some point it will come in handy for some reason.

Here's a pretty good purpose for it, I do believe. :)

  - Eri
