Robert's Blog

I Think I’m Learning Japanese

Posted in numbers, other languages by mrob27 on 2011.01.23

I Think I’m Learning Japanese

(I Only Think So)

2011 Jan 21st

“Munafo” = “Intellectual Prince”?

(from, a novelty yojijukugo generator)

For various reasons, Japanese language and culture have interested me throughout my life. (A few of the reasons, like トトロ, are commonly known and will be familiar to readers. Others, perhaps less so (possibly NSFW themes and language).)

For many years I only knew a few basic facts: there are three alphabets, all derived originally from Chinese, plus the Hindu-Arabic numerals and our own Latin alphabet, and you pretty much have to learn all of them to get along in daily life. The pronunciation of two of the alphabets is simple and logical, but the third (kanji, the Chinese characters) includes thousands of special cases with little structure or pattern.

The depth and complexity of Japanese, for which it is justly famous, kept me from doing much more until just recently, when I decided to adopt a religion [1] and consequently to travel to Japan to visit the head temple.

When I travel to a place that speaks a different language, I want to be able to read and write certain basic things: numbers, times and dates, the address(es) where I will be, etc. Apart from being prepared (the State Department emphasizes the importance of such things), it is fun, and seems to me an act of basic courtesy to try to learn at least a little bit of the local culture.

My ordinary way of learning involves a lot of computer and Internet use. Methods of entering Japanese katakana and hiragana are relatively direct and obvious. You can type totoro to get トトロ (in katakana, as this is a nontraditional proper name), or fujisann to get ふじさん or 富士山 (the native pronunciation and spelling, respectively, of the famous mountain overlooking the temple I will be visiting [2]).

(Screenshots: entering katakana; entering hiragana)

However, typing in katakana and hiragana is only good enough for looking up proper names. The majority of Japanese writing uses the kanji extensively. Most kanji have at least two pronunciations and multiple meanings. In addition, the partial redundancy [3] of hiragana and kanji means that there are usually two or more ways to write any given word or phrase.

When learning any language there are four things to learn: listening, speaking, reading and writing. In most languages these correspond to four skills, which an adult learning a second language can initially approach through transliteration and translation. In Japanese, the combination of three alphabets, two or more pronunciations and two or more spellings makes it more like 10 or 12 skills.

The learning curve is a seven-dimensional manifold.

These 10 or 12 skills are inter-related and interdependent. You don’t know which way something will be spelled or pronounced, so it is important to learn both (all) of the alternatives.

Let’s Just Learn the Writing

At this point I thought: perhaps speaking and listening can be put aside for the moment. What if I focus just on reading and writing? I should still be able to look up Japanese kanji in a dictionary or on Google to find a definition. But that presents another problem, which may strike computer-savvy readers as uniquely odd or even impossible:

In order to be able to type in a kanji character, one must know either how to say it or how to write it.

And by “write”, I mean nothing less than the ancient traditional art of brush-and-ink calligraphy.

As it turns out, thousands of years of experience have led to a highly standardized stroke order for each of the thousands of commonly occurring characters, and the system is so useful that it is almost identical in all of the cultures that use the characters (primarily those who speak some form of Chinese, Japanese or Korean). The stroke order gives rise to a convenient and efficient software optimization for handwriting recognition, which can be adopted and used by all native speakers because almost no one writes any of the kanji any other way.

So in order to do the most basic and modern task (say, looking up “富士山” on Google Images) I need to learn one of the most ancient and decidedly non-modern tasks (how to paint “富士山” with a brush!).

Entering a Kanji without knowing its pronunciation

(showing a partly-entered “億” (おく, “100,000,000”))

This need to know stroke-order in order to look things up in Google is both a curse and a blessing. Simply being able to produce an accurate drawing of the character is not good enough. You have to draw each line in the correct order. This can be very frustrating for beginners, but that is far outweighed by the benefit: one can practice and learn Kanji writing just by trial-and-error in the computer interface.

An Epiphany

It was sometime in the afternoon on January 10th, stumbling through a few of the dozens of equally unlikely ways of writing “蓮” (れん, “lotus”), that I had the stunning insight: Each part of each kanji has its own specific writing order, and is always drawn the same way each time it appears. In this particular case, for example, I can start by learning how to write the “車” (くるま, “car”) and the “辶” (チャク, “walk” [4] simplified as radical 162), and the first of these uses “日” (にち, “sun” or “day”). These smaller building blocks each have far fewer possibilities to try, and once I learn them I not only have a fighting chance at drawing “蓮”, but am also much more prepared to use any other character that uses any of the same parts (such as “億”, shown above, which also uses “日”).

This is of course only the second or third thing anyone is taught if they study Chinese writing the “proper” way (like, say, from a book or a teacher). In fact, a friend told me about this in 1982, when I first got curious about Chinese writing. But it kind of slipped my mind somewhere along the line, and it was really cool to figure it out (again) on my own. These kanji building blocks (many of which are “radicals”, but many are not) are like little graphical subroutines — very appealing to my computer programmer aspect.
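The “graphical subroutines” idea can be sketched in a few lines of Python. This is only an illustration: the COMPONENTS table is hand-written for the characters mentioned in this post (it is not taken from any kanji dictionary), and it follows the post in treating “日” as a part of “車”. The function name building_blocks is likewise made up for this sketch.

```python
# Hand-written, illustrative decomposition table. This is an assumption made
# for the sketch, not data from a real kanji dictionary.
COMPONENTS = {
    "蓮": ["艹", "連"],
    "連": ["辶", "車"],
    "車": ["日"],        # the post treats 日 as a building block of 車
    "億": ["亻", "意"],
    "意": ["音", "心"],
    "音": ["立", "日"],
}


def building_blocks(char):
    """Return every part reachable from char, following parts of parts."""
    parts = set()
    for part in COMPONENTS.get(char, []):
        parts.add(part)
        parts |= building_blocks(part)  # a part may itself be built from parts
    return parts


# Parts that need to be learned only once to help with both characters.
shared = building_blocks("蓮") & building_blocks("億")
print(shared)
```

With this table, the only part shared by “蓮” and “億” is “日”, which is the overlap described above.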


[1] adopt a religion : I do not proselytize, but if you are curious it is Nichiren Shōshū. The reasons it appeals to me are the relative peacefulness (and lack of political dictation) of Buddhism in general combined with the prominent role of large numbers [5] in the most important source text, the 16th chapter of the Lotus Sutra. (There is a widely available translation by Burton Watson.)

[2] fujisan : Note that fujiyama (ふじやま) is a common Western mistake: 山 is usually やま but not in this case.

[3] redundancy : The written alphabets, including the kanji, came to Japan after there was already a distinct Japanese spoken language. The kanji were used wherever a Chinese character (or combination) was directly suited to represent a word. Often, but not always, the Chinese pronunciation was used for the Chinese character. Anything for which there was no word in China, including Japanese conjugations and declensions, had to be represented with extra letters representing their sound (phonetic value). Since the kanji have a pronunciation (and usually two: original Chinese and Japanese), they too have a phonetic value, and one could use just a phonetic alphabet. Often you see both: little kana written above or next to the kanji. There are many situations in which that is preferred (writing by or for young or uneducated readers; texts that would otherwise use rare or obscure kanji; instant messaging; etc.) but the reader cannot count on it.

[4] : This is a derivative of “辵”, which my Kanji dictionary does not know about (probably because it is not taught in schools). I like it a lot better than the modern alternative, which seems to be “歩” (taught in grade 2) because it puts the “steps” (彳, “walk” in an idiosyncratic form 彡) before the “stopping” (止).

This character reveals some of the flaws in modern integration of computer technology in our culture. Your computer might show “辶” with three strokes or four:

Compare the page title (top) with the article title.

Wikipedia’s article on Hyōgaiji notes:

A related weakness (though less relevant to modern language use) is the inability of most commercially-available Japanese fonts to show the traditional forms of many Jōyō kanji, particularly those whose component radicals have been comprehensively altered (such as […], and 辵 in 運 or 連, rather than [the traditional form used in 迴]). This is mostly an issue in the verbatim reproduction of old texts, and for academic purposes.

These old and/or rare characters (hyōgaiji) are of great interest to me, as the primary application of my Chinese and Japanese learning will be research on large numbers [5].

[5] large numbers : The Chinese Buddhist quantity “阿僧祇” (Sanskrit asaṃkhyeya, Japanese asōgi) means “incalculable” or “innumerable”. As a number it can mean anything from 10^56 (in common modern usage, see Wikipedia “Chinese numerals”) to 10^140 (see asaṃkhyeya) to 10^(7×2^103) = 10^70988433612780846483815379501056 (as seen in the Chinese Wikipedia article on Chinese numerals, item 103 of the long list in the “大數系統” section).
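The largest of these exponents is easy to check, since Python integers have arbitrary precision. A minimal sketch (the variable names are mine):

```python
# Exponents of the three candidate values of asamkhyeya quoted above.
common_usage = 56
asamkhyeya_article = 140
chinese_wikipedia = 7 * 2**103   # exact, because Python ints are unbounded

print(chinese_wikipedia)             # the exponent itself
print(len(str(chinese_wikipedia)))   # how many digits the exponent has
```

The exponent alone has 32 digits, so the third value dwarfs the other two: it has about 7×10^31 decimal digits, where 10^140 has only 141.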

But there are far larger numbers, on the order of 10↑↑(10^(5×2^120)) where ↑↑ represents the hyper4 operator or iterated exponential function. That is a “power-tower” of 10’s a googolplex of 10’s high. See novoloka’s article on Avatamsaka numbers and go down to the note at the bottom. Note the description that begins with “The first four verses of this poem are most challenging. They apply a superexponential iteration over an exponential one.” For more detail see their article measuring the asamkhyeya.



The contraction of curricula and galaxies

Posted in numbers, philosophy by mrob27 on 2010.10.29

2010 Oct 29th

A reader asked me what I thought about the limits of defining large numbers.

Such discussions begin with specific arithmetic operations and mathematical symbols in mind, and usually focus on comparing one system (such as Conway’s “chained arrow notation”) to another (such as “Bowers’ extended operators”). The choice of symbols and operations affects how high one can go, and such discussions usually devolve into competitive games, the limits of which are fairly well handled by the Turing machine and the Lin/Rado “busy beaver function”.

But such discussions usually come out of a more universal question, which regards the limits of human thought and perception in general.

Limits of human thought and perception are apparent throughout the history of numbers and mathematics. After a survey of early human developments (such as is presented in the nearly exhaustive “Universal History of Numbers” by Georges Ifrah, ISBN 0-471-37568-3) one might notice some patterns:

  • Perception and understanding are limited by the symbols in use and the concepts they represent,
  • Mastery of a given set of concepts leads to invention of new symbols and concepts.

At any point in history, or within any specific culture, there is a specific set of ideas and symbols which creates (or perhaps reflects, or both?) a natural limit of the capacity of the mind to perceive (say) large finite numbers.

It has been the trend throughout our history that the intellectual developments of earlier generations become assimilated into the body of common knowledge and added to the standard educational curriculum. As new material is added, earlier material is often compressed and taught (usually with greater efficiency) in a shorter period. So it is that the most advanced arithmetic of the early Babylonians is surpassed by that learned by today’s 8- and 9-year-old students, and most of the algebra techniques of 9th century Arabia are (typically) learned by 13- or 14-year-olds today, and so on. Both are aided by more recent developments (Indo-Arabic numerals aid arithmetic; certain new teaching methods address the abstraction of variables in algebra, etc.)

Speculating about the limits of the human mind (or brain, for reductionists) can lead to discussions that test or challenge religious beliefs. I suppose the majority opinion in most cultures would state that the human mind has some kind of ultimate limit, which can be compared to the limited physical size of the human brain. (Such a conclusion helps to distinguish believers from God, avoiding blasphemy).

A universe, assuming it is also limited in size (or a visible universe as limited by an event horizon or light cone) would therefore also have a finite limit.

The development of our culture over thousands of years is a bit like an expanding light cone. The contraction of the curriculum into ever-shorter stretches of childhood is like the Lorentz contraction of galaxies known to be much further away, and therefore seen in a remote past, when the universe and the visible universe (our view of the world and the sum total of knowledge) were both much smaller.


An “Official” Nomenclature for Large Numbers?

Posted in numbers by mrob27 on 2010.05.03

2010 May 3rd

A former co-worker recently told me that his son has been learning (with his help) about very large numbers, including Graham’s number, and asked me “if I know of any more ‘official’ nomenclature [for] numbers higher than centillion”.

The higher the numbers go, the less official the names get. I have written much on this in the first section of my Large Numbers page.

Most folks who ask this question want to go more than just a little bit beyond centillion (10^303 or 10^600). Let’s use 10^12345 and 10^(10^27) as examples.

The only really official nomenclature is to say, for example, “ten to the power of ten to the power of twenty-seven”.

I would give the prize for “second place” to Conway and Guy, The Book of Numbers (1996) pp. 13-15, who set out the system that I describe here. Under their system, 10^12345 is “one quadrilliquattuordecicentillion” and 10^(10^27) is “ten trestrigintatrecentillitrestrigintatrecentillitrestrigintatrecentillitrestrigintatrecentillitrestrigintatrecentillitrestrigintatrecentillitrestrigintatrecentillitrestrigintatrecentilliduotrigintatrecentillion”.
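For readers who want to see how such names are assembled, here is a minimal Python sketch. It assumes the short scale (the N-th -illion is 10^(3N+3)) and encodes the Conway-Guy prefix rules as they are commonly summarized, using “quin” for 5. The tables and the function names group_stem and illion_name are my own reading of the system, not a transcription of the book.

```python
# Latin prefixes. The letters after each tens/hundreds word are the marks that
# decide how a preceding unit prefix is modified (tre -> tres, se -> ses/sex,
# septe/nove -> septem/septen, novem/noven).
UNITS = ["", "un", "duo", "tre", "quattuor", "quin", "se", "septe", "octo", "nove"]
TENS = [("", ""), ("deci", "n"), ("viginti", "ms"), ("triginta", "ns"),
        ("quadraginta", "ns"), ("quinquaginta", "ns"), ("sexaginta", "n"),
        ("septuaginta", "n"), ("octoginta", "mx"), ("nonaginta", "")]
HUNDREDS = [("", ""), ("centi", "nx"), ("ducenti", "n"), ("trecenti", "ns"),
            ("quadringenti", "ns"), ("quingenti", "ns"), ("sescenti", "n"),
            ("septingenti", "n"), ("octingenti", "mx"), ("nongenti", "")]
SMALL = ["ni", "mi", "bi", "tri", "quadri", "quinti", "sexti", "septi", "octi", "noni"]


def group_stem(n):
    """Latin stem for one three-digit group (0..999) of the -illion index."""
    if n < 10:
        return SMALL[n]
    hundreds, tens, units = n // 100, (n // 10) % 10, n % 10
    tens_word, tens_marks = TENS[tens]
    hundreds_word, hundreds_marks = HUNDREDS[hundreds]
    # The unit prefix adapts to whichever component follows it directly.
    marks = tens_marks if tens else hundreds_marks
    unit_word = UNITS[units]
    if units == 3 and ("s" in marks or "x" in marks):
        unit_word += "s"
    elif units == 6 and "x" in marks:
        unit_word += "x"
    elif units == 6 and "s" in marks:
        unit_word += "s"
    elif units in (7, 9) and "m" in marks:
        unit_word += "m"
    elif units in (7, 9) and "n" in marks:
        unit_word += "n"
    return unit_word + tens_word + hundreds_word


def illion_name(exponent):
    """Short-scale name of 10**exponent, for exponent >= 6."""
    index, remainder = divmod(exponent - 3, 3)  # 10**(3*index + 3) is the index-th -illion
    groups = []
    while index:
        index, group = divmod(index, 1000)      # split the index into 3-digit groups
        groups.append(group)
    groups.reverse()
    # Drop each stem's final vowel and attach "illi"; the whole name ends in "on".
    body = "".join(group_stem(g)[:-1] + "illi" for g in groups) + "on"
    return ["one", "ten", "one hundred"][remainder] + " " + body


print(illion_name(12345))
print(illion_name(10**27))
```

The index for 10^12345 is 4114, which splits into the groups 4 and 114. For 10^(10^27) the index is 333,333,333,333,333,333,333,333,332 with one power of ten left over, which is why the name starts with “ten” and repeats the stem for 333 eight times before ending on 332.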

I think the Knuth -yllion system would come in third; under his system, 10^12345 is “ten myllion byllion tryllion decyllion undecyllion” and 10^(10^27) is “one quinvigintyllion septemvigintyllion octovigintyllion novemvigintyllion duotrigintyllion trestrigintyllion quattuortrigintyllion quintrigintyllion quinquadragintyllion quinquagintyllion duoquinquagintyllion tresquinquagintyllion quattuorquinquagintyllion quinquinquagintyllion sesquinquagintyllion septenquinquagintyllion octoquinquagintyllion unsexagintyllion quattuorsexagintyllion quinsexagintyllion sesexagintyllion septensexagintyllion unseptuagintyllion duoseptuagintyllion treseptuagintyllion quinseptuagintyllion octoseptuagintyllion novenseptuagintyllion unoctogintyllion duooctogintyllion tresoctogintyllion sexoctogintyllion septemoctogintyllion”.
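The -yllion names come straight from the binary expansion of the exponent, because the n-yllion is 10^(2^(n+2)): myllion is 10^8, byllion is 10^16, tryllion is 10^32, and so on. A minimal sketch, assuming that definition (the function name knuth_yllion_indices is made up for this example):

```python
def knuth_yllion_indices(exponent):
    """Decompose 10**exponent for Knuth's -yllion naming.

    Returns (small, indices). small is exponent mod 8, the power of ten left
    over below one myllion (10**8). indices lists, in increasing order, each n
    whose n-yllion, 10**(2**(n + 2)), appears as a factor in the name.
    """
    small = exponent % 8
    indices = [bit - 2
               for bit in range(3, exponent.bit_length())
               if (exponent >> bit) & 1]
    return small, indices


print(knuth_yllion_indices(12345))
small, indices = knuth_yllion_indices(10**27)
print(small, indices[0], indices[-1])
```

For 10^12345 the leftover power is 1 (“ten”) and the indices are 1, 2, 3, 10 and 11: myllion, byllion, tryllion, decyllion, undecyllion. For 10^(10^27) the leftover power is 0 (“one”), the smallest index is 25 (quinvigintyllion) and the largest is 87 (septemoctogintyllion).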

As you can see, systematic names for large numbers become unwieldy if you attempt to follow the classical system of giving names to each power of 10 (or powers of 1000 like Americans do today, or of a myriad as the Greeks and Chinese did, or of a million like Chuquet).

All of the other systems I have encountered are ad-hoc, unresearched and/or poorly thought out, imitations of the Chuquet names with clumsy or inconsistent decisions regarding how to proceed once the Latin ordinal number names run out. I describe some of these here.

The names googolplex for 10^(10^100) and googolplexplex or googolduplex for 10^(10^(10^100)) are fairly well-known. The number 10^(10^(10^(10^10000000))) appeared in a 1994 journal article by Zarko Bizaca. Going beyond these, to numbers that are unwieldy to represent even as a succession of exponents:

Several academics (mostly mathematicians like Graham) have had to invent recursive function definitions to describe large finite numerical quantities, as part of a proof of some kind. As far as I have been able to tell, each such system is incompatible with every other such system.

Jonathan Bowers seems to have given more thought to this than anyone I have read about or been in contact with. His names (like exillion, tripent, baggol, trissol, dutridecal, goppatoth, golapulus, meameamealokkapoowa, and so on) are just convenient, arbitrary nicknames for various specific examples of his array notation and its multidimensional extensions. The array notation, in turn, is shorthand for a very complex set of recursively-defined functions.

Recursively-defined functions like those Bowers develops are extremely difficult to understand, and given two different recursive definitions, it can be even more difficult to prove which produces the more quickly-growing function. I am not sure how he developed his functions but I am reasonably confident that most of his claims about them are accurate. Checking his work is well beyond my patience, if not my ability. Bowers’ keen abilities of comprehension are also evident in his descriptions of multi-dimensional geometric structures (“polychora”, which are like polyhedra but with more dimensions).
