In my capacity as an invited expert to the World Wide Web Consortium (W3C), specialising in internationalisation, I had some involvement in the creation of “Requirements for Japanese Text Layout”, a technical document of around 174 pages in the English edition and almost 200 pages in the Japanese edition. It is now a definitive guide to Japanese text layout, both on the web and in print. It was formally published on 4th June 2009. Unfortunately, I didn’t contribute enough to get editorial credit :-(

As Richard Ishida said,

This document describes requirements for Japanese layout realized with technologies like CSS, SVG and XSL-FO. For non-Japanese speakers it provides access for the first time to a wealth of detailed and authoritative information about Japanese typesetting. The document is mainly based on a standard for Japanese layout, JIS X 4051 and was written by key contributors to that standard. However, it also addresses areas which are not covered by JIS X 4051.


Apparently Japanese telecoms companies are trying to convince the world that written Japanese does not already have enough characters.

These additional characters, known as emoji, are used to depict emotions and other symbols in a similar manner to SMS emoticons.

Emoticons such as :) are combinations of existing characters (a colon followed by a closing parenthesis represents a smiley in Latin-script text). By contrast, there is a movement to add a whole range of new symbols to Unicode, including ones with colour and animation.

At present, they are exchanged in SMS messages using privately agreed character codes, but there is pressure to add these emoji to the Unicode specification.
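To make the difference concrete, here is a minimal Python sketch (my own illustration, not part of any emoji proposal) that lists the code points in :) and compares them with a single code point from Unicode’s Private Use Area, the range set aside for privately agreed characters. U+E001 is an arbitrary, hypothetical choice, not any carrier’s actual code.

```python
import unicodedata


def describe(text):
    """Return 'U+XXXX NAME' for each code point in text.

    Private-use code points have no standard name, so a placeholder is used.
    """
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


smiley = ":)"        # an emoticon: two ordinary punctuation characters
private = "\ue001"   # hypothetical private-use code point standing in for an emoji

print(describe(smiley))               # ['U+003A COLON', 'U+0029 RIGHT PARENTHESIS']
print(describe(private))              # ['U+E001 <unnamed>']
print(unicodedata.category(private))  # Co (private use)
```

Nothing in either string says what the smiley looks like: shape, colour and animation are left entirely to the font and the software displaying it, which is exactly the separation that standardising emoji would put under pressure.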

Some of the key problems that adding emoji to the Unicode Standard would present include:

  1. Adding shapes to Unicode, which has carefully remained independent of how glyphs are drawn
  2. Adding colour requirements to Unicode, which, again, has never had any logical need to specify colours for characters
  3. Adding the concept of animation definitions to characters, which is well outside the range of a character set definition

At the recent W3C Technical Plenary in Cannes, I was discussing issues of literal translation, and a Japanese delegate came up with a phrase that was new to me: “緑の黒髪 (みどりのくろかみ)”. Literally it means “green black hair”, but idiomatically it describes hair that is very dark and lustrous.


Having again hosted a pair of Japanese students who are learning English, I am still amazed at how they cope with the vagaries of English spelling.
Just as they were leaving, I remembered some verses designed to try the patience of anyone learning our language.
The verses below have been variously attributed to NATO (in an attempt to get translators to lose their various accents), to George Bernard Shaw, and to “The Chaos”, a poem written in 1922 by Gerard Nolst Trenité (1870–1946), a.k.a. “Charivarius”. I suspect that the version presented here is an updated edition of Charivarius’s poem, as the original contains some fairly antiquated wording.
Regardless of their original source, it is an amazing achievement for a non-native speaker of English to read these verses intelligibly.

English is tough

Dearest creature in creation,
Study English pronunciation.
I will teach you in my verse
Sounds like corpse, corps, horse, and worse.
I will keep you, Suzy, busy,
Make your head with heat grow dizzy.
Tear in eye, your dress will tear.
So shall I! Oh hear my prayer.

Just compare heart, beard, and heard,
Dies and diet, lord and word,
Sword and sward, retain and Britain.
(Mind the latter, how it’s written.)
Now I surely will not plague you
With such words as plaque and ague.
But be careful how you speak:
Say break and steak, but bleak and streak;
Cloven, oven, how and low,
Script, receipt, show, poem, and toe.

Hear me say, devoid of trickery,
Daughter, laughter, and Terpsichore,
Typhoid, measles, topsails, aisles,
Exiles, similes, and reviles;
Scholar, vicar, and cigar,
Solar, mica, war and far;
One, anemone, Balmoral,
Kitchen, lichen, laundry, laurel;
Gertrude, German, wind and mind,
Scene, Melpomene, mankind.

Billet does not rhyme with ballet,
Bouquet, wallet, mallet, chalet.
Blood and flood are not like food,
Nor is mould like should and would.
Viscous, viscount, load and broad,
Toward, to forward, to reward.
And your pronunciation’s OK
When you correctly say croquet,
Rounded, wounded, grieve and sieve,
Friend and fiend, alive and live.

Ivy, privy, famous; clamour
And enamour rhyme with hammer.
River, rival, tomb, bomb, comb,
Doll and roll and some and home.
Stranger does not rhyme with anger,
Neither does devour with clangour.
Souls but foul, haunt but aunt,
Font, front, wont, want, grand, and grant,
Shoes, goes, does. Now first say finger,
And then singer, ginger, linger,
Real, zeal, mauve, gauze, gouge and gauge,
Marriage, foliage, mirage, and age.

Query does not rhyme with very,
Nor does fury sound like bury.
Dost, lost, post and doth, cloth, loth.
Job, nob, bosom, transom, oath.
Though the differences seem little,
We say actual but victual.
Refer does not rhyme with deafer.
Feoffer does, and zephyr, heifer.
Mint, pint, senate and sedate;
Dull, bull, and George ate late.
Scenic, Arabic, Pacific,
Science, conscience, scientific.

Liberty, library, heave and heaven,
Rachel, ache, moustache, eleven.
We say hallowed, but allowed,
People, leopard, towed, but vowed.
Mark the differences, moreover,
Between mover, cover, clover;
Leeches, breeches, wise, precise,
Chalice, but police and lice;
Camel, constable, unstable,
Principle, disciple, label.

Petal, panel, and canal,
Wait, surprise, plait, promise, pal.
Worm and storm, chaise, chaos, chair,
Senator, spectator, mayor.
Tour, but our and succour, four.
Gas, alas, and Arkansas.
Sea, idea, Korea, area,
Psalm, Maria, but malaria.
Youth, south, southern, cleanse and clean.
Doctrine, turpentine, marine.

Compare alien with Italian,
Dandelion and battalion.
Sally with ally, yea, ye,
Eye, I, ay, aye, whey, and key.
Say aver, but ever, fever,
Neither, leisure, skein, deceiver.
Heron, granary, canary.
Crevice and device and aerie.

Face, but preface, not efface.
Phlegm, phlegmatic, ass, glass, bass.
Large, but target, gin, give, verging,
Ought, out, joust and scour, scourging.
Ear, but earn and wear and tear
Do not rhyme with here but ere.
Seven is right, but so is even,
Hyphen, roughen, nephew Stephen,
Monkey, donkey, Turk and jerk,
Ask, grasp, wasp, and cork and work.

Pronunciation — think of Psyche!
Is a paling stout and spikey?
Won’t it make you lose your wits,
Writing groats and saying grits?
It’s a dark abyss or tunnel:
Strewn with stones, stowed, solace, gunwale,
Islington and Isle of Wight,
Housewife, verdict and indict.

Finally, which rhymes with enough –
Though, through, plough, or dough, or cough?
Hiccough has the sound of cup.
My advice is to give up!!!


I recently obtained a copy of a Java IDE that calls itself “JCreator Pro”. Considering that Java has native Unicode support, I was amazed to find that JCreator Pro does not.

It supports only a handful of character encodings, such as ASCII, ISO-8859-1, ISO-8859-2 and a few Mac encodings. Evidently the developers are unaware that any non-Latin character sets exist. Since my existing source code includes some Japanese, the IDE was completely unable to recognise the characters in my UTF-8 files.
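To illustrate what goes wrong, here is a small Python sketch (an assumption about the general mechanism, not a look inside JCreator) that saves a line of Java containing Japanese as UTF-8 and then reads the bytes back as ISO-8859-1. Because every byte is a valid ISO-8859-1 character, the decode never fails; it just silently turns each Japanese character into three unrelated Latin-1 characters.

```python
# The line of Java as written in the editor, containing Japanese text.
source = 'String greeting = "こんにちは";'

# Saved to disk as UTF-8: each of these kana becomes three bytes.
data = source.encode("utf-8")

# Read back as ISO-8859-1: every byte maps to some character, so there is
# no error, only garbage (including invisible C1 control characters).
as_latin1 = data.decode("iso-8859-1")

# Read back with the correct encoding: the text survives intact.
as_utf8 = data.decode("utf-8")

print(len(source), len(data), len(as_latin1))  # 26 36 36
print(as_utf8 == source)    # True
print(as_latin1 == source)  # False
```

The Java compiler has the same need to know the file encoding, which is why javac offers an -encoding option; an editor that cannot be told "UTF-8" has no way to display such files correctly.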

This hurdle stopped any further testing, and left me with the impression that the product should be called “JCreator Very Amateur”.