<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Localizing Japan</title>
	<atom:link href="http://www.localizingjapan.com/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.localizingjapan.com/blog</link>
	<description>Blog about localization, translation, and the Japanese language.</description>
	<lastBuildDate>Sun, 05 Feb 2012 00:35:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Detecting and Converting Japanese Multibyte Encodings in PHP</title>
		<link>http://www.localizingjapan.com/blog/2012/01/30/detecting-and-conveting-japanese-multibyte-encodings-in-php/</link>
		<comments>http://www.localizingjapan.com/blog/2012/01/30/detecting-and-conveting-japanese-multibyte-encodings-in-php/#comments</comments>
		<pubDate>Tue, 31 Jan 2012 01:40:14 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[Japanese]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.localizingjapan.com/blog/?p=882</guid>
		<description><![CDATA[PHP has a large collection of multibyte functions in the standard library for handling multibyte strings such as Japanese. Two useful multibyte functions that PHP provides are for detecting the encoding of a multibyte string, and converting from one multibyte encoding to another. mb_check_encoding() mb_convert_encoding() To check if $string is in UTF-8 encoding, we call [...]]]></description>
			<content:encoded><![CDATA[<p>PHP has a large collection of multibyte functions in the standard library for handling multibyte strings such as Japanese. Two useful multibyte functions that PHP provides are for detecting the encoding of a multibyte string, and converting from one multibyte encoding to another.</p>
<ul>
<li><a href="http://php.net/manual/en/function.mb-check-encoding.php">mb_check_encoding()</a></li>
<li><a href="http://php.net/manual/en/function.mb-convert-encoding.php">mb_convert_encoding()</a></li>
</ul>
<p>To check if <em>$string</em> is in UTF-8 encoding, we call <em>mb_check_encoding()</em> like this:</p>
<pre>if (mb_check_encoding($string, "UTF-8")) {
   // do_something();
}</pre>
<p>To convert <em>$string</em>, which is currently Shift-JIS, to UTF-8, we call <em>mb_convert_encoding()</em> like this:</p>
<pre>$convertedString = mb_convert_encoding($string, "UTF-8", "Shift-JIS");</pre>
<p>A convenient feature of <em>mb_convert_encoding()</em> is that you can generalize the function by adding a list of character encodings to convert from. This can come in very handy if you want to convert all Japanese multibyte string encodings to UTF-8, or something else. There are actually 18 Japanese-specific multibyte encodings (that I know of), not including all the Unicode variants like UTF-8, UTF-16, etc. A lot of them come from the Japanese mobile phone carriers.</p>
<p>Let&#8217;s put all of this together and check if a string is UTF-8, and if it&#8217;s not, meaning it is one of the other 18 Japanese encoding types, let&#8217;s convert it to UTF-8.</p>
<pre>if (!mb_check_encoding($string, "UTF-8")) {

   // Passing the candidates as an array keeps line breaks and indentation
   // out of the encoding names.
   $fromEncodings = array(
      "Shift-JIS", "EUC-JP", "JIS", "SJIS", "JIS-ms", "eucJP-win", "SJIS-win", "ISO-2022-JP",
      "ISO-2022-JP-MS", "SJIS-mac", "SJIS-Mobile#DOCOMO", "SJIS-Mobile#KDDI",
      "SJIS-Mobile#SOFTBANK", "UTF-8-Mobile#DOCOMO", "UTF-8-Mobile#KDDI-A",
      "UTF-8-Mobile#KDDI-B", "UTF-8-Mobile#SOFTBANK", "ISO-2022-JP-MOBILE#KDDI");

   $string = mb_convert_encoding($string, "UTF-8", $fromEncodings);
}</pre>
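<p>The check-and-convert steps above can also be wrapped in a small helper. The following is only a sketch: the function name <em>toUtf8()</em> and the short default candidate list are assumptions for illustration, not part of mbstring. It uses <em>mb_detect_encoding()</em> in strict mode to pick a single source encoding first, and returns the input unchanged when none of the candidates fit.</p>
<pre>// Hypothetical helper: returns $string as UTF-8, or unchanged if no candidate matches.
function toUtf8($string, $candidates = array("SJIS-win", "eucJP-win", "ISO-2022-JP")) {
   if (mb_check_encoding($string, "UTF-8")) {
      return $string; // already valid UTF-8
   }
   // Strict mode (third argument true) rejects candidates for which the bytes are invalid.
   $from = mb_detect_encoding($string, $candidates, true);
   if ($from === false) {
      return $string; // unknown encoding: do not guess
   }
   return mb_convert_encoding($string, "UTF-8", $from);
}

// Build a Shift-JIS test string, then convert it back.
$sjis = mb_convert_encoding("日本語", "SJIS-win", "UTF-8");
echo toUtf8($sjis);</pre>
<p>Encoding detection is a heuristic, so a short candidate list that reflects where the data actually comes from is likely to be more reliable than a long one.</p>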
]]></content:encoded>
			<wfw:commentRss>http://www.localizingjapan.com/blog/2012/01/30/detecting-and-conveting-japanese-multibyte-encodings-in-php/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Regular Expressions for Japanese Text</title>
		<link>http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/</link>
		<comments>http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/#comments</comments>
		<pubDate>Sat, 21 Jan 2012 00:33:10 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[Japanese]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.localizingjapan.com/blog/?p=847</guid>
		<description><![CDATA[Regular expressions are extremely useful for matching patterns in text. But when it comes to Japanese Unicode text, it isn&#8217;t obvious what you should do to create regular expressions to match a range of Japanese characters. You can try something like [あ-ん] to match all hiragana characters—and you would be close—but it isn&#8217;t the best [...]]]></description>
			<content:encoded><![CDATA[<p>Regular expressions are extremely useful for matching patterns in text. But when it comes to Japanese Unicode text, it isn&#8217;t obvious what you should do to create regular expressions to match a range of Japanese characters. You can try something like [あ-ん] to match all hiragana characters—and you would be close—but it isn&#8217;t the best way to do it. Also, direct input of Japanese isn&#8217;t always an option.</p>
<p>To deal with this, know that each character in Unicode has a hexadecimal code point. For example, the code point for the hiragana あ is 3042, written U+3042. This code point can be used in a regular expression like this: \x3042, which matches a hiragana あ (the exact escape syntax depends on the regex flavor; see the notes after the code examples). This is very useful for programmers who must write pattern matching for Japanese on a system where they cannot input or display Japanese text, or who lack the know-how to do so (see some of my great Japanese input posts if you need to know how!).</p>
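<p>As a quick illustration in PHP (one flavor among many, so the escape syntax here is specific to PCRE), the code point is written with curly braces and the pattern takes the <em>u</em> modifier. The sample strings are made up for the example.</p>
<pre>// PCRE (PHP) writes a code point as \x{...} and needs the u modifier for UTF-8 input.
$pattern = '/\x{3042}/u';
var_dump(preg_match($pattern, "あいう")); // int(1): the string contains U+3042
var_dump(preg_match($pattern, "アイウ")); // int(0): katakana only, no match</pre>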
<p>Additionally, some flavors of regular expressions have what are known as Unicode block properties, or Unicode scripts. These are pre-defined blocks of regex Unicode character classes. Hiragana, katakana, and kanji are included in the block properties—very convenient if you need the full-script match in your regular expression.</p>
<p>With this basic knowledge, the following is a thorough list of Japanese character classes and the regular expressions that match them. Further down are a few programming examples showing them in use.</p>
<h3>Hiragana</h3>
<p>Unicode code points regex: [\x3041-\x3096]<br />
Unicode block property regex: \p{Hiragana}</p>
<p>ぁ あ ぃ い ぅ う ぇ え ぉ お か が き ぎ く ぐ け げ こ ご さ ざ し じ す ず せ ぜ そ ぞ た だ ち ぢ っ つ づ て で と ど な に ぬ ね の は ば ぱ ひ び ぴ ふ ぶ ぷ へ べ ぺ ほ ぼ ぽ ま み む め も ゃ や ゅ ゆ ょ よ ら り る れ ろ ゎ わ ゐ ゑ を ん ゔ ゕ ゖ  ゙ ゚ ゛ ゜ ゝ ゞ ゟ</p>
<h3>Katakana (Full Width)</h3>
<p>Unicode code points regex: [\x30A0-\x30FF]<br />
Unicode block property regex: \p{Katakana}</p>
<p>゠ ァ ア ィ イ ゥ ウ ェ エ ォ オ カ ガ キ ギ ク グ ケ ゲ コ ゴ サ ザ シ ジ ス ズ セ ゼ ソ ゾ タ ダ チ ヂ ッ ツ ヅ テ デ ト ド ナ ニ ヌ ネ ノ ハ バ パ ヒ ビ ピ フ ブ プ ヘ ベ ペ ホ ボ ポ マ ミ ム メ モ ャ ヤ ュ ユ ョ ヨ ラ リ ル レ ロ ヮ ワ ヰ ヱ ヲ ン ヴ ヵ ヶ ヷ ヸ ヹ ヺ ・ ー ヽ ヾ ヿ</p>
<h3>Kanji</h3>
<p>Unicode code points regex: [\x3400-\x4DB5\x4E00-\x9FCB\xF900-\xFA6A]<br />
Unicode block property regex: \p{Han}</p>
<p>漢字 日本語 文字 言語 言葉 etc. Too many characters to list.</p>
<p>This regular expression will match all the kanji, including those used in Chinese.</p>
<h3>Kanji Radicals</h3>
<p>Unicode code points regex: [\x2E80-\x2FD5]</p>
<p>⺀ ⺁ ⺂ ⺃ ⺄ ⺅ ⺆ ⺇ ⺈ ⺉ ⺊ ⺋ ⺌ ⺍ ⺎ ⺏ ⺐ ⺑ ⺒ ⺓ ⺔ ⺕ ⺖ ⺗ ⺘ ⺙ ⺚ ⺛ ⺜ ⺝ ⺞ ⺟ ⺠ ⺡ ⺢ ⺣ ⺤ ⺥ ⺦ ⺧ ⺨ ⺩ ⺪ ⺫ ⺬ ⺭ ⺮ ⺯ ⺰ ⺱ ⺲ ⺳ ⺴ ⺵ ⺶ ⺷ ⺸ ⺹ ⺺ ⺻ ⺼ ⺽ ⺾ ⺿ ⻀ ⻁ ⻂ ⻃ ⻄ ⻅ ⻆ ⻇ ⻈ ⻉ ⻊ ⻋ ⻌ ⻍ ⻎ ⻏ ⻐ ⻑ ⻒ ⻓ ⻔ ⻕ ⻖ ⻗ ⻘ ⻙ ⻚ ⻛ ⻜ ⻝ ⻞ ⻟ ⻠ ⻡ ⻢ ⻣ ⻤ ⻥ ⻦ ⻧ ⻨ ⻩ ⻪ ⻫ ⻬ ⻭ ⻮ ⻯ ⻰ ⻱ ⻲ ⻳<br />
⼀ ⼁ ⼂ ⼃ ⼄ ⼅ ⼆ ⼇ ⼈ ⼉ ⼊ ⼋ ⼌ ⼍ ⼎ ⼏ ⼐ ⼑ ⼒ ⼓ ⼔ ⼕ ⼖ ⼗ ⼘ ⼙ ⼚ ⼛ ⼜ ⼝ ⼞ ⼟ ⼠ ⼡ ⼢ ⼣ ⼤ ⼥ ⼦ ⼧ ⼨ ⼩ ⼪ ⼫ ⼬ ⼭ ⼮ ⼯ ⼰ ⼱ ⼲ ⼳ ⼴ ⼵ ⼶ ⼷ ⼸ ⼹ ⼺ ⼻ ⼼ ⼽ ⼾ ⼿ ⽀ ⽁ ⽂ ⽃ ⽄ ⽅ ⽆ ⽇ ⽈ ⽉ ⽊ ⽋ ⽌ ⽍ ⽎ ⽏ ⽐ ⽑ ⽒ ⽓ ⽔ ⽕ ⽖ ⽗ ⽘ ⽙ ⽚ ⽛ ⽜ ⽝ ⽞ ⽟ ⽠ ⽡ ⽢ ⽣ ⽤ ⽥ ⽦ ⽧ ⽨ ⽩ ⽪ ⽫ ⽬ ⽭ ⽮ ⽯ ⽰ ⽱ ⽲ ⽳ ⽴ ⽵ ⽶ ⽷ ⽸ ⽹ ⽺ ⽻ ⽼ ⽽ ⽾ ⽿ ⾀ ⾁ ⾂ ⾃ ⾄ ⾅ ⾆ ⾇ ⾈ ⾉ ⾊ ⾋ ⾌ ⾍ ⾎ ⾏ ⾐ ⾑ ⾒ ⾓ ⾔ ⾕ ⾖ ⾗ ⾘ ⾙ ⾚ ⾛ ⾜ ⾝ ⾞ ⾟ ⾠ ⾡ ⾢ ⾣ ⾤ ⾥ ⾦ ⾧ ⾨ ⾩ ⾪ ⾫ ⾬ ⾭ ⾮ ⾯ ⾰ ⾱ ⾲ ⾳ ⾴ ⾵ ⾶ ⾷ ⾸ ⾹ ⾺ ⾻ ⾼ ⾽ ⾾ ⾿ ⿀ ⿁ ⿂ ⿃ ⿄ ⿅ ⿆ ⿇ ⿈ ⿉ ⿊ ⿋ ⿌ ⿍ ⿎ ⿏ ⿐ ⿑ ⿒ ⿓ ⿔ ⿕</p>
<h3>Katakana and Punctuation (Half Width)</h3>
<p>Unicode code points regex: [\xFF5F-\xFF9F]</p>
<p>｟ ｠ ｡ ｢ ｣ ､ ･ ｦ ｧ ｨ ｩ ｪ ｫ ｬ ｭ ｮ ｯ ｰ ｱ ｲ ｳ ｴ ｵ ｶ ｷ ｸ ｹ ｺ ｻ ｼ ｽ ｾ ｿ ﾀ ﾁ ﾂ ﾃ ﾄ ﾅ ﾆ ﾇ ﾈ ﾉ ﾊ ﾋ ﾌ ﾍ ﾎ ﾏ ﾐ ﾑ ﾒ ﾓ ﾔ ﾕ ﾖ ﾗ ﾘ ﾙ ﾚ ﾛ ﾜ ﾝ ﾞ</p>
<h3>Japanese Symbols and Punctuation</h3>
<p>Unicode code points regex: [\x3000-\x303F]</p>
<p>、 。 〃 〄 々 〆 〇 〈 〉 《 》 「 」 『 』 【 】 〒 〓 〔 〕 〖 〗 〘 〙 〚 〛 〜 〝 〞 〟 〠 〡 〢 〣 〤 〥 〦 〧 〨 〩 〪 〫 〬 〭 〮 〯 〰 〱 〲 〳 〴 〵 〶 〷 〸 〹 〺 〻 〼 〽 〾 〿</p>
<h3>Miscellaneous Japanese Symbols and Characters</h3>
<p>Unicode code points regex: [\x31F0-\x31FF\x3220-\x3243\x3280-\x337F]</p>
<p>ㇰ ㇱ ㇲ ㇳ ㇴ ㇵ ㇶ ㇷ ㇸ ㇹ ㇺ ㇻ ㇼ ㇽ ㇾ ㇿ<br />
㈠ ㈡ ㈢ ㈣ ㈤ ㈥ ㈦ ㈧ ㈨ ㈩ ㈪ ㈫ ㈬ ㈭ ㈮ ㈯ ㈰ ㈱ ㈲ ㈳ ㈴ ㈵ ㈶ ㈷ ㈸ ㈹ ㈺ ㈻ ㈼ ㈽ ㈾ ㈿ ㉀ ㉁ ㉂ ㉃<br />
㊀ ㊁ ㊂ ㊃ ㊄ ㊅ ㊆ ㊇ ㊈ ㊉ ㊊ ㊋ ㊌ ㊍ ㊎ ㊏ ㊐ ㊑ ㊒ ㊓ ㊔ ㊕ ㊖ ㊗ ㊘ ㊙ ㊚ ㊛ ㊜ ㊝ ㊞ ㊟ ㊠ ㊡ ㊢ ㊣ ㊤ ㊥ ㊦ ㊧ ㊨ ㊩ ㊪ ㊫ ㊬ ㊭ ㊮ ㊯ ㊰ ㊱ ㊲ ㊳ ㊴ ㊵ ㊶ ㊷ ㊸ ㊹ ㊺ ㊻ ㊼ ㊽ ㊾ ㊿<br />
㋀ ㋁ ㋂ ㋃ ㋄ ㋅ ㋆ ㋇ ㋈ ㋉ ㋊ ㋋  ㋐ ㋑ ㋒ ㋓ ㋔ ㋕ ㋖ ㋗ ㋘ ㋙ ㋚ ㋛ ㋜ ㋝ ㋞ ㋟ ㋠ ㋡ ㋢ ㋣ ㋤ ㋥ ㋦ ㋧ ㋨ ㋩ ㋪ ㋫ ㋬ ㋭ ㋮ ㋯ ㋰ ㋱ ㋲ ㋳ ㋴ ㋵ ㋶ ㋷ ㋸ ㋹ ㋺ ㋻ ㋼ ㋽ ㋾<br />
㌀ ㌁ ㌂ ㌃ ㌄ ㌅ ㌆ ㌇ ㌈ ㌉ ㌊ ㌋ ㌌ ㌍ ㌎ ㌏ ㌐ ㌑ ㌒ ㌓ ㌔ ㌕ ㌖ ㌗ ㌘ ㌙ ㌚ ㌛ ㌜ ㌝ ㌞ ㌟ ㌠ ㌡ ㌢ ㌣ ㌤ ㌥ ㌦ ㌧ ㌨ ㌩ ㌪ ㌫ ㌬ ㌭ ㌮ ㌯ ㌰ ㌱ ㌲ ㌳ ㌴ ㌵ ㌶ ㌷ ㌸ ㌹ ㌺ ㌻ ㌼ ㌽ ㌾ ㌿ ㍀ ㍁ ㍂ ㍃ ㍄ ㍅ ㍆ ㍇ ㍈ ㍉ ㍊ ㍋ ㍌ ㍍ ㍎ ㍏ ㍐ ㍑ ㍒ ㍓ ㍔ ㍕ ㍖ ㍗ ㍘ ㍙ ㍚ ㍛ ㍜ ㍝ ㍞ ㍟ ㍠ ㍡ ㍢ ㍣ ㍤ ㍥ ㍦ ㍧ ㍨ ㍩ ㍪ ㍫ ㍬ ㍭ ㍮ ㍯ ㍰ ㍱ ㍲ ㍳ ㍴ ㍵ ㍶  ㍻ ㍼ ㍽ ㍾ ㍿</p>
<h3>Alphanumeric and Punctuation (Full Width)</h3>
<p>Unicode code points regex: [\xFF01-\xFF5E]</p>
<p>！ ＂ ＃ ＄ ％ ＆ ＇ （ ） ＊ ＋ ， － ． ／ ０ １ ２ ３ ４ ５ ６ ７ ８ ９ ： ； ＜ ＝ ＞ ？<br />
＠ Ａ Ｂ Ｃ Ｄ Ｅ Ｆ Ｇ Ｈ Ｉ Ｊ Ｋ Ｌ Ｍ Ｎ Ｏ Ｐ Ｑ Ｒ Ｓ Ｔ Ｕ Ｖ Ｗ Ｘ Ｙ Ｚ ［ ＼ ］ ＾ ＿<br />
｀ ａ ｂ ｃ ｄ ｅ ｆ ｇ ｈ ｉ ｊ ｋ ｌ ｍ ｎ ｏ ｐ ｑ ｒ ｓ ｔ ｕ ｖ ｗ ｘ ｙ ｚ ｛ ｜ ｝ ～</p>
<p>&nbsp;</p>
<h3>Japanese RegEx Code Examples</h3>
<p><strong>Find all hiragana in a text string</strong></p>
<pre>// PHP
$pattern = "/[\x{3041}-\x{3096}]/u";
preg_match_all($pattern, $text, $matches);
print_r($matches);</pre>
<pre># Perl
if ($text =~ m/[\x{3041}-\x{3096}]/) { print $text; }</pre>
<p><strong>Remove all hiragana from a text string</strong></p>
<pre>// PHP
$pattern = "/\p{Hiragana}/u";
$text = preg_replace($pattern, "", $text);</pre>
<pre># Perl
$text =~ s/\p{Hiragana}//g;</pre>
<p><strong>Remove everything but Kanji</strong></p>
<pre>// PHP
// <strong>\P</strong>{Han} matches everything other than kanji
$pattern = "/\P{Han}/u";
$text = preg_replace($pattern, "", $text);</pre>
<p><strong>Note:</strong> In PHP and Perl, a Unicode code point in a regular expression is written with curly braces around the hexadecimal code. So the regex of \x3041 becomes \x{3041} and so on.</p>
<p><strong>Note:</strong> In Perl you have to make sure you have Unicode set up properly to get regular expressions to work over Japanese. You may also have to run perl with the -CS option (<em>perl -CS</em>) to get rid of any <em>Wide character in print</em> warnings. See <a href="http://ahinea.com/en/tech/perl-unicode-struggle.html" target="_blank">http://ahinea.com/en/tech/perl-unicode-struggle.html</a> for more information.</p>
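<p>To tie the character classes together, here is a small sketch in PHP that counts how many hiragana, katakana, and kanji characters a string contains. The function name <em>countScripts()</em> and the sample sentence are assumptions made for this example.</p>
<pre>// Hypothetical helper: counts characters per script using Unicode script properties.
function countScripts($text) {
   $counts = array();
   // preg_match_all() returns the number of matches found.
   $counts["hiragana"] = preg_match_all('/\p{Hiragana}/u', $text);
   $counts["katakana"] = preg_match_all('/\p{Katakana}/u', $text);
   $counts["kanji"] = preg_match_all('/\p{Han}/u', $text);
   return $counts;
}

// 6 hiragana, 4 katakana, 5 kanji
print_r(countScripts("私はテキサス大学を卒業しました。"));</pre>
<p>One thing to watch for: the script properties are not identical to the code point ranges listed above. For example, the long vowel mark ー (U+30FC) is shared between scripts in Unicode, so it is matched by the [\x{30A0}-\x{30FF}] range but not by \p{Katakana}.</p>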
]]></content:encoded>
			<wfw:commentRss>http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Kanji Usage Count Concordance in PHP</title>
		<link>http://www.localizingjapan.com/blog/2011/12/07/kanji-usage-count-concordance-in-php/</link>
		<comments>http://www.localizingjapan.com/blog/2011/12/07/kanji-usage-count-concordance-in-php/#comments</comments>
		<pubDate>Wed, 07 Dec 2011 22:25:04 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[Japanese]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.localizingjapan.com/blog/?p=775</guid>
		<description><![CDATA[A concordance is a list of all words used in a document, Web site, or publication, and some additional useful information about those words. A useful concordance in translation and localization is a list of the most frequently used words. This can be used to identify important terms that should be picked up for a [...]]]></description>
			<content:encoded><![CDATA[<p>A concordance is a list of all words used in a document, Web site, or publication, and some additional useful information about those words. A useful concordance in translation and localization is a list of the most frequently used words. This can be used to identify important terms that should be picked up for a glossary. It can also be used by students of a foreign language to identify important vocabulary words they should dedicate time to study.</p>
<p>Students of Japanese often wonder which kanji they should learn. It can be hard to identify which kanji are most important, and the most important kanji differ from one subject matter to another.</p>
<p>To help with this, I&#8217;ve created a kanji concordance application in PHP to create a list of kanji and their usage counts in descending order.</p>
<h3>Example</h3>
<p>If you had this Japanese text:</p>
<p>私の名前はマークです。私はテキサス大学を卒業しました。すしが大好きです。</p>
<p>The kanji concordance would generate a list that looked like this:</p>
<p>2 私<br />
2 大<br />
1 名<br />
1 前<br />
1 学<br />
1 卒<br />
1 業<br />
1 好</p>
<p>The kanji 私 and 大 are both used twice, so they are at the top of the list with the number 2 for the usage count. The rest of the kanji are used once and show a usage count of 1.</p>
<h3>Kanji Concordance Code Explanation</h3>
<p>The first thing we do in PHP is set the language locale with the <em>setlocale()</em> function. This is always good practice when dealing with language-related applications.</p>
<pre>setlocale(LC_ALL, "ja_JP.utf8");</pre>
<p>The <em>LC_ALL</em> parameter sets the locale for all categories, and the <em>ja_JP.utf8</em> parameter sets the language and locale to Japanese/Japan in Unicode UTF-8.</p>
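<p>Note that <em>setlocale()</em> returns <em>false</em> when the requested locale is not installed on the system, so it can be worth checking the result. A minimal sketch follows; the alternative locale spellings are assumptions, since locale names vary between platforms.</p>
<pre>// setlocale() accepts several candidates and uses the first one the system supports.
$locale = setlocale(LC_ALL, "ja_JP.utf8", "ja_JP.UTF-8", "ja_JP");
if ($locale === false) {
   echo "Japanese locale not available; keeping the current locale.";
} else {
   echo "Locale set to $locale";
}</pre>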
<p>Next, we will need some string of Japanese text that we want to examine and create our kanji concordance from. In our simple example we will use a hard-coded string. But in a real application we would probably dynamically input the string from some source.</p>
<pre>$string = "私の名前はマークです。私はテキサス大学を卒業しました。私はすしが大好きです。
           私は漢字が好きです。";</pre>
<p>Once we have our input string, we need to strip it of everything but the kanji, since that is all we are interested in. Japanese text can have hiragana, katakana, English characters, and various punctuation. If we remove all of those, we&#8217;ll be left with just the kanji. We will define a regular expression to match these unwanted characters, and then replace them with nothing.</p>
<pre>$pattern = "/[a-zA-Z0-9０-９あ-んア-ンー。、？！＜＞： 「」（）｛｝≪≫〈〉《》【】
            『』〔〕［］・\n\r\t\s\(\)　]/u";</pre>
<p>This regular expression pattern is fairly straightforward. <em>a-zA-Z0-9</em> matches the English alphanumeric characters. We also match the full-width (double-byte) numbers with <em>０-９</em>. あ-ん will match nearly all the hiragana, and ア-ン will match nearly all of the katakana. Finally, we match the various punctuation marks and other special characters we can expect to find. The <em>u</em> at the end of the pattern is a pattern modifier that tells PHP to treat the pattern and the input string as UTF-8. I&#8217;ve probably left out some punctuation characters but for our example purposes this will do. (We can actually do a much simpler regex than this. For a thorough discussion of regular expressions for Japanese text, see <a title="Regular Expressions for Japanese Text" href="http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/">this post on Japanese regex</a>.)</p>
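<p>As a sketch of the simpler approach mentioned above, the negated Unicode script property <em>\P{Han}</em> matches every character that is not kanji, so one short pattern can stand in for the long character class. This is an alternative to the pattern used in this post, not what the rest of the walkthrough uses, and the sample string is shortened for the example.</p>
<pre>$sample = "私の名前はマークです。私はテキサス大学を卒業しました。";
// \P{Han} (capital P) matches any character outside the Han script.
$kanjiOnly = preg_replace('/\P{Han}/u', "", $sample);
echo $kanjiOnly; // 私名前私大学卒業</pre>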
<p>We will use the regular expression search and replace function <em>preg_replace()</em> to match our input string against the regex pattern to remove the unwanted characters.</p>
<pre>$kanjiString = preg_replace($pattern, "", $string);</pre>
<p>The first parameter, <em>$pattern</em>, is the regular expression pattern to match against. The second parameter, <em>&#8220;&#8221;</em>, is an empty string that we use to replace the regex matches. We match an unwanted non-kanji character and replace it with nothing; in other words, we delete it. The last parameter, <em>$string</em>, is the input string of Japanese to match against the regex pattern and remove everything but the kanji.</p>
<p>The variable<em> $kanjiString</em> now contains only the kanji characters from our original input string.</p>
<pre>// $kanjiString = "私名前私大学卒業私大好私漢字好";</pre>
<p>Our next step is to split up all the kanji characters and insert them into an array. We will do this in one step with the split by regular expression function <em>preg_split()</em>.</p>
<pre>$kanjiArray = preg_split("//u", $kanjiString, -1, PREG_SPLIT_NO_EMPTY);</pre>
<p>The first parameter <em>&#8220;//u&#8221;</em> is an empty regular expression, which matches at every position in the string, so the string is split between each character. The <em>u</em> pattern modifier puts it in Unicode (UTF-8) mode, so that the split falls between whole characters rather than bytes. The second parameter <em>$kanjiString</em> is the input string to split. The third parameter <em>-1</em> is the limit parameter, and <em>-1</em> indicates no limit. This means it will split the entire string. The final parameter is the <em>PREG_SPLIT_NO_EMPTY</em> flag. This flag sets it so only non-empty items will be returned.</p>
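<p>To see why the <em>u</em> modifier matters, here is a short sketch comparing the split with and without it. The two-character sample string is an assumption for the example.</p>
<pre>$sample = "漢字";
// With u, the empty pattern matches between characters: one array item per kanji.
$chars = preg_split("//u", $sample, -1, PREG_SPLIT_NO_EMPTY);
echo count($chars), "\n"; // 2
// Without u, the string is treated as bytes, so each 3-byte UTF-8 kanji is broken apart.
$bytes = preg_split("//", $sample, -1, PREG_SPLIT_NO_EMPTY);
echo count($bytes), "\n"; // 6</pre>
<p>On PHP 7.4 and later, <em>mb_str_split()</em> should give the same per-character array for UTF-8 input, assuming the default internal encoding is UTF-8.</p>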
<p>Now that we have an array full of individual kanji, we want to count them to get our kanji usage numbers. The<em> array_count_values()</em> function will count all the values of our input array, and return a new array with those values and their usage count.</p>
<pre>$countedArray = array_count_values($kanjiArray);</pre>
<p>With our array of kanji and their usage counts, we just need to sort them in descending order of usage count with the <em>arsort()</em> function, which also keeps each kanji attached to its count.</p>
<pre>arsort($countedArray);</pre>
<p>Our <em>$countedArray</em> now contains a list of all the kanji used from our input string in order of their usage counts. In other words, we have successfully built a kanji usage count concordance.</p>
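<p>The counting and sorting steps are easy to try on their own. Here is a tiny sketch with a made-up input array; the variable names are hypothetical, chosen so they do not clash with the walkthrough.</p>
<pre>$sampleArray = array("大", "私", "大", "好", "私", "私");
// Keys are the kanji, values are how often each one appears: 大 is 2, 私 is 3, 好 is 1.
$sampleCounts = array_count_values($sampleArray);
// Sort by count, highest first, keeping each kanji as the key of its count.
arsort($sampleCounts);
echo implode(" ", array_keys($sampleCounts)); // 私 大 好</pre>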
<p>The final step is to iterate through the array and display our concordance to the screen. We will do this with a simple foreach loop over our array.</p>
<pre>foreach ($countedArray as $kanji =&gt; $count) {
   echo "$count $kanji &lt;br/&gt;";
}</pre>
<p>Our kanji usage count concordance will display like this:</p>
<p>4 私<br />
2 好<br />
2 大<br />
1 漢<br />
1 字<br />
1 業<br />
1 学<br />
1 名<br />
1 前<br />
1 卒</p>
<p>There we go. We have a list of all the kanji used and in order from most used to least used. As we can see, 私 seems to be a pretty important kanji. Better put it on your list to study.</p>
<p>In this example our Japanese input string was hard coded, but we can easily expand this code to take in input from a file or even screen scrape a Web site and see what their most used kanji are. With a large enough input sample, we can get a pretty good list of kanji and usage counts for our concordance.</p>
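<p>As a rough sketch of that extension, the steps can be collected into a function and fed from a file. The function name <em>kanjiConcordance()</em> and the file name <em>input.txt</em> are hypothetical, the input is assumed to be UTF-8, and this version uses the <em>\P{Han}</em> script property rather than the long character class to strip the non-kanji characters.</p>
<pre>// Hypothetical wrapper around the steps in this post.
function kanjiConcordance($text) {
   $kanjiString = preg_replace('/\P{Han}/u', "", $text); // keep only kanji
   $kanjiArray = preg_split("//u", $kanjiString, -1, PREG_SPLIT_NO_EMPTY);
   $countedArray = array_count_values($kanjiArray);
   arsort($countedArray); // most used first
   return $countedArray;
}

// input.txt is a placeholder for any readable UTF-8 text file.
$file = "input.txt";
if (is_readable($file)) {
   print_r(kanjiConcordance(file_get_contents($file)));
}</pre>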
<h3>PHP Source Code</h3>
<p>Here is the full source code for the kanji count usage concordance in PHP that we built.</p>
<pre>&lt;?php
setlocale(LC_ALL, "ja_JP.utf8");

$string = "私の名前はマークです。私はテキサス大学を卒業しました。私はすしが大好きです。
           私は漢字が好きです。";

$pattern = "/[a-zA-Z0-9０-９あ-んア-ンー。、？！＜＞： 「」（）｛｝≪≫〈〉《》【】
            『』〔〕［］・\n\r\t\s\(\)　]/u";
$kanjiString = preg_replace($pattern, "", $string);

$kanjiArray = preg_split("//u", $kanjiString, -1, PREG_SPLIT_NO_EMPTY);

$countedArray = array_count_values($kanjiArray);
arsort($countedArray);

foreach ($countedArray as $kanji =&gt; $count) {
   echo "$count $kanji &lt;br/&gt;";
}
?&gt;</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.localizingjapan.com/blog/2011/12/07/kanji-usage-count-concordance-in-php/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Japanese Input on OpenSUSE Linux 12.1 (KDE 4.7)</title>
		<link>http://www.localizingjapan.com/blog/2011/12/06/japanese-input-on-opensuse-linux-12-1-kde-4-7/</link>
		<comments>http://www.localizingjapan.com/blog/2011/12/06/japanese-input-on-opensuse-linux-12-1-kde-4-7/#comments</comments>
		<pubDate>Wed, 07 Dec 2011 01:07:07 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[Japanese]]></category>
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://www.localizingjapan.com/blog/?p=772</guid>
		<description><![CDATA[Setting up Japanese input IME (日本語入力方法) on openSUSE Linux 12.1 is not difficult, but it requires a little know-how of what packages need to be installed. It only takes a few minutes to download all the files and get it set up. Once installed and configured, you will be able to type in Japanese in [...]]]></description>
			<content:encoded><![CDATA[<p>Setting up Japanese input IME (日本語入力方法) on openSUSE Linux 12.1 is not difficult, but it requires a little know-how of what packages need to be installed. It only takes a few minutes to download all the files and get it set up. Once installed and configured, you will be able to type in Japanese in your Linux applications. If you&#8217;ve used the previous <a title="Japanese Input on OpenSUSE Linux 11.4 (KDE 4.6)" href="http://www.localizingjapan.com/blog/2011/10/03/japanese-input-on-opensuse-linux-11-4-kde-4-6/">11.4 version of openSUSE</a>, it&#8217;s exactly the same, although some icons look a little different.</p>
<h3>Prerequisites</h3>
<ul>
<li>YaST software repositories are configured properly.</li>
</ul>
<h3>Setup Procedure</h3>
<p>Click on the <strong>Kickoff Application Launcher</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-1.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-1.png" alt="" width="30" height="30" /></a></p>
<p>On the <em>Computer</em> tab, click <strong>Install/Remove Software</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-2.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-2-225x300.png" alt="" width="225" height="300" /></a></p>
<p>On the <em>Search</em> tab, search for <strong>anthy</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-3.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-3.png" alt="" width="273" height="79" /></a></p>
<p>In the search results window showing the matching packages, select the <strong>anthy</strong> and <strong>ibus-anthy</strong> packages.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-4.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-4-300x44.png" alt="" width="300" height="44" /></a></p>
<p>Press the <strong>Accept</strong> button on the bottom right of the window.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-5.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-5.png" alt="" width="166" height="34" /></a></p>
<p>YaST will now download, install, and configure the anthy packages.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-6.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-6-300x215.png" alt="" width="300" height="215" /></a></p>
<p>Do the same for <strong>ibus</strong>. Open <em>Install/Remove Software</em>, search for <strong>ibus</strong>, and select the package for <strong>ibus</strong>. Press Accept to install.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-7.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-7-300x10.png" alt="" width="300" height="10" /></a></p>
<p>Click on the<em> Kickoff Application Launcher</em>, and from the <em>Leave</em> tab, click <strong>Restart</strong> to restart openSUSE with the new configuration.</p>
<p style="text-align: center"> <a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-8.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-8-300x156.png" alt="" width="300" height="156" /></a></p>
<p>After restarting, log back in.</p>
<p>You will now have the <em>IBus input method framework</em> keyboard icon in the bottom panel.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/OpenSUSE-12-1-2.png"><img class="aligncenter size-full wp-image-813" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/OpenSUSE-12-1-2.png" alt="" width="27" height="23" /></a></p>
<p>Right click the <em>IBus input method framework</em> keyboard icon and click on <strong>Preferences</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/OpenSUSE-12-1-6.png"><img class="aligncenter size-full wp-image-817" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/OpenSUSE-12-1-6.png" alt="" width="131" height="117" /></a></p>
<p>On the <em>Input Method</em> tab, select <strong>Japanese → Anthy</strong> from the dropdown menu.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/OpenSUSE-12-1-1.png"><img class="aligncenter size-full wp-image-812" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/OpenSUSE-12-1-1.png" alt="" width="305" height="140" /></a></p>
<p>Press the <strong>Add</strong> button to add Japanese Anthy input method, and then press <strong>Close</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-12.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-12.png" alt="" width="90" height="29" /></a></p>
<p>Open up a text editor or any application with a text input window, and click on the <em>IBus input method framework</em> icon and select <strong>Japanese &#8211; Anthy</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/OpenSUSE-12-1-3.png"><img class="aligncenter size-full wp-image-814" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/OpenSUSE-12-1-3.png" alt="" width="213" height="70" /></a></p>
<p>The IBus input method framework keyboard icon will change to the Anthy Aち icon.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/OpenSUSE-12-1-4.png"><img class="aligncenter size-full wp-image-815" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/OpenSUSE-12-1-4.png" alt="" width="24" height="23" /></a></p>
<p>You can now type in Japanese.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-15.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/openSuse11.4-15.png" alt="" width="278" height="97" /></a></p>
<p>Click the <strong>Anthy Aち</strong> icon to select between the various Japanese input modes.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/OpenSUSE-12-1-5.png"><img class="aligncenter size-full wp-image-816" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/OpenSUSE-12-1-5.png" alt="" width="380" height="199" /></a></p>
<p>That&#8217;s it. Setting up Japanese input on openSUSE 12.1 is not very difficult. When you try to type Japanese, make sure the cursor is in a text box in an application, or you may get an error saying <em>No input window</em>. 日本語入力方法を楽しんでください。</p>
]]></content:encoded>
			<wfw:commentRss>http://www.localizingjapan.com/blog/2011/12/06/japanese-input-on-opensuse-linux-12-1-kde-4-7/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Japanese Input on Linux Mint 12 Lisa</title>
		<link>http://www.localizingjapan.com/blog/2011/12/03/japanese-input-on-linux-mint-12-lisa/</link>
		<comments>http://www.localizingjapan.com/blog/2011/12/03/japanese-input-on-linux-mint-12-lisa/#comments</comments>
		<pubDate>Sun, 04 Dec 2011 00:43:09 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[Japanese]]></category>
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://www.localizingjapan.com/blog/?p=726</guid>
		<description><![CDATA[This tutorial will show you how to set up and install Japanese input method IME (日本語入力方法) on Linux Mint 12 Lisa so you can type in Japanese. Linux Mint is quickly becoming one of the more popular Linux distributions. Linux Mint 12 comes in a Gnome 2 and Gnome 3 variety. This tutorial works for [...]]]></description>
			<content:encoded><![CDATA[<p>This tutorial will show you how to set up and install Japanese input method IME (日本語入力方法) on Linux Mint 12 Lisa so you can type in Japanese. Linux Mint is quickly becoming one of the more popular Linux distributions. Linux Mint 12 comes in a Gnome 2 and Gnome 3 variety. This tutorial works for either version; however, the menus look a little different in Gnome 2.</p>
<h3>Linux Mint 12 Japanese IME Setup Procedure</h3>
<p>Click on the <em>Mint Menu</em> and navigate to <strong>Other → Software Manager</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-1.png"><img class="aligncenter size-full wp-image-728" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-1.png" alt="" width="280" height="209" /></a></p>
<p>In the <em>Software Manager</em>, search for <strong>ibus</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-2.png"><img class="aligncenter size-full wp-image-729" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-2.png" alt="" width="362" height="49" /></a></p>
<p>Select <strong>ibus</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-3.png"><img class="aligncenter size-full wp-image-730" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-3.png" alt="" width="168" height="35" /></a></p>
<p>Click <strong>Install</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-4.png"><img class="aligncenter size-full wp-image-731" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-4.png" alt="" width="321" height="79" /></a></p>
<p>In the <em>Authentication Required</em> dialog box, enter your system password and press <strong>Authenticate</strong>.</p>
<p>Software Manager will now download and install IBus in the background.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-5.png"><img class="aligncenter size-full wp-image-732" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-5.png" alt="" width="352" height="20" /></a></p>
<p>While IBus is installing, search for <strong>anthy</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-6.png"><img class="aligncenter size-full wp-image-733" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-6.png" alt="" width="360" height="49" /></a></p>
<p>Select <strong>ibus-anthy</strong> and click <strong>Install</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-7.png"><img class="aligncenter size-full wp-image-734" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-7.png" alt="" width="320" height="78" /></a></p>
<p>In the <em>Authentication Required</em> dialog box, enter your system password and press <strong>Authenticate</strong>.</p>
<p>Software Manager will now download and install ibus-anthy in the background.</p>
<p>When the activity bar on the bottom shows <em>0 ongoing actions</em>, installation is complete.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-8.png"><img class="aligncenter size-full wp-image-735" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-8.png" alt="" width="350" height="17" /></a></p>
<p>Close Software Manager.</p>
<p>From the <em>Mint Menu</em>, navigate to <strong>System Tools → System Settings</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-9.png"><img class="aligncenter size-full wp-image-736" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-9.png" alt="" width="259" height="211" /></a></p>
<p>Open <strong>Language Support</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-10.png"><img class="aligncenter size-full wp-image-737" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-10.png" alt="" width="65" height="64" /></a></p>
<p><strong>Note:</strong> If language support was not installed during the Mint install process, you may get a pop-up dialog indicating that language support is not completely installed. In that case, select <strong>Install</strong> to install the language support. In the <em>Authentication Required</em> dialog box, enter your system password and press <strong>Authenticate</strong>. The <em>Applying changes</em> screen will display and show the installation progress. When the language support has been fully installed, the <em>Language Support</em> screen will display.</p>
<p>On the <em>Language Support</em> screen, select <strong>Install / Remove Languages&#8230;</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-11.png"><img class="aligncenter size-full wp-image-738" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-11.png" alt="" width="179" height="25" /></a></p>
<p>Scroll down and check <strong>Japanese</strong>, and then press <strong>Apply Changes</strong>.</p>
<p style="text-align: center"><img class="aligncenter size-full wp-image-739" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-12.png" alt="" width="402" height="26" /></p>
<p>On the <em>Language Support</em> screen, press the <em>Keyboard input method system:</em> dropdown and select <strong>ibus</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-13.png"><img class="aligncenter size-full wp-image-740" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-13.png" alt="" width="295" height="33" /></a></p>
<p>Then press <strong>Close</strong>.</p>
<p>Click on the <em>Mint Menu</em> and select <strong>System Tools → IBus</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-ibus.png"><img class="aligncenter size-full wp-image-749" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-ibus.png" alt="" width="256" height="197" /></a></p>
<p>You should now have the little IBus keyboard icon displayed somewhere on the right side of your Gnome top panel.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-15.png"><img class="aligncenter size-full wp-image-742" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-15.png" alt="" width="32" height="26" /></a></p>
<p>Click on the <em>IBus keyboard</em> icon and select <strong>Preferences</strong>.</p>
<p>On the <em>IBus Preferences</em> screen, go to the <em>Input Method</em> tab.</p>
<p>Press the <strong>Select an input method</strong> dropdown and select <strong>Japanese</strong> → <strong>Anthy</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-14.png"><img class="aligncenter size-full wp-image-741" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-14.png" alt="" width="384" height="107" /></a></p>
<p>Press <strong>Add</strong> on the <em>IBus Preferences</em> screen to add the Anthy Japanese input method.</p>
<p>Open a text application like Text Editor. While the cursor is in the text field, press the <em>IBus keyboard</em> icon in the top panel and select <strong>Japanese – Anthy</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-16.png"><img class="aligncenter size-full wp-image-743" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-16.png" alt="" width="139" height="146" /></a></p>
<p>The Japanese Anthy toolbar should appear, and you can now type in Japanese. With the cursor still in a text input application like Text Editor, try typing some Japanese.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-17.png"><img class="aligncenter size-full wp-image-744" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-17.png" alt="" width="348" height="74" /></a></p>
<p>That&#8217;s all there is to it. Linux Mint is known to be a very easy-to-use distribution, but it takes quite a few more steps to install Japanese input than it does on the latest versions of Ubuntu or Fedora.</p>
<p><strong>Note:</strong> In my case, the Anthy toolbar did not appear; IBus showed the icon below instead, which usually means that no input window was found. I could still type in Japanese in this mode, so there is no need to worry if this happens to you.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-18.png"><img class="aligncenter size-full wp-image-745" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/12/Mint12-18.png" alt="" width="27" height="26" /></a></p>
<p><strong>Note:</strong> I had trouble when trying to add Japanese on the <em>Install / Remove Languages</em> screen. It worked fine in the Gnome 2 version of Mint 12 I installed in a virtual machine, but it gave a <em>Software database is broken</em> error message in the Gnome 3 version of Mint 12 I installed on a physical laptop. I tried reinstalling twice but I kept getting the same problem. It may have been a problem with the laptop because I also had issues when trying to install drivers for the wireless card. I had no issues with Japanese input on Mint in a VM.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.localizingjapan.com/blog/2011/12/03/japanese-input-on-linux-mint-12-lisa/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>ALC Advanced Search Options (英辞郎 on the Web)</title>
		<link>http://www.localizingjapan.com/blog/2011/11/23/alc-advanced-search-options-%e8%8b%b1%e8%be%9e%e9%83%8e-on-the-web/</link>
		<comments>http://www.localizingjapan.com/blog/2011/11/23/alc-advanced-search-options-%e8%8b%b1%e8%be%9e%e9%83%8e-on-the-web/#comments</comments>
		<pubDate>Thu, 24 Nov 2011 01:55:47 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[Japanese]]></category>
		<category><![CDATA[Translation]]></category>

		<guid isPermaLink="false">http://www.localizingjapan.com/blog/?p=667</guid>
		<description><![CDATA[Space ALC (英辞郎 on the Web) is one of the most useful Japanese translation tools on the Internet. It is a translation dictionary and translation memory that can be searched in both Japanese and English. It has everything from highly technical terminology to colloquial spoken slang. The key feature that separates ALC from all the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.alc.co.jp/index.html">Space ALC (英辞郎 on the Web)</a> is one of the most useful Japanese translation tools on the Internet. It is a translation dictionary and translation memory that can be searched in both Japanese and English. It has everything from highly technical terminology to colloquial spoken slang. The key feature that separates ALC from all the other online dictionaries is the huge set of example sentences it has in its database. Whether you are looking up a word or phrase, ALC returns results for what you looked up as well as in-context example sentences.</p>
<p>ALC has many advanced search options similar to search engines like Google that you can use to refine your search queries. Let&#8217;s take a look at some of these search options.</p>
<h3>Basic Search Options</h3>
<h4>And Search (Word1 Word2)</h4>
<p>Search for phrases containing two or more search terms in the results. The search results will contain all the search terms.</p>
<p>Instructions: Put a space between each search term to be included in the search result.</p>
<p>Example: <a href="http://eow.alc.co.jp/%E9%87%8E%E7%90%83+%E3%82%B5%E3%83%83%E3%82%AB%E3%83%BC/UTF-8/">野球 サッカー</a></p>
<p>Example: <a href="http://eow.alc.co.jp/up+down/UTF-8/">up down</a></p>
<h4>Or Operator [Standalone] (Word1 | Word2)</h4>
<p>Search for phrases containing one or more search terms in the results. The search results will contain at least one of the search terms.</p>
<p>Instructions: Put a | (vertical bar) between the search terms.</p>
<p>Example: <a href="http://eow.alc.co.jp/%E8%A3%BD%E9%80%A0%E8%A3%85%E7%BD%AE+|+%E8%A3%BD%E9%80%A0%E8%A8%AD%E5%82%99/UTF-8/">製造装置 | 製造設備</a></p>
<p>Example: <a href="http://eow.alc.co.jp/USPS+|+FedEx/UTF-8/">USPS | FedEx</a></p>
<h4>Or Operator [Within Phrase] (Word1 | Word2)</h4>
<p>Search for different variations of phrases containing one of the terms in the parentheses.</p>
<p>Instructions: Put a | (vertical bar) between the search terms that are inside of ().</p>
<p>Example: <a href="http://eow.alc.co.jp/%28%E3%82%B1%E3%83%BC%E3%82%AD%7C%E3%83%94%E3%82%B6%29%E3%82%92%E9%A3%9F%E3%81%B9%E3%81%BE%E3%81%99/UTF-8/">(ケーキ|ピザ)を食べます</a></p>
<p>Example: <a href="http://eow.alc.co.jp/do+%28one%27s+|+my+|+your+|+his+|+her+|+its+|+our+|+their%29+best/UTF-8/?pg=3">do (one&#8217;s | my | your | his | her | its | our | their) best</a></p>
<h4>Exact Phrase Search (&#8220;Phrase&#8221;)</h4>
<p>Search for an exact phrase.</p>
<p>Instructions: Put the phrase within double quotes &#8220;&#8221;.</p>
<p>Example: <a href="http://eow.alc.co.jp/%22open+source+software%22/UTF-8/">&#8220;open source software&#8221;</a></p>
<h3>Advanced Search Options</h3>
<h4>Designating Number of Words In Between (Word {#} Word)</h4>
<p>Specify a certain number of words between search terms.</p>
<p>Instructions: Put the number of words you want to appear between words in braces. For a specific number of words, put one number, like {2}. For a range of possibilities, put the end limits in braces, like {1,3}.</p>
<p>Example: <a href="http://eow.alc.co.jp/make+%7B2%7D+request/UTF-8/">make {2} request</a></p>
<p>This example will find phrases like <em>make a personal request</em> that have two words between <em>make</em> and <em>request</em>, but will not find phrases like <em>make a request</em> that only have one word in between.</p>
<p>Example: <a href="http://eow.alc.co.jp/thank+you+%7B2%2c4%7D+cooperation/UTF-8/">thank you {2,4} cooperation</a></p>
<h4>Search All Conjugations ([Word])</h4>
<p>Search for all variations of an English word such as verb conjugations and plurals, etc.</p>
<p>Instructions: Put the variable word in brackets [].</p>
<p>Example: <a href="http://eow.alc.co.jp/%22%5Bgo%5D+the+distance%22/UTF-8/">&#8220;[go] the distance&#8221;</a></p>
<p>This example finds all forms of the word <em>go</em>, including the past tense <em>went the distance</em>. Notice we put the entire search query in quotes to find the full phrase.</p>
<p>Example: <a href="http://eow.alc.co.jp/%5Btake%5D+pictures+of/UTF-8/">[take] pictures of</a></p>
<p>This example finds <em>take</em>, <em>takes</em>, <em>taking</em>, <em>took</em>, etc.</p>
<h4>Terms to Exclude (-Word)</h4>
<p>Exclude certain translations from your search results. Useful to narrow your focus when there are multiple translations for a word.</p>
<p>Instructions: Put a dash &#8211; before the word to exclude.</p>
<p>Example: <a href="http://eow.alc.co.jp/%E3%82%B5%E3%83%83%E3%82%AB%E3%83%BC+-soccer/UTF-8/">サッカー -soccer</a></p>
<p>This example finds entries for 「サッカー」 while excluding those that use the American translation <em>soccer</em>, leaving the ones that use <em>football</em> instead.</p>
<p>Example: <a href="http://eow.alc.co.jp/Diet+-%E5%9B%BD%E4%BC%9A/UTF-8/">diet -国会</a></p>
<p>This example excludes results for the Diet, Japan&#8217;s national legislature. This is useful if you are looking for food- and diet-related translations.</p>
<h4>Multiple Search Options</h4>
<p>You can combine search options for really advanced search queries.</p>
<p>Example: <a href="http://eow.alc.co.jp/%22%5Btake%5D+%28my+|+our+|+your+|+his+|+her%29+picture%22+-can/UTF-8/">&#8220;[take] (my | our | your | his | her) picture&#8221; -can</a></p>
<p>This example uses the exact phrase quotes, the conjugation search [], the or operator within a phrase, and the not operator to remove phrases containing the word <em>can</em>.</p>
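<p>The example links in this post all share one URL shape: the URL-encoded query followed by <em>/UTF-8/</em>. As a minimal sketch, assuming that pattern (it is inferred from the links above rather than from any documented ALC interface, and the function name <em>alcSearchUrl</em> is made up for illustration), a combined query could be turned into a link in PHP like this:</p>
<pre>&lt;?php
// Build a search link in the same shape as the example links in this post.
// Assumption: the eow.alc.co.jp/QUERY/UTF-8/ pattern is inferred from those
// links and is not a documented interface.
function alcSearchUrl($query) {
    // urlencode() turns spaces into + and escapes quotes, brackets, and braces.
    return 'http://eow.alc.co.jp/' . urlencode($query) . '/UTF-8/';
}

echo alcSearchUrl('"[take] (my | our) picture" -can'), "\n";
echo alcSearchUrl('make {2} request'), "\n";</pre>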
<p>Searching ALC can often find hundreds of translations. These advanced search options are easy to use and can help narrow down what you are looking for.</p>
<p>For more information, refer to the ALC <a href="http://eowimg.alc.co.jp/content/help/">Help &#8211; Basic Usage</a>, <a href="http://eowimg.alc.co.jp/content/help/howto3.html">Help &#8211; High Level Usage</a>, and <a href="http://eowimg.alc.co.jp/content/help/tips/index.html">Search Tips</a> pages. These ALC help pages are all in Japanese.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.localizingjapan.com/blog/2011/11/23/alc-advanced-search-options-%e8%8b%b1%e8%be%9e%e9%83%8e-on-the-web/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Japanese Input on Fedora 16 Linux (Gnome 3)</title>
		<link>http://www.localizingjapan.com/blog/2011/11/12/japanese-input-on-fedora-16-linux-gnome-3/</link>
		<comments>http://www.localizingjapan.com/blog/2011/11/12/japanese-input-on-fedora-16-linux-gnome-3/#comments</comments>
		<pubDate>Sat, 12 Nov 2011 22:03:00 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[Japanese]]></category>
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://www.localizingjapan.com/blog/?p=669</guid>
		<description><![CDATA[Setting up Japanese input (IME) on Fedora 16 Linux is really easy and only takes a few minutes. Fedora still uses the IBus keyboard input method system and uses the Anthy Japanese input method for the Japanese keyboard input, so it will be a familiar process to set up and use if you have done [...]]]></description>
			<content:encoded><![CDATA[<p>Setting up Japanese input (IME) on Fedora 16 Linux is really easy and only takes a few minutes.</p>
<p>Fedora still uses the IBus keyboard input method system and uses the Anthy Japanese input method for the Japanese keyboard input, so it will be a familiar process to set up and use if you have done it on earlier Fedora Linux distributions.</p>
<p>For previous versions of Fedora, refer to:</p>
<ul>
<li><a title="Japanese Input on Fedora 15 Linux Gnome 3" href="http://www.localizingjapan.com/blog/2011/06/12/japanese-input-on-fedora-15-linux-gnome-3/">Fedora 15 (Gnome 3)</a></li>
<li><a title="Japanese Input on Fedora 14 Linux" href="http://www.localizingjapan.com/blog/2011/02/20/japanese-input-on-fedora-14-linux/">Fedora 14 (Gnome 2)</a></li>
</ul>
<h3>Fedora 16 Japanese IME Setup Procedure</h3>
<p>To start, open <strong>Activities</strong> from the <em>Top Panel</em>.</p>
<p>In the <em>Search Box</em>, type <em>Input Method</em> and select the <strong>Input Method Selector</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/06/fedora15-1.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/06/fedora15-1-300x187.png" alt="" width="300" height="187" /></a></p>
<p>In the <em>Input Method Selector</em> screen, select <strong>Use IBus (recommended)</strong>.</p>
<p style="text-align: center"> <a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/11/fedora16-1.png"><img class="aligncenter size-full wp-image-671" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/11/fedora16-1.png" alt="" width="347" height="159" /></a></p>
<p>Press the <strong>Preference&#8230;</strong> link to the right of <em>Use IBus (recommended)</em> to open the <strong>IBus Preferences</strong> screen.</p>
<p>On the <em>Input Method</em> tab, check the <strong>Customize active input methods</strong> check box.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/06/fedora15-3.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/06/fedora15-3-300x79.png" alt="" width="300" height="79" /></a></p>
<p>Press the <em>Select an input method</em> dropdown and select <strong>Show all input methods</strong>.</p>
<p>Press the <em>Select an input method</em> dropdown once again and now select <strong>Japanese → Anthy</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/06/fedora15-4.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/06/fedora15-4-300x116.png" alt="" width="300" height="116" /></a></p>
<p>Press the <strong>Add</strong> button, and then press <strong>Close</strong>.</p>
<p>You must log out for the changes to take effect, so press the <strong>Log Out</strong> button on the <em>Input Method Selector</em> screen.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/06/fedora15-5.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/06/fedora15-5.png" alt="" width="188" height="46" /></a></p>
<p>When you log back in, you will have the <strong>IBus input method framework</strong> button on the Gnome top panel (it looks like a small keyboard). This is the button to change input modes. Open a text editor such as gedit or some other application with a text input window.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/06/fedora15-6.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/06/fedora15-6-300x91.png" alt="" width="300" height="91" /></a></p>
<p>Press the <strong>IBus input method framework</strong> button and select <strong>Japanese – Anthy</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/11/fedora16-2.png"><img class="aligncenter size-full wp-image-672" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/11/fedora16-2.png" alt="" width="233" height="113" /></a></p>
<p>The keyboard icon has now changed to Aち: the letter A and the hiragana character chi, which is probably meant to approximate the pronunciation of Anthy while indicating Japanese/English input modes.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-13.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-13.png" alt="" width="31" height="24" /></a></p>
<p>You should now be able to type in Japanese.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/06/fedora15-9.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/06/fedora15-9.png" alt="" width="234" height="67" /></a></p>
<p>Use the <strong>Anthy Aち</strong> button to toggle between Japanese, English, and other Japanese IME modes.</p>
<p><strong>Note:</strong> I did not have to log out and log back in for the changes to take effect to allow me to type in Japanese in Firefox. However, there may be applications that cannot take advantage of the IME changes until after logging out.</p>
<p><strong>Note</strong>: If you get the message <em>No input window</em> when you try to select Japanese Anthy, make sure you have the mouse cursor in an application with a text input box, such as a text editor or a Web browser.</p>
<p>That&#8217;s it. You should be able to type in Japanese now. Setting up Japanese IME input on Fedora Linux is simple and very similar to previous versions of Fedora.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.localizingjapan.com/blog/2011/11/12/japanese-input-on-fedora-16-linux-gnome-3/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Localizing a CakePHP Application</title>
		<link>http://www.localizingjapan.com/blog/2011/11/10/localizing-a-cakephp-application/</link>
		<comments>http://www.localizingjapan.com/blog/2011/11/10/localizing-a-cakephp-application/#comments</comments>
		<pubDate>Fri, 11 Nov 2011 02:08:18 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[Localization]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.localizingjapan.com/blog/?p=642</guid>
		<description><![CDATA[If you build a PHP application using the CakePHP framework, it is easy to localize the application into multiple languages, provided you have the proper translations for those languages. If you want to internationalize your application to a global market, it is important to localize it for each language and region you want to target. [...]]]></description>
			<content:encoded><![CDATA[<p>If you build a PHP application using the <a href="http://cakephp.org/">CakePHP framework</a>, it is easy to localize the application into multiple languages, provided you have the proper translations for those languages. If you want to internationalize your application to a global market, it is important to localize it for each language and region you want to target.</p>
<p>Fortunately, <a href="http://book.cakephp.org/2.0/en/core-libraries/internationalization-and-localization.html">CakePHP</a> and <a href="http://php.net/manual/en/function.setlocale.php">PHP</a> itself provide us with some easy mechanisms to provide translations and localize our code without much effort. You do not have to make copies of HTML or PHP files. Everything will be done with the PHP files you already have, and the translated text strings will dynamically be inserted at render time, ready for the user in their localized language and format.</p>
<p>In this example, we will localize a CakePHP application that was written in English into Japanese. Let&#8217;s assume we have a menu for e-mail functions that we want to localize. This example assumes CakePHP version 2.0 (earlier and perhaps later versions of CakePHP should work in a similar manner).</p>
<h3>Wrapping Translatable Text in __() Functions</h3>
<p>The first step in the localization process is to identify the text strings that will need to be translated, and replace them with CakePHP&#8217;s localized string __() function. Supposing our menu looked like the following:</p>
<pre>&lt;ul&gt;
   &lt;li&gt;Send&lt;/li&gt;
   &lt;li&gt;Reply&lt;/li&gt;
   &lt;li&gt;Forward&lt;/li&gt;
   &lt;li&gt;Delete&lt;/li&gt;
&lt;/ul&gt;</pre>
<p>We would wrap each text string inside the __() function as follows:</p>
<pre>&lt;ul&gt;
   &lt;li&gt;&lt;?php echo __('Send') ?&gt;&lt;/li&gt;
   &lt;li&gt;&lt;?php echo __('Reply') ?&gt;&lt;/li&gt;
   &lt;li&gt;&lt;?php echo __('Forward') ?&gt;&lt;/li&gt;
   &lt;li&gt;&lt;?php echo __('Delete') ?&gt;&lt;/li&gt;
&lt;/ul&gt;</pre>
<p>The __() function identifies these strings as translatable text that will differ by language locale and uses the text within the __() function as the message ID. If we define the translations for a certain language, those translations will appear in place of these functions. If we do not define the translations for that language, the text within the __() function will display instead by default.</p>
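<p>The fallback behavior described above can be illustrated with a small stand-alone sketch. This is not CakePHP&#8217;s implementation: the <em>$catalog</em> array and the <em>lookup</em> function are made-up names that only mimic the idea of returning the message ID when no translation is defined.</p>
<pre>&lt;?php
// Hypothetical stand-in for a loaded catalog: message ID =&gt; translation.
$catalog = array('Send' =&gt; '送信', 'Reply' =&gt; '返信');

// Return the translation when one exists, otherwise the message ID itself.
function lookup(array $catalog, $msgid) {
    return isset($catalog[$msgid]) ? $catalog[$msgid] : $msgid;
}

echo lookup($catalog, 'Send'), "\n";    // has a translation
echo lookup($catalog, 'Forward'), "\n"; // no entry, so the ID is shown</pre>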
<h3>Creating the Localized PO Files</h3>
<p>The next step is to create the PO files which will contain the translations for each language to be dynamically inserted in each of the __() functions. CakePHP has two ways you can do this: automatically using the console shell; or manually.</p>
<h4>Using the I18N Shell</h4>
<p>CakePHP has some console shell programs that you can run on the command line, including one to generate the PO file to use as the original language source file for translations. In our case it will be a file with all the English text strings.</p>
<p>To run the i18n shell command, type the following on the Linux command line in your CakePHP application directory:</p>
<pre>./Console/cake i18n extract</pre>
<p>Then follow the onscreen menu.</p>
<p>The shell command will examine all of your application files for instances of the __() function and generate a PO file for the original source language that you can use to create the PO files for each of the translations you are going to use.</p>
<h4>Creating the PO Files Manually</h4>
<p>If you want to do this manually—for example you don&#8217;t have many translatable text strings like in our example—you can create the PO files by hand in a text editor.</p>
<p>First we will create the original source language English version here:</p>
<pre>/app/Locale/eng/LC_MESSAGES/default.po</pre>
<p>The <em>default.po</em> file will have this format:</p>
<pre>msgid "ID"
msgstr "STRING"</pre>
<p>Here <em>msgid</em> is the ID passed to the __() function, and <em>msgstr</em> is the localized translation that should appear as output.</p>
<p>Our full English source PO file will look like this:</p>
<pre>msgid "Send"
msgstr "Send"

msgid "Reply"
msgstr "Reply"

msgid "Forward"
msgstr "Forward"

msgid "Delete"
msgstr "Delete"</pre>
<p>To create the Japanese localized version, we copy the English PO file to the Japanese directory, and then replace the English strings in the <em>msgstr</em> field with the Japanese translations. (If you have a large application being localized into dozens of languages, it is at this point that you send the PO files to a language service provider to translate the localized strings.)</p>
<p>Our localized PO file with Japanese translations will go here:</p>
<pre>/app/Locale/jpn/LC_MESSAGES/default.po</pre>
<p>Our final localized Japanese PO file will look like this:</p>
<pre>msgid "Send"
msgstr "送信"

msgid "Reply"
msgstr "返信"

msgid "Forward"
msgstr "転送"

msgid "Delete"
msgstr "削除"</pre>
<p>That&#8217;s it. The translations for English and Japanese will display appropriately for the proper locale. If you want to add translations for other languages, repeat the same process and put the new PO file in the directory that corresponds with that language code. CakePHP uses the <a href="http://www.loc.gov/standards/iso639-2/php/code_list.php">ISO 639-2</a> standard for naming locales. Follow that standard for naming your localized directories. Make sure you save these files as UTF-8.</p>
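<p>To sanity-check a hand-written PO file before deploying it, a rough sketch like the following may help. It is an illustration only and not how CakePHP reads PO files: the <em>parsePo</em> function is a made-up name, and it handles just the simple single-line <em>msgid</em>/<em>msgstr</em> pairs used in this example (no multi-line strings, plural forms, or escape sequences).</p>
<pre>&lt;?php
// Parse simple single-line msgid/msgstr pairs into an array.
// Multi-line strings, plural forms, and escapes are deliberately ignored.
function parsePo($text) {
    $catalog = array();
    $pattern = '/^msgid "(.*)"\s*\nmsgstr "(.*)"\s*$/mu';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $pair) {
        $catalog[$pair[1]] = $pair[2];
    }
    return $catalog;
}

$po = "msgid \"Send\"\nmsgstr \"送信\"\n\nmsgid \"Reply\"\nmsgstr \"返信\"\n";
print_r(parsePo($po));</pre>
<p>An entry whose translation is missing from the resulting array is a hint that the PO file needs another look.</p>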
<h3>Detecting and Changing Languages and Locales</h3>
<p>Having the translations ready is nice, but you still have to detect and change to the proper language in your PHP code to get the translations to appear. Detecting the user&#8217;s language is tricky. You could do it in JavaScript or try to use something like the following PECL function:</p>
<pre>locale_accept_from_http($_SERVER['HTTP_ACCEPT_LANGUAGE']);</pre>
<p>In either case, you have to trust the user agent to report back the right language.</p>
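<p>As a rough sketch of the server-side approach, the detected locale can be mapped onto the three-letter directory names used earlier. This assumes the intl extension is available; the <em>$supported</em> map and the <em>pickLanguage</em> function are made-up names for illustration.</p>
<pre>&lt;?php
// Hypothetical map from a two-letter primary language to the Locale directory name.
$supported = array('en' =&gt; 'eng', 'ja' =&gt; 'jpn');

// Pick a supported language from an Accept-Language header, or fall back.
function pickLanguage($acceptHeader, array $supported, $default) {
    // Requires the intl extension; the result looks like "ja" or "en_US".
    $locale = locale_accept_from_http($acceptHeader);
    if (!$locale) {
        return $default;
    }
    $primary = locale_get_primary_language($locale);
    return isset($supported[$primary]) ? $supported[$primary] : $default;
}

$header = isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : '';
echo pickLanguage($header, $supported, 'eng'), "\n";</pre>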
<p>Another option is to simply have a button or menu for the user to select their language. Flag icons usually work well for this. Then you can set the language manually. In CakePHP it is easy to do with the <em>CakeSession</em> class:</p>
<pre>CakeSession::write('Config.language', 'jpn');</pre>
<p>Finally, you also have to set the locale in PHP. You already know which locale to use from however you determined the language above, so you can pass it to PHP&#8217;s <em>setlocale</em> function. This is important for localizing date, time, money, and numeric separator formats, among others.</p>
<pre>setlocale(LC_ALL, "ja_JP.utf8");</pre>
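<p>Locale names differ between systems, and <em>setlocale</em> returns <em>false</em> when none of the requested locales is installed, so it can be worth checking the result. A minimal sketch, assuming the candidate names below (which ones exist depends on the server) and a made-up helper name:</p>
<pre>&lt;?php
// Try several spellings of the same locale. setlocale() accepts an array
// and uses the first name the system recognizes, or returns false.
function setLocaleWithFallback(array $candidates) {
    return setlocale(LC_ALL, $candidates);
}

$chosen = setLocaleWithFallback(array('ja_JP.utf8', 'ja_JP.UTF-8', 'ja_JP'));
if ($chosen === false) {
    // None of the names is installed; the previous locale stays active.
    echo "Japanese locale not available on this system\n";
} else {
    $conv = localeconv();
    echo 'Locale: ', $chosen, ', currency code: ', $conv['int_curr_symbol'], "\n";
}</pre>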
<p>That&#8217;s all there is to localizing a CakePHP application with proper translations and other locale-specific customizations.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.localizingjapan.com/blog/2011/11/10/localizing-a-cakephp-application/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Japanese Input on Ubuntu Linux 11.10 Oneiric Ocelot</title>
		<link>http://www.localizingjapan.com/blog/2011/10/16/japanese-input-on-ubuntu-linux-11-10-oneiric-ocelot/</link>
		<comments>http://www.localizingjapan.com/blog/2011/10/16/japanese-input-on-ubuntu-linux-11-10-oneiric-ocelot/#comments</comments>
		<pubDate>Sun, 16 Oct 2011 18:44:28 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[Japanese]]></category>
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://www.localizingjapan.com/blog/?p=591</guid>
		<description><![CDATA[This tutorial will show you how to set up Japanese input (IME) on Ubuntu 11.10 from the Unity interface. The installation procedure is very similar to the previous Unity release of Ubuntu 11.04. In fact, it is a little bit easier on 11-10. For Ubuntu 10.04 under Gnome, refer to this post. Setup Procedure To [...]]]></description>
			<content:encoded><![CDATA[<p>This tutorial will show you how to set up Japanese input (IME) on Ubuntu 11.10 from the Unity interface. The installation procedure is very similar to the previous Unity release of <a title="Japanese Input on Ubuntu Linux 11.04 Natty Narwhal" href="http://www.localizingjapan.com/blog/2011/05/07/japanese-input-on-ubuntu-linux-11-04-natty-narwhal/">Ubuntu 11.04</a>. In fact, it is a little bit easier on 11.10. For Ubuntu 10.04 under Gnome, refer to <a title="Japanese Input on Ubuntu Linux 10.04 LTS Lucid Lynx" href="http://www.localizingjapan.com/blog/2010/06/15/setting-up-japanese-input-on-ubuntu-linux-10-04-lts-lucid-lynx/">this post</a>.</p>
<h3>Setup Procedure</h3>
<p>To start, select <strong>Dash home</strong> from the <em>Unity Launcher</em>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/Ubuntu11-10-1.png"><img class="aligncenter size-full wp-image-592" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/Ubuntu11-10-1.png" alt="" width="216" height="60" /></a></p>
<p>From the <em>Dash home</em>, select <strong>More Apps</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/Ubuntu11-10-2.png"><img class="aligncenter size-full wp-image-593" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/Ubuntu11-10-2.png" alt="" width="114" height="138" /></a></p>
<p>From the <em>Installed</em> menu area, select <strong>See more results</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/Ubuntu11-10-3.png"><img class="aligncenter size-full wp-image-594" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/Ubuntu11-10-3.png" alt="" width="244" height="29" /></a></p>
<p>Scroll down and select <strong>Language Support</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/Ubuntu11-10-4.png"><img class="aligncenter size-full wp-image-595" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/Ubuntu11-10-4.png" alt="" width="78" height="109" /></a></p>
<p>On the <em>Language</em> tab of the <em>Language Support</em> screen, press <strong>Install / Remove Languages&#8230;</strong></p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-3.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-3.png" alt="" width="209" height="35" /></a></p>
<p>On the <em>Installed Languages</em> screen, scroll down to <strong>Japanese</strong> and check <strong>Installed</strong>, and then press <strong>Apply Changes</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/Ubuntu11-10-5.png"><img class="aligncenter size-medium wp-image-596" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/Ubuntu11-10-5-300x251.png" alt="" width="300" height="251" /></a></p>
<p>Enter your password on the <em>Authenticate</em> screen.</p>
<p>It will take a few moments to download and install the Japanese IME packages.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/Ubuntu11-10-6.png"><img class="aligncenter size-medium wp-image-597" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/Ubuntu11-10-6-300x139.png" alt="" width="300" height="139" /></a></p>
<p>Back on the <em>Language Support</em> screen, select <strong>ibus</strong> for the <strong>Keyboard input method system</strong>, and then press <strong>Close</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-6.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-6-300x67.png" alt="" width="300" height="67" /></a></p>
<p>Once again, select <strong>Dash home</strong> from the <em>Unity Launcher</em>.</p>
<p>From the <em>Dash home</em>, select <strong>More Apps</strong>.</p>
<p>From the <em>Installed</em> menu area, select <strong>See more results</strong>.</p>
<p>Scroll down and select <strong>Keyboard Input Methods</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/Ubuntu11-10-7.png"><img class="aligncenter size-full wp-image-598" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/10/Ubuntu11-10-7.png" alt="" width="109" height="112" /></a></p>
<p>You may get a pop up message saying <em>Keyboard Input Methods (IBus Daemon) has not been started. Do you want to start it now?</em> Select <strong>Yes</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-8.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-8-300x119.png" alt="" width="300" height="119" /></a></p>
<p>On the <em>Input Method</em> tab of the <em>Ibus Preferences</em> screen, press <strong>Select an input method</strong> and select <strong>Japanese → Anthy</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-9.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-9-300x121.png" alt="" width="300" height="121" /></a></p>
<p>Press <strong>Add</strong> and then press <strong>Close</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-10.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-10-300x29.png" alt="" width="300" height="29" /></a></p>
<p>The Ibus keyboard icon will now display on the top panel.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-11.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-11-300x25.png" alt="" width="300" height="25" /></a></p>
<p>Open up any application with a text box such as Tomboy Notes and place the cursor in the text box.</p>
<p>Press the <strong>Ibus keyboard icon</strong> on the top panel and select <strong>Japanese-Anthy</strong>.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-12.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-12.png" alt="" width="162" height="56" /></a></p>
<p>The Ibus keyboard icon will now change to the <strong>Anthy Aち</strong> icon.</p>
<p style="text-align: center"><a href="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-13.png"><img class="aligncenter" src="http://www.localizingjapan.com/blog/wp-content/uploads/2011/05/Ubuntu11-04-13.png" alt="" width="31" height="24" /></a></p>
<p>That&#8217;s it. You can now type in Japanese in Ubuntu 11.10. 簡単にできますね。</p>
]]></content:encoded>
			<wfw:commentRss>http://www.localizingjapan.com/blog/2011/10/16/japanese-input-on-ubuntu-linux-11-10-oneiric-ocelot/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Special Concerns for Translating Japanese Using Translation Memory</title>
		<link>http://www.localizingjapan.com/blog/2011/10/06/special-concerns-for-translating-japanese-using-translation-memory/</link>
		<comments>http://www.localizingjapan.com/blog/2011/10/06/special-concerns-for-translating-japanese-using-translation-memory/#comments</comments>
		<pubDate>Fri, 07 Oct 2011 04:21:17 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[Japanese]]></category>
		<category><![CDATA[Localization]]></category>
		<category><![CDATA[Trados]]></category>
		<category><![CDATA[Translation]]></category>

		<guid isPermaLink="false">http://www.localizingjapan.com/blog/?p=567</guid>
		<description><![CDATA[The use of translation memory software, such as SDL Trados, greatly increases the speed and efficiency of a translator. However, there are special concerns that must be taken into account when translating with a translation memory where Japanese is the source language. Japanese has some linguistic characteristics that are significantly different from English, [...]]]></description>
			<content:encoded><![CDATA[<p>The use of translation memory software, such as SDL Trados, greatly increases the speed and efficiency of a translator. However, there are special concerns that must be taken into account when translating with a translation memory where Japanese is the source language. Japanese has some linguistic characteristics that are significantly different from English, and when using a Japanese to English translation memory, you can run into trouble if you are not careful.</p>
<p>The biggest benefit of a translation memory is the 100% match, which eliminates the translation work for that sentence. Best practice says you should always proofread your translations, even those that come from a 100% TM match, although this is hardly ever done. With Japanese, however, you must check your 100% matches, because the translations may not be accurate for the reasons discussed below.</p>
<h3>Plurals</h3>
<p>Japanese does not have different singular and plural forms of nouns the same way English does. There are specific instances where a plural-like form is used, but these are the exception rather than the norm. Let&#8217;s look at a simple example:</p>
<p>ねじを取り外す。</p>
<p>You could translate this sentence two different ways:</p>
<ul>
<li>Remove the screw.</li>
<li>Remove the screws.</li>
</ul>
<p>Which is correct? Well, that depends on how many screws there are. In Japanese, this one sentence covers both instances. Suppose your translation memory had only this translation pair in the database:</p>
<ul>
<li>JA: ねじを取り外す。</li>
<li>EN: Remove the screw.</li>
</ul>
<p>If the sentence you are currently translating matches the Japanese, but in this present context there are multiple screws, the matching 100% translation is not correct.</p>
<p>This shows why context is important, even more so in Japanese. And if you use software like SDL Trados or some other CAT tool that only provides you with an XLIFF file, you may not have the surrounding images and context to know whether there is one screw or many.</p>
<p>How can we remedy this for the next person who uses our translation memory? We can save a second translation for this sentence, so that our TM now looks like this:</p>
<ul>
<li>JA: ねじを取り外す。</li>
<li>EN: Remove the screw.</li>
<li>EN: Remove the screws.</li>
</ul>
<p>This is fine for the translator, who can cycle through the multiple translations and select the best one, assuming they know enough of the context to pick the right one. On the other hand, this is not ideal for the person paying for the translation. Generally, only 100% matches are done for free or at a greatly reduced price. When duplicate translations exist for a single source segment, SDL Trados and other software will flag this with some sort of penalty so it will be less than a 100% match, often a 99% match, which will cost more to translate.</p>
<p>The best way to deal with this, and the hardest to implement, is for the original Japanese language authors to write with context, knowing that their documents will be the source language for translation. Ideally, there should be multiple versions of the Japanese sentence. For example:</p>
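<p>To make the penalty concrete, here is a minimal Python sketch of an exact-match lookup. It is not how SDL Trados or any other CAT tool is actually implemented: the <code>TranslationMemory</code> class, its method names, and the one-point penalty are all assumptions made for illustration.</p>
<pre>
```python
from collections import defaultdict

# Hypothetical penalty (in percentage points) applied when a source
# segment has more than one stored translation.
MULTIPLE_TRANSLATION_PENALTY = 1


class TranslationMemory:
    """Toy exact-match translation memory keyed by source segment."""

    def __init__(self):
        self._entries = defaultdict(list)

    def add(self, source, target):
        # Store each distinct target only once per source segment.
        if target not in self._entries[source]:
            self._entries[source].append(target)

    def lookup(self, source):
        """Return (score, targets) for an exact match, or (0, []) if none."""
        targets = self._entries.get(source, [])
        if not targets:
            return 0, []
        score = 100
        if len(targets) > 1:
            score -= MULTIPLE_TRANSLATION_PENALTY
        return score, list(targets)


tm = TranslationMemory()
tm.add("ねじを取り外す。", "Remove the screw.")
single = tm.lookup("ねじを取り外す。")    # only one translation stored

tm.add("ねじを取り外す。", "Remove the screws.")
multiple = tm.lookup("ねじを取り外す。")  # two translations, penalty applies

print(single)
print(multiple)
```
</pre>
<p>With one translation stored, the lookup scores 100. Once the plural translation is added, the same source segment scores 99 and returns both candidates, which is the situation described above.</p>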
<ul>
<li>JA: ねじ(１本)を取り外す。</li>
<li>EN: Remove the screw.</li>
</ul>
<ul>
<li>JA: ねじ(２本)を取り外す。</li>
<li>EN: Remove the two screws.</li>
</ul>
<ul>
<li>JA: ねじ(３本)を取り外す。</li>
<li>EN: Remove the three screws.</li>
</ul>
<p>If the source Japanese text is written with specific contextual information, this solves the problem and there will not be any ambiguous 100% hits in the translation memory. Unfortunately, original source texts are hardly ever written with translation in mind.</p>
<h3>Capital and Lowercase Letters</h3>
<p>Japanese similarly does not have an equivalent of capital and lowercase letters. A hiragana is a hiragana and a kanji character is a kanji character. In English, usually only the first word of a sentence is capitalized. This is called sentence capitalization. However, titles, headings, etc. have all the major words capitalized. This is called heading capitalization.</p>
<p>Japanese will have exactly the same sentence whether it is a title/heading sentence or a normal sentence in the text body. English will have two variations: one with heading capitalization and one with sentence capitalization. Similar to the ambiguity with plurals, we have ambiguity with capitalization. If we only have the heading capitalization style sentence in the translation memory, that will hit a 100% match when the same Japanese sentence appears in the text body, but the corresponding English 100% match will have the wrong capitalization.</p>
<p>Unlike the pluralization problem, there is no clear fix to avoid the capitalization problem. There is no simple and obvious way we could rewrite the original Japanese text to have multiple variations for heading and sentence style contexts. In these instances, it is important to verify all 100% matches in the translation memory for the proper context.</p>
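<p>Part of that verification could be automated. The Python sketch below flags a 100% match whose English capitalization style disagrees with where the segment is used. It is only a rough heuristic: the function names, the short list of minor words, and the example sentences are assumptions of mine, and it presumes you can tell from the source file whether a segment is a heading.</p>
<pre>
```python
# Small words that heading capitalization usually leaves lowercase
# (illustrative, not exhaustive).
MINOR_WORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to", "for", "with"}


def looks_like_heading_caps(text):
    """Heuristic: every major word after the first starts with a capital."""
    words = [
        w for w in text.rstrip(".").split()[1:]
        if w.lower() not in MINOR_WORDS
    ]
    return bool(words) and all(w[0].isupper() for w in words)


def needs_capitalization_review(tm_target, segment_is_heading):
    """True when the stored English style disagrees with the segment's use."""
    return looks_like_heading_caps(tm_target) != segment_is_heading


# Hypothetical TM targets for one and the same Japanese source sentence.
heading_style = "Removing the Battery Cover"
sentence_style = "Remove the battery cover."

print(needs_capitalization_review(heading_style, segment_is_heading=False))
print(needs_capitalization_review(sentence_style, segment_is_heading=False))
```
</pre>
<p>The first call reports that the heading-style translation needs review when it is hit from the text body; the second does not. Proper nouns and product names will produce false positives, so treat the result as a prompt to look at the segment, not as a verdict.</p>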
<h3>Sentences with No Subject or Object</h3>
<p>This is something very characteristic of Japanese: sentences with no subject. This is completely normal in Japanese but rare in English outside of commands. Sentences can also have transitive verbs with no object whatsoever. There is nothing wrong with sentences without subjects or objects in Japanese. The problem, however, comes when translating these sentences. It is difficult without proper context. Now, consider translating with a translation memory, and you can begin to understand the complexity of the situation.</p>
<p>Consider how you would translate this Japanese sentence:</p>
<p>終わったら、取り外す。</p>
<p>This sentence has no subject and no object. In context it is probably clear what the meaning is, but by itself it is all sorts of vague. Let&#8217;s imagine two completely different, but totally reasonable translations for this sentence:</p>
<ul>
<li>When it&#8217;s done, remove it.</li>
<li>When you are finished, take apart the pieces.</li>
</ul>
<p>Both of these are reasonable translations in two completely different contexts. However, if the first English translation is registered in the translation memory and comes up as a 100% match, it would be totally wrong if the context called for the second translation.</p>
<p>Context is everything when translating ambiguous Japanese sentences. But a translation memory does not preserve that context. Even if you know what came before and what comes afterwards, that still may not be enough to know the full context of the original Japanese meaning. Even though you are getting a 100% match in the translation memory, instead of being just a little wrong such as singular/plural or capitalization mistakes, it may be completely wrong in terms of meaning!</p>
<h3>Same Words Written Differently</h3>
<p>In English we have many words and expressions that have the same meaning, and therefore, we can say the same thing many different ways. But Japanese takes this a step further: you can write the same word many different ways!</p>
<p>For example, the word <em>screw</em> could be written: ねじ、ネジ、ﾈｼﾞ、螺子. That&#8217;s four different ways to write the same word.</p>
<p>As another example, the word <em>install</em> could be written: 取り付ける、取付ける、取りつける、とりつける. Again, that is four different ways to write the same word, and we didn&#8217;t even consider other forms such as です・ます調 or 敬語.</p>
<p>Now, take these two words and construct the same sentence, and look at the number of possibilities you have to say the same exact thing with the same exact words, only written differently.</p>
<ul>
<li>ねじを取り付ける。</li>
<li>ねじを取付ける。</li>
<li>ねじを取りつける。</li>
<li>ねじをとりつける。</li>
<li>ネジを取り付ける。</li>
<li>ネジを取付ける。</li>
<li>ネジを取りつける。</li>
<li>ネジをとりつける。</li>
<li>ﾈｼﾞを取り付ける。</li>
<li>ﾈｼﾞを取付ける。</li>
<li>ﾈｼﾞを取りつける。</li>
<li>ﾈｼﾞをとりつける。</li>
<li>螺子を取り付ける。</li>
<li>螺子を取付ける。</li>
<li>螺子を取りつける。</li>
<li>螺子をとりつける。</li>
</ul>
<p>That is 16 different possibilities! Now imagine your translation memory only has one of these variations registered in the database. When you come across the exact same sentence, only written differently, you will not get a 100% match, even though you have basically the exact same sentence in one form or another right in front of you. And this sentence is so short, you might not even get any match at all if the minimum match percentage is set high.</p>
<p>Controlling author variation and enforcing style guide conformance are very important in the original source language to prevent these kinds of problems. This is a big issue in itself that I&#8217;ll take up in another article.</p>
<p>Japanese is very different from English, and when translating, you have to pay closer attention to the textual context and other issues. This is even more true when working with a translation memory that has Japanese as the source language.</p>
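<p>One way to soften this problem is to normalize the source text before it is looked up in the translation memory. The Python sketch below is an illustration under stated assumptions, not a feature of any particular CAT tool: the variant tables and the <code>normalize_segment</code> function are hypothetical. It uses Unicode NFKC normalization from the standard <code>unicodedata</code> module to fold half-width katakana into full-width, then maps the known spellings of each word to a single canonical form.</p>
<pre>
```python
import unicodedata
from itertools import product

# Hypothetical variant tables; a real project would derive these from
# its style guide or terminology database.
SCREW_VARIANTS = ["ねじ", "ネジ", "ﾈｼﾞ", "螺子"]
INSTALL_VARIANTS = ["取り付ける", "取付ける", "取りつける", "とりつける"]

CANONICAL = {v: "ねじ" for v in SCREW_VARIANTS}
CANONICAL.update({v: "取り付ける" for v in INSTALL_VARIANTS})


def normalize_segment(text):
    """Fold width variants with NFKC, then map known spellings to one form."""
    text = unicodedata.normalize("NFKC", text)
    # Replace longer variants first so a short variant never clobbers
    # part of a longer one.
    for variant in sorted(CANONICAL, key=len, reverse=True):
        folded = unicodedata.normalize("NFKC", variant)
        text = text.replace(folded, CANONICAL[variant])
    return text


# Build the 16 sentences from the two word lists and normalize each one.
sentences = [f"{s}を{i}。" for s, i in product(SCREW_VARIANTS, INSTALL_VARIANTS)]
normalized = {normalize_segment(s) for s in sentences}

print(len(sentences), len(normalized))
```
</pre>
<p>All 16 spellings collapse to the single segment ねじを取り付ける。, so one stored translation can serve every variant. Plain substring replacement is crude and can rewrite unrelated words that happen to contain a variant, so a real implementation would want proper tokenization.</p>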
<p>Translation memory software such as SDL Trados is very useful, and can be used to great benefit even in Japanese. However, you must be aware of these kinds of issues and double check all of your translations, especially your 100% matches.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.localizingjapan.com/blog/2011/10/06/special-concerns-for-translating-japanese-using-translation-memory/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

