Archive for October, 2011

Japanese Input on Ubuntu Linux 11.10 Oneiric Ocelot

Sunday, October 16th, 2011

This tutorial shows you how to set up Japanese input (IME) on Ubuntu 11.10 from the Unity interface. The installation procedure is very similar to the one for Ubuntu 11.04, the previous Unity release. In fact, it is a little easier on 11.10. For Ubuntu 10.04 under Gnome, refer to this post.

Setup Procedure

To start, select Dash home from the Unity Launcher.

From the Dash home, select More Apps.

From the Installed menu area, select See more results.

Scroll down and select Language Support.

On the Language tab of the Language Support screen, press Install / Remove Languages…

On the Installed Languages screen, scroll down to Japanese and check Installed, and then press Apply Changes.

Enter your password on the Authenticate screen.

It will take a few moments to download and install the Japanese IME packages.

Back on the Language Support screen, select ibus for the Keyboard input method system, and then press Close.

Once again select Dash home from the Unity Launcher.

From the Dash home, select More Apps.

From the Installed menu area, select See more results.

Scroll down and select Keyboard Input Methods.

You may get a pop up message saying Keyboard Input Methods (IBus Daemon) has not been started. Do you want to start it now? Select Yes.

On the Input Method tab of the IBus Preferences screen, press Select an input method and select Japanese → Anthy.

Press Add and then press Close.

The IBus keyboard icon will now appear on the top panel.

Open up any application with a text box such as Tomboy Notes and place the cursor in the text box.

Press the IBus keyboard icon on the top panel and select Japanese-Anthy.

The IBus keyboard icon will now change to the Anthy Aち icon.

That’s it. You can now type in Japanese in Ubuntu 11.10. 簡単にできますね。

Special Concerns for Translating Japanese Using Translation Memory

Thursday, October 6th, 2011

Translation memory software, such as SDL Trados, greatly increases the speed and efficiency of a translator. However, there are special concerns that must be taken into account when translating with a translation memory where Japanese is the source language. Japanese has some linguistic characteristics that are significantly different from English, and when using a Japanese to English translation memory, you can run into trouble if you are not careful.

The biggest benefit a translation memory can bring you is a 100% match, which eliminates any translation work for that sentence. Best practices say you should always proofread your translations, even those that come from a 100% TM match, although this is hardly ever done. With Japanese, however, you must check your 100% matches because the translations may not be accurate, for the reasons discussed below.

Plurals

Japanese does not have different singular and plural forms of nouns the same way English does. There are specific instances where a plural-like form is used, but these are the exception rather than the norm. Let’s look at a simple example:

ねじを取り外す。

You could translate this sentence two different ways:

  • Remove the screw.
  • Remove the screws.

Which is correct? Well, that depends on how many screws there are. In Japanese, this one sentence covers both instances. Suppose your translation memory had only this translation pair in the database:

  • JA: ねじを取り外す。
  • EN: Remove the screw.

If the sentence you are currently translating matches the Japanese, but in this present context there are multiple screws, the matching 100% translation is not correct.

This shows why context is important, even more so in Japanese. And if you use software like SDL Trados or some other CAT tool that only provides you with an XLIFF file, you may not have the surrounding images and context to know whether there is one screw or many.

How can we remedy this for the next person who uses our translation memory? We can save a new translation for this sentence, so that our TM now looks like this:

  • JA: ねじを取り外す。
  • EN: Remove the screw.
  • EN: Remove the screws.

This is fine for the translator, who can cycle through the multiple translations and select the best one, assuming they know the context well enough to pick the right one. On the other hand, this is not ideal for the person paying for the translation. Generally only 100% matches are done for free or at a greatly reduced price. When duplicate translations exist for a single source segment, SDL Trados and other software will flag this with some sort of penalty so it will be less than a 100% match, often a 99% match, which will cost more to translate.
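The effect of a multiple-translation penalty can be sketched with a toy exact-match translation memory. This is only an illustration: the `TranslationMemory` class and the one-point `MULTIPLE_TRANSLATION_PENALTY` are hypothetical, and real CAT tools store and score segments in their own ways.

```python
from collections import defaultdict

# Hypothetical penalty (in percentage points) applied when a source segment
# has more than one stored translation. Real CAT tools use their own values.
MULTIPLE_TRANSLATION_PENALTY = 1


class TranslationMemory:
    """A toy exact-match translation memory keyed on the source segment."""

    def __init__(self):
        self._units = defaultdict(list)

    def add(self, source, target):
        # Store each distinct target only once per source segment.
        if target not in self._units[source]:
            self._units[source].append(target)

    def lookup(self, source):
        """Return (match_percent, targets) for an exact source match."""
        targets = self._units.get(source, [])
        if not targets:
            return 0, []
        score = 100
        if len(targets) > 1:
            score -= MULTIPLE_TRANSLATION_PENALTY
        return score, list(targets)


tm = TranslationMemory()
tm.add("ねじを取り外す。", "Remove the screw.")
single = tm.lookup("ねじを取り外す。")

tm.add("ねじを取り外す。", "Remove the screws.")
double = tm.lookup("ねじを取り外す。")

print(single)
print(double)
```

With one stored translation the lookup scores 100; once the second translation is saved, the same source segment drops below 100 and both candidates are returned for the translator to choose from.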

The best way to deal with this, and the hardest to implement, is for the original Japanese language authors to write with context, knowing that their documents will be the source language for translation. Ideally, there should be multiple versions of the Japanese sentence. For example:

  • JA: ねじ(1本)を取り外す。
  • EN: Remove the screw.
  • JA: ねじ(2本)を取り外す。
  • EN: Remove the two screws.
  • JA: ねじ(3本)を取り外す。
  • EN: Remove the three screws.

If the source Japanese text is written with specific contextual information, this solves the problem and there will not be any ambiguous 100% hits in the translation memory. Unfortunately, original source texts are hardly ever written with translation in mind.

Capital and Lowercase Letters

Japanese similarly does not have an equivalent of capital and lowercase letters. A hiragana is a hiragana and a kanji character is a kanji character. In English, usually only the first word of a sentence is capitalized. This is called sentence capitalization. However, titles, headings, etc. have all the major words capitalized. This is called heading capitalization.

Japanese will have the exact same sentence whether it is a title or heading or a normal sentence in the text body. English will have two variations: one with heading capitalization, and one with sentence capitalization. Similar to the ambiguity with plurals, we have ambiguity with capitalization. If we only have the heading capitalization style sentence in the translation memory, it will come up as a 100% match when the same Japanese sentence appears in the text body, but the corresponding English 100% match will have the wrong capitalization.

Unlike the pluralization problem, there is no clear fix to avoid the capitalization problem. There is no simple and obvious way we could rewrite the original Japanese text to have multiple variations for heading and sentence style contexts. In these instances, it is important to verify all 100% matches in the translation memory for the proper context.
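One way to make that verification less tedious is to flag 100% matches whose capitalization style does not fit the segment being translated. The sketch below is a rough heuristic under stated assumptions: the `MINOR_WORDS` list and both function names are made up for illustration, and proper nouns in body text will trigger false alarms.

```python
# Small words that heading capitalization usually leaves lowercase.
# This list is an illustrative assumption, not a complete style rule.
MINOR_WORDS = {"a", "an", "the", "and", "or", "but", "of", "in", "on", "to", "for", "with"}


def looks_like_heading_caps(text):
    """Guess whether an English segment uses heading capitalization."""
    words = [w for w in text.split() if w[0].isalpha()]
    # Skip the first word: it is capitalized in both styles.
    major = [w for w in words[1:] if w.lower() not in MINOR_WORDS]
    if not major:
        # One-word segments and the like cannot be classified.
        return False
    return all(w[0].isupper() for w in major)


def needs_review(tm_target, segment_is_heading):
    """Flag a 100% match whose capitalization does not fit its new context."""
    return looks_like_heading_caps(tm_target) != segment_is_heading


# A heading-style translation reused in the text body gets flagged.
print(needs_review("Removing the Front Cover", segment_is_heading=False))
# A sentence-style translation reused in the text body passes.
print(needs_review("Removing the front cover", segment_is_heading=False))
```

A check like this only narrows down which matches to look at first; it does not replace reading the 100% matches in context.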

Sentences with No Subject or Object

This is something very Japanese: sentences with no subject. This is completely normal in Japanese, and almost unheard of in English outside of commands. Transitive verbs can also appear with no object whatsoever. There is nothing wrong with sentences without subjects or objects in Japanese. The problem comes when translating these sentences, which is difficult without proper context. Now, consider translating with a translation memory, and you can begin to understand the complexity of the situation.

Consider how you would translate this Japanese sentence:

終わったら、取り外す。

This sentence has no subject and no object. In context it is probably clear what the meaning is, but by itself it is all sorts of vague. Let’s imagine two completely different, but totally reasonable translations for this sentence:

  • When it’s done, remove it.
  • When you are finished, take apart the pieces.

Both of these are reasonable translations in two completely different contexts. However, if the first English translation is registered in the translation memory and comes up as a 100% match, it will be totally wrong when the context calls for the second.

Context is everything when translating ambiguous Japanese sentences. But a translation memory does not preserve that context. Even if you know what came before and what comes afterwards, that still may not be enough to know the full context of the original Japanese meaning. Even though you are getting a 100% match in the translation memory, instead of being just a little wrong such as singular/plural or capitalization mistakes, it may be completely wrong in terms of meaning!

Same Words Written Differently

In English we have many words and expressions that have the same meaning, and therefore, we can say the same thing many different ways. But Japanese takes this a step further: you can write the same word many different ways!

For example, the word screw could be written: ねじ、ネジ、ﾈｼﾞ、螺子. That’s four different ways to write the same word.

Another example, the word install could be written: 取り付ける、取付ける、取りつける、とりつける. Again, that is four different ways to write the same word, and we didn’t even consider other forms such as です・ます調 or 敬語.

Now, take these two words and construct the same sentence, and look at the number of possibilities you have to say the same exact thing with the same exact words, only written differently.

  • ねじを取り付ける。
  • ねじを取付ける。
  • ねじを取りつける。
  • ねじをとりつける。
  • ネジを取り付ける。
  • ネジを取付ける。
  • ネジを取りつける。
  • ネジをとりつける。
  • ﾈｼﾞを取り付ける。
  • ﾈｼﾞを取付ける。
  • ﾈｼﾞを取りつける。
  • ﾈｼﾞをとりつける。
  • 螺子を取り付ける。
  • 螺子を取付ける。
  • 螺子を取りつける。
  • 螺子をとりつける。

That is 16 different possibilities! Now imagine your translation memory only has one of these variations registered in the database. When you come across the exact same sentence, but only written differently, you will not get a 100% match, even though you have basically the exact same sentence in one form or another right in front of you. And this sentence is so short, you might not even get any match at all if the hit percentage is set high.
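The count can be reproduced, and the mismatch reduced, with a short sketch. It assumes the four spellings of screw are hiragana, full-width katakana, half-width katakana, and kanji. The `CANONICAL` table and the `normalize` function are hypothetical; a real project would take its preferred spellings from a style guide.

```python
import unicodedata
from itertools import product

# Assumed spellings: hiragana, full-width katakana, half-width katakana, kanji.
SCREW = ["ねじ", "ネジ", "ﾈｼﾞ", "螺子"]
INSTALL = ["取り付ける", "取付ける", "取りつける", "とりつける"]

# Every combination of one noun spelling and one verb spelling.
variants = [f"{noun}を{verb}。" for noun, verb in product(SCREW, INSTALL)]
print(len(variants))  # 16

# Hypothetical table mapping known spellings to one preferred form.
CANONICAL = {
    "ネジ": "ねじ",
    "螺子": "ねじ",
    "取付ける": "取り付ける",
    "取りつける": "取り付ける",
    "とりつける": "取り付ける",
}


def normalize(segment):
    """Reduce spelling variants to one form before a TM lookup."""
    # NFKC folds half-width katakana into full-width katakana.
    text = unicodedata.normalize("NFKC", segment)
    for variant, preferred in CANONICAL.items():
        text = text.replace(variant, preferred)
    return text


normalized = {normalize(v) for v in variants}
print(normalized)
```

If both the stored segments and the incoming segment are normalized this way before comparison, all of the variants collapse to a single key. Unicode normalization handles only the half-width katakana; the other spellings need the hand-maintained table.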

Limiting variation between authors and conforming to a style guide are very important in the original source language to prevent these kinds of problems. This is a big issue in itself that I’ll take up in another article.

Japanese is very different from English, and when translating, you have to pay closer attention to textual context and the other issues described above. This matters even more when you work with a translation memory that has Japanese as the source language.

Translation memory software such as SDL Trados is very useful, and can be used to great benefit even in Japanese. However, you must be aware of these kinds of issues and double check all of your translations, especially your 100% matches.

Japanese Input on openSUSE Linux 11.4 (KDE 4.6)

Monday, October 3rd, 2011

Setting up Japanese input on openSUSE Linux is not difficult, but it requires knowing what to install and when to restart the system. It only takes a few minutes to download all the files and get it set up. Once installed and configured, you will be able to input Japanese characters and switch between Japanese and English whenever you want.

Prerequisites

  • YaST software repositories are configured properly.

Setup Procedure

Click on the Kickoff Application Launcher.

On the Computer tab, click Install/Remove Software.

On the Search tab, search for anthy.

In the search results window showing the matching packages, select the anthy and ibus-anthy packages.

Press the Accept button on the bottom right of the window.

YaST will now download, install, and configure the anthy packages.

Do the same for ibus. Open Install/Remove Software, search for ibus, and select the package for ibus. Press Accept to install.

Click on the Kickoff Application Launcher, and from the Leave tab, click Restart to restart openSUSE with the new configuration.

After restarting, log back in.

You will now have the IBus input method framework icon in the bottom panel.

Right click the IBus input method framework icon and click on Preferences.

On the Input Method tab, select Japanese → Anthy from the dropdown menu.

Press the Add button to add the Japanese Anthy input method, and then press Close.

Open up a text editor or any application with a text input window, and click on the IBus input method framework icon and select Japanese – Anthy.

You can now type in Japanese.

Click the Anthy crown icon to select between the various Japanese input modes.

That’s it. Setting up Japanese input on openSUSE 11.4 is not very difficult. When you try to type Japanese, make sure the cursor is in a text box in an application, or you may get an error saying No input window. Now enjoy your international Linux distribution.
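If the IBus icon does not appear after the restart, one thing worth checking is whether the session points applications at IBus. The sketch below looks at three environment variables that are commonly used for this. Whether openSUSE 11.4 sets all three, and to exactly these values, is an assumption here, and `check_input_method_env` is a hypothetical helper name.

```python
import os

# Variables commonly used to point GTK, Qt, and X11 applications at an
# input method. The expected values are an assumption for an IBus setup.
EXPECTED = {
    "GTK_IM_MODULE": "ibus",
    "QT_IM_MODULE": "ibus",
    "XMODIFIERS": "@im=ibus",
}


def check_input_method_env(env):
    """Return the variables whose value differs from the expected one."""
    return {
        name: env.get(name)
        for name, expected in EXPECTED.items()
        if env.get(name) != expected
    }


if __name__ == "__main__":
    problems = check_input_method_env(os.environ)
    if not problems:
        print("Input method variables point at IBus.")
    for name, value in problems.items():
        print(f"{name} is {value!r}, expected {EXPECTED[name]!r}")
```

Run it from a terminal inside the desktop session, since the variables are set when you log in. A mismatch is only a hint: some setups work with different values.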