
If you are targeting users in more than one country, chances are you have already heard about rel-alternate-hreflang. If you haven't: in short, this annotation enables Google and other search engines to serve the correct language or regional version of a page to searchers, which can lead to increased user satisfaction.

Making sure that deployed annotations are usable by search engines can be rather difficult, especially on sites with many pages, and site owners around the world haven’t been shy about telling us so. Today we're releasing a feature that should make debugging rel-alternate-hreflang annotations much easier.

The Language Targeting section in the International Targeting feature enables you to identify two of the most common issues with hreflang annotations:
  • Missing return links: annotations must be confirmed from the pages they point to. If page A links to page B, page B must link back to page A; otherwise, the annotations may not be interpreted correctly.
    For each error of this kind, we report where and when we detected it, as well as where the return link is expected to be.

  • Incorrect hreflang values: The value of the hreflang attribute must either be a language code in ISO 639-1 format such as "es", or a combination of language and country code such as "es-AR", where the country code is in ISO 3166-1 Alpha 2 format.
    If our indexing systems detect language or country codes that are not in these formats, we provide example URLs to help you fix them.

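If you want to catch malformed values before Webmaster Tools reports them, a rough pre-check is easy to script. The Python sketch below tests only the *shape* of each value: two letters, optionally followed by a hyphen and two more letters, or the special x-default value. It does not confirm that a code is actually assigned in ISO 639-1 or ISO 3166-1 Alpha 2, which would need the official code lists. It is also not how Google validates annotations. The function names are illustrative.

```python
import re

# Shape check only: a two-letter language code, optionally followed by "-" and a
# two-letter region code. It does NOT confirm that the codes are actually assigned
# in ISO 639-1 or ISO 3166-1 Alpha 2; that would require the official code lists.
_HREFLANG_SHAPE = re.compile(r"[a-z]{2}(-[a-z]{2})?", re.IGNORECASE)


def is_valid_hreflang(value: str) -> bool:
    """Return True for values shaped like "es" or "es-AR", or for "x-default"."""
    if value.lower() == "x-default":
        return True
    return _HREFLANG_SHAPE.fullmatch(value) is not None


def find_invalid(values):
    """List the values that fail the shape check, in their original order."""
    return [v for v in values if not is_valid_hreflang(v)]


if __name__ == "__main__":
    print(find_invalid(["es", "es-AR", "es_ES", "eng", "x-default", "en-"]))
```

An underscore separator ("es_ES") and a three-letter code ("eng") are typical mistakes this catches. A well-shaped but unassigned code such as "en-UK" (the ISO code for the United Kingdom is GB) would still pass.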

Additionally, we've moved the geographic targeting setting to this part of Webmaster Tools, so that you can find all information relevant to international and multilingual targeting in the same place.

We hope you'll find this new feature useful and that it helps you identify issues with the rel-alternate-hreflang implementation on your site. If you have comments or questions about the feature, please post in our Webmaster Help Forum.

Posted by Gary Illyes, Webmaster Trends


If you are doing business in more than one country or targeting different languages, we recommend having separate sites or sections, with specific content on separate URLs targeted at individual countries or languages: for instance, one page for US and English-speaking visitors, and a different page for France and French-speaking users. While we have published information on handling multi-regional and multilingual sites, the homepage can be a bit special. This post will help you create the right homepage for your website, serving the appropriate content to users depending on their language and location.

There are three ways to configure your homepage / landing page when your users access it:
  • Show everyone the same content.
  • Let users choose.
  • Serve content depending on users’ location and language.
Let’s have a look at each in detail.
Show users worldwide the same content 
In this scenario, you decide to serve content specific to one given country and language on your homepage / generic URL (http://www.example.com). This content will be available to anyone who accesses that URL directly in their browser or who searches for that URL specifically. As mentioned above, all country and language versions should also be accessible on their own unique URLs.


Note: You can show a banner on your page to suggest a more appropriate version to users from other locations or with different language settings.
Let users choose which local version and which language they want 
In this configuration, you decide to serve a country selector page on your homepage / generic URL and to let users choose which content they want to see depending on country and language. All users who type in that URL can access the same page.

If you implement this scenario on your international site, remember to use the x-default rel-alternate-hreflang annotation for the country selector page, which was specifically created for these kinds of pages. The x-default value helps us recognize pages that are not specific to one language or region.

Automatically redirect users or dynamically serve the appropriate HTML content depending on users’ location and language settings
A third scenario would be to automatically serve the appropriate HTML content to your users depending on their location and language settings. You can do that either by using server-side 302 redirects or by dynamically serving the right HTML content.

Remember to use the x-default rel-alternate-hreflang annotation on the homepage / generic page, even if the latter is a redirect page that users cannot access directly.

Note: Think about how to redirect users for whom you do not have a specific version, for instance French-speaking users on a website that has English, Spanish and Chinese versions. Show them the content that you consider most appropriate.
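A server-side version of the redirect scenario might look like the Python WSGI sketch below. It makes several assumptions. The English, Spanish and Chinese URLs are hypothetical. Only the Accept-Language header is considered; location lookup by IP address is left out. Everyone without a matching version goes to a fallback URL you choose. The Vary: Accept-Language header tells caches that the response depends on that request header.

```python
# Hypothetical versions, mirroring the note above: English, Spanish and Chinese only.
VERSIONS = {
    "en": "http://www.example.com/en/",
    "es": "http://www.example.com/es/",
    "zh": "http://www.example.com/zh/",
}
# Assumption: the version you consider most appropriate for everyone else.
FALLBACK_URL = "http://www.example.com/en/"


def parse_accept_language(header: str):
    """Return the language tags from an Accept-Language header, best (highest q) first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            ranked.append((-quality, position, tag.strip().lower()))
    ranked.sort()  # highest quality first; ties keep header order
    return [tag for _, _, tag in ranked]


def choose_redirect(header: str) -> str:
    """Pick the first available version matching the user's primary language subtags."""
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]
        if primary in VERSIONS:
            return VERSIONS[primary]
    return FALLBACK_URL


def app(environ, start_response):
    """Minimal WSGI app for the generic homepage: always answers with a 302."""
    target = choose_redirect(environ.get("HTTP_ACCEPT_LANGUAGE", ""))
    start_response("302 Found", [("Location", target), ("Vary", "Accept-Language")])
    return [b""]
```

A browser sending "fr-FR,fr;q=0.9,es;q=0.8" would be redirected to the Spanish version. One sending only French would land on the fallback, matching the French-speaker case in the note.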

Whatever configuration you decide to go with, you should make sure all the pages – including country and language selector pages:
  • Have rel-alternate-hreflang annotations.
  • Are accessible for Googlebot's crawling and indexing: do not block the crawling or indexing of your localized pages.
  • Always allow users to switch local version or language: you can do that using a drop-down menu, for instance.
Reminder: As mentioned in the beginning, remember that you must have separate URLs for each country and language version. 
About rel-alternate-hreflang annotations
Remember to annotate all your pages - whatever method you choose. This will greatly help search engines to show the right results to your users.

Country selector pages and redirecting or dynamically serving homepages should all use the x-default hreflang, which was specifically designed for auto-redirecting homepages and country selectors. 

Finally, here are a few useful reminders about rel-alternate-hreflang annotations in general:
  • Your annotations must be confirmed from the other pages. If page A links to page B, page B must link back to page A, otherwise, your annotations may not be interpreted correctly.
  • Your annotations should be self-referential. Page A should use rel-alternate-hreflang annotation linking to itself.
  • You can specify the rel-alternate-hreflang annotations in the HTTP header, in the head section of the HTML, or in a sitemap file. We strongly recommend that you choose only one way to implement the annotations, in order to avoid inconsistent signals and errors.
  • The value of the hreflang attribute must be in ISO 639-1 format for the language, and in ISO 3166-1 Alpha 2 format for the region. Specifying only the region is not supported. If you wish to configure your site only for a country, use the geotargeting feature in Webmaster Tools.
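The first two reminders (return links and self-references) can be checked mechanically once you have collected each page's annotations. The Python sketch below assumes a hypothetical data structure: a dict mapping each page URL to the hreflang annotations found on it, as {hreflang value: annotated URL}. How you crawl or parse the pages is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical crawl result: page URL -> {hreflang value: annotated URL}.
Annotations = Dict[str, Dict[str, str]]


def missing_return_links(pages: Annotations) -> List[Tuple[str, str]]:
    """(source, target) pairs where target does not annotate source in return."""
    problems = []
    for source, links in pages.items():
        for target in links.values():
            if target == source:
                continue  # self-references are checked separately
            # A target we never crawled counts as "no return link".
            if source not in pages.get(target, {}).values():
                problems.append((source, target))
    return sorted(problems)


def missing_self_references(pages: Annotations) -> List[str]:
    """Pages whose annotations do not include a link to the page itself."""
    return sorted(url for url, links in pages.items() if url not in links.values())
```

For example, if the English page annotates German and French versions, the German page annotates only itself, and the French page annotates only the English one, the first function reports the English-to-German pair and the second reports the French page.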
Following these recommendations will help us better understand your localized content and serve more relevant results to your users in our search results. As always, if you have any questions or feedback, please tell us in the internationalization Webmaster Help Forum.

Note from the editors: After previously looking into various ways to handle internationalization for Google’s web search, here’s a post from Google Web Studio team members with tips for web developers.

Many websites exist in more than one language, and more and more websites are made available for more than one language. Yet, building a website for more than one language doesn’t simply mean translation, or localization (L10N), and that’s it. It requires a few more things, all of which are related to internationalization (I18N). In this post we share a few tips for international websites.

1. Make pages I18N-ready in the markup, not the style sheets

Language and directionality are inherent to the contents of the document. Where possible, you should therefore use markup, not style sheets, for internationalization purposes. Use @lang and @dir, at least on the html element:

<html lang="ar" dir="rtl">

Avoid coming up with your own solutions like special classes or IDs.
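If your pages come from a server-side template, the lang and dir values can be set together so they never disagree. The Python helper below is a hypothetical illustration. Its set of right-to-left languages is an assumption and deliberately short. Some languages are written in more than one script, so a per-locale setting may be more reliable than deriving dir from the language code alone.

```python
from html import escape

# Assumption: a short, non-exhaustive list of primary language subtags that are
# normally written right-to-left. Extend it for the languages your site serves.
RTL_LANGUAGES = {"ar", "he", "fa", "ur", "yi"}


def html_open_tag(lang: str) -> str:
    """Build an <html> start tag carrying both lang and dir, as tip #1 recommends."""
    primary = lang.split("-")[0].lower()
    direction = "rtl" if primary in RTL_LANGUAGES else "ltr"
    return f'<html lang="{escape(lang)}" dir="{direction}">'
```

Calling html_open_tag("ar") yields the same start tag as the example above.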

As for I18N in style sheets, you can’t always rely on CSS: The CSS spec defines that conforming user agents may ignore properties like direction or unicode-bidi. (For XML, the situation changes again. XML doesn’t offer special internationalization markup, so here it’s advisable to use CSS.)

2. Use one style sheet for all locales

Instead of creating separate style sheets for LTR and RTL directionality, or even each language, bundle everything in one style sheet. That makes your internationalization rules much easier to understand and maintain.

So instead of embedding an alternative style sheet like

<link href="default.rtl.css" rel="stylesheet">

just use your existing

<link href="default.css" rel="stylesheet">

When taking this approach, you’ll need to complement existing CSS rules with their international counterparts:

3. Use the [dir='rtl'] attribute selector

Since we recommend sticking with the style sheet you have (tip #2), you need a different way of selecting the elements that must be styled differently for the other directionality. As RTL content requires specific markup (tip #1), this should be easy: in most modern browsers, you can simply use [dir='rtl'].

Here’s an example:

aside {
 float: right;
 margin: 0 0 1em 1em;
}

[dir='rtl'] aside {
 float: left;
 margin: 0 1em 1em 0; 
}

4. Use the :lang() pseudo class

To target documents of a particular language, use the :lang() pseudo class. (Note that we’re talking documents here, not text snippets, as targeting snippets of a particular language makes things a little more complex.)

For example, if you discover that bold formatting doesn’t work very well for Chinese documents (which indeed it does not), use the following:

:lang(zh) strong,
:lang(zh) b {
 font-weight: normal;
 color: #900;
}
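The :lang() pseudo class uses a prefix match on hyphen boundaries, so the :lang(zh) rule above also applies to documents marked zh-TW or zh-CN. The Python function below approximates the CSS 2.1 rule for two language tags. It is an illustration, not browser code. It ignores how a browser determines an element's language in the first place (lang attribute inheritance, meta elements, HTTP headers).

```python
def lang_matches(element_lang: str, selector_lang: str) -> bool:
    """Approximate the CSS 2.1 :lang(C) rule: the element's language equals C
    or starts with C followed by "-", compared case-insensitively."""
    element_lang = element_lang.lower()
    selector_lang = selector_lang.lower()
    return element_lang == selector_lang or element_lang.startswith(selector_lang + "-")
```

Note the direction of the match. :lang(zh) covers zh-TW, but :lang(zh-TW) does not cover a document marked only zh.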

5. Mirror left- and right-related values

When working with both LTR and RTL content, it’s important to mirror all the values that depend on directionality. Among the properties to watch out for are everything related to borders, margins, and paddings, but also position-related properties, float, and text-align.

For example, what’s text-align: left in LTR needs to be text-align: right in RTL.

There are tools that make it easy to “flip” directionality. One of them is CSSJanus, though it was written for the “separate style sheet” approach, not the “same style sheet” one.
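To illustrate what such tools do, here is a toy flipper in Python. It is not CSSJanus and is far less complete. It swaps the words left and right in property names and values, and reorders four-value margin, padding and border shorthands from "top right bottom left" to "top left bottom right". It does not handle background-position percentages, shadow offsets and similar values, which tip #6 flags for manual review. The function names are hypothetical.

```python
import re

# Shorthand properties whose four-value form is "top right bottom left".
FOUR_SIDE_SHORTHANDS = {"margin", "padding", "border-width", "border-style", "border-color"}


def _swap_sides(text: str) -> str:
    """Swap the words left and right, including inside names such as margin-left."""
    return re.sub(r"\b(left|right)\b",
                  lambda m: "right" if m.group(1) == "left" else "left",
                  text)


def mirror_declaration(prop: str, value: str):
    """Return the (property, value) pair mirrored for the opposite directionality."""
    prop, value = prop.strip(), value.strip()
    parts = value.split()
    if prop in FOUR_SIDE_SHORTHANDS and len(parts) == 4:
        top, right, bottom, left = parts
        return prop, " ".join([top, left, bottom, right])
    return _swap_sides(prop), _swap_sides(value)


def mirror_block(declarations: str) -> str:
    """Mirror a semicolon-separated list of declarations such as "float: right; ..."."""
    mirrored = []
    for declaration in declarations.split(";"):
        if ":" not in declaration:
            continue  # skip the empty piece after the final semicolon
        prop, value = declaration.split(":", 1)
        new_prop, new_value = mirror_declaration(prop, value)
        mirrored.append(f"{new_prop}: {new_value};")
    return " ".join(mirrored)
```

Applied to the declarations of the aside rule from tip #3, mirror_block("float: right; margin: 0 0 1em 1em;") returns the values used in the [dir='rtl'] override: "float: left; margin: 0 1em 1em 0;".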

6. Keep an eye on the details

Watch out for the following items:
  • Images designed for left or right, like arrows or backgrounds, light sources in box-shadow and text-shadow values, and JavaScript positioning and animations: these may need to be swapped or adjusted for the opposite directionality.
  • Font sizes and fonts, especially for non-Latin alphabets: Depending on the script and font, the default font size may be too small. Consider tweaking the size and, if necessary, the font.
  • CSS specificity: When using the [dir='rtl'] (or [dir='ltr']) hook (tip #3), you’re using a selector of higher specificity. This can lead to issues, so keep an eye out and adjust accordingly.

If you have any questions or feedback, check the Internationalization Webmaster Help Forum, or leave your comments here.

Webmaster Level: All

The homepages of multinational and multilingual websites are sometimes configured to point visitors to localized pages, either via redirects or by changing the content to reflect the user’s language. Today we’ll introduce a new rel-alternate-hreflang annotation, supported by both Google and Yandex, that webmasters can use to specify such homepages.

To see this in action, let’s look at an example. The website example.com has content that targets users around the world as follows:

Map of the world illustrating which hreflang code to use for which locale

In this case, the webmaster can annotate this cluster of pages with rel-alternate-hreflang, either in Sitemaps or with HTML link tags like this:


<link rel="alternate" href="http://example.com/en-gb" hreflang="en-gb" />
<link rel="alternate" href="http://example.com/en-us" hreflang="en-us" />
<link rel="alternate" href="http://example.com/en-au" hreflang="en-au" />
<link rel="alternate" href="http://example.com/" hreflang="x-default" />

The new x-default hreflang attribute value signals to our algorithms that this page doesn’t target any specific language or locale and is the default page when no other page is better suited. For example, it would be the page our algorithms try to show French-speaking searchers worldwide or English-speaking searchers on google.ca.

The same annotation applies for homepages that dynamically alter their contents based on a user’s perceived geolocation or the Accept-Language headers. The x-default hreflang value signals to our algorithms that such a page doesn’t target a specific language or locale.
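If you generate these tags from a template, a small helper keeps every page in the cluster consistent. The Python sketch below is an assumption on our part, not part of any Google tooling: the function name and the dict-based input are illustrative. It renders one link element per version plus the x-default entry. Called with the example.com cluster above, it produces exactly the four tags shown.

```python
from html import escape
from typing import Dict, Optional


def hreflang_links(cluster: Dict[str, str], default_url: Optional[str] = None) -> str:
    """Render one <link> per language/region version, plus an optional x-default."""
    entries = list(cluster.items())  # dicts keep insertion order
    if default_url is not None:
        entries.append(("x-default", default_url))
    return "\n".join(
        f'<link rel="alternate" href="{escape(url)}" hreflang="{escape(code)}" />'
        for code, url in entries
    )
```

Because every page in the cluster carries the same set of tags, the same output can be included on each of them. This also satisfies the return-link and self-reference requirements.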

As always, if you have any questions or feedback, please tell us in the Internationalization Webmaster Help Forum.


Webmaster level: All
(Cross-posted on the Google Translate Blog)

Since we first launched the Website Translator plugin back in September 2009, more than a million websites have added the plugin. While we’ve kept improving our machine translation system since then, we may not reach perfection until someone invents full-blown Artificial Intelligence. In other words, you’ll still sometimes run into translations we didn’t get quite right.

So today, we’re launching a new experimental feature (in beta) that lets you customize and improve the way the Website Translator translates your site. Once you add the customization meta tag to a webpage, visitors will see your customized translations whenever they translate the page, even when they use the translation feature in Chrome and Google Toolbar. They’ll also now be able to ‘suggest a better translation’ when they notice a translation that’s not quite right, and later you can accept and use that suggestion on your site.

To get started:
  1. Add the Website Translator plugin and customization meta tag to your website
  2. Then translate a page into one of 60+ languages using the Website Translator
To tweak a translation:
  1. Hover over a translated sentence to display the original text
  2. Click on ‘Contribute a better translation’
  3. And finally, click on a phrase to choose an automatic alternative translation -- or just double-click to edit the translation directly.
For example, if you’re translating your site into Spanish, and you want to translate Cat not to gato but to Cat, you can tweak it as follows:


If you’re signed in, the corrections made on your site will go live right away -- the next time a visitor translates a page on your website, they’ll see your correction. If one of your visitors contributes a better translation, the suggestion will wait until you approve it. You can also invite other editors to make corrections and add translation glossary entries. You can learn more about these new features in the Help Center.

This new experimental feature is currently free of charge. We hope this feature, along with Translator Toolkit and the Translate API, can provide a low cost way to expand your reach globally and help to break down language barriers.

Webmaster level: All
In December 2011 we announced annotations for sites that target users in many languages and, optionally, countries. These annotations define a cluster of equivalent pages that target users around the world, and were implemented using rel-alternate-hreflang link elements in the HTML of each page in the cluster.
Based on webmaster feedback and other considerations, today we’re adding support for specifying the rel-alternate-hreflang annotations in Sitemaps. Using Sitemaps instead of HTML link elements offers many advantages including smaller page size and easier deployment for some websites.
To see how this works, let's take a simple example: we wish to specify that for the URL http://www.example.com/en, targeting English-language users, the equivalent URL targeting German-language speakers is http://www.example.com/de. Until now, the only way to add such an annotation was to use a link element, either as an HTTP header or as HTML elements on both URLs, like this:
<link rel="alternate" hreflang="en" href="http://www.example.com/en" >
<link rel="alternate" hreflang="de" href="http://www.example.com/de" >
As of today, you can alternately use the following equivalent markup in Sitemaps:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>http://www.example.com/en</loc>
    <xhtml:link
      rel="alternate"
      hreflang="de"
      href="http://www.example.com/de" />
    <xhtml:link
      rel="alternate"
      hreflang="en"
      href="http://www.example.com/en" />
  </url>
  <url>
    <loc>http://www.example.com/de</loc>
    <xhtml:link
      rel="alternate"
      hreflang="de"
      href="http://www.example.com/de" />
    <xhtml:link
      rel="alternate"
      hreflang="en"
      href="http://www.example.com/en" />
  </url>
</urlset>
Briefly, the new xhtml:link Sitemaps tags function in the same way as the HTML link tags, with both using the same attributes. The full technical details of how the annotations are implemented in Sitemaps, including how to implement the xhtml namespace for the link tag, are in our new Help Center article.
A more detailed example can be found in our new Help Center article, and if you need more help, please ask in our brand new internationalization help forum.
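For larger clusters, writing these entries by hand gets tedious. The Python sketch below builds the same structure with the standard library's xml.etree.ElementTree. It assumes a hypothetical input: a dict mapping hreflang values to URLs. Each url entry lists every alternate, itself included, and the xhtml namespace is declared on the urlset element.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SITEMAP_NS)       # sitemap elements without a prefix
ET.register_namespace("xhtml", XHTML_NS)    # link elements as xhtml:link


def build_sitemap(cluster: dict) -> str:
    """cluster maps hreflang value -> URL; every URL lists every alternate, itself included."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in cluster.values():
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
        for hreflang, href in cluster.items():
            ET.SubElement(url_el, f"{{{XHTML_NS}}}link",
                          {"rel": "alternate", "hreflang": hreflang, "href": href})
    return ET.tostring(urlset, encoding="unicode")
```

Calling build_sitemap({"en": "http://www.example.com/en", "de": "http://www.example.com/de"}) yields two url entries, each with two xhtml:link children, equivalent to the markup above apart from attribute order and whitespace.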

Many websites serve users from around the world. There are different approaches to serving content appropriate to your users' language and/or region. Last year, we launched support for explicit annotations for web pages rendering the same content with different language templates.
Today we're going further with our support for multilingual content with improved handling for these two scenarios:
  • Multiregional websites using substantially the same content. Example: English webpages for Australia, Canada and USA, differing only in price
  • Multiregional websites using fully translated content, or substantially different monolingual content targeting different regions. Example: a product webpage in German, English and French

Specifying language and location

We've expanded our support of the rel="alternate" hreflang link element to handle content that is translated or provided for multiple geographic regions. The hreflang attribute can specify the language, optionally the country, and URLs of equivalent content. By specifying these alternate URLs, our goal is to be able to consolidate signals for these pages, and to serve the appropriate URL to users in search. Alternative URLs can be on the same site or on another domain.

Annotating pages as substantially similar content

Optionally, for pages that have substantially the same content in the same language and are targeted at multiple countries, you may use the rel="canonical" link element to specify your preferred version. We’ll use that signal to focus on that version in search, while showing the local URLs to users where appropriate. For example, you could use this if you have the same product page in German, but want to target it separately to users searching on the Google properties for Germany, Austria, and Switzerland.
Update: to simplify implementation, we no longer recommend using rel=canonical.

Example usage

To explain how it works, let’s look at some example URLs:
  • http://www.example.com/ - contains the general homepage of a website, in Spanish
  • http://es-es.example.com/ - is the version for users in Spain, in Spanish
  • http://es-mx.example.com/ - is the version for users in Mexico, in Spanish
  • http://en.example.com/ - is the generic English language version
On all of these pages, we could use the following markup to specify language and optionally the region:

<link rel="alternate" hreflang="es" href="http://www.example.com/" />
<link rel="alternate" hreflang="es-ES" href="http://es-es.example.com/" />
<link rel="alternate" hreflang="es-MX" href="http://es-mx.example.com/" /> 
<link rel="alternate" hreflang="en" href="http://en.example.com/" />

If you specify a regional subtag, we’ll assume that you want to target that region.
Keep in mind that all of these annotations are to be used on a per-URL basis. You should take care to use the specific URL, not the homepage, in each of these link elements.

More help

As always, if you need more help correctly implementing multiregional and multilingual websites, please see our Help Center article about this topic, or ask in our Webmaster Help Forum.

Webmaster level: Intermediate

So you’re going global, and you need your website to follow. Should be a simple case of getting the text translated and you’re good to go, right? Probably not. The Google Webmaster Team frequently builds sites that are localized into over 40 languages, so here are some things that we take into account when launching our pages in other languages and regions.

(Even if you think you might be immune to these issues because you only offer content in English, it could be that non-English language visitors are using tools like Google Translate to view your content in their language. This traffic should show up in your analytics dashboard, so you can get an idea of how many visitors are not viewing your site in the way it’s intended.)
More languages != more HTML templates
We can’t recommend this enough: reuse the same template for all language versions, and always try to keep the HTML of your template simple.

Keeping the HTML code the same for all languages has its advantages when it comes to maintenance. Hacking around with the HTML code for each language to fix bugs doesn’t scale–keep your page code as clean as possible and deal with any styling issues in the CSS. To name just one benefit of clean code: most translation tools will parse out the translatable content strings from the HTML document and that job is made much easier when the HTML is well-structured and valid.
How long is a piece of string?
If your design relies on text playing nicely with fixed-size elements, then translating your text might wreak havoc. For example, your left-hand side navigation text is likely to translate into much longer strings of text in several languages–check out the difference in string lengths between some English and Dutch language navigation for the same content. Be prepared for navigation titles that might wrap onto more than one line by figuring out your line height to accommodate this (also worth considering when you create your navigation text in English in the first place).

Variable word lengths cause particular issues in form labels and controls. If your form layout displays labels on the left and fields on the right, for example, longer text strings can flow over into two lines, whereas shorter text strings do not seem associated with their form input fields–both scenarios ruin the design and impede the readability of the form. Also consider the extra styling you’ll need for right-to-left (RTL) layouts (more on that later). For these reasons we design forms with labels above fields, for easy readability and styling that will translate well across languages.

Screenshots of Chinese and German versions of web forms


Also avoid fixed-height columns–if you’re attempting to neaten up your layout with box backgrounds that match in height, chances are when your text is translated, the text will overrun areas that were only tall enough to contain your English content. Think about whether the UI elements you’re planning to use in your design will work when there is more or less text–for instance, horizontal vs. vertical tabs.
On the flip side
Source editing for bidirectional HTML can be problematic because many editors have not been built to support the Unicode bidirectional algorithm (more research on the problems and solutions). In short, the way your markup is displayed might get garbled:

<p>ابةتث <img src="foo.jpg" alt=" جحخد"< ذرزسش!</p>

Our own day-to-day usage has shown the following editors to currently provide decent solutions for bidirectional editing: particularly Coda, and also Dreamweaver, IntelliJ IDEA and JEditX.

When designing for RTL languages, you can build most of the support you need into the core CSS and use the dir attribute of the html element (for backwards compatibility) in combination with a class on the body element. As always, keeping all styles in one core stylesheet makes for better maintainability.

Some key styling issues to watch out for: any elements floated right will need to be floated left and vice versa; extra padding or margin widths applied to one side of an element will need to be overridden and switched; and any text-align values should be reversed.

We generally use the following approach, including using a class on the body tag rather than an html[dir=rtl] CSS selector, because this is compatible with older browsers:

Elements:

<body class="rtl">
<h1><a href="http://www.blogger.com/"><img alt="Google" src="http://www.google.com/images/logos/google_logo.png" /></a> Heading</h1>

Left-to-right (default) styling:

h1 {
  height: 55px;
  line-height: 2.05;
  margin: 0 0 25px;
  overflow: hidden;
}
h1 img {
  float: left;
  margin: 0 43px 0 0;
  position: relative;
}

Right-to-left styling:

body.rtl {
  direction: rtl;
}
body.rtl h1 img {
  float: right;
  margin: 0 0 0 43px;
}

(See this in action in English and Arabic.)

One final note on this subject: most of the time your content destined for right-to-left language pages will be bidirectional rather than purely RTL, because some strings will probably need to retain their LTR direction–for example, company names in Latin script or telephone numbers. The way to make sure the browser handles this correctly in a primarily RTL document is to wrap the embedded text strings with an inline element using an attribute to set direction, like this:

<h2>‫עוד ב- <span dir="ltr">Google</span>‬</h2>

In cases where you don’t have an HTML container to hook the dir attribute into, such as title elements or JavaScript-generated source code for message prompts, you can get the same effect with Unicode control characters: wrap the text in &#x202B; (RIGHT-TO-LEFT EMBEDDING), which opens a right-to-left embedding, and &#x202C; (POP DIRECTIONAL FORMATTING), which closes it:

<title>&#x202B;‫הפוך את Google לדף הבית שלך‬&#x202C;</title>

Example usage in JavaScript code:
var ffError = '\u202B' +'כדי להגדיר את Google כדף הבית שלך ב\x2DFirefox, לחץ על הקישור \x22הפוך את Google לדף הבית שלי\x22, וגרור אותו אל סמל ה\x22בית\x22 בדפדפן שלך.'+ '\u202C';
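The same wrapping works in any language that builds strings for the page. The Python sketch below uses a hypothetical helper, rtl_embed. It shows the two code points and uses unicodedata to confirm that they are the embedding and pop characters described above.

```python
import unicodedata

RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING: starts the right-to-left run
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING: ends it


def rtl_embed(text: str) -> str:
    """Wrap text in an explicit right-to-left embedding (hypothetical helper)."""
    return RLE + text + PDF


def format_characters(text: str):
    """List the names of the invisible formatting (category Cf) characters in text."""
    return [unicodedata.name(ch) for ch in text if unicodedata.category(ch) == "Cf"]
```

Every string wrapped this way must keep its closing PDF. An unclosed embedding can affect the direction of text that follows it.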

(For more detail, see the W3C’s articles on creating HTML for Arabic, Hebrew and other right-to-left scripts and authoring right-to-left scripts.)
It’s all Greek to me…
If you’ve never worked with non-Latin character sets before (Cyrillic, Greek, and a myriad of Asian and Indic scripts), you might find that your editor and browser do not display content as intended.

Check that your editor and browser encodings are set to UTF-8 (recommended), and consider adding a meta element that declares the character encoding, plus the lang attribute of the html element, to your HTML template so browsers know what to expect when rendering your page. This has the added benefit of ensuring that all Unicode characters are displayed correctly, so HTML entities such as &eacute; (é) become unnecessary, saving valuable bytes! Check the W3C’s tutorial on character encoding if you’re having trouble; it contains in-depth explanations of the issues.
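The byte savings are easy to check. In UTF-8, é takes 2 bytes, while the entity &eacute; takes 8. The Python sketch below computes the difference for any named entity with the standard library's html.unescape. The helper name is illustrative.

```python
import html


def utf8_savings(entity: str) -> int:
    """Bytes saved by storing the literal character instead of the named entity (UTF-8)."""
    character = html.unescape(entity)
    return len(entity.encode("utf-8")) - len(character.encode("utf-8"))


if __name__ == "__main__":
    for entity in ["&eacute;", "&uuml;", "&euro;"]:
        print(entity, html.unescape(entity), utf8_savings(entity))
```

The savings shrink for characters outside Latin-1. For example, € takes 3 bytes in UTF-8 versus 6 for &euro;.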
A word on naming
Lastly, a practical tip on naming conventions when creating several language versions. Using a standard such as the ISO 639-1 language codes for naming helps when you start to deal with several language versions of the same document.

Using a conventional standard will help users understand your site’s structure and make it more maintainable for all webmasters who might develop the site, and using the language codes for other site assets (logo images, PDF documents) makes it easy to identify files quickly.

See previous Webmaster Central posts for advice about URL structures and other issues surrounding working with multi-regional websites and working with multilingual websites.

That’s a summary of the main challenges we wrestle with on a daily basis; but we can vouch for the fact that putting in the planning and work up front towards well-structured HTML and robust CSS pays dividends during localization!

Webmaster Level: Advanced

Warning: This specific configuration of rel-alternate-hreflang markup has been replaced by the general guidelines for multilingual and multi-regional content. Please see using hreflang for language and regional URLs for our current recommendations.


If you have a global site containing pages where the:
  • template (i.e. side navigation, footer) is machine-translated into various languages,
  • main content remains unchanged, creating largely duplicate pages,
and sometimes search results direct users to the wrong language, we’d like to help you better target your international/multilingual audience through:

<link rel="alternate" hreflang="a-different-language" href="http://url-of-the-different-language-page" />

As you know, when rel=”canonical” or a 301 response code is properly implemented, we become more precise in clustering information from duplicate URLs, such as consolidating their linking properties. Now, when rel=”alternate” hreflang=”x” is included in conjunction with rel=”canonical” or 301s, not only will our indexing and linking properties be more accurate, but we can better serve users the URL of their preferred language.

Sample configuration that’s prime for rel=”alternate” hreflang=”x”

How does this all work? Imagine that you’re the proud owner of example.com, a site called “The Network” where you allow users to create their very own profile. Let’s say Javier Lopez, a Spanish speaker, makes his page at http://es.example.com/javier-lopez:


Because you’re trying to target a multilingual audience, once Javier hits “Publish,” his profile becomes immediately available in other languages with the translated templates. Also, each of the new language versions is served on a separate URL.


Two localized versions, http://en.example.com/javier-lopez in English and http://fr.example.com/javier-lopez in French

Background on the old issue: duplicate content caused by language variations

The configuration above allowed visitors speaking different languages to more easily interpret the content, but for search engines it was slightly problematic: there are three URLs (English, French, and Spanish versions) for the same main content in Javier’s profile. Webmasters wanted to avoid duplicate content issues (such as PageRank dilution) from these multiple versions and still ensure that we would serve the appropriate version to the user.

A new solution for localized templates

First of all, just to be clear, the strategy we’re proposing isn’t appropriate for multilingual sites that completely translate each page’s content. We’re trying to specifically improve the situation where the template is localized but the main content of a page remains duplicate/identical across language/country variants.

Before we get into the specific steps, our prior advice remains applicable:
  • Have one URL associated with one piece of content. We recommend against using the same URL for multiple languages, such as serving both French and English versions on example.com/page.html based on user information (IP address, Accept-Language HTTP header).

  • When multiple languages are at play, it’s best to include the language or country indication in the URL, e.g., example.com/en/welcome.html and example.com/fr/accueil.html (which specify “en” and “fr”) rather than example.com/welcome.html and example.com/accueil.html (which don’t contain an explicit country/language specification). More suggestions can be found in our blog posts about designing localized URLs and multilingual sites.
For the new feature:
Step 1: Select the proper canonical.
The canonical designates the version of your content you’d like indexed and returned to users.
The first step towards making the right content indexable is to pick one canonical URL that best reflects the genuine locale of the page’s main content. In the example above, since Javier is a Spanish-speaking user and he created his profile on es.example.com, http://es.example.com/javier-lopez is the logical canonical. The title and snippet in all locales will be selected from the canonical URL.

Once you have the canonical URL picked out, you can either:
A. 301 (permanent redirect) from the language variants to the canonical

As an example, if a French speaker visits fr.example.com/javier-lopez (not the canonical), have this page set a cookie to remember the user's language preference of French. Then permanently redirect from fr.example.com/javier-lopez to the canonical at es.example.com/javier-lopez. Because of the cookie, es.example.com/javier-lopez will still render its boilerplate in French (even on the es.example.com subdomain!). Similarly, en.example.com/javier-lopez would set the value of this cookie to English and then 301 redirect to es.example.com/javier-lopez.
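The cookie-plus-redirect flow in option A can be sketched as a small request handler. This is a minimal, framework-free sketch: the cookie name lang, the handle_profile_request function, and the dictionary standing in for an HTTP response are all hypothetical. One detail it makes explicit is that the cookie must be scoped to the parent domain (Domain=example.com). A cookie set only for fr.example.com would not be sent to es.example.com, so the canonical could not render the French template.

```python
# Hypothetical configuration for "The Network"; the cookie name "lang" is an assumption.
CANONICAL_HOST = "es.example.com"
LANG_BY_HOST = {
    "en.example.com": "en",
    "fr.example.com": "fr",
    "es.example.com": "es",
}
COOKIE_NAME = "lang"


def handle_profile_request(host: str, path: str, cookies: dict) -> dict:
    """Return a simplified description of the response for a profile URL."""
    if host != CANONICAL_HOST:
        # Language variant: remember the visitor's language, then 301 to the canonical.
        lang = LANG_BY_HOST[host]
        return {
            "status": 301,
            "headers": {
                "Location": f"http://{CANONICAL_HOST}{path}",
                # Scoped to the parent domain so es.example.com receives it too.
                "Set-Cookie": f"{COOKIE_NAME}={lang}; Domain=example.com; Path=/",
            },
        }
    # Canonical URL: render the boilerplate in the cookie's language,
    # falling back to the canonical's own language when no cookie is present.
    template_lang = cookies.get(COOKIE_NAME, LANG_BY_HOST[CANONICAL_HOST])
    return {"status": 200, "template_lang": template_lang}
```

A French visitor's first request to fr.example.com/javier-lopez yields a 301 to es.example.com/javier-lopez along with lang=fr; the follow-up request to the canonical carries that cookie and gets the French template.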

Including a language selection link is also helpful should a multilingual user prefer a different experience of your site.

B. Use rel="canonical"

On the other language variants, include a link rel="canonical" tag pointing to your chosen canonical. In our example, since the canonical for Javier’s profile is the Spanish version, the English and French pages (and optionally even the Spanish page itself) would include <link rel="canonical" href="http://es.example.com/javier-lopez" />.

Cookies are not involved in this setup. Therefore, a French speaker will be served es.example.com/javier-lopez with a Spanish template. Implement step 2 if you want the French speakers to be served the French version at fr.example.com/javier-lopez in Google search results.
Step 2: In the canonical URL, specify the various language versions via the rel="alternate" link tag, using its hreflang attribute.

rel="alternate" URLs can be displayed in search results in accordance with a user’s language preference. The title and snippet, however, are still generated from the canonical URL (as is customary with rel="canonical"), not from the content of any rel="alternate" URL.
You can help Google display the correctly localized variant of your URL to our international users by adding the following tags to http://es.example.com/javier-lopez, the selected canonical:

<link rel="alternate" hreflang="en" href="http://en.example.com/javier-lopez" />

<link rel="alternate" hreflang="fr" href="http://fr.example.com/javier-lopez" />

rel="alternate" indicates that an alternate version of the page is located at the URL given in the href value. hreflang identifies the language of the alternate URL and is specified as an ISO 639-1 language code, such as "en" or "fr".
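Because a malformed hreflang value is easy to overlook, it can help to check values before emitting the tags. The sketch below is a hypothetical helper (alternate_tag is not part of any Google tool) that only checks the shape of a value: two letters for the language, optionally followed by a hyphen and two letters for a region, matching the ISO 639-1 and ISO 3166-1 alpha-2 code formats. It does not confirm that a code actually exists in either standard.

```python
import html
import re

# Shape check only: a two-letter language code (ISO 639-1 format), optionally
# followed by "-" and a two-letter region code (ISO 3166-1 alpha-2 format).
HREFLANG_RE = re.compile(r"[a-z]{2}(-[a-z]{2})?", re.IGNORECASE)


def alternate_tag(hreflang: str, href: str) -> str:
    """Build a rel="alternate" link tag, rejecting malformed hreflang values."""
    if not HREFLANG_RE.fullmatch(hreflang):
        raise ValueError(f"unexpected hreflang value: {hreflang!r}")
    # Escape the URL for use inside a double-quoted HTML attribute.
    return f'<link rel="alternate" hreflang="{hreflang}" href="{html.escape(href)}" />'


print(alternate_tag("fr", "http://fr.example.com/javier-lopez"))
```

With this check in place, a value such as "french" or "es_AR" raises ValueError instead of silently producing a tag that search engines may not interpret as intended.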

Please note: If your site supports many languages and you’re worried about the increased file size when declaring numerous rel="alternate" URLs, please see our Help Center article about configuring rel="alternate" with file size constraints.
Once the steps are completed, the configuration on “The Network” would look like this:
  • http://en.example.com/javier-lopez
    either 301s with a language cookie or contains <link rel="canonical" href="http://es.example.com/javier-lopez" />
  • http://fr.example.com/javier-lopez
    either 301s with a language cookie or contains <link rel="canonical" href="http://es.example.com/javier-lopez" />
  • http://es.example.com/javier-lopez
    is the canonical and contains
    <link rel="alternate" hreflang="en" href="http://en.example.com/javier-lopez" />
    and
    <link rel="alternate" hreflang="fr" href="http://fr.example.com/javier-lopez" />
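The whole configuration can also be generated from a single table of language versions, which keeps the rel="canonical" and rel="alternate" tags from drifting apart as languages are added. This is a sketch assuming option B (rel="canonical") is used for step 1; VARIANTS, CANONICAL_LANG, and head_tags are hypothetical names.

```python
# Hypothetical description of "The Network" (language -> profile URL).
VARIANTS = {
    "en": "http://en.example.com/javier-lopez",
    "fr": "http://fr.example.com/javier-lopez",
    "es": "http://es.example.com/javier-lopez",
}
CANONICAL_LANG = "es"


def head_tags(variants: dict, canonical_lang: str) -> dict:
    """Map each URL to the <link> tags it carries (step 1 option B plus step 2)."""
    canonical = variants[canonical_lang]
    tags = {}
    for lang, url in variants.items():
        if lang == canonical_lang:
            # The canonical lists every other language version as an alternate.
            tags[url] = [
                f'<link rel="alternate" hreflang="{alt_lang}" href="{alt_url}" />'
                for alt_lang, alt_url in variants.items()
                if alt_lang != canonical_lang
            ]
        else:
            # Every other language version points at the canonical.
            tags[url] = [f'<link rel="canonical" href="{canonical}" />']
    return tags


for url, url_tags in head_tags(VARIANTS, CANONICAL_LANG).items():
    print(url)
    for tag in url_tags:
        print("    " + tag)
```

The sketch leaves out the optional self-referencing rel="canonical" on the Spanish page; adding it would be one more entry in the canonical's list.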

Results of the above implementation
  • When your content is returned in search results, users will likely see the URL that corresponds to their language preference, whether or not it’s the canonical. (Good news!) This is because with rel="canonical" or a 301 redirect, we can cluster the language variations with the canonical. With rel="alternate" hreflang="x", at serve time we can deliver the URL of the most appropriate language to the user: English speakers will be served en.example.com/javier-lopez as the result for the URL in Javier’s profile, French speakers will see fr.example.com/javier-lopez, and Spanish speakers will see es.example.com/javier-lopez.

  • By implementing step 1, only content from the canonical version will be available for users in search results (i.e., content from the duplicate versions won’t be searchable). Because the Spanish version es.example.com/javier-lopez is the canonical, queries that include template content from this page, e.g. [Javier Lopez familia] -- when using any language preference -- may return his profile (content from the canonical version). On the other hand, queries that include template content of the “duplicate” version, e.g. [Javier Lopez family], are less likely to return his profile page. If you would like the other language versions indexed separately and searchable, avoid using rel="canonical" and rel="alternate".

  • Indexing properties, such as linking information, from the duplicate language variants will be consolidated with the canonical.
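To make the serve-time behavior concrete, here is a toy model of choosing which URL to display for a clustered result. It is an illustration only, not Google's serving logic; pick_result_url and its fallback order (exact language tag, then primary language, then the canonical) are assumptions made for this sketch.

```python
# Alternates declared on the canonical in step 2 (language -> URL).
CANONICAL = "http://es.example.com/javier-lopez"
ALTERNATES = {
    "en": "http://en.example.com/javier-lopez",
    "fr": "http://fr.example.com/javier-lopez",
}


def pick_result_url(canonical: str, alternates: dict, user_lang: str) -> str:
    """Toy model: show the alternate for the user's language, else the canonical."""
    lang = user_lang.lower()
    # Try the full preference (e.g. "fr-ca"), then its primary language ("fr").
    for candidate in (lang, lang.split("-")[0]):
        if candidate in alternates:
            return alternates[candidate]
    return canonical


for pref in ("en", "fr-CA", "es"):
    print(pref, "->", pick_result_url(CANONICAL, ALTERNATES, pref))
```

In every case the title and snippet would still come from the canonical, as described above; only the displayed URL changes.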

To recap (one more time, with feeling!)

For sites that have their template localized but keep their pages’ main content untranslated:

Step 1: Once you have the canonical picked out, you can use either rel="canonical" or a 301 (permanent redirect) from the various localized pages to the canonical URL.

Step 2: On the canonical URL, specify the language-specific duplicated content with different boilerplate via the rel="alternate" link tag, using its hreflang attribute. This way, Google can show the correctly localized variant of your URLs to our international users.

We realize this can be a little complicated, so if you have questions, please ask in our webmaster forum!

Webmaster Level: Intermediate

A multilingual website is any website that offers content in more than one language. Examples of multilingual websites might include a Canadian business with an English and a French version of its site, or a blog on Latin American soccer available in both Spanish and Portuguese.

Usually, it makes sense to have a multilingual website when your target audience consists of speakers of different languages. If your blog on Latin American soccer aims to reach the Brazilian audience, you may choose to publish it only in Portuguese. But if you’d like to reach soccer fans from Argentina also, then providing content in Spanish could help you with that.

Google and language recognition


Google tries to determine the main languages of each one of your pages. You can help to make language recognition easier if you stick to only one language per page and avoid side-by-side translations. Although Google can recognize a page as being in more than one language, we recommend using the same language for all elements of a page: headers, sidebars, menus, etc.

Keep in mind that Google ignores all code-level language information, from “lang” attributes to Document Type Definitions (DTD). Some web editing programs create these attributes automatically, and therefore they aren’t very reliable when trying to determine the language of a webpage.

Someone who comes to Google and does a search in their language expects to find localized search results, and this is where you, as a webmaster, come in: if you’re going to localize, make it visible in the search results with some of our tips below.

The anatomy of a multilingual site: URL structure


There's no need to create special URLs when developing a multilingual website. Nonetheless, your users might like to identify what section of your website they’re on just by glancing at the URL. For example, the following URLs let users know that they’re on the English section of this site:

http://example.ca/en/mountain-bikes.html
http://en.example.ca/mountain-bikes.html

While these other URLs let users know that they’re viewing the same page in French:

http://example.ca/fr/mountain-bikes.html
http://fr.example.ca/mountain-bikes.html


Additionally, this URL structure will make it easier for you to analyze the indexing of your multilingual content.

If you want to create URLs with non-English characters, make sure to use UTF-8 encoding. UTF-8 encoded URLs should be properly escaped when linked from within your content. Should you need to escape your URLs manually, you can easily find an online URL encoder that will do this for you. For example, if I wanted to translate the following URL from English to French,

http://example.ca/fr/mountain-bikes.html

it might look something like this:

http://example.ca/fr/vélo-de-montagne.html

Since this URL contains one non-English character (é), this is what it would look like properly escaped for use in a link on your pages:

http://example.ca/fr/v%C3%A9lo-de-montagne.html
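If you would rather not rely on an online encoder, Python's standard library can do the same escaping. A minimal sketch, assuming the é is stored as a single precomposed character (a decomposed "e" plus combining accent would encode differently); escape_url_path is a hypothetical helper name.

```python
from urllib.parse import quote, unquote


def escape_url_path(path: str) -> str:
    """Percent-encode a URL path as UTF-8, leaving the "/" separators intact."""
    return quote(path, safe="/", encoding="utf-8")


path = "/fr/vélo-de-montagne.html"
escaped = escape_url_path(path)
print("http://example.ca" + escaped)  # http://example.ca/fr/v%C3%A9lo-de-montagne.html
assert unquote(escaped) == path  # decoding restores the original path
```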

Crawling and indexing your multilingual website


We recommend that you do not allow automated translations to get indexed. Automated translations don’t always make sense and they could potentially be viewed as spam. More importantly, the point of making a multilingual website is to reach a larger audience by providing valuable content in several languages. If your users can’t understand an automated translation or if it feels artificial to them, you should ask yourself whether you really want to present this kind of content to them.

If you’re going to localize, make it easy for Googlebot to crawl all language versions of your site. Consider cross-linking page by page. In other words, you can provide links between pages with the same content in different languages. This can also be very helpful to your users. Following our previous example, let’s suppose that a French speaker happens to land on http://example.ca/en/mountain-bikes.html; now, with one click he can get to http://example.ca/fr/vélo-de-montagne.html where he can view the same content in French.
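Page-by-page cross-links can be generated from a per-page map of language versions. In this sketch PAGE_VERSIONS, LANGUAGE_NAMES, and language_links are hypothetical; each href is percent-encoded as UTF-8, as recommended in the URL section, and each language is labeled with its own name so that a visitor who cannot read the current page can still find theirs.

```python
from urllib.parse import quote

# Hypothetical map of one page's language versions (language -> path).
PAGE_VERSIONS = {
    "en": "/en/mountain-bikes.html",
    "fr": "/fr/vélo-de-montagne.html",
}
LANGUAGE_NAMES = {"en": "English", "fr": "Français"}


def language_links(current_lang: str, versions: dict, base: str = "http://example.ca") -> list:
    """Return <a> links from the current page to each of its other language versions."""
    links = []
    for lang, path in versions.items():
        if lang == current_lang:
            continue  # no need to link a page to itself
        href = base + quote(path)  # UTF-8 percent-encoding; "/" stays unescaped
        links.append(f'<a href="{href}" hreflang="{lang}">{LANGUAGE_NAMES[lang]}</a>')
    return links


print(language_links("en", PAGE_VERSIONS))
```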

To make all of your site's content more crawlable, avoid automatic redirections based on the user's perceived language. These redirections could prevent users (and search engines) from viewing all the versions of your site.

And last but not least, keep the content for each language on separate URLs - don't use cookies to show translated versions.

Working with character encodings


Google directly extracts character encodings from HTTP headers, HTML page headers, and content. There isn’t much you need to do about character encoding, other than watching out for conflicting information - for example, between content and headers. While Google can recognize different character encodings, we recommend that you use UTF-8 on your website whenever possible.
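One way to catch the conflicting information mentioned above is to compare the charset in the Content-Type header with the one declared in the page. A minimal sketch using only the standard library; the function names are hypothetical, the regular expression is a rough heuristic rather than a full HTML parser, and codec aliases are normalized with codecs.lookup so that "utf8" and "UTF-8" are not reported as a conflict.

```python
import codecs
import re
from typing import Optional

# Matches both <meta charset="..."> and the older http-equiv form.
META_CHARSET_RE = re.compile(rb"""<meta[^>]+charset=["']?([\w-]+)""", re.IGNORECASE)


def _normalize(name: str) -> str:
    """Map aliases such as 'utf8' and 'UTF-8' to one canonical codec name."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()


def header_charset(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    for param in content_type.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.lower() == "charset":
            return value.strip().strip('"').lower()
    return None


def meta_charset(html_bytes: bytes) -> Optional[str]:
    """Return the first charset declared in a <meta> tag, if any."""
    match = META_CHARSET_RE.search(html_bytes)
    return match.group(1).decode("ascii").lower() if match else None


def charset_conflict(content_type: str, html_bytes: bytes) -> bool:
    """True when the HTTP header and the page both declare a charset and they differ."""
    declared = header_charset(content_type), meta_charset(html_bytes)
    if None in declared:
        return False
    return _normalize(declared[0]) != _normalize(declared[1])
```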

If your tongue gets twisted...


Now that you know all of this, your tongue may get twisted when you speak many languages, but your website doesn’t have to!

For more information, read our post on multi-regional sites and stay tuned for our next post, where we'll delve into special situations that may arise when working with global websites. Until then, don't hesitate to drop by the Help Forum and join the discussion!

Webmaster Level: Intermediate

Did you know that a majority of users surveyed feel that having information in their own language was more important than a low price? Living in a non-English-speaking country, I've seen friends and family members explicitly look for and use local and localized websites—properly localized sites definitely have an advantage with users. Google works hard to show users the best possible search results. Many times those are going to be pages that are localized, for the user's location and/or in the user's language.

If you're planning to take the time to create and maintain a localized version of your website, making it easy to recognize and find is a logical part of that process. In this blog post series, we'll take a look at what is involved with multi-regional and multi-lingual websites from a search engine point of view. A multi-regional website is one that explicitly targets users in various regions (generally different countries); we call it multilingual when it is available in multiple languages, and sometimes, the website targets both multiple regions and is in multiple languages. Let's start with some general preparations and then look at websites that target multiple regions.

Preparing for global websites

Expanding a website to cover multiple regions and/or languages can be challenging. By creating multiple versions of your website, any issues with the base version will be multiplied; make sure that you have everything working properly before you start. Given that this generally means you'll suddenly be working with a multiplied number of URLs, don't forget that you'll need appropriate infrastructure to support the website.

Planning multi-regional websites

When planning sites for multiple regions (usually countries), don't forget to research legal or administrative requirements that might come into play first. These requirements may determine how you proceed, for instance whether or not you would be eligible to use a country-specific domain name.

All websites start with a domain name, and Google differentiates between two types of domain names:
  • ccTLDs (country-code top level domain names): These are tied to a specific country (for example .de for Germany, .cn for China). Users and search engines use this as a strong sign that your website is explicitly for a certain country.
  • gTLDs (generic top level domain names): These are not tied to a specific country. Examples of gTLDs are .com, .net, .org, .museum. Google sees regional top level domain names such as .eu and .asia as gTLDs, since they cannot be tied to a specific country. We also treat some vanity ccTLDs (such as .tv, .me, etc.) as gTLDs as we've found that users and webmasters frequently see these as being more generic than country-targeted (we don't have a complete list of such vanity ccTLDs that we treat as gTLDs as it may change over time). You can set geotargeting for websites with gTLDs using the Webmaster Tools Geographic Target setting.
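The distinction can be expressed as a rough classifier. This is a simplified sketch: the sets below contain only the examples named in this post (.eu, .asia, .tv, .me), and since the post notes there is no complete list of vanity ccTLDs treated as generic, any fuller list would be an assumption. classify_tld is a hypothetical name.

```python
# Illustrative, deliberately incomplete sets taken from the examples above.
REGIONAL_GTLDS = {"eu", "asia"}
VANITY_CCTLDS = {"tv", "me"}  # the real set is not published and may change


def classify_tld(hostname: str) -> str:
    """Roughly classify a hostname's top-level domain as 'ccTLD' or 'gTLD'."""
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld in REGIONAL_GTLDS or tld in VANITY_CCTLDS:
        return "gTLD"
    # Two-letter ASCII top-level domains are country codes; the rest are generic.
    if len(tld) == 2 and tld.isalpha() and tld.isascii():
        return "ccTLD"
    return "gTLD"


for host in ("example.de", "example.com", "example.eu", "example.tv"):
    print(host, classify_tld(host))
```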

Geotargeting factors

Google generally uses the following elements to determine the geotargeting of a website (or a part of a website):
  1. Use of a ccTLD is generally a strong signal for users since it explicitly specifies a single country in an unmistakable way.
    or
    Webmaster Tools' manual geotargeting for gTLDs (this can be on a domain, subdomain or subdirectory level); more information on this can be found in our blog post and in the Help Center. With region tags from geotargeting being shown in search results, this method is also very clear to users. Please keep in mind that it generally does not make sense to set a geographic target if the same pages on your site target more than a single country (say, all German-speaking countries) — just write in that language and do not use the geotargeting setting (more on writing in other languages will follow soon!).
  2. Server location (through the IP address of the server) is frequently near your users. However, some websites use distributed content delivery networks (CDNs) or are hosted in a country with better webserver infrastructure, so we try not to rely on the server location alone.
  3. Other signals can give us hints. This could be from local addresses & phone numbers on the pages, use of local language and currency, links from other local sites, and/or the use of Google's Local Business Center (where available).

Note that we do not use locational meta tags (like "geo.position" or "distribution") or HTML attributes for geotargeting. While these may be useful in other regards, we've found that they are generally not reliable enough to use for geotargeting.

URL structures

The first three elements used for geotargeting are strongly tied to the server and to the URLs used. It's difficult to determine geotargeting on a page by page basis, so it makes sense to consider using a URL structure that makes it easy to segment parts of the website for geotargeting. Here are some of the possible URL structures with pros and cons with regards to geotargeting:

ccTLDs (e.g. example.de, example.fr)
  • pros (+): clear geotargeting; server location is irrelevant; easy separation of sites; legal requirements (sometimes)
  • cons (-): expensive (+ availability); more infrastructure; ccTLD requirements (sometimes)

Subdomains with gTLDs (e.g. de.site.com, fr.site.com)
  • pros (+): easy to set up; can use Webmaster Tools geotargeting; allows different server locations; easy separation of sites
  • cons (-): users might not recognize geotargeting from the URL alone (is "de" the language or country?)

Subdirectories with gTLDs (e.g. site.com/de/, site.com/fr/)
  • pros (+): easy to set up; can use Webmaster Tools geotargeting; low maintenance (same host)
  • cons (-): users might not recognize geotargeting from the URL alone; single server location; separation of sites harder

URL parameters (e.g. site.com?loc=de, ?country=france)
  • pros (+): (not recommended)
  • cons (-): segmentation based on the URL is difficult; users might not recognize geotargeting from the URL alone; geotargeting in Webmaster Tools is not possible
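One practical difference between these structures is how reliably a locale can be read back out of a URL, which is what makes a section of a site easy to segment. The sketch below (geo_segment and the default parameter name loc are hypothetical) extracts the locale for each structure. The parameter case only works if the parameter name is known in advance and the parameter survives in every link, which hints at why the table lists URL-based segmentation as difficult for that option.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def geo_segment(url: str, structure: str, param: str = "loc") -> Optional[str]:
    """Return the locale part of a URL for one of the four structures above."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if structure == "cctld":
        return host.rsplit(".", 1)[-1]          # example.de -> "de"
    if structure == "subdomain":
        return host.split(".", 1)[0]            # de.site.com -> "de"
    if structure == "subdirectory":
        first = parts.path.strip("/").split("/", 1)[0]
        return first or None                    # site.com/de/... -> "de"
    if structure == "parameter":
        values = parse_qs(parts.query).get(param)
        return values[0] if values else None    # site.com?loc=de -> "de"
    raise ValueError(f"unknown structure: {structure}")
```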

As you can see, geotargeting is not an exact science (even sites using country-code top level domain names can be global in nature), so it's important that you plan for the users from the "wrong" location. One way to do this could be to show links on all pages for users to select their region and language of choice. We'll look at some other possible solutions further on in this blog post series.

Dealing with duplicate content on global websites

Websites that provide content for different regions and in different languages sometimes create content that is the same or similar but available on different URLs. This is generally not a problem as long as the content is for different users in different countries. While we strongly recommend that you provide unique content for each different group of users, we understand that this may not always be possible for all pages and variations from the start. There is generally no need to "hide" the duplicates by disallowing crawling in a robots.txt file or by using a "noindex" robots meta tag. However, if you're providing the same content to the same users on different URLs (for instance, if both "example.de/" and "example.com/de/" show German language content for users in Germany), it would make sense to choose a preferred version and to redirect (or use the "rel=canonical" link element) appropriately.

Do you already have a website that targets multiple regions or do you have questions about the process of planning one? Come to the Help Forum and join the discussion. In following posts, we'll take a look at multi-lingual websites and then look at some special situations that can arise with global websites. Bis bald!

(This has been cross-posted from the Official Google Blog)

How long would it take to translate all the world's web content into 50 languages? Even if all of the translators in the world worked around the clock, with the current growth rate of content being created online and the sheer amount of data on the web, it would take hundreds of years to make even a small dent.

Today, we're happy to announce a new website translator gadget powered by Google Translate that enables you to make your site's content available in 51 languages. Now, when people visit your page, if their language (as determined by their browser settings) is different from the language of your page, they'll be prompted to automatically translate the page into their own language. If the visitor's language is the same as the language of your page, no translation banner will appear.


After clicking the Translate button, the automatic translations are shown directly on your page.


It's easy to install — all you have to do is cut and paste a short snippet into your webpage to increase the global reach of your blog or website.


Automatic translation is convenient and helps people get a quick gist of the page. However, it's not a perfect substitute for the art of professional translation. Today happens to be International Translation Day, and we'd like to take the opportunity to celebrate the contributions of translators all over the world. These translators play an essential role in enabling global communication, and with the rapid growth and ease of access to digital content, the need for them is greater than ever. We hope that professional translators, along with translation tools such as Google Translator Toolkit and this Translate gadget, will continue to help make the world's content more accessible to everyone.

Are you the owner of a .yu domain? Then you might have heard the news: as of September 30, all .yu domains will stop working, regardless of their renewal date. This means that any content you're hosting on a .yu domain will no longer be online. For those of you who would still like to have your site online, we've prepared some recommendations to make sure that Google keeps crawling, indexing, and serving your content appropriately.
  • Check your backlinks. Since it won't be possible to set up a redirection from the old .yu domain to your new one, all links pointing to .yu domains will lead to dead ends. This means that it will be increasingly difficult for search engines to retrieve your new content. To find out who is linking to you, sign up with Google Webmaster Tools and check the links to your site (you can also download this list as a "comma separated value" -- .csv -- file for ease of use). Then read through the list for sites that you recognize as important and contact their webmasters to make sure that they update their links to your new website.
  • Check your internal links. If you are planning to simply move your content in bulk from the old to the new site, make sure that the new internal navigation is up to date. For example, if you are renaming pages on your site from "www.example.yu/home.htm" to "www.example.com/home.htm" make sure that your internal navigation reflects such changes to prevent broken links.
  • Start moving the site to your new domain. It's a good idea to start moving while you can still maintain control of your old domain, so don't wait! As mentioned in our best practices when moving your site, we recommend starting by moving a single directory or subdomain, and testing the results before completing the move. Remember that you will not be able to keep a 301 redirection on your old domain after September 30, so start your test early.
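For the internal-link check, absolute links that still point at the old host can be rewritten mechanically. A minimal sketch, assuming the old and new hosts from the example above; move_link is a hypothetical helper. Relative links (such as /home.htm) are left untouched because they move with the site.

```python
from urllib.parse import urlsplit, urlunsplit


def move_link(url: str, old_host: str, new_host: str) -> str:
    """Point an absolute link on old_host at the same path on new_host."""
    parts = urlsplit(url)
    if (parts.hostname or "").lower() != old_host.lower():
        # Relative links and links to other sites are left unchanged.
        return url
    # Note: replacing netloc also drops any explicit port or credentials.
    return urlunsplit(parts._replace(netloc=new_host))


print(move_link("http://www.example.yu/home.htm", "www.example.yu", "www.example.com"))
```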
While you're moving your site, you can test how Google crawls and indexes your new site at its new location by submitting a Sitemap via Google Webmaster Tools. Although we may not crawl or index all the pages listed in each Sitemap, we recommend that you submit one because doing so helps Google understand your site better. You can read more on this topic in our answers to the most frequently asked questions on Sitemaps. And remember that for any questions or concerns, we're waiting for you in the Google Webmaster Help Forum!
Update: as mentioned here, we have introduced a new feature: Change of Address. Check it out if you are moving from one domain to another! By using this feature you will help us update our index faster and hopefully make the transition for your users smoother.

A lot has been said about how to start a multi-lingual site and how to better target content through meta tags. Our users have raised a number of interesting questions about creating websites in different languages, like the one below.

ganex:
> How does one do for INDIA.
> As there are many languages spoken here.
> My Site is primarily in English, but my site targets different cities in INDIA.
> For Hyderabad - I want in Urdu & Telugu and for Chennai I want in Tamil
> for Bengaluru I want in Kannada.
> For North I want in Hindi.

We’d like to introduce the transliteration API for Indic languages (languages spoken in India), in addition to our AJAX Language API. With this API at your disposal, content creation is simplified: it not only helps you integrate transliteration into your websites but also allows users visiting your site to type in Indic languages.

To include the transliteration API, first you need the AJAX script.

<script type="text/javascript" src="http://www.google.com/jsapi"></script>

This script tag will load the google.load function, which lets you load the individual Google APIs. To load the Google Transliteration API, call google.load like this:

<script type="text/javascript">
  // Load version 1 of the "elements" API with the transliteration package.
  google.load("elements", "1", {
    packages: "transliteration"
  });
</script>


When it comes to targeting, don't forget to add meta tags in your local language. And for your questions, we have a new addition to our already existing communication channels like the webmaster help groups and webmaster tools (available in 26 languages!). We also have our own official Orkut webmaster community! Here users can share thoughts and discuss webmaster related issues.

Sign up for our Orkut community now and if you have any additional thoughts we'd love to hear about them.

Cheers,

When webmasters put content out on the web it's there for the world to see. Unfortunately, most content on the web is only published in a single language, understandable by only a fraction of the world's population.

In a continued effort to make the world's information universally accessible, Google Translate has a number of tools for you to automatically translate your content into the languages of the world.


Users may already be translating your webpage using Google Translate, but you can make it even easier by including our "Translate My Page" gadget, available at http://translate.google.com/translate_tools.

The gadget will be rendered in the user's language, so if they come to your page and can't understand anything else, they'll be able to read the gadget, and translate your page into their language.

Sometimes there may be some content on your page that you don't want us to translate. You can now add class=notranslate to any HTML element to prevent that element from being translated. For example, you may want to do something like:
Email us at <span class="notranslate">sales at mydomain dot com</span>
And if you have an entire page that should not be translated, you can add:
<meta name="google" value="notranslate">
to the <head> of your page and we won't translate any of the content on that page.

Update on 12/15/2008: We also support:
<meta name="google" content="notranslate">
Thanks to chaoskaizer for pointing this out in the comments. :)
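Before relying on these markers, it can be useful to confirm that a page actually contains them. The sketch below uses Python's built-in HTMLParser to report the page-level meta tag (in either the value= or content= form mentioned above) and any elements carrying class="notranslate". NoTranslateScanner is a hypothetical name; this is a self-check, not how Google Translate itself processes pages.

```python
from html.parser import HTMLParser


class NoTranslateScanner(HTMLParser):
    """Report page-level and element-level notranslate markers in an HTML document."""

    def __init__(self):
        super().__init__()
        self.page_opt_out = False   # set by <meta name="google" ... notranslate>
        self.protected_tags = []    # tag names of elements with class="notranslate"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Page level: accept both the value= and content= spellings of the meta tag.
        if tag == "meta" and (attrs.get("name") or "").lower() == "google":
            values = {(attrs.get("value") or "").lower(), (attrs.get("content") or "").lower()}
            if "notranslate" in values:
                self.page_opt_out = True
        # Element level: "notranslate" may appear among several space-separated classes.
        if "notranslate" in (attrs.get("class") or "").split():
            self.protected_tags.append(tag)


scanner = NoTranslateScanner()
scanner.feed('<p>Email us at <span class="notranslate">sales at mydomain dot com</span></p>')
print(scanner.page_opt_out, scanner.protected_tags)  # False ['span']
```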

Lastly, if you want to do some fancier automatic translation integrated directly into your page, check out the AJAX Language API we launched last March.

With these tools we hope you can more easily make your content available in all the languages we support, including Arabic, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Filipino, Finnish, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Ukrainian, and Vietnamese.

Have you ever thought of creating one or several sites in different languages? Let's say you want to start a travel site about backpacking in Europe, and you want to offer your content to English, German, and Spanish speakers. You'll want to keep in mind factors like site structure, geographic as well as language targeting, and content organization.

Site structure
The first thing you'll want to consider is whether it makes sense for you to buy country-specific top-level domains (TLDs) for all the countries you plan to serve. So your domains might be ilovebackpacking.co.uk, ichlieberucksackreisen.de, and irdemochilero.es. This option is beneficial if you want to target the countries that each TLD is associated with, a method known as geotargeting. Note that this is different from language targeting, which we will get into a little more later. Let's say your German content is specifically for users from Germany and not as relevant for German-speaking users in Austria or Switzerland. In this case, you'd want to register a domain on the .de TLD. German users will identify your site as a local one they are more likely to trust. On the other hand, it can be pretty expensive to buy domains on the country-specific TLDs, and it's more of a pain to update and maintain multiple domains. So if your time and resources are limited, consider buying one non-country-specific domain, which hosts all the different versions of your website. In this case, we recommend either of these two options:
  1. Put the content of every language in a different subdomain. For our example, you would have en.example.com, de.example.com, and es.example.com.
  2. Put the content of every language in a different subdirectory. This is easier to handle when updating and maintaining your site. For our example, you would have example.com/en/, example.com/de/, and example.com/es/.
Matt Cutts wrote a substantial post on subdirectories and subdomains, which may help you decide which option to go with.

Geographic targeting vs. Language targeting
As mentioned above, if your content is especially targeted towards a particular region in the world, you can use the Set Geographic Target tool in Webmaster Tools. It allows you to set different geographic targets for different subdirectories or subdomains (e.g., /de/ for Germany).

If you want to reach all speakers of a particular language around the world, you probably don't want to limit yourself to a specific geographic location. This is known as language targeting, and in this case, you don't want to use the geographic target tool.

Content organization
The same content in different languages is not considered duplicate content. Just make sure you keep things organized. If you follow one of the site structure recommendations mentioned above, this should be pretty straightforward. Avoid mixing languages on each page, as this may confuse Googlebot as well as your users. Keep navigation and content in the same language on each page.

If you want to check how many of your pages are recognized in a certain language, you can perform a language-specific site search. For example, go to google.de, do a site search on google.com, and then choose the option below the search box to display only German results.
If you have more questions on this topic, you can join our Webmaster Help Group to get more advice.