
Multilingual Websites with Hugo

Hugo includes a range of tools to localize content for different language regions and thus realize multilingual websites that integrate two or more languages.

Content is translated by hand, with separate markup files representing the translations of the same content. Various options are available to mark translations as such and to let users switch between the language variants.

Proper localization has other requirements. In the following, we will address the most important of these and show how they can be implemented in Hugo. But to begin with, the website must generally be configured to support multiple languages.

General Configuration

Hugo’s main configuration file is located in the project root directory and is called hugo.toml, hugo.yaml or hugo.json by default. Individual languages are configured under the languages key, with the keys for individual languages following the RFC 5646 standard.

In YAML, the general structure is as follows:

languages:
  en:
    ...
  de:
    ...
  fr:
    ...

Some settings that affect the website as a whole must be configured separately for the different language variants of a multilingual website. The title and subtitle of the website are typical examples. The general configuration could look like this:

languages:
  de:
    languageCode: de-DE
    languageName: Deutsch
    params:
      subtitle: Erklärungen und Tutorials
    title: Persönliche Seite
  en:
    languageCode: en-US
    languageName: English
    params:
      subtitle: Explanations and Tutorials
    title: Personal Site

These values can be called up in templates via the Site object. Hugo does not specify how title and similar values must be implemented. On simple websites, the title is set in the <title> tag of the head, while on more complex sites it may be part of the title of individual documents.
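As a minimal sketch of how these values might be read, the following fragment of a base template outputs the language-specific title and subtitle. The file location and the CSS class names are assumptions for illustration; only .Site.Title and .Site.Params.subtitle correspond to the configuration above.

```go-html-template
{{/* Fragment of a base template, e.g. layouts/_default/baseof.html (assumed location). */}}
<head>
  {{/* Page title followed by the site title of the current language. */}}
  <title>{{ .Title }} | {{ .Site.Title }}</title>
</head>
<body>
  <header>
    <p class="site-title">{{ .Site.Title }}</p>
    {{/* The subtitle comes from params.subtitle of the current language. */}}
    {{ with .Site.Params.subtitle }}
      <p class="site-subtitle">{{ . }}</p>
    {{ end }}
  </header>
</body>
```

With the configuration shown above, a German page would output "Persönliche Seite" and "Erklärungen und Tutorials", while an English page would output the English values.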

The main purpose of languageName is to implement a language switcher. Suppose our site supports two languages, German and English. On single pages, the name of the other language could be displayed as a link to the language variant of the same content. We will show below how such a switcher can be implemented.

The languageCode can be used in the base template to automatically set the value for the lang attribute in the HTML tag.

<html lang="{{ .Site.LanguageCode }}">

Hugo itself only uses the value in the predefined RSS and Alias templates.

Using the disabled field in the main configuration, we can set whether a given language is to be displayed on the website. The content assigned to the language is then no longer available to users. This is particularly helpful if translations are still under construction.

languages:
  de:
    disabled: true

Other predefined fields are designed to meet specific localization requirements. These fields are not used in templates, but directly influence the build process of the website.

Formal Differences

Time and number formats and units of measurement differ from region to region. There are also differences in vocabulary and spelling, such as “Jänner” for January (Austria) or “ss” instead of “ß” (Switzerland). These differences follow well-defined patterns and can therefore be localized automatically.

The Common Locale Data Repository (CLDR) defines a standard for this, which is also used by Hugo. The rules are defined for linguistically and regionally delimited units, which are characterized by combinations of two-letter language codes and two-letter country codes. For example, de-AT stands for German as used in Austria.

Hugo uses the key under languages to recognize which formats should be applied to the respective content. This means that values are rendered in the appropriate format when the predefined functions and methods are used.
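For illustration, a template might format a date and a number with Hugo's localized functions as sketched below. The value 1234.5 and the surrounding markup are placeholders; time.Format with the :date_long token and lang.FormatNumber are assumed here to be among the predefined functions meant above.

```go-html-template
{{/* Localized long date for the page's date, with a machine-readable datetime. */}}
<time datetime="{{ .Date.Format "2006-01-02" }}">
  {{ .Date | time.Format ":date_long" }}
</time>

{{/* Localized number with two decimal places; 1234.5 is a placeholder value. */}}
<span>{{ 1234.5 | lang.FormatNumber 2 }}</span>
```

With de as the language key, the month name and the decimal and grouping separators should follow German conventions; with en, they should follow English ones.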

In addition, fonts may need to support different characters. Languages such as Arabic and Hebrew also require a writing direction from right to left, which we can define in the configuration file:

languages:
  ar:
    languageDirection: rtl
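The configured direction could then be applied in the base template, for example via the dir attribute. This is a sketch that assumes the value is read through .Site.Language.LanguageDirection and that ltr is a suitable fallback when no direction is configured.

```go-html-template
{{/* Falls back to "ltr" for languages without a configured languageDirection. */}}
<html lang="{{ .Site.LanguageCode }}"
      dir="{{ or .Site.Language.LanguageDirection "ltr" }}">
```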

When translating manually, other formal differences must also be taken into account, such as the currency used or different measurement systems.

How Content Is Assigned to Languages

The question now arises as to how Hugo recognizes which content belongs to which language. In the next section, we will look at the related aspect: how content in different languages can be identified as translations of the same material.

For simple websites, assignment via the file name is recommended. The language code is inserted between the base name and the file extension, with the components separated by dots. For example, about.en.md would be an English page, and content/posts/first-post.en.md an English page within a section. For the corresponding German page, content/posts/first-post.de.md would be created. In this case, Hugo easily recognizes that the files represent translations of the same content.

Assignment via content subdirectories generally offers additional advantages. All content of a language variant can be clearly managed on separate branches of the directory tree. Which subdirectory is used for which language can be defined in the main configuration:

languages:
  en:
    contentDir: content/english
  fr:
    contentDir: content/french

These settings do not determine the structure of the URLs via which the content of the respective language is available. As with the assignment via file names, the language code is again decisive here:

languages:
  de-AT:
    contentDir: content/de

In this case, the content written in content/de/ueber-mich.md would be accessible via domain.com/de-at/ueber-mich/.

A main or default language can also be defined. You can specify whether the language code should also be included in the URL for this language:

defaultContentLanguage: 'de'
defaultContentLanguageInSubdir: true # Still place 'de' in the URL

For localized URLs, file and directory names can be translated, such as content/de/ueber-mich.md for the German variant of content/en/about.md. The next section shows how such content is declared to be a translation of its counterpart.

If you want to take every aspect of the translation into your own hands, you can specify the language directly in the front matter of the respective content:

---
title: "About Us"
language: "en"
---

The advantage of this approach is that the URL directly follows the given directory structure. In special cases, it may make sense to use this method in addition to the other two generally preferred approaches.

Localization of the Main Content

Now to the question of how Hugo recognizes language variants of the same content. One possibility has already been mentioned for the case of assignment via file names. For example, if content is located under content/en/about.md and content/de/about.md (or under content/about.en.md and content/about.de.md), Hugo will recognize that these are translations of each other.

However, this approach has one significant drawback. By default, the URL in Hugo reflects the directory structure. This means that the content mentioned would be available at domain.com/de/about/. However, untranslated URLs not only look rather off, but are also less than ideal from a search engine optimization (SEO) perspective.

It has already been mentioned that we can use translated file and directory names in the subdirectory-based approach to achieve translated URLs. However, this means that it is no longer automatically apparent that content/en/about.md and content/de/ueber-mich.md represent the same content.

The translationKey field in the front matter establishes this connection. Hugo considers all files that contain the same value for this key to be language variants of the same content.

translationKey: about

The directory structure does not play any role here, and the value chosen for the key is up to the user.

Language Switcher

After the steps described above have been carried out, Hugo recognizes the translations of a specific content (if available). This information can be used in templates to implement a language switcher.

We would like to realize a menu that shows the other available languages and links directly to the corresponding translation of the content that is currently opened. To illustrate the approach, let’s first consider the simplest case with only two languages. If we are on the German version of the website, EN should be displayed; on the English page, DE. This selection could be positioned at the top right of the screen, for example.

{{ if .IsTranslated }}
  {{ range .Translations }}
    <a href="{{ .RelPermalink }}" hreflang="{{ .Lang }}">
      {{ if eq .Lang "en" }}EN{{ else }}DE{{ end }}
    </a>
  {{ end }}
{{ else }}
  {{ if eq .Lang "en" }}
    <a href="{{ relURL "de/" }}" hreflang="de">DE</a>
  {{ else }}
    <a href="{{ relURL "en/" }}" hreflang="en">EN</a>
  {{ end }}
{{ end }}

The currently open page forms the context, which in Hugo is represented by a dot (.). With the PAGE.IsTranslated method we can query whether the page has at least one translation. If no translations exist, the links lead to the homepage of the other language.

If translations exist, they can be retrieved using the PAGE.Translations method. The translations are then processed in the context of the loop iterations. Each translation represents a Page object whose URL Hugo automatically saves and which can be retrieved using the PAGE.RelPermalink method. We use the URL to link to the corresponding language variant. The hreflang attribute informs web crawlers about the language of the target page and assists search engine optimization.

For websites with more than two languages, the menu should list all languages into which the current content has been translated. If the content has not been translated at all, the homepages of the other website languages are linked to.

Instead of querying the language of the current page, we can add the languages of the available translations to the menu. This is possible because there is at most one translation per language. In this way, we can simplify the code despite supporting more languages:

{{ $currentLang := .Lang }}

{{ if .IsTranslated }}
  {{ range .Translations }}
    {{ $targetLang := .Lang }}
    <a href="{{ .RelPermalink }}" 
       hreflang="{{ $targetLang }}">
      {{- upper $targetLang -}}
    </a>
  {{ end }}
{{ else }}
  {{ range .Site.Languages }}
    {{ if ne $currentLang .Lang }}
      {{ $targetLang := .Lang }}
      <a href="{{ relURL (print .Lang "/") }}"
         hreflang="{{ $targetLang }}">
        {{- upper $targetLang -}}
      </a>
    {{ end }}
  {{ end }}
{{ end }}

Two variables are declared, one for the current language and another for the target language, to make the code more readable. The main new feature is the handling of the case in which the current content has not been translated. The SITE.Languages method gives us a list of all supported languages of the website. All languages that differ from the current language are then listed in the menu. Inside the loop, the dot refers to the language being processed rather than to the current page, which is why the language of the current page is stored in $currentLang beforehand.
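If the switcher should show the names configured under languageName instead of the upper-cased language codes, the loop over the translations could be written as follows. This is a sketch that relies on each translation exposing its language via .Language; the fallback for untranslated pages is omitted for brevity.

```go-html-template
{{ range .Translations }}
  {{/* .Language.LanguageName yields the configured name, e.g. "Deutsch" or "English". */}}
  <a href="{{ .RelPermalink }}" hreflang="{{ .Lang }}">
    {{- .Language.LanguageName -}}
  </a>
{{ end }}
```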

Localization of Shared Components

Components such as tables of contents or sidebars can be used for different language variants. However, these fixed structures often contain hard-coded strings, such as "Latest posts" or "Inhaltsverzeichnis". Hugo provides a mechanism that can be used to automatically insert the translated strings.

Suppose we have an overview page on which the parts of an article are listed. In German we want to display Teil <n>, while in English it should be Part <n>. For this purpose, we define translation tables under i18n/en.yaml and i18n/de.yaml:

# i18n/en.yaml
part: Part

# i18n/de.yaml
part: Teil

With {{ i18n "part" }} we can insert the respective value into a layout or shortcode, and Hugo will automatically recognize the language of the respective subpage.
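As a sketch of the overview page described above, a list template could combine the translated string with a running number. It assumes that the parts are the child pages of the current section, ordered by weight; the variable name $parts and the CSS class are chosen for illustration.

```go-html-template
{{/* Assumption: each part of the article is a child page of this section. */}}
{{ $parts := .Pages.ByWeight }}
<ul class="article-parts">
  {{ range $index, $part := $parts }}
    <li>
      {{/* Renders "Part 1", "Part 2", ... on English pages and "Teil 1", "Teil 2", ... on German ones. */}}
      <a href="{{ $part.RelPermalink }}">{{ i18n "part" }} {{ add $index 1 }}</a>
    </li>
  {{ end }}
</ul>
```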

Localization of Keywords

Taxonomies, taxonomy terms and their values should be localized, too. Similar to the localization of the main content, the main goal here is to translate the URLs of the corresponding overview pages and to be able to automatically switch to the appropriate translations.

The classification system for main content must be configured. For a multilingual website, the definition of the classes (or taxonomies) to be used is relative to the supported languages.

languages:
  en:
    taxonomies:
      category: categories
      tag: tags
  de:
    taxonomies:
      category: kategorien
      tag: stichwoerter

The string value (plural) determines what is displayed in the URL. With the given configuration, for example, domain.com/de/kategorien/ and domain.com/en/categories/ can be accessed. Hugo also reads the terms of a content file from the front matter field named after this plural value, so a translated plural means a translated front matter field. The key (singular) identifies the taxonomy itself; keeping it identical across languages makes it easier to match the corresponding taxonomies.

Unfortunately, Hugo does not automatically recognize the corresponding overview pages as translations of each other. To establish this connection, we proceed in a similar way as above: Under content/en/<plural-of-taxonomy>/_index.md and content/de/<plural-of-taxonomy>/_index.md, both contents receive the same translationKey:

---
translationKey: "taxonomy-categories"
---

If you also want to switch between language variants for taxonomy terms, the administrative effort increases considerably. Here too it’s advisable to use the same term in all language versions:

# rezepte.md
---
kategorien:
  - cooking
---
# recipes.md
---
categories:
  - cooking
---

Depending on the complexity of the website, there may be a large number of terms for which an _index.md should be created under content/<lang>/<plural-of-taxonomy>/<term>/. In addition to the translationKey, the translation that is to appear in the URL should also be defined in this file.

# content/de/kategorien/cooking/_index.md
---
translationKey: term-cooking
slug: kochen
---

Currently, there seems to be no mechanism to simplify this process. I would therefore recommend hiding the language switcher on term overview pages.
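One way to do this, sketched under the assumption that the switcher lives in a partial named language-switcher.html (a hypothetical name), is to check the page kind before rendering it:

```go-html-template
{{/* Term overview pages have the kind "term"; skip the switcher there. */}}
{{ if ne .Kind "term" }}
  {{ partial "language-switcher.html" . }}
{{ end }}
```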

Shared Resources

Usually, we want to use the same resources for different language variants. Shared page resources in particular raise questions about their correct handling. Of course, we don’t want to duplicate media files just to conveniently access them from different languages.

For websites that only make moderate use of media, the general use of globally available resources is recommended. If they are stored in the assets/ directory, they can be accessed regardless of the language.

If resources are used frequently, we want to bundle their files with the main content that uses them. If content is assigned to the language by file name, we have no problems: index.de.md and index.en.md would be in the same directory and thus have the same relationship to the bundled assets.

It becomes more complicated when the contents of different languages are stored on their own branches of the directory tree. In this case, it is advisable to bundle resources with one language only; the translations of the content can then access the resources from the main language variant.

A shortcode must be defined for this purpose. For images, for example, it could look like this:

{{- $imageName := .Get 0 }}
{{- $altText := .Get 1 }}
{{- $caption := .Get 2 }}

{{- $currentPage := .Page }}
{{- $germanPage := $currentPage }}

{{- if ne $currentPage.Language.Lang "de" }}
    {{- range where .Site.AllPages "TranslationKey" $currentPage.TranslationKey }}
        {{- if eq .Language.Lang "de" }}
            {{- $germanPage = . }}
        {{- end }}
    {{- end }}
{{- end }}

{{- $image := $germanPage.Resources.GetMatch (printf "images/%s" $imageName) }}

{{- if $image }}
    {{ partial "images/render-image.html" (dict "image" $image.Permalink "alt" $altText "caption" $caption) }}
{{- else }}
    <div class="error">
        Image not found: "images/{{ $imageName }}" in page bundle at {{ $germanPage.File.Dir }}
    </div>
{{- end }}

On an English page you could then use a shortcode like

{{< images/shared-images "example.png" "alt-text" "caption" >}}

to access an image bundled with the German translation of the same content.
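The shortcode above delegates the output to a partial that is not shown. A minimal sketch of such a partial could look as follows; the markup is an assumption, and only the keys image, alt and caption are taken from the dict passed in the shortcode.

```go-html-template
{{/* layouts/partials/images/render-image.html (sketch) */}}
<figure>
  {{/* .image holds the permalink passed by the shortcode. */}}
  <img src="{{ .image }}" alt="{{ .alt }}">
  {{/* The caption is optional and only rendered when it is set. */}}
  {{ with .caption }}
    <figcaption>{{ . }}</figcaption>
  {{ end }}
</figure>
```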

Article from October 18, 2024.