Ticket: Generating sitemap.xml - best practices / guidance?

Status Resolved
Add-on / Version Publisher 1.6.2
Severity
EE Version 2.9.3

Gavin Lawrie

Apr 10, 2015

Is there a best practice for generating sitemap.xml files?

I’m generating a simple sitemap from NavEE using Google Sitemap Lite (1.11).

Without Publisher the URL for the sitemap file was http://v7a.2gc.org/sitemap, but if I hit this from a non-browser client (e.g. curl) with Publisher enabled I get no result. To get a result I need to specify a language code, so curl http://v7a.2gc.org/en/sitemap delivers a sitemap file that looks a bit like this:

<urlset >
<url>
<loc>http://v7a.2gc.org/en/</loc>
<lastmod>2015-04-10</lastmod>
<changefreq>weekly</changefreq>
<priority>1</priority>
</url>
<url>
<loc>http://v7a.2gc.org/en/pages/about-us</loc>
<lastmod>2015-04-10</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>

If I change the language code in the URL used for curl the language code in the sitemap changes accordingly.

If I try the same with a browser rather than curl, the non-language URL is rewritten with a language code inserted: I’m guessing the one that was in force the last time the site was opened in that browser.

A few questions arise.

1) What happens when Google’s crawler hits the site? The robots.txt file currently has the unadorned sitemap URL (e.g. http://v7a.2gc.org/sitemap) - what will the site return to Google?
2) Should I change the sitemap URL in robots.txt to include a language? If so, presumably just one (the default), or should I create a sitemap index file and put in a sitemap URL for each language option?
3) I don’t use URL translation currently, but if I did, would these answers change?

Thanks a lot for your thoughts on this.

#1

BoldMinded (Brian)

I honestly do not know how Google will handle this. Have you googled best practices for creating sitemaps on multi-lingual sites?

#2

Gavin Lawrie

OK, I was hoping to avoid that if the answers were already known. Well, it turns out to be quite simple in theory, but I would value help in practice…

Seems to be two parts to this, as regards Google.

First, it would seem that normal / typical Publisher users need to put in hreflang= statements in the meta data for every page that has multiple translations, to tell Google that there are multiple translations of the page that basically contain same content but are targeted at different language groups. The format of the required tags is as follows:

<link rel="alternate" href="http://example.com/ar/category/page" hreflang="ar" />
<link rel="alternate" href="http://example.com/fr/category/page" hreflang="fr" />
<link rel="alternate" href="http://example.com/en/category/page" hreflang="en" />
<link rel="alternate" href="http://example.com/en/category/page" hreflang="x-default" />

The above example tells Google that there are three versions of the page (AR / FR / EN) and to build search results for each language accordingly. Further, if someone searches with a language preference that is not covered by any of them (e.g. de), the x-default entry tells Google to give them results based on the /en/ version of the site.
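
To make the pattern concrete outside of EE, here is a minimal Python sketch that builds the same set of tags for one page. The function name hreflang_links and its parameters are hypothetical and not part of Publisher or EE. It assumes the language code is always the first URL segment, as in the example above.

```python
from html import escape


def hreflang_links(root_url, path, languages, default_language):
    """Return <link rel="alternate"> tags for each language, plus x-default.

    Hypothetical helper: assumes URLs look like {root}/{language}/{path}.
    """
    root = root_url.rstrip("/")
    suffix = path.strip("/")

    def url_for(code):
        # Language segment first, then the page path (if there is one).
        return f"{root}/{code}/{suffix}" if suffix else f"{root}/{code}/"

    tags = [
        f'<link rel="alternate" href="{escape(url_for(code))}" '
        f'hreflang="{escape(code)}" />'
        for code in languages
    ]
    # x-default points at the default language version of the same page.
    tags.append(
        f'<link rel="alternate" href="{escape(url_for(default_language))}" '
        f'hreflang="x-default" />'
    )
    return tags


if __name__ == "__main__":
    for tag in hreflang_links(
        "http://example.com", "category/page", ["ar", "fr", "en"], "en"
    ):
        print(tag)
```

Every translation of a page should carry the full set, including a tag that points at itself.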

Second, you need to create multilingual sitemap files that include details of alternative language versions for each page in the site map using a structure in XML that looks like this:

<url>
  <loc>http://www.example.com/en</loc>
  <xhtml:link
    rel="alternate"
    hreflang="de"
    href="http://www.example.com/de" />
  <xhtml:link
    rel="alternate"
    hreflang="en"
    href="http://www.example.com/en" />
</url>

(more on this here - http://googlewebmastercentral.blogspot.co.uk/2012/05/multilingual-and-multinational-site.html)
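
As a rough illustration of that structure, the following Python sketch builds a small multilingual sitemap with the standard library. The function name build_sitemap and its parameters are hypothetical, and it assumes the language code is the first URL segment and that the default-language URL goes into <loc>. It is a sketch of the XML shape only, not something EE or Publisher provides.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize with a default sitemap namespace and an "xhtml" prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(root_url, paths, languages, default_language):
    """Return sitemap XML with one <url> per path and one alternate per language."""
    root = root_url.rstrip("/")
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")

    for path in paths:
        suffix = path.strip("/")

        def url_for(code, suffix=suffix):
            # Assumed URL scheme: {root}/{language}/{path}
            return f"{root}/{code}/{suffix}" if suffix else f"{root}/{code}/"

        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url, f"{{{SITEMAP_NS}}}loc")
        loc.text = url_for(default_language)

        # One xhtml:link per language, including the one used in <loc>.
        for code in languages:
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": code, "href": url_for(code)},
            )

    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap("http://www.example.com", ["/", "pages/about-us"], ["de", "en"], "en"))
```

Building the tree with a library rather than by string concatenation means the xhtml:link elements are always closed and the namespaces are always declared.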

I am not sure how to tackle the sitemap generation issue - certainly appears to be beyond the EE Google Sitemap lite extension’s ability. But it should be possible to generate the meta tags for the header.

And so this is the question… what EE / Publisher markup combination can I use to generate the alternative language URLs for a page?

I can use {page_url} to get the current site page, but not sure what I can use to generate something I can use to insert the alternative pages available. Would be nice if there is a way to get publisher to loop through the defined options somehow, but if not maybe it is possible to construct the various URLs - but this would appear to entail inserting the language segment into some deconstructed version of {page_uri}… which seems complicated.

Any ideas on how to proceed?

#3

Gavin Lawrie

I’ve fixed it 😊

Have posted the solution here in case anyone else finds it helpful / a time saver.

The fix for the header is quite simple. You put this kind of code into your header:

{exp:publisher:languages}
    <link rel="alternate" href="{root_url}{short_name}{if current_path != "/"}/{/if}{current_path}" hreflang="{short_name}" />{/exp:publisher:languages}
    <link rel="alternate" href="{root_url}{publisher:default_language_code}{if current_path != "/"}/{/if}{current_path}" hreflang="x-default" />

This would appear now to be an essential requirement from Google for any multi-lingual site that has parallel translation of text and uses something like language segments to differentiate between language versions.

The fix for the sitemap file is more complicated, due to the need to do multiple levels of nested loops.

I generated the sitemap file from the NavEE definitions on the site using code like this.

The sitemap is generated by this template:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
{!-- Home --}
<url> 
<loc>{site_url}</loc>
{exp:publisher:languages}<xhtml:link 
    rel="alternate"
    hreflang="{short_name}"
    href="{root_url}{short_name}/" />
{/exp:publisher:languages} 
<lastmod>{exp:stats}{last_entry_date format="{DATE_W3C}"}{/exp:stats}</lastmod> 
<changefreq>always</changefreq> 
<priority>1.0</priority> 
</url> 
{embed="partials/sitemap_listgen" embed_nav="main_nav_upper_ar" last_update="{exp:stats}{last_entry_date format="{DATE_W3C}"}{/exp:stats}"}
{embed="partials/sitemap_listgen" embed_nav="main_nav_lower" last_update="{exp:stats}{last_entry_date format="{DATE_W3C}"}{/exp:stats}"}
{embed="partials/sitemap_listgen" embed_nav="footer_legal" last_update="{exp:stats}{last_entry_date format="{DATE_W3C}"}{/exp:stats}"}
</urlset>

This sets up the sitemap.xml file structure, generates the entry for the site URL, and then calls an embed called “sitemap_listgen” for each NavEE menu on the site.

sitemap_listgen is used to expand the NavEE menu items and looks like this:

{exp:navee:custom nav_title="{embed:embed_nav}" wrap_type="none" max_depth="2"}
{embed="partials/sitemap_inner" embed_link="{navee_link}" last_updated="{embed:last_update}"}
{if has_kids}
{kids}
{embed="partials/sitemap_inner" embed_link="{navee_link}" last_updated="{embed:last_update}"}
{/if}
{/exp:navee:custom}

This in turn calls an embed called “sitemap_inner” to generate the entry for each NavEE menu item, which looks like this:

<url> 
    <loc>{root_url}{publisher:default_language_code}{embed:embed_link}</loc> 
    {embed="partials/sitemap_embed" embed_link="{embed:embed_link}"}
    <lastmod>{embed:last_updated}</lastmod> 
    <changefreq>weekly</changefreq> 
    <priority>0.5</priority> 
</url>

This then calls another embed called “sitemap_embed” to generate the actual alternate URL entries by language, and this final embed looks like this:

{exp:publisher:languages}
        <xhtml:link 
            rel="alternate"
            hreflang="{short_name}"
            href="{root_url}{short_name}{embed:embed_link}" />{/exp:publisher:languages}
        <xhtml:link 
            rel="alternate"
            hreflang="x-default"
            href="{root_url}{publisher:default_language_code}{embed:embed_link}" />

i.e. remarkably similar to the code found in the <head> example above.

I suspect there may be clever ways to eliminate some of the nesting, but this works.

HTH someone.

#4

Gavin Lawrie

There is a typo in the code for the sitemap.xml file given above. The correct code is:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url> 
<loc>{site_url}</loc>
{exp:publisher:languages}<xhtml:link 
    rel="alternate"
    hreflang="{short_name}"
    href="{root_url}{short_name}/" />
{/exp:publisher:languages} 
<lastmod>{exp:stats}{last_entry_date format="{DATE_W3C}"}{/exp:stats}</lastmod> 
<changefreq>always</changefreq> 
<priority>1.0</priority> 
</url> 
{embed="partials/sitemap_listgen" embed_nav="main_nav_upper_ar" last_update="{exp:stats}{last_entry_date format="{DATE_W3C}"}{/exp:stats}"}
{embed="partials/sitemap_listgen" embed_nav="main_nav_lower" last_update="{exp:stats}{last_entry_date format="{DATE_W3C}"}{/exp:stats}"}
{embed="partials/sitemap_listgen" embed_nav="footer_legal" last_update="{exp:stats}{last_entry_date format="{DATE_W3C}"}{/exp:stats}"}
</urlset>
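
Because small slips in a template like this (an unclosed tag, a missing namespace declaration) are easy to make, it may be worth checking the generated output before pointing robots.txt at it. Below is a minimal Python sketch of such a check. The function name check_sitemap is hypothetical, and it only tests for well-formed XML, the sitemap namespace, a <loc> per entry, and at least one xhtml:link alternate per entry. It does not validate against the sitemap schema.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}


def check_sitemap(xml_text):
    """Return a list of human-readable problems; an empty list means none found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Unclosed tags or undeclared prefixes end up here.
        return [f"not well-formed XML: {exc}"]

    problems = []
    urls = root.findall("sm:url", NS)
    if not urls:
        problems.append("no <url> entries found in the sitemap namespace")

    for url in urls:
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        if not loc:
            problems.append("<url> entry without a <loc>")
        if not url.findall("xhtml:link", NS):
            problems.append(f"{loc or '(no loc)'}: no xhtml:link alternates")

    return problems


if __name__ == "__main__":
    import sys

    # Usage (assumed): curl -s http://example.com/en/sitemap | python check_sitemap.py
    for problem in check_sitemap(sys.stdin.read()):
        print(problem)
```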
