[Accessforall] Codes for languages in ISO 24751 and the registry

Gregg Vanderheiden gv at trace.wisc.edu
Thu Oct 4 15:23:10 EDT 2012


Great discussion

We need to have someone who will own this issue and manage it through to resolution.  

Christophe, can you take ownership of this and work with everyone to find a resolution?


Gregg
--------------------------------------------------------
Gregg Vanderheiden Ph.D.
Director Trace R&D Center
Professor Industrial & Systems Engineering
and Biomedical Engineering
University of Wisconsin-Madison

Technical Director - Cloud4all Project - http://Cloud4all.info
Co-Director, Raising the Floor - International
and the Global Public Inclusive Infrastructure Project
http://Raisingthefloor.org   ---   http://GPII.net


On Oct 4, 2012, at 6:48 AM, Christophe Strobbe <strobbe at hdm-stuttgart.de> wrote:

> 
> A few things to bear in mind before making this decision:
> 1. ISO 639-2 (or any other part of ISO 639) just covers the codes for the
> identification of languages, not subcodes for countries, scripts, etc.
> 2. IETF RFC 4646 describes how to combine ISO 639 language codes with ISO
> 3166 country codes (and other optional subtags), but prefers two-letter
> language codes over three-letter codes if the former type of code is
> available. That would give us en-CA instead of eng-CA. So if we want
> to use codes like en-CA, we should refer to IETF RFC 4646; in order to use
> tags like eng-CA, we would need to invent our own "standard" for language
> codes. If we prefer IETF RFC 4646 tags, we will need to check if ISO
> standards can use IETF RFCs as normative references.
> 3. The two-letter language code is what you find in HTML pages, the
> OpenDocument format, and many other formats. That might be the reason why
> this type of code was in the sample preference sets. If we use
> three-letter codes, some parts of the GPII/Cloud4all architecture will
> need to refer to a table that maps two-letter codes to three-letter codes,
> because the two-letter codes seem to be the dominant convention (but that
> might change; e.g. Dublin Core seems to accept both types of codes).
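
A rough sketch of the mapping table mentioned in point 3, in Python. The table is only an excerpt, and the function names are made up for illustration; nothing here is taken from the GPII/Cloud4all code. The three-letter column uses the ISO 639-2/T codes, with the 639-2/B variants folded into them.

```python
# Illustrative excerpt only; a real table would cover every language
# that has both a two-letter and a three-letter code.
TWO_TO_THREE = {
    "en": "eng",
    "nl": "nld",
    "fr": "fra",
    "de": "deu",
}
THREE_TO_TWO = {three: two for two, three in TWO_TO_THREE.items()}

# ISO 639-2/B (bibliographic) variants mapped to their 639-2/T equivalents.
B_TO_T = {"dut": "nld", "fre": "fra", "ger": "deu"}


def to_three_letter(tag):
    """Turn 'en-CA' into 'eng-CA'; three-letter input is normalised to 639-2/T."""
    language, sep, rest = tag.partition("-")
    language = language.lower()
    if len(language) == 2:
        language = TWO_TO_THREE[language]  # KeyError if not in the excerpt
    else:
        language = B_TO_T.get(language, language)
    return language + sep + rest


def to_shortest(tag):
    """Turn 'eng-CA' into 'en-CA', preferring the two-letter code when one exists."""
    language, sep, rest = tag.partition("-")
    language = language.lower()
    language = B_TO_T.get(language, language)
    language = THREE_TO_TWO.get(language, language)
    return language + sep + rest


print(to_three_letter("en-CA"))
print(to_shortest("dut-BE"))
```

Whichever form is stored in the preference sets, a table like this would let components that expect the other form convert at the boundary.
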
> 
> 
> I am not speaking against using codes like eng-CA, but we should know what
> the impact of this decision would be.
> 
> 
> Best regards,
> 
> Christophe
> 
> On Thu, 4 Oct 2012, 07:18, Gregg Vanderheiden wrote:
>> OK
>> 
>> Does anyone want to SPEAK AGAINST doing as Colin outlined, which seems to
>> be in line with everyone else's comments?
>> 
>> If so, please post any counter thoughts in the next few days. I think we
>> have everyone on the two attached lists, so we can make a decision if
>> there are no counter proposals to consider.
>> 
>> thanks
>> 
>> 
>> Gregg
>> 
>> On Oct 3, 2012, at 10:44 PM, Colin Clark <colinbdclark at gmail.com> wrote:
>> 
>>> Hi all,
>>> 
>>> We should be using ISO 639-2 language codes throughout the system. If
>>> not, it's a bug.
>>> 
>>> If I remember correctly, this was probably introduced by the UI Options
>>> team who were integrating at very short notice with the GPII framework.
>>> I believe UI Options can support both two- and three-character language
>>> codes (as is often the case).
>>> 
>>> As a speaker of "eng-CA", I don't see any reason not to simply use ISO
>>> 639-2 from the start and to also support country codes, as Christophe
>>> suggests. I also think it's probably worth supporting the two-character
>>> subset for interoperability if possible.
>>> 
>>> Colin
>>> 
>>> On 2012-10-03, at 1:18 PM, Gregg Vanderheiden wrote:
>>> 
>>>> I think that having language and country codes is a great idea.
>>>> 
>>>> We DO need to decide which codes to use. I think the square brackets
>>>> were there because an official decision had not been made yet.
>>>> 
>>>> But I think using the ISO codes for both would be the right thing to
>>>> do. I added the arch list to see if someone knows why two-letter
>>>> codes are currently used. (W3C?)
>>>> 
>>>> We also should say something like  "if no country is specified then
>>>> ...."
>>>> (is there a default country for all languages specified somewhere?)
>>>> we might say the country of origin -- but I'm not sure all languages
>>>> have an (existing) country of origin anymore.
>>>> 
>>>> Good catch Christophe.
>>>> Let's get a decision and then record it in the Glossary.
>>>> 
>>>> I wonder if we should have a decision registry somewhere since we have
>>>> so many people involved.
>>>> 
>>>> 
>>>> Gregg
>>>> 
>>>> On Oct 3, 2012, at 11:43 AM, Christophe Strobbe
>>>> <christophestrobbe at yahoo.co.uk> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> While creating a preference set for one of the personas in the
>>>>> Cloud4all smarthouse simulation
>>>>> <http://wiki.gpii.net/index.php/SmartHouses_Preference_Sets>, I looked
>>>>> into language codes and found the following:
>>>>> (1) ISO/IEC 24751:2008 (all subparts) refer to ISO 639-2:1998 for
>>>>> language codes. In the registry, the value space for "language" is
>>>>> [ISO 639-2/T] (I don't know the reason for the square brackets).
>>>>> According to <https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes>
>>>>> and <http://www.loc.gov/standards/iso639-2/php/code_list.php>, the ISO
>>>>> 639-2 codes are three-letter codes (e.g. "eng" for English, "dut" or
>>>>> "nld" for Dutch, "fre" or "fra" for French, etc). However, the JSON
>>>>> preference sets I've seen so far (I mean those by the GPII/Cloud4all
>>>>> Architecture team) use two-letter codes (see Carla's, Nisha's and
>>>>> Timothy's preference sets). Am I misreading the information I found
>>>>> about ISO 639-2?
>>>>> (2) Related to this is the absence of country information, i.e.
>>>>> combining a language code with a country code from ISO 3166 (see
>>>>> <http://www.loc.gov/standards/iso639-2/faq.html#22>). This is relevant
>>>>> to text-to-speech engines and Braille. For example for Dutch, not many
>>>>> people in Flanders are keen on TTS that uses pronunciation rules from
>>>>> the Netherlands. Braille conventions also vary between countries that
>>>>> use the same official language (well, they even vary between Braille
>>>>> centres, but let's not go into that).
>>>>> (3) Note that IETF RFC 4646 <http://tools.ietf.org/html/rfc4646> gives
>>>>> preference to the shortest ISO 639 code (two or three letters) that is
>>>>> available for a language (check the ABNF syntax under
>>>>> <http://tools.ietf.org/html/rfc4646#section-2.1>). This base code can
>>>>> then be combined with an ISO 3166 country code, to create tags like
>>>>> en-US (American English) and en-GB (British English). However, IETF
>>>>> RFC 4646 is referenced neither by ISO 24751 nor by the registry.
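
To make point (3) concrete, here is a minimal Python sketch that splits a tag into language and country and lists fallback candidates, so that a request for nl-BE can fall back to plain nl. It handles only the language and country subtags, not the other optional subtags that RFC 4646 allows, and the function names are assumptions, not part of any existing API.

```python
import re

# Language: two or three letters. Country: optional two-letter ISO 3166 code.
# Script, variant and other RFC 4646 subtags are deliberately not handled.
_TAG = re.compile(r"([A-Za-z]{2,3})(?:-([A-Za-z]{2}))?")


def parse_tag(tag):
    """Return (language, country) with conventional casing; country may be None."""
    match = _TAG.fullmatch(tag)
    if match is None:
        raise ValueError("unsupported language tag: %r" % tag)
    language, country = match.groups()
    return language.lower(), (country.upper() if country else None)


def candidates(tag):
    """List the tags to try, most specific first."""
    language, country = parse_tag(tag)
    if country is None:
        return [language]
    return [language + "-" + country, language]


print(parse_tag("en-us"))
print(candidates("nl-BE"))
```

A TTS or Braille component could walk the candidate list and use the first entry it has resources for, which keeps Flemish users on nl-BE whenever it is available.
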
>>>>> 
>>>>> Best regards,
>>>>> 
>>>>> Christophe Strobbe
>>>>> 
>>>>> _______________________________________________
>>>>> Accessforall mailing list
>>>>> Accessforall at fluidproject.org
>>>>> http://lists.idrc.ocad.ca/cgi-bin/mailman/listinfo/accessforall
>>>> 
>>> 
>>> ---
>>> Colin Clark
>>> Technical Lead, Fluid Project
>>> http://fluidproject.org
>>> 
>> 
>> 
> 
> 
> -- 
> Christophe Strobbe
> Akademischer Mitarbeiter
> Adaptive User Interfaces Research Group
> Hochschule der Medien
> Nobelstraße 10
> 70569 Stuttgart
> Tel. +49 711 8923 2749
> 
