
The multi-paths of Unicode in the blogosphere

Written by daveski on April 4, 2012

Thanks to Steven L on Facebook for bringing this to my attention: Matt Meyer at ReignDesign narrates his presentation “Love Hotels and Unicode” from a recent tech “unconference” in Shanghai. After reviewing some of the convergences that led to a single standard for representing many of the world’s scripts and symbols, he showcases some of the problems that remain with Unicode, especially the cultural politics of the various symbols and emoticons (the “emoji” that are such important fixtures in chatrooms and mobile phones) that have made their way into Unicode 6.0.

Super interesting stuff, but as Deborah Anderson argued a while ago through her work with the Berkeley Linguistics Dep’t, we might want to think more about the dozens of scripts that haven’t yet been brought into Unicode before we spend too much time worrying whether FACE WITH STUCK-OUT TONGUE AND WINKING EYE means the same thing as FACE WITH STUCK-OUT TONGUE.

😛
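For the technically curious: Unicode really does treat these two faces as separate characters with separate code points, not as styled variants of one character. A minimal sketch, assuming Python 3 (whose standard `unicodedata` module ships Unicode 6.0 or later data), that looks up each character's official name; the helper `describe` is just an illustrative name, not part of any library:

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point (U+XXXX) and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The two emoji contrasted in the post.
for face in ["\U0001F61B", "\U0001F61C"]:
    print(describe(face))

# Prints:
# U+1F61B FACE WITH STUCK-OUT TONGUE
# U+1F61C FACE WITH STUCK-OUT TONGUE AND WINKING EYE
```

Because they are distinct code points, whether they "mean the same thing" is a question the encoding itself never answers; it only guarantees they are different characters.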



2 Responses to “The multi-paths of Unicode in the blogosphere”

  1. Youki on: 4 April 2012 at 11:32 am

    Very interesting stuff! For me, the type-token distinction comes to mind. For example, the character “i” can be expressed in many different ways (typeface, color, size, whether it’s italicized/bold/underlined). Unicode retains character information at the abstract level, and it’s only at the user level (the level of the medium) that the letter “i” is produced as the “i” that we see. The metadata is added through the medium, not through the encoding of the character in Unicode.

    Since we can express a type “i” as an infinite number of token “i”s (that’s plural “i”, as unwieldy as that looks), we’ve accepted that Unicode doesn’t have to be responsible for including each manifestation of a character. It is only responsible for including the abstract “idea” of a character, which is then transformed and made visible with metadata on the part of the medium. So we have to accept that metadata plays an integral role in how characters are represented.

    So, my attempt at thinking this through: the parallel I see is that FACE WITH STUCK-OUT TONGUE AND WINKING EYE vs. FACE WITH STUCK-OUT TONGUE is somewhat similar to “i” with serifs vs. “i” sans-serif (with the serif being the winking eye). Unicode isn’t responsible for the infinite variations of the letter “i”; it’s just responsible for the single abstract character (and maybe some basic font information, but not to the degree of font size or color or where each pixel lies). The computer and the user do the rest. So I’d keep an abstract representation of each general emoji (of which there may be millions, with more added every year), but leave it abstract enough that the medium can combine it with metadata to make it visible. So instead of a million face-eye-nose-mouth-hair-ear-cat-whiskers-teardrop combinations, just have the general ones and leave the more nuanced modifications to the client. “Face with stuck-out tongue” would be the code, and it would then be modified with a winking eye at the level of the medium, the same way an “i” is serif or sans-serif at the level of the computer, not of Unicode.

    The fundamental paradox is that characters (like those of an alphabet) allow the abstraction and parsing of words, but assigning a separate code to every icon that will ever exist is like trying to make a code for every word that exists. So the real solution is to find an iconographic “alphabet”, which I don’t think is possible (an “i” is pretty much always an “i” no matter what font you use, but an icon of an “eye” has a different meaning depending on whether it’s closed, winking, blue, or crossed). It’s possible to encode the data at the pixel level, but that defeats the purpose of Unicode. I’m curious how this gets resolved.

  2. daveski on: 5 April 2012 at 8:54 am

    Yeah, a huge dilemma, isn’t it? That said, emoticons, in my view, bring to the fore an issue that can (relatively?) stay below the surface in the rendering of the basic templates, or what you call the “abstract ‘idea’ of a character”. As in Saussurean linguistics, with its “acoustic images” that we assume keep words articulated in actual speech from being confused with one another, the important thing about “i”s is that they are recognizably distinct from “t”s and “l”s and the rest. But with images that depend both on iconic resemblance to expressions/animals/objects/etc. in the real world and *also* on their cultural salience as abstract symbols (kind of like the onomatopoeia that Saussure sets aside as outside his concern with the system of “language”, as I remember it anyway), the situation gets a lot trickier. Who’s to say whether sticking one’s tongue out while winking is a variant of, equivalent to, or something totally different in meaning from sticking one’s tongue out with both eyes open? Whether chopsticks belong in a bowl of noodles, or, indeed, whether noodles in a bowl are a totally different kind of noodles from noodles on a plate? And how do these images even correlate with the things we’re used to calling (and encoding as) “languages”?

    I’m very curious too. Fascinating stuff!


  Copyright ©2009 Found in Translation. All rights reserved.