SOAP: The Power Of Simplicity 
O'Reilly P2P Conference 2001
Copyright (C) 2001 Paul Kulchenko, Tony Hong

SOAP::Lite, Basic concepts: Internationalization

You're lucky if you have never had to think about sending or receiving non-English characters. If you do need to send them, you usually face several questions: which encoding to choose, how to convert your data into that encoding, and how to specify the encoding on the wire. We will address each of these.

When sending international characters, your best bet is to stay with the UTF-8 or iso-8859-1 encodings. The advantages of UTF-8 are well known: it is compact compared to other Unicode-based encodings for mostly-ASCII text, it uses one byte to encode each ASCII character, and every XML parser is required to know how to deal with it.
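To make the size argument concrete, here is a small sketch that encodes the same ASCII string in three Unicode encodings and compares the byte counts. It assumes Perl 5.8 or later, where the Encode module ships with the core; the sample string and variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A plain ASCII sample string (illustrative only)
my $text = "hello";

# Encode the same characters in three Unicode encodings.
# The BE variants are used so that no byte-order mark is added.
my $utf8  = encode('UTF-8',    $text);
my $utf16 = encode('UTF-16BE', $text);
my $utf32 = encode('UTF-32BE', $text);

# Without "use bytes" or the UTF-8 flag, length() counts bytes here
printf "UTF-8: %d, UTF-16: %d, UTF-32: %d bytes\n",
    length($utf8), length($utf16), length($utf32);
# prints: UTF-8: 5, UTF-16: 10, UTF-32: 20 bytes
```

For text that is mostly non-ASCII the gap narrows, since UTF-8 then needs two or more bytes per character.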

Converting from iso-8859-1 to UTF-8:

  
  use SOAP::Lite;
  # convert to UTF-8 (Perl 5.6 or later): each byte of the input
  # is treated as one iso-8859-1 character
  my $utf8 = pack('U*', unpack('C*', 'привет'));
  # specify the type explicitly; otherwise the value is encoded as base64
  my $string = SOAP::Data->type(string => $utf8);

The Unicode::Map8, Unicode::String, or Encode modules can be used to transcode between different encodings.
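As one possible illustration of transcoding with a module, the sketch below uses Encode (bundled with Perl 5.8 and later, so not available on a stock 5.6) to go from iso-8859-1 bytes to UTF-8 bytes in two explicit steps. The sample data is hypothetical and written with byte escapes so the result does not depend on how the source file is saved.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "cafe" with an e-acute, as iso-8859-1 bytes: the accented
# letter is the single byte 0xE9 (sample data, illustrative only)
my $latin1_bytes = "caf\xE9";

# Step 1: decode the bytes into a Perl character string
my $chars = decode('iso-8859-1', $latin1_bytes);

# Step 2: encode the characters as UTF-8 bytes for the wire
my $utf8_bytes = encode('UTF-8', $chars);

printf "%d bytes in iso-8859-1, %d bytes in UTF-8\n",
    length($latin1_bytes), length($utf8_bytes);
# prints: 4 bytes in iso-8859-1, 5 bytes in UTF-8
```

Keeping the decode and encode steps separate makes it easy to swap in a different source encoding by changing only the first argument to decode.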

Specifying the encoding on the wire:

  
  use SOAP::Lite;
  # specify the type explicitly; otherwise the value is encoded as base64
  my $string = SOAP::Data->type(string => 'привет');
  my $result = SOAP::Lite
    -> proxy (...)
    -> uri (...)
    -> encoding('iso-8859-1') # specify encoding, because default is UTF8
    -> hello($string)
    -> result;

You don't need to do anything magical on the receiving side: XML::Parser, which SOAP::Lite uses by default, will always give you data in UTF-8. There is one catch, though. The result string is encoded as UTF-8, but Perl does not mark it as a UTF-8 string. To fix this, use $result = pack 'U0A*', $result or something similar. After that, Perl will know that the string is UTF-8. Future versions of XML::Parser or SOAP::Lite may do this for you.
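The effect of marking the string can be seen by comparing lengths before and after. The sketch below is an alternative to the pack idiom, not what SOAP::Lite itself does: it assumes Perl 5.8 or later and uses decode from the Encode module. The $result value is a hypothetical stand-in for what a SOAP call would return, namely the raw UTF-8 bytes of a six-letter Cyrillic word.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for a SOAP call result: raw UTF-8 bytes of a
# six-letter Cyrillic word (hypothetical data, two bytes per letter)
my $result = "\xD0\xBF\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";

# Perl sees an unmarked byte string, so it counts bytes
my $byte_count = length($result);

# Decode the bytes into a character string marked as UTF-8
my $text = decode('UTF-8', $result);

# Perl now counts characters
my $char_count = length($text);

print "$byte_count bytes, $char_count characters\n";
# prints: 12 bytes, 6 characters
```

Once the string is decoded, functions such as length, substr, and regular expressions operate on characters rather than on individual bytes.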

You can find more information and details in the 'Perl, Unicode and i18N FAQ' at http://rf.net/~james/perli18n.html and the 'Unicode FAQ' at http://www.unicode.org/unicode/faq/