Thursday, 6 June 2013

'ascii' codec can't encode character u'\xe6'

Follow the development of forene.no, in pictures, here (src: morphogenetically).

A slightly technical title for this post, but then this is a slightly technical post as well. It is mainly a note to self, and to other people having problems with encoding, decoding and Unicode in Python.

Problem one
During my development of "runners.no 2.0" aka "forene.no" I ran into a Python problem, caused by the Norwegian characters "æ", "ø" and "å".

Input string:
Invitasjon til Mærraølen 2013

Output error:
Traceback (most recent call last):
  File "run.py", line 16, in <module>
    result                              = foreneno.updateFeedSources(db, feedSourcesWithFeed)
  File "C:\Users\klevstul\_miscellaneous\Dropbox\Miscellaneous\projects\foreneNo\trunk\cgi\foreneno.py", line 269, in updateFeedSources
    returnObject.two = self.rpc(db.urlInsUpdFeedEntries, "post", query_args)
  File "C:\Users\klevstul\_miscellaneous\Dropbox\Miscellaneous\projects\foreneNo\trunk\cgi\foreneno.py", line 105, in rpc
    data = urllib.urlencode(query_args)
  File "C:\prgFiles\Python27\lib\urllib.py", line 1294, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe6' in position 16: ordinal not in range(128)

Problematic code:
query_args = {
 'p_pk': src['PK']
, 'p_value1': src['feedparser_feed']['id']
, 'p_value2': src['feedparser_feed']['title']
, 'p_value3': src['feedparser_feed']['subtitle']
, 'p_value4': src['feedparser_feed']['link']
}
data = urllib.urlencode(query_args)
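
The traceback points at str(v) inside urlencode: in Python 2, calling str() on a unicode value implicitly encodes it with the ASCII codec, which cannot represent "æ". The following is a minimal sketch that reproduces the same failure outside the project. The variable names are made up for illustration, and the explicit encode("ascii") stands in for what str() does implicitly.

```python
# The title from the feed, written with escapes so this file stays pure ASCII.
title = u"Invitasjon til M\xe6rra\xf8len 2013"

try:
    # Python 2's str(title) performs the equivalent of this ASCII encode.
    title.encode("ascii")
    error = None
except UnicodeEncodeError as err:
    error = err

position = error.start                    # index of the first offending character
failing_char = error.object[error.start]  # the offending character itself
print(position, repr(failing_char))
```

The position is 16, the same as in the traceback, because "æ" is the 17th character of the title.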

Solution one
The following code fixed the problem (solution found on stackoverflow.com, once again):

query_args = {
  'p_pk': src['PK']
, 'p_value1': src['feedparser_feed']['id']
, 'p_value2': src['feedparser_feed']['title']
, 'p_value3': src['feedparser_feed']['subtitle']
, 'p_value4': src['feedparser_feed']['link']
}
str_query_args = {}
for k, v in query_args.iteritems():
    str_query_args[k] = unicode(v).encode('iso_8859_1')

data = urllib.urlencode(str_query_args)
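
The same idea can be wrapped in a small helper. This is only a sketch: encode_query_args and text_type are hypothetical names that are not part of the project, and the version check is an addition so that the snippet also runs on Python 3, where urlencode lives in urllib.parse.

```python
import sys

if sys.version_info[0] >= 3:
    from urllib.parse import urlencode
    text_type = str
else:
    from urllib import urlencode
    text_type = unicode  # only evaluated on Python 2


def encode_query_args(query_args, encoding="iso_8859_1"):
    """Return a copy of query_args with every value encoded to bytes."""
    return {k: text_type(v).encode(encoding) for k, v in query_args.items()}


args = {"p_value2": u"Invitasjon til M\xe6rra\xf8len 2013"}
data = urlencode(encode_query_args(args))
print(data)
```

Note that iso_8859_1 can only represent Western European characters. A value outside that range would raise a new UnicodeEncodeError, which is why the encoding is a parameter here.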

Problem two
The code above makes an RPC to an Oracle database. The new problem was that the Norwegian characters were replaced with squares or other weird characters: a typical symptom of mismatched encoding / decoding / character set settings.

Inserted value:
Invitasjon til M�rra�len 2013
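
Replacement characters like these typically appear when bytes written in one encoding are read back as another. The sketch below shows one common mechanism, Latin-1 bytes misread as UTF-8. It is an assumption for illustration only: the post does not show which step in the chain actually misread the bytes.

```python
title = u"Invitasjon til M\xe6rra\xf8len 2013"

# The bytes as produced by the iso_8859_1 encoding step.
sent_bytes = title.encode("iso_8859_1")

# A receiver that wrongly assumes UTF-8 cannot decode 0xE6 and 0xF8,
# so each one becomes U+FFFD, the replacement character.
misread = sent_bytes.decode("utf-8", "replace")

# A receiver that assumes the right encoding gets the original text back.
roundtrip = sent_bytes.decode("iso_8859_1")

print(repr(misread))
print(roundtrip == title)
```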

Solution two
The solution was to add the following lines to my Python RPC code module (luckily, I remembered this from earlier experience):

import os
os.environ['NLS_LANG']  = 'NORWEGIAN_NORWAY.WE8MSWIN1252'
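
An NLS_LANG value has the form LANGUAGE_TERRITORY.CHARSET, and WE8MSWIN1252 is Oracle's name for Windows code page 1252. The sketch below splits the value into its parts and checks that "æ", "ø" and "å" get the same bytes in cp1252 as in the iso_8859_1 encoding used in solution one. The variable names are illustrative. Oracle client software reads NLS_LANG from the environment, so the assignment presumably has to run before any Oracle client code is loaded; that ordering is an assumption here, since the post does not show where the lines were placed.

```python
import os

os.environ["NLS_LANG"] = "NORWEGIAN_NORWAY.WE8MSWIN1252"

# Split LANGUAGE_TERRITORY.CHARSET into its three components.
locale_part, charset = os.environ["NLS_LANG"].split(".", 1)
language, territory = locale_part.split("_", 1)

# For the Norwegian letters, cp1252 and iso_8859_1 produce identical bytes.
norwegian = u"\xe6\xf8\xe5"
same_bytes = norwegian.encode("cp1252") == norwegian.encode("iso_8859_1")

print(language, territory, charset, same_bytes)
```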


[k]


