This bug was tracked down by Sam and it was fixed in CVS on 19 June 2006: https://lists.webarch.co.uk/pipermail/mkdoc-commit/2006-June/001128.html
However, this fix introduced another bug, which Bruno found:
You can demonstrate it by pasting a UTF-8 en-dash character into a text component: ‘something – like – this’.
The content of the text component then vanishes from the page when you save it.
Reverting your commit from Mon, 19 Jun 2006 22:35:08 +0100 (BST) fixes the problem. Tested on perl-5.8.5.
The error that this puts in the log is:
== Cannot decode string with wide characters at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/Encode.pm line 166. == at /usr/local/mkdoc-1-6/flo/Component.pm line 660.
This bug means that the latest CVS version of MKDoc has a Text Component that only works with US-ASCII.
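The "Cannot decode string with wide characters" message is what Encode typically reports when it is asked to decode a string that is already a Perl character string (containing code points above 255) rather than raw UTF-8 bytes. The following is a minimal sketch of a guard against decoding twice. The helper name decode_once is hypothetical and is not part of MKDoc or of the commit discussed above.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not MKDoc code): decode UTF-8 bytes into characters,
# but leave strings that Perl already flags as character data untouched.
sub decode_once {
    my ($value) = @_;
    return $value unless defined $value;
    return $value if utf8::is_utf8($value);    # already decoded, skip
    return Encode::decode('UTF-8', $value);
}

# The en-dash from the example above, written as raw UTF-8 bytes.
my $bytes = "something \xe2\x80\x93 like \xe2\x80\x93 this";
my $chars = decode_once($bytes);    # bytes -> characters
my $again = decode_once($chars);    # second call returns its input unchanged

print length($bytes), " bytes, ", length($chars), " characters\n";
print $chars eq $again ? "second decode skipped\n" : "decoded twice\n";
```

The UTF-8 flag test is only a heuristic: it tells you how Perl stores the string internally, not where the data came from, so this is a stopgap rather than a substitute for decoding input exactly once at the boundary.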
MKDoc does not work correctly on distros with Perl 5.8.8 and Encode v2.18. These cause errors, as documented on the Fedora Core 5 page. The problem doesn't happen with older versions of Perl; see the Fedora Core 4 page for an attempt to generate the same error.
The latest version of Encode to work with MKDoc is 2.09: http://www.dan.co.jp/~dankogai/cpan/Encode-2.09.tar.gz
Workaround
Install Encode 2.09. An RPM for FC5 has been built:
http://rpms.mkdoc.com/pub/apt/fedora/linux/5/i386/RPMS.mkdoc/perl-Encode-2.09-8.i386.rpm
Install it like this:
rpm -Uvh --force perl-Encode-2.09-8.i386.rpm
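After installing, it may be worth confirming which Encode the interpreter actually picks up, since more than one copy can be present on the module search path. A small sketch, using only core Perl:

```perl
use strict;
use warnings;
use Encode ();

# Report the interpreter and Encode versions that this perl actually loads.
printf "perl %vd, Encode %s\n", $^V, Encode->VERSION;

# %INC records the file each loaded module came from.
print "Encode loaded from $INC{'Encode.pm'}\n";
```

Run it with the same perl binary that MKDoc uses, otherwise the result may describe a different installation.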
Description
This is how Sam has described the problem:
It's a bug in the Encode module, which is a core Perl module used to do utf-8 encoding and decoding. Here's the behavior that MKDoc is relying on:
$ perl -MEncode -MData::Dumper -e 'my $ref = Encode::decode_utf8({ foo => 1}); print Dumper($ref);'
$VAR1 = {
          'foo' => 1
        };
That shows that Encode::decode_utf8 is passing a reference to a hash through un-mangled. That's what I get on my laptop, running Perl 5.8.6. However, here's what's happening on the machine that exhibits this problem:
$ perl -MEncode -MData::Dumper -e 'my $ref = Encode::decode_utf8({ foo => 1}); print Dumper($ref);'
$VAR1 = 'HASH(0x9710c28)';
Now Encode::decode_utf8 is turning a hash into a stringified reference, causing the bug in MKDoc. That's Perl 5.8.8.
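One way to stop depending on how Encode treats references is to check for them before decoding. The sketch below assumes a hypothetical wrapper named decode_utf8_keep_refs. It is not MKDoc's actual fix, only an illustration of the idea.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical wrapper (not MKDoc's actual fix): only plain, defined scalars
# are handed to Encode; undef and references are returned unchanged.
sub decode_utf8_keep_refs {
    my ($value) = @_;
    return $value if !defined $value || ref $value;
    return Encode::decode_utf8($value);
}

my $ref  = decode_utf8_keep_refs({ foo => 1 });      # hash reference in
my $text = decode_utf8_keep_refs("caf\xc3\xa9");     # UTF-8 bytes in

print ref($ref), "\n";        # reference type of the returned value
print length($text), "\n";    # length in characters after decoding
```

With a guard like this, the result for references no longer varies with the installed Encode version, because Encode never sees them.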
Sam has raised this at PerlMonks: http://perlmonks.org/?node_id=556003
and in the perl.unicode Usenet group: http://www.nntp.perl.org/group/perl.unicode/3009 | http://groups.google.com/group/perl.unicode/browse_thread/thread/623802a9a29ca220