Thursday, July 2, 2009

SSH, backspace and UTF-8

Just great. If you type UTF-8 characters that are encoded on more than one byte (Latin letters with diacritics, Armenian, Japanese and so on) and then hit [backspace] over SSH to delete them, you might be surprised to find that backspace only deletes the last byte.

The bytes left behind are not valid UTF-8 on their own, so this can corrupt your documents and confuse any app you might be running via SSH.
Case study: the letter "ș" is encoded as "c8 99". On the host machine, backspace deletes both bytes. Via SSH, only the second one ("99") will get deleted.
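
To make the difference concrete, here is a minimal Python sketch of the two ways of erasing. The function names `erase_last_byte` and `erase_last_char` are made up for this illustration; this is not how the terminal driver or OpenSSH is implemented, just a model of the behaviour described above.

```python
def erase_last_byte(buf: bytes) -> bytes:
    """Byte-wise erase: drop exactly one byte, whatever it is."""
    return buf[:-1]


def erase_last_char(buf: bytes) -> bytes:
    """UTF-8-aware erase: skip trailing continuation bytes (10xxxxxx),
    then drop the lead byte in front of them as well."""
    i = len(buf)
    while i > 0 and (buf[i - 1] & 0xC0) == 0x80:
        i -= 1
    return buf[:max(i - 1, 0)]


typed = "testș".encode("utf-8")

print(typed.hex(" "))                   # 74 65 73 74 c8 99
print(erase_last_byte(typed).hex(" "))  # 74 65 73 74 c8
print(erase_last_char(typed).hex(" "))  # 74 65 73 74
```

The byte-wise version leaves the stray "c8" behind, which is what the SSH session below ends up writing to the file.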

Let's see what happens if you write "testș[backspace]test".

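Note: the trailing "0a" in the dumps below is the newline from pressing Enter. Ctrl+D (shown as ^d) only signals end of input and is not written to the file.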

user@host:~$ cat > test.txt
testș[backspace]test
^d
user@host:~$ hexdump -C test.txt
00000000 74 65 73 74 74 65 73 74 0a |testtest.|
00000009

Great. Now try it via SSH. Just SSH as the same user on the same machine (ssh user@localhost):

user@host:~$ ssh user@localhost
user@localhost's password:
user@host:~$ cat > test.txt
testș[backspace]test
^d
user@host:~$ hexdump -C test.txt
00000000 74 65 73 74 c8 74 65 73 74 0a |test.test.|
0000000a

Notice how

74 65 73 74 74 65 73 74 0a

turned into

74 65 73 74 c8 74 65 73 74 0a
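
As a quick sanity check, the two byte sequences above can be fed to a strict UTF-8 decoder. This is only a sketch using Python's built-in decoder; the helper name `is_valid_utf8` is made up for the example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


local = bytes.fromhex("74 65 73 74 74 65 73 74 0a")
remote = bytes.fromhex("74 65 73 74 c8 74 65 73 74 0a")

print(is_valid_utf8(local))   # True
print(is_valid_utf8(remote))  # False

# Decoding leniently shows where the damage is: the orphaned "c8"
# becomes a single U+FFFD replacement character.
print(repr(remote.decode("utf-8", errors="replace")))  # 'test\ufffdtest\n'
```

The orphaned "c8" is a lead byte that expects a continuation byte after it, so the file written over SSH is no longer valid UTF-8.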

I filed a bug report on Launchpad. I am not yet sure whether I should also take it directly to OpenSSH's bug tracker.

2 comments:

  1. Hey, aren't you clever;)

  2. Thanks for reporting this. That discussion led me to a fix: typing `stty iutf8` every time you start a session. Or, better yet:

    echo 'stty iutf8' >> ~/.bashrc
    echo 'stty iutf8' >> ~/.profile

    Or, globally as root:

    echo stty iutf8 >> /etc/profile
    echo stty iutf8 >> /etc/bash.bashrc

    Cheers!
