#117 (UTF-8 in IMR) – HoverRace

Ticket #117 (new task)

Opened 13 months ago

Last modified 10 months ago

UTF-8 in IMR

Reported by: zoogie
Priority: critical
Milestone: HoverRace 1.24
Component: IMR
Keywords: i18n

Description

This is just a task to make sure the IMR (server-side) code is UTF-8 safe so that we can handle i18n-ized text in chat (although other parts will be restricted to ASCII for security reasons).

Change History

Changed 13 months ago by Austin

If there's anything I need to do, just let me know.

Changed 13 months ago by zoogie

Basically, we need to ensure that:

  • Chat text encoded in UTF-8 is properly handled (i.e., character counting, logging, storage, etc.)
  • Any chat text returned from the server is UTF-8 encoded (not Latin-1, not Windows-1252). This includes text generated by the IMR itself: the welcome banner, server messages, and decorations for chat lines (not sure if that last one is actually handled by the IMR).
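For the first point, the key distinction is counting and truncating by character rather than by byte. Here is a minimal Python sketch, assuming the IMR receives chat as raw bytes. The helpers `is_valid_utf8`, `chat_length` and `truncate_chat` are hypothetical names, and "character" here means Unicode code point, which can differ from what a user sees when combining marks are involved.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is a well-formed UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")  # strict by default: malformed input raises
    except UnicodeDecodeError:
        return False
    return True


def chat_length(raw: bytes) -> int:
    """Length in code points, not bytes ("»" is 1 code point but 2 bytes)."""
    return len(raw.decode("utf-8"))


def truncate_chat(raw: bytes, max_chars: int) -> bytes:
    """Cut to at most max_chars code points without splitting a multi-byte
    sequence, then re-encode as UTF-8 for logging or storage."""
    return raw.decode("utf-8")[:max_chars].encode("utf-8")


if __name__ == "__main__":
    msg = "Grüße »".encode("utf-8")
    print(len(msg))                # 10 (bytes)
    print(chat_length(msg))        # 7 (code points)
    print(truncate_chat(msg, 3))   # b'Gr\xc3\xbc'
    print(is_valid_utf8(b"\xbb"))  # False: a lone Latin-1 "»" is not UTF-8
```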

Ideally, the IMR should detect if it's communicating with a 1.23.x client or a 1.24+ client and make sure only ASCII or UTF-8 is sent, respectively.
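Once the client type is known, this requirement reduces to choosing an output encoding for each connection. The sketch below is a hypothetical helper (`encode_for_client` is not from the IMR). It assumes that replacing non-ASCII characters with "?" is an acceptable fallback for 1.23.x clients. The ticket does not specify a fallback.

```python
def encode_for_client(text: str, utf8_client: bool) -> bytes:
    """Encode server-originated text for one client connection.

    1.24+ clients get UTF-8. Older clients get ASCII only; the "?"
    substitution for non-ASCII characters is an assumed fallback.
    """
    if utf8_client:
        return text.encode("utf-8")
    # errors="replace" substitutes b"?" for each unencodable character
    return text.encode("ascii", errors="replace")


if __name__ == "__main__":
    banner = "Welcome, Zoë!"
    print(encode_for_client(banner, True))   # b'Welcome, Zo\xc3\xab!'
    print(encode_for_client(banner, False))  # b'Welcome, Zo?!'
```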

Changed 13 months ago by zoogie

One thing that may be helpful for that last part: The client now properly embeds the version in the User-Agent string:

User-Agent: HoverRace/1.24 (Win32)

Whereas in 1.23.1 and earlier it looked like:

User-Agent: HoverRace/0.1
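Given those two formats, a minimal version check could parse the header value and compare it numerically. This Python sketch assumes the IMR can see the raw User-Agent value. The names `client_version` and `supports_utf8` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Value of the User-Agent header, e.g. "HoverRace/1.24 (Win32)" or "HoverRace/0.1"
_UA_RE = re.compile(r"^HoverRace/(\d+)\.(\d+)")


def client_version(user_agent: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from a HoverRace User-Agent, or None."""
    m = _UA_RE.match(user_agent)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def supports_utf8(user_agent: str) -> bool:
    """True for 1.24+ clients. Clients up to 1.23.1 report "HoverRace/0.1",
    so they (and any unrecognized agent) fall through to False."""
    version = client_version(user_agent)
    return version is not None and version >= (1, 24)


if __name__ == "__main__":
    print(supports_utf8("HoverRace/1.24 (Win32)"))  # True
    print(supports_utf8("HoverRace/0.1"))           # False
```

The version is compared as a tuple of integers rather than as a string. With string comparison, "1.3" would sort after "1.24".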

Changed 13 months ago by ryan

Version 1.24 uses UTF-8 for chat messages. However, the "»" character (U+00BB) is Latin-1 encoded in the strings returned by the IMR (that is, the single byte 0xBB), which will not display in 1.24. The UTF-8 encoding of the same character is the two bytes 0xC2 0xBB. Could something like this be done?

if (client version >= 1.24)
  returnmessage = "username" + 0xC2 0xBB + " chat"   // UTF-8 encoding of U+00BB (»)
else
  returnmessage = "username" + 0xBB + " chat"        // Latin-1 encoding of U+00BB (»)

That's not really great pseudocode, but essentially the IMR should send the Latin-1 encoding of » to clients older than 1.24, and the UTF-8 encoding to 1.24 and later.
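A runnable version of that pseudocode, sketched in Python. `format_chat_line` and its parameters are hypothetical, not the IMR's actual code. The sketch builds the line once as text and lets the encoder produce the bytes, so neither byte sequence is hard-coded.

```python
RAQUO = "\u00bb"  # "»", RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK


def format_chat_line(username: str, chat: str, utf8_client: bool) -> bytes:
    """Build "username» chat" and encode it for the client's version."""
    line = f"{username}{RAQUO} {chat}"
    if utf8_client:
        # 1.24+: "»" becomes the two bytes 0xC2 0xBB
        return line.encode("utf-8")
    # Pre-1.24: "»" becomes the single byte 0xBB; anything Latin-1
    # cannot represent is replaced with "?"
    return line.encode("latin-1", errors="replace")


if __name__ == "__main__":
    print(format_chat_line("ryan", "hi", True))   # b'ryan\xc2\xbb hi'
    print(format_chat_line("ryan", "hi", False))  # b'ryan\xbb hi'
```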

Changed 10 months ago by Austin

Initial testing of UTF-8 text indicates there are no issues displaying foreign characters in the IMR, apart from possibly some odd effects on the font.

Added utf8_encode("»") for the » used in chat messages; that change is in [573].
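PHP's utf8_encode() converts a string from ISO-8859-1 to UTF-8. Therefore utf8_encode("»") produces 0xC2 0xBB only if the "»" literal is stored as Latin-1 in the source file. The Python analogue below is for illustration only, and `latin1_to_utf8` is a hypothetical name. It shows both the intended result and the double-encoding that occurs when the source file is already saved as UTF-8.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Python analogue of PHP's utf8_encode(): reinterpret bytes as
    ISO-8859-1 and re-encode them as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    # Source file saved as Latin-1: "»" is the single byte 0xBB.
    print(latin1_to_utf8(b"\xbb"))      # b'\xc2\xbb' (correct)
    # Source file saved as UTF-8: "»" is already C2 BB, so it gets double-encoded.
    print(latin1_to_utf8(b"\xc2\xbb"))  # b'\xc3\x82\xc2\xbb' (mojibake "Â»")
```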

Changed 10 months ago by zoogie

TODO: Need to check on UTF-8 in track names and user names.
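For this check, one option is to audit the existing track and user names by sorting each one into three buckets: printable ASCII, other valid UTF-8, or malformed bytes. This sketch assumes the names can be read back as raw bytes, and `classify_name` is hypothetical. Whether the "utf8" bucket is acceptable depends on the ASCII-only restriction for non-chat fields mentioned in the description.

```python
def is_printable_ascii(raw: bytes) -> bool:
    """True if every byte is printable ASCII (0x20 through 0x7E)."""
    return all(0x20 <= b <= 0x7E for b in raw)


def classify_name(raw: bytes) -> str:
    """Return "ascii", "utf8" (valid but non-ASCII), or "invalid"."""
    if is_printable_ascii(raw):
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid"
    return "utf8"


if __name__ == "__main__":
    for name in (b"zoogie", "Zoë".encode("utf-8"), b"Zo\xeb"):
        print(name, classify_name(name))
```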
