
A Guide to Unicode Characters and Message Encoding

Unicode Standard Explained 

 

The Unicode Standard is a universal character encoding standard that assigns every character a unique code point, so text is represented consistently across digital platforms. The GSM alphabet covers only basic Latin letters, numbers, and a limited set of symbols. Unicode, by contrast, supports characters from languages around the world, including scripts like Chinese and Thai. It also covers technical symbols, pictographs, and emojis, making it essential for modern, multilingual communication.

 

Understanding Unicode Messages

 

A Unicode message is an SMS whose text is encoded with Unicode rather than the GSM alphabet. If a message includes even a single character outside the GSM alphabet, such as an emoji, a curly quote, or a non-Latin script, the entire message must be sent as Unicode so that those characters display correctly.
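The check described above can be sketched in a few lines. This is a minimal illustration, assuming the standard GSM 03.38 default alphabet and its extension table; the names `GSM7_BASIC`, `GSM7_EXTENDED`, and `needs_unicode` are hypothetical, and a particular carrier or gateway may accept a different character set.

```python
# Assumed character set: the GSM 03.38 default alphabet plus its extension
# table. A given gateway may support fewer (or different) characters.
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
GSM7_EXTENDED = "^{}\\[~]|€"
GSM7_ALLOWED = set(GSM7_BASIC + GSM7_EXTENDED)


def needs_unicode(message: str) -> bool:
    """Return True if any character falls outside the assumed GSM alphabet."""
    return any(ch not in GSM7_ALLOWED for ch in message)


def non_gsm_characters(message: str) -> list[str]:
    """List the distinct characters that would force Unicode encoding."""
    return sorted({ch for ch in message if ch not in GSM7_ALLOWED})
```

Running `non_gsm_characters` on a draft before sending is one way to spot a stray curly quote or emoji that would otherwise switch the whole message to Unicode.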

 

What’s the character limit for a Unicode message?

 

A single Unicode SMS can hold up to 70 characters.

When a Unicode message exceeds 70 characters, it is divided into multiple SMS segments, and each segment can contain only 67 characters. This is because the space of 3 characters in every segment is reserved for the information that links the message parts in the correct sequence.
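The 70 and 67 character limits above translate into a simple segment estimate. The sketch below assumes that length is counted in UTF-16 code units, so characters outside the Basic Multilingual Plane (many emojis) count as two; the function names are hypothetical, and a gateway may count or split slightly differently.

```python
import math

SINGLE_LIMIT = 70  # characters in a single Unicode SMS
MULTI_LIMIT = 67   # characters per segment once the message is split


def ucs2_length(message: str) -> int:
    """Count UTF-16 code units (assumption: this is how the limit is measured)."""
    return len(message.encode("utf-16-be")) // 2


def unicode_segments(message: str) -> int:
    """Estimate how many SMS segments a Unicode message will occupy."""
    length = ucs2_length(message)
    if length <= SINGLE_LIMIT:
        return 1
    return math.ceil(length / MULTI_LIMIT)
```

For example, a 71-character Unicode message needs 2 segments under these limits, and a 135-character message needs 3.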

 

What are some examples of Unicode characters?

 

Unicode Symbol    Name
‘ ’               Apostrophe (curly single quotation marks)
“ ”               Double quotation marks
`                 Grave accent
–                 Hyphen/dash
&                 Ampersand
ß                 Eszett (German)
¶                 Pilcrow (English document formatting and footnotes)
©                 Copyright
Ω                 Omega (Greek)
÷                 Division
∞                 Infinity
Ñ                 N with tilde (Latin, Spanish)
’                 Curly apostrophe (can be replaced with a straight apostrophe ')
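Since a curly apostrophe can be swapped for a plain one, the same idea extends to other "smart" punctuation. The following is a rough sketch only: the replacement map `SMART_TO_PLAIN` and the helper `plain_punctuation` are hypothetical and cover just a few common cases.

```python
# Hypothetical replacement map; extend it for the characters your content uses.
SMART_TO_PLAIN = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / curly apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}

_TRANSLATION = str.maketrans(SMART_TO_PLAIN)


def plain_punctuation(message: str) -> str:
    """Replace common word-processor punctuation with plain equivalents."""
    return message.translate(_TRANSLATION)
```

Characters with no plain equivalent (for example © or ∞) are left untouched, so a message containing them would still be sent as Unicode.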

 

Helpful Tips About Unicode Characters

  1. Unicode support depends on the carrier.
  2. When sending messages through our HTTP API, Unicode content must be triple encoded to ensure proper interpretation.
  3. When composing a message in CXT or through an API integration, avoid pasting content from programs such as MS Word, which often substitute Unicode characters like curly quotes automatically. To prevent unintentional inclusion of Unicode characters, type the message directly into CXT or your application.
  4. When transmitting Unicode characters over SMPP, set the Data Coding Scheme (DCS) value to 8 so that the message is handled as Unicode.
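To illustrate tip 4, here is a minimal sketch of preparing a Unicode payload for SMPP. It assumes the message body is sent as big-endian UTF-16 without a byte order mark, which is what DCS 8 conventionally indicates; `encode_for_smpp` is a hypothetical helper, not part of any SMPP library, and how the two values are passed on depends on the client you use.

```python
DCS_UCS2 = 0x08  # Data Coding Scheme value for Unicode, as stated in tip 4


def encode_for_smpp(message: str) -> tuple[int, bytes]:
    """Return the DCS value and the encoded message body.

    Assumption: the SMSC expects big-endian UTF-16 with no byte order mark.
    """
    return DCS_UCS2, message.encode("utf-16-be")


# Example: each character in this message becomes two bytes.
dcs, payload = encode_for_smpp("Hi Ω")
```

Here `dcs` is 8 and `payload` is 8 bytes long, two bytes for each of the four characters.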