uchar_t is confusing

ReignBough · ‎2014-10-21

When I first saw the uchar_t data type, I assumed it was an unsigned char. But when I looked it up in Definitions.hpp, I found that it is:

typedef unsigned short              uchar_t;        // 2-byte unicode charater (UTF16)

Why?????? This is so confusing.

~ReignBough~
ARCHICAD 26 INT (from AC18)
Windows 11 Pro, AMD Ryzen 7, 3.20GHz, 32.0GB RAM, 64-bit OS

Ralph Wessel · ‎2014-10-21

ReignBough wrote:
When I first saw the uchar_t data type, I assumed it was an unsigned char. But when I looked it up in Definitions.hpp, I found that it is:
typedef unsigned short              uchar_t;        // 2-byte unicode charater (UTF16)
Why?????? This is so confusing.

I'm not sure why this is confusing, but I assume you've not worked with any text encoding other than ASCII? Early text encodings of this type used only 1 byte (8 bits) per character, so the types char or unsigned char equated to a single character. However, this isn't suitable for many languages that have far more than 256 characters.

uchar_t targets UTF-16, which is built from 16-bit code units, hence the unsigned short type. For more information about text encoding, take a look here
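To make the point about character counts concrete, here is a minimal sketch. The typedef is copied from the header line quoted above; the helper names (FitsInOneByte, Utf16UnitsNeeded, CodeUnitCount) are hypothetical and not part of any SDK. It shows that a code point above 255 cannot be stored in a single 8-bit char, while most characters fit in one 16-bit unit and the rest take two.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Stand-in for the typedef quoted from the header earlier in the thread.
typedef unsigned short uchar_t;

// Holds on mainstream desktop compilers; the C++ standard itself only
// guarantees that unsigned short is at least 16 bits wide.
static_assert(sizeof(uchar_t) * CHAR_BIT == 16, "uchar_t is expected to be 16 bits here");

// Hypothetical helper: can this Unicode code point be stored in one 8-bit char?
bool FitsInOneByte(char32_t codePoint) { return codePoint <= 0xFF; }

// Hypothetical helper: number of 16-bit code units UTF-16 needs for a valid
// code point (one inside the Basic Multilingual Plane, two outside it).
std::size_t Utf16UnitsNeeded(char32_t codePoint) { return codePoint <= 0xFFFF ? 1 : 2; }

// Hypothetical helper: number of 16-bit code units in a UTF-16 string.
std::size_t CodeUnitCount(const std::u16string& text) { return text.size(); }
```

For example, FitsInOneByte(0x4E2D) is false for the CJK character U+4E2D, yet Utf16UnitsNeeded(0x4E2D) is 1. A code point such as U+1F600 needs two units, so a 16-bit type holds one UTF-16 code unit rather than always one whole character.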

Ralph Wessel BArch
Software Engineer Speckle Systems

ReignBough · ‎2014-11-11

Well, when we are coding and want to specify that a character is 16 bits, we use/define wchar / WCHAR (wide char, exactly 16 bits, based on wchar_t) or char16 / CHAR16 (at least 16 bits). We use uchar / UCHAR for unsigned char (and char / CHAR for signed char) for exactly 8 bits, and char8 / CHAR8 for at least 8 bits.

This is just a thought.

Ralph Wessel · ‎2014-11-14

ReignBough wrote:
Well, when we are coding and want to specify that a character is 16 bits, we use/define wchar / WCHAR (wide char, exactly 16 bits, based on wchar_t) or char16 / CHAR16 (at least 16 bits). We use uchar / UCHAR for unsigned char (and char / CHAR for signed char) for exactly 8 bits, and char8 / CHAR8 for at least 8 bits.

None of these definitions is standards-based, so it's really a matter of semantics. You could just as well read uchar_t as "Unicode character type".

It could also be argued that defining wchar as exactly 16 bits is confusing, given that the C++ standard only requires wchar_t to be wide enough for the largest character set among the locales the implementation supports, so its width varies between platforms. The same goes for defining char16 as "at least 16 bits" when the C++ char16_t is meant to hold one UTF-16 code unit and is exactly 16 bits wide on mainstream platforms.
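A small sketch of how to check these widths on a given compiler rather than rely on naming conventions. BitWidth and WideCharIs16Bit are hypothetical helper names introduced here for illustration. The static_asserts state only what the C++11 standard guarantees; the width of wchar_t is left as a run-time question because it is implementation-defined.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: width of a type in bits on the current platform.
template <typename T>
constexpr std::size_t BitWidth() { return sizeof(T) * CHAR_BIT; }

// The standard ties the size of char16_t and char32_t to the
// uint_least16_t and uint_least32_t types respectively.
static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t), "char16_t matches uint_least16_t");
static_assert(sizeof(char32_t) == sizeof(std::uint_least32_t), "char32_t matches uint_least32_t");
static_assert(BitWidth<char16_t>() >= 16, "char16_t holds a UTF-16 code unit");

// wchar_t is implementation-defined: typically 16 bits with MSVC on Windows
// and 32 bits with GCC or Clang on Linux and macOS.
constexpr bool WideCharIs16Bit() { return BitWidth<wchar_t>() == 16; }
```

Code that assumes WideCharIs16Bit() is true will generally build as expected on Windows but not on Linux or macOS, which is one reason an SDK might define its own fixed 16-bit character type.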

Horses for courses 😉
