Binaries
In the last section, we talked about representing strings as UTF-8 encoded binary. But we left a topic for this section: How a character having code point above 255 is represented in binary?
Well, we all know that a single byte can only hold a value between 0 and 255. Let's consider the string hełło. Here ł has a code point 322.
This is where UTF-8 comes into picture. It solves the problem by using 2 bytes to represent each ł, and one byte each to represent h, e and o.
Read more: Binaries, strings and char lists
In the last section, we talked about representing strings as UTF-8 encoded binary. But we left a topic for this section: How a character having code point above 255 is represented in binary?
Well, we all know that a single byte can only hold a value between 0 and 255. Let's consider the string hełło. Here ł has a code point 322.
This is where UTF-8 comes into picture. It solves the problem by using 2 bytes to represent each ł, and one byte each to represent h, e and o.
iex> string = "hełło"
"hełło"
iex> byte_size string
7
iex> String.length string
5
Here, we can see that the byte size of the string is 7 while the string length is 5. The function byte_size/1 can be used to check how many bytes are actually used in the memory to represent the given string.Read more: Binaries, strings and char lists
No comments:
Post a Comment