Binaries
In Elixir, strings are UTF-8 encoded binaries.
What is a binary?
A binary is just a sequence of bytes. By default, each character in a string is internally represented in memory using 8 bits.
Let's take a simple example. The string "hello" in UTF-8 encoded binary form is 104, 101, 108, 108, 111. As you can see, each character uses a number (code point) between 0 and 255, which is represented with 8 bits or 1 byte.
You can define a binary using <<>> as shown below.
The unicode standard assigns code points to many of the characters we know. For example, a has code point of 97. Here, as you can see, h has a code point of 104.
All commonly occuring characters have code points between 0 and 255. But there are characters whose code points are above 255, which cannot be represented in memory using a single byte. In the next post, I'll explain how they are represented and stored in memory.
Read more: Binaries, strings and char lists
In Elixir, strings are UTF-8 encoded binaries.
What is a binary?
A binary is just a sequence of bytes. By default, each character in a string is internally represented in memory using 8 bits.
Let's take a simple example. The string "hello" in UTF-8 encoded binary form is 104, 101, 108, 108, 111. As you can see, each character uses a number (code point) between 0 and 255, which is represented with 8 bits or 1 byte.
You can define a binary using <<>> as shown below.
iex> <<0, 1, 2, 3>>
<<0, 1, 2, 3>>
We have the string concatenation operator <> in Elixir, which is actually a binary concatenation operator.Let's see "hello" in Elixir's binary form:iex> "hello" <> "world" "helloworld" iex> <<0, 1, 2, 3
>> <> <<
4>> <<0, 1, 2, 3,
4>>
iex>
"hello" <> <<0>>
<<104, 101, 108, 108, 111, 0>>
Here, I just used a concatenation operator to append a binary to a string. It converted the string to its internal binary representation.The unicode standard assigns code points to many of the characters we know. For example, a has code point of 97. Here, as you can see, h has a code point of 104.
All commonly occuring characters have code points between 0 and 255. But there are characters whose code points are above 255, which cannot be represented in memory using a single byte. In the next post, I'll explain how they are represented and stored in memory.
Read more: Binaries, strings and char lists