03-String Data Type in Go. Strings in Go deserve special attention… _ by Uday Hiwarale _ RunGo _ Medium
Published in RunGo
GOLANG
(source: pexels.com)
Strings are defined between double quotes "..." and not single quotes, unlike
JavaScript. Strings in Go are UTF-8 encoded by default which makes more sense in
the 21st century.
https://medium.com/rungo/string-data-type-in-go-8af2b639478 1/14
18/12/2022, 13:40 String Data Type in Go. Strings in Go deserve special attention… | by Uday Hiwarale | RunGo | Medium
As UTF-8 supports ASCII character set, you don’t need to worry about encoding in
most of the cases. But to understand how UTF-8 encoding works, you should
definitely visit my article on Character Encoding.
https://play.golang.org/p/vMDoeaV3RCY
To find the length of a string, you can use the len function. len is a built-in
function of the language, hence you don’t need to import it from any package.
💡 len is not exclusive to strings: it also reports the length of arrays, slices,
maps and channels. We will learn more about Go’s built-in functions in upcoming tutorials.
https://play.golang.org/p/Kqj-TJMFyXP
In the above program, len(s) will print 11 to the console as the string s has 11
characters. All characters in the string Hello World are valid ASCII characters,
hence we expect each character to occupy only one byte in memory (as an ASCII
character in UTF-8 occupies 8 bits or 1 byte).
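As a minimal sketch of the point above (the helper name byteLen is only illustrative and not from the linked program), len on an all-ASCII string reports one byte per character:

```go
package main

import "fmt"

// byteLen returns the number of bytes in s, which is what the built-in
// len reports for a string. (Hypothetical helper, used for illustration.)
func byteLen(s string) int {
	return len(s)
}

func main() {
	s := "Hello World"
	// Every character here is ASCII, so each one takes exactly one byte.
	fmt.Println(byteLen(s)) // 11
}
```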
https://play.golang.org/p/cE32NenaYmN
Whoa! I guess you were expecting s[i] to be a letter in the string s, where i is
the index of the character in the string starting from 0. Then what is this? Well,
these are the decimal values of the ASCII/UTF-8 characters in the Hello World
string (see the table at http://www.asciichart.com).
In Go, a string is in effect a read-only slice of bytes. For now, imagine a slice is
like a simple array. We will learn about slices in upcoming lessons.
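A small sketch of this behaviour, assuming an index-based loop like the one discussed (byteValues is a hypothetical helper name), might look like this:

```go
package main

import "fmt"

// byteValues collects s[i] for every index i. Each element has type
// byte (uint8), not a "character" type. (Hypothetical helper.)
func byteValues(s string) []byte {
	vals := make([]byte, 0, len(s))
	for i := 0; i < len(s); i++ {
		vals = append(vals, s[i])
	}
	return vals
}

func main() {
	// Prints [72 101 108 108 111 32 87 111 114 108 100]
	fmt.Println(byteValues("Hello World"))
}
```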
In the above example, we are iterating over a slice of bytes (values of type uint8).
Hence s[i] prints the decimal value of the byte at index i. But to see individual
characters, you can use the %c format verb in the Printf statement. You can also
use the %v verb to see the byte value and %T to see the data type of the value.
https://play.golang.org/p/wwqhgHcTeIU
So you can see that each letter is shown with a decimal number of type uint8,
which occupies 8 bits or 1 byte of memory.
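The three format verbs can be tried together in a short sketch (describeByte is an illustrative name, not part of the original program):

```go
package main

import "fmt"

// describeByte formats one byte as character, value and type using the
// %c, %v and %T verbs. (Hypothetical helper for illustration.)
func describeByte(b byte) string {
	return fmt.Sprintf("%c %v %T", b, b, b)
}

func main() {
	s := "Hello World"
	for i := 0; i < len(s); i++ {
		fmt.Println(describeByte(s[i])) // first line: H 72 uint8
	}
}
```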
As we know (read wikipedia page), a UTF-8 character can take from 1 byte (ASCII
compatible) to 4 bytes in memory. Hence in Go, all characters are represented in
the int32 (size of 4 bytes) data type. A code unit is the smallest group of bits
an encoding uses to store a character. UTF-8 uses 8 bits and UTF-16 uses 16 bits
for a code unit, which means UTF-8 needs a minimum of 8 bits or 1 byte to
represent a character.
A code point is any numerical value that defines the character and this is
represented by one or more code units depending on the encoding. As UTF-8 is
compatible with ASCII, all ASCII characters are represented in a single byte (8 bits),
hence UTF-8 needs only 1 code unit to represent them.
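To make the relation between code points and code units concrete, here is a hedged sketch using utf8.RuneLen from the standard unicode/utf8 package. The sample characters and the helper name codeUnits are my own choice for illustration:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// codeUnits reports how many bytes (UTF-8 code units) are needed to
// encode the code point r. (Hypothetical helper for illustration.)
func codeUnits(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	// One code point each, but 1, 2, 3 and 4 code units respectively.
	for _, r := range []rune{'A', 'õ', '€', '😀'} {
		fmt.Printf("%c U+%04X needs %d byte(s)\n", r, r, codeUnits(r))
	}
}
```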
But the biggest question is, if all characters are represented in int32 , then
why are we getting the uint8 type in the above example? As said earlier, in Go, a
string is a read-only slice of bytes. When we use the len function on a string, it
calculates the length of that slice .
When we use a for loop with an index, it walks over the slice returning one byte,
or one code unit, at a time. Since so far all our characters were in the ASCII
character set, each byte provided by the for loop was a valid character; in other
words, each code unit was, in fact, a code point.
Hence %c in the Printf statement could print a valid character from that byte
value. But as we know, a UTF-8 code point or character value can be represented by
a series of one or more bytes (max 4 bytes), so what will happen in the for loop we
saw earlier if we introduce non-ASCII characters?
https://play.golang.org/p/rhueGpn4pDc
From the above result, we got c3 b5 instead of 6f , and the characters of Hellõ
World did not get printed very well. We also see that len(s) returns 12 because
len counts the number of bytes in a string, not the number of characters.
This happened because indexing a string (as we did in the for loop) accesses
individual bytes, not characters. Printed with %c , the byte c3 ( decimal 195 ) is
treated as the code point U+00C3 , which is Ã, and b5 ( decimal 181 ) as U+00B5 ,
which is µ (check here).
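The effect can be reproduced with a short sketch. The helper hexBytes is hypothetical, and the "% x" verb is simply one convenient way to list the bytes of a string in hexadecimal:

```go
package main

import "fmt"

// hexBytes renders every byte of s in hexadecimal, separated by spaces.
// (Hypothetical helper for illustration.)
func hexBytes(s string) string {
	return fmt.Sprintf("% x", s)
}

func main() {
	s := "Hellõ World"
	fmt.Println(len(s))      // 12: õ takes two bytes in UTF-8
	fmt.Println(hexBytes(s)) // 48 65 6c 6c c3 b5 20 57 6f 72 6c 64

	// Indexing yields bytes, so õ is printed as two separate characters.
	for i := 0; i < len(s); i++ {
		fmt.Printf("%c", s[i])
	}
	fmt.Println() // the line above reads HellÃµ World
}
```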
To avoid the above chaos, Go introduces the data type rune (a synonym of code point ),
which is an alias of int32 . I told you (but have not proved yet) that Go represents
a character (code point) in the int32 data type.
💡 Interesting answer on why rune is int32 and not uint32 (as character code point
value can not be negative and int32 data type can hold both negative and positive values)
is here.
So, instead of a slice of bytes, we need to convert a string into a slice of runes.
https://play.golang.org/p/ELgL-upVnz_r
We converted a string into a slice of runes using type conversion. Observe f5 in the
above result instead of c3 b5 .
This happened because while converting the string s to a slice of runes, the bytes
c3 b5 were decoded into f5 : c3 b5 collectively represent the character õ , and the
code point of õ in the Unicode table is f5 (hence the Unicode code point
representation U+00F5 ) or decimal 245 (check here).
Also, we got the length 11 , which is correct, because there are 11 runes in the
slice (or 11 code points or 11 characters). And we also proved that a code point or
a character in Go is represented by the int32 data type.
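A minimal sketch of the conversion described above, with runeCount as an illustrative helper name:

```go
package main

import "fmt"

// runeCount converts s to a slice of runes and returns its length,
// i.e. the number of code points. (Hypothetical helper.)
func runeCount(s string) int {
	return len([]rune(s))
}

func main() {
	s := "Hellõ World"
	r := []rune(s)

	fmt.Printf("%x\n", r)    // [48 65 6c 6c f5 20 57 6f 72 6c 64]
	fmt.Println(runeCount(s)) // 11
	fmt.Printf("%T\n", r[4]) // int32
}
```

The standard library also offers utf8.RuneCountInString, which counts runes without building the intermediate slice.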
https://play.golang.org/p/Xet2cJbywLH
In the above program, we lost index 5 because the byte at index 5 is the second
code unit of the õ character. If you don’t need the index value, you can ignore it
by using _ (blank identifier) instead.
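The skipped index can be observed with a sketch like the following, where runeStarts is a hypothetical helper that records the byte index at which each rune begins:

```go
package main

import "fmt"

// runeStarts returns the byte index at which each rune of s begins,
// as reported by for ... range. (Hypothetical helper.)
func runeStarts(s string) []int {
	var idx []int
	for i := range s {
		idx = append(idx, i)
	}
	return idx
}

func main() {
	s := "Hellõ World"
	fmt.Println(runeStarts(s)) // [0 1 2 3 4 6 7 8 9 10 11]

	// Ignore the index with the blank identifier when only runes matter.
	for _, r := range s {
		fmt.Printf("%c", r)
	}
	fmt.Println() // Hellõ World
}
```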
What is a rune
A string is a read-only slice of bytes or uint8 integers, simple as that. When we
use a for loop with range , we get a rune on each iteration because range decodes
the string one UTF-8 character at a time, and each character is represented by the
rune data type.
In Go, a character can be written between single quotes, AKA a character literal
(rune literal). Hence, any valid Unicode character within single quotes ( ' ) is a
rune and its type is int32 .
https://play.golang.org/p/QNBsDunKTrJ
The above program will print f5 245 int32 , which is the hexadecimal value, the
decimal value and the data type of the code point of õ in the Unicode table.
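A sketch that produces this kind of output, assuming the %x, %v and %T verbs (describeRune is an illustrative name):

```go
package main

import "fmt"

// describeRune formats a rune as hexadecimal value, decimal value and
// type. (Hypothetical helper for illustration.)
func describeRune(r rune) string {
	return fmt.Sprintf("%x %v %T", r, r, r)
}

func main() {
	r := 'õ' // a rune literal; its default type is rune, an alias of int32
	fmt.Println(describeRune(r)) // f5 245 int32
}
```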
https://play.golang.org/p/9Uu5LqNqVkb
The above program will not compile and the compiler will throw an error, cannot
However, you can create a string from a slice of bytes, and not only from a string
literal. But once the conversion from slice to string is done, you cannot modify
the string, as explained in the above example.
var1 := []uint8{72, 101, 108, 108, 111} // [72 101 108 108 111]
var2 := string(var1) // Hello
💡 Remember, byte is an alias for uint8 and rune is an alias for int32 . Hence, you can
use them interchangeably.
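The following sketch puts these points together. The helper names fromBytes and withFirstByte are hypothetical. Since the string itself is read-only, the usual workaround is to copy it into a byte slice, change the copy, and convert back:

```go
package main

import "fmt"

// fromBytes builds a string from a slice of bytes. (Hypothetical helper.)
func fromBytes(b []byte) string {
	return string(b)
}

// withFirstByte returns a copy of s whose first byte is replaced by c.
// It assumes s is not empty. (Hypothetical helper.)
func withFirstByte(s string, c byte) string {
	b := []byte(s) // the conversion copies the bytes, s stays untouched
	b[0] = c
	return string(b)
}

func main() {
	s := fromBytes([]uint8{72, 101, 108, 108, 111})
	fmt.Println(s) // Hello

	// Assigning to s[0] directly would be rejected by the compiler,
	// because strings are read-only.
	m := withFirstByte(s, 'M')
	fmt.Println(m, s) // Mello Hello
}
```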
If you put a line break in a backtick string, it is interpreted as a ‘\n’ character, see
https://golang.org/ref/spec#String_literals
💡 The value of a raw string literal is the string composed of the uninterpreted
( implicitly UTF-8-encoded ) characters between the backticks; in particular, backslashes
have no special meaning and the string may contain newlines. Carriage return characters
( \r ) inside raw string literals are discarded from the raw string value. - GoLang
documentation
https://play.golang.org/p/9Ir-0Lxx0u3
We can see that the original formatting of the string, with its newline, tab and
double quotes, persisted in the output, and the \n escape sequence did nothing,
while the carriage return \r was discarded.
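As a rough sketch of the difference between raw and interpreted string literals (the constant names raw and interpreted are my own, and the carriage return case is left out because it is hard to show reliably in source text):

```go
package main

import "fmt"

// raw is a raw string literal: the \n below is two ordinary characters
// (a backslash and an n), while the real line break is kept as a newline.
const raw = `Hello\n
World`

// interpreted is a regular string literal, where \n is an escape sequence.
const interpreted = "Hello\nWorld"

func main() {
	fmt.Println(len(raw), len(interpreted)) // 13 11
	fmt.Println(raw)         // prints Hello\n and World on two lines
	fmt.Println(interpreted) // prints Hello and World on two lines
}
```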
Character comparison
A character written in single quotes in Go is a rune , and runes can be compared
because they represent Unicode code points ( int32 values). Hence a character with
a higher decimal value is greater than a character with a lower one.
https://play.golang.org/p/lxGiJzNeNWO
Since int32 value of b is greater than a, the expression 'b' > 'a' will be true. Let's
see another example.
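A small sketch of such a comparison (greater is a hypothetical helper; the uppercase example is my own addition to show that the comparison follows code point values, not alphabetical intuition):

```go
package main

import "fmt"

// greater reports whether the code point a is larger than the code
// point b. (Hypothetical helper for illustration.)
func greater(a, b rune) bool {
	return a > b
}

func main() {
	fmt.Println(greater('b', 'a')) // true: 98 > 97
	fmt.Println(greater('A', 'a')) // false: 65 < 97
}
```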
https://play.golang.org/p/aw8Sv8Vto-c
Since we know that characters are nothing but int32 values internally, we can do all
sorts of comparisons with them. For example, we can write a for loop over a range
between two character values.
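One way to sketch such a loop, assuming an inclusive range (lettersBetween is an illustrative name):

```go
package main

import "fmt"

// lettersBetween returns every character from 'from' to 'to', inclusive,
// by counting through the code points. (Hypothetical helper.)
func lettersBetween(from, to rune) string {
	var out []rune
	for c := from; c <= to; c++ {
		out = append(out, c)
	}
	return string(out)
}

func main() {
	fmt.Println(lettersBetween('a', 'e')) // abcde
}
```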
https://play.golang.org/p/kS4vxuSSmWg
This was a basic introduction to strings in Go, but there are many utility functions
provided by the strings package that can be used to perform all sorts of operations
on strings like join, replace, search, etc. The strings package is a part of Go’s
standard library.
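A brief taste of the strings package, as a sketch only. The wrapper names joinWords, swapWorld and hasHello are hypothetical, while strings.Join, strings.Replace and strings.Contains are standard library functions:

```go
package main

import (
	"fmt"
	"strings"
)

// joinWords glues the words together with a single space.
func joinWords(words []string) string {
	return strings.Join(words, " ")
}

// swapWorld replaces every occurrence of "World" with "Go"
// (a count of -1 means no limit on replacements).
func swapWorld(s string) string {
	return strings.Replace(s, "World", "Go", -1)
}

// hasHello reports whether s contains the substring "Hello".
func hasHello(s string) bool {
	return strings.Contains(s, "Hello")
}

func main() {
	s := joinWords([]string{"Hello", "World"})
	fmt.Println(s)            // Hello World
	fmt.Println(swapWorld(s)) // Hello Go
	fmt.Println(hasHello(s))  // true
}
```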
Go Programming Language