The Ultimate Guide to Building Database-Intensive Apps with Go
Gophers by Ashley McNamara
By Baron Schwartz
2019 Edition
Table of Contents
Introduction
What is database/sql?
Your First database/sql Program
Using a sql.DB
Fetching Result Sets
Modifying Data With db.Exec()
Using Prepared Statements
Working with Transactions
Working With a Single Connection
Error Handling
Using Built-In Interfaces
Monitoring, Tracing, Observability
Working With Context
Database Drivers
Common Pitfalls
Conclusion
Meet the Author
Baron Schwartz
Baron is a performance and scalability expert who
participates in various database, open-source, and
DevOps communities. He has helped build and scale
many large, high-traffic services for Fortune 1000
clients. He has written several books, including
O'Reilly's best-selling High Performance MySQL.
Baron has a CS degree from the University of Virginia.
Copyright © 2019 VividCortex
Introduction
Congratulations! You’ve discovered the ultimate resource for writing
database-intensive applications in the Go programming language.
What is Go, and who uses it? Go is a modern language in the C family. It is elegant,
simple, and clear, making it maintainable. It includes a garbage collector to manage
memory for you. Its built-in features make it easy to write concurrent programs. These
include goroutines, which you can think of as lightweight threads, and mechanisms to
communicate amongst goroutines. At the same time, Go is strongly typed, compiles to
self-contained binaries free of external dependencies, and is high-performance and
efficient in terms of CPU and memory usage.
Using Go to access databases brings you all the benefits of Go itself, plus an elegant
database interface and a vibrant community of users and developers writing
high-quality, open-source database drivers for you to use.
Go’s database/sql library has excellent documentation and source code but leaves a
lot of learning to the user. Fortunately, you’ve found this book, which will save you
time and mistakes! This book has years of collected wisdom from many experienced
programmers, distilled to just what you need to know, when you need to know it.
Go is an excellent choice for systems programming, where you might otherwise choose
Java, C or C++ for performance reasons. And it’s not a stretch to say that Go is one of
the main languages of cloud computing, with a strong presence in distributed systems
and microservices architectures. Here are some key use cases for choosing Go:
● Building high performance networked applications. Go is great for building API
servers, microservices, and all types of HTTP services, among other things. It's
not limited to HTTP, of course; it's equally capable of speaking protocols like
RPC and interchanging data in any format you can think of.
● Building heavy-duty systems applications. A number of databases—including
distributed high-performance databases—have been written in Go recently. In
decades past, most of those would have been written in C or C++.
● Cloud migrations. A lot of companies undertake a rewrite at the same time
they move to the cloud, instead of just lifting and shifting. Go is a popular
language for this because its simplicity makes it highly productive. A common
joke is that you get a Go programmer by letting a Java programmer try Go;
after that, they never want to write in any other language again.
● Anywhere high-throughput, high-concurrency, low-latency, low-variability
performance is required.
What is database/sql?
database/sql is a package of functionality that’s included in Go’s standard library. It is
the idiomatic, official way to communicate with a row-oriented database. Loosely
speaking, it is designed for databases that are SQL or relational, or similar to
relational databases (many non-relational databases work just fine with it too).
database/sql provides much the same functionality for Go that you’d find in ODBC,
Perl’s DBI, Java’s JDBC, and similar. However, it is not designed exactly the same as
those, and you should be careful not to assume your knowledge of other database
interfaces applies directly in Go.
The database/sql package handles non-database-specific aspects of database
communication. These are tasks that have to be handled in common across many
database technologies, so they’re factored out into a uniform interface for you to use.
The main types it defines include sql.DB, sql.Rows, sql.Row, sql.Stmt, sql.Tx, and
sql.Result. You'll see how to use all these data types, and the abstractions they
present, in the following sections. Let's get started with a quick-start: a "hello,
world" program!
package main

import (
    "database/sql"
    "log"

    _ "github.com/go-sql-driver/mysql"
)

func main() {
    db, err := sql.Open("mysql", "root:@tcp(:3306)/test")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    _, err = db.Exec(
        "CREATE TABLE IF NOT EXISTS test.hello(world varchar(50))")
    if err != nil {
        log.Fatal(err)
    }
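    // The listing continues by inserting a row and reading it back; this
    // sketch is reconstructed to match the output shown below, so the exact
    // statements are assumptions.
    res, err := db.Exec(
        "INSERT INTO test.hello(world) VALUES(?)", "hello world!")
    if err != nil {
        log.Fatal(err)
    }
    n, err := res.RowsAffected()
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("inserted %d rows", n)

    rows, err := db.Query("SELECT * FROM test.hello")
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()
    for rows.Next() {
        var s string
        if err = rows.Scan(&s); err != nil {
            log.Fatal(err)
        }
        log.Printf("found row containing %q", s)
    }
}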
Run your new Go program with go run hello_mysql.go. It’s safe to run it multiple
times. As you do, you should see output like the following as it continues to insert
rows into the table:
Desktop $ go run hello_mysql.go
2014/12/16 10:57:03 inserted 1 rows
2014/12/16 10:57:03 found row containing "hello world!"
Desktop $ go run hello_mysql.go
2014/12/16 10:57:05 inserted 1 rows
2014/12/16 10:57:05 found row containing "hello world!"
Congratulations! You’ve written your first program to interact with a MySQL server
using Go. What might surprise you is that this is not just a toy example. It is very
similar to the code you’ll use in production systems under high load, including error
handling. I'll explore much of this code in later sections of this book and explain more
about what it does. For now, I'll just mention a few highlights:
● You imported database/sql and loaded a driver for MySQL.
● You created a sql.DB with a call to sql.Open(), passing the driver name and
the connection string.
● You used Exec() to create a table and insert a row, then inspect the results.
● You used Query() to select the rows from the table, rows.Next() to iterate
over them, and rows.Scan() to copy columns from the current row into
variables.
● You used .Close() to clean up resources when you were finished with them.
Let’s dig into each of these topics, and more, in detail. I’ll begin with the sql.DB itself
and see what it is and how it works.
Using a sql.DB
As I mentioned previously, a sql.DB is an abstraction of a database. (A common
mistake is to think of it as a connection to a database.) It exposes a set of functions
you use to interact with the database.
Internally, it manages a connection pool for you (a very important topic in this book),
and handles a great deal of tedious and repetitive work for you, all in a way that’s safe
to use concurrently from multiple goroutines.
The intent of a sql.DB is that you'll create one object to represent each database
you'll use, and keep it for a long time. Because it has a connection pool, it's not meant
to be created and destroyed continually. You should make your single sql.DB
available to all the code that needs it.
To create a sql.DB, as you saw in the previous section, you need to import a driver.
The bare-minimum imports you need in your program’s main file are as follows:
import (
"database/sql"
_ "github.com/go-sql-driver/mysql"
)
I'll cover drivers in more detail later. For now it's enough to note that you don't need
access to the driver directly. That's why you bind its import name to the _
anonymous variable, so its namespace isn't usable from your code. This means that
you have imported the driver for side effects only. Behind the scenes, drivers use an
init() function to register themselves with the database/sql package.
Now, to actually create an instance of a sql.DB, you usually use sql.Open() with two
arguments. The first is the driver's name. This is the string that the driver registers
with database/sql, and is typically the same as its package name to avoid confusion.
The second argument is the connection string, or DSN (data source name) as some
people call it. This is driver-specific and has no meaning to database/sql. It is merely
passed to the driver you identify. It might include a TCP connection endpoint, a Unix
socket, username and password, a filename, or anything else you can think of. Check
the driver’s documentation for details.
At this point you might think you have a connection to a database, but you probably
don’t. You probably have only created an object in memory and associated it with a
driver; it hasn’t yet actually done anything like connecting to the database. This is
because most drivers don't connect until you do something with the database. If you
want to check that the connection parameters are correct, you can "ping" the
database. That is what db.Ping() is for. Idiomatic code looks like this:
db, err := sql.Open("driverName", "dataSourceName")
if err != nil {
log.Fatal(err)
}
defer db.Close()
err = db.Ping()
if err != nil {
    log.Fatal(err)
}
Now you know that your db variable is really ready to use. The above code, by the
way, is not really idiomatic in one regard. Usually you'd do something more intelligent
than just fatally logging an error, but in this book I'll always show log.Fatal(err) as a
placeholder for real error handling.

An alternative way to construct a sql.DB is with the sql.OpenDB() function, which
takes a driver.Connector as its argument. This lets you specify connection
parameters more flexibly and clearly than concatenating a string together.¹

¹ There are also variants of each of these functions, which accept a Context object. More on this later.
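For illustration, here's a minimal sketch of sql.OpenDB() using the MySQL driver's
connector; it assumes a driver version that provides mysql.NewConnector(), and the
connection parameters are assumptions to adapt for your environment:

// Import "github.com/go-sql-driver/mysql" non-anonymously for this.
cfg := mysql.NewConfig()
cfg.User = "root"
cfg.Net = "tcp"
cfg.Addr = "127.0.0.1:3306"
cfg.DBName = "test"

connector, err := mysql.NewConnector(cfg)
if err != nil {
    log.Fatal(err)
}
db := sql.OpenDB(connector) // a *sql.DB, just like from sql.Open()
defer db.Close()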
A consequence of the connection pool is that you do not need to check for, or
attempt to handle, connection failures. If a connection fails when you perform an
operation on the database, database/sql will take care of it for you. Internally, it retries
up to 10 times when a connection in the pool is discovered to be dead. It simply
fetches another from the pool or opens a new one. This means that your code can be
clean and free of messy retry logic.
Fetching Result Sets

There are a few things to know about code like the following, and I'll examine it
outside-in, beginning with iterating over the rows with rows.Next().
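A typical fetch loop, sketched here along the lines of the earlier hello-world example
(the query and the single string column are assumptions):

rows, err := db.Query("SELECT * FROM test.hello")
if err != nil {
    log.Fatal(err)
}
for rows.Next() {
    var s string
    if err = rows.Scan(&s); err != nil {
        log.Fatal(err)
    }
    log.Printf("found row containing %q", s)
}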
But what if you don’t exit the loop normally? What if you intentionally break out of it or
return from the function? If this happens, your results won’t be fetched and processed
completely, and the connection might not be released back to the pool. Handling
rows correctly requires thinking about this possibility. Your goal should be that
rows.Close() is always called to release its connection back to the pool. If it isn’t,
then the connection will never go back into the pool, and this can cause serious
problems manifested as a “leakage” of connections. If you’re not careful you can
easily cause server problems or reach your database server’s maximum number of
connections, causing downtime or unavailability.
How do you prevent this? First, you’ll be happy to know that if the loop terminates
due to rows.Next() returning false, whether normally or abnormally, rows.Close() is
automatically called for you, so in normal operation you won’t reserve connections
from the pool in these cases.
The remaining cases are an early return or breaking out of the loop. What you should
do in these cases depends on the circumstances. If you will return from the enclosing
function when processing ends, you should use defer rows.Close(). This is the
idiomatic way to ensure that "must-run" code indeed always runs when the function
returns. And it's also idiomatic (and important for correctness) to place such a cleanup
call immediately after the resource is created. Our modified code would then look like
this:
rows, err := db.Query("SELECT * FROM test.hello")
if err != nil {
log.Fatal(err)
}
defer rows.Close()
Now consider what happens when the loop exits abnormally. We've seen that the
normal reason for it to exit is that the loop encounters an io.EOF error, making
rows.Next() return false. Anytime rows.Next() finds an error, it saves the error
internally for later inspection, and exits the loop.
You can then examine it with rows.Err(). I didn’t show this in the examples above, but
in real production code you should always check for an error after exiting the loop:
for rows.Next() {
// process the rows
}
if err = rows.Err(); err != nil {
log.Fatal(err)
}
The io.EOF error is a special case that is handled inside rows.Err(). You don’t need
to handle this explicitly in your code; rows.Err() will return nil so you won’t see it.
That’s pretty much everything you need to know about looping over the rows, except
for one small detail: handling errors from rows.Close(). Interestingly, this function
does return an error, but it’s a good question what can be done with it. If it doesn’t
make sense for your code to handle it (and I haven’t seen a case where it does), then
you can feel free to ignore it or just log it and continue.
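Fetching a single row with db.QueryRow() chains directly into .Scan(); a minimal
sketch (the query is an assumption carried over from the earlier examples):

var s string
err = db.QueryRow("SELECT world FROM test.hello LIMIT 1").Scan(&s)
if err != nil {
    log.Fatal(err)
}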
As you can see, the idiomatic usage is a little different from before. Internally, the
sql.Row object holds either an error from the query, or a sql.Rows from the query. If
there's an error, then .Scan() will return the deferred error. If there's none, then
.Scan() will work as usual, except that if there was no row, it returns a special error
constant, sql.ErrNoRows. You can check for this error to determine whether the call
to .Scan() actually executed and copied values from the row into your destination
variables.
The argument types are the empty interface, interface{}, which you probably
already know is satisfied by any type in Go. In most cases, Go copies data² from the
row into the destinations you give. The database/sql package will examine the type
of the destination, and will convert values in many cases. This can help make your
code, especially your error-handling code, much smaller.

For example, suppose you have a column of numbers, but for some reason it's not
numeric; it is instead a VARCHAR with numbers in ASCII format. You could scan the
column into a string variable, convert it into a number, and check errors at each step.
But you don't need to, because database/sql can do it for you! If you pass, say, a
float64 destination variable into the call, Scan() will detect that you are trying to scan
a string into a number, call strconv.ParseFloat() for you, and return any errors.
Another special case with scanning is when values are NULL in the database. A NULL
can't be scanned into an ordinary variable, and you can't pass a nil into rows.Scan().
Instead, you must use a special type as the scan destination. These types are defined
in database/sql for many common types, such as sql.NullFloat64 and so forth.
If you need a type that isn't defined, you can look to see whether your driver provides
one, or copy/paste the source code to make your own; it's only a few lines of code.
After scanning, you can check whether the value was valid or not, and get the value if
it was. (If you don't care, you can just skip the validity check; reading the value will
give the underlying type's zero-value.)

² There are special cases where the copy can be avoided if you want, but you have to use
*sql.RawBytes types to do that, and the memory is owned by the database and has a limited lifetime of
validity. If you need this behavior, you should read the documentation and the source code to learn
more about how it works, but most people won't need it. Note that you can't use *sql.RawBytes with
db.QueryRow().Scan() due to an internal limitation in database/sql.
Putting it all together, a more complex call to rows.Scan() might look like this:
var (
s1 string
s2 sql.NullString
i1 int
f1 float64
f2 float64
)
// Suppose the row contains ["hello", NULL, 12345, "12345.6789", "not-a-float"]
err = rows.Scan(&s1, &s2, &i1, &f1, &f2)
if err != nil {
log.Fatal(err)
}
The call to rows.Scan() will fail with the following error, illustrating that the last
column was automatically converted to a float, but it failed:
sql: Scan error on column index 4: converting string "not-a-float" to a float64:
strconv.ParseFloat: parsing "not-a-float": invalid syntax
However, since the arguments are handled in order, the rest of the scans would have
succeeded, and by removing the call to log.Fatal() you can see that with the
following lines of code:
err = rows.Scan(&s1, &s2, &i1, &f1, &f2)
log.Printf("%q %#v %d %f %f", s1, s2, i1, f1, f2)
// output:
// "hello" sql.NullString{String:"", Valid:false} 12345 12345.678900 0.000000
This illustrates that the s2 variable’s Valid field is false and its String field is empty,
as expected. Your code could inspect this variable and handle that as desired:
if s2.Valid {
// use s2.String
}
The database/sql package provides a way to get the column names, and thus also
the number of columns. To get the column names, use rows.Columns(). It returns an
error, so check for that as usual:
cols, err := rows.Columns()
if err != nil {
log.Fatal(err)
}
Now you can do something useful with the results. In the simplest case, when you
expect a variable number of columns to be used in different scenarios, but you know
their types, you can write something like the following code. Suppose that you get at
most 5 columns but in some cases fewer, and they are of type uint64, string, string,
string, uint32. Define a slice of interface{} with valid variables (not nil pointers)
that handles the largest case, then pass the appropriately sized slice of that to Scan():
dest := []interface{}{
    new(uint64),
    new(string),
    new(string),
    new(string),
    new(uint32),
}
err = rows.Scan(dest[:len(cols)]...)
If you don't know the columns or the data types, you have two options. One is to
resort to sql.RawBytes. After scanning, you can examine the vals slice. For each
element you should check whether it's nil, and use type introspection and type
assertions to figure out the type of the variable and handle it:
cols, err := rows.Columns()
if err != nil {
    log.Fatal(err)
}
vals := make([]interface{}, len(cols))
for i := range cols {
    vals[i] = new(sql.RawBytes)
}
for rows.Next() {
    if err = rows.Scan(vals...); err != nil {
        log.Fatal(err)
    }
}
The resulting code (not shown!) is usually not very pretty, but there's another option in
newer Go versions: using sql.ColumnType, which supports the following operations:
DatabaseTypeName(), DecimalSize(), Length(), Name(), Nullable(), and ScanType().
For some applications, the single-result-set limitation of older Go versions was a
showstopper. A lot depends on the database and the driver implementation, but
those older versions of database/sql were designed for a query to return a single
result set, and were not capable of fetching the next result set or handling changes in
columns after the first row. This had a number of consequences, such as making it
impossible to call stored procedures in MySQL.
Happily, Go version 1.8 introduced support for multiple result sets. After finishing
iteration over a sql.Rows variable, you can call rows.NextResultSet() to advance the
variable to the next result set. If the function returns false, there is either no next
result set, or there was an error advancing; you should check rows.Err() to figure out
what the situation is. Once you've advanced to a new result set, you need to restart
your rows.Next() loop as usual, keeping in mind that result sets can be empty.
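Put together, a sketch of the pattern looks like this:

for {
    for rows.Next() {
        // ... scan the current result set's columns as usual
    }
    if !rows.NextResultSet() {
        break // no next result set, or an error occurred while advancing
    }
}
if err := rows.Err(); err != nil {
    log.Fatal(err)
}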
Modifying Data With db.Exec()

The db.Exec() call returns a sql.Result, which you can use to get the number of
rows affected, as shown below. You can also use it to fetch the auto-increment ID of
the last-inserted row, although support for that varies by driver and database. In
PostgreSQL, for example, you should use INSERT RETURNING and db.QueryRow() to
fetch the desired value as a result set.
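A minimal sketch (the statement and table are assumptions carried over from the
hello-world example):

res, err := db.Exec("DELETE FROM test.hello WHERE world = ?", "hello world!")
if err != nil {
    log.Fatal(err)
}
affected, err := res.RowsAffected()
if err != nil {
    log.Fatal(err)
}
log.Printf("deleted %d rows", affected)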
There's a vitally important difference between db.Exec() and db.Query(), and it isn't
just a matter of being pedantic. As mentioned earlier, db.Exec() releases its
connection back to the pool right away, whereas db.Query() keeps it out of the pool
until rows.Close() is called. The following code ignores the returned rows, and will
cause problems:
_, err := db.Query("DELETE FROM hello.world LIMIT 1")
The problem is that although the first value returned from the method is assigned to
the _ variable and is inaccessible to the program after that, it’s still really assigned, with
all the usual consequences. And it won’t go away until it’s garbage collected. Worse,
the connection that’s bound to it will never be returned to the connection pool. This is
a common way to “leak” connections and run the server out of available connections.
In addition to the above, there are some more subtleties you should know about the
Result. Go guarantees that the database connection that was used to create the
Result is the same one used for LastInsertId() and RowsAffected(), and that it's
taken out of the pool for these operations and locked. But beyond that, it's an
interface type, and the exact behavior will be dependent on the underlying database
and the driver's provided implementation. For example:
● MySQL can use a BIGINT UNSIGNED as an auto-increment column, so it's
possible for the last-inserted row's column to be too large to fit in int64, the
return type defined by LastInsertId().
● The MySQL driver I prefer doesn't make an extra round-trip to the database to
find out the last-inserted value and number of rows affected. This information
is returned from the server in the wire protocol and stored in a struct, so
there's no need for it. (The connection is still taken out of the pool and locked,
then put back, even though it's not used. This is done by database/sql, not
the driver. So even though this function doesn't access the database, it may
still block waiting if its connection is busy.)
● Whether LastInsertId() and RowsAffected() return errors is driver-specific.
In the MySQL driver, for example, they will never return an error. You should
not rely on driver-specific details like this, though. Adhere to the contract that
is publicly promised in the database/sql interface: functions that return errors
should be checked for errors.
● Some of the behavior of these functions varies between databases or
implementations. For example, suppose a database driver provides
RowsAffected() but implements it by making a query to the underlying
database (e.g. needlessly calling SELECT LAST_INSERT_ID() in MySQL instead
of using the values returned in the protocol). As mentioned, the original
connection will be used, so to that extent the behavior will be correct, but what
if other work has been done on that connection in the meantime? You'd be
subject to a race condition. This is an area where you'll need to know the actual
implementation you're working with.
In general, the database drivers I’ve seen and worked with are well implemented and
you don’t need to worry about these finer points most of the time. But I want you to
be aware of the details anyway. Now let’s see how to use prepared statements!
Using Prepared Statements

Consider this variation on the hello-world INSERT:

res, err := db.Exec(
    "INSERT INTO test.hello(world) VALUES(?)", "hello world!")
See what I did there? I removed the literal hello world! from the SQL and put a ?
placeholder in its place, then called the method with the value as a parameter. Under
the hood, Go handles this as follows:
1. It treats parameter 0 as the statement, and prepares it with the server.
2. It executes the resulting prepared statement with the rest of the parameters (in
this case just one).
3. It closes the prepared statement.
Prepared statements have their benefits:
● They are convenient; they help you avoid code to quote and assemble SQL.
● They guard against SQL injection and other attacks by avoiding quoting
vulnerabilities, so they enhance security.
● There may be driver-specific, protocol-specific, and database- specific
performance and other enhancements. For example, in MySQL, prepared
statements use the so-called binary protocol for sending parameters to the
server, which can be more efficient than sending them as ASCII.
● They can reward you with additional efficiency by eliminating repeated SQL
parsing, execution plan generation, and so on.
Some of these benefits apply no matter how many times statements are executed,
but some are only beneficial if a statement will be repeatedly re-executed. As a result,
you should be aware of the automatic use of prepared statements when you call
functions such as db.Query() and db.Exec() with more than one parameter.
In addition to behind-the-scenes use of prepared statements, you can prepare
statements explicitly. In fact, to reuse them and gain some of the benefits mentioned
above, you must prepare them explicitly. Use the db.Prepare() function for that. The
result is a sql.Stmt variable:
stmt, err := db.Prepare("INSERT INTO test.hello(world) VALUES(?)")
if err != nil {
log.Fatal(err)
}
Now you can repeatedly execute the statement with parameters as desired. The stmt
variable's method signatures match those you've been using thus far on the db
variable. In this case I'll use stmt.Exec():
for _, str := range []string{"hello1", "hello2", "hello3"} {
    res, err := stmt.Exec(str)
    if err != nil {
        log.Fatal(err)
    }
    _ = res // inspect res.RowsAffected() and so on as needed
}
Prepared statements interact with the connection pool in subtle ways. A sql.Stmt is
prepared on one connection, but if that connection is busy when you execute the
statement, database/sql transparently re-prepares it on another connection, and a
busy pool can end up leading to lots of re-preparing. In the worst cases, I have seen
statements appear to "leak" due to being re-prepared as many times as there are
connections. When combined with bugs that lead to connections not being returned
to the pool as previously discussed, it's even possible to exceed the maximum
number of statements the server will permit to be prepared at one time.
Another subtlety of prepared statements is that it's quite likely that at least some
prepared statements will be prepared and then immediately re-prepared upon
execute, due to the statement's original connection being returned to the pool and
grabbed by another user during the interval between db.Prepare() and stmt.Exec().
If you use a network traffic capture inspection tool such as VividCortex, you'll be able
to verify this. For example, in one of the VividCortex blog posts we analyzed
single-use prepared statements. Careful inspection reveals that the count of
Prepare() actually exceeds the count of Exec() by a small margin, which is expected
due to the phenomenon just mentioned.
Another consequence of how the pool handles connections and prepared statements
is that you do not need to explicitly handle problems with prepared statements being
invalidated by, for example, a failed connection that was killed or timed out
server-side. The connection pool will handle this for you transparently. In other words,
you shouldn’t write any logic to re-prepare statements, just like you don’t need to
write any logic to re-connect to the database. There are up to 10 retries hidden within
database/sql.
Working with Transactions

Suppose you tried to manage a transaction yourself by sending BEGIN, UPDATE, and
COMMIT statements through the db variable. Because each call on the db can use a
different connection from the pool, there would be no guarantee that those
statements were executed on the same connection. You could have started a
transaction (and left it open and idle!) on one connection, updated the account table
on another connection, and committed some other connection's in-flight transaction.
Don't do this!

Creating statefulness and binding things to a single connection is exactly what a
sql.Tx is for, and that's what you should do instead. The essence is as follows:
● You create a sql.Tx with db.Begin()—or with its newer cousin, db.BeginTx(),
which accepts options in a struct, letting you specify attributes like an isolation
level.
● It removes exactly one connection from the pool and keeps it until it's finished.
● The driver is instructed to start a transaction on that connection.
● The connection, its transaction, and the tx variable are coupled, but the tx and
the db are disconnected from each other.
● The lifetime of the transaction and the tx ends with tx.Commit() or
tx.Rollback(), and the variable is invalid after that.
Let's dig into these and see how transactions work. First, after you use db.Begin() to
create the sql.Tx, you should operate solely on the tx variable, and ignore the db for
anything that needs to work within the transaction. This is because the db isn't in a
transaction, the tx is! This is a common source of confusion for programmers.
Code like the following is another buggy anti-pattern:
tx, err := db.Begin()
// ...
_, err = db.Exec("UPDATE account SET balance = 100 WHERE user = 83")
// ...
err = tx.Commit()
The programmer might not realize it, but the UPDATE did not happen within the context
of the transaction. Instead of db.Exec(), the code should use tx.Exec(). Study this if
it's confusing, because it's important. On the database server, the transaction is
scoped to a single connection; in the code, the connection is bound to the tx variable
and not available through the db anymore. When you call methods on the db, you're
operating on a different connection, which doesn't participate in the transaction.
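For contrast, a corrected sketch keeps every statement on the tx and handles
rollback:

tx, err := db.Begin()
if err != nil {
    log.Fatal(err)
}
_, err = tx.Exec("UPDATE account SET balance = 100 WHERE user = 83")
if err != nil {
    tx.Rollback() // the tx is invalid after this
    log.Fatal(err)
}
if err = tx.Commit(); err != nil {
    log.Fatal(err)
}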
The tx variable, as you've seen previously with prepared statements, has all the
familiar methods with the same signatures: Query(), Exec(), and so on. There's even
a tx.Prepare() to prepare statements that are bound solely to the transaction.
However, prepared statements work differently within a tx. A sql.Stmt, when
associated with a sql.Tx, is bound to the one and only one underlying connection to
the database, so there's no automatic re-preparing on different connections.
To clarify this a bit further, a stmt that was prepared from a db is invalid on the tx, and
a stmt that was prepared from a tx is valid only on the tx. (There is a way to "clone" a
stmt into the scope of the tx by using tx.Stmt(), but this works by the statement
being re-prepared even if it has already been prepared on the tx's connection.)
Within the scope of a tx, the usual implicit logic of retrying 10 times is also disabled.
You'll need to be prepared to retry statements or restart the entire transaction
yourself, as appropriate for the scenario. This is meat-and-potatoes behavior for
transactional databases, naturally. You’ve always needed to be ready to handle
deadlocks and rollbacks when dealing with transactions. And of course a transaction
can’t be started on one connection and continued on another in most databases.
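To see why the next point matters, consider nested fetching through a plain db; a
sketch using illustrative table names (borrowed from the tx example below):

rows, err := db.Query("SELECT id FROM master_table")
if err != nil {
    log.Fatal(err)
}
defer rows.Close()
for rows.Next() {
    var mid, did int
    if err = rows.Scan(&mid); err != nil {
        log.Fatal(err)
    }
    // This inner query silently checks out a second connection from the pool.
    err = db.QueryRow("SELECT id FROM detail_table WHERE master = ?", mid).Scan(&did)
    if err != nil {
        log.Fatal(err)
    }
}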
This is okay because when you're working with a db variable, the inner statement will
get a new connection from the pool and use it, so the loop will effectively use (at
least) two connections. (Inside the loop, the first connection is busy fetching rows for
the loop, so it's not available for the QueryRow() to use.)
In the scope of a tx, however, that won’t work:
rows, _ := tx.Query("SELECT id FROM master_table")
for rows.Next() {
    var mid, did int
    rows.Scan(&mid)
    tx.QueryRow("SELECT id FROM detail_table WHERE master = ?", mid).Scan(&did)
    // **BOOM**
}
If you do that, you'll be trying to start a new query on the tx's connection, but it's busy
in row-fetching mode, and that won't work. The example is rather silly; it could be
replaced by a JOIN, and it doesn't even need to be in a transaction because it's not
modifying any data, but hopefully you see the point. Within the scope of a tx, you
have to finish each statement before you start the next one.

On the other hand, a corollary also holds. If you need to do some inner-loop fetching
within the context of a tx, but it doesn't need to participate in the transaction itself for
some reason, you could use the db (which isn't in the transaction) to perform it
instead. Just be careful that your code doesn't ambush some other programmer later!
Error Handling
Idiomatic error handling in Go is to explicitly and immediately check for an error after
every function call that can return an error, and it’s no different in database/sql.
However, for the sake of brevity, I’ve omitted some error handling in several code
listings thus far.
In general, all of the method calls I've shown to this point can return an error, even
when I didn't show it. There is one place, however, where you should check for an
error that isn't returned by a function call: after the rows.Next() loop. I covered this
usage of rows.Err() previously.
The other special consideration for error-checking is how to inspect and handle
errors that the database returns. You might find yourself doing string-matching, for
example, to try to catch errors such as a deadlock:
if strings.Contains(err.Error(), "Deadlock found") {
// Handle the error
}
But what if the database's language is not set to English, and error messages are
returned in Elbonian? Oops. Plus, string-matching like this is just a code smell.
The solution for this depends on the driver and the database. Although some
databases return ANSI-standard error codes for some operations, these really are not
specific enough for use in most applications. Instead, it’s better to deal with the
database’s own error codes, which are usually very granular and allow you to isolate
exactly what happened.
To do this, you’ll have to import the driver and use a type assertion to get access to
the driver-specific struct underlying the error, like this example using the MySQL
driver:
if driverErr, ok := err.(*mysql.MySQLError); ok {
if driverErr.Number == 1213 {
// Handle the error
}
}
That's better, but it still has a code smell. What's that magic number 1213? It's much
better to use a defined list of error numbers. Alas, the driver I like doesn't have such a
list in it, for various reasons. Fortunately, VividCortex provides one at
github.com/VividCortex/mysqlerr. Now the code can be cleaned up more:

if driverErr, ok := err.(*mysql.MySQLError); ok {
    if driverErr.Number == mysqlerr.ER_LOCK_DEADLOCK {
        // Handle the error
    }
}
Much better. In the most popular PostgreSQL driver, you can use driver-provided
types and a driver-provided error list too.
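As a sketch of what that can look like with github.com/lib/pq (the error-code name
shown is an assumption to verify against the driver's documentation):

if pqErr, ok := err.(*pq.Error); ok {
    if pqErr.Code.Name() == "deadlock_detected" {
        // Handle the error
    }
}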
Using Built-In Interfaces

database/sql defines two interfaces that let your own types participate in scanning
and statement parameters: driver.Valuer transforms a value on its way into the
database, and sql.Scanner transforms it on the way out. As a toy example, here's a
LowercaseString type that lowercases values passing through it, starting with the
imports:

import (
    "database/sql"
    "database/sql/driver"
    "errors"
    "log"
    "strings"

    _ "github.com/go-sql-driver/mysql"
)
// A LowercaseString is lowercased on its way into the database.
type LowercaseString string

// Value implements driver.Valuer.
func (ls LowercaseString) Value() (driver.Value, error) {
    return driver.Value(strings.ToLower(string(ls))), nil
}
func main() {
    db, err := sql.Open("mysql",
        "root:@tcp(:3306)/test")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
    _, err = db.Exec(
        "CREATE TABLE IF NOT EXISTS test.hello(world varchar(50))")
    if err != nil {
        log.Fatal(err)
    }
As you can see, both rows are apparently lowercased. But if you look in the database,
you’ll see a different picture:
mysql> select * from test.hello;
+-------------------------------+
| world |
+-------------------------------+
| I AM UPPERCASED NORMAL STRING |
| i am uppercased magic string |
+-------------------------------+
This is because when I inserted into the database, I used one normal string variable,
which got inserted as-is, and one LowercaseString variable, which got lowercased on
the way into the database. But while reading these rows back, I used a
LowercaseString as a destination variable, and both of the rows were transformed to
lowercase, so the program printed them out in lowercase.
This is a very simple example. Real-world examples include much more useful things,
such as:
● Enforcing validation of data that must be formatted in a specific way.
● Transforming data into a uniform format.
● Compressing and decompressing data transparently.
● Encrypting and decrypting data transparently.
Here's a sample implementation of transparent gzip compression and decompression.
Now the GzippedText type can be used just as easily as a []byte in your source
code. At VividCortex, we use similar techniques to transparently encrypt all of our
customers’ sensitive data when we insert it into our databases. You can read more
about that on our blog post about encryption.
Monitoring, Tracing, Observability

The database/sql package used to be hard to observe, but it's gotten steadily more
instrumentation built in over time. The two primary means of observing and
controlling its internals are the stats it exposes, and contexts. Contexts are discussed
in the next section.

The stats interface is fairly simple: a call to db.Stats() will return a stats struct. The
struct has gained progressively more members over time, and at the time of writing,
has the following:
● MaxOpenConnections. This is the current setting of the maximum permitted
open connections. It's not a performance counter; it reports configuration.
● OpenConnections. The current total number of connections, both in use and
idle.
● InUse and Idle. The breakdown of states of the open connections.
● WaitCount. The total number of times a connection was needed but not
available, and a calling program had to wait.
● WaitDuration. The total amount of time such calling programs had to wait for a
connection.
● MaxIdleClosed and MaxLifetimeClosed. The total number of times a
connection was closed due to SetMaxIdleConns or SetConnMaxLifetime.
If you don't see all of these fields, you might have an older version of Go. You can
collect these fields and turn them into metrics in your favorite monitoring system. The
call to db.Stats() is cheap and thread-safe, so there's no reason not to collect the
stats frequently if your monitoring system supports high-resolution metrics.
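A sketch of a background goroutine that logs pool stats periodically (the interval and
the fields shown are illustrative choices):

go func() {
    for range time.Tick(10 * time.Second) {
        s := db.Stats()
        log.Printf("open=%d inUse=%d idle=%d waitCount=%d waitDuration=%s",
            s.OpenConnections, s.InUse, s.Idle, s.WaitCount, s.WaitDuration)
    }
}()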
That covers the instrumentation that database/sql offers for observing its workings,
but when you’re building services with Go, monitoring the application/database
interactions themselves is a must, too. Here’s a sample of some of the important
things you’ll need to be able to measure in order to keep apps running reliably and
with high performance:
● Are connections being opened and closed, or reused?
● Are prepared statements being repeatedly created and used just once, or
created once and reused many times? Are they being closed, or left to time
out and be deallocated when the connection closes?
● What are the most frequent and most time-consuming types of statements?
● Are “garbage” database interactions, such as constant needless “ping”
activities, draining resources and adding latency?
● If a query takes a long time to execute, was it executing for a long time, or was
the call simply blocked waiting for a free connection in the pool?
All of these questions require highly detailed observability of the database and the
application's interactions with it.
Working With Context

A Context (from the context package) carries deadlines and cancellation signals
across API boundaries, and most database/sql methods have Context-accepting
variants such as QueryContext() and ExecContext(). Contexts are used in various
ways in database/sql. For example, a context might be used only for preparing a
statement, but not for executing it; you might use a different context for that. It's best
to get to know the documentation to learn the nuances. At the time of writing, I
personally haven't accumulated any real-world experience with contexts that would
qualify me to tell you anything non-obvious about them.

Driver support for contexts varies. My favorite MySQL driver uses it for query timeouts
and cancellation, but my preferred PostgreSQL driver doesn't actually implement any
functionality related to contexts.
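A minimal sketch of the Context-accepting variants, using a timeout (the query is
carried over from earlier examples):

ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
defer cancel()
rows, err := db.QueryContext(ctx, "SELECT world FROM test.hello")
if err != nil {
    log.Fatal(err) // includes context.DeadlineExceeded if the timeout fired
}
defer rows.Close()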
Database Drivers
I’ve alluded several times to third-party database drivers, but what are they, really?
And what do they do? It would be great to write a manual for how to create a driver,
but that’s a little out of scope for this book. Instead I’ll cover what they do (briefly) and
how they work, and list some good open-source ones you might be interested in.
In brief, the driver’s responsibilities are:
1. To open a connection to the database and communicate over it. The driver
need not implement any kind of pooling or caching of connections, because
database/sql does that itself. The connection must support preparing
statements, beginning transactions, and closing.
2. To implement Rows, an iterator over an executed query’s results.
3. To implement an interface for examining an executed statement’s results.
4. To implement prepared statements that can be executed and closed.
5. To implement transactions that can be committed or rolled back.
6. To implement bidirectional conversions between values as provided by the
database and values in Go.
Drivers can optionally implement a few nice-to-have functionalities as well, most of
which signal that the driver and database support a fast-path operation for specific
things (such as querying the database directly without using a prepared statement).
Drivers become available by registering themselves with database/sql via the
sql.Register() call. They register under the name you'll use in sql.Open(). This is
done with an init() function, similar to the following:
func init() {
sql.Register("mysql", &MySQLDriver{})
}
This init() function executes when the package is imported, as shown previously in
many code samples.
You can find a list of loaded drivers by calling sql.Drivers(), by the way. Here are
some of the drivers that are good quality and idiomatic Go code:
● MySQL: github.com/go-sql-driver/mysql (godoc)
● PostgreSQL: github.com/lib/pq (godoc)
You can find more drivers on the Go wiki page for drivers.
If your application is tightly bound to the underlying database (as most are), you'll likely
want to get to know your preferred driver well. For example, if you use the PostgreSQL
driver I just mentioned, you might be interested in some of the extra bits it exports,
such as a NullBool type and helpful error-handling functionality. Be sure to read your
driver's documentation carefully to learn about all these little goodies.
Common Pitfalls
As you’ve seen, although the surface area of database/sql is pretty small, there’s a lot
you can do with it. That includes a lot of places you can trip up and make a mistake.
This section is dedicated to all the mistakes I’ve made, in hopes that you won’t make
them yourself.
Deferring inside a loop. A long-lived function with a query inside a loop, and defer
rows.Close() inside the loop, will cause both memory and connection usage
to grow without bounds.
Opening many db objects. Make a global sql.DB, and don’t open a new one for, say,
every incoming HTTP request your API server should respond to. Otherwise
you’ll be opening and closing lots of TCP connections to the database. It’ll
cause a lot of latency, load, and TCP connections in TIME_WAIT status.
Not doing rows.Close() when done. Forgetting to close the rows variable means
leaking connections. Combined with growing load on the server, this likely
means running into "too many connections" errors or similar. Run
rows.Close() as soon as you can, even if it'll later be run again (it's harmless).
Chain db.QueryRow() and .Scan() together for the same reason.
Single-use prepared statements. If a prepared statement isn’t going to be used more
than once, consider whether it makes sense to assemble the SQL with
fmt.Sprintf() and avoid parameters and prepared statements. This could
save two network round-trips, a lot of latency, and potentially wasted work.
Prepared statement bloat. If code will be run at high concurrency, consider whether
prepared statements are the right solution, since they are likely to be
reprepared multiple times on different connections when connections are
busy.
Cluttering the code with strconv or casts. Scan into a variable of the type you want,
and let .Scan() convert behind the scenes for you.
Cluttering the code with error-handling and retry. Let database/sql handle
connection pooling, reconnecting, and retry logic for you.
Forgetting to check errors after rows.Next(). Don’t forget that the rows.Next() loop
can exit abnormally.
Using db.Query() for non-SELECT queries. Don't tell Go that you want to iterate over
a result set if there won't be one, or you'll leak connections. Don't use
db.Query() when you should use db.Exec() instead.

Assuming that subsequent statements use the same connection. Run two
statements one after another and they're likely to run on two different
connections. Run LOCK TABLES tbl1 WRITE followed by SELECT * FROM tbl1
and you're likely to block and wait. If you need a guarantee of a single
connection being used, you need to use a transaction or a sql.Conn.
Accessing the db while working with a tx. A sql.Tx is bound to a transaction, but the
db is not, so access to it will not participate in the transaction. This might be
what you want… or not!
Being surprised by a NULL. You can't scan a NULL into a variable unless it is one of
the NullXXX types provided by the database/sql package (or one of your own
making, or provided by the driver). Examine your schema carefully, because if
a column can be NULL, someday it will be, and what works in testing might
blow up in production.
Passing a uint64 as a parameter. For some reason the Query(), QueryRow(), and
Exec() methods don't accept parameters of type uint64 with the most
significant bit set. If you start out small and eventually your numbers get big,
they could start failing unexpectedly. Convert them to strings with
fmt.Sprint() to avoid this.
Conclusion
By now I hope you’re convinced that Go is a great programming language for writing
next-generation Internet-facing services (and lots of other things), and that it’s adept
at working with relational databases. Go continues to evolve, and I’m sure I will get a
chance to write a third revision of this book at some point. (You’re reading the second
revision; the first was written just after Go 1.2 was released). What impresses me the
most about Go is how it continually grows more powerful while remaining so simple,
clear, and elegant. I am not sure what needs to be changed or improved in
database/sql for future versions of Go, but I look forward to finding out!
I hope you’ve found this book enjoyable, and I hope you’re able to put your new
knowledge to work to build something great, and avoid many of the mistakes I’ve
made. I certainly enjoyed writing it—almost as much as I enjoyed learning the lessons
I’ve shared in these pages!
I didn’t write this book alone. In particular I’d like to thank the team at VividCortex,
who contributed their experiences to the book too, as well as reviewing it. Any errors
in this book are mine alone, though. I’d also like to thank Ashley McNamara, who
generously contributed the gophers on the cover. She draws such wonderful
gophers!
If you have any suggestions or comments about this book, you can email me at
[email protected]. I’d love to hear your thoughts.
We Can Help You Build
We hope you enjoyed the second edition of The Ultimate Guide to Building
Database-Intensive Apps with Go, and have found value in the tips we shared.
Many companies use our fast, easy, cloud-based monitoring to help them further
optimize their applications by providing deep visibility into every interaction with their
databases. VividCortex can help you:
● Deploy Code with Confidence
VividCortex is a database performance monitoring solution that provides observability into database
workload and query response, so engineers can find and resolve issues fast. The result is better
application performance, reliability, and uptime. Industry leaders use VividCortex to innovate with
confidence — visualizing, anticipating, and fixing database performance problems before they
impact their applications and customers.
Find out how VividCortex database monitoring can help you optimize apps built with Go.
Request a FREE TRIAL Today!