A Formal Specification of The JQ Language: Michael Färber
MICHAEL FÄRBER
jq is a widely used tool that provides a programming language to manipulate JSON data. However, the jq
language is currently only specified by its implementation, making it difficult to reason about its behaviour.
To this end, we provide a formal syntax and denotational semantics for a large subset of the jq language. Our
most significant contribution is to provide a new way to interpret updates that allows for more predictable
and performant execution.
CCS Concepts: Software and its engineering → Semantics; Functional languages.
Additional Key Words and Phrases: jq, JSON, semantics
1 INTRODUCTION
UNIX has popularised the concept of filters and pipes [1]: A filter is a program that reads from an
input stream and writes to an output stream. Pipes are used to compose filters.
JSON (JavaScript Object Notation) is a widely used data serialisation format [2]. A JSON value
is either null, a boolean, a number, a string, an array of values, or an associative map from strings
to values.
jq is a tool that provides a language to define filters and an interpreter to execute them. Where
UNIX filters operate on streams of characters, jq filters operate on streams of JSON values. This
allows manipulating JSON data with relatively compact filters. For example, given as input the
public JSON dataset of streets in Paris [3], jq retrieves the number of streets (6528) with the
filter “length”, the names of the streets with the filter “.[].nomvoie”, and the total length of all
streets (1574028 m) with the filter “[.[].longueur] | add”. jq also provides syntax to update data; for
example, to remove the geographical data obtained by “.[].geo_shape” while leaving all other
data intact, we can use “.[].geo_shape |= empty”. This shrinks the dataset from ~25 MB to ~7 MB.
jq provides a Turing-complete language that is interesting in its own right; for example, “[0, 1] |
recurse([.[1], add])[0]” generates the stream of Fibonacci numbers. This makes jq a widely
used tool. We refer to the program jq as “jq” and to its language as “the jq language”.
The jq language is a dynamically typed, lazily evaluated functional programming language with
second-class higher-order functions [4]. The semantics of the jq language are only informally
specified, for example in the jq manual [5]. However, the documentation frequently does not
cover certain cases, and historically, the implementation often contradicted the documentation.
The underlying issue is that no formally specified semantics existed to rely on. Having such
semantics makes it possible to determine whether certain behaviour of a jq implementation is
accidental or intended.
However, a formal specification of the behaviour of jq would be very verbose, because jq has
many special cases whose merit is not apparent. Therefore, we have striven to create denotational
semantics (Section 5) that closely resemble those of jq such that in most cases, their behaviour
coincides, whereas they may differ in more exotic cases. The goals for creating these semantics
were, in descending order of importance:
• Simplicity: The semantics should be easy to describe, understand, and implement.
• Performance: The semantics should allow for performant execution.
• Compatibility: The semantics should be consistent with jq.
We created these semantics experimentally, by coming up with jq filters and observing their
output for all kinds of inputs. From this, we synthesised mathematical definitions to model the
behaviour of jq. The most significant improvement over jq behaviour described in this text are the
new update semantics (Section 6), which are simpler to describe and implement, eliminate a range
of potential errors, and allow for more performant execution.
This work is licensed under a Creative Commons Attribution 4.0 International License.
© 2024 Copyright held by the owner/author(s).
The structure of this text is as follows: Section 2 introduces jq by a series of examples that give
a glimpse of actual jq syntax and behaviour. From that point on, the structure of the text follows
the execution of a jq program as shown in Figure 1. Section 3 formalises a subset of jq syntax and
shows how jq syntax can be transformed to increasingly low-level intermediate representations
called HIR (Section 3.1) and MIR (Section 3.2). After this, the semantics part starts: Section 4 de-
fines the type of JSON values and the elementary operations that jq provides for it. Furthermore,
it defines other basic data types such as errors, exceptions, and streams. Section 5 shows how to
evaluate jq filters on a given input value. Section 6 then shows how to evaluate a class of jq filters
that update values using a filter called path that defines which parts of the input to update, and
a filter that defines what the values matching the path should be replaced with. The semantics of
jq and those that will be shown in this text differ most notably in the case of updates. Finally, we
show how to prove properties of jq programs by equational reasoning in Section 7.
2 TOUR OF JQ
The goal of this section is to convey an intuition about how jq functions. The official
documentation of jq is its user manual [5].
jq programs are called filters. For now, let us consider a filter to be a function from a value to
a (lazy, possibly infinite) stream of values. Furthermore, in this section, let us assume a value to
be either a boolean, an integer, or an array of values. (We introduce the full set of JSON values in
Section 4.)
Figure 1: Evaluation of a jq program with an input value. Solid lines indicate data flow, whereas a
dashed line indicates that a component is defined in terms of another.
³The filters in this section can be executed on most UNIX shells by echo $INPUT | jq $FILTER,
where $INPUT is the input value in JSON format and $FILTER is the jq program to be executed. Often, it
is convenient to quote the filter; for example, to run the filter “.” with the input value 0, we can run
echo 0 | jq '.'. In cases where the input value does not matter, we can also use jq -n $FILTER,
which runs the filter with the input value null. We use jq 1.7.
2) + (1, 2)”, because arithmetic operators such as “f + g” take as inputs the Cartesian product
of the output of f and g.⁴ However, there are cases where variables are indispensable.
Example 2.2 (Variables Are Necessary) : jq defines a filter “in(xs)” that expands to “. as $x | xs
| has($x)”. Given an input value i, “in(xs)” binds it to $x, then returns for every value produced
by xs whether its domain contains $x (and thus i). Here, the domain of an array is the set of its
indices. For example, for the input 1, the filter “in([5], [42, 3], [])” yields the stream false,
true, false, because only [42, 3] has a length greater than 1 and thus a domain that contains
1. The point of this example is that we wish to pass xs as input to has, but at the same point, we
also want to pass the input given to in as an argument to has. Without variables, we could not
do both.
Folding over streams can be done using reduce and foreach: The filter “reduce xs as $x (init;
f)” keeps a state that is initialised with the output of init. For every element $x yielded by the
filter xs, reduce feeds the current state to the filter f, which may reference $x, then sets the state
to the output of f. When all elements of xs have been yielded, reduce returns the current state.
For example, the filter “reduce .[] as $x (0; . + $x)” calculates the sum over all elements of
an array. Similarly, “reduce .[] as $x (0; . + 1)” calculates the length of an array. These two
filters are called “add” and “length” in jq, and they allow calculating the average of an array by
“add / length”. The filter “foreach xs as $x (init; f)” is similar to reduce, but also yields
all intermediate states, not only the last state. For example, “foreach .[] as $x (0; . + $x)”
yields the cumulative sum over all array elements.
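The folding behaviour described above can be sketched in Python. This is only an illustrative model and not how jq is implemented: the names reduce_filter and foreach_filter are hypothetical, and the update function f is assumed to return exactly one new state, whereas a jq filter may return any number of values.

```python
from typing import Callable, Iterable, Iterator


def reduce_filter(xs: Iterable, init, f: Callable):
    """Model of `reduce xs as $x (init; f)`: return only the final state."""
    state = init
    for x in xs:
        state = f(state, x)  # f sees the current state and the element $x
    return state


def foreach_filter(xs: Iterable, init, f: Callable) -> Iterator:
    """Model of `foreach xs as $x (init; f)`: yield every new state."""
    state = init
    for x in xs:
        state = f(state, x)
        yield state


values = [1, 2, 3]
total = reduce_filter(values, 0, lambda acc, x: acc + x)    # like "add"
count = reduce_filter(values, 0, lambda acc, x: acc + 1)    # like "length"
running = list(foreach_filter(values, 0, lambda acc, x: acc + x))
print(total, count, total / count, running)
```

For the array [1, 2, 3], the sketch computes the sum 6, the length 3, and the cumulative sums 1, 3, 6.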
Updating values can be done with the operator “|=”, which serves a similar function to lens setters
in languages such as Haskell [6]–[8]: Intuitively, the filter “p |= f” considers any value v returned
by p and replaces it by the output of f applied to v. We call a filter on the left-hand side of “|=” a
path expression. For example, when given the input [1, 2, 3], the filter “.[] |= (. + 1)” yields
[2, 3, 4], and the filter “.[1] |= (. + 1)” yields [1, 3, 3]. We can also nest these filters;
for example, when given the input [[1, 2], [3, 4]], the filter “(.[] | .[]) |= (. + 1)”
yields [[2, 3], [4, 5]]. However, not every filter is a path expression; for example, the filter
“1” is not a path expression because “1” does not point to any part of the input value but creates
a new value.
Identities such as “.[] |= f” being equivalent to “[.[] | f]” when the input value is an array,
or “. |= f” being equivalent to f, would allow defining the behaviour of updates. However, these
identities do not hold in jq due to the way it handles filters f that return multiple values. In particular,
when we pass 0 to the filter “. |= (1, 2)”, the output is 1, not (1, 2) as we might have expected.
Similarly, when we pass [1, 2] to the filter “.[] |= (., .)”, the output is [1, 2], not [1, 1, 2,
2] as expected. This behaviour of jq is cumbersome to define and to reason about. This motivates
in part the definition of simpler and more elegant semantics that behave like jq in most typical use
cases but eliminate corner cases like the ones shown. We will show such semantics in Section 6.
3 SYNTAX
This section describes the syntax for a subset of the jq language that will be used later to define
the semantics in Section 5. To set the formal syntax apart from the concrete syntax introduced in
Section 2, we use cursive font (as in “𝑓”, “𝑣”) for the specification instead of the previously used
typewriter font (as in “f”, “v”).
⁴Haskell users might appreciate the similarity of the two filters to their Haskell analogues “[0, 2] >>=
(\x -> [1, 2] >>= (\y -> return (x+y)))” and “(+) <$> [0, 2] <*> [1, 2]”, which both return
[1, 2, 3, 4].
We will start by introducing high-level intermediate representation (HIR) syntax in Section 3.1.
This syntax is very close to actual jq syntax. Then, we will identify a subset of HIR as mid-level
intermediate representation (MIR) in Section 3.2 and provide a way to translate from HIR to MIR.
This will simplify our semantics in Section 5. Finally, in Section 3.3, we will show how HIR relates
to actual jq syntax.
3.1 HIR
A filter 𝑓 is defined by
𝑓 ≔𝑛 ‖ 𝑠 ‖ .
‖ (𝑓) ‖ 𝑓? ‖ [𝑓] ‖ {𝑓 : 𝑓, …, 𝑓 : 𝑓} ‖ 𝑓𝑝? …𝑝?
‖ 𝑓⋆𝑓 ‖ 𝑓 ⚬𝑓
‖ 𝑓 as $𝑥 | 𝑓 ‖ 𝜙 𝑓 as $𝑥(𝑓; 𝑓) ‖ $𝑥
‖ label $𝑥 | 𝑓 ‖ break $𝑥
‖ if 𝑓 then 𝑓 else 𝑓 ‖ try 𝑓 catch 𝑓
‖ 𝑥 ‖ 𝑥(𝑓; …; 𝑓)
3.2 MIR
We are now going to identify a subset of HIR called MIR and show how to lower a HIR filter to a
semantically equivalent MIR filter.
A MIR filter 𝑓 has the shape
𝑓 ≔𝑛 ‖ 𝑠 ‖ .
‖ [𝑓] ‖ {} ‖ {𝑓 : 𝑓} ‖ .𝑝
‖ 𝑓⋆𝑓 ‖ $𝑥 ⚬ $𝑥
‖ 𝑓 as $𝑥 | 𝑓 ‖ 𝜙 𝑓 as $𝑥(.; 𝑓) ‖ $𝑥
‖ if $𝑥 then 𝑓 else 𝑓 ‖ try 𝑓 catch 𝑓
‖ label $𝑥 | 𝑓 ‖ break $𝑥
‖ 𝑥 ‖ 𝑥(𝑓; …; 𝑓)
Furthermore, the set of complex operators ⋆ in MIR does not include “=” and “⊙=” anymore.
Compared to HIR, MIR filters have significantly simpler path operations (.𝑝 versus 𝑓𝑝? …𝑝? )
and replace certain occurrences of filters by variables (e.g. $𝑥 ⚬ $𝑥 versus 𝑓 ⚬ 𝑓).
Table 2 shows how to lower an HIR filter 𝜑 to a semantically equivalent MIR filter ⌊𝜑⌋. In
particular, this desugars path operations and makes it explicit which operations are Cartesian
or complex. By convention, we write $𝑥′ to denote a fresh variable. Notice that for some com-
plex operators ⋆, namely “=”, “⊙=”, “⫽=”, “and”, and “or”, Table 2 specifies individual lowerings,
whereas for the remaining complex operators ⋆, namely “|”, “,”, “⊧”, and “⫽”, Table 2 specifies a
uniform lowering ⌊𝑓 ⋆ 𝑔⌋ = ⌊𝑓⌋ ⋆ ⌊𝑔⌋.
Table 3 shows how to lower a path part 𝑝^? to MIR filters. Like in Section 3.1, the meaning of
superscript “?” is an optional presence of “?”. In the lowering of 𝑓𝑝1^? …𝑝𝑛^? in Table 2, if 𝑝𝑖 in the first
column is directly followed by “?”, then ⌊𝑝𝑖^?⌋_{$𝑥} in the second column stands for ⌊𝑝𝑖 ?⌋_{$𝑥}, otherwise
for ⌊𝑝𝑖⌋_{$𝑥}. Similarly, in Table 3, if 𝑝 in the first column is followed by “?”, then all occurrences of
superscript “?” in the second column stand for “?”, otherwise for nothing.
Example 3.2.1: The HIR filter (.[]?[]) is lowered to (. as $𝑥′ | . | .[]? | .[]). Semantically, we will
see that this is equivalent to (.[]? | .[]).
Example 3.2.2 : The HIR filter 𝜇 ≡ .[0] is lowered to ⌊𝜇⌋ ≡ . as $𝑥 | . | ($𝑥 | 0) as $𝑦 | .[$𝑦].
Semantically, we will see that ⌊𝜇⌋ is equivalent to 0 as $𝑦 | .[$𝑦].
𝑝^?          ⌊𝑝^?⌋_{$𝑥}
[]^?         .[]^?
[𝑓]^?        ($𝑥 | ⌊𝑓⌋) as $𝑦′ | .[$𝑦′]^?
[𝑓 :]^?      ($𝑥 | ⌊𝑓⌋) as $𝑦′ | length()^? as $𝑧′ | .[$𝑦′ : $𝑧′]^?
[: 𝑓]^?      ($𝑥 | ⌊𝑓⌋) as $𝑦′ | 0 as $𝑧′ | .[$𝑧′ : $𝑦′]^?
[𝑓 : 𝑔]^?    ($𝑥 | ⌊𝑓⌋) as $𝑦′ | ($𝑥 | ⌊𝑔⌋) as $𝑧′ | .[$𝑦′ : $𝑧′]^?
Table 3: Lowering of a path part 𝑝^? with input $𝑥 to a MIR filter.
𝜑                          ⌊𝜑⌋
𝑛, 𝑠, ., $𝑥, or break $𝑥     𝜑
(𝑓)                        ⌊𝑓⌋
𝑓?                         try ⌊𝑓⌋ catch empty()
[]                         [empty()]
[𝑓]                        [⌊𝑓⌋]
{}                         {}
{𝑓 : 𝑔}                    ⌊𝑓⌋ as $𝑥′ | ⌊𝑔⌋ as $𝑦′ | {$𝑥′ : $𝑦′}
𝑓 as $𝑥 | 𝑔                ⌊𝑓⌋ as $𝑥 | ⌊𝑔⌋
𝜙 𝑓𝑥 as $𝑥(𝑓𝑦; 𝑓)          . as $𝑥′ | ⌊𝑓𝑦⌋ | 𝜙 ⌊$𝑥′ | 𝑓𝑥⌋ as $𝑥(.; ⌊𝑓⌋)
The HIR filter 𝜑 ≡ [3] | .[0] = (length(), 2) is lowered to the MIR filter
⌊𝜑⌋ ≡ [3] | (length(), 2) as $𝑧 | ⌊𝜇⌋ ⊧ $𝑧. In Section 5, we will see that its output is ⟨[1], [2]⟩.
This lowering assumes the presence of one filter in the definitions, namely empty. This filter re-
turns an empty stream. We might be tempted to define it as {} | .[], which constructs an empty
object, then returns its contained values, which corresponds to an empty stream as well. However,
such a definition relies on the temporary construction of new values (such as the empty object
here), which is not admissible on the left-hand side of updates (see Section 6). For this reason, we
have to define it in a more complicated way, for example
empty() ≔ ({} | .[]) as $𝑥 | .
This definition ensures that empty can be employed also as a path expression.
⁵Actual jq syntax has a few more constructions to offer, including nested definitions, variable
arguments, string interpolation, modules, etc. However, these constructions can be transformed into
semantically equivalent syntax as treated in this text.
recurse(. + 1). The lowering of the definition to MIR yields the same as the HIR definition, and
the lowering of the main filter to MIR yields recurse(. as $𝑥′ | 1 as $𝑦′ | $𝑥′ + $𝑦′ ).
Example 3.3.2 : Consider the jq program
def select(f): if f then . else empty end;
def negative: . < 0;
.[] | select(negative)
When given an array as an input, it yields
those elements of the array that are smaller than 0. Here, the definitions in the example are
converted to the HIR definitions select(𝑓) ≔ if 𝑓 then . else empty() and negative() ≔ . < 0,
and the main filter is converted to the HIR filter .[] | select(negative()). Both the definition of
select(𝑓) and the main filter are already in MIR; the MIR version of the remaining definition is
negative() ≔ . as $𝑥′ | 0 as $𝑦′ | $𝑥′ < $𝑦′ .
We will show in Section 5 how to run the resulting MIR filter 𝑓 in the presence of a set of MIR
definitions. For a given input value 𝑣, the output of 𝑓 will be given by 𝑓|^{}_𝑣.
4 VALUES
In this section, we will define JSON values, errors, exceptions, and streams. Furthermore, we will
define several functions and operations on values.
A JSON value 𝑣 has the shape
𝑣 ≔ null ‖ false ‖ true ‖ 𝑛 ‖ 𝑠 ‖ [𝑣0 , …, 𝑣𝑛 ] ‖ {𝑘0 ↦ 𝑣0 , …, 𝑘𝑛 ↦ 𝑣𝑛 },
An error can be constructed from a value by the function error(𝑣). The error function is bijective;
that is, if we have an error 𝑒, then there is a unique value 𝑣 with 𝑒 = error(𝑣). In the remainder
of this text, we will write just “error” to denote calling error(𝑣) with some value 𝑣. This is done
such that this specification does not need to fix the precise error value that is returned when an
operation fails.
An exception either is an error or has the shape break($𝑥). The latter will become relevant
starting from Section 5.
A value result is either a value or an exception.
A stream (or lazy list) is written as ⟨𝑣0 , …, 𝑣𝑛 ⟩. The concatenation of two streams 𝑠1 ,
𝑠2 is written as 𝑠1 + 𝑠2 . Given some stream 𝑙 = ⟨𝑥0 , …, 𝑥𝑛 ⟩, we write ∑𝑥∈𝑙 𝑓(𝑥) to denote
𝑓(𝑥0 ) + … + 𝑓(𝑥𝑛 ). We use this frequently to map a function over a stream, by having 𝑓(𝑥) re-
turn a stream itself.
In this text, we will see many functions that take values as arguments. By convention, for any
of these functions 𝑓(𝑣1 , …, 𝑣𝑛 ), we extend their domain to value results such that 𝑓(𝑣1 , …, 𝑣𝑛 )
yields 𝑣𝑖 (or rather ⟨𝑣𝑖 ⟩ if 𝑓 returns streams) if 𝑣𝑖 is an exception and for all 𝑗 < 𝑖, 𝑣𝑗 is a value. For
example, in Section 4.3, we will define 𝑙 + 𝑟 for values 𝑙 and 𝑟, and by our convention, we extend
the domain of addition to value results such that if 𝑙 is an exception, then 𝑙 + 𝑟 returns just 𝑙, and
if 𝑙 is a value, but 𝑟 is an exception, then 𝑙 + 𝑟 returns just 𝑟.
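The convention for extending functions to value results can be illustrated with a small Python sketch. The class Exc and the helper lift are hypothetical names introduced only for this illustration; the sketch assumes that exceptions are represented as instances of Exc and that every other Python object is a value.

```python
class Exc:
    """Hypothetical stand-in for an exception (an error or a break)."""

    def __init__(self, payload):
        self.payload = payload


def lift(f):
    """Extend f from values to value results.

    The lifted function returns the first argument (from the left)
    that is an exception; if there is none, it applies f.
    """
    def lifted(*args):
        for a in args:
            if isinstance(a, Exc):
                return a
        return f(*args)
    return lifted


add = lift(lambda l, r: l + r)
left, right = Exc("left"), Exc("right")
print(add(1, 2))                # both arguments are values
print(add(left, right) is left)  # the leftmost exception wins
print(add(1, right) is right)
```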
4.1 CONSTRUCTION
In this subsection, we will introduce operators to construct arrays and objects.
The function [⋅] transforms a stream into an array if all stream elements are values, or into the
first exception in the stream otherwise:
$$[\langle v_0, \dots, v_n \rangle] := \begin{cases}
v_i & \text{if } v_i \text{ is an exception and for all } j < i,\ v_j \text{ is a value} \\
[v_0, \dots, v_n] & \text{otherwise}
\end{cases}$$
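$$|v| := \begin{cases}
0 & \text{if } v = \text{null} \\
|n| & \text{if } v \text{ is a number } n \\
n & \text{if } v = c_1 \dots c_n \\
n & \text{if } v = [v_1, \dots, v_n] \\
n & \text{if } v = \{k_1 \mapsto v_1, \dots, k_n \mapsto v_n\} \\
\text{error} & \text{otherwise (if } v \in \{\text{true}, \text{false}\}\text{)}
\end{cases}$$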
The boolean value of a value 𝑣 is defined as follows:
$$\text{bool}(v) := \begin{cases}
\text{false} & \text{if } v = \text{null} \text{ or } v = \text{false} \\
\text{true} & \text{otherwise}
\end{cases}$$
We can draw a link between the functions here and jq: When called with the input value 𝑣, the jq
filter keys yields ⟨[keys(𝑣)]⟩, the jq filter length yields ⟨|𝑣|⟩, and the jq filter true and . yields
⟨bool(𝑣)⟩.
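The definitions of |𝑣| and bool(𝑣) translate almost directly into code. The following Python sketch assumes that JSON values are represented by None, bool, int/float, str, list, and dict; the function names length and to_bool are chosen for this illustration, and a Python TypeError stands in for the unspecified error value.

```python
def length(v):
    """Sketch of |v| for JSON values represented as Python objects."""
    if v is None:
        return 0
    if isinstance(v, bool):  # must be checked before numbers: bool is an int in Python
        raise TypeError("booleans have no length")
    if isinstance(v, (int, float)):
        return abs(v)
    if isinstance(v, (str, list, dict)):
        return len(v)
    raise TypeError("not a JSON value")


def to_bool(v):
    """Sketch of bool(v): only null and false have the boolean value false."""
    return not (v is None or v is False)


print(length(None), length(-3), length("abc"), length([1, 2]), length({"a": 1}))
print(to_bool(0), to_bool(""), to_bool(None), to_bool(False))
```

Note that, unlike in Python, the number 0 and the empty string have the boolean value true under this definition.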
4.3.1 ADDITION
We define addition of two values 𝑙 and 𝑟 as follows:
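$$l + r := \begin{cases}
v & \text{if } l = \text{null} \text{ and } r = v, \text{ or } l = v \text{ and } r = \text{null} \\
n_1 + n_2 & \text{if } l \text{ is a number } n_1 \text{ and } r \text{ is a number } n_2 \\
c_{l,1} \dots c_{l,m} c_{r,1} \dots c_{r,n} & \text{if } l = c_{l,1} \dots c_{l,m} \text{ and } r = c_{r,1} \dots c_{r,n} \\
[\langle l_1, \dots, l_m, r_1, \dots, r_n \rangle] & \text{if } l = [l_1, \dots, l_m] \text{ and } r = [r_1, \dots, r_n] \\
l \cup r & \text{if } l = \{\dots\} \text{ and } r = \{\dots\} \\
\text{error} & \text{otherwise}
\end{cases}$$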
Here, we can see that null serves as a neutral element for addition. For strings and arrays, addition
corresponds to their concatenation, and for objects, it corresponds to their union.
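A minimal Python sketch of this definition follows. It assumes the usual representation of JSON values as Python objects; the names add and is_number are illustrative. One detail is an assumption that the definition above does not spell out: when both objects contain the same key, the sketch lets the value from the right operand win.

```python
def is_number(v) -> bool:
    """True for JSON numbers; Python booleans are excluded explicitly."""
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def add(l, r):
    """Sketch of l + r on JSON values."""
    if l is None:
        return r  # null is a neutral element
    if r is None:
        return l
    if is_number(l) and is_number(r):
        return l + r
    if isinstance(l, str) and isinstance(r, str):
        return l + r  # concatenation
    if isinstance(l, list) and isinstance(r, list):
        return l + r  # concatenation
    if isinstance(l, dict) and isinstance(r, dict):
        return {**l, **r}  # union; right operand wins on shared keys (assumption)
    raise TypeError("cannot add these values")


print(add(None, 1), add("a", "b"), add([1], [2]), add({"a": 1}, {"b": 2}))
```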
4.3.2 MULTIPLICATION
Given two objects 𝑙 and 𝑟, we define their recursive merge 𝑙 ⋓ 𝑟 as:
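$$l \Cup r := \begin{cases}
\{k \mapsto v_l \Cup v_r\} \cup l' \Cup r' & \text{if } l = \{k \mapsto v_l\} \cup l',\ r = \{k \mapsto v_r\} \cup r', \text{ and } v_l, v_r \text{ are objects} \\
\{k \mapsto v_r\} \cup l' \Cup r' & \text{if } l = \{k \mapsto v_l\} \cup l',\ r = \{k \mapsto v_r\} \cup r', \text{ and } v_l \text{ or } v_r \text{ is not an object} \\
\{k \mapsto v_r\} \cup l \Cup r' & \text{if } k \notin \text{dom}(l) \text{ and } r = \{k \mapsto v_r\} \cup r' \\
l & \text{otherwise (if } r = \{\}\text{)}
\end{cases}$$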
We use this in the following definition of multiplication of two values 𝑙 and 𝑟:
4.3.3 SUBTRACTION
We now define subtraction of two values 𝑙 and 𝑟:
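$$l - r := \begin{cases}
n_1 - n_2 & \text{if } l \text{ is a number } n_1 \text{ and } r \text{ is a number } n_2 \\
[\sum_{i,\ l_i \notin \{r_0, \dots, r_n\}} \langle l_i \rangle] & \text{if } l = [l_0, \dots, l_n] \text{ and } r = [r_0, \dots, r_n] \\
\text{error} & \text{otherwise}
\end{cases}$$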
When both 𝑙 and 𝑟 are arrays, then 𝑙 − 𝑟 returns an array containing those values of 𝑙 that are not
contained in 𝑟.
4.3.4 DIVISION
We will now define a function that splits a string 𝑦 + 𝑥 by some non-empty separator string 𝑠.
The function preserves the invariant that 𝑦 does not contain 𝑠:
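$$\text{split}(x, s, y) := \begin{cases}
\text{split}(c_1 \dots c_n, s, y + c_0) & \text{if } x = c_0 \dots c_n \text{ and } c_0 \dots c_{|s|-1} \neq s \\
[\langle y \rangle] + \text{split}(c_{|s|} \dots c_n, s, \text{""}) & \text{if } x = c_0 \dots c_n \text{ and } c_0 \dots c_{|s|-1} = s \\
[\langle y \rangle] & \text{otherwise } (|x| = 0)
\end{cases}$$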
We use this splitting function to define division of two values:
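$$l \div r := \begin{cases}
n_1 \div n_2 & \text{if } l \text{ is a number } n_1 \text{ and } r \text{ is a number } n_2 \\
[] & \text{if } l \text{ and } r \text{ are strings and } |l| = 0 \\
[\sum_i \langle c_i \rangle] & \text{if } l = c_0 \dots c_n,\ r \text{ is a string, } |l| > 0, \text{ and } |r| = 0 \\
\text{split}(l, r, \text{""}) & \text{if } l \text{ and } r \text{ are strings, } |l| > 0, \text{ and } |r| > 0 \\
\text{error} & \text{otherwise}
\end{cases}$$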
Example 4.3.4.1 : Let 𝑠 = "ab". We have that 𝑠 ÷ 𝑠 = ["", ""]. Furthermore, "𝑐" ÷ 𝑠 = ["𝑐"],
(𝑠 + "𝑐"+𝑠) ÷ 𝑠 = ["", "𝑐", ""] and (𝑠 + "𝑐"+𝑠 + "de") ÷ 𝑠 = ["", "𝑐", "de"].
From this example, we can infer the following lemma.
Lemma 4.3.4.1 : Let 𝑙 and 𝑟 be strings with |𝑙| > 0 and |𝑟| > 0. Then 𝑙 ÷ 𝑟 = [𝑙0 , …, 𝑙𝑛 ] for some 𝑛 ≥ 0
such that 𝑙 = (∑_{𝑖=0}^{𝑛−1} (𝑙𝑖 + 𝑟)) + 𝑙𝑛 and for all 𝑖, 𝑙𝑖 is a string that does not contain 𝑟 as substring.
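The splitting function and the string cases of division can be sketched in Python as follows. The recursion mirrors the definition of split one-to-one, so it is meant for illustration rather than for long inputs; the name divide_strings is hypothetical and covers only the cases where both operands are strings.

```python
def split(x: str, s: str, y: str = "") -> list:
    """Split x by the non-empty separator s; y is the part collected so far."""
    if x == "":
        return [y]
    if x.startswith(s):
        # The separator was found: emit y and restart with an empty part.
        return [y] + split(x[len(s):], s, "")
    # Otherwise move the first character of x to the end of y.
    return split(x[1:], s, y + x[0])


def divide_strings(l: str, r: str) -> list:
    """String cases of l ÷ r."""
    if l == "":
        return []
    if r == "":
        return list(l)  # one string per character
    return split(l, r)


s = "ab"
print(divide_strings(s, s))
print(divide_strings("c", s))
print(divide_strings(s + "c" + s, s))
print(divide_strings(s + "c" + s + "de", s))
```

The four calls reproduce the results of Example 4.3.4.1. Joining the resulting parts with the separator gives back the original string, which is the property stated in Lemma 4.3.4.1.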
4.3.5 REMAINDER
For two values 𝑙 and 𝑟, the arithmetic operation 𝑙 % 𝑟 (modulo) yields 𝑚 % 𝑛 if 𝑙 and 𝑟 are numbers
𝑚 and 𝑛, otherwise it yields an error.
4.4 ACCESSING
We will now define three access operators. These serve to extract values that are contained within
other values.
The value 𝑣[𝑖] of a value 𝑣 at index 𝑖 is defined as follows:
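$$v[i] := \begin{cases}
v_i & \text{if } v = [v_0, \dots, v_n],\ i \in \mathbb{N}, \text{ and } i \leq n \\
\text{null} & \text{if } v = [v_0, \dots, v_n],\ i \in \mathbb{N}, \text{ and } i > n \\
v[n + i] & \text{if } v = [v_0, \dots, v_n],\ i \in \mathbb{Z} \setminus \mathbb{N}, \text{ and } 0 \leq n + i \\
v_j & \text{if } v = \{k_0 \mapsto v_0, \dots, k_n \mapsto v_n\},\ i \text{ is a string, and } k_j = i \\
\text{null} & \text{if } v = \{k_0 \mapsto v_0, \dots, k_n \mapsto v_n\},\ i \text{ is a string, and } i \notin \{k_0, \dots, k_n\} \\
\text{error} & \text{otherwise}
\end{cases}$$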
The idea behind this index operator is as follows: It returns null if the value 𝑣 does not contain a
value at index 𝑖, but 𝑣 could be extended to contain one. More formally, 𝑣[𝑖] is null if 𝑣 ≠ null and
there exists some value 𝑣′ = 𝑣 + 𝛿 such that 𝑣′ [𝑖] ≠ null.
The behaviour of this operator for 𝑖 < 0 is that 𝑣[𝑖] equals 𝑣[|𝑣| + 𝑖].
Example 4.4.1 : If 𝑣 = [0, 1, 2], then 𝑣[1] = 1 and 𝑣[−1] = 𝑣[3 − 1] = 2.
Using the index operator, we can define the values 𝑣[] in a value 𝑣 as follows:
$$v[] := \sum_{i \in \text{keys}(v)} \langle v[i] \rangle$$
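The index operator 𝑣[𝑖] and the values operator 𝑣[] can be sketched in Python as follows. The names index and values are illustrative. For negative indices, the sketch follows the prose description, that is, 𝑣[𝑖] equals 𝑣[|𝑣| + 𝑖]; for objects, it assumes that keys are visited in the insertion order of the Python dict, which the text above does not prescribe.

```python
def index(v, i):
    """Sketch of v[i] for arrays (lists) and objects (dicts)."""
    if isinstance(v, list) and isinstance(i, int) and not isinstance(i, bool):
        if i < 0:
            i += len(v)  # negative indices count from the end
            if i < 0:
                raise IndexError("index out of bounds")
        return v[i] if i < len(v) else None  # null if v could be extended
    if isinstance(v, dict) and isinstance(i, str):
        return v.get(i)  # null (None) if the key is absent
    raise TypeError("cannot index this value")


def values(v) -> list:
    """Sketch of v[]: the stream of v[i] for every key i of v."""
    if isinstance(v, list):
        keys = range(len(v))
    elif isinstance(v, dict):
        keys = list(v)
    else:
        raise TypeError("value has no keys")
    return [index(v, k) for k in keys]


v = [0, 1, 2]
print(index(v, 1), index(v, -1), index(v, 5))
print(values(v), values({"a": 1, "b": 2}))
```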
4.5 UPDATING
For each access operator in Section 4.4, we will now define an updating counterpart. Intuitively,
where an access operator yields some elements contained in a value 𝑣, its corresponding update
operator replaces these elements in 𝑣 by the output of a function. The access operators will be used
in Section 5, and the update operators will be used in Section 6.
All update operators take at least a value 𝑣 and a function 𝑓 from a value to a stream of value
results. We extend the domain of 𝑓 to value results such that 𝑓(𝑒) = ⟨𝑒⟩ if 𝑒 is an exception.
The first update operator will be a counterpart to 𝑣[]. For all elements 𝑥 that are yielded by 𝑣[],
𝑣[] ⊧ 𝑓 replaces 𝑥 by 𝑓(𝑥):
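$$v[] \models f := \begin{cases}
[\sum_i f(v_i)] & \text{if } v = [v_0, \dots, v_n] \\
\bigcup_i \begin{cases} \{k_i : h\} & \text{if } f(v_i) = \langle h \rangle + t \\ \{\} & \text{otherwise} \end{cases} & \text{if } v = \{k_0 \mapsto v_0, \dots, k_n \mapsto v_n\} \\
\text{error} & \text{otherwise}
\end{cases}$$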
For an input array 𝑣 = [𝑣0 , …, 𝑣𝑛 ], 𝑣[] ⊧ 𝑓 replaces each 𝑣𝑖 by the output of 𝑓(𝑣𝑖 ), yielding
[𝑓(𝑣0 ) + … + 𝑓(𝑣𝑛 )]. For an input object 𝑣 = {𝑘0 ↦ 𝑣0 , …, 𝑘𝑛 ↦ 𝑣𝑛 }, 𝑣[] ⊧ 𝑓 replaces each 𝑣𝑖
by the first output yielded by 𝑓(𝑣𝑖 ) if such an output exists, otherwise it deletes {𝑘𝑖 ↦ 𝑣𝑖 } from
the object. Note that updating arrays diverges from jq, because jq only considers the first value
yielded by 𝑓.
For the next operators, we will use the following function head(𝑙, 𝑒), which returns the head of
a list 𝑙 if it is not empty, otherwise 𝑒:
$$\text{head}(l, e) := \begin{cases}
h & \text{if } l = \langle h \rangle + t \\
e & \text{otherwise}
\end{cases}$$
The next function takes a value 𝑣 and replaces its 𝑖-th element by the first output of 𝑓, or deletes
it if 𝑓 yields no output:
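$$v[i] \models f := \begin{cases}
v[0 : i] + [\text{head}(f(v[i]), \langle\rangle)] + v[(i + 1) : n] & \text{if } v = [v_0, \dots, v_n],\ i \in \mathbb{N}, \text{ and } i \leq n \\
v[n + i] \models f & \text{if } v = [v_0, \dots, v_n],\ i \in \mathbb{Z} \setminus \mathbb{N}, \text{ and } 0 \leq n + i \\
v + \{i : h\} & \text{if } v = \{\dots\} \text{ and } f(v[i]) = \langle h \rangle + t \\
\bigcup_{k \in \text{dom}(v) \setminus \{i\}} \{k \mapsto v[k]\} & \text{if } v = \{\dots\} \text{ and } f(v[i]) = \langle\rangle \\
\text{error} & \text{otherwise}
\end{cases}$$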
Note that this diverges from jq if 𝑣 = [𝑣0 , …, 𝑣𝑛 ] and 𝑖 > 𝑛, because jq fills up the array with null.
The final function here is the update counterpart of the operator 𝑣[𝑖 : 𝑗]. It replaces the slice
𝑣[𝑖 : 𝑗] by the first output of 𝑓 on 𝑣[𝑖 : 𝑗], or by the empty array if 𝑓 yields no output.
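$$v[i : j] \models f := \begin{cases}
v[0 : i] + \text{head}(f(v[i : j]), []) + v[j : n] & \text{if } v = [v_0, \dots, v_n],\ i, j \in \mathbb{N}, \text{ and } i \leq j \\
v & \text{if } v = [v_0, \dots, v_n],\ i, j \in \mathbb{N}, \text{ and } i > j \\
v[(n + i) : j] \models f & \text{if } |v| = n,\ i \in \mathbb{Z} \setminus \mathbb{N}, \text{ and } 0 \leq n + i \\
v[i : (n + j)] \models f & \text{if } |v| = n,\ j \in \mathbb{Z} \setminus \mathbb{N}, \text{ and } 0 \leq n + j \\
\text{error} & \text{otherwise}
\end{cases}$$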
Unlike its corresponding access operator 𝑣[𝑖 : 𝑗], this operator unconditionally fails when 𝑣 is a
string. This operator diverges from jq if 𝑓 yields null, in which case jq returns an error, whereas
this operator treats this as equivalent to 𝑓 returning [].
Example 4.5.1 : If 𝑣 = [0, 1, 2, 3] and 𝑓(𝑣) = [4, 5, 6], then 𝑣[1 : 3] ⊧ 𝑓 = [0, 4, 5, 6, 3].
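The update operators of this subsection can be sketched in Python as follows. This is a simplified model under several assumptions: streams are Python iterables, exceptions are not modelled, update_index covers only arrays, and all function names are chosen for this illustration.

```python
from typing import Callable, Iterable

_MISSING = object()  # sentinel marking "f produced no output"


def head(stream: Iterable, default):
    """Return the first element of a stream, or default if it is empty."""
    return next(iter(stream), default)


def update_values(v, f: Callable):
    """Sketch of v[] ⊧ f."""
    if isinstance(v, list):
        # Arrays: every element is replaced by all outputs of f.
        return [y for x in v for y in f(x)]
    if isinstance(v, dict):
        # Objects: keep the first output of f, drop the key if there is none.
        out = {}
        for k, x in v.items():
            h = head(f(x), _MISSING)
            if h is not _MISSING:
                out[k] = h
        return out
    raise TypeError("cannot update the values of this input")


def update_index(v: list, i: int, f: Callable) -> list:
    """Sketch of v[i] ⊧ f for arrays."""
    n = len(v)
    if i < 0:
        i += n
    if not 0 <= i < n:
        raise IndexError("index out of bounds")
    h = head(f(v[i]), _MISSING)
    return v[:i] + ([] if h is _MISSING else [h]) + v[i + 1:]


def update_slice(v: list, i: int, j: int, f: Callable) -> list:
    """Sketch of v[i : j] ⊧ f for arrays."""
    n = len(v)
    if i < 0:
        i += n
    if j < 0:
        j += n
    if i < 0 or j < 0:
        raise IndexError("slice bound out of range")
    if i > j:
        return v
    return v[:i] + head(f(v[i:j]), []) + v[j:]


print(update_slice([0, 1, 2, 3], 1, 3, lambda s: [[4, 5, 6]]))
print(update_values([1, 2], lambda x: [x, x]))
print(update_index([1, 2, 3], 1, lambda x: []))
```

The first call corresponds to Example 4.5.1. The second call shows the behaviour on arrays that diverges from jq: all outputs of 𝑓 are kept, so the result is [1, 1, 2, 2].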
4.6 ORDERING
In this subsection, we establish a total order on values.⁹
We have that
null < false < true < 𝑛 < 𝑠 < 𝑎 < 𝑜,
where 𝑛 is a number, 𝑠 is a string, 𝑎 is an array, and 𝑜 is an object. We assume that there is a total
order on numbers and characters. Strings and arrays are ordered lexicographically.
Two objects 𝑜1 and 𝑜2 are ordered as follows: For both objects 𝑜𝑖 (𝑖 ∈ {1, 2}), we sort the array
[keys(𝑜𝑖 )] by ascending order to obtain the ordered array of keys 𝑘𝑖 = [𝑘1 , …, 𝑘𝑛 ], from which we
obtain 𝑣𝑖 = [𝑜[𝑘1 ], …, 𝑜[𝑘𝑛 ]]. We then have
$$o_1 < o_2 \iff \begin{cases}
k_1 < k_2 & \text{if } k_1 < k_2 \text{ or } k_1 > k_2 \\
v_1 < v_2 & \text{otherwise } (k_1 = k_2)
\end{cases}$$
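The ordering of this subsection can be sketched as a three-way comparison function in Python. The names rank and compare are illustrative; the sketch assumes that numbers are compared with Python's built-in order and that strings are compared by code point.

```python
from functools import cmp_to_key


def rank(v) -> int:
    """Position of the type of v in: null < false < true < number < string < array < object."""
    if v is None:
        return 0
    if v is False:
        return 1
    if v is True:
        return 2
    if isinstance(v, (int, float)):
        return 3
    if isinstance(v, str):
        return 4
    if isinstance(v, list):
        return 5
    if isinstance(v, dict):
        return 6
    raise TypeError("not a JSON value")


def compare(a, b) -> int:
    """Return a negative number, zero, or a positive number for a < b, a = b, a > b."""
    ra, rb = rank(a), rank(b)
    if ra != rb:
        return -1 if ra < rb else 1
    if ra in (3, 4):  # numbers and strings
        return (a > b) - (a < b)
    if ra == 5:  # arrays: lexicographic order
        for x, y in zip(a, b):
            c = compare(x, y)
            if c != 0:
                return c
        return (len(a) > len(b)) - (len(a) < len(b))
    if ra == 6:  # objects: compare sorted keys first, then the values in key order
        ka, kb = sorted(a), sorted(b)
        c = compare(ka, kb)
        if c != 0:
            return c
        return compare([a[k] for k in ka], [b[k] for k in ka])
    return 0  # null, false, or true compared with itself


mixed = [{"a": 1}, [1], "a", 1, True, False, None]
print(sorted(mixed, key=cmp_to_key(compare)))
```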
5 EVALUATION SEMANTICS
In this section, we will define a function 𝜑|^𝑐_𝑣 that returns the output of the filter 𝜑 in the context
𝑐 on the input value 𝑣.
Let us start with a few definitions. A context 𝑐 is a mapping from variables $𝑥 to values and from
identifiers 𝑥 to pairs (𝑓, 𝑐), where 𝑓 is a filter and 𝑐 is a context. Contexts store what variables
and filter arguments are bound to.
We are now going to introduce a few helper functions. The first function helps define filters
such as if-then-else and alternation (𝑓 ⫽ 𝑔):
$$\text{ite}(v, i, t, e) = \begin{cases}
t & \text{if } v = i \\
e & \text{otherwise}
\end{cases}$$
Next, we define a function that is used to define alternation. trues(𝑙) returns those elements of 𝑙
whose boolean values are not false. Note that in our context, “not false” is not the same as “true”,
because the former includes exceptions, whereas the latter excludes them, and bool(𝑥) can return
exceptions, in particular if 𝑥 is an exception.
$$\text{trues}(l) := \sum_{x \in l,\ \text{bool}(x) \neq \text{false}} \langle x \rangle$$
The evaluation semantics are given in Table 5. Let us discuss its different cases:
• “.”: Returns its input value. This is the identity filter.
• 𝑛 or 𝑠: Returns the value corresponding to the number 𝑛 or string 𝑠.
• $𝑥: Returns the value currently bound to the variable $𝑥, by looking it up in the context. Well-
formedness of the filter (as defined in Section 3.1) ensures that such a value always exists.
• [𝑓]: Creates an array from the output of 𝑓, using the operator defined in Section 4.1.
• {}: Creates an empty object.
• {$𝑥 : $𝑦}: Creates an object from the values bound to $𝑥 and $𝑦, using the operator defined in
Section 4.1.
• 𝑓, 𝑔: Concatenates the outputs of 𝑓 and 𝑔.
• 𝑓 | 𝑔: Composes 𝑓 and 𝑔, returning the outputs of 𝑔 applied to all outputs of 𝑓.
⁹Note that jq does not implement a strict total order on values; in particular, its order on (floating-
point) numbers specifies nan < nan, from which follows that nan ≠ nan and nan ≯ nan.
𝜑                  𝜑|^𝑐_𝑣
.                  ⟨𝑣⟩
𝑛 or 𝑠             ⟨𝜑⟩
$𝑥                 ⟨𝑐($𝑥)⟩
[𝑓]                ⟨[𝑓|^𝑐_𝑣]⟩
{}                 ⟨{}⟩
{$𝑥 : $𝑦}          ⟨{𝑐($𝑥) : 𝑐($𝑦)}⟩
𝑓, 𝑔               𝑓|^𝑐_𝑣 + 𝑔|^𝑐_𝑣
𝑓 | 𝑔              ∑_{𝑥 ∈ 𝑓|^𝑐_𝑣} 𝑔|^𝑐_𝑥
$𝑥 ⚬ $𝑦            ⟨𝑐($𝑥) ⚬ 𝑐($𝑦)⟩
try 𝑓 catch 𝑔      ∑_{𝑥 ∈ 𝑓|^𝑐_𝑣} (𝑔|^𝑐_𝑒 if 𝑥 = error(𝑒), otherwise ⟨𝑥⟩)
• 𝑓 ⫽ 𝑔: Returns 𝑙 if 𝑙 is not empty, else the outputs of 𝑔, where 𝑙 are the outputs of 𝑓 whose
boolean values are not false.
• 𝑓 as $𝑥 | 𝑔: For every output of 𝑓, binds it to the variable $𝑥 and returns the output of 𝑔, where
𝑔 may reference $𝑥. Unlike 𝑓 | 𝑔, this runs 𝑔 with the original input value instead of an output
of 𝑓. We can show that the evaluation of 𝑓 | 𝑔 is equivalent to that of 𝑓 as $𝑥′ | $𝑥′ | 𝑔, where
$𝑥′ is a fresh variable. Therefore, we could be tempted to lower 𝑓 | 𝑔 to ⌊𝑓⌋ as $𝑥′ | $𝑥′ | ⌊𝑔⌋
in Table 2. However, we cannot do this because we will see in Section 6 that this equivalence
does not hold for updates; that is, (𝑓 | 𝑔) ⊧ 𝜎 is not equal to (𝑓 as $𝑥′ | $𝑥′ | 𝑔) ⊧ 𝜎.
• $𝑥 ⚬ $𝑦: Returns the output of a Cartesian operation “⚬” (any of ≟, ≠, <, ≤, >, ≥, +, −, ×, ÷,
and %, as given in Table 1) on the values bound to $𝑥 and $𝑦. The semantics of the arithmetic
operators are given in Section 4.3, the comparison operators are defined by the ordering given
in Section 4.6, 𝑙 ≟ 𝑟 returns whether 𝑙 equals 𝑟, and 𝑙 ≠ 𝑟 returns its negation.
• try 𝑓 catch 𝑔: Replaces all outputs of 𝑓 that equal error(𝑒) for some 𝑒 by the output of 𝑔
on the input 𝑒. Note that this diverges from jq, which aborts the evaluation of 𝑓 after the
first error. This behaviour can be simulated in our semantics, by replacing try 𝑓 catch 𝑔 with
label $𝑥′ | try 𝑓 catch (𝑔, break $𝑥′ ).
• label $𝑥 | 𝑓: Returns all values yielded by 𝑓 until 𝑓 yields an exception break($𝑥). This uses the
function label(𝑙, $𝑥), which returns all elements of 𝑙 until the current element is an exception
of the form break($𝑥):
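$$\text{label}(l, \$x) := \begin{cases}
\langle h \rangle + \text{label}(t, \$x) & \text{if } l = \langle h \rangle + t \text{ and } h \neq \text{break}(\$x) \\
\langle\rangle & \text{otherwise}
\end{cases}$$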
• break $𝑥: Returns a value break($𝑥). Similarly to the evaluation of variables $𝑥 described
above, wellformedness of the filter (as defined in Section 3.1) ensures that the returned value
break($𝑥) will be eventually handled by a corresponding filter label $𝑥 | 𝑓. That means that
the evaluation of a wellformed filter can only yield values and errors, but never break($𝑥).
• $𝑥 and 𝑓: Returns false if $𝑥 is bound to either null or false, else returns the output of 𝑓 mapped
to boolean values. This uses the function junction(𝑥, 𝑣, 𝑙), which returns just 𝑣 if the boolean
value of 𝑥 is 𝑣 (where 𝑣 will be true or false), otherwise the boolean values of the values in 𝑙.
Here, bool(𝑣) returns the boolean value as given in Section 4.2.
junction(𝑥, 𝑣, 𝑙) ≔ ite(bool(𝑥), 𝑣, ⟨𝑣⟩, ∑_{𝑦 ∈ 𝑙} ⟨bool(𝑦)⟩)
• $𝑥 or 𝑓: Similar to its “and” counterpart above.
• if $𝑥 then 𝑓 else 𝑔: Returns the output of 𝑔 if $𝑥 is bound to either null or false, else returns
the output of 𝑓.
• .[], .[$𝑥], or .[$𝑥 : $𝑦]: Accesses parts of the input value; see Section 4.4 for the definitions of the
operators.
• 𝜙 𝑥 as $𝑥(.; 𝑓): Folds 𝑓 over the values returned by 𝑥, starting with the current input as accu-
mulator. The current accumulator value is provided to 𝑓 as input value and 𝑓 can access the
current value of 𝑥 by $𝑥. If 𝜙 = reduce, this returns only the final values of the accumulator,
whereas if 𝜙 = foreach, this returns also the intermediate values of the accumulator. We will
define the functions reduce^𝑐_𝑣(𝑙, $𝑥, 𝑓) and foreach^𝑐_𝑣(𝑙, $𝑥, 𝑓) in Section 5.1.
• 𝑥(𝑓1 ; …; 𝑓𝑛 ): Calls an 𝑛-ary filter 𝑥 that is defined by 𝑥(𝑥1 ; …; 𝑥𝑛 ) ≔ 𝑓. The output is that of
the filter 𝑓, where each filter argument 𝑥𝑖 is bound to (𝑓𝑖 , 𝑐). This also handles the case of calling
nullary filters such as empty.
• 𝑥: Calls a filter argument. By the well-formedness requirements given in Section 3.1, this must
occur within the right-hand side of a definition whose arguments include 𝑥. This requirement
also ensures that 𝑥 ∈ dom(𝑐), because an 𝑥 can only be evaluated as part of a call to the filter
where it was bound, and by the semantics of filter calls above, this adds a binding for 𝑥 to the
context.
• 𝑓 ⊧ 𝑔: Updates the input at positions returned by 𝑓 by 𝑔. We will discuss this in Section 6.
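Two of the helper functions used in the cases above, label and trues, can be sketched in Python. The class Break and the function alternation are hypothetical names for this illustration. The sketch assumes that a break exception is represented by a Break instance, and alternation collects the outputs of 𝑓 into a list, so it is not lazy.

```python
from typing import Iterable, Iterator


class Break:
    """Hypothetical stand-in for the exception break($x)."""

    def __init__(self, var: str):
        self.var = var


def label(stream: Iterable, var: str) -> Iterator:
    """Yield the elements of the stream until break(var) is encountered."""
    for item in stream:
        if isinstance(item, Break) and item.var == var:
            return
        yield item


def trues(stream: Iterable) -> list:
    """Keep the elements whose boolean value is not false (exceptions are kept)."""
    return [x for x in stream if not (x is None or x is False)]


def alternation(f_out: Iterable, g_out: Iterable) -> list:
    """Sketch of the alternation operator: outputs of f that are not false, else outputs of g."""
    l = trues(f_out)
    return l if l else list(g_out)


print(list(label([1, 2, Break("$x"), 3], "$x")))
print(alternation([None, 1, False, 2], [7]))
print(alternation([None, False], [7]))
```

A break that refers to a different variable is passed through unchanged by label, so that an enclosing label can handle it.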
An implementation may also define custom semantics for named filters. For example, an
implementation may define error|^𝑐_𝑣 ≔ error(𝑣), keys|^𝑐_𝑣 ≔ keys(𝑣), and length|^𝑐_𝑣 ≔ |𝑣|, see Section 4.2.
In the case of keys, for example, there is no obvious way to implement it by definition, in par-
ticular because there is no simple way to obtain the domain of an object {…} using only the
filters for which we gave semantics in Table 5. For length, we could give a definition, using
reduce .[] as $𝑥(0; . + 1) to obtain the length of arrays and objects, but this would inherently
require linear time to yield a result, instead of constant time that can be achieved by a proper jq
implementation.
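The trade-off described for length can be made concrete with a small Python sketch. It is only an illustration under assumptions: JSON arrays and objects are modelled as Python lists and dicts, and the names length_by_fold and length_native are hypothetical, not part of jq or of this specification. The first function mirrors reduce .[] as $𝑥(0; . + 1) and visits every element, whereas the second relies on the container knowing its own size.

```python
def length_by_fold(v):
    """Mirror `reduce .[] as $x (0; . + 1)`: one step per element, so linear time."""
    # For objects, .[] iterates over the values; for arrays, over the elements.
    children = v.values() if isinstance(v, dict) else v
    acc = 0
    for _ in children:
        acc = acc + 1
    return acc


def length_native(v):
    """Ask the container for its size; constant time for Python lists and dicts."""
    return len(v)
```

Both functions agree on their results; they differ only in how much work is needed to obtain them.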
5.1 FOLDING
In this subsection, we will define the functions 𝜙𝑐𝑣 (𝑙, $𝑥, 𝑓) (where 𝜙 is either foreach or reduce),
which underlie the semantics for the folding operators 𝜙 𝑥 as $𝑥(.; 𝑓).
Let us start by defining a general folding function fold𝑐𝑣 (𝑙, $𝑥, 𝑓, 𝑜): It takes a stream of value
results 𝑙, a variable $𝑥, a filter 𝑓, and a function 𝑜(𝑥) from a value 𝑥 to a stream of values. This
function folds over the elements in 𝑙, starting from the accumulator value 𝑣. It yields the next
accumulator value(s) by evaluating 𝑓 with the current accumulator value as input and with the
variable $𝑥 bound to the first element in 𝑙. If 𝑙 is empty, then 𝑣 is called a final accumulator value
and is returned, otherwise 𝑣 is called an intermediate accumulator value and 𝑜(𝑣) is returned.
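\[
\mathrm{fold}^c_v(l, \$x, f, o) :=
\begin{cases}
o(v) + \sum_{x \in f|^{c\{\$x \mapsto h\}}_v} \mathrm{fold}^c_x(t, \$x, f, o) & \text{if } l = \langle h \rangle + t \\
\langle v \rangle & \text{otherwise } (l = \langle\rangle)
\end{cases}
\]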
We use two different functions for 𝑜(𝑣); the first returns nothing, corresponding to reduce which
does not return intermediate values, and the other returns just 𝑣, corresponding to foreach which
returns intermediate values. Instantiating fold with these two functions, we obtain the following:
reduce𝑐𝑣 (𝑙, $𝑥, 𝑓) ≔ fold𝑐𝑣 (𝑙, $𝑥, 𝑓, 𝑜) where 𝑜(𝑣) = ⟨ ⟩
for𝑐𝑣 (𝑙, $𝑥, 𝑓) ≔ fold𝑐𝑣 (𝑙, $𝑥, 𝑓, 𝑜) where 𝑜(𝑣) = ⟨𝑣⟩
Here, reduce𝑐𝑣 (𝑙, $𝑥, 𝑓) is the function that is used in Table 5. However, for𝑐𝑣 (𝑙, $𝑥, 𝑓) does not im-
plement the semantics of foreach, because it yields the initial accumulator value, whereas foreach
omits it.
Example 5.1.1 : If we were to set foreach𝑐𝑣 (𝑙, $𝑥, 𝑓) ≔ for𝑐𝑣 (𝑙, $𝑥, 𝑓), then evaluating
foreach (1, 2, 3) as $𝑥(0; . + $𝑥) would yield ⟨0, 1, 3, 6⟩, but jq evaluates it to ⟨1, 3, 6⟩.
For that reason, we define foreach in terms of for, but with a special treatment for the initial ac-
cumulator:
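\[
\mathrm{foreach}^c_v(l, \$x, f) :=
\begin{cases}
\sum_{x \in f|^{c\{\$x \mapsto h\}}_v} \mathrm{for}^c_x(t, \$x, f) & \text{if } l = \langle h \rangle + t \\
\langle\rangle & \text{otherwise}
\end{cases}
\]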
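The definitions of fold, reduce, for, and foreach can be transcribed into a short Python sketch. This is a simplified model under assumptions, not the reference implementation: streams are Python lists, the context 𝑐 is left out, and the filter 𝑓 is a hypothetical Python function f(acc, x) that receives the accumulator as input and the bound value of $𝑥 as second argument and returns a list of outputs. Errors are not modelled.

```python
def fold(v, l, f, o):
    """fold over the stream l with accumulator v; o(v) selects intermediate outputs."""
    if not l:
        return [v]  # final accumulator value
    h, t = l[0], l[1:]
    out = list(o(v))  # intermediate accumulator value(s), if any
    for x in f(v, h):  # every output of f becomes a new accumulator
        out.extend(fold(x, t, f, o))
    return out


def reduce_(v, l, f):
    """reduce: intermediate accumulators are discarded (o returns nothing)."""
    return fold(v, l, f, lambda acc: [])


def for_(v, l, f):
    """for: every accumulator is returned, including the initial one."""
    return fold(v, l, f, lambda acc: [acc])


def foreach(v, l, f):
    """foreach: like for, but the initial accumulator is not returned."""
    if not l:
        return []
    h, t = l[0], l[1:]
    return [y for x in f(v, h) for y in for_(x, t, f)]


def add(acc, x):
    """The filter `. + $x` from Example 5.1.1."""
    return [acc + x]


print(for_(0, [1, 2, 3], add))     # [0, 1, 3, 6]
print(foreach(0, [1, 2, 3], add))  # [1, 3, 6]
print(reduce_(0, [1, 2, 3], add))  # [6]
```

The printed lists correspond to the streams discussed in Example 5.1.1: for yields the initial accumulator 0, foreach omits it, and reduce keeps only the final value.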
We will now look at what the evaluation of the various folding filters expands to. Apart from
reduce and foreach, we will also consider a hypothetical filter for 𝑥 as $𝑥(.; 𝑓) that is defined by
the function for𝑐𝑣 (𝑙, $𝑥, 𝑓), analogously to the other folding filters.
Assuming that the filter 𝑥 evaluates to ⟨𝑥0 , …, 𝑥𝑛 ⟩, then reduce and for expand to
We can see that the special treatment of the initial accumulator value also shows up in the expan-
sion of foreach. In contrast, the hypothetical for filter looks more symmetrical to reduce.
Note that jq implements only a restricted version of these folding operators that discards all
output values of 𝑓 after the first output. That means that in jq, 𝜙 𝑥 as $𝑥(.; 𝑓) is equivalent to
𝜙 𝑥 as $𝑥(.; first(𝑓)). Here, we assume the definition first(𝑓) ≔ label $𝑥 | 𝑓 | (., break $𝑥). This
returns the first output of 𝑓 if 𝑓 yields any output, else nothing.
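The effect of this restriction can be illustrated with a Python sketch. As an assumption for illustration, streams are Python lists, the context is omitted, and f(acc, x) is a hypothetical Python function returning all outputs of the filter 𝑓; the helper first below models first(𝑓) by truncating that list, not by label and break.

```python
def reduce_all(v, l, f):
    """reduce as specified in Section 5.1: every output of f is followed."""
    if not l:
        return [v]
    return [y for x in f(v, l[0]) for y in reduce_all(x, l[1:], f)]


def first(f):
    """Model of first(f): keep at most the first output of f."""
    return lambda acc, x: f(acc, x)[:1]


def add_and_mul(acc, x):
    """A filter with two outputs per step, like `(. + $x, . * $x)`."""
    return [acc + x, acc * x]


print(reduce_all(0, [1, 2], add_and_mul))         # [3, 2, 2, 0]
print(reduce_all(0, [1, 2], first(add_and_mul)))  # [3]
```

With the unrestricted definition, every output of 𝑓 spawns its own continuation of the fold; with first(𝑓), only a single chain of accumulator values remains.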
6 UPDATE SEMANTICS
In this section, we will discuss how to evaluate updates 𝑓 ⊧ 𝑔. First, we will show how the original
jq implementation executes such updates, and which problems this approach entails. Then,
we will give alternative semantics for updates that avoid these problems, while enabling faster
performance by forgoing the construction of temporary path data.
We can also have surprising behaviour that does not manifest any error.
Example 6.1.3 : Consider the same input value and filter as in Example 6.1.2, but now with 𝑔 set to
{"𝑐" : 2}. The output of the first step .["𝑎"] ⊧ 𝑔 is {"𝑎" ↦ {"𝑐" ↦ 2}}. This value is the input to the
second step .["𝑎"]["𝑏"] ⊧ 𝑔, which yields {"𝑎" ↦ {"𝑐" ↦ 2, "𝑏" ↦ {"𝑐" ↦ 2}}}. Here, the remain-
ing path (.["𝑎"]["𝑏"]) pointed to data that was removed by the update on the first path, so this data
gets reintroduced by the update. On the other hand, the data introduced by the first update step
(at the path .["𝑎"]["𝑐"]) is not part of the original path, so it is not updated.
We found that we can interpret many update filters by simpler filters, yielding the same output
as jq in most common cases, but avoiding the problems shown above. To see this, let us consider what
would happen if we interpreted (𝑓1 , 𝑓2 ) ⊧ 𝑔 as (𝑓1 ⊧ 𝑔) | (𝑓2 ⊧ 𝑔). That way, the paths of 𝑓2
would point precisely to the data returned by 𝑓1 ⊧ 𝑔, thus avoiding the problems depicted by the
examples above. In particular, with such an approach, Example 6.1.2 would yield {"𝑎" ↦ []} in-
stead of an error, and Example 6.1.3 would yield {"𝑎" ↦ {"𝑐" ↦ {"𝑐" ↦ 2}}}.
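The difference between the two strategies can be sketched in Python. Several things here are assumptions for illustration: the input {"a": {"b": 1}} and the left-hand side (.[], .[][]) are chosen to match the paths .["a"] and .["a"]["b"] named in Example 6.1.3, values are modelled as Python dicts, and getpath, setpath, and the other function names are hypothetical helpers rather than jq's actual implementation.

```python
def getpath(v, path):
    """Follow a path of keys; missing data yields None (null)."""
    for k in path:
        v = v.get(k) if isinstance(v, dict) else None
    return v


def setpath(v, path, new):
    """Return a copy of v with the value at path replaced by new."""
    if not path:
        return new
    # Assumption: non-object values on the way are replaced by fresh objects.
    obj = dict(v) if isinstance(v, dict) else {}
    obj[path[0]] = setpath(obj.get(path[0]), path[1:], new)
    return obj


def update_at(v, paths, g):
    """Apply g at each path in turn."""
    for p in paths:
        v = setpath(v, p, g(getpath(v, p)))
    return v


def paths_children(v):
    """Paths returned by .[] on an object."""
    return [[k] for k in v]


def paths_grandchildren(v):
    """Paths returned by .[][] on an object of objects."""
    return [[k, k2] for k in v for k2 in v[k]]


def update_paths_first(v, path_filters, g):
    """Path-based strategy: collect all paths on the original input, then update."""
    paths = [p for pf in path_filters for p in pf(v)]
    return update_at(v, paths, g)


def update_interleaved(v, path_filters, g):
    """(f1, f2) |= g read as (f1 |= g) | (f2 |= g): paths of f2 see the output of f1."""
    for pf in path_filters:
        v = update_at(v, pf(v), g)
    return v


v = {"a": {"b": 1}}
g = lambda _: {"c": 2}
fs = [paths_children, paths_grandchildren]
print(update_paths_first(v, fs, g))  # {'a': {'c': 2, 'b': {'c': 2}}}
print(update_interleaved(v, fs, g))  # {'a': {'c': {'c': 2}}}
```

In the first strategy, the stale path .["a"]["b"] reintroduces the key "b"; in the second, the paths of .[][] are computed on the already updated value and therefore point to "c".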
In the remainder of this section, we will show semantics that extend this idea to all update op-
erations. The resulting update semantics can be understood to interleave calls to 𝑓 and 𝑔. By doing
so, these semantics can abandon the construction of paths altogether, which results in higher per-
formance when evaluating updates.
𝜇                        𝜇 ⊧ 𝜎
empty()                  .
.                        𝜎
𝑓 | 𝑔                    𝑓 ⊧ (𝑔 ⊧ 𝜎)
𝑓, 𝑔                     (𝑓 ⊧ 𝜎) | (𝑔 ⊧ 𝜎)
if $𝑥 then 𝑓 else 𝑔      if $𝑥 then 𝑓 ⊧ 𝜎 else 𝑔 ⊧ 𝜎
𝑓 ⫽ 𝑔                    if first(𝑓 ⫽ null) then 𝑓 ⊧ 𝜎 else 𝑔 ⊧ 𝜎
Table 6: Update semantics properties.
the actual update semantics of 𝜇 ⊧ 𝜎 in Section 6.4 by defining (𝜇 ⊧ 𝜎)|𝑐𝑣 , not by translating 𝜇 ⊧ 𝜎
to equivalent filters.
and only catch unpolarised errors. That way, errors stemming from 𝜇 are propagated, whereas
errors stemming from 𝑓 are caught.
We use a function instead of a filter on the right-hand side to limit the scope of variable bindings
as explained in Section 6.3.1, and we use polarise to restrict the scope of caught exceptions, as
discussed in Section 6.3.2. Note that we depolarise the final outputs of 𝑓 ⊧ 𝑔 in order to prevent
leaking polarisation information outside the update.
Table 7 shows the definition of (𝜇 ⊧ 𝜎)|𝑐𝑣 . Several of the cases for 𝜇, like “.”, “𝑓 | 𝑔”, “𝑓, 𝑔”,
and “if $𝑥 then 𝑓 else 𝑔”, are relatively straightforward consequences of the properties in
Table 6. We discuss the remaining cases for 𝜇:
• 𝑓 ⫽ 𝑔: Updates using 𝑓 if 𝑓 yields some non-false value, else updates using 𝑔. Here, 𝑓 is called
as a “probe” first. If it yields at least one output that is considered “true” (see Section 5 for the
definition of trues), then we update at 𝑓, else at 𝑔. This filter is unusual because it is the only kind
where a subexpression is both updated with ((𝑓 ⊧ 𝜎)|𝑐𝑣 ) and evaluated (𝑓|𝑐𝑣 ).
• .[], .[$𝑥], .[$𝑥 : $𝑦]: Applies 𝜎 to the current value using the operators defined in Section 4.5.
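\[
\begin{array}{ll}
\mu & (\mu \models \sigma)|^c_v \\
\hline
. & \sigma(v) \\
f \mid g & (f \models \sigma')|^c_v \text{ where } \sigma'(x) = (g \models \sigma)|^c_x \\
f, g & \sum_{x \in (f \models \sigma)|^c_v} (g \models \sigma)|^c_x \\
\text{break } \$x & \langle \mathrm{break}(\$x) \rangle \\
\phi\ x \text{ as } \$x(.; f) & \phi^c_v(x|^c_v, \$x, f, \sigma) \\
x(f_1; \dots; f_n) & (f \models \sigma)|^{c \cup \bigcup_i \{x_i \mapsto (f_i, c)\}}_v \text{ if } x(x_1; \dots; x_n) := f \\
x & (f \models \sigma)|^{c'}_v \text{ if } c(x) = (f, c')
\end{array}
\]
Table 7: Update semantics. Here, 𝜇 is a filter and 𝜎(𝑣) is a function from a value 𝑣 to a stream of
value results.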
• 𝑓 as $𝑥 | 𝑔: Folds over all outputs of 𝑓, using the input value 𝑣 as initial accumulator and up-
dating the accumulator by 𝑔 ⊧ 𝜎, where $𝑥 is bound to the current output of 𝑓. The definition
of reduce is given in Section 5.1.
• try 𝑓 catch 𝑔: Returns the output of 𝑓 ⊧ 𝜎, mapping errors occurring in 𝑓 to 𝑔. The definition
of the function catch is
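\[
\mathrm{catch}(x, g, c, v) :=
\begin{cases}
\sum_{y \in g|^c_e} \langle \mathrm{error}(y) \rangle & \text{if } x = \mathrm{error}(e),\ x \text{ is unpolarised, and } g|^c_x \neq \langle\rangle \\
\langle v \rangle & \text{if } x = \mathrm{error}(e),\ x \text{ is unpolarised, and } g|^c_x = \langle\rangle \\
\langle x \rangle & \text{otherwise}
\end{cases}
\]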
The function catch(𝑥, 𝑔, 𝑐, 𝑣) analyses 𝑥 (the current output of 𝑓): If 𝑥 is not an unpolarised error, 𝑥
is returned. For example, that is the case if the original right-hand side of the update returns an
error, in which case we do not want this error to be caught here. However, if 𝑥 is an unpolarised
error, that is, an error that was caused on the left-hand side of the update, it has to be caught
here. In that case, catch analyses the output of 𝑔 with input 𝑥: If 𝑔 yields no output, then it
returns the original input value 𝑣, and if 𝑔 yields output, all its output is mapped to errors! This
behaviour might seem peculiar, but it makes sense when we consider the jq way of implement-
ing updates via paths: When evaluating some update 𝜇 ⊧ 𝜎 with an input value 𝑣, the filter 𝜇
may only return paths to data contained within 𝑣. When 𝜇 is try 𝑓 catch 𝑔, the filter 𝑔 only
receives inputs that stem from errors, and because 𝑣 cannot contain errors, these inputs cannot
be contained in 𝑣. Consequently, 𝑔 can never return any path pointing to 𝑣. The only way,
therefore, to get out alive from a catch is for 𝑔 to return … nothing.
• break($𝑥): Breaks out from the update.¹⁰
• 𝜙 𝑥 as $𝑥(.; 𝑓): Folds 𝑓 over the values returned by 𝑥. We will discuss this in Section 6.5.
• 𝑥(𝑓1 ; …; 𝑓𝑛 ), 𝑥: Calls filters. This is defined analogously to Table 5.
There are many filters 𝜇 for which (𝜇 ⊧ 𝜎)|𝑐𝑣 is not defined, for example $𝑥, [𝑓], and {}. In such
cases, we assume that (𝜇 ⊧ 𝜎)|𝑐𝑣 returns an error just like jq, because these filters do not return
paths to their input data. Our semantics support all kinds of filters 𝜇 that are supported by jq,
except for label $𝑥 | 𝑔.
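To make the shape of these definitions tangible, the following Python sketch interprets a few of the cases. It is a simplified model under stated assumptions, not the semantics itself: filters are hypothetical tuples such as ("pipe", f, g), the context 𝑐 and errors are omitted, indices are constants instead of variables, and the value update operators of Section 4.5 are approximated by replacing a child with the first output of 𝜎 (or removing the child if 𝜎 yields no output).

```python
def update(mu, sigma, v):
    """Sketch of (mu |= sigma) on input v; returns a list of outputs."""
    kind = mu[0]
    if kind == "id":  # .
        return sigma(v)
    if kind == "pipe":  # f | g
        _, f, g = mu
        return update(f, lambda x: update(g, sigma, x), v)
    if kind == "comma":  # f, g
        _, f, g = mu
        return [y for x in update(f, sigma, v) for y in update(g, sigma, x)]
    if kind == "index":  # .[k] for a constant key or index k
        _, k = mu
        if isinstance(v, dict):
            out, old = dict(v), v.get(k)
        else:
            out, old = list(v), v[k]
        outs = sigma(old)
        if outs:
            out[k] = outs[0]
        elif isinstance(out, dict):
            out.pop(k, None)
        else:
            del out[k]
        return [out]
    if kind == "iter":  # .[]
        if isinstance(v, dict):
            pairs = [(k, sigma(c)) for k, c in v.items()]
            return [{k: outs[0] for k, outs in pairs if outs}]
        return [[outs[0] for outs in map(sigma, v) if outs]]
    raise ValueError("no update semantics for " + kind)


inc = lambda x: [x + 1]
a_b = ("pipe", ("index", "a"), ("index", "b"))
v = {"a": {"b": 1}, "c": 2}
print(update(a_b, inc, v))                            # [{'a': {'b': 2}, 'c': 2}]
print(update(("comma", a_b, ("index", "c")), inc, v)) # [{'a': {'b': 2}, 'c': 3}]
print(update(("iter",), inc, [1, 2, 3]))              # [[2, 3, 4]]
```

Note that no paths are constructed anywhere: the pipe case passes a new update function 𝜎′ down to 𝑓, and the comma case feeds the outputs of the first update into the second.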
Example 6.4.1 (The Curious Case of Alternation): The semantics of (𝑓 ⫽ 𝑔) ⊧ 𝜎 can be rather sur-
prising: For the input {"𝑎" ↦ true}, the filter (.["𝑎"] ⫽ .["𝑏"]) ⊧ 1 yields {"𝑎" ↦ 1}. This is what
we might expect, because the input has an entry for "𝑎". Now let us evaluate the same filter on
the input {"𝑎" ↦ false}, which yields {"𝑎" ↦ false, "𝑏" ↦ 1}. Here, while the input still has an
entry for "𝑎" like above, its boolean value is not true, so .["𝑏"] ⊧ 1 is executed. In the same spirit,
for the input {} the filter yields {"𝑏" ↦ 1}, because .["𝑎"] yields null for the input, which also has
the boolean value false, therefore .["𝑏"] ⊧ 1 is executed.
For the input {}, the filter (false ⫽ .["𝑏"]) ⊧ 1 yields {"𝑏" ↦ 1}. This is remarkable insofar as
false is not a valid path expression because it returns a value that does not refer to any part of
the original input, yet the filter does not return an error. This is because false triggers .["𝑏"] ⊧ 1,
so false is never used as path expression. However, running the filter (true ⫽ .["𝑏"]) ⊧ 1 does yield
an error, because true triggers true ⊧ 1, and true is not a valid path expression.
Finally, on the input [], the filter (.[] ⫽ error) ⊧ 1 yields error([]). That is because .[] does not
yield any value for the input, so error ⊧ 1 is executed, which yields an error.
¹⁰Note that unlike in Section 5, we do not define the update semantics of label $𝑥 | 𝑓, which could
be used to resume an update after a break. The reason for this is that this requires an additional type of
break exceptions that carries the current value alongside the variable, as well as variants of the value
update operators in Section 4.5 that can handle unpolarised breaks. Because making update operators
handle unpolarised breaks renders them considerably more complex and we estimate that label
expressions are rarely used in the left-hand side of updates anyway, we think it more beneficial for the
presentation to forgo label expressions here.
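The behaviour of alternation on the left-hand side of an update, as described in Example 6.4.1, can be reproduced with a small Python sketch. The modelling choices are assumptions: objects are Python dicts, null is None, errors are Python exceptions, and all function names are hypothetical. The update of .[k] uses the first output of 𝜎 as a simplification of the operators in Section 4.5.

```python
def is_true(y):
    """Everything except null and false counts as true."""
    return y is not None and y is not False


def get_key(k):
    """Evaluation of .[k] on objects: a missing key yields null (None)."""
    return lambda v: [v.get(k)]


def update_key(k):
    """Sketch of .[k] |= sigma on objects, using the first output of sigma."""
    def run(sigma, v):
        outs = sigma(v.get(k))
        out = dict(v)
        if outs:
            out[k] = outs[0]
        else:
            out.pop(k, None)
        return [out]
    return run


def not_a_path(sigma, v):
    """Update at a filter such as true or false, which returns no paths."""
    raise ValueError("invalid path expression")


def update_alt(eval_f, update_f, update_g, sigma, v):
    """(f // g) |= sigma: probe f by evaluating it, then update at f or at g."""
    if any(is_true(y) for y in eval_f(v)):
        return update_f(sigma, v)
    return update_g(sigma, v)


one = lambda _: [1]
a_or_b = lambda v: update_alt(get_key("a"), update_key("a"), update_key("b"), one, v)
print(a_or_b({"a": True}))   # [{'a': 1}]
print(a_or_b({"a": False}))  # [{'a': False, 'b': 1}]
print(a_or_b({}))            # [{'b': 1}]
# (false // .["b"]) |= 1: the probe is false, so not_a_path is never called.
print(update_alt(lambda v: [False], not_a_path, update_key("b"), one, {}))  # [{'b': 1}]
```

Replacing the probe by one that yields true makes update_alt call not_a_path and raise, which mirrors the error for (true ⫽ .["𝑏"]) ⊧ 1.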
6.5 FOLDING
In Section 5.1, we have seen how to evaluate folding filters of the shape 𝜙 𝑥 as $𝑥(.; 𝑓), where 𝜙
is either reduce or foreach. Here, we will define update semantics for these filters. These update
operations are not supported in jq 1.7; however, we will show that they arise quite naturally from
previous definitions.
Let us start with an example to understand folding on the left-hand side of an update.
Example 6.5.1 : Let 𝑣 = [[[2], 1], 0] be our input value and 𝜇 be the filter 𝜙(0, 0) as $𝑥(.; .[$𝑥]). The
regular evaluation of 𝜇 with the input value as described in Section 5 yields
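\[
\mu|_v =
\begin{cases}
\langle [2] \rangle & \text{if } \phi = \text{reduce} \\
\langle v, [[2], 1], [2] \rangle & \text{if } \phi = \text{for} \\
\langle [[2], 1], [2] \rangle & \text{if } \phi = \text{foreach}
\end{cases}
\]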
When 𝜙 = for, the paths corresponding to the output are ., .[0], and .[0][0], and when 𝜙 = reduce,
the paths are just .[0][0]. Given that all outputs have corresponding paths, we can update over
them. For example, taking . + [3] as filter 𝜎, we should obtain the output
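\[
(\mu \models \sigma)|_v =
\begin{cases}
\langle [[[2, 3], 1], 0] \rangle & \text{if } \phi = \text{reduce} \\
\langle [[[2, 3], 1, 3], 0, 3] \rangle & \text{if } \phi = \text{for} \\
\langle [[[2, 3], 1, 3], 0] \rangle & \text{if } \phi = \text{foreach}
\end{cases}
\]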
First, note that for folding filters, the lowering in Table 2 and the defining equations in Section 5.1
only make use of filters for which we have already introduced update semantics in Table 7. This
should not be taken for granted; for example, we originally lowered 𝜙 𝑓𝑥 as $𝑥(𝑓𝑦 ; 𝑓) to
⌊𝑓𝑦 ⌋ as $𝑦 | 𝜙⌊𝑓𝑥 ⌋ as $𝑥($𝑦; ⌊𝑓⌋)
While both lowerings produce the same output for regular evaluation, we cannot use the original
lowering for updates, because the defining equations for 𝜙 𝑥 as $𝑥($𝑦; 𝑓) would have the shape
$𝑦 | …, which is undefined on the left-hand side of an update. However, the lowering in Table 2
avoids this issue by not binding the output of 𝑓𝑦 to a variable, so it can be used on the left-hand
side of updates.
To obtain an intuition about what the update evaluation of a fold looks like, we can take
𝜙 𝑥 as $𝑥(.; 𝑓) ⊧ 𝜎, substitute the left-hand side by the defining equations in Section 5.1 and ex-
pand everything using the properties in Section 6.2. This yields
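\begin{align*}
\text{reduce } x \text{ as } \$x(.; f) \models \sigma &= ((x_0 \text{ as } \$x \mid f) \models \dots \models ((x_n \text{ as } \$x \mid f) \models \sigma) \dots) \\
\text{for } x \text{ as } \$x(.; f) \models \sigma &= \sigma \mid ((x_0 \text{ as } \$x \mid f) \models \dots \models \sigma \mid ((x_n \text{ as } \$x \mid f) \models \sigma) \dots)
\end{align*}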
Example 6.5.2 : To see the effect of above equations, let us reconsider the input value and the filters
from Example 6.5.1. Using some liberty to write .[0] instead of 0 as $𝑥 | .[$𝑥], we have:
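\[
\mu \models \sigma =
\begin{cases}
.[0] \models .[0] \models \sigma & \text{if } \phi = \text{reduce} \\
\sigma \mid (.[0] \models \sigma \mid (.[0] \models \sigma)) & \text{if } \phi = \text{for} \\
.[0] \models \sigma \mid (.[0] \models \sigma) & \text{if } \phi = \text{foreach}
\end{cases}
\]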
We will now formally define the functions 𝜙𝑐𝑣 (𝑙, $𝑥, 𝑓, 𝜎) used in Table 7. For this, we first intro-
duce a function fold𝑐𝑣 (𝑙, $𝑥, 𝑓, 𝜎, 𝑜), which resembles its corresponding function in Section 5.1,
but which adds an argument for the update filter 𝜎:
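\[
\mathrm{fold}^c_v(l, \$x, f, \sigma, o) :=
\begin{cases}
\sum_{y \in o(v)} (f \models \sigma')|^{c\{\$x \mapsto h\}}_y & \text{if } l = \langle h \rangle + t \text{ and } \sigma'(x) = \mathrm{fold}^c_x(t, \$x, f, \sigma, o) \\
\sigma(v) & \text{otherwise } (l = \langle\rangle)
\end{cases}
\]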
as well as
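\[
\mathrm{foreach}^c_v(l, \$x, f, \sigma) :=
\begin{cases}
(f \models \sigma')|^{c\{\$x \mapsto h\}}_v & \text{if } l = \langle h \rangle + t \text{ and } \sigma'(x) = \mathrm{for}^c_x(t, \$x, f, \sigma) \\
\langle v \rangle & \text{otherwise}
\end{cases}
\]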
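The update folds can be checked against Example 6.5.1 with a Python sketch. Several parts are assumptions: this excerpt does not show how reduce and for instantiate 𝑜, so the sketch uses 𝑜(𝑣) = ⟨𝑣⟩ for reduce and 𝑜(𝑣) = 𝜎(𝑣) for for, which is what the expansions in Example 6.5.2 suggest. The context is reduced to the bound value ℎ, the filter 𝑓 is represented by a hypothetical function update_f(h, sigma, v) computing 𝑓 ⊧ 𝜎 with $𝑥 bound to ℎ, and the array update takes the first output of 𝜎.

```python
def index_update(v, i, sigma):
    """Sketch of .[i] |= sigma on arrays: replace v[i] by the first output of sigma."""
    outs = sigma(v[i])
    if not outs:
        return [v[:i] + v[i + 1:]]  # assumption: no output removes the element
    return [v[:i] + [outs[0]] + v[i + 1:]]


def fold_update(v, l, update_f, sigma, o):
    """fold with an update function sigma; o selects where f |= sigma' is applied."""
    if not l:
        return sigma(v)
    h, t = l[0], l[1:]
    sigma2 = lambda x: fold_update(x, t, update_f, sigma, o)
    return [r for y in o(v) for r in update_f(h, sigma2, y)]


def reduce_update(v, l, update_f, sigma):
    """Assumed instantiation: o(v) = <v>."""
    return fold_update(v, l, update_f, sigma, lambda x: [x])


def for_update(v, l, update_f, sigma):
    """Assumed instantiation: o(v) = sigma(v)."""
    return fold_update(v, l, update_f, sigma, sigma)


def foreach_update(v, l, update_f, sigma):
    """foreach: like for, but sigma is not applied to the initial accumulator."""
    if not l:
        return [v]
    h, t = l[0], l[1:]
    sigma2 = lambda x: for_update(x, t, update_f, sigma)
    return update_f(h, sigma2, v)


v = [[[2], 1], 0]
sigma = lambda x: [x + [3]]                      # the filter . + [3]
upd = lambda h, s, y: index_update(y, h, s)      # f is .[$x]
print(reduce_update(v, [0, 0], upd, sigma))   # [[[[2, 3], 1], 0]]
print(for_update(v, [0, 0], upd, sigma))      # [[[[2, 3], 1, 3], 0, 3]]
print(foreach_update(v, [0, 0], upd, sigma))  # [[[[2, 3], 1, 3], 0]]
```

Each printed list contains a single output, and these outputs coincide with the values given for reduce, for, and foreach in Example 6.5.1.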
Proof: The lowering in Table 2 yields ⌊𝑓 ⚬ 𝑔⌋|𝑐𝑣 = (⌊𝑓⌋ as $𝑥′ | ⌊𝑔⌋ as $𝑦′ | $𝑥′ ⚬ $𝑦′)|𝑐𝑣 .
Using the evaluation semantics in Table 5, we can further expand this to
\[
\sum_{x \in \lfloor f \rfloor|^c_v} \sum_{y \in \lfloor g \rfloor|^{c\{\$x' \mapsto x\}}_v} (\$x' \circ \$y')|^{c\{\$x' \mapsto x, \$y' \mapsto y\}}_v.
\]
Because $𝑥′ and $𝑦′ are fresh variables, we know that they cannot occur in ⌊𝑔⌋, so
\[
\lfloor g \rfloor|^{c\{\$x' \mapsto x\}}_v = \lfloor g \rfloor|^c_v.
\]
Furthermore, by the evaluation semantics, we have
\[
(\$x' \circ \$y')|^{c\{\$x' \mapsto x, \$y' \mapsto y\}}_v = \langle x \circ y \rangle.
\]
From these two observations, the conclusion immediately follows. □
Lemma 7.2 : For any HIR filters 𝑓 and 𝑔, we have
\[
\lfloor \{f : g\} \rfloor|^c_v = \sum_{x \in \lfloor f \rfloor|^c_v} \sum_{y \in \lfloor g \rfloor|^c_v} \langle \{x : y\} \rangle.
\]
Proof: We will prove by induction on 𝑛. The base case 𝑛 = 1 directly follows from Lemma 7.2. For
the induction step, we have to show that ⌊{𝑘1 : 𝑣1 , …, 𝑘𝑛+1 : 𝑣𝑛+1 }⌋|𝑐𝑣 is equivalent to
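\[
\sum_{k_1 \in \lfloor k_1 \rfloor|^c_v} \sum_{v_1 \in \lfloor v_1 \rfloor|^c_v} \dots \sum_{k_{n+1} \in \lfloor k_{n+1} \rfloor|^c_v} \sum_{v_{n+1} \in \lfloor v_{n+1} \rfloor|^c_v} \left\langle \sum_i^{n+1} \{k_i : v_i\} \right\rangle.
\]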
We start by
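\begin{align*}
\lfloor \{k_1 : v_1, \dots, k_{n+1} : v_{n+1}\} \rfloor|^c_v
&= \left\lfloor \sum_i \{k_i : v_i\} \right\rfloor|^c_v && \text{(lowering)} \\
&= \left\lfloor \sum_{i=1}^{n} \{k_i : v_i\} + \{k_{n+1} : v_{n+1}\} \right\rfloor|^c_v \\
&= \sum_{x \in \lfloor \sum_{i=1}^{n} \{k_i : v_i\} \rfloor|^c_v} \sum_{y \in \lfloor \{k_{n+1} : v_{n+1}\} \rfloor|^c_v} \langle x + y \rangle. && \text{(Lemma 7.1)}
\end{align*}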
Here, we observe that ⌊∑𝑛𝑖=1 {𝑘𝑖 : 𝑣𝑖 }⌋|𝑐𝑣 = ⌊{𝑘1 : 𝑣1 , …, 𝑘𝑛 : 𝑣𝑛 }⌋|𝑐𝑣 , which by the induction hy-
pothesis equals
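\[
\sum_{k_1 \in \lfloor k_1 \rfloor|^c_v} \sum_{v_1 \in \lfloor v_1 \rfloor|^c_v} \dots \sum_{k_n \in \lfloor k_n \rfloor|^c_v} \sum_{v_n \in \lfloor v_n \rfloor|^c_v} \left\langle \sum_i^{n} \{k_i : v_i\} \right\rangle.
\]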
We can use this to resume the simplification of ⌊{𝑘1 : 𝑣1 , …, 𝑘𝑛+1 : 𝑣𝑛+1 }⌋|𝑐𝑣 to
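\[
\sum_{k_1 \in \lfloor k_1 \rfloor|^c_v} \sum_{v_1 \in \lfloor v_1 \rfloor|^c_v} \dots \sum_{k_n \in \lfloor k_n \rfloor|^c_v} \sum_{v_n \in \lfloor v_n \rfloor|^c_v} \sum_{y \in \lfloor \{k_{n+1} : v_{n+1}\} \rfloor|^c_v} \left\langle \sum_i^{n} \{k_i : v_i\} + y \right\rangle
\]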
Finally, applying Lemma 7.2 to ⌊{𝑘𝑛+1 : 𝑣𝑛+1 }⌋|𝑐𝑣 proves the induction step. □
We can use this theorem to simplify the evaluation of filters such as the following one.
Example 7.1: The evaluation of {"𝑎" : (1, 2), ("𝑏", "𝑐") : 3, "𝑑" : 4} yields ⟨𝑣0 , 𝑣1 , 𝑣2 , 𝑣3 ⟩, where
𝑣0 = {"𝑎" ↦ 1, "𝑏" ↦ 3, "𝑑" ↦ 4},
𝑣1 = {"𝑎" ↦ 1, "𝑐" ↦ 3, "𝑑" ↦ 4},
𝑣2 = {"𝑎" ↦ 2, "𝑏" ↦ 3, "𝑑" ↦ 4},
𝑣3 = {"𝑎" ↦ 2, "𝑐" ↦ 3, "𝑑" ↦ 4}.
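The nested sums of the theorem amount to a Cartesian product over the key and value streams, which a few lines of Python can illustrate. The sketch assumes that all streams contain only values (no errors) and models them as Python lists; the function name construct and its input format, a list of (key stream, value stream) pairs, are hypothetical.

```python
from itertools import product


def construct(entries):
    """Evaluate {k1: v1, ..., kn: vn}, where each ki and vi is a stream (list)."""
    # Order the streams as k1, v1, k2, v2, ...; product varies the last one fastest,
    # which matches the nesting of the sums (the outermost sum ranges over k1).
    streams = [s for keys, values in entries for s in (keys, values)]
    out = []
    for combo in product(*streams):
        obj = {}
        for k, v in zip(combo[0::2], combo[1::2]):
            obj[k] = v  # object addition: later entries override earlier ones
        out.append(obj)
    return out


# {"a": (1, 2), ("b", "c"): 3, "d": 4} from Example 7.1
for obj in construct([(["a"], [1, 2]), (["b", "c"], [3]), (["d"], [4])]):
    print(obj)
```

The four printed objects appear in the order 𝑣0, 𝑣1, 𝑣2, 𝑣3 of Example 7.1. If any of the streams is empty, the product is empty and no object is produced.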
8 CONCLUSION
We have shown formal syntax and semantics of a large subset of the jq programming language.
On the syntax side, we first defined formal syntax (HIR) that closely corresponds to actual jq
syntax. We then gave a lowering that reduces HIR to a simpler subset (MIR), in order to simplify
the semantics later. We finally showed how a subset of actual jq syntax can be translated into HIR
and thus MIR.
On the semantics side, we gave formal semantics based on MIR. First, we defined values and
basic operations on them. Then, we used this to define the semantics of jq programs, by specifying
the outcome of the execution of a jq program. A large part of this was dedicated to the evaluation
of updates: In particular, we showed a new approach to evaluate updates. This approach, unlike
the approach implemented in jq, does not depend on separating path building and updating, but
interweaves them. This allows update operations to cleanly handle multiple output values in cases
where this was not possible before. Furthermore, in practice, this avoids creating temporary data
to store paths, thus improving performance. This approach is also mostly compatible with the
original jq behaviour, yet it is unavoidable that it diverges in some corner cases.
We hope that our work is useful in several ways: For users of the jq programming language, it
provides a succinct reference that precisely documents the language. Our work should also ben-
efit implementers of tools that process jq programs, such as compilers, interpreters, or linters. In
particular, this specification should be sufficient to implement the core of a jq compiler or inter-
preter. Finally, our work enables equational reasoning about jq programs. This makes it possible to
prove correctness of jq programs or to implement provably correct optimisations in jq compilers/
interpreters.
BIBLIOGRAPHY
[1] D. M. Ritchie, “The UNIX system: The evolution of the UNIX time-sharing system”, AT&T Bell Lab. Tech. J., vol. 63, no. 8, pp. 1577–1593, 1984, doi: 10.1002/j.1538-7305.1984.tb00054.x.
[2] T. Bray, “The JavaScript Object Notation (JSON) Data Interchange Format”. Accessed: Feb. 22, 2023. [Online]. Available: https://www.rfc-editor.org/info/rfc8259
[3] Paris Data, “Dénominations des emprises des voies actuelles”. Accessed: Feb. 22, 2023. [Online]. Available: https://opendata.paris.fr/explore/dataset/denominations-emprises-voies-actuelles/
[4] N. Williams and jqlang contributors, “jq language description”. Accessed: Feb. 20, 2023. [Online]. Available: https://github.com/jqlang/jq/wiki/jq-Language-Description
[5] S. Dolan and jqlang contributors, “jq 1.7 manual”. Accessed: Feb. 20, 2023. [Online]. Available: https://jqlang.github.io/jq/manual/v1.7/
[6] J. N. Foster, A. Pilkiewicz, and B. C. Pierce, “Quotient lenses”, in Proceeding of the 13th ACM SIGPLAN international conference on Functional programming, ICFP 2008, Victoria, BC, Canada, September 20-28, 2008, J. Hook and P. Thiemann, Eds., ACM, 2008, pp. 383–396. doi: 10.1145/1411204.1411257.
[7] J. N. Foster, M. B. Greenwald, J. T. Moore, B. C. Pierce, and A. Schmitt, “Combinators for bi-directional tree transformations: a linguistic approach to the view update problem”, in Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2005, Long Beach, California, USA, January 12-14, 2005, J. Palsberg and M. Abadi, Eds., ACM, 2005, pp. 233–246. doi: 10.1145/1040305.1040325.
[8] M. Pickering, J. Gibbons, and N. Wu, “Profunctor Optics: Modular Data Accessors”, Art Sci. Eng. Program., vol. 1, no. 2, p. 7, 2017, doi: 10.22152/programming-journal.org/2017/1/7.