A formal specification of the jq language

MICHAEL FÄRBER
jq is a widely used tool that provides a programming language to manipulate JSON data. However, the jq
language is currently only specified by its implementation, making it difficult to reason about its behaviour.
To this end, we provide a formal syntax and denotational semantics for a large subset of the jq language. Our
most significant contribution is to provide a new way to interpret updates that allows for more predictable
and performant execution.
CCS Concepts: Software and its engineering → Semantics; Functional languages.
Additional Key Words and Phrases: jq, JSON, semantics

1 INTRODUCTION
UNIX has popularised the concept of filters and pipes [1]: A filter is a program that reads from an
input stream and writes to an output stream. Pipes are used to compose filters.
JSON (JavaScript Object Notation) is a widely used data serialisation format [2]. A JSON value
is either null, a boolean, a number, a string, an array of values, or an associative map from strings
to values.
jq is a tool that provides a language to define filters and an interpreter to execute them. Where
UNIX filters operate on streams of characters, jq filters operate on streams of JSON values. This
makes it possible to manipulate JSON data with relatively compact filters. For example, given as input the
public JSON dataset of streets in Paris [3], jq retrieves the number of streets (6528) with the fil-
ter “length”, the names of the streets with the filter “.[].nomvoie”, and the total length of all
streets (1574028 m) with the filter “[.[].longueur] | add”. jq provides syntax to update data; for
example, to remove geographical data obtained by “.[].geo_shape”, but leaving intact all other
data, we can use “.[].geo_shape |= empty”. This shrinks the dataset from ~25 MB to ~7 MB.
jq provides a Turing-complete language that is interesting on its own; for example, “[0, 1] |
recurse([.[1], add])[0]” generates the stream of Fibonacci numbers. This makes jq a widely
used tool. We refer to the program jq as “jq” and to its language as “the jq language”.
The jq language is a dynamically typed, lazily evaluated functional programming language with
second-class higher-order functions [4]. The semantics of the jq language are only informally
specified, for example in the jq manual [5]. However, the documentation frequently does not
cover certain cases, and historically, the implementation often contradicted the documentation.
The underlying issue is that there existed no formally specified semantics to rely on. Having such
semantics makes it possible to determine whether certain behaviour of a jq implementation is
accidental or intended.
However, a formal specification of the behaviour of jq would be very verbose, because jq has
many special cases whose merit is not apparent. Therefore, we have striven to create denotational
semantics (Section 5) that closely resemble those of jq such that in most cases, their behaviour

Authors’ addresses: Michael Färber, [email protected].

This work is licensed under a Creative Commons Attribution 4.0 International License.
© 2024 Copyright held by the owner/author(s).

coincides, whereas they may differ in more exotic cases. The goals for creating these semantics
were, in descending order of importance:
• Simplicity: The semantics should be easy to describe, understand, and implement.
• Performance: The semantics should allow for performant execution.
• Compatibility: The semantics should be consistent with jq.
We created these semantics experimentally, by coming up with jq filters and observing their output
for all kinds of inputs. From this, we synthesised mathematical definitions to model the behaviour
of jq. The most significant improvement over jq behaviour described in this text is the
new update semantics (Section 6), which are simpler to describe and implement, eliminate a range
of potential errors, and allow for more performant execution.
The structure of this text is as follows: Section 2 introduces jq by a series of examples that give
a glimpse of actual jq syntax and behaviour. From that point on, the structure of the text follows
the execution of a jq program as shown in Figure 1. Section 3 formalises a subset of jq syntax and
shows how jq syntax can be transformed to increasingly low-level intermediate representations
called HIR (Section 3.1) and MIR (Section 3.2). After this, the semantics part starts: Section 4 de-
fines the type of JSON values and the elementary operations that jq provides for it. Furthermore,
it defines other basic data types such as errors, exceptions, and streams. Section 5 shows how to
evaluate jq filters on a given input value. Section 6 then shows how to evaluate a class of jq filters
that update values using a filter called path that defines which parts of the input to update, and
a filter that defines what the values matching the path should be replaced with. The semantics of
jq and those that will be shown in this text differ most notably in the case of updates. Finally, we
show how to prove properties of jq programs by equational reasoning in Section 7.

2 TOUR OF JQ
The goal of this section is to convey an intuition about how jq functions. The official documentation
of jq is its user manual [5].
jq programs are called filters. For now, let us consider a filter to be a function from a value to
a (lazy, possibly infinite) stream of values. Furthermore, in this section, let us assume a value to
be either a boolean, an integer, or an array of values. (We introduce the full set of JSON values in
Section 4.)

[Figure 1 (diagram). Components: jq program, HIR, MIR, Evaluation, Update evaluation, Value operations. Data: input value, output values & errors.]

Figure 1: Evaluation of a jq program with an input value. Solid lines indicate data flow, whereas a
dashed line indicates that a component is defined in terms of another.

The identity filter “.” returns a stream containing the input.³


Arithmetic operations, such as addition, subtraction, multiplication, division, and remainder,
are available in jq. For example, “. + 1” returns a stream containing the successor of the input.
Here, “1” is a filter that returns the value 1 for any input.
Concatenation is an important operator in jq: The filter “f, g” concatenates the outputs of the
filters f and g. For example, the filter “., .” returns a stream containing the input value twice.
Composition is one of the most important operators in jq: The filter “f | g” maps the filter g
over all outputs of the filter f. For example, “(1, 2, 3) | (. + 1)” returns 2, 3, 4.
Arrays are created from a stream produced by f using the filter “[f]”. For example, the filter
“[1, 2, 3]” concatenates the output of the filters “1”, “2”, and “3” and puts it into an array, yielding
the value [1, 2, 3]. The inverse filter “.[]” returns a stream containing the values of an array
if the input is an array. For example, running “.[]” on the array [1, 2, 3] yields the stream 1,
2, 3 consisting of three values. We can combine the two shown filters to map over arrays; for
example, when given the input [1, 2, 3], the filter “[.[] | (. + 1)]” returns a single value
[2, 3, 4]. The values of an array at indices produced by f are returned by “.[f]”. For example,
given the input [1, 2, 3], the filter “.[0, 2, 0]” returns the stream 1, 3, 1.
Case distinctions can be performed with the filter “if f then g else h end”. For every value
v produced by f, this filter returns the output of g if v is true and the output of h otherwise. For
example, given the input 1, the filter “if (. < 1, . == 1, . >= 1) then . else [] end” returns
[], 1, 1.
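The filters introduced so far can be modelled in a few lines of Python. The following is only an illustrative sketch, assuming that a filter is a Python function from one value to an iterator of values. The helper names `identity`, `const`, `concat`, `compose`, `collect`, `iterate`, `plus`, and `ite` are hypothetical and not part of jq, and the truth test in `ite` anticipates the function bool from Section 4.2.

```python
from itertools import chain

# A filter is modelled as a function from one value to a lazy stream (iterator).

def identity(v):
    # The filter ".": a stream containing just the input.
    yield v

def const(c):
    # A constant filter such as "1": ignores its input.
    return lambda v: iter([c])

def concat(f, g):
    # The filter "f, g": the outputs of f followed by the outputs of g.
    return lambda v: chain(f(v), g(v))

def compose(f, g):
    # The filter "f | g": run g on every output of f.
    return lambda v: (y for x in f(v) for y in g(x))

def collect(f):
    # The filter "[f]": gather all outputs of f into a single array.
    return lambda v: iter([list(f(v))])

def iterate(v):
    # The filter ".[]", restricted to arrays in this sketch.
    return iter(v)

def plus(f, g):
    # The filter "f + g" on integers; f is bound first, g second.
    return lambda v: (x + y for x in f(v) for y in g(v))

def ite(f, g, h):
    # The filter "if f then g else h end": one branch per output of f.
    def run(v):
        for c in f(v):
            truthy = c is not None and c is not False
            yield from (g(v) if truthy else h(v))
    return run

inc = plus(identity, const(1))                        # ". + 1"
one_two_three = concat(concat(const(1), const(2)), const(3))
print(list(compose(one_two_three, inc)(None)))        # "(1, 2, 3) | (. + 1)"
print(list(collect(compose(iterate, inc))([1, 2, 3])))  # "[.[] | (. + 1)]"
```

Because every helper returns an iterator, outputs are produced on demand, which mirrors the lazy streams described above.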
We can define filters by using the syntax “def f(x1; ...; xn): g;”, which defines a filter
f taking n arguments by g, where g can refer to x1 to xn. For example, jq provides the filter
“recurse(f)” to calculate fix points, which could be defined by “def recurse(f): ., (f |
recurse(f));”. Using this, we can define a filter to calculate the factorial function, for example.
Example 2.1 (Factorial): Let us define a filter fac that should return 𝑛! for any input number 𝑛.
We will define fac using the fix point of a filter update. The input and output of update shall be
an array [n, acc], satisfying the invariant that the final output is acc times the factorial of n. The
initial value passed to update is the array “[., 1]”. We can retrieve n from the array with “.[0]
” and acc with “.[1]”. We can now define update as “if .[0] > 1 then [.[0] - 1, .[0] * .
[1]] else empty end”, where “empty” is a filter that returns an empty stream. Given the input
value 4, the filter “[., 1] | recurse(update)” returns [4, 1], [3, 4], [2, 12], [1, 24]. We
are, however, only interested in the accumulator contained in the last value. So we can write “[.,
1] | last(recurse(update)) | .[1]”, where “last(f)” is a filter that outputs the last output
of f. This then yields a single value 24 as result.
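Example 2.1 can be replayed outside of jq. The sketch below assumes the same modelling of filters as Python generator functions; `recurse`, `update`, `last`, `fac`, and `fib` are hypothetical Python counterparts of the jq filters discussed above, not jq's implementation.

```python
from itertools import islice

def recurse(f):
    # def recurse(f): ., (f | recurse(f));
    def run(v):
        yield v
        for x in f(v):
            yield from run(x)
    return run

def update(v):
    # if .[0] > 1 then [.[0] - 1, .[0] * .[1]] else empty end
    n, acc = v
    if n > 1:
        yield [n - 1, n * acc]

def last(stream):
    # last(f): the final output of a finite stream (None if it is empty).
    result = None
    for result in stream:
        pass
    return result

def fac(n):
    # [., 1] | last(recurse(update)) | .[1]
    return last(recurse(update)([n, 1]))[1]

def fib():
    # [0, 1] | recurse([.[1], add])[0]
    step = lambda v: iter([[v[1], v[0] + v[1]]])
    return (v[0] for v in recurse(step)([0, 1]))

print(list(recurse(update)([4, 1])))
print(fac(4))
print(list(islice(fib(), 8)))
```

The Fibonacci stream is infinite, so it has to be consumed lazily, here with `islice`. Since each step nests one more generator, this sketch is only suitable for short prefixes.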
Composition can also be used to bind values to variables. The filter “f as $x | g” performs the
following: Given an input value i, for every output o of the filter f applied to i, the filter binds
the variable $x to the value o, making it accessible to g, and yields the output of g applied to the
original input value i. For example, the filter “(0, 2) as $x | ((1, 2) as $y | ($x + $y))”
yields the stream 1, 2, 3, 4. Note that in this particular case, we could also write this as “(0,

³The filters in this section can be executed on most UNIX shells by echo $INPUT | jq $FILTER,
where $INPUT is the input value in JSON format and $FILTER is the jq program to be executed. Often, it
is convenient to quote the filter; for example, to run the filter “.” with the input value 0, we can run
echo 0 | jq '.'. In case where the input value does not matter, we can also use jq -n $FILTER,
which runs the filter with the input value null. We use jq 1.7.

2) + (1, 2)”, because arithmetic operators such as “f + g” take as inputs the Cartesian product
of the output of f and g.⁴ However, there are cases where variables are indispensable.
Example 2.2 (Variables Are Necessary) : jq defines a filter “in(xs)” that expands to “. as $x | xs
| has($x)”. Given an input value i, “in(xs)” binds it to $x, then returns for every value produced
by xs whether its domain contains $x (and thus i). Here, the domain of an array is the set of its
indices. For example, for the input 1, the filter “in([5], [42, 3], [])” yields the stream false,
true, false, because only [42, 3] has a length greater than 1 and thus a domain that contains
1. The point of this example is that we wish to pass xs as input to has, but at the same point, we
also want to pass the input given to in as an argument to has. Without variables, we could not
do both.
Folding over streams can be done using reduce and foreach: The filter “reduce xs as $x (init;
f)” keeps a state that is initialised with the output of init. For every element $x yielded by the
filter xs, reduce feeds the current state to the filter f, which may reference $x, then sets the state
to the output of f. When all elements of xs have been yielded, reduce returns the current state.
For example, the filter “reduce .[] as $x (0; . + $x)” calculates the sum over all elements of
an array. Similarly, “reduce .[] as $x (0; . + 1)” calculates the length of an array. These two
filters are called “add” and “length” in jq, and they allow us to calculate the average of an array by
“add / length”. The filter “foreach xs as $x (init; f)” is similar to reduce, but also yields
all intermediate states, not only the last state. For example, “foreach .[] as $x (0; . + $x)”
yields the cumulative sum over all array elements.
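The two folding operations can be sketched as follows. This is a simplified model that assumes init and f each produce exactly one output; the names `reduce_` and `foreach` are hypothetical, and the update filter f is modelled as a Python function of the current state and the bound element $x.

```python
def reduce_(xs, init, f):
    # reduce xs as $x (init; f): thread a state through xs, return the final state.
    state = init
    for x in xs:
        state = f(state, x)
    return state

def foreach(xs, init, f):
    # foreach xs as $x (init; f): like reduce, but yield every intermediate state.
    state = init
    for x in xs:
        state = f(state, x)
        yield state

arr = [1, 2, 3]
total = reduce_(arr, 0, lambda s, x: s + x)   # reduce .[] as $x (0; . + $x)
count = reduce_(arr, 0, lambda s, x: s + 1)   # reduce .[] as $x (0; . + 1)
print(total / count)                          # add / length
print(list(foreach(arr, 0, lambda s, x: s + x)))
```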
Updating values can be done with the operator “|=”, which serves a similar function to lens setters
in languages such as Haskell [6]–[8]: Intuitively, the filter “p |= f” considers any value v returned
by p and replaces it by the output of f applied to v. We call a filter on the left-hand side of “|=” a
path expression. For example, when given the input [1, 2, 3], the filter “.[] |= (. + 1)” yields
[2, 3, 4], and the filter “.[1] |= (. + 1)” yields [1, 3, 3]. We can also nest these filters;
for example, when given the input [[1, 2], [3, 4]], the filter “(.[] | .[]) |= (. + 1)”
yields [[2, 3], [4, 5]]. However, not every filter is a path expression; for example, the filter
“1” is not a path expression because “1” does not point to any part of the input value but creates
a new value.
Identities such as “.[] |= f” being equivalent to “[.[] | f]” when the input value is an array,
or “. |= f” being equivalent to f, would allow defining the behaviour of updates. However, these
identities do not hold in jq due to the way it handles filters f that return multiple values. In particular,
when we pass 0 to the filter “. |= (1, 2)”, the output is 1, not (1, 2) as we might have expected.
Similarly, when we pass [1, 2] to the filter “.[] |= (., .)”, the output is [1, 2], not [1, 1, 2,
2] as expected. This behaviour of jq is cumbersome to define and to reason about. This motivates
in part the definition of simpler and more elegant semantics that behave like jq in most typical use
cases but eliminate corner cases like the ones shown. We will show such semantics in Section 6.

3 SYNTAX
This section describes the syntax for a subset of the jq language that will be used later to define
the semantics in Section 5. To set the formal syntax apart from the concrete syntax introduced in
Section 2, we use cursive font (as in “𝑓”, “𝑣”) for the specification instead of the previously used
typewriter font (as in “f”, “v”).

⁴Haskell users might appreciate the similarity of the two filters to their Haskell analogues “[0, 2] >>=
(\x -> [1, 2] >>= (\y -> return (x+y)))” and “(+) <$> [0, 2] <*> [1, 2]”, which both return
[1, 2, 3, 4].

We will start by introducing high-level intermediate representation (HIR) syntax in Section 3.1.
This syntax is very close to actual jq syntax. Then, we will identify a subset of HIR as mid-level
intermediate representation (MIR) in Section 3.2 and provide a way to translate from HIR to MIR.
This will simplify our semantics in Section 5. Finally, in Section 3.3, we will show how HIR relates
to actual jq syntax.

3.1 HIR
A filter 𝑓 is defined by
𝑓 ≔𝑛 ‖ 𝑠 ‖ .
‖ (𝑓) ‖ 𝑓? ‖ [𝑓] ‖ {𝑓 : 𝑓, …, 𝑓 : 𝑓} ‖ 𝑓𝑝? …𝑝?
‖ 𝑓⋆𝑓 ‖ 𝑓 ⚬𝑓
‖ 𝑓 as $𝑥 | 𝑓 ‖ 𝜙 𝑓 as $𝑥(𝑓; 𝑓) ‖ $𝑥
‖ label $𝑥 | 𝑓 ‖ break $𝑥
‖ if 𝑓 then 𝑓 else 𝑓 ‖ try 𝑓 catch 𝑓
‖ 𝑥 ‖ 𝑥(𝑓; …; 𝑓)

where 𝑝 is a path part of the shape

𝑝 ≔ [] ‖ [𝑓] ‖ [𝑓 :] ‖ [: 𝑓] ‖ [𝑓 : 𝑓],

𝑥 is an identifier (such as “empty”), 𝑛 is a number (such as 42 or 3.14), and 𝑠 is a string (such as
“Hello world!”). We use the superscript “?” to denote an optional presence of “?”; in particular,
𝑓𝑝? …𝑝? can be 𝑓𝑝, 𝑓𝑝?, 𝑓𝑝𝑝, 𝑓𝑝?𝑝, 𝑓𝑝𝑝?, 𝑓𝑝?𝑝?, 𝑓𝑝𝑝𝑝, and so on. The potential instances of the
operators ⋆ and ⚬ are given in Table 1. All operators ⋆ and ⚬ are left-associative, except for “|”,
“=”, “⊧”, and “⊙=”. A folding operation 𝜙 is either “reduce” or “foreach”.
A filter definition has the shape “𝑓(𝑥1 ; …; 𝑥𝑛 ) ≔ 𝑔”. Here, 𝑓 is an 𝑛-ary filter with filter argu-
ments 𝑥𝑖 , where 𝑔 may refer to 𝑥𝑖 . For example, this allows us to define filters that produce the
booleans, by defining true() ≔ (0 = 0) and false() ≔ (0 ≠ 0).
We are assuming a few preconditions that must be fulfilled for a filter to be well-formed. For
this, we consider a definition 𝑥(𝑥1 ; …; 𝑥𝑛 ) ≔ 𝜑:
• Arguments must be bound: The only filter arguments that 𝜑 can refer to are 𝑥1 , …, 𝑥𝑛 .
• Labels must be bound: If 𝜑 contains a statement break $𝑥, then it must occur as a subterm of
𝑔, where label $𝑥 | 𝑔 is a subterm of 𝜑.
• Variables must be bound: If 𝜑 contains any occurrence of a variable $𝑥, then it must occur as a
subterm of 𝑔, where either 𝑓 as $𝑥 | 𝑔 or 𝜙 𝑥 as $𝑥(𝑦; 𝑔) is a subterm of 𝜑.

Name         Symbol   Operators
Complex      ⋆        “|”, “,”, (“=”, “⊧”, “⊙=”, “⫽=”), “⫽”, “or”, “and”
Cartesian    ⚬        (≟, ≠), (<, ≤, >, ≥), ⊙
Arithmetic   ⊙        (+, −), (×, ÷), %

Table 1: Binary operators, given in order of increasing precedence. Operators surrounded by
parentheses have equal precedence.

3.2 MIR
We are now going to identify a subset of HIR called MIR and show how to lower a HIR filter to a
semantically equivalent MIR filter.
A MIR filter 𝑓 has the shape
𝑓 ≔𝑛 ‖ 𝑠 ‖ .
‖ [𝑓] ‖ {} ‖ {𝑓 : 𝑓} ‖ .𝑝
‖ 𝑓⋆𝑓 ‖ $𝑥 ⚬ $𝑥
‖ 𝑓 as $𝑥 | 𝑓 ‖ 𝜙 𝑓 as $𝑥(.; 𝑓) ‖ $𝑥
‖ if $𝑥 then 𝑓 else 𝑓 ‖ try 𝑓 catch 𝑓
‖ label $𝑥 | 𝑓 ‖ break $𝑥
‖ 𝑥 ‖ 𝑥(𝑓; …; 𝑓)

where 𝑝 is a path part of the shape


𝑝 ≔ [] ‖ [$𝑥] ‖ [$𝑥 : $𝑥].

Furthermore, the set of complex operators ⋆ in MIR does not include “=” and “⊙=” anymore.
Compared to HIR, MIR filters have significantly simpler path operations (.𝑝 versus 𝑓𝑝? …𝑝? )
and replace certain occurrences of filters by variables (e.g. $𝑥 ⚬ $𝑥 versus 𝑓 ⚬ 𝑓).
Table 2 shows how to lower an HIR filter 𝜑 to a semantically equivalent MIR filter ⌊𝜑⌋. In
particular, this desugars path operations and makes it explicit which operations are Cartesian
or complex. By convention, we write $𝑥′ to denote a fresh variable. Notice that for some com-
plex operators ⋆, namely “=”, “⊙=”, “⫽=”, “and”, and “or”, Table 2 specifies individual lowerings,
whereas for the remaining complex operators ⋆, namely “|”, “,”, “⊧”, and “⫽”, Table 2 specifies a
uniform lowering ⌊𝑓 ⋆ 𝑔⌋ = ⌊𝑓⌋ ⋆ ⌊𝑔⌋.
Table 3 shows how to lower a path part 𝑝? to MIR filters. Like in Section 3.1, the meaning of
superscript “?” is an optional presence of “?”. In the lowering of 𝑓𝑝1? …𝑝𝑛? in Table 2, if 𝑝𝑖 in the first
column is directly followed by “?”, then ⌊𝑝𝑖?⌋_$𝑥 in the second column stands for ⌊𝑝𝑖 ?⌋_$𝑥, otherwise
for ⌊𝑝𝑖⌋_$𝑥. Similarly, in Table 3, if 𝑝 in the first column is followed by “?”, then all occurrences of
superscript “?” in the second column stand for “?”, otherwise for nothing.
Example 3.2.1: The HIR filter (.[]?[]) is lowered to (. as $𝑥′ | . | .[]? | .[]). Semantically, we will
see that this is equivalent to (.[]? | .[]).
Example 3.2.2 : The HIR filter 𝜇 ≡ .[0] is lowered to ⌊𝜇⌋ ≡ . as $𝑥 | . | ($𝑥 | 0) as $𝑦 | .[$𝑦].
Semantically, we will see that ⌊𝜇⌋ is equivalent to 0 as $𝑦 | .[$𝑦].

𝑝?          ⌊𝑝?⌋_$𝑥
[]?         .[]?
[𝑓]?        ($𝑥 | ⌊𝑓⌋) as $𝑦′ | .[$𝑦′]?
[𝑓 :]?      ($𝑥 | ⌊𝑓⌋) as $𝑦′ | length()? as $𝑧′ | .[$𝑦′ : $𝑧′]?
[: 𝑓]?      ($𝑥 | ⌊𝑓⌋) as $𝑦′ | 0 as $𝑧′ | .[$𝑧′ : $𝑦′]?
[𝑓 : 𝑔]?    ($𝑥 | ⌊𝑓⌋) as $𝑦′ | ($𝑥 | ⌊𝑔⌋) as $𝑧′ | .[$𝑦′ : $𝑧′]?

Table 3: Lowering of a path part 𝑝? with input $𝑥 to a MIR filter.

𝜑                            ⌊𝜑⌋
𝑛, 𝑠, ., $𝑥, or break $𝑥      𝜑
(𝑓)                          ⌊𝑓⌋
𝑓?                           try ⌊𝑓⌋ catch empty()
[]                           [empty()]
[𝑓]                          [⌊𝑓⌋]
{}                           {}
{𝑓 : 𝑔}                      ⌊𝑓⌋ as $𝑥′ | ⌊𝑔⌋ as $𝑦′ | {$𝑥′ : $𝑦′}
{𝑓1 : 𝑔1, …, 𝑓𝑛 : 𝑔𝑛}         ⌊∑𝑖 {𝑓𝑖 : 𝑔𝑖}⌋
𝑓𝑝1? …𝑝𝑛?                     . as $𝑥′ | ⌊𝑓⌋ | ⌊𝑝1?⌋_$𝑥′ | … | ⌊𝑝𝑛?⌋_$𝑥′
𝑓 = 𝑔                        ⌊𝑔⌋ as $𝑥′ | ⌊𝑓 ⊧ $𝑥′⌋
𝑓 ⊙= 𝑔                       ⌊𝑓 ⊧ . ⊙ 𝑔⌋
𝑓 ⫽= 𝑔                       ⌊𝑓 ⊧ . ⫽ 𝑔⌋
𝑓 and 𝑔                      ⌊𝑓⌋ as $𝑥′ | $𝑥′ and ⌊𝑔⌋
𝑓 or 𝑔                       ⌊𝑓⌋ as $𝑥′ | $𝑥′ or ⌊𝑔⌋
𝑓 ⋆ 𝑔                        ⌊𝑓⌋ ⋆ ⌊𝑔⌋
𝑓 ⚬ 𝑔                        ⌊𝑓⌋ as $𝑥′ | ⌊𝑔⌋ as $𝑦′ | $𝑥′ ⚬ $𝑦′
𝑓 as $𝑥 | 𝑔                  ⌊𝑓⌋ as $𝑥 | ⌊𝑔⌋
𝜙 𝑓𝑥 as $𝑥(𝑓𝑦; 𝑓)             . as $𝑥′ | ⌊𝑓𝑦⌋ | 𝜙 ⌊$𝑥′ | 𝑓𝑥⌋ as $𝑥(.; ⌊𝑓⌋)
if 𝑓𝑥 then 𝑓 else 𝑔           ⌊𝑓𝑥⌋ as $𝑥′ | if $𝑥′ then ⌊𝑓⌋ else ⌊𝑔⌋
try 𝑓 catch 𝑔                try ⌊𝑓⌋ catch ⌊𝑔⌋
label $𝑥 | 𝑓                 label $𝑥 | ⌊𝑓⌋
𝑥                            𝑥
𝑥(𝑓1; …; 𝑓𝑛)                 𝑥(⌊𝑓1⌋; …; ⌊𝑓𝑛⌋)

Table 2: Lowering of a HIR filter 𝜑 to a MIR filter ⌊𝜑⌋.

The HIR filter 𝜑 ≡ [3] | .[0] = (length(), 2) is lowered to the MIR filter
⌊𝜑⌋ ≡ [3] | (length(), 2) as $𝑧 | ⌊𝜇⌋ ⊧ $𝑧. In Section 5, we will see that its output is ⟨[1], [2]⟩.
This lowering assumes the presence of one filter in the definitions, namely empty. This filter re-
turns an empty stream. We might be tempted to define it as {} | .[], which constructs an empty
object, then returns its contained values, which corresponds to an empty stream as well. However,
such a definition relies on the temporary construction of new values (such as the empty object
here), which is not admissible on the left-hand side of updates (see Section 6). For this reason, we
have to define it in a more complicated way, for example
empty() ≔ ({} | .[]) as $𝑥 | .

This definition ensures that empty can be employed also as a path expression.

The lowering in Table 2 is compatible with the semantics of the jq implementation,
with one notable exception: In jq, Cartesian operations 𝑓 ⚬ 𝑔 would be lowered to
⌊𝑔⌋ as $𝑦′ | ⌊𝑓⌋ as $𝑥′ | $𝑥′ ⚬ $𝑦′, whereas we lower it to ⌊𝑓⌋ as $𝑥′ | ⌊𝑔⌋ as $𝑦′ | $𝑥′ ⚬ $𝑦′, thus
inverting the binding order. Note that the difference only shows when both 𝑓 and 𝑔 return multiple
values. We diverge here from jq to make the lowering of Cartesian operations consistent with that
of other operators, such as {𝑓 : 𝑔}, where the leftmost filter (𝑓) is bound first and the rightmost fil-
ter (𝑔) is bound last. That also makes it easier to describe other filters, such as {𝑓1 : 𝑔1 , …, 𝑓𝑛 : 𝑔𝑛 },
which we can lower to ⌊∑𝑖 {𝑓𝑖 : 𝑔𝑖 }⌋, whereas its lowering assuming the jq lowering of Cartesian
operations would be ⌊{𝑓1 : 𝑔1 }⌋ as $𝑥′1 | … | ⌊{𝑓𝑛 : 𝑔𝑛 }⌋ as $𝑥′𝑛 | ∑𝑖 $𝑥′𝑖 .
Example 3.2.3 : The filter (0, 2) + (0, 1) yields ⟨0, 1, 2, 3⟩ using our lowering, and ⟨0, 2, 1, 3⟩ in jq.
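The effect of the binding order in Example 3.2.3 can be reproduced with two nested loops. The sketch below is an illustration only; it models the output streams of 𝑓 and 𝑔 as Python lists, and `cartesian_ours` and `cartesian_jq` are hypothetical names for the two lowerings.

```python
from operator import add

def cartesian_ours(f, g, op):
    # ⌊f⌋ as $x' | ⌊g⌋ as $y' | $x' ⚬ $y': f is bound first (outer loop).
    return [op(x, y) for x in f for y in g]

def cartesian_jq(f, g, op):
    # ⌊g⌋ as $y' | ⌊f⌋ as $x' | $x' ⚬ $y': g is bound first (outer loop).
    return [op(x, y) for y in g for x in f]

print(cartesian_ours([0, 2], [0, 1], add))
print(cartesian_jq([0, 2], [0, 1], add))
```

Both variants produce the same multiset of outputs; only the order differs, and only when both operands yield more than one value.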

3.3 CONCRETE JQ SYNTAX


Let us now go a level above HIR, namely a subset of actual jq syntax⁵ of which we have seen
examples in Section 2, and show how to transform jq programs to HIR and to MIR.
A program is a (possibly empty) sequence of definitions, followed by a main filter f. A defini-
tion has the shape def x(x1; ...; xn): g; or def x: g; where x is an identifier, x1 to xn is
a non-empty sequence of semicolon-separated identifiers, and g is a filter. In HIR, we write the
corresponding definition as 𝑥(𝑥1 ; …; 𝑥𝑛 ) ≔ 𝑔.
The syntax of filters in concrete jq syntax is nearly the same as in HIR. To translate between
the operators in Table 1, see Table 4. The arithmetic update operators in jq, namely +=, -=, *=, /=,
and %=, correspond to the operators ⊙= in HIR, namely +=, −=, ×=, ÷=, and %=. Filters of
the shape if f then g else h end correspond to the filter if 𝑓 then 𝑔 else ℎ in HIR; that is, in
HIR, the final end is omitted.
In jq, it is invalid syntax to call a nullary filter as x() instead of x, or to define a nullary filter
as def x(): f; instead of def x: f;. On the other hand, on the right-hand side of a definition,
x may refer either to a filter argument x or a nullary filter x. To ease our lives when defining
the semantics, we allow the syntax 𝑥() in HIR. We unambiguously interpret 𝑥 as a call to a filter
argument and 𝑥() as a call to a filter that was defined as 𝑥() ≔ 𝑓.
To convert a jq program to MIR, we do the following:
1. For each definition, convert it to a HIR definition.
2. Convert the main filter f to a HIR filter 𝑓.
3. Replace the right-hand sides of HIR definitions and 𝑓 by their lowered MIR counterparts, us-
ing Table 2.
Example 3.3.1 : Consider the jq program def recurse(f): ., (f | recurse(f)); recurse(.
+ 1), which returns the infinite stream of output values 𝑛, 𝑛 + 1, … when provided with
an input number 𝑛. The definition in this example can be converted to the HIR defini-
tion recurse(𝑓) ≔ ., (𝑓 | recurse(𝑓)) and the main filter can be converted to the HIR filter

jq    |   ,   =   |=   //=   //   ==   !=   <   <=   >   >=   +   -   *   /   %
HIR   |   ,   =   ⊧    ⫽=    ⫽    ≟    ≠    <   ≤    >   ≥    +   −   ×   ÷   %

Table 4: Operators in concrete jq syntax and their corresponding HIR operators.

⁵Actual jq syntax has a few more constructions to offer, including nested definitions, variable
arguments, string interpolation, modules, etc. However, these constructions can be transformed into
semantically equivalent syntax as treated in this text.

recurse(. + 1). The lowering of the definition to MIR yields the same as the HIR definition, and
the lowering of the main filter to MIR yields recurse(. as $𝑥′ | 1 as $𝑦′ | $𝑥′ + $𝑦′ ).
Example 3.3.2 : Consider the jq program
def select(f): if f then . else empty end;
def negative: . < 0; .[] | select(negative). When given an array as an input, it yields
those elements of the array that are smaller than 0. Here, the definitions in the example are
converted to the HIR definitions select(𝑓) ≔ if 𝑓 then . else empty() and negative() ≔ . < 0,
and the main filter is converted to the HIR filter .[] | select(negative()). Both the definition of
select(𝑓) and the main filter are already in MIR; the MIR version of the remaining definition is
negative() ≔ . as $𝑥′ | 0 as $𝑦′ | $𝑥′ < $𝑦′ .
We will show in Section 5 how to run the resulting MIR filter 𝑓 in the presence of a set of MIR
definitions. For a given input value 𝑣, the output of 𝑓 will be given by $f|^{\{\}}_{v}$.

4 VALUES
In this section, we will define JSON values, errors, exceptions, and streams. Furthermore, we will
define several functions and operations on values.
A JSON value 𝑣 has the shape
𝑣 ≔ null ‖ false ‖ true ‖ 𝑛 ‖ 𝑠 ‖ [𝑣0 , …, 𝑣𝑛 ] ‖ {𝑘0 ↦ 𝑣0 , …, 𝑘𝑛 ↦ 𝑣𝑛 },

where 𝑛 is a number and 𝑠 is a string. We write a string 𝑠 as 𝑐0 …𝑐𝑛 , where 𝑐 is a character. A
value of the shape [𝑣0 , …, 𝑣𝑛 ] is called an array and a value of the shape {𝑘0 ↦ 𝑣0 , …, 𝑘𝑛 ↦ 𝑣𝑛 }
is an unordered map from keys 𝑘 to values that we call an object.⁶ In JSON, object keys are
strings.⁷ We assume that the union of two objects is right-biased; i.e., if we have two objects 𝑙 and
𝑟 = {𝑘 ↦ 𝑣, …}, then (𝑙 ∪ 𝑟)(𝑘) = 𝑣 (regardless of what 𝑙(𝑘) might yield).
By convention, we will write in the remainder of this text 𝑣 for values, 𝑛 for numbers, 𝑐 for
characters, and 𝑘 for object keys. We will sometimes write arrays as [𝑣0 , …, 𝑣𝑛 ] and sometimes as
[𝑣1 , …, 𝑣𝑛 ]: The former case is useful to express that 𝑛 is the maximal index of the array (having
length 𝑛 + 1), and the latter case is useful to express that the array has length 𝑛. The same idea
applies also to strings, objects, and streams.
A number can be an integer or a decimal, optionally followed by an integer exponent. For example,
0, −42, 3.14, and 3 × 10⁸ are valid JSON numbers. This text does not fix how numbers are to
be represented, just like the JSON standard does not impose any representation.⁸ Instead, it just
assumes that the type of numbers has a total order (see Section 4.6) and supports the arithmetic
operations +, −, ×, ÷, and % (modulo).

⁶The JSON syntax uses {𝑘0 : 𝑣0 , …, 𝑘𝑛 : 𝑣𝑛 } instead of {𝑘0 ↦ 𝑣0 , …, 𝑘𝑛 ↦ 𝑣𝑛 }. However, in this
text, we will use the {𝑘0 : 𝑣0 , …, 𝑘𝑛 : 𝑣𝑛 } syntax to denote the construction of objects, and use
{𝑘0 ↦ 𝑣0 , …, 𝑘𝑛 ↦ 𝑣𝑛 } syntax to denote actual objects.
⁷YAML is a data format similar to JSON. While YAML can encode any JSON value, it additionally
allows any YAML values to be used as object keys, where JSON allows only strings to be used as object
keys. This text deliberately distinguishes between object keys and strings. That way, extending the
given semantics to use YAML values should be relatively easy.
⁸jq uses floating-point numbers to encode both integers and decimals. However, several operations
in this text (for example those in Section 4.4) only make sense for natural numbers ℕ or integers ℤ. In
situations where integer values are expected and a number 𝑛 is provided, jq generally substitutes 𝑛 by
⌊𝑛⌋ if 𝑛 ≥ 0 and ⌈𝑛⌉ if 𝑛 < 0. For example, accessing the 0.5-th element of an array yields its 0-th
element. In this text, we do not document this rounding behaviour for each function.

An error can be constructed from a value by the function error(𝑣). The error function is bijective;
that is, if we have an error 𝑒, then there is a unique value 𝑣 with 𝑒 = error(𝑣). In the remainder
of this text, we will write just “error” to denote calling error(𝑣) with some value 𝑣. This is done
such that this specification does not need to fix the precise error value that is returned when an
operation fails.
An exception either is an error or has the shape break($𝑥). The latter will become relevant
starting from Section 5.
A value result is either a value or an exception.
A stream (or lazy list) is written as ⟨𝑣0 , …, 𝑣𝑛 ⟩. The concatenation of two streams 𝑠1 ,
𝑠2 is written as 𝑠1 + 𝑠2 . Given some stream 𝑙 = ⟨𝑥0 , …, 𝑥𝑛 ⟩, we write ∑𝑥∈𝑙 𝑓(𝑥) to denote
𝑓(𝑥0 ) + … + 𝑓(𝑥𝑛 ). We use this frequently to map a function over a stream, by having 𝑓(𝑥) re-
turn a stream itself.
In this text, we will see many functions that take values as arguments. By convention, for any
of these functions 𝑓(𝑣1 , …, 𝑣𝑛 ), we extend their domain to value results such that 𝑓(𝑣1 , …, 𝑣𝑛 )
yields 𝑣𝑖 (or rather ⟨𝑣𝑖 ⟩ if 𝑓 returns streams) if 𝑣𝑖 is an exception and for all 𝑗 < 𝑖, 𝑣𝑗 is a value. For
example, in Section 4.3, we will define 𝑙 + 𝑟 for values 𝑙 and 𝑟, and by our convention, we extend
the domain of addition to value results such that if 𝑙 is an exception, then 𝑙 + 𝑟 returns just 𝑙, and
if 𝑙 is a value, but 𝑟 is an exception, then 𝑙 + 𝑟 returns just 𝑟.
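The convention for extending functions from values to value results can be sketched as a decorator. This is a minimal model under stated assumptions: the classes `Exc`, `Error`, and `Break` and the decorator `lift` are hypothetical names, and `plus` stands in for the addition of Section 4.3 restricted to numbers.

```python
from dataclasses import dataclass
from functools import wraps

class Exc:
    # Base class for exceptions: either an error or a break.
    pass

@dataclass(frozen=True)
class Error(Exc):
    value: object      # error(v) wraps exactly one value v

@dataclass(frozen=True)
class Break(Exc):
    label: str         # break($x)

def lift(f):
    # Extend f from values to value results: the leftmost exception
    # among the arguments is returned unchanged; otherwise f is applied.
    @wraps(f)
    def lifted(*args):
        for a in args:
            if isinstance(a, Exc):
                return a
        return f(*args)
    return lifted

@lift
def plus(l, r):
    return l + r       # numbers only in this sketch

print(plus(1, 2))
print(plus(Error("left"), Error("right")))
print(plus(1, Break("x")))
```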

4.1 CONSTRUCTION
In this subsection, we will introduce operators to construct arrays and objects.
The function [⋅] transforms a stream into an array if all stream elements are values, or into the
first exception in the stream otherwise:
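$$
[\langle v_0, \dots, v_n \rangle] \coloneqq
\begin{cases}
v_i & \text{if } v_i \text{ is an exception and for all } j < i,\ v_j \text{ is a value} \\
[v_0, \dots, v_n] & \text{otherwise}
\end{cases}
$$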

Given two values 𝑘 and 𝑣, we can make an object out of them:


$$
\{k : v\} \coloneqq
\begin{cases}
\{k \mapsto v\} & \text{if } k \text{ is a string and } v \text{ is a value} \\
\text{error} & \text{otherwise}
\end{cases}
$$
We can construct objects with multiple keys by adding objects, see Section 4.3.
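Both constructors can be sketched in Python as follows. The sketch assumes that JSON arrays and objects are modelled as Python lists and dicts; `Error`, `make_array`, and `make_object` are hypothetical names, and the error message is a placeholder, since the specification deliberately leaves the error value open.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Error:
    value: object      # error(v) wraps exactly one value v

def make_array(stream):
    # [⟨v0, …, vn⟩]: an array of all elements, or the first exception found.
    items = []
    for v in stream:
        if isinstance(v, Error):
            return v   # later elements are never inspected
        items.append(v)
    return items

def make_object(k, v):
    # {k : v}: a singleton object if k is a string, an error otherwise.
    if isinstance(k, str):
        return {k: v}
    return Error("object keys must be strings")  # placeholder error value

print(make_array(iter([1, 2, 3])))
print(make_array(iter([1, Error("first"), Error("second")])))
print(make_object("a", 1))
```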

4.2 SIMPLE FUNCTIONS


We are now going to define several functions that take a value and return a value.
The keys of a value are defined as follows:
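$$
\operatorname{keys}(v) \coloneqq
\begin{cases}
\langle 0, \dots, n \rangle & \text{if } v = [v_0, \dots, v_n] \\
\langle k_0 \rangle + \operatorname{keys}(v') & \text{if } v = \{k_0 \mapsto v_0\} \cup v' \text{ and } k_0 = \min(\operatorname{dom}(v)) \\
\langle\rangle & \text{if } v = \{\} \\
\langle \text{error} \rangle & \text{otherwise}
\end{cases}
$$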

For an object 𝑣, keys(𝑣) returns the domain of the object sorted by ascending order. For the used
ordering, see Section 4.6.
We define the length of a value as follows:

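$$
|v| \coloneqq
\begin{cases}
0 & \text{if } v = \text{null} \\
|n| & \text{if } v \text{ is a number } n \\
n & \text{if } v = c_1 \dots c_n \\
n & \text{if } v = [v_1, \dots, v_n] \\
n & \text{if } v = \{k_1 \mapsto v_1, \dots, k_n \mapsto v_n\} \\
\text{error} & \text{otherwise (if } v \in \{\text{true}, \text{false}\}\text{)}
\end{cases}
$$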

The boolean value of a value 𝑣 is defined as follows:
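$$
\operatorname{bool}(v) \coloneqq
\begin{cases}
\text{false} & \text{if } v = \text{null or } v = \text{false} \\
\text{true} & \text{otherwise}
\end{cases}
$$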
We can draw a link between the functions here and jq: When called with the input value 𝑣, the jq
filter keys yields ⟨[keys(𝑣)]⟩, the jq filter length yields ⟨|𝑣|⟩, and the jq filter true and . yields
⟨bool(𝑣)⟩.
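The three functions can be modelled in Python along the following lines. This is a sketch under several assumptions: JSON values are represented as `None`, `bool`, `int`/`float`, `str`, `list`, and `dict`; streams are represented as lists; `ERROR` is a hypothetical sentinel for “error”; and Python's default string ordering stands in for the key ordering of Section 4.6.

```python
class Error:
    # Stands for "error"; the specification leaves the wrapped value open.
    pass

ERROR = Error()

def keys(v):
    # Returns the stream keys(v) as a list.
    if isinstance(v, list):
        return list(range(len(v)))
    if isinstance(v, dict):
        return sorted(v)          # assumed ordering, see lead-in
    return [ERROR]

def length(v):
    if v is None:
        return 0
    if isinstance(v, bool):       # checked first: bool is a subclass of int
        return ERROR
    if isinstance(v, (int, float)):
        return abs(v)
    if isinstance(v, (str, list, dict)):
        return len(v)
    return ERROR

def truthy(v):
    # bool(v): everything except null and false counts as true.
    return v is not None and v is not False

print(keys({"b": 1, "a": 2}), length(-3), truthy(0))
```

Note that `truthy(0)` is true, unlike Python's own truth test, because only null and false are mapped to false.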

4.3 ARITHMETIC OPERATIONS


We will now define a set of arithmetic operations on values. We will link these later directly to
their counterparts in jq: Suppose that the jq filters f and g yield ⟨𝑙⟩ and ⟨𝑟⟩, respectively. Then the
jq filters f + g, f - g, f * g, f / g, and f % g yield ⟨𝑙 + 𝑟⟩, ⟨𝑙 − 𝑟⟩, ⟨𝑙 × 𝑟⟩, ⟨𝑙 ÷ 𝑟⟩, and ⟨𝑙 % 𝑟⟩,
respectively.

4.3.1 ADDITION
We define addition of two values 𝑙 and 𝑟 as follows:
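$$
l + r \coloneqq
\begin{cases}
v & \text{if } l = \text{null and } r = v, \text{ or } l = v \text{ and } r = \text{null} \\
n_1 + n_2 & \text{if } l \text{ is a number } n_1 \text{ and } r \text{ is a number } n_2 \\
c_{l,1} \dots c_{l,m}\, c_{r,1} \dots c_{r,n} & \text{if } l = c_{l,1} \dots c_{l,m} \text{ and } r = c_{r,1} \dots c_{r,n} \\
[\langle l_1, \dots, l_m, r_1, \dots, r_n \rangle] & \text{if } l = [l_1, \dots, l_m] \text{ and } r = [r_1, \dots, r_n] \\
l \cup r & \text{if } l = \{\dots\} \text{ and } r = \{\dots\} \\
\text{error} & \text{otherwise}
\end{cases}
$$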

Here, we can see that null serves as a neutral element for addition. For strings and arrays, addition
corresponds to their concatenation, and for objects, it corresponds to their union.
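A possible Python rendering of this definition is shown below. It is a sketch only, assuming the same representation of JSON values as Python built-ins; `ERROR`, `is_number`, and `add` are hypothetical names.

```python
class Error:
    # Stands for "error"; the specification leaves the wrapped value open.
    pass

ERROR = Error()

def is_number(v):
    # Python's bool is a subclass of int, so booleans are excluded explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)

def add(l, r):
    if l is None:
        return r                  # null is neutral on the left
    if r is None:
        return l                  # ... and on the right
    if is_number(l) and is_number(r):
        return l + r
    if isinstance(l, str) and isinstance(r, str):
        return l + r              # string concatenation
    if isinstance(l, list) and isinstance(r, list):
        return l + r              # array concatenation
    if isinstance(l, dict) and isinstance(r, dict):
        return {**l, **r}         # right-biased union
    return ERROR

print(add({"a": 1, "b": 2}, {"a": 3}))
```

The dict unpacking `{**l, **r}` lets entries of `r` overwrite those of `l`, which matches the right-biased union assumed in Section 4.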

4.3.2 MULTIPLICATION
Given two objects 𝑙 and 𝑟, we define their recursive merge 𝑙 ⋓ 𝑟 as:
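$$
l \Cup r \coloneqq
\begin{cases}
\{k \mapsto v_l \Cup v_r\} \cup l' \Cup r' & \text{if } l = \{k \mapsto v_l\} \cup l',\ r = \{k \mapsto v_r\} \cup r', \text{ and } v_l, v_r \text{ are objects} \\
\{k \mapsto v_r\} \cup l' \Cup r' & \text{if } l = \{k \mapsto v_l\} \cup l',\ r = \{k \mapsto v_r\} \cup r', \text{ and } v_l \text{ or } v_r \text{ is not an object} \\
\{k \mapsto v_r\} \cup l \Cup r' & \text{if } k \notin \operatorname{dom}(l) \text{ and } r = \{k \mapsto v_r\} \cup r' \\
l & \text{otherwise (if } r = \{\}\text{)}
\end{cases}
$$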

We use this in the following definition of multiplication of two values 𝑙 and 𝑟:

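\[
l \times r := \begin{cases}
n_1 \times n_2 & \text{if } l \text{ is a number } n_1 \text{ and } r \text{ is a number } n_2 \\
l + l \times (r - 1) & \text{if } l \text{ is a string and } r \in \mathbb{N} \setminus \{0\} \\
\mathrm{null} & \text{if } l \text{ is a string and } r = 0 \\
r \times l & \text{if } r \text{ is a string and } l \in \mathbb{N} \\
l \Cup r & \text{if } l \text{ and } r \text{ are objects} \\
\mathrm{error} & \text{otherwise}
\end{cases}
\]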

We can see that multiplication of a string 𝑠 with a natural number 𝑛 > 0 returns \(\sum_{i=1}^{n} s\);
that is, the concatenation of 𝑛 copies of the string 𝑠. The multiplication of two objects
corresponds to their recursive merge as defined above.
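The recursive merge and the multiplication built on it can be sketched in Python as follows. The sketch assumes native Python values for JSON values and a raised `TypeError` for errors; the names `merge` and `mul` are illustrative. The string case uses Python's string repetition, which yields the same result as unfolding 𝑙 + 𝑙 × (𝑟 − 1).

```python
def merge(l, r):
    """Recursive merge of two objects; entries of r win unless both sides are objects."""
    out = dict(l)
    for k, vr in r.items():
        vl = out.get(k)
        if k in out and isinstance(vl, dict) and isinstance(vr, dict):
            out[k] = merge(vl, vr)   # both values are objects: merge recursively
        else:
            out[k] = vr              # otherwise the right value replaces the left one
    return out


def mul(l, r):
    """Multiplication l × r of two JSON values."""
    def is_num(x):
        return isinstance(x, (int, float)) and not isinstance(x, bool)

    def is_nat(x):
        return isinstance(x, int) and not isinstance(x, bool) and x >= 0

    if is_num(l) and is_num(r):
        return l * r
    if isinstance(l, str) and is_nat(r):
        return None if r == 0 else l * r   # zero repetitions yield null
    if isinstance(r, str) and is_nat(l):
        return mul(r, l)
    if isinstance(l, dict) and isinstance(r, dict):
        return merge(l, r)
    raise TypeError("mul: incompatible operands")
```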

4.3.3 SUBTRACTION
We now define subtraction of two values 𝑙 and 𝑟:
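\[
l - r := \begin{cases}
n_1 - n_2 & \text{if } l \text{ is a number } n_1 \text{ and } r \text{ is a number } n_2 \\
[\sum_{i, l_i \notin \{r_0, \dots, r_n\}} \langle l_i \rangle] & \text{if } l = [l_0, \dots, l_n] \text{ and } r = [r_0, \dots, r_n] \\
\mathrm{error} & \text{otherwise}
\end{cases}
\]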

When both 𝑙 and 𝑟 are arrays, then 𝑙 − 𝑟 returns an array containing those values of 𝑙 that are not
contained in 𝑟.
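As a small illustration, subtraction can be sketched in Python as below, assuming native Python values and a raised `TypeError` for errors. One caveat of this sketch: Python's `in` compares with `==`, which treats `True` and `1` as equal, whereas the values true and 1 are distinct in the specification.

```python
def sub(l, r):
    """Subtraction l - r of two JSON values."""
    def is_num(x):
        return isinstance(x, (int, float)) and not isinstance(x, bool)

    if is_num(l) and is_num(r):
        return l - r
    if isinstance(l, list) and isinstance(r, list):
        return [x for x in l if x not in r]   # keep the elements of l that do not occur in r
    raise TypeError("sub: incompatible operands")
```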

4.3.4 DIVISION
We will now define a function that splits a string 𝑦 + 𝑥 by some non-empty separator string 𝑠.
The function preserves the invariant that 𝑦 does not contain 𝑠:
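\[
\mathrm{split}(x, s, y) := \begin{cases}
\mathrm{split}(c_1 \dots c_n, s, y + c_0) & \text{if } x = c_0 \dots c_n \text{ and } c_0 \dots c_{|s|-1} \neq s \\
[\langle y \rangle] + \mathrm{split}(c_{|s|} \dots c_n, s, \text{""}) & \text{if } x = c_0 \dots c_n \text{ and } c_0 \dots c_{|s|-1} = s \\
[\langle y \rangle] & \text{otherwise } (|x| = 0)
\end{cases}
\]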

We use this splitting function to define division of two values:
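\[
l \div r := \begin{cases}
n_1 \div n_2 & \text{if } l \text{ is a number } n_1 \text{ and } r \text{ is a number } n_2 \\
[] & \text{if } l \text{ and } r \text{ are strings and } |l| = 0 \\
[\sum_i \langle c_i \rangle] & \text{if } l = c_0 \dots c_n, r \text{ is a string, } |l| > 0, \text{ and } |r| = 0 \\
\mathrm{split}(l, r, \text{""}) & \text{if } l \text{ and } r \text{ are strings, } |l| > 0, \text{ and } |r| > 0 \\
\mathrm{error} & \text{otherwise}
\end{cases}
\]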

Example 4.3.4.1 : Let 𝑠 = "ab". We have that 𝑠 ÷ 𝑠 = ["", ""]. Furthermore, "𝑐" ÷ 𝑠 = ["𝑐"],
(𝑠 + "𝑐"+𝑠) ÷ 𝑠 = ["", "𝑐", ""] and (𝑠 + "𝑐"+𝑠 + "de") ÷ 𝑠 = ["", "𝑐", "de"].
From this example, we can infer the following lemma.
Lemma 4.3.4.1 : Let 𝑙 and 𝑟 be strings with |𝑙| > 0 and |𝑟| > 0. Then 𝑙 ÷ 𝑟 = [𝑙0 , …, 𝑙𝑛 ] for some 𝑛 ≥ 0
such that \(l = (\sum_{i=0}^{n-1} (l_i + r)) + l_n\) and for all 𝑖, 𝑙𝑖 is a string that does not contain 𝑟 as
a substring.
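The splitting function and the string cases of division can be sketched in Python as follows. The sketch replaces the recursion of split by a loop that maintains the same accumulator 𝑦; the names `split` and `div_str` are illustrative, and only the string cases of division are covered.

```python
def split(x, s, y=""):
    """Split x by the non-empty separator s; y accumulates the current piece."""
    parts = []
    while x:
        if x.startswith(s):
            parts.append(y)          # separator found: emit the piece collected so far
            y = ""
            x = x[len(s):]
        else:
            y += x[0]                # no separator here: move one character into y
            x = x[1:]
    parts.append(y)                  # |x| = 0: emit the last piece
    return parts


def div_str(l, r):
    """Division l ÷ r for two strings."""
    if l == "":
        return []
    if r == "":
        return list(l)               # empty separator: one string per character
    return split(l, r)
```

For non-empty 𝑙 and 𝑟, joining the result with the separator, as in `r.join(div_str(l, r))`, gives back 𝑙, which is the decomposition stated in the lemma.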

4.3.5 REMAINDER
For two values 𝑙 and 𝑟, the arithmetic operation 𝑙 % 𝑟 (modulo) yields 𝑚 % 𝑛 if 𝑙 and 𝑟 are numbers
𝑚 and 𝑛, otherwise it yields an error.

4.4 ACCESSING
We will now define three access operators. These serve to extract values that are contained within
other values.
The value 𝑣[𝑖] of a value 𝑣 at index 𝑖 is defined as follows:
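\[
v[i] := \begin{cases}
v_i & \text{if } v = [v_0, \dots, v_n], i \in \mathbb{N}, \text{ and } i \le n \\
\mathrm{null} & \text{if } v = [v_0, \dots, v_n], i \in \mathbb{N}, \text{ and } i > n \\
v[|v| + i] & \text{if } v = [v_0, \dots, v_n], i \in \mathbb{Z} \setminus \mathbb{N}, \text{ and } 0 \le |v| + i \\
v_j & \text{if } v = \{k_0 \mapsto v_0, \dots, k_n \mapsto v_n\}, i \text{ is a string, and } k_j = i \\
\mathrm{null} & \text{if } v = \{k_0 \mapsto v_0, \dots, k_n \mapsto v_n\}, i \text{ is a string, and } i \notin \{k_0, \dots, k_n\} \\
\mathrm{error} & \text{otherwise}
\end{cases}
\]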

The idea behind this index operator is as follows: It returns null if the value 𝑣 does not contain a
value at index 𝑖, but 𝑣 could be extended to contain one. More formally, 𝑣[𝑖] is null if 𝑣 ≠ null and
there exists some value 𝑣′ = 𝑣 + 𝛿 such that 𝑣′ [𝑖] ≠ null.
The behaviour of this operator for 𝑖 < 0 is that 𝑣[𝑖] equals 𝑣[|𝑣| + 𝑖].
Example 4.4.1 : If 𝑣 = [0, 1, 2], then 𝑣[1] = 1 and 𝑣[−1] = 𝑣[3 − 1] = 2.
Using the index operator, we can define the values 𝑣[] in a value 𝑣 as follows:
\[
v[] := \sum_{i \in \mathrm{keys}(v)} \langle v[i] \rangle
\]

When provided with an array 𝑣 = [𝑣0 , …, 𝑣𝑛 ] or an object 𝑣 = {𝑘0 ↦ 𝑣0 , …, 𝑘𝑛 ↦ 𝑣𝑛 } (where
𝑘0 < … < 𝑘𝑛 ), 𝑣[] returns the stream ⟨𝑣0 , …, 𝑣𝑛 ⟩.
The last operator that we define here is a slice operator:
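\[
v[i : j] := \begin{cases}
[\sum_{k=i}^{j-1} \langle v_k \rangle] & \text{if } v = [v_0, \dots, v_n] \text{ and } i, j \in \mathbb{N} \\
\sum_{k=i}^{j-1} c_k & \text{if } v = c_0 \dots c_n \text{ and } i, j \in \mathbb{N} \\
v[(n + i) : j] & \text{if } |v| = n, i \in \mathbb{Z} \setminus \mathbb{N}, \text{ and } 0 \le n + i \\
v[i : (n + j)] & \text{if } |v| = n, j \in \mathbb{Z} \setminus \mathbb{N}, \text{ and } 0 \le n + j \\
\mathrm{error} & \text{otherwise}
\end{cases}
\]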

Note that unlike 𝑣[] and 𝑣[𝑖], 𝑣[𝑖 : 𝑗] may yield a value if 𝑣 is a string. If we have that 𝑖, 𝑗 ∈ ℕ and
either 𝑖 > 𝑛 or 𝑖 ≥ 𝑗, then 𝑣[𝑖 : 𝑗] yields an empty array if 𝑣 is an array, and an empty string if 𝑣
is a string.
Example 4.4.2 : If 𝑣 = [0, 1, 2, 3], then 𝑣[1 : 3] = [1, 2].
The operator 𝑣[] is the only operator in this subsection that returns a stream of value results
instead of only a value result.
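The three access operators can be sketched in Python as follows. This is an illustration under assumptions: JSON values are native Python values, streams are lists, and errors are raised as `TypeError` or `IndexError`. Following the definitions above, a negative index that still lies before the start of the array after adding the length is an error. The names `index`, `values`, and `slice_` are illustrative.

```python
def index(v, i):
    """The index operator v[i]."""
    if isinstance(v, list) and isinstance(i, int) and not isinstance(i, bool):
        if i < 0:
            i += len(v)              # negative indices count from the end
            if i < 0:
                raise IndexError("index: out of bounds")
        return v[i] if i < len(v) else None
    if isinstance(v, dict) and isinstance(i, str):
        return v.get(i)              # a missing key yields null
    raise TypeError("index: cannot index this value")


def values(v):
    """The operator v[], returning a stream (list) of values."""
    if isinstance(v, list):
        ks = range(len(v))
    elif isinstance(v, dict):
        ks = sorted(v)               # keys in ascending order
    else:
        raise TypeError("values: value is neither an array nor an object")
    return [index(v, k) for k in ks]


def slice_(v, i, j):
    """The slice operator v[i : j] on arrays and strings."""
    if not isinstance(v, (list, str)):
        raise TypeError("slice: value is neither an array nor a string")
    n = len(v)
    if i < 0:
        i += n
    if j < 0:
        j += n
    if i < 0 or j < 0:
        raise IndexError("slice: out of bounds")
    return v[i:j]                    # empty if i is past the end or i >= j
```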

4.5 UPDATING
For each access operator in Section 4.4, we will now define an updating counterpart. Intuitively,
where an access operator yields some elements contained in a value 𝑣, its corresponding update
operator replaces these elements in 𝑣 by the output of a function. The access operators will be used
in Section 5, and the update operators will be used in Section 6.
All update operators take at least a value 𝑣 and a function 𝑓 from a value to a stream of value
results. We extend the domain of 𝑓 to value results such that 𝑓(𝑒) = ⟨𝑒⟩ if 𝑒 is an exception.
The first update operator will be a counterpart to 𝑣[]. For all elements 𝑥 that are yielded by 𝑣[],
𝑣[] ⊧ 𝑓 replaces 𝑥 by 𝑓(𝑥):
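\[
v[] \models f := \begin{cases}
[\sum_i f(v_i)] & \text{if } v = [v_0, \dots, v_n] \\
\bigcup_i \begin{cases} \{k_i : h\} & \text{if } f(v_i) = \langle h \rangle + t \\ \{\} & \text{otherwise} \end{cases} & \text{if } v = \{k_0 \mapsto v_0, \dots, k_n \mapsto v_n\} \\
\mathrm{error} & \text{otherwise}
\end{cases}
\]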

For an input array 𝑣 = [𝑣0 , …, 𝑣𝑛 ], 𝑣[] ⊧ 𝑓 replaces each 𝑣𝑖 by the output of 𝑓(𝑣𝑖 ), yielding
[𝑓(𝑣0 ) + … + 𝑓(𝑣𝑛 )]. For an input object 𝑣 = {𝑘0 ↦ 𝑣0 , …, 𝑘𝑛 ↦ 𝑣𝑛 }, 𝑣[] ⊧ 𝑓 replaces each 𝑣𝑖
by the first output yielded by 𝑓(𝑣𝑖 ) if such an output exists, otherwise it deletes {𝑘𝑖 ↦ 𝑣𝑖 } from
the object. Note that updating arrays diverges from jq, because jq only considers the first value
yielded by 𝑓.
For the next operators, we will use the following function head(𝑙, 𝑒), which returns the head of
a list 𝑙 if it is not empty, otherwise 𝑒:
\[
\mathrm{head}(l, e) := \begin{cases}
h & \text{if } l = \langle h \rangle + t \\
e & \text{otherwise}
\end{cases}
\]
The next function takes a value 𝑣 and replaces its 𝑖-th element by the first output of 𝑓, or deletes
it if 𝑓 yields no output:
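\[
v[i] \models f := \begin{cases}
v[0 : i] + [\mathrm{head}(f(v[i]), \langle\rangle)] + v[(i + 1) : |v|] & \text{if } v = [v_0, \dots, v_n], i \in \mathbb{N}, \text{ and } i \le n \\
v[|v| + i] \models f & \text{if } v = [v_0, \dots, v_n], i \in \mathbb{Z} \setminus \mathbb{N}, \text{ and } 0 \le |v| + i \\
v + \{i : h\} & \text{if } v = \{\dots\} \text{ and } f(v[i]) = \langle h \rangle + t \\
\bigcup_{k \in \mathrm{dom}(v) \setminus \{i\}} \{k \mapsto v[k]\} & \text{if } v = \{\dots\} \text{ and } f(v[i]) = \langle\rangle \\
\mathrm{error} & \text{otherwise}
\end{cases}
\]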

Note that this diverges from jq if 𝑣 = [𝑣0 , …, 𝑣𝑛 ] and 𝑖 > 𝑛, because jq fills up the array with null.
The final function here is the update counterpart of the operator 𝑣[𝑖 : 𝑗]. It replaces the slice
𝑣[𝑖 : 𝑗] by the first output of 𝑓 on 𝑣[𝑖 : 𝑗], or by the empty array if 𝑓 yields no output.
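\[
v[i : j] \models f := \begin{cases}
v[0 : i] + \mathrm{head}(f(v[i : j]), []) + v[j : |v|] & \text{if } v = [v_0, \dots, v_n], i, j \in \mathbb{N}, \text{ and } i \le j \\
v & \text{if } v = [v_0, \dots, v_n], i, j \in \mathbb{N}, \text{ and } i > j \\
v[(n + i) : j] \models f & \text{if } |v| = n, i \in \mathbb{Z} \setminus \mathbb{N}, \text{ and } 0 \le n + i \\
v[i : (n + j)] \models f & \text{if } |v| = n, j \in \mathbb{Z} \setminus \mathbb{N}, \text{ and } 0 \le n + j \\
\mathrm{error} & \text{otherwise}
\end{cases}
\]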

Unlike its corresponding access operator 𝑣[𝑖 : 𝑗], this operator unconditionally fails when 𝑣 is a
string. This operator diverges from jq if 𝑓 yields null, in which case jq returns an error, whereas
this operator treats this as equivalent to 𝑓 returning [].

Example 4.5.1 : If 𝑣 = [0, 1, 2, 3] and 𝑓(𝑣) = [4, 5, 6], then 𝑣[1 : 3] ⊧ 𝑓 = [0, 4, 5, 6, 3].
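Two of the update operators of this subsection can be sketched in Python as follows. The sketch assumes that the update function `f` maps a value to a list of outputs (modelling a stream), that errors are raised as exceptions rather than passed through `f`, and that JSON values are native Python values. The names `update_values` and `update_slice` are illustrative.

```python
def update_values(v, f):
    """The operator v[] ⊧ f."""
    if isinstance(v, list):
        # Arrays: every element is replaced by ALL outputs of f.
        return [y for x in v for y in f(x)]
    if isinstance(v, dict):
        out = {}
        for k in sorted(v):
            ys = f(v[k])
            if ys:                   # objects: keep only the first output,
                out[k] = ys[0]       # and drop the entry if there is none
        return out
    raise TypeError("update_values: value is neither an array nor an object")


def update_slice(v, i, j, f):
    """The operator v[i : j] ⊧ f on arrays."""
    if not isinstance(v, list):
        raise TypeError("update_slice: value is not an array")
    n = len(v)
    if i < 0:
        i += n
    if j < 0:
        j += n
    if i < 0 or j < 0:
        raise IndexError("update_slice: out of bounds")
    if i > j:
        return v
    ys = f(v[i:j])
    mid = ys[0] if ys else []        # no output: replace the slice by the empty array
    if mid is None:
        mid = []                     # null is neutral for addition, so it acts like []
    return v[:i] + mid + v[j:]       # raises TypeError if mid is not an array
```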

4.6 ORDERING
In this subsection, we establish a total order on values.⁹
We have that
null < false < true < 𝑛 < 𝑠 < 𝑎 < 𝑜,

where 𝑛 is a number, 𝑠 is a string, 𝑎 is an array, and 𝑜 is an object. We assume that there is a total
order on numbers and characters. Strings and arrays are ordered lexicographically.
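Two objects 𝑜1 and 𝑜2 are ordered as follows: For both objects 𝑜𝑖 (𝑖 ∈ {1, 2}), we sort the array
[keys(𝑜𝑖 )] in ascending order to obtain the ordered array of keys 𝐤𝑖 = [𝑘1 , …, 𝑘𝑛 ], from which we
obtain the array of values 𝐯𝑖 = [𝑜𝑖 [𝑘1 ], …, 𝑜𝑖 [𝑘𝑛 ]]. We then have
\[
o_1 < o_2 \iff \begin{cases}
\mathbf{k}_1 < \mathbf{k}_2 & \text{if } \mathbf{k}_1 < \mathbf{k}_2 \text{ or } \mathbf{k}_1 > \mathbf{k}_2 \\
\mathbf{v}_1 < \mathbf{v}_2 & \text{otherwise } (\mathbf{k}_1 = \mathbf{k}_2)
\end{cases}
\]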
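The ordering can be sketched as a three-way comparison in Python. The sketch assumes native Python values for JSON values, Python's built-in order on numbers and on strings as the assumed orders on numbers and characters, and it ignores the special case of nan. The names `rank` and `compare` are illustrative.

```python
from functools import cmp_to_key


def rank(v):
    """Position of the kind of v in null < false < true < number < string < array < object."""
    if v is None:
        return 0
    if v is False:
        return 1
    if v is True:
        return 2
    if isinstance(v, (int, float)):
        return 3
    if isinstance(v, str):
        return 4
    if isinstance(v, list):
        return 5
    if isinstance(v, dict):
        return 6
    raise TypeError("rank: not a JSON value")


def compare(a, b):
    """Return -1, 0, or 1 if a is smaller than, equal to, or greater than b."""
    ra, rb = rank(a), rank(b)
    if ra != rb:
        return -1 if ra < rb else 1
    if ra <= 2:
        return 0                                 # null, false, true: same kind means equal
    if ra in (3, 4):
        return (a > b) - (a < b)                 # numbers and strings
    if ra == 5:
        for x, y in zip(a, b):                   # arrays: lexicographic order
            c = compare(x, y)
            if c != 0:
                return c
        return (len(a) > len(b)) - (len(a) < len(b))
    ka, kb = sorted(a), sorted(b)                # objects: compare sorted keys first,
    if ka != kb:
        return compare(ka, kb)
    return compare([a[k] for k in ka], [b[k] for k in kb])   # then the values
```

A list of values can then be sorted with `sorted(values, key=cmp_to_key(compare))`.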

5 EVALUATION SEMANTICS
In this section, we will define a function \(\varphi|^c_v\) that returns the output of the filter 𝜑 in
the context 𝑐 on the input value 𝑣.
Let us start with a few definitions. A context 𝑐 is a mapping from variables $𝑥 to values and from
identifiers 𝑥 to pairs (𝑓, 𝑐), where 𝑓 is a filter and 𝑐 is a context. Contexts store what variables
and filter arguments are bound to.
We are now going to introduce a few helper functions. The first function helps define filters
such as if-then-else and alternation (𝑓 ⫽ 𝑔):
\[
\mathrm{ite}(v, i, t, e) = \begin{cases}
t & \text{if } v = i \\
e & \text{otherwise}
\end{cases}
\]
Next, we define a function that is used to define alternation. trues(𝑙) returns those elements of 𝑙
whose boolean values are not false. Note that in our context, “not false” is not the same as “true”,
because the former includes exceptions, whereas the latter excludes them, and bool(𝑥) can return
exceptions, in particular if 𝑥 is an exception.
\[
\mathrm{trues}(l) := \sum_{x \in l, \mathrm{bool}(x) \neq \mathrm{false}} \langle x \rangle
\]

The evaluation semantics are given in Table 5. Let us discuss its different cases:
• “.”: Returns its input value. This is the identity filter.
• 𝑛 or 𝑠: Returns the value corresponding to the number 𝑛 or string 𝑠.
• $𝑥: Returns the value currently bound to the variable $𝑥, by looking it up in the context. Well-
formedness of the filter (as defined in Section 3.1) ensures that such a value always exists.
• [𝑓]: Creates an array from the output of 𝑓, using the operator defined in Section 4.1.
• {}: Creates an empty object.
• {$𝑥 : $𝑦}: Creates an object from the values bound to $𝑥 and $𝑦, using the operator defined in
Section 4.1.
• 𝑓, 𝑔: Concatenates the outputs of 𝑓 and 𝑔.
• 𝑓 | 𝑔: Composes 𝑓 and 𝑔, returning the outputs of 𝑔 applied to all outputs of 𝑓.

⁹Note that jq does not implement a strict total order on values; in particular, its order on (floating-
point) numbers specifies nan < nan, from which follows that nan ≠ nan and nan ≯ nan.

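\[
\begin{array}{ll}
\varphi & \varphi|^c_v \\
\hline
. & \langle v \rangle \\
n \text{ or } s & \langle \varphi \rangle \\
\$x & \langle c(\$x) \rangle \\
[f] & \langle [f|^c_v] \rangle \\
\{\} & \langle \{\} \rangle \\
\{\$x : \$y\} & \langle \{c(\$x) : c(\$y)\} \rangle \\
f, g & f|^c_v + g|^c_v \\
f \mid g & \sum_{x \in f|^c_v} g|^c_x \\
f \mathbin{/\!/} g & \mathrm{ite}(\mathrm{trues}(f|^c_v), \langle\rangle, g|^c_v, \mathrm{trues}(f|^c_v)) \\
f \text{ as } \$x \mid g & \sum_{x \in f|^c_v} g|^{c\{\$x \mapsto x\}}_v \\
\$x \circ \$y & \langle c(\$x) \circ c(\$y) \rangle \\
\text{try } f \text{ catch } g & \sum_{x \in f|^c_v} \begin{cases} g|^c_e & \text{if } x = \mathrm{error}(e) \\ \langle x \rangle & \text{otherwise} \end{cases} \\
\text{label } \$x \mid f & \mathrm{label}(f|^c_v, \$x) \\
\text{break } \$x & \langle \mathrm{break}(\$x) \rangle \\
\$x \text{ and } f & \mathrm{junction}(c(\$x), \mathrm{false}, f|^c_v) \\
\$x \text{ or } f & \mathrm{junction}(c(\$x), \mathrm{true}, f|^c_v) \\
\text{if } \$x \text{ then } f \text{ else } g & \mathrm{ite}(\mathrm{bool}(c(\$x)), \mathrm{true}, f|^c_v, g|^c_v) \\
.[] & v[] \\
.[\$x] & \langle v[c(\$x)] \rangle \\
.[\$x : \$y] & \langle v[c(\$x) : c(\$y)] \rangle \\
\phi\ x \text{ as } \$x(.; f) & \phi^c_v(x|^c_v, \$x, f) \\
x(f_1; \dots; f_n) & f|^{c \cup \bigcup_i \{x_i \mapsto (f_i, c)\}}_v \text{ if } x(x_1; \dots; x_n) := f \\
x & f|^{c'}_v \text{ if } c(x) = (f, c') \\
f \models g & \text{see Section 6}
\end{array}
\]
Table 5: Evaluation semantics.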

• 𝑓 ⫽ 𝑔: Returns 𝑙 if 𝑙 is not empty, else the outputs of 𝑔, where 𝑙 are the outputs of 𝑓 whose
boolean values are not false.
• 𝑓 as $𝑥 | 𝑔: For every output of 𝑓, binds it to the variable $𝑥 and returns the output of 𝑔, where
𝑔 may reference $𝑥. Unlike 𝑓 | 𝑔, this runs 𝑔 with the original input value instead of an output
of 𝑓. We can show that the evaluation of 𝑓 | 𝑔 is equivalent to that of 𝑓 as $𝑥′ | $𝑥′ | 𝑔, where
$𝑥′ is a fresh variable. Therefore, we could be tempted to lower 𝑓 | 𝑔 to ⌊𝑓⌋ as $𝑥′ | $𝑥′ | ⌊𝑔⌋
in Table 2. However, we cannot do this because we will see in Section 6 that this equivalence
does not hold for updates; that is, (𝑓 | 𝑔) ⊧ 𝜎 is not equal to (𝑓 as $𝑥′ | $𝑥′ | 𝑔) ⊧ 𝜎.
• $𝑥 ⚬ $𝑦: Returns the output of a Cartesian operation “⚬” (any of ≟, ≠, <, ≤, >, ≥, +, −, ×, ÷,
and %, as given in Table 1) on the values bound to $𝑥 and $𝑦. The semantics of the arithmetic
operators are given in Section 4.3, the comparison operators are defined by the ordering given
in Section 4.6, 𝑙 ≟ 𝑟 returns whether 𝑙 equals 𝑟, and 𝑙 ≠ 𝑟 returns its negation.
• try 𝑓 catch 𝑔: Replaces all outputs of 𝑓 that equal error(𝑒) for some 𝑒 by the output of 𝑔
on the input 𝑒. Note that this diverges from jq, which aborts the evaluation of 𝑓 after the
first error. This behaviour can be simulated in our semantics, by replacing try 𝑓 catch 𝑔 with
label $𝑥′ | try 𝑓 catch (𝑔, break $𝑥′ ).
• label $𝑥 | 𝑓: Returns all values yielded by 𝑓 until 𝑓 yields an exception break($𝑥). This uses the
function label(𝑙, $𝑥), which returns all elements of 𝑙 until the current element is an exception
of the form break($𝑥):
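\[
\mathrm{label}(l, \$x) := \begin{cases}
\langle h \rangle + \mathrm{label}(t, \$x) & \text{if } l = \langle h \rangle + t \text{ and } h \neq \mathrm{break}(\$x) \\
\langle\rangle & \text{otherwise}
\end{cases}
\]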
• break $𝑥: Returns a value break($𝑥). Similarly to the evaluation of variables $𝑥 described
above, wellformedness of the filter (as defined in Section 3.1) ensures that the returned value
break($𝑥) will be eventually handled by a corresponding filter label $𝑥 | 𝑓. That means that
the evaluation of a wellformed filter can only yield values and errors, but never break($𝑥).
• $𝑥 and 𝑓: Returns false if $𝑥 is bound to either null or false, else returns the output of 𝑓 mapped
to boolean values. This uses the function junction(𝑥, 𝑣, 𝑙), which returns just 𝑣 if the boolean
value of 𝑥 is 𝑣 (where 𝑣 will be true or false), otherwise the boolean values of the values in 𝑙.
Here, bool(𝑣) returns the boolean value as given in Section 4.2.
\[
\mathrm{junction}(x, v, l) := \mathrm{ite}(\mathrm{bool}(x), v, \langle v \rangle, \sum_{y \in l} \langle \mathrm{bool}(y) \rangle)
\]
• $𝑥 or 𝑓: Similar to its “and” counterpart above.
• if $𝑥 then 𝑓 else 𝑔: Returns the output of 𝑔 if $𝑥 is bound to either null or false, else returns
the output of 𝑓.
• .[], .[$𝑥], or .[$𝑥 : $𝑦]: Accesses parts of the input value; see Section 4.4 for the definitions of the
operators.
• 𝜙 𝑥 as $𝑥(.; 𝑓): Folds 𝑓 over the values returned by 𝑥, starting with the current input as
accumulator. The current accumulator value is provided to 𝑓 as input value and 𝑓 can access the
current value of 𝑥 by $𝑥. If 𝜙 = reduce, this returns only the final values of the accumulator,
whereas if 𝜙 = foreach, this also returns the intermediate values of the accumulator. We will
define the functions \(\mathrm{reduce}^c_v(l, \$x, f)\) and \(\mathrm{foreach}^c_v(l, \$x, f)\) in Section 5.1.
• 𝑥(𝑓1 ; …; 𝑓𝑛 ): Calls an 𝑛-ary filter 𝑥 that is defined by 𝑥(𝑥1 ; …; 𝑥𝑛 ) ≔ 𝑓. The output is that of
the filter 𝑓, where each filter argument 𝑥𝑖 is bound to (𝑓𝑖 , 𝑐). This also handles the case of calling
nullary filters such as empty.
• 𝑥: Calls a filter argument. By the well-formedness requirements given in Section 3.1, this must
occur within the right-hand side of a definition whose arguments include 𝑥. This requirement
also ensures that 𝑥 ∈ dom(𝑐), because an 𝑥 can only be evaluated as part of a call to the filter
where it was bound, and by the semantics of filter calls above, this adds a binding for 𝑥 to the
context.
• 𝑓 ⊧ 𝑔: Updates the input at positions returned by 𝑓 by 𝑔. We will discuss this in Section 6.
An implementation may also define custom semantics for named filters. For example, an
implementation may define \(\mathrm{error}|^c_v := \mathrm{error}(v)\), \(\mathrm{keys}|^c_v := \mathrm{keys}(v)\), and
\(\mathrm{length}|^c_v := |v|\), see Section 4.2. In the case of keys, for example, there is no obvious way to
implement it by definition, in particular because there is no simple way to obtain the domain of
an object {…} using only the filters for which we gave semantics in Table 5. For length, we could
give a definition, using reduce .[] as $𝑥(0; . + 1) to obtain the length of arrays and objects, but
this would inherently require linear time to yield a result, instead of the constant time that can
be achieved by a proper jq implementation.
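The helper functions trues and junction used in Table 5, together with the alternation case built on trues, can be sketched in Python as follows. The sketch assumes that streams are Python lists that have already been fully evaluated and that no exceptions occur in them. The names `to_bool`, `trues`, `alternation`, and `junction` are illustrative.

```python
def to_bool(v):
    """Boolean value of v: only null and false are false."""
    return not (v is None or v is False)


def trues(l):
    """Elements of the stream l whose boolean value is not false."""
    return [x for x in l if to_bool(x)]


def alternation(f_out, g_out):
    """Alternation: the true outputs of f if there are any, else the outputs of g."""
    t = trues(f_out)
    return t if t else g_out


def junction(x, v, l):
    """Just v if the boolean value of x is v, else the boolean values of the stream l."""
    if to_bool(x) == v:
        return [v]
    return [to_bool(y) for y in l]
```

With these helpers, "$𝑥 and 𝑓" corresponds to `junction(x, False, f_out)` and "$𝑥 or 𝑓" to `junction(x, True, f_out)`, where `x` is the value bound to $𝑥 and `f_out` is the list of outputs of 𝑓.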

5.1 FOLDING
In this subsection, we will define the functions \(\phi^c_v(l, \$x, f)\) (where 𝜙 is either foreach or
reduce), which underlie the semantics for the folding operators 𝜙 𝑥 as $𝑥(.; 𝑓).
Let us start by defining a general folding function \(\mathrm{fold}^c_v(l, \$x, f, o)\): It takes a stream of
value results 𝑙, a variable $𝑥, a filter 𝑓, and a function 𝑜(𝑥) from a value 𝑥 to a stream of values.
This function folds over the elements in 𝑙, starting from the accumulator value 𝑣. It yields the next
accumulator value(s) by evaluating 𝑓 with the current accumulator value as input and with the
variable $𝑥 bound to the first element in 𝑙. If 𝑙 is empty, then 𝑣 is called a final accumulator value
and is returned, otherwise 𝑣 is called an intermediate accumulator value and 𝑜(𝑣) is returned.

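\[
\mathrm{fold}^c_v(l, \$x, f, o) := \begin{cases}
o(v) + \sum_{x \in f|^{c\{\$x \mapsto h\}}_v} \mathrm{fold}^c_x(t, \$x, f, o) & \text{if } l = \langle h \rangle + t \\
\langle v \rangle & \text{otherwise } (l = \langle\rangle)
\end{cases}
\]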

We use two different functions for 𝑜(𝑣); the first returns nothing, corresponding to reduce which
does not return intermediate values, and the other returns just 𝑣, corresponding to foreach which
returns intermediate values. Instantiating fold with these two functions, we obtain the following:
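\[
\begin{aligned}
\mathrm{reduce}^c_v(l, \$x, f) &:= \mathrm{fold}^c_v(l, \$x, f, o) \text{ where } o(v) = \langle\rangle \\
\mathrm{for}^c_v(l, \$x, f) &:= \mathrm{fold}^c_v(l, \$x, f, o) \text{ where } o(v) = \langle v \rangle
\end{aligned}
\]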

Here, \(\mathrm{reduce}^c_v(l, \$x, f)\) is the function that is used in Table 5. However, \(\mathrm{for}^c_v(l, \$x, f)\)
does not implement the semantics of foreach, because it yields the initial accumulator value,
whereas foreach omits it.
Example 5.1.1 : If we were to set \(\mathrm{foreach}^c_v(l, \$x, f) := \mathrm{for}^c_v(l, \$x, f)\), then evaluating
foreach (1, 2, 3) as $𝑥(0; . + $𝑥) would yield ⟨0, 1, 3, 6⟩, but jq evaluates it to ⟨1, 3, 6⟩.
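For that reason, we define foreach in terms of for, but with a special treatment for the initial
accumulator:
\[
\mathrm{foreach}^c_v(l, \$x, f) := \begin{cases}
\sum_{x \in f|^{c\{\$x \mapsto h\}}_v} \mathrm{for}^c_x(t, \$x, f) & \text{if } l = \langle h \rangle + t \\
\langle\rangle & \text{otherwise}
\end{cases}
\]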

We will now look at what the evaluation of the various folding filters expands to. Apart from
reduce and foreach, we will also consider a hypothetical filter for 𝑥 as $𝑥(.; 𝑓) that is defined by
the function for𝑐𝑣 (𝑙, $𝑥, 𝑓), analogously to the other folding filters.
Assuming that the filter 𝑥 evaluates to ⟨𝑥0 , …, 𝑥𝑛 ⟩, then reduce and for expand to

reduce 𝑥 as $𝑥(.; 𝑓) = 𝑥0 as $𝑥 | 𝑓
                     | …
                     | 𝑥𝑛 as $𝑥 | 𝑓

for 𝑥 as $𝑥(.; 𝑓) = ., (𝑥0 as $𝑥 | 𝑓
                  | …
                  | ., (𝑥𝑛 as $𝑥 | 𝑓)…)

and foreach expands to

foreach 𝑥 as $𝑥(.; 𝑓) = 𝑥0 as $𝑥 | 𝑓
                      | ., (𝑥1 as $𝑥 | 𝑓
                      | …
                      | ., (𝑥𝑛 as $𝑥 | 𝑓)…).

We can see that the special treatment of the initial accumulator value also shows up in the
expansion of foreach. In contrast, the hypothetical for filter looks more symmetrical to reduce.
Note that jq implements only a restricted version of these folding operators that discards all
output values of 𝑓 after the first output. That means that in jq, 𝜙 𝑥 as $𝑥(.; 𝑓) is equivalent to
𝜙 𝑥 as $𝑥(.; first(𝑓)). Here, we assume the definition first(𝑓) ≔ label $𝑥 | 𝑓 | (., break $𝑥). This
returns the first output of 𝑓 if 𝑓 yields any output, else nothing.
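The folding functions of this subsection can be sketched in Python as follows. The sketch assumes that streams are Python lists and that the filter 𝑓 is modelled as a Python function `f(acc, x)` that takes the accumulator and the current element (the value bound to $𝑥) and returns a list of new accumulator values; the context 𝑐 is left implicit. The names `fold`, `reduce_`, `for_`, and `foreach` are illustrative.

```python
def fold(v, l, f, o):
    """General fold over the stream l with accumulator v."""
    if not l:
        return [v]                   # final accumulator value
    h, t = l[0], l[1:]
    out = list(o(v))                 # intermediate accumulator value, filtered by o
    for x in f(v, h):                # every output of f becomes a new accumulator
        out += fold(x, t, f, o)
    return out


def reduce_(v, l, f):
    """reduce: intermediate accumulator values are dropped."""
    return fold(v, l, f, lambda acc: [])


def for_(v, l, f):
    """Hypothetical 'for': intermediate values are kept, including the initial one."""
    return fold(v, l, f, lambda acc: [acc])


def foreach(v, l, f):
    """foreach: like 'for', but without the initial accumulator value."""
    if not l:
        return []
    h, t = l[0], l[1:]
    return [y for x in f(v, h) for y in for_(x, t, f)]
```

With `f = lambda acc, x: [acc + x]`, the call `for_(0, [1, 2, 3], f)` returns `[0, 1, 3, 6]` and `foreach(0, [1, 2, 3], f)` returns `[1, 3, 6]`, as in Example 5.1.1.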

6 UPDATE SEMANTICS
In this section, we will discuss how to evaluate updates 𝑓 ⊧ 𝑔. First, we will show how the original
jq implementation executes such updates, and show which problems this approach entails. Then,
we will give alternative semantics for updates that avoids these problems, while enabling faster
performance by forgoing the construction of temporary path data.

6.1 JQ UPDATES VIA PATHS


jq’s update mechanism works with paths. A path is a sequence of indices 𝑖𝑗 that can be written as
.[𝑖1 ]…[𝑖𝑛 ]. It refers to a value that can be retrieved by the filter “.[𝑖1 ] | … | .[𝑖𝑛 ]”. Note that “.” is
a valid path, referring to the input value.
The update operation “𝑓 ⊧ 𝑔” attempts to first obtain the paths of all values returned by 𝑓, then
for each path, it replaces the value at the path by 𝑔 applied to it. Note that 𝑓 is not allowed to
produce new values; it may only return paths.
Example 6.1.1 : Consider the input value [[1, 2], [3, 4]]. We can retrieve the arrays [1, 2] and [3, 4]
from the input with the filter “.[]”, and we can retrieve the numbers 1, 2, 3, 4 from the input with
the filter “.[] | .[]”. To replace each number with its successor, we run “(.[] | .[]) ⊧ . + 1”, obtaining
[[2, 3], [4, 5]]. Internally, in jq, this first builds the paths .[0][0], .[0][1], .[1][0], .[1][1], then updates
the value at each of these paths with 𝑔.
This approach can yield surprising results when the execution of the filter 𝑔 changes the input
value in a way that the set of paths changes midway. In such cases, only the paths constructed
from the initial input are considered. This can lead to paths pointing to the wrong data, paths
pointing to non-existent data, and missing paths.
Example 6.1.2 : Consider the input value {"𝑎" ↦ {"𝑏" ↦ 1}} and the filter (.[], .[][]) ⊧ 𝑔, where
𝑔 is []. Executing this filter in jq first builds the path .["𝑎"] stemming from “.[]”, then .["𝑎"]["𝑏"]
stemming from “.[][]”. Next, jq folds over the paths, using the input value as initial accumulator
and updating the accumulator at each path with 𝑔. The final output is thus the output of
(.["𝑎"] ⊧ 𝑔) | (.["𝑎"]["𝑏"] ⊧ 𝑔). The output of the first step .["𝑎"] ⊧ 𝑔 is {"𝑎" ↦ []}. This value is the
input to the second step .["𝑎"]["𝑏"] ⊧ 𝑔, which yields an error because we cannot index the array
[] at the path .["𝑎"] by .["𝑏"].

We can also have surprising behaviour that does not manifest any error.
Example 6.1.3 : Consider the same input value and filter as in Example 6.1.2, but now with 𝑔 set to
{"𝑐" : 2}. The output of the first step .["𝑎"] ⊧ 𝑔 is {"𝑎" ↦ {"𝑐" ↦ 2}}. This value is the input to the
second step .["𝑎"]["𝑏"] ⊧ 𝑔, which yields {"𝑎" ↦ {"𝑐" ↦ 2, "𝑏" ↦ {"𝑐" ↦ 2}}}. Here, the remaining
path (.["𝑎"]["𝑏"]) pointed to data that was removed by the update on the first path, so this data
gets reintroduced by the update. On the other hand, the data introduced by the first update step
(at the path .["𝑎"]["𝑐"]) is not part of the original path, so it is not updated.
We found that we can interpret many update filters by simpler filters, yielding the same output
as jq in most common cases, but avoiding the problems shown above. To see this, let us see what
would happen if we interpreted (𝑓1 , 𝑓2 ) ⊧ 𝑔 as (𝑓1 ⊧ 𝑔) | (𝑓2 ⊧ 𝑔). That way, the paths of 𝑓2
would point precisely to the data returned by 𝑓1 ⊧ 𝑔, thus avoiding the problems depicted by the
examples above. In particular, with such an approach, Example 6.1.2 would yield {"𝑎" ↦ []}
instead of an error, and Example 6.1.3 would yield {"𝑎" ↦ {"𝑐" ↦ {"𝑐" ↦ 2}}}.
In the remainder of this section, we will show semantics that extend this idea to all update
operations. The resulting update semantics can be understood to interleave calls to 𝑓 and 𝑔. By doing
so, these semantics can abandon the construction of paths altogether, which results in higher
performance when evaluating updates.
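The difference between the two interpretations of (.[], .[][]) ⊧ 𝑔 in Examples 6.1.2 and 6.1.3 can be reproduced with the following Python sketch. It is a simplified model and not jq's actual implementation: values are native Python values, 𝑔 is a Python function with exactly one output, path collection is hard-coded for this particular filter, and all function names are illustrative.

```python
def set_path(v, path, g):
    """Replace the value at `path` in v by g(old value)."""
    if not path:
        return g(v)
    if v is None:
        v = {}                       # assumption: a missing intermediate value acts as an empty object
    if not isinstance(v, dict):
        raise TypeError("cannot index a non-object with a string key")
    out = dict(v)
    out[path[0]] = set_path(v.get(path[0]), path[1:], g)
    return out


def update_via_paths(v, g):
    """Path-based update: all paths are collected once, from the ORIGINAL input."""
    paths = [[k] for k in v] + [[k, k2] for k in v for k2 in v[k]]
    for p in paths:
        v = set_path(v, p, g)
    return v


def update_values(v, g):
    """.[] ⊧ g for a function g with exactly one output."""
    if isinstance(v, list):
        return [g(x) for x in v]
    if isinstance(v, dict):
        return {k: g(x) for k, x in v.items()}
    raise TypeError("value is neither an array nor an object")


def update_interleaved(v, g):
    """Interpretation as (.[] ⊧ g) | (.[][] ⊧ g), with .[][] ⊧ g read as .[] ⊧ (.[] ⊧ g)."""
    v = update_values(v, g)
    return update_values(v, lambda w: update_values(w, g))
```

For the input `{"a": {"b": 1}}` and 𝑔 returning `{"c": 2}`, `update_via_paths` reintroduces the key `"b"`, whereas `update_interleaved` only touches data that exists after the first update.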

6.2 PROPERTIES OF NEW SEMANTICS


Table 6 gives a few properties that we want to hold for updates 𝜇 ⊧ 𝜎. Let us discuss these for the
different filters 𝜇:
• empty(): Returns the input unchanged.
• “.”: Returns the output of the update filter 𝜎 applied to the current input. Note that while jq only
returns at most one output of 𝜎, these semantics return an arbitrary number of outputs.
• 𝑓 | 𝑔: Updates at 𝑓 with the update of 𝜎 at 𝑔. This allows us to interpret (.[] | .[]) ⊧ 𝜎 in
Example 6.1.1 by .[] ⊧ (.[] ⊧ 𝜎), yielding the same output as in the example.
• 𝑓, 𝑔: Applies the update of 𝜎 at 𝑔 to the output of the update of 𝜎 at 𝑓. We have already seen
this at the end of Section 6.1.
• if $𝑥 then 𝑓 else 𝑔: Applies 𝜎 at 𝑓 if $𝑥 holds, else at 𝑔.
• 𝑓 ⫽ 𝑔: Applies 𝜎 at 𝑓 if 𝑓 yields some output whose boolean value (see Section 4.2) is not false,
else applies 𝜎 at 𝑔. See Section 5.1 for the definition of first.
While Table 6 allows us to define the behaviour of several filters by reducing them to more
primitive filters, there are several filters 𝜇 which cannot be defined this way. We will therefore give
the actual update semantics of 𝜇 ⊧ 𝜎 in Section 6.4 by defining \((\mu \models \sigma)|^c_v\), not by
translating 𝜇 ⊧ 𝜎 to equivalent filters.

𝜇                      𝜇 ⊧ 𝜎
empty()                .
.                      𝜎
𝑓 | 𝑔                  𝑓 ⊧ (𝑔 ⊧ 𝜎)
𝑓, 𝑔                   (𝑓 ⊧ 𝜎) | (𝑔 ⊧ 𝜎)
if $𝑥 then 𝑓 else 𝑔    if $𝑥 then 𝑓 ⊧ 𝜎 else 𝑔 ⊧ 𝜎
𝑓 ⫽ 𝑔                  if first(𝑓 ⫽ null) then 𝑓 ⊧ 𝜎 else 𝑔 ⊧ 𝜎
Table 6: Update semantics properties.

6.3 LIMITING INTERACTIONS


To define (𝜇 ⊧ 𝜎)|𝑐𝑣 , we first have to understand how to prevent unwanted interactions between
𝜇 and 𝜎. In particular, we have to look at variable bindings and error catching.

6.3.1 VARIABLE BINDINGS


We can bind variables in 𝜇; that is, 𝜇 can have the shape 𝑓 as $𝑥 | 𝑔. Here, the intent is that 𝑔
has access to $𝑥, whereas 𝜎 does not! This is to ensure compatibility with jq’s original semantics,
which execute 𝜇 and 𝜎 independently, so 𝜎 should not be able to access variables bound in 𝜇.
Example 6.3.1.1 : Consider the filter 0 as $𝑥 | 𝜇 ⊧ 𝜎, where 𝜇 is (1 as $𝑥 | .[$𝑥]) and 𝜎 is $𝑥. This
updates the input array at index 1. If 𝜎 had access to variables bound in 𝜇, then the array element
would be replaced by 1, because the variable binding 0 as $𝑥 would be shadowed by 1 as $𝑥.
However, in jq, 𝜎 does not have access to variables bound in 𝜇, so the array element is replaced
by 0, which is the value originally bound to $𝑥. Given the input array [1, 2, 3], the filter yields the
final result [1, 0, 3].
We take the following approach to prevent variables bound in 𝜇 from “leaking” into 𝜎: When
evaluating \((\mu \models \sigma)|^c_v\), we want 𝜎 to always be executed with the same 𝑐. That is, evaluating
\((\mu \models \sigma)|^c_v\) should never evaluate 𝜎 with any context other than 𝑐. In order to ensure that,
we will define \((\mu \models \sigma)|^c_v\) not for a filter 𝜎, but for a function 𝜎(𝑥), where 𝜎(𝑥) returns the
output of the filter \(\sigma|^c_x\). This allows us to extend the context 𝑐 with bindings on the left-hand
side of the update, while always executing the update filter 𝜎 with the same original context 𝑐.

6.3.2 ERROR CATCHING


We can catch errors in 𝜇; that is, 𝜇 can have the shape try 𝑓 catch 𝑔. However, this should catch
only errors that occur in 𝜇, not errors that are returned by 𝜎.
Example 6.3.2.1 : Consider the filter 𝜇 ⊧ 𝜎, where 𝜇 is .[]? and 𝜎 is . + 1. The filter 𝜇 is lowered
to the MIR filter try .[] catch empty(). The intention of 𝜇 ⊧ 𝜎 is to update all elements .[] of the
input value, and if .[] returns an error (which occurs when the input is neither an array nor an
object, see Section 4.4), to just return the input value unchanged. When we run 𝜇 ⊧ 𝜎 with the
input 0, the filter .[] fails with an error, but because the error is caught immediately afterwards,
𝜇 ⊧ 𝜎 consequently just returns the original input value 0. The interesting part is what happens
when 𝜎 throws an error: This occurs for example when running the filter with the input [{}]. This
would run . + 1 with the input {}, which yields an error (see Section 4.3). This error is returned
by 𝜇 ⊧ 𝜎.
This raises the question: How can we execute (try 𝑓 catch 𝑔) ⊧ 𝜎 and distinguish errors stem-
ming from 𝑓 from errors stemming from 𝜎?
We came up with the solution of polarised exceptions. In a nutshell, we want every exception
that is returned by 𝜎 to be marked in a special way such that it can be ignored by a try-catch
in 𝜇. For this, we assume the existence of two functions polarise(𝑥) and depolarise(𝑥) from a
value result 𝑥 to a value result. If 𝑥 is an exception, then polarise(𝑥) should return a polarised
version of it, whereas depolarise(𝑥) should return an unpolarised version of it, i.e. it should remove
any polarisation from an exception. Every exception created by error(𝑒) is unpolarised. With this
method, when we evaluate an expression try 𝑓 catch 𝑔 in 𝜇, we can analyse the output of 𝑓 ⊧ 𝜎,
and only catch unpolarised errors. That way, errors stemming from 𝜎 are propagated, whereas
errors stemming from 𝑓 are caught.

6.4 NEW SEMANTICS


We will now give semantics that define the output of \((f \models g)|^c_v\) as referred to in Section 5.
We will first combine the techniques in Section 6.3 to define \((f \models g)|^c_v\) for two filters 𝑓 and 𝑔 by
\((f \models \sigma)|^c_v\), where 𝜎 now is a function from a value to a stream of value results:
\[
(f \models g)|^c_v := \sum_{y \in (f \models \sigma)|^c_v} \mathrm{depolarise}(y), \text{ where } \sigma(x) = \sum_{y \in g|^c_x} \mathrm{polarise}(y).
\]

We use a function instead of a filter on the right-hand side to limit the scope of variable bindings
as explained in Section 6.3.1, and we use polarise to restrict the scope of caught exceptions, as
discussed in Section 6.3.2. Note that we depolarise the final outputs of 𝑓 ⊧ 𝑔 in order to prevent
leaking polarisation information outside the update.
Table 7 shows the definition of \((\mu \models \sigma)|^c_v\). Several of the cases for 𝜇, like “.”, “𝑓 | 𝑔”, “𝑓, 𝑔”,
and “if $𝑥 then 𝑓 else 𝑔” are relatively straightforward consequences of the properties in
Table 6. We discuss the remaining cases for 𝜇:
• 𝑓 ⫽ 𝑔: Updates using 𝑓 if 𝑓 yields some non-false value, else updates using 𝑔. Here, 𝑓 is called
as a “probe” first. If it yields at least one output that is considered “true” (see Section 5 for the
definition of trues), then we update at 𝑓, else at 𝑔. This filter is unusual because it is the only kind
where a subexpression is both updated with (\((f \models \sigma)|^c_v\)) and evaluated (\(f|^c_v\)).
• .[], .[$𝑥], .[$𝑥 : $𝑦]: Applies 𝜎 to the current value using the operators defined in Section 4.5.

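\[
\begin{array}{ll}
\mu & (\mu \models \sigma)|^c_v \\
\hline
. & \sigma(v) \\
f \mid g & (f \models \sigma')|^c_v \text{ where } \sigma'(x) = (g \models \sigma)|^c_x \\
f, g & \sum_{x \in (f \models \sigma)|^c_v} (g \models \sigma)|^c_x \\
f \mathbin{/\!/} g & \mathrm{ite}(\mathrm{trues}(f|^c_v), \langle\rangle, (g \models \sigma)|^c_v, (f \models \sigma)|^c_v) \\
.[] & \langle v[] \models \sigma(v) \rangle \\
.[\$x] & \langle v[c(\$x)] \models \sigma(v) \rangle \\
.[\$x : \$y] & \langle v[c(\$x) : c(\$y)] \models \sigma(v) \rangle \\
f \text{ as } \$x \mid g & \mathrm{reduce}^c_v(f|^c_v, \$x, (g \models \sigma)) \\
\text{if } \$x \text{ then } f \text{ else } g & \mathrm{ite}(\mathrm{bool}(c(\$x)), \mathrm{true}, (f \models \sigma)|^c_v, (g \models \sigma)|^c_v) \\
\text{try } f \text{ catch } g & \sum_{x \in (f \models \sigma)|^c_v} \mathrm{catch}(x, g, c, v) \\
\text{break } \$x & \langle \mathrm{break}(\$x) \rangle \\
\phi\ x \text{ as } \$x(.; f) & \phi^c_v(x|^c_v, \$x, f, \sigma) \\
x(f_1; \dots; f_n) & (f \models \sigma)|^{c \cup \bigcup_i \{x_i \mapsto (f_i, c)\}}_v \text{ if } x(x_1; \dots; x_n) := f \\
x & (f \models \sigma)|^{c'}_v \text{ if } c(x) = (f, c')
\end{array}
\]
Table 7: Update semantics. Here, 𝜇 is a filter and 𝜎(𝑣) is a function from a value 𝑣 to a stream of
value results.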

• 𝑓 as $𝑥 | 𝑔: Folds over all outputs of 𝑓, using the input value 𝑣 as initial accumulator and
updating the accumulator by 𝑔 ⊧ 𝜎, where $𝑥 is bound to the current output of 𝑓. The definition
of reduce is given in Section 5.1.
• try 𝑓 catch 𝑔: Returns the output of 𝑓 ⊧ 𝜎, mapping errors occurring in 𝑓 to 𝑔. The definition
of the function catch is
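\[
\mathrm{catch}(x, g, c, v) := \begin{cases}
\sum_{y \in g|^c_e} \langle \mathrm{error}(y) \rangle & \text{if } x = \mathrm{error}(e), x \text{ is unpolarised, and } g|^c_e \neq \langle\rangle \\
\langle v \rangle & \text{if } x = \mathrm{error}(e), x \text{ is unpolarised, and } g|^c_e = \langle\rangle \\
\langle x \rangle & \text{otherwise}
\end{cases}
\]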

The function catch(𝑥, 𝑔, 𝑐, 𝑣) analyses 𝑥 (the current output of 𝑓 ⊧ 𝜎): If 𝑥 is not an unpolarised
error, 𝑥 is returned. For example, that is the case if the original right-hand side of the update
returns an error, in which case we do not want this error to be caught here. However, if 𝑥 is an
unpolarised error error(𝑒), that is, an error that was caused on the left-hand side of the update,
it has to be caught here. In that case, catch analyses the output of 𝑔 with input 𝑒: If 𝑔 yields no
output, then it returns the original input value 𝑣, and if 𝑔 yields output, all its output is mapped
to errors! This behaviour might seem peculiar, but it makes sense when we consider the jq way of
implementing updates via paths: When evaluating some update 𝜇 ⊧ 𝜎 with an input value 𝑣, the
filter 𝜇 may only return paths to data contained within 𝑣. When 𝜇 is try 𝑓 catch 𝑔, the filter 𝑔 only
receives inputs that stem from errors, and because 𝑣 cannot contain errors, these inputs cannot
be contained in 𝑣. Consequently, 𝑔 can never return any path pointing to 𝑣. The only way,
therefore, to get out alive from a catch is for 𝑔 to return … nothing.
• break($𝑥): Breaks out from the update.¹⁰
• 𝜙 𝑥 as $𝑥(.; 𝑓): Folds 𝑓 over the values returned by 𝑥. We will discuss this in Section 6.5.
• 𝑥(𝑓1 ; …; 𝑓𝑛 ), 𝑥: Calls filters. This is defined analogously to Table 5.
There are many filters 𝜇 for which (𝜇 ⊧ 𝜎)|𝑐𝑣 is not defined, for example $𝑥, [𝑓], and {}. In such
cases, we assume that (𝜇 ⊧ 𝜎)|𝑐𝑣 returns an error just like jq, because these filters do not return
paths to their input data. Our semantics support all kinds of filters 𝜇 that are supported by jq,
except for label $𝑥 | 𝑔.
Example 6.4.1 (The Curious Case of Alternation): The semantics of (𝑓 ⫽ 𝑔) ⊧ 𝜎 can be rather
surprising: For the input {"𝑎" ↦ true}, the filter (.["𝑎"] ⫽ .["𝑏"]) ⊧ 1 yields {"𝑎" ↦ 1}. This is what
we might expect, because the input has an entry for "𝑎". Now let us evaluate the same filter on
the input {"𝑎" ↦ false}, which yields {"𝑎" ↦ false, "𝑏" ↦ 1}. Here, while the input still has an
entry for "𝑎" like above, its boolean value is not true, so .["𝑏"] ⊧ 1 is executed. In the same spirit,
for the input {} the filter yields {"𝑏" ↦ 1}, because .["𝑎"] yields null for the input, which also has
the boolean value false, therefore .["𝑏"] ⊧ 1 is executed.
For the input {}, the filter (false ⫽ .["𝑏"]) ⊧ 1 yields {"𝑏" ↦ 1}. This is remarkable insofar as
false is not a valid path expression because it returns a value that does not refer to any part of
the original input, yet the filter does not return an error. This is because false triggers .["𝑏"] ⊧ 1,

¹⁰Note that unlike in Section 5, we do not define the update semantics of label $𝑥 | 𝑓, which could
be used to resume an update after a break. The reason for this is that this requires an additional type of
break exceptions that carries the current value alongside the variable, as well as variants of the value
update operators in Section 4.5 that can handle unpolarised breaks. Because making update operators
handle unpolarised breaks renders them considerably more complex and we estimate that label
expressions are rarely used in the left-hand side of updates anyway, we think it more beneficial for the
presentation to forgo label expressions here.
24 Färber

so false is never used as path expression. However, running the filter (true ⫽ .["𝑏"]) ⊧ 1 does yield
an error, because true triggers true ⊧ 1, and true is not a valid path expression.
Finally, on the input [], the filter (.[] ⫽ error) ⊧ 1 yields error([]). That is because .[] does not
yield any value for the input, so error ⊧ 1 is executed, which yields an error.
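The behaviour described in this example can be reproduced with a small Python model. This is only a sketch under simplifying assumptions: the class names `Index`, `Const`, `Alt` and `PathError` are hypothetical, objects are modelled as Python dicts, null as `None`, and the update filter 𝜎 is simplified to a function that returns exactly one value.

```python
class PathError(Exception):
    """Raised when a filter that is not a path expression is updated."""


class Index:
    """Models the path filter .[key] on objects (Python dicts)."""

    def __init__(self, key):
        self.key = key

    def run(self, v):
        # A missing key yields null (None), as in the example.
        return [v.get(self.key)]

    def update(self, v, sigma):
        # Assumption: sigma returns exactly one value.
        return {**v, self.key: sigma(v.get(self.key))}


class Const:
    """Models a constant filter such as true or false; it has no path."""

    def __init__(self, value):
        self.value = value

    def run(self, v):
        return [self.value]

    def update(self, v, sigma):
        raise PathError(f"{self.value!r} is not a path expression")


def is_truthy(y):
    # Only null and false have the boolean value false.
    return y is not None and y is not False


class Alt:
    """Models alternation between two filters f and g."""

    def __init__(self, f, g):
        self.f, self.g = f, g

    def run(self, v):
        truthy = [y for y in self.f.run(v) if is_truthy(y)]
        return truthy or self.g.run(v)

    def update(self, v, sigma):
        # Update through f only if f yields some truthy output; else through g.
        chosen = self.f if any(is_truthy(y) for y in self.f.run(v)) else self.g
        return chosen.update(v, sigma)
```

In this model, the update never inspects `Const(False)` as a path, which is why the second case of the example does not fail, whereas `Const(True)` is selected for updating and therefore raises an error.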

6.5 FOLDING
In Section 5.1, we have seen how to evaluate folding filters of the shape 𝜙 𝑥 as $𝑥(.; 𝑓), where 𝜙
is either reduce or foreach. Here, we will define update semantics for these filters. These update
operations are not supported in jq 1.7; however, we will show that they arise quite naturally from
previous definitions.
Let us start with an example to understand folding on the left-hand side of an update.
Example 6.5.1 : Let 𝑣 = [[[2], 1], 0] be our input value and 𝜇 be the filter 𝜙(0, 0) as $𝑥(.; .[$𝑥]). The
regular evaluation of 𝜇 with the input value as described in Section 5 yields
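\[
\mu|_v = \begin{cases}
\langle [2] \rangle & \text{if } \phi = \text{reduce} \\
\langle v, [[2], 1], [2] \rangle & \text{if } \phi = \text{for} \\
\langle [[2], 1], [2] \rangle & \text{if } \phi = \text{foreach}
\end{cases}
\]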

When 𝜙 = for, the paths corresponding to the output are ., .[0], and .[0][0]; when 𝜙 = foreach,
they are .[0] and .[0][0]; and when 𝜙 = reduce, the only path is .[0][0]. Given that all outputs
have corresponding paths, we can update over them. For example, taking . + [3] as filter 𝜎, we
should obtain the output
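\[
(\mu \models \sigma)|_v = \begin{cases}
\langle [[[2, 3], 1], 0] \rangle & \text{if } \phi = \text{reduce} \\
\langle [[[2, 3], 1, 3], 0, 3] \rangle & \text{if } \phi = \text{for} \\
\langle [[[2, 3], 1, 3], 0] \rangle & \text{if } \phi = \text{foreach}
\end{cases}
\]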
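The regular evaluation of the three folds in this example can be traced with a short Python sketch. It assumes that the folded filter yields exactly one output per step; the function name `fold_outputs` and the modelling of 𝑓 as a two-argument Python function are illustrative choices, not part of the specification.

```python
def fold_outputs(kind, xs, f, v):
    """Outputs of `kind xs as $x (.; f)` on input v, assuming f yields one output."""
    acc = v
    trace = [v]  # trace[i] is the accumulator after i folding steps
    for x in xs:
        acc = f(x, acc)
        trace.append(acc)
    if kind == "reduce":
        return [trace[-1]]  # only the final accumulator
    if kind == "foreach":
        return trace[1:]    # every intermediate accumulator, without the input
    if kind == "for":
        return trace        # the input followed by every intermediate accumulator
    raise ValueError(f"unknown fold kind: {kind}")


v = [[[2], 1], 0]
index = lambda x, acc: acc[x]  # models the filter .[$x]

for kind in ("reduce", "for", "foreach"):
    print(kind, fold_outputs(kind, [0, 0], index, v))
```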

First, note that for folding filters, the lowering in Table 2 and the defining equations in Section 5.1
only make use of filters for which we have already introduced update semantics in Table 7. This
should not be taken for granted; for example, we originally lowered 𝜙 𝑓𝑥 as $𝑥(𝑓𝑦 ; 𝑓) to
⌊𝑓𝑦 ⌋ as $𝑦 | 𝜙⌊𝑓𝑥 ⌋ as $𝑥($𝑦; ⌊𝑓⌋)

instead of the more complicated lowering found in Table 2, namely


. as $𝑥′ | ⌊𝑓𝑦 ⌋ | 𝜙⌊$𝑥′ | 𝑓𝑥 ⌋ as $𝑥(.; ⌊𝑓⌋).

While both lowerings produce the same output for regular evaluation, we cannot use the original
lowering for updates, because the defining equations for 𝜙 𝑥 as $𝑥($𝑦; 𝑓) would have the shape
$𝑦 | …, which is undefined on the left-hand side of an update. However, the lowering in Table 2
avoids this issue by not binding the output of 𝑓𝑦 to a variable, so it can be used on the left-hand
side of updates.
To obtain an intuition about what the update evaluation of a fold looks like, we can take
𝜙 𝑥 as $𝑥(.; 𝑓) ⊧ 𝜎, substitute the left-hand side by the defining equations in Section 5.1 and
expand everything using the properties in Section 6.2. This yields
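\[
\begin{aligned}
\text{reduce } x \text{ as } \$x(.; f) \models \sigma &= ((x_0 \text{ as } \$x \mid f) \models \dots \models ((x_n \text{ as } \$x \mid f) \models \sigma)\dots) \\
\text{for } x \text{ as } \$x(.; f) \models \sigma &= \sigma \mid ((x_0 \text{ as } \$x \mid f) \models \dots \models \sigma \mid ((x_n \text{ as } \$x \mid f) \models \sigma)\dots)
\end{aligned}
\]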

and foreach steps out of line again by not applying 𝜎 initially:


\[
\text{foreach } x \text{ as } \$x(.; f) \models \sigma = ((x_0 \text{ as } \$x \mid f) \models \sigma \mid ((x_1 \text{ as } \$x \mid f) \models \dots \models \sigma \mid ((x_n \text{ as } \$x \mid f) \models \sigma)\dots).
\]

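Example 6.5.2: To see the effect of the above equations, let us reconsider the input value and the
filters from Example 6.5.1. Using some liberty to write .[0] instead of 0 as $𝑥 | .[$𝑥], we have:
\[
\mu \models \sigma = \begin{cases}
.[0] \models .[0] \models \sigma & \text{if } \phi = \text{reduce} \\
\sigma \mid (.[0] \models \sigma \mid (.[0] \models \sigma)) & \text{if } \phi = \text{for} \\
.[0] \models \sigma \mid (.[0] \models \sigma) & \text{if } \phi = \text{foreach}
\end{cases}
\]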

We will now formally define the functions 𝜙𝑐𝑣 (𝑙, $𝑥, 𝑓, 𝜎) used in Table 7. For this, we first intro-
duce a function fold𝑐𝑣 (𝑙, $𝑥, 𝑓, 𝜎, 𝑜), which resembles its corresponding function in Section 5.1,
but which adds an argument for the update filter 𝜎:

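\[
\operatorname{fold}^c_v(l, \$x, f, \sigma, o) := \begin{cases}
\sum_{y \in o(v)} (f \models \sigma')|^{c\{\$x \mapsto h\}}_y & \text{if } l = \langle h \rangle + t \text{ and } \sigma'(x) = \operatorname{fold}^c_x(t, \$x, f, \sigma, o) \\
\sigma(v) & \text{otherwise } (l = \langle\rangle)
\end{cases}
\]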

Using this function, we can now define


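\[
\begin{aligned}
\operatorname{reduce}^c_v(l, \$x, f, \sigma) &:= \operatorname{fold}^c_v(l, \$x, f, \sigma, o) \quad \text{where } o(v) = \langle v \rangle \\
\operatorname{for}^c_v(l, \$x, f, \sigma) &:= \operatorname{fold}^c_v(l, \$x, f, \sigma, o) \quad \text{where } o(v) = \sigma(v)
\end{aligned}
\]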

as well as
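\[
\operatorname{foreach}^c_v(l, \$x, f, \sigma) := \begin{cases}
(f \models \sigma')|^{c\{\$x \mapsto h\}}_v & \text{if } l = \langle h \rangle + t \text{ and } \sigma'(x) = \operatorname{for}^c_x(t, \$x, f, \sigma) \\
\langle v \rangle & \text{otherwise}
\end{cases}
\]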
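The definitions of fold, reduce, for and foreach on the left-hand side of an update can be transcribed into Python fairly directly. The following is a minimal sketch, not a reference implementation: streams are modelled as Python lists, the context 𝑐 is reduced to the single binding $𝑥 ↦ ℎ that is passed as an argument, and `f_update(h, y, sigma_p)` is a hypothetical stand-in for (𝑓 ⊧ 𝜎′) evaluated on 𝑦. The helper `index_update` models .[$𝑥] ⊧ 𝜎′ on arrays and assumes that 𝜎′ yields at least one output, of which only the first is kept.

```python
def fold_update(v, l, f_update, sigma, o):
    """Transcription of fold: l is the list of folded values, o selects the inputs."""
    if not l:
        return sigma(v)
    h, t = l[0], l[1:]
    # sigma' continues the fold on the remaining values t.
    sigma_p = lambda x: fold_update(x, t, f_update, sigma, o)
    return [out for y in o(v) for out in f_update(h, y, sigma_p)]


def reduce_update(v, l, f_update, sigma):
    return fold_update(v, l, f_update, sigma, lambda v: [v])


def for_update(v, l, f_update, sigma):
    return fold_update(v, l, f_update, sigma, sigma)


def foreach_update(v, l, f_update, sigma):
    if not l:
        return [v]
    h, t = l[0], l[1:]
    # Unlike for, foreach does not apply sigma to the initial input.
    return f_update(h, v, lambda x: for_update(x, t, f_update, sigma))


def index_update(h, v, sigma_p):
    """Models .[$x] updated with sigma' on an array v, with $x bound to h."""
    new = list(v)
    new[h] = sigma_p(v[h])[0]  # assumption: keep only the first output
    return [new]


v = [[[2], 1], 0]
sigma = lambda x: [x + [3]]  # models the filter . + [3]
print(reduce_update(v, [0, 0], index_update, sigma))
print(for_update(v, [0, 0], index_update, sigma))
print(foreach_update(v, [0, 0], index_update, sigma))
```

Running the three calls on the input of Example 6.5.1 reproduces the three updated values given there, which is a useful sanity check on the recursion: for applies 𝜎 before descending at every level, reduce applies it only at the innermost level, and foreach skips the application at the outermost level.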

7 EQUATIONAL REASONING SHOWCASE: OBJECT CONSTRUCTION


We will now show how to prove properties about HIR filters by equational reasoning. For this,
we use the lowering in Section 3.2 and the semantics defined in Section 5. As an example, we will
show a few properties of object construction.
Let us start by proving a few helper lemmas, where 𝑐 and 𝑣 always denote some arbitrary con-
text and value, respectively.
Lemma 7.1: For any HIR filters 𝑓 and 𝑔 and any Cartesian operator ⚬ (such as addition, see Table 1),
we have
\[
\lfloor f \circ g \rfloor|^c_v = \sum_{x \in \lfloor f \rfloor|^c_v} \sum_{y \in \lfloor g \rfloor|^c_v} \langle x \circ y \rangle.
\]

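Proof: The lowering in Table 2 yields
\[
\lfloor f \circ g \rfloor|^c_v = (\lfloor f \rfloor \text{ as } \$x' \mid \lfloor g \rfloor \text{ as } \$y' \mid \$x' \circ \$y')|^c_v.
\]
Using the evaluation semantics in Table 5, we can further expand this to
\[
\sum_{x \in \lfloor f \rfloor|^c_v} \sum_{y \in \lfloor g \rfloor|^{c\{\$x' \mapsto x\}}_v} (\$x' \circ \$y')|^{c\{\$x' \mapsto x, \$y' \mapsto y\}}_v.
\]
Because $𝑥′ and $𝑦′ are fresh variables, we know that they cannot occur in ⌊𝑔⌋, so
\[
\lfloor g \rfloor|^{c\{\$x' \mapsto x\}}_v = \lfloor g \rfloor|^c_v.
\]
Furthermore, by the evaluation semantics, we have
\[
(\$x' \circ \$y')|^{c\{\$x' \mapsto x, \$y' \mapsto y\}}_v = \langle x \circ y \rangle.
\]
From these two observations, the conclusion immediately follows. □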
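Lemma 7.1 can be read as a nested loop over the outputs of the two operands. The following Python sketch illustrates this reading under the assumption that a filter is modelled as a function from an input value to a list of outputs; the name `cartesian` is hypothetical, and the output order shown is the one stated by the lemma.

```python
import operator


def cartesian(op, f, g):
    """Combine filters f and g with a binary operator, following Lemma 7.1."""
    def run(v):
        # The outer sum ranges over the outputs of f, the inner sum over those of g.
        return [op(x, y) for x in f(v) for y in g(v)]
    return run


one_two = lambda v: [1, 2]      # models the filter (1, 2)
ten_twenty = lambda v: [10, 20]  # models the filter (10, 20)
print(cartesian(operator.add, one_two, ten_twenty)(None))
```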
Lemma 7.2: For any HIR filters 𝑓 and 𝑔, we have
\[
\lfloor \{f : g\} \rfloor|^c_v = \sum_{x \in \lfloor f \rfloor|^c_v} \sum_{y \in \lfloor g \rfloor|^c_v} \langle \{x : y\} \rangle.
\]

Proof: Analogously to the proof of Lemma 7.1. □


We can now proceed by stating a central property of object construction.
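Theorem 7.3: For any 𝑛 ∈ ℕ with 𝑛 > 0, we have that ⌊{𝑘1 : 𝑣1 , …, 𝑘𝑛 : 𝑣𝑛 }⌋|𝑐𝑣 is equivalent to
\[
\sum_{k_1 \in \lfloor k_1 \rfloor|^c_v} \sum_{v_1 \in \lfloor v_1 \rfloor|^c_v} \dots \sum_{k_n \in \lfloor k_n \rfloor|^c_v} \sum_{v_n \in \lfloor v_n \rfloor|^c_v} \left\langle \sum_i \{k_i : v_i\} \right\rangle.
\]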

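Proof: We proceed by induction on 𝑛. The base case 𝑛 = 1 directly follows from Lemma 7.2. For
the induction step, we have to show that ⌊{𝑘1 : 𝑣1 , …, 𝑘𝑛+1 : 𝑣𝑛+1 }⌋|𝑐𝑣 is equivalent to
\[
\sum_{k_1 \in \lfloor k_1 \rfloor|^c_v} \sum_{v_1 \in \lfloor v_1 \rfloor|^c_v} \dots \sum_{k_{n+1} \in \lfloor k_{n+1} \rfloor|^c_v} \sum_{v_{n+1} \in \lfloor v_{n+1} \rfloor|^c_v} \left\langle \sum_i^{n+1} \{k_i : v_i\} \right\rangle.
\]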

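We start by
\[
\begin{aligned}
\lfloor \{k_1 : v_1, \dots, k_{n+1} : v_{n+1}\} \rfloor|^c_v
&= \left\lfloor \sum_i \{k_i : v_i\} \right\rfloor\Big|^c_v && \text{(lowering)} \\
&= \left\lfloor \sum_{i=1}^{n} \{k_i : v_i\} + \{k_{n+1} : v_{n+1}\} \right\rfloor\Big|^c_v \\
&= \sum_{x \in \lfloor \sum_{i=1}^{n} \{k_i : v_i\} \rfloor|^c_v} \; \sum_{y \in \lfloor \{k_{n+1} : v_{n+1}\} \rfloor|^c_v} \langle x + y \rangle. && \text{(Lemma 7.1)}
\end{aligned}
\]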

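Here, we observe that ⌊∑𝑛𝑖=1 {𝑘𝑖 : 𝑣𝑖 }⌋|𝑐𝑣 = ⌊{𝑘1 : 𝑣1 , …, 𝑘𝑛 : 𝑣𝑛 }⌋|𝑐𝑣 , which by the induction
hypothesis equals
\[
\sum_{k_1 \in \lfloor k_1 \rfloor|^c_v} \sum_{v_1 \in \lfloor v_1 \rfloor|^c_v} \dots \sum_{k_n \in \lfloor k_n \rfloor|^c_v} \sum_{v_n \in \lfloor v_n \rfloor|^c_v} \left\langle \sum_i^{n} \{k_i : v_i\} \right\rangle.
\]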

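We can use this to resume the simplification of ⌊{𝑘1 : 𝑣1 , …, 𝑘𝑛+1 : 𝑣𝑛+1 }⌋|𝑐𝑣 to
\[
\sum_{k_1 \in \lfloor k_1 \rfloor|^c_v} \sum_{v_1 \in \lfloor v_1 \rfloor|^c_v} \dots \sum_{k_n \in \lfloor k_n \rfloor|^c_v} \sum_{v_n \in \lfloor v_n \rfloor|^c_v} \; \sum_{y \in \lfloor \{k_{n+1} : v_{n+1}\} \rfloor|^c_v} \left\langle \sum_i^{n} \{k_i : v_i\} + y \right\rangle
\]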

Finally, applying Lemma 7.2 to ⌊{𝑘𝑛+1 : 𝑣𝑛+1 }⌋|𝑐𝑣 proves the induction step. □
We can use this theorem to simplify the evaluation of filters such as the following one.
Example 7.1: The evaluation of {"𝑎" : (1, 2), ("𝑏", "𝑐") : 3, "𝑑" : 4} yields ⟨𝑣0 , 𝑣1 , 𝑣2 , 𝑣3 ⟩, where
𝑣0 = {"𝑎" ↦ 1, "𝑏" ↦ 3, "𝑑" ↦ 4},
𝑣1 = {"𝑎" ↦ 1, "𝑐" ↦ 3, "𝑑" ↦ 4},
𝑣2 = {"𝑎" ↦ 2, "𝑏" ↦ 3, "𝑑" ↦ 4},
𝑣3 = {"𝑎" ↦ 2, "𝑐" ↦ 3, "𝑑" ↦ 4}.
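The enumeration order given by Theorem 7.3 can be checked on this example with a short Python sketch. It assumes that key and value filters are modelled as functions from the input to a list of outputs and that object addition is modelled as a dictionary merge in which later entries win; `construct_object` and `const` are hypothetical helper names.

```python
from itertools import product


def const(*outputs):
    """Models a filter that ignores its input and yields the given outputs."""
    return lambda v: list(outputs)


def construct_object(entries, v):
    """Evaluate {k1: v1, ..., kn: vn} on input v following Theorem 7.3."""
    streams = []
    for key_filter, value_filter in entries:
        streams.append(key_filter(v))
        streams.append(value_filter(v))
    results = []
    # product varies the leftmost stream slowest, like the outermost sum.
    for combo in product(*streams):
        obj = {}
        for key, value in zip(combo[0::2], combo[1::2]):
            obj[key] = value  # models the sum of the singleton objects
        results.append(obj)
    return results


entries = [
    (const("a"), const(1, 2)),
    (const("b", "c"), const(3)),
    (const("d"), const(4)),
]
for obj in construct_object(entries, None):
    print(obj)
```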

8 CONCLUSION
We have shown formal syntax and semantics of a large subset of the jq programming language.
On the syntax side, we first defined formal syntax (HIR) that closely corresponds to actual jq
syntax. We then gave a lowering that reduces HIR to a simpler subset (MIR), in order to simplify
the semantics later. We finally showed how a subset of actual jq syntax can be translated into HIR
and thus MIR.
On the semantics side, we gave formal semantics based on MIR. First, we defined values and
basic operations on them. Then, we used this to define the semantics of jq programs, by specifying
the outcome of the execution of a jq program. A large part of this was dedicated to the evaluation
of updates: In particular, we showed a new approach to evaluate updates. This approach, unlike
the approach implemented in jq, does not depend on separating path building and updating, but
interweaves them. This allows update operations to cleanly handle multiple output values in cases
where this was not possible before. Furthermore, in practice, this avoids creating temporary data
to store paths, thus improving performance. This approach is also mostly compatible with the
original jq behaviour, yet it is unavoidable that it diverges in some corner cases.
We hope that our work is useful in several ways: For users of the jq programming language, it
provides a succinct reference that precisely documents the language. Our work should also ben-
efit implementers of tools that process jq programs, such as compilers, interpreters, or linters. In
particular, this specification should be sufficient to implement the core of a jq compiler or inter-
preter. Finally, our work enables equational reasoning about jq programs. This makes it possible to
prove correctness of jq programs or to implement provably correct optimisations in jq compilers/
interpreters.

BIBLIOGRAPHY
[1] D. M. Ritchie, “The UNIX system: The evolution of the UNIX time-sharing system”, AT&T Bell
Lab. Tech. J., vol. 63, no. 8, pp. 1577–1593, 1984, doi: 10.1002/j.1538-7305.1984.tb00054.x.
[2] T. Bray, “The JavaScript Object Notation (JSON) Data Interchange Format”. Accessed: Feb. 22,
2023. [Online]. Available: https://www.rfc-editor.org/info/rfc8259
[3] Paris Data, “Dénominations des emprises des voies actuelles”. Accessed: Feb. 22,
2023. [Online]. Available: https://opendata.paris.fr/explore/dataset/denominations-emprises-voies-actuelles/
[4] N. Williams and jqlang contributors, “jq language description”. Accessed: Feb. 20, 2023.
[Online]. Available: https://github.com/jqlang/jq/wiki/jq-Language-Description
[5] S. Dolan and jqlang contributors, “jq 1.7 manual”. Accessed: Feb. 20, 2023. [Online]. Available:
https://jqlang.github.io/jq/manual/v1.7/
[6] J. N. Foster, A. Pilkiewicz, and B. C. Pierce, “Quotient lenses”, in Proceeding of the 13th
ACM SIGPLAN international conference on Functional programming, ICFP 2008, Victoria, BC,
Canada, September 20-28, 2008, J. Hook and P. Thiemann, Eds., ACM, 2008, pp. 383–396. doi:
10.1145/1411204.1411257.
[7] J. N. Foster, M. B. Greenwald, J. T. Moore, B. C. Pierce, and A. Schmitt, “Combinators for
bi-directional tree transformations: a linguistic approach to the view update problem”, in
Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,
POPL 2005, Long Beach, California, USA, January 12-14, 2005, J. Palsberg and M. Abadi, Eds.,
ACM, 2005, pp. 233–246. doi: 10.1145/1040305.1040325.
[8] M. Pickering, J. Gibbons, and N. Wu, “Profunctor Optics: Modular Data Accessors”, Art Sci.
Eng. Program., vol. 1, no. 2, p. 7, 2017, doi: 10.22152/programming-journal.org/2017/1/7.
