DRLT Correction Note
September 2024
to be part of the pool (as defined in A) is included. These two cases lead to the following changes in the ith row of Â; in both cases the choice of j ∈ [p] is made uniformly at random: Case 1: Âij = −1 but Aij = 1; Case 2: Âij = 1 but Aij = −1.
the bit-flip is found. Note that, as per SSM, a given row can contain a bit-flip in at most one entry. The procedure for correcting bit-flips under ASM is quite similar to the one in Alg. 1. Here, instead of toggling Aij in Step 2 of Alg. 1, we do the following: if Ai,j ≠ Ai,j′, where j′ = mod(j + 1, p), then swap the values of Ai,j and Ai,j′. The remaining steps are exactly as in Alg. 1. For RSM, we do the following: in any row of A, we check all pairs of unequal entries Ai,j1 and Ai,j2 with j1 ≠ j2 and swap their values. The rest of the steps are exactly as in Alg. 1. Detailed versions of these modifications for ASM and RSM are given in [1]. The matrix obtained from the correction algorithm can then be used to re-estimate β∗ using Lasso.
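The three repair moves described above can be sketched as follows. This is our own illustration, not the paper's code: the function names are ours, and rows are plain Python lists of ±1 entries.

```python
# Sketch (ours) of the three row-repair moves used by the correction
# algorithms for the SSM, ASM and RSM bit-flip models.

def toggle_ssm(row, j):
    """SSM: a single entry was flipped, so toggling A[i][j] undoes it."""
    row = list(row)
    row[j] = -row[j]
    return row

def swap_asm(row, j):
    """ASM: swap A[i][j] with its cyclic neighbour A[i][(j+1) % p],
    applied only when the two entries differ."""
    row = list(row)
    jp = (j + 1) % len(row)
    if row[j] != row[jp]:
        row[j], row[jp] = row[jp], row[j]
    return row

def swap_rsm(row, j, l):
    """RSM: swap any pair of unequal entries A[i][j] and A[i][l]."""
    row = list(row)
    if j != l and row[j] != row[l]:
        row[j], row[l] = row[l], row[j]
    return row
```

Each candidate move is applied tentatively; the algorithms below keep it only if the re-fitted statistical test accepts the repaired row, and revert it otherwise.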
Algorithm 1 Correction for bit-flips following the Single Switch Model (SSM)
Require: Measurement vector y, pooling matrix A, Lasso estimate β̂λ1, λ and the set J of corrupted measurements estimated by the ODrlt method
Ensure: Bit-flip corrected matrix Ã
1: for every i ∈ J do
2:   for every j ∈ [p] do
3:     if β̂λ1,j ≠ 0 then
4:       Set bf := −1 (bit-flip flag), α := 0.01, Aij := −Aij (induce a bit-flip at Aij).
5:       Find the solution β̂λ1, δ̂λ2 of the convex program given in Eqn. (6) of [2].
6:       Compute the weight matrix W as given in Alg. 2 of [2].
7:       Calculate the debiased Lasso estimate δ̂W given by Eqn. (15) of [2].
Run #   NP   NPT   RRMSE-ODrlt-D   RRMSE-ODrlt-C
1       8    7     0.0477          0.0322
2       8    6     0.0512          0.0452
3       8    6     0.0498          0.0428
4       8    7     0.0465          0.0366

Table 1: Avg. #RSM errors (NP), avg. #RSM errors detected correctly (NPT), the RRMSE of reconstruction of β∗ using Lasso after discarding the measurements in J (ODrlt-D), and the RRMSE of reconstruction of β∗ using Lasso after correction using y, Ã (ODrlt-C).
Algorithm 2 Correction for bit-flips following the Random Switch Model (RSM) using W = A
Require: Measurement vector y, pooling matrix A, Lasso estimate β̂λ1, λ and the set J of corrupted measurements estimated by the ODrlt method
Ensure: Bit-flip corrected matrix Ã
1: for every i ∈ J do
2:   Set bf1 := −1, bf2 := −1 (bit-flip flags), max-pvalue := 0.01.
3:   for every j ∈ [p − 1] do
4:     for l ∈ [p] do
5:       if {Aij == −1 and Ail == 1} or {Aij == 1 and Ail == −1} then
6:         Anew = A.
7:         if Aij == 1 then
8:           Aij = −1, Ail = 1.
9:         else if Aij == −1 then
10:          Aij = 1, Ail = −1.
11:        end if
12:        Find the solution β̂λ1, δ̂λ2 of the convex program given in Eqn. (6) of [2].
13:        Calculate the debiased Lasso estimate δ̂W given by Eqn. (15) of [2] using W = A.
14:        Set pval = 1 − Φ(TH,i), where TH,i = [δ̂W]i / √([Σδ]ii) and Σδ is defined in Eqn. (28) of [2].
15:        if pval ≥ max-pvalue then
16:          set bf1 := j, bf2 := l, max-pvalue = pval.
17:        end if
18:        if bf1 != −1 and bf2 != −1 then {bit-flip not detected at Aij}
19:          Aij = −Aij, Ail = −Ail {reverse the induced bit-flip at Aij}
20:        end if
21:      end for
22:   end for
23: end for
24: return Ã = A.
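The inner search of Alg. 2 can be sketched as follows. This is our own minimal sketch, not the paper's code: the re-fit of the convex program and the debiased p-value of Steps 12 to 14 are abstracted into a caller-supplied function `pvalue_after_swap`, and for brevity we scan unordered pairs j < l rather than the ordered scan of Steps 3 and 4.

```python
# Sketch (ours) of the Alg. 2 candidate-swap search for one corrupted row i.
# `pvalue_after_swap` stands in for the re-solved convex program and the
# debiased p-value of the real algorithm.

def best_rsm_swap(row, pvalue_after_swap, alpha=0.01):
    """Return the index pair (j, l) of the unequal entries whose induced swap
    yields the largest p-value at or above alpha, or None if no swap does."""
    best, best_p = None, alpha
    for j in range(len(row) - 1):
        for l in range(j + 1, len(row)):
            if row[j] != row[l]:
                cand = list(row)
                cand[j], cand[l] = cand[l], cand[j]   # induce the swap
                p = pvalue_after_swap(cand)
                if p >= best_p:                        # keep the best candidate
                    best, best_p = (j, l), p
    return best

# Toy score: pretend the row [1, 1, -1] is the uncorrupted one.
target = [1, 1, -1]
score = lambda r: 1.0 if r == target else 0.001
print(best_rsm_swap([1, -1, 1], score))   # → (1, 2)
```

In the real algorithm the accepted swap is kept in A and all other tentative swaps are reverted, after which β∗ is re-estimated from the repaired matrix.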
Algorithm 3 Correction for bit-flips following the Random Switch Model (RSM) using W = copt A
Require: Measurement vector y, pooling matrix A, Lasso estimate β̂λ1, λ and the set J of corrupted measurements estimated by the ODrlt method
Ensure: Bit-flip corrected matrix Ã
1: for every i ∈ J do
2:   Set bf1 := −1, bf2 := −1 (bit-flip flags), max-pvalue := 0.01.
3:   for every j ∈ [p − 1] do
4:     for l ∈ [p] do
5:       if {Aij == −1 and Ail == 1} or {Aij == 1 and Ail == −1} then
6:         if Aij == 1 then
7:           Aij = −1, Ail = 1.
8:         else if Aij == −1 then
9:           Aij = 1, Ail = −1.
10:        end if
11:        Find the solution β̂λ1, δ̂λ2 of the convex program given in Eqn. (6) of [2].
12:        Calculate the debiased Lasso estimate δ̂W given by Eqn. (15) of [2] using W = copt A.
13:        Set pval = 1 − Φ(TH,i), where TH,i = [δ̂W]i / √([Σδ]ii) and Σδ is defined in Eqn. (28) of [2].
14:        if pval ≥ max-pvalue then
15:          set bf1 := j, bf2 := l, max-pvalue = pval.
16:        end if
17:        if bf1 != −1 and bf2 != −1 then {bit-flip not detected at Aij}
18:          Aij = −Aij, Ail = −Ail {reverse the induced bit-flip at Aij}
19:        end if
20:      end for
21:   end for
22: end for
23: return Ã = A.
Ip − (c/n)A⊤A is 1 − c. This implies |1 − c| ≤ µ1. Clearly, −µ2 + 1/p ≤ µ1 + 1 and −µ3√(1 − n/p) + 1 ≤ µ1 + 1, so copt ≤ µ1 + 1; together with copt ≥ 1 − µ1 this gives |1 − copt| ≤ µ1. In the case of the off-diagonal elements, for l, j ∈ [p] with l ≠ j, the (j, l)th element of the matrix Ip − (c/n)A⊤A is −c a.j⊤a.l/n. Since −1 ≤ a.j⊤a.l/n ≤ 1, we have −µ1 ≤ c ≤ µ1. Clearly, copt satisfies this condition too. This implies copt A satisfies constraint C1.

Next, if we plug the solution cA into constraint C2, we get ‖(1/p)A − (c/(np))AA⊤A‖∞ ≤ µ2. For i ∈ [n] and j ∈ [p], the (i, j)th element of the matrix (1/p)A − (c/(np))AA⊤A is given by aij/p − (c/(np)) Σ_{l=1}^p Σ_{k=1}^n ail akl akj, as shown in result (2) of Lemma 5 of [2]. Since A is Rademacher, we have −1 ≤ (1/(np)) Σ_{l=1}^p Σ_{k=1}^n ail akl akj ≤ 1. Hence, −µ2 + 1/p ≤ c ≤ µ2 + 1. Clearly, copt satisfies constraint C2.

Lastly, if we plug the solution cA into constraint C3, we get ‖In − (c/p)AA⊤‖∞ ≤ µ3√(1 − n/p). For all i ∈ [n], the (i, i)th element of the matrix In − (c/p)AA⊤ is 1 − c. This implies |1 − c| ≤ µ3√(1 − n/p), which copt satisfies since copt ≥ 1 − µ3√(1 − n/p) by definition. In the case of the off-diagonal elements, for i, k ∈ [n] with i ≠ k, the (i, k)th element of the matrix In − (c/p)AA⊤ is −c ai.ak.⊤/p. Since −1 ≤ ai.ak.⊤/p ≤ 1, we have −µ3√(1 − n/p) ≤ c ≤ µ3√(1 − n/p). Clearly, copt satisfies this condition too. This implies copt A satisfies constraint C3.

Hence, copt A is a feasible solution of the optimisation problem of Alg. 2 of [2]. Note that the objective function of Alg. 2 of [2] is given by Σ_{j=1}^p w.j⊤w.j/n. For W = cA the objective is proportional to c². Since we want to minimise the objective, the smallest feasible value it can take is attained at c = copt = max{−µ1 + 1, −µ2 + 1/p, −µ3√(1 − n/p) + 1}, based on the constraints C1, C2 and C3. Therefore, among all feasible solutions of the form W = cA, copt A is the optimal solution.
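The diagonal computation used for C1 and C3 above is easy to verify numerically. The following sketch (ours, not from the note) checks that every diagonal entry of Ip − (c/n)A⊤A equals 1 − c for a Rademacher A, because a.j⊤a.j = n exactly for ±1 columns.

```python
# Numerical sanity check (ours) of the diagonal claim used for C1 and C3:
# for W = cA with Rademacher A, diag(I_p - (c/n) A^T A) = (1 - c) everywhere.

import random

random.seed(0)
n, p, c = 8, 5, 0.9
A = [[random.choice([-1, 1]) for _ in range(p)] for _ in range(n)]

def diag_entry(j):
    col_sq = sum(A[k][j] * A[k][j] for k in range(n))   # = n for +/-1 entries
    return 1.0 - (c / n) * col_sq                        # (j,j) of I - (c/n)A^T A

vals = [diag_entry(j) for j in range(p)]
print(vals)   # every entry is 1 - c = 0.1 (up to float rounding)
```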
Run #   NP   NP(A)   NP(cA)
1       4    2       3
2       2    1       2
3       4    2       3
4       3    2       2
5       5    3       4
Table 3: Run-1: Set of true bit-flips (B), set of detected bit-flips (J) and set of corrected bit-flips (C), #RSM errors (NP) and #RSM errors detected correctly (NPT) and incorrectly (NPF), after each stage of correction in Drlt-C for r = 8 RSM errors.

Table 4: Run-2: Set of true bit-flips (B), set of detected bit-flips (J) and set of corrected bit-flips (C), #RSM errors (NP) and #RSM errors detected correctly (NPT) and incorrectly (NPF), after each stage of correction in Drlt-C for r = 8 RSM errors.
8
#   True effective bit-flips (B)   Effective bit-flips detected (J)    Bit-flips altered (C)
1   (2,67,72,219,361,392)          (2,13,44,67,72,145,219,276,361)     (2,67,145,219,361)
2   (72,145,392)                   (13,72,145,276,392)                 (72,392)
3   (145)                          (13,145)                            (145)
4   ϕ                              (13)                                ϕ

Table 5: First β, first run: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 6.
Table 6: fσ = 0.01: Pre-correction and post-correction RRMSE, Sensitivity, Specificity and p-value for the simultaneous tests as given in (35), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 6.
(ii) We then perform debiasing of these estimates as shown in Eqns. (14) and (15) of [2]. (iii) Based on the new set of detected permuted measurements, we perform the correction given by Alg. 3. After each stage of correction, we report the average number of measurements correctly detected to have RSM errors, the average number incorrectly detected to have RSM errors, and the average actual number of RSM errors (over 20 noise runs). These results are presented in Table 4, which shows that after the fifth stage the test falsely detects only a small number of permutations in the model, and no permutations are left by the sixth stage.
#   True effective bit-flips (B)   Effective bit-flips detected (J)   Bit-flips altered (C)
1 (2,67,72,219,361,392) (2,67,89,192,219,222,307,392) (2,67,219,392)
2 (72,361) (72,89,192,222) (72,192)
3 (192,361) (89,192,222,361) (192,361)
4 ϕ (89,222) ϕ
Table 7: First β, second run: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 6.

Table 8: First β, second run: Pre-correction and post-correction RRMSE, Sensitivity and Specificity, after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 6.

Table 9: First β, third run: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 6.

Table 10: First β, third run: Pre-correction and post-correction RRMSE, Sensitivity and Specificity, after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 6.
#   True effective bit-flips (B)   Effective bit-flips detected (J)   Bit-flips altered (C)
1 (96,117,272,329,346) (23,96,117,156,213,272,329,336,346) (96,117,213,329,346)
2 (213,272) (23,96,213,276) (213,272)
3 ϕ (23) ϕ
Table 11: Second β, first run: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 5.

Table 12: Second β, first run: Pre-correction and post-correction RRMSE, Sensitivity and Specificity, after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 5.

Table 13: Second β, second run: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 5.

Table 14: Second β, second run: Pre-correction and post-correction RRMSE, Sensitivity and Specificity, after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 5.

Table 15: Second β, third run: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 5.
#   RRMSE                 Sens                  Spec                  p-value simul
    Pre-corr  Post-corr   Pre-corr  Post-corr   Pre-corr  Post-corr
1   0.0602    0.0452      0.8       1           0.992     0.995       1e-13
2   0.0452    0.0427      1         1           0.995     0.997       1e-8
3   0.0452    0.0427      1         1           0.995     0.997       0.00044

Table 16: Second β, third run: Pre-correction and post-correction RRMSE, Sensitivity and Specificity, after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 5.
#   True effective bit-flips (B)   Effective bit-flips detected (J)   Bit-flips altered (C)
1 (23,44,166,192,245,289,356) (23,44,98,166,177,192,245,289,332,396) (23,44,166,192,245,332)
2 (289,332,356) (98,177,289,332,356) (98,289,356)
3 (98,332) (98,177,332) (98,332)
4 ϕ ϕ ϕ
Table 17: Third β, first run: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 7.
#   RRMSE                 Sens                  Spec                  p-value simul
    Pre-corr  Post-corr   Pre-corr  Post-corr   Pre-corr  Post-corr
1   0.0692    0.0577      0.713     0.8571      0.989     0.995       1e-12
2   0.0577    0.0512      0.8571    1           0.995     0.998       1e-7
3   0.0512    0.0478      1         1           0.997     1           0.00034

Table 18: Third β, first run: Pre-correction and post-correction RRMSE, Sensitivity and Specificity, after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 7.
#   True effective bit-flips (B)   Effective bit-flips detected (J)   Bit-flips altered (C)
1 (23,44,166,192,245,289,356) (44,67,98,166,192,203,245,289,317,334,396) (44,166,192,245,289,334)
2 (23,334,356) (23,67,98,334,356) (23,334,356)
3 ϕ (67,98) ϕ
Table 19: Third β, second run: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 7.
#   RRMSE                 Sens                  Spec                  p-value simul
    Pre-corr  Post-corr   Pre-corr  Post-corr   Pre-corr  Post-corr
1   0.0688    0.0492      0.857     1           0.992     0.995       1e-11
2   0.0492    0.0441      1         1           0.995     0.997       1e-7
3   0.0492    0.0441      1         1           0.995     0.997       0.00093

Table 20: Third β, second run: Pre-correction and post-correction RRMSE, Sensitivity and Specificity, after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 7.
#   True effective bit-flips (B)   Effective bit-flips detected (J)   Bit-flips altered (C)
1 (2,67,72,219,361,392) (2,13,44,67,72,145,219,276,361) (2,67,145,219,361)
2 (72,145,392) (13,72,145,276,392) (72,392)
3 (145) (13,145) (145)
4 ϕ (13) ϕ
Table 21: fσ = 0.01: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, s = 10, r = 6.

Table 22: fσ = 0.01: Pre-correction and post-correction RRMSE, Sensitivity, Specificity and p-value for the simultaneous tests as given in (35), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 6.
#   True effective bit-flips (B)   Effective bit-flips detected (J)   Bit-flips altered (C)
1 (2,67,72,219,361,392) (2,22,52,67,89,194,198,219,232,271,302,311,361,387) (2,67,89,219,311)
2 (72,89,311,361,392) (22,52,72,89,198,271,302,311,387,392) (72,89,198,311)
3 (198,361,392) (22,198,271,311,361,392) (198,271,392)
4 (271,361) (22,271,311,361) (271)
5 (361) (22,361) (361)
6 ϕ (22) ϕ
Table 23: fσ = 0.03: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, s = 10, r = 6.

Table 24: fσ = 0.03: Pre-correction and post-correction RRMSE, Sensitivity, Specificity and p-value for the simultaneous tests as given in (35), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, s = 10, r = 6.

Table 25: fσ = 0.05: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, s = 10, r = 6.
# RRMSE Sens Spec p-value simul
Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.1033 0.0898 0.67 0.6 0.951 0.974 1e-16
2 0.0898 0.0852 0.6 0.8 0.974 0.989 1e-12
3 0.0852 0.0831 0.8 0.75 0.989 0.992 1e-9
4 0.0801 0.0774 0.75 1 0.992 0.995 1e-6
5 0.0774 0.0752 1 1 0.995 0.997 0.000252
6 0.0752 0.0737 1 1 0.997 0.997 0.00121
Table 26: fσ = 0.05: Pre-correction and post-correction RRMSE, Sensitivity, Specificity and p-value for the simultaneous tests as given in (35), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, s = 10, r = 6.

Table 27: s = 5: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, r = 2.

Table 28: s = 5: Pre-correction and post-correction RRMSE, Sensitivity, Specificity and p-value for the simultaneous tests as given in (35), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, r = 2.

Table 29: s = 10: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, r = 5.
# RRMSE Sens Spec p-value simul
Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.0598 0.0407 0.8 1 0.987 0.992 1e-6
2 0.0407 0.0392 1 1 0.992 1 0.00091
3 0.0392 0.0379 1 1 1 1 0.0253
Table 30: s = 10: Pre-correction and post-correction RRMSE, Sensitivity, Specificity and p-value for the simultaneous tests as given in (35), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 5.

Table 31: s = 15: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, r = 7.

Table 32: s = 15: Pre-correction and post-correction RRMSE, Sensitivity, Specificity and p-value for the simultaneous tests as given in (35), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ = 0.01, r = 7.
Indices of true effective bit-flips fσ = 0.01 fσ = 0.03 fσ = 0.05
2 1(1) 0.9 (1) 0.7 (0.9)
67 1(1) 1 (1) 0.9 (1)
72 0.8 (0.8) 0.6 (0.8) 0.2 (0.5)
219 0.6 (0.9) 0.5 (0.7) 0.5 (0.6)
361 0.9 (1) 0.8 (0.8) 0.5 (0.7)
392 0.4 (0.5) 0.1 (0.3) 0 (0.1)
Table 33: Effective bit-flips correctly altered: the first column lists the indices that have true effective bit-flips. Columns 2, 3 and 4 give the proportion of times that particular bit-flip was correctly altered by the RSM correction algorithm for fσ = 0.01, fσ = 0.03 and fσ = 0.05 respectively. The fixed parameters of the experiments are p = 500, n = 400, s = 10, r = 6. In brackets is the proportion of times that bit-flip was detected in the first stage of RSM correction.
The objective of this experiment is to observe how the correction of bit-flips worsens as the noise variance increases.
Run #   NP   NP(cA)   RMSEJ pre-correction   RMSEJ post-correction
1       4    4        23.25                  5.75
2       2    2        11.5                   3.5
3       4    3        15.5                   5.25
4       3    3        17.67                  6.67
5       5    4        25.2                   8.4

Table 34: True number of bit-flips (NP), the number of bit-flips corrected by Alg. 4, and RMSEJ pre-correction and post-correction.
Table 35: Run-3: Set of true bit-flips (B), set of detected bit-flips (J) and set of corrected bit-flips (C), #RSM errors (NP) and #RSM errors detected correctly (NPT) and incorrectly (NPF), after each stage of correction in Drlt-C for r = 8 RSM errors.
Algorithm 4 Correction for bit-flips following the Random Switch Model (RSM) using W = copt A
Require: Measurement vector y, pooling matrix A, Lasso estimates β̂λ1, δ̂λ2, λ and the set J of corrupted measurements estimated by the ODrlt method
Ensure: Bit-flip corrected matrix Ã
1: for every i ∈ J do
2:   Set bf1 := −1, bf2 := −1 (bit-flip flags), cutoff := |δ̂λ2,i|.
3:   for every j ∈ [p − 1] do
4:     for l ∈ [p] do
5:       if {Aij == −1 and Ail == 1} or {Aij == 1 and Ail == −1} then
6:         if Aij == 1 then
7:           Aij = −1, Ail = 1.
8:         else if Aij == −1 then
9:           Aij = 1, Ail = −1.
10:        end if
11:        Find the solution β̂λ1, δ̂λ2 of the convex program given in Eqn. (6) of [2].
12:        if |δ̂λ2,i| < cutoff then
13:          set bf1 := j, bf2 := l, cutoff = |δ̂λ2,i|.
14:        end if
15:        if bf1 != −1 and bf2 != −1 then {bit-flip not detected at Aij}
16:          Aij = −Aij, Ail = −Ail {reverse the induced bit-flip at Aij}
17:        end if
18:      end for
19:   end for
20: end for
21: return Ã = A.
Algorithm 5 Construction of M (Alg. 1 of [3])
Require: Measurement vector y, design matrix A, µ
Ensure: M
1: Set Σ̂ = A⊤A/n.
2: Let mi ∈ Rp for i = 1, 2, …, p be a solution of:
     minimize  mi⊤ Σ̂ mi
     subject to ‖Σ̂ mi − ei‖∞ ≤ µ,   (2)
   where ei ∈ Rp is the ith column of the identity matrix I and µ = 2√(log p / n).
3: Set M = (m1 | … | mp)⊤. If any of the above problems is not feasible, then set M = Ip.
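Step 2 of Alg. 5 solves one small quadratic program per coordinate. As a hedged illustration (the helper `solve` and the toy Σ̂ below are ours, not the paper's solver): whenever Σ̂ is invertible, mi = Σ̂⁻¹ei is feasible for (2) with zero constraint slack ‖Σ̂mi − ei‖∞ = 0, so the fallback M = Ip in Step 3 is only needed when the problem is infeasible, e.g. for singular Σ̂ in the p > n regime.

```python
# Sketch (ours): when Sigma_hat = A^T A / n is invertible, m_i = Sigma_hat^{-1} e_i
# is feasible for (2) with zero slack, i.e. well within any mu > 0.

def solve(mat, rhs):
    """Solve mat x = rhs by Gauss-Jordan elimination with partial pivoting."""
    m = [row[:] + [b] for row, b in zip(mat, rhs)]
    d = len(m)
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(d):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[r][d] / m[r][r] for r in range(d)]

sigma_hat = [[1.0, 0.2, 0.0],       # a toy, well-conditioned Sigma_hat
             [0.2, 1.0, 0.1],
             [0.0, 0.1, 1.0]]
e0 = [1.0, 0.0, 0.0]
m0 = solve(sigma_hat, e0)
slack = max(abs(sum(sigma_hat[r][c] * m0[c] for c in range(3)) - e0[r])
            for r in range(3))
print(slack)   # ~ 0, so the constraint in (2) holds for any mu > 0
```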
two sets of component-wise inequalities:

−µ ≤ (1/n) a.l⊤ w.j − ejl ,  l ∈ [p],
(1/n) a.l⊤ w.j − ejl ≤ µ ,  l ∈ [p].

The Lagrangian function, for γ1 ∈ Rp, γ1 ≥ 0 and γ2 ∈ Rp, γ2 ≥ 0, is given by, for j ∈ [p],

L(w.j, γ1, γ2) = w.j⊤w.j/n + Σ_{l=1}^p γ1l ((1/n) a.l⊤ w.j − ejl − µ) + Σ_{l=1}^p γ2l (ejl − (1/n) a.l⊤ w.j − µ).   (5)

The dual is given by g(γ1, γ2) = min_{w.j ∈ Rn} L(w.j, γ1, γ2). Hence, the dual optimisation problem D is obtained by maximising g(γ1, γ2) over γ1 ≥ 0, γ2 ≥ 0.

We will now find the exact form of g(γ1, γ2), i.e., the minimum value of L(w.j, γ1, γ2) with respect to w.j for all j ∈ [p]. Setting the first-order derivative of L(w.j, γ1, γ2) w.r.t. w.j to zero, we get

∂L(w.j, γ1, γ2)/∂w.j = 0
⟹ 2w.j/n + Σ_{l=1}^p γ1l a.l/n − Σ_{l=1}^p γ2l a.l/n = 0
⟹ ŵ.j = −(1/2) Σ_{l=1}^p (γ1l − γ2l) a.l = −(1/2) A(γ1 − γ2).   (7)

∂²L(w.j, γ1, γ2)/∂w.j ∂w.j⊤ = (2/n) In > 0.   (8)
From (7), we get

g(γ1, γ2) = L(ŵ.j, γ1, γ2)
= ŵ.j⊤ŵ.j/n + Σ_{l=1}^p γ1l ((1/n) a.l⊤ ŵ.j − ejl − µ) + Σ_{l=1}^p γ2l (ejl − (1/n) a.l⊤ ŵ.j − µ)
= (1/(4n)) (γ1 − γ2)⊤A⊤A(γ1 − γ2) + γ1⊤ (−(1/(2n)) A⊤A(γ1 − γ2) − ej − µ1) + γ2⊤ ((1/(2n)) A⊤A(γ1 − γ2) + ej − µ1)
= (1/(4n)) (γ1 − γ2)⊤A⊤A(γ1 − γ2) − (1/(2n)) (γ1 − γ2)⊤A⊤A(γ1 − γ2) − γ1⊤(ej + µ1) + γ2⊤(ej − µ1)
= −(1/(4n)) (γ1 − γ2)⊤A⊤A(γ1 − γ2) − γ1⊤(ej + µ1) + γ2⊤(ej − µ1).   (9)

Hence, the dual problem is given by

Dj := max_{γ1 ≥ 0, γ2 ≥ 0}  −(1/(4n)) (γ1 − γ2)⊤A⊤A(γ1 − γ2) − γ1⊤(ej + µ1) + γ2⊤(ej − µ1).   (10)

This completes the derivation of the dual problem. Now we will show that, for all j ∈ [p],

w̃.j = (1 − µ) a.j,  γ̃1 = 0,  and  γ̃2j′ = 2(1 − µ) if j′ = j and 0 otherwise,

are the optimal solutions to the primal (P) and dual (Dj) problems respectively. In order to show this, we invoke the Strong Duality Theorem, i.e., we show that the objective function values of the primal and dual problems at the solutions w̃.j, γ̃1 and γ̃2 are equal for all j ∈ [p].

Recall that the objective function of the primal problem is given by f(w.j) = w.j⊤w.j/n. Hence, the value of f(w.j) at w̃.j for all j ∈ [p] is

f(w̃.j) = f((1 − µ) a.j) = (1 − µ)² a.j⊤a.j/n = (1 − µ)².   (11)

Next, the value of the dual objective function at γ̃1 and γ̃2 is

g(γ̃1, γ̃2) = −(1/(4n)) · 4(1 − µ)² a.j⊤a.j + 2(1 − µ)(1 − µ) = −(1 − µ)² + 2(1 − µ)² = (1 − µ)².   (12)

Hence, from (11) and (12), both the primal and dual objective function values equal (1 − µ)² at the solutions w̃.j, γ̃1 and γ̃2 for all j ∈ [p]. Hence, by the Strong Duality Theorem, they are the optimal solutions for the primal and dual problems. This completes the proof. ■
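The strong-duality argument above can be checked numerically. The sketch below (ours, not from the note) evaluates the primal objective at w̃.j = (1 − µ)a.j and the dual objective (9) at γ̃1 = 0, γ̃2 = 2(1 − µ)ej for a random Rademacher A; both equal (1 − µ)², up to float rounding, for any such A.

```python
# Numeric check (ours) that the claimed primal and dual solutions attain the
# same value (1 - mu)^2, matching (11) and (12).

import random

random.seed(1)
n, p, mu, j = 6, 4, 0.3, 2
A = [[random.choice([-1, 1]) for _ in range(p)] for _ in range(n)]
a_j = [A[k][j] for k in range(n)]

# Primal value at w = (1 - mu) a_j:  f = w^T w / n
w = [(1 - mu) * v for v in a_j]
f_primal = sum(x * x for x in w) / n

# Dual value (9) at gamma1 = 0, gamma2 = 2(1 - mu) e_j:
# g = -(1/4n)(g1-g2)^T A^T A (g1-g2) - g1^T(e_j + mu*1) + g2^T(e_j - mu*1)
g2 = [2 * (1 - mu) if l == j else 0.0 for l in range(p)]
Ad = [sum(A[k][l] * (0.0 - g2[l]) for l in range(p)) for k in range(n)]  # A(g1-g2)
quad = sum(v * v for v in Ad)
g_dual = -quad / (4 * n) + g2[j] * (1 - mu)

print(f_primal, g_dual)   # both equal (1 - 0.3)^2 = 0.49, up to float rounding
```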
6 Optimal solution for Alg. 2 of [2]

We now provide a theorem which derives the optimal solution for Alg. 2 of [2]. We restate this algorithm as Alg. 6.

Algorithm 6 Design of W
Require: A, µ1, µ2 and µ3
Ensure: W
1: We solve the following optimisation problem:

   PW ≜ minimize_W Σ_{j=1}^p w.j⊤w.j/n
   subject to  C0: w.j⊤w.j/n ≤ 1 ∀ j ∈ [p],
               C1: ‖Ip − (1/n) W⊤A‖∞ ≤ µ1,
               C2: ‖(1/p) A − (1/(np)) W A⊤A‖∞ ≤ µ2,
               C3: ‖In − (1/p) W A⊤‖∞ ≤ µ3 √(1 − n/p),

   where µ1 ≜ 2√(2 log(p)/n), µ2 ≜ 2√(log(2np)/(np)) + 1/n and µ3 ≜ (2/√(1 − n/p)) √(2 log(n)/p).
2: If the above problem is not feasible, then set W = A.
Proof of Theorem 1: The primal problem PW can be written with element-wise constraints as follows:

PW ≜ minimize_W (1/n) Σ_{l=1}^p Σ_{k=1}^n w²kl
subject to
C0: (1/n) Σ_{k=1}^n w²kl ≤ 1  ∀ l ∈ [p],
C1: −µ1 ≤ Ip,l1l2 − (1/n) Σ_{k=1}^n wkl1 akl2 ≤ µ1  ∀ l1, l2 ∈ [p],
C2: −µ2 ≤ ak1l1/p − (1/(np)) Σ_{k2=1}^n Σ_{l2=1}^p wk1l2 ak2l2 ak2l1 ≤ µ2  ∀ l1 ∈ [p], k1 ∈ [n],
C3: −µ3 √(1 − n/p) ≤ In,k1k2 − (1/p) Σ_{l=1}^p wk1l ak2l ≤ µ3 √(1 − n/p)  ∀ k1, k2 ∈ [n].

Note that there are seven separate sets of constraints (C0 and both sides of each of C1, C2, C3). Hence, we can define the Lagrangian function with the Lagrangian parameters Λ1 ∈ R₊^{p×p}, Λ2 ∈ R₊^{p×p}, Λ3 ∈ R₊^{n×p}, Λ4 ∈ R₊^{n×p}, Λ5 ∈ R₊^{n×n}, Λ6 ∈ R₊^{n×n}, λ7 ∈ R₊^p as follows:
L(W, Λ1, Λ2, Λ3, Λ4, Λ5, Λ6, λ7)
= (1/n) Σ_{l=1}^p Σ_{k=1}^n w²kl + Σ_{l=1}^p λ7l ((1/n) Σ_{k=1}^n w²kl − 1)
+ Σ_{l1=1}^p Σ_{l2=1}^p Λ1,l1l2 (−µ1 − (Ip,l1l2 − (1/n) Σ_{k=1}^n wkl1 akl2))
+ Σ_{l1=1}^p Σ_{l2=1}^p Λ2,l1l2 (Ip,l1l2 − (1/n) Σ_{k=1}^n wkl1 akl2 − µ1)
+ Σ_{k1=1}^n Σ_{l1=1}^p Λ3,k1l1 (−µ2 − (ak1l1/p − (1/(np)) Σ_{k2=1}^n Σ_{l2=1}^p wk1l2 ak2l2 ak2l1))
+ Σ_{k1=1}^n Σ_{l1=1}^p Λ4,k1l1 (ak1l1/p − (1/(np)) Σ_{k2=1}^n Σ_{l2=1}^p wk1l2 ak2l2 ak2l1 − µ2)
+ Σ_{k1=1}^n Σ_{k2=1}^n Λ5,k1k2 (−µ3 √(1 − n/p) − (In,k1k2 − (1/p) Σ_{l=1}^p wk1l ak2l))
+ Σ_{k1=1}^n Σ_{k2=1}^n Λ6,k1k2 (In,k1k2 − (1/p) Σ_{l=1}^p wk1l ak2l − µ3 √(1 − n/p)).   (14)
The dual optimisation problem DW is obtained by maximising g(Λ1, Λ2, Λ3, Λ4, Λ5, Λ6, λ7) over the nonnegative Lagrangian parameters. We will now find the exact form of g(·), i.e., the minimum of L(·) with respect to W. Evaluating the first-order derivative of L(·) w.r.t. wij, we get, for all i ∈ [n], j ∈ [p],

dL(·)/dW|ij = ∂L(W, Λ1, Λ2, Λ3, Λ4, Λ5, Λ6, λ7)/∂wij
= (2/n) wij + (2λ7j/n) wij − (1/n) Σ_{l2=1}^p Λ1,jl2 ail2 − (1/n) Σ_{l2=1}^p Λ2,jl2 ail2
− (1/(np)) Σ_{k2=1}^n Σ_{l1=1}^p Λ3,il1 ak2j ak2l1 − (1/(np)) Σ_{k2=1}^n Σ_{l1=1}^p Λ4,il1 ak2j ak2l1
− (1/p) Σ_{k2=1}^n Λ5,ik2 ak2j − (1/p) Σ_{k2=1}^n Λ6,ik2 ak2j.   (17)

Setting the first-order derivative in (17) to 0 to satisfy the stationarity condition for all i ∈ [n], j ∈ [p], we get

∂L(W, Λ1, Λ2, Λ3, Λ4, Λ5, Λ6, λ7)/∂wij = 0
⟹ (2(1 + λ7j)/n) ŵij = (1/n) (Λ1,j· + Λ2,j·)⊤ ai· + (1/(np)) (Λ3,i· + Λ4,i·)⊤ A⊤ a·j + (1/p) (Λ5,i· + Λ6,i·)⊤ a·j
⟹ ŵij = (1/(2(1 + λ7j))) { (Λ1,j· + Λ2,j·)⊤ ai· + (1/p) (Λ3,i· + Λ4,i·)⊤ A⊤ a·j + (n/p) (Λ5,i· + Λ6,i·)⊤ a·j }.   (18)

The second-order derivative of L(·) w.r.t. wij for all i ∈ [n], j ∈ [p] is given by

d²L(·)/(dwij)² = 2(1 + λ7j)/n > 0.   (19)

The last inequality is true because λ7 ≥ 0. Hence, from (19), the solution ŵij given in (18) is the minimum. Now we plug the value of ŵij for all i ∈ [n], j ∈ [p] into L(W, Λ1, Λ2, Λ3, Λ4, Λ5, Λ6, λ7) in (14). In order to do so,
we re-write L(W, Λ1, Λ2, Λ3, Λ4, Λ5, Λ6, λ7) as follows:

L(W, Λ1, Λ2, Λ3, Λ4, Λ5, Λ6, λ7)
= (1/n) Σ_{l1=1}^p (1 + λ7,l1) w·l1⊤ w·l1 − Σ_{l1=1}^p λ7,l1 − Σ_{l1=1}^p Σ_{l2=1}^p (Λ1,l1l2 + Λ2,l1l2) µ1
+ Σ_{l1=1}^p Σ_{l2=1}^p (Λ2,l1l2 − Λ1,l1l2) Ip,l1l2 − Σ_{k1=1}^n Σ_{l1=1}^p (Λ3,k1l1 + Λ4,k1l1) µ2
+ (1/p) Σ_{k1=1}^n Σ_{l1=1}^p (Λ4,k1l1 − Λ3,k1l1) ak1l1 − Σ_{k1=1}^n Σ_{k2=1}^n (Λ5,k1k2 + Λ6,k1k2) µ3 √(1 − n/p)
+ Σ_{k1=1}^n Σ_{k2=1}^n (Λ6,k1k2 − Λ5,k1k2) In,k1k2 − (1/n) Σ_{l1=1}^p Σ_{l2=1}^p (Λ1,l1l2 + Λ2,l1l2) w·l1⊤ a·l2
− (1/(np)) Σ_{k1=1}^n Σ_{l1=1}^p (Λ3,k1l1 + Λ4,k1l1) wk1·⊤ A⊤ a·l1 − (1/p) Σ_{k1=1}^n Σ_{k2=1}^n (Λ5,k1k2 + Λ6,k1k2) wk1·⊤ ak2·.   (20)

Note that the last three terms of (20) are the only terms dependent on W. Now we will find the minimum value of L(W, Λ1, Λ2, Λ3, Λ4, Λ5, Λ6, λ7), which is given by L(Ŵ, Λ1, Λ2, Λ3, Λ4, Λ5, Λ6, λ7). In order to obtain this, we evaluate the first term and the last three terms of (20) at Ŵ separately. The first term expands as follows:

(1/n) Σ_{l1=1}^p (1 + λ7,l1) ŵ·l1⊤ ŵ·l1
= (1/(4n)) Σ_{l1=1}^p (1/(1 + λ7,l1)) { (Λ1,l1· + Λ2,l1·)⊤ A⊤A (Λ1,l1· + Λ2,l1·)
+ (1/p²) a·l1⊤ A (Λ3 + Λ4)⊤ (Λ3 + Λ4) A⊤ a·l1
+ (n²/p²) a·l1⊤ (Λ5 + Λ6)⊤ (Λ5 + Λ6) a·l1
+ (2/p) (Λ1,l1· + Λ2,l1·)⊤ A⊤ (Λ3 + Λ4) A⊤ a·l1
+ (2n/p) (Λ1,l1· + Λ2,l1·)⊤ A⊤ (Λ5 + Λ6) a·l1
+ (2n/p²) a·l1⊤ A (Λ3 + Λ4)⊤ (Λ5 + Λ6) a·l1 }.   (21)
The second-last term of (20) at Ŵ can be written as:

− (1/(np)) Σ_{k1=1}^n Σ_{l1=1}^p (Λ3,k1l1 + Λ4,k1l1) ŵk1·⊤ A⊤ a·l1
= − (1/(np)) Σ_{k1=1}^n Σ_{l1=1}^p (Λ3,k1l1 + Λ4,k1l1) { ak1·⊤ (Λ1 + Λ2) Λ7⊤ A⊤ a·l1
+ (1/p) (Λ3,k1· + Λ4,k1·)⊤ A⊤A Λ7⊤ A⊤ a·l1 + (n/p) (Λ5,k1· + Λ6,k1·)⊤ A Λ7⊤ A⊤ a·l1 },   (23)

and the last term of (20) at Ŵ can be written as:

− (1/p) Σ_{k1=1}^n Σ_{k2=1}^n (Λ5,k1k2 + Λ6,k1k2) ŵk1·⊤ ak2·
= − (1/p) Σ_{k1=1}^n Σ_{k2=1}^n (Λ5,k1k2 + Λ6,k1k2) { ak1·⊤ (Λ1 + Λ2) Λ7⊤ ak2·
+ (1/p) (Λ3,k1· + Λ4,k1·)⊤ A⊤A Λ7⊤ ak2· + (n/p) (Λ5,k1· + Λ6,k1·)⊤ A ak2· }.   (24)
of Trace of matrices to make it simpler to write. Hence, we get

g(Λ1, Λ2, Λ3, Λ4, Λ5, Λ6, λ7) = L(Ŵ, Λ1, Λ2, Λ3, Λ4, Λ5, Λ6, λ7)
= −1p⊤ λ7 − µ1 1p⊤ (Λ1 + Λ2) 1p + Tr[Λ2 − Λ1] − µ2 1n⊤ (Λ3 + Λ4) 1p
+ Tr[A⊤(Λ4 − Λ3)]/p − µ3 √(1 − n/p) 1n⊤ (Λ5 + Λ6) 1n + Tr[Λ6 − Λ5]
+ (1/(4n)) Tr[Λ7⊤ (Λ1 + Λ2)⊤ A⊤A (Λ1 + Λ2)] + (1/(4np²)) Tr[Λ7⊤ A⊤A (Λ3 + Λ4)⊤ (Λ3 + Λ4) A⊤A]
+ (n/(4p²)) Tr[Λ7⊤ A⊤ (Λ5 + Λ6)⊤ (Λ5 + Λ6) A] + (1/(2np)) Tr[Λ7⊤ (Λ1 + Λ2)⊤ A⊤ (Λ3 + Λ4) A⊤A]
+ (1/(2p)) Tr[Λ7⊤ (Λ1 + Λ2)⊤ A⊤ (Λ5 + Λ6) A] + (1/(2p²)) Tr[Λ7⊤ A⊤A (Λ3 + Λ4)⊤ (Λ5 + Λ6) A]
− (1/(2n)) Tr[Λ7⊤ (Λ1 + Λ2)⊤ (Λ1 + Λ2)⊤ A⊤A] − (1/(2np)) Tr[Λ7⊤ (Λ1 + Λ2)⊤ A⊤A (Λ3 + Λ4)⊤ A]
− (1/(2p)) Tr[Λ7⊤ (Λ1 + Λ2)⊤ A⊤ (Λ5 + Λ6)⊤ A] − (1/(2np)) Tr[(Λ3 + Λ4)⊤ A⊤ (Λ1 + Λ2) Λ7⊤ A⊤A]
− (1/(2np²)) Tr[(Λ3 + Λ4)⊤ (Λ3 + Λ4) A⊤A Λ7⊤ A⊤A] − (1/(2p²)) Tr[(Λ3 + Λ4)⊤ (Λ5 + Λ6) A Λ7⊤ A⊤A]
− (1/(2p)) Tr[(Λ5 + Λ6)⊤ A⊤ (Λ1 + Λ2) Λ7⊤ A⊤] − (1/(2p²)) Tr[(Λ5 + Λ6)⊤ (Λ3 + Λ4) A⊤A Λ7⊤ A⊤]
− (n/(2p²)) Tr[(Λ5 + Λ6)⊤ (Λ5 + Λ6) A Λ7⊤ A⊤],   (25)

where Λ7 is as defined before and 1n is the vector of ones of length n. This completes the derivation of the dual problem. Now we find the values of the primal problem and dual problem at the given solution points. We have chosen Ŵ = copt A, where copt = 1 − µ3 √(1 − n/p). Hence, the value of the primal objective is given by

(1/n) Σ_{j=1}^p ŵ.j⊤ ŵ.j = c²opt Σ_{j=1}^p a.j⊤a.j/(np) = c²opt = (1 − µ3 √(1 − n/p))².   (26)
Clearly, we see from (26) and (27) that, for the given choices of parameters, the primal and dual objective values are equal. Hence, by the Strong Duality Theorem, the given solutions are the optimal solutions for the primal and dual problems respectively, with the corresponding optimal value given by (1 − µ3 √(1 − n/p))². This completes the proof. ■
This theorem provides us with the following simultaneous hypothesis test.

Simultaneous test for β: Let K ⊂ [p] be such that ∥K∥0 = k. We reject the null hypothesis G0: β∗K = 0 vs the alternative G1: β∗K ≠ 0 at the α level of significance if:

{√n (β̂W)K}⊤ ΣβK⁻¹ {√n (β̂W)K} > χ²k,1−α.   (34)

Let us first expand (32) using the structure given in (28). We have

{√n (β̂W − β∗)K}⊤ ΣβK⁻¹ {√n (β̂W − β∗)K}
= {(1/√n) WK⊤ η}⊤ ΣβK⁻¹ {(1/√n) WK⊤ η}
+ {√n [(Ip − (1/n) W⊤A)(β∗ − β̂λ1)]K}⊤ ΣβK⁻¹ {√n [(Ip − (1/n) W⊤A)(β∗ − β̂λ1)]K}
+ {(1/√n) WK⊤ (δ∗ − δ̂λ2)}⊤ ΣβK⁻¹ {(1/√n) WK⊤ (δ∗ − δ̂λ2)}
+ 2 {(1/√n) WK⊤ η}⊤ ΣβK⁻¹ {√n [(Ip − (1/n) W⊤A)(β∗ − β̂λ1)]K}
+ 2 {(1/√n) WK⊤ η}⊤ ΣβK⁻¹ {(1/√n) WK⊤ (δ∗ − δ̂λ2)}
+ 2 {√n [(Ip − (1/n) W⊤A)(β∗ − β̂λ1)]K}⊤ ΣβK⁻¹ {(1/√n) WK⊤ (δ∗ − δ̂λ2)}.   (36)

In Lemma 5, we will show that ΣβK is positive definite; hence ΣβK⁻¹ exists. Next, in Lemma 3, we will show that the last five terms (all except the first term on the RHS) of (36) go to 0 in probability. Lastly, we see that the first term on the RHS of (36) can be written as {ΣβK^(−1/2) (1/√n) WK⊤ η}⊤ {ΣβK^(−1/2) (1/√n) WK⊤ η}.
Therefore, by applying Slutsky's Theorem to (36), we have

{√n (β̂W − β∗)K}⊤ ΣβK⁻¹ {√n (β̂W − β∗)K} → χ²k in distribution.   (37)

This completes the proof of (32). Now we prove the joint distribution given in (33) with the same approach. Let us first expand (33) using the structure given in (29). We have

{(δ̂W − δ∗)L}⊤ ΣδL⁻¹ {(δ̂W − δ∗)L}
= {[(In − (1/n) W A⊤) η]L}⊤ ΣδL⁻¹ {[(In − (1/n) W A⊤) η]L}
+ {[(In − (1/n) W A⊤) A (β∗ − β̂λ1)]L}⊤ ΣδL⁻¹ {[(In − (1/n) W A⊤) A (β∗ − β̂λ1)]L}
+ {[(1/n) W A⊤ (δ∗ − δ̂λ2)]L}⊤ ΣδL⁻¹ {[(1/n) W A⊤ (δ∗ − δ̂λ2)]L}
+ 2 {[(In − (1/n) W A⊤) η]L}⊤ ΣδL⁻¹ {[(In − (1/n) W A⊤) A (β∗ − β̂λ1)]L}
− 2 {[(In − (1/n) W A⊤) η]L}⊤ ΣδL⁻¹ {[(1/n) W A⊤ (δ∗ − δ̂λ2)]L}
− 2 {[(In − (1/n) W A⊤) A (β∗ − β̂λ1)]L}⊤ ΣδL⁻¹ {[(1/n) W A⊤ (δ∗ − δ̂λ2)]L}.   (38)

In Lemma 5, we will show that ΣδL is positive definite; hence ΣδL⁻¹ exists. Next, in Lemma 4, we will show that the last five terms (all except the first term on the RHS) of (38) go to 0 in probability. Lastly, we see that the first term on the RHS of (38) can be written as {ΣδL^(−1/2) [(In − (1/n) W A⊤) η]L}⊤ {ΣδL^(−1/2) [(In − (1/n) W A⊤) η]L}. Since η is Gaussian with mean 0 and variance-covariance matrix σ² In, by linearity of the Gaussian we have ΣδL^(−1/2) [(In − (1/n) W A⊤) η]L ∼ Nl(0, Il). Hence, {ΣδL^(−1/2) [(In − (1/n) W A⊤) η]L}⊤ {ΣδL^(−1/2) [(In − (1/n) W A⊤) η]L} ∼ χ²l.
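As a concrete illustration of the simultaneous test in (34), the sketch below (our own, on a hypothetical input) forms the quadratic-form statistic for k = 2 and compares it against the χ²_{2,0.95} critical value 5.991 taken from standard tables.

```python
# Hedged illustration (ours) of the simultaneous test (34) for k = 2:
# T = {sqrt(n) beta_hat_K}^T Sigma_K^{-1} {sqrt(n) beta_hat_K}, reject G0
# when T exceeds the chi-square(2) 95% critical value from standard tables.

import math

def chi2_test_k2(beta_hat_K, sigma_K, n, crit=5.991):
    """2-dim quadratic-form test; sigma_K is the 2x2 covariance matrix."""
    (a, b), (c, d) = sigma_K
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]   # Sigma_K^{-1}
    v = [math.sqrt(n) * x for x in beta_hat_K]          # sqrt(n) (beta_hat)_K
    T = sum(v[i] * inv[i][j] * v[j] for i in range(2) for j in range(2))
    return T, T > crit

T, reject = chi2_test_k2([0.5, -0.4], [[1.0, 0.1], [0.1, 1.0]], n=100)
print(round(T, 2), reject)   # a strong signal is rejected
```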
such that \(\|K\|_0=k\), and \(L\subset[n]\) such that \(\|L\|_0=l\), with both \(k,l\) fixed as \(n,p\to\infty\). Furthermore, let \((\hat{\beta}_W-\beta^*)_K\) be as defined in (28) and \(\Sigma_{\beta K}\) be as defined in (30). If \(n\log n\) is \(o(p)\) and \(n\) is \(\omega(((s+r)\log p)^2)\), then we have:
1. \(\left(\sqrt{n}\left(\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right)_K\right)^\top\Sigma_{\beta K}^{-1}\left(\sqrt{n}\left(\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right)_K\right)\xrightarrow{P}0\).
2. \(\left(\tfrac{1}{\sqrt{n}}W_K^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)^\top\Sigma_{\beta K}^{-1}\left(\tfrac{1}{\sqrt{n}}W_K^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)\xrightarrow{P}0\).
3. \(2\left(\tfrac{1}{\sqrt{n}}W_K^\top\eta\right)^\top\Sigma_{\beta K}^{-1}\left(\sqrt{n}\left(\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right)_K\right)\xrightarrow{P}0\).
4. \(2\left(\tfrac{1}{\sqrt{n}}W_K^\top\eta\right)^\top\Sigma_{\beta K}^{-1}\left(\tfrac{1}{\sqrt{n}}W_K^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)\xrightarrow{P}0\).
5. \(2\left(\sqrt{n}\left(\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right)_K\right)^\top\Sigma_{\beta K}^{-1}\left(\tfrac{1}{\sqrt{n}}W_K^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)\xrightarrow{P}0\).
■
Proof of Lemma 3:
Result 1: We expand the quadratic form and use the individual probabilistic rates derived in [2] to prove the claim. We have
\[
\begin{aligned}
&\left(\sqrt{n}\left(\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right)_K\right)^\top\Sigma_{\beta K}^{-1}\left(\sqrt{n}\left(\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right)_K\right)\\
&=\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}\left(\sqrt{n}\left(\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right)_j\right)\left(\sqrt{n}\left(\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right)_l\right)\\
&\le\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}\left|\sqrt{n}\left(\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right)_j\right|\left|\sqrt{n}\left(\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right)_l\right|\\
&\le\left\|\sqrt{n}\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right\|_\infty^2\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}. \quad (40)
\end{aligned}
\]
We have from (23) of Theorem 3 of [2] that, under the given conditions, \(\left\|\sqrt{n}\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right\|_\infty=o_P(1)\), and from Lemma 5 that \(\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}=O_P(1)\). Since \(k\) is fixed as \(n,p\to\infty\), the proof is complete.
Result 2: We follow the same process as in the proof of Result 1. We have
\[
\begin{aligned}
&\left(\tfrac{1}{\sqrt{n}}W_K^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)^\top\Sigma_{\beta K}^{-1}\left(\tfrac{1}{\sqrt{n}}W_K^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)\\
&=\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}\left(\tfrac{1}{\sqrt{n}}W_j^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)\left(\tfrac{1}{\sqrt{n}}W_l^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)\\
&\le\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}\left|\tfrac{1}{\sqrt{n}}W_j^\top(\delta^*-\hat{\delta}_{\lambda_2})\right|\left|\tfrac{1}{\sqrt{n}}W_l^\top(\delta^*-\hat{\delta}_{\lambda_2})\right|\\
&\le\left\|\tfrac{1}{\sqrt{n}}W^\top(\delta^*-\hat{\delta}_{\lambda_2})\right\|_\infty^2\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}. \quad (41)
\end{aligned}
\]
We have from (24) of Theorem 3 of [2] that, under the given conditions, \(\left\|\tfrac{1}{\sqrt{n}}W^\top(\delta^*-\hat{\delta}_{\lambda_2})\right\|_\infty=o_P(1)\), and from Lemma 5 that \(\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}=O_P(1)\). Since \(k\) is fixed as \(n,p\to\infty\), the proof is complete.
Result 3: Let \(Z\triangleq\Sigma_{\beta K}^{-1/2}\tfrac{1}{\sqrt{n}}W_K^\top\eta\). We have
\[
\begin{aligned}
&2\left(\tfrac{1}{\sqrt{n}}W_K^\top\eta\right)^\top\Sigma_{\beta K}^{-1}\left(\sqrt{n}\left(\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right)_K\right)\\
&=2Z^\top\Sigma_{\beta K}^{-1/2}\left(\sqrt{n}\left(\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right)_K\right)\\
&\le2\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}\,|Z_j|\left|\sqrt{n}\left(\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right)_l\right|\\
&\le2\left\|\sqrt{n}\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right\|_\infty\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}\,|Z_j|. \quad (42)
\end{aligned}
\]
We have from (23) of Theorem 3 of [2] that, under the given conditions, \(\left\|\sqrt{n}\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right\|_\infty=o_P(1)\). Since \(k\) is fixed, each \(|Z_j|=O_P(1)\), and, from Lemma 5, \(\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}=O_P(1)\) as \(n,p\to\infty\), the proof is complete.
Result 4: By arguments similar to (42), we have
\[
\begin{aligned}
&2\left(\tfrac{1}{\sqrt{n}}W_K^\top\eta\right)^\top\Sigma_{\beta K}^{-1}\left(\tfrac{1}{\sqrt{n}}W_K^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)
=2Z^\top\Sigma_{\beta K}^{-1/2}\left(\tfrac{1}{\sqrt{n}}W_K^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)\\
&\le2\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}\,|Z_j|\left|\tfrac{1}{\sqrt{n}}W_l^\top(\delta^*-\hat{\delta}_{\lambda_2})\right|
\le2\left\|\tfrac{1}{\sqrt{n}}W^\top(\delta^*-\hat{\delta}_{\lambda_2})\right\|_\infty\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}\,|Z_j|. \quad (43)
\end{aligned}
\]
We have from (24) of Theorem 3 of [2] that, under the given conditions, \(\left\|\tfrac{1}{\sqrt{n}}W^\top(\delta^*-\hat{\delta}_{\lambda_2})\right\|_\infty=o_P(1)\). Since \(k\) is fixed, each \(|Z_j|=O_P(1)\), and, from Lemma 5, \(\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}=O_P(1)\) as \(n,p\to\infty\), the proof is complete.
Result 5: Expanding the quadratic form, we have
\[
\begin{aligned}
&2\left(\sqrt{n}\left(\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right)_K\right)^\top\Sigma_{\beta K}^{-1}\left(\tfrac{1}{\sqrt{n}}W_K^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)\\
&\le2\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}\left|\tfrac{1}{\sqrt{n}}W_l^\top(\delta^*-\hat{\delta}_{\lambda_2})\right|\left|\sqrt{n}\left(\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right)_j\right|\\
&\le2\left\|\tfrac{1}{\sqrt{n}}W^\top(\delta^*-\hat{\delta}_{\lambda_2})\right\|_\infty\left\|\sqrt{n}\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right\|_\infty\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}.
\end{aligned}
\]
We have from (23) and (24) of Theorem 3 of [2] that, under the given conditions, \(\left\|\tfrac{1}{\sqrt{n}}W^\top(\delta^*-\hat{\delta}_{\lambda_2})\right\|_\infty=o_P(1)\) and \(\left\|\sqrt{n}\left(I_p-\tfrac{1}{n}W^\top A\right)(\beta^*-\hat{\beta}_{\lambda_1})\right\|_\infty=o_P(1)\). Since \(k\) is fixed and, from Lemma 5, \(\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}=O_P(1)\) as \(n,p\to\infty\), the proof is complete. ■
0.
2. \(\left(\tfrac{1}{n}W_L A^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)^\top\Sigma_{\delta L}^{-1}\left(\tfrac{1}{n}W_L A^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)\xrightarrow{P}0\).
3. \(2\left(\left(I_n-\tfrac{1}{n}WA^\top\right)_L\eta\right)^\top\Sigma_{\delta L}^{-1}\left(\left(I_n-\tfrac{1}{n}WA^\top\right)_L A(\beta^*-\hat{\beta}_{\lambda_1})\right)\xrightarrow{P}0\).
4. \(2\left(\left(I_n-\tfrac{1}{n}WA^\top\right)_L\eta\right)^\top\Sigma_{\delta L}^{-1}\left(\tfrac{1}{n}W_L A^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)\xrightarrow{P}0\).
5. \(2\left(\left(I_n-\tfrac{1}{n}WA^\top\right)_L A(\beta^*-\hat{\beta}_{\lambda_1})\right)^\top\Sigma_{\delta L}^{-1}\left(\tfrac{1}{n}W_L A^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)\xrightarrow{P}0\).
■
Proof of Lemma 4:
Result 1: We expand the quadratic form and use the individual probabilistic rates derived in [2] to prove the claim. We have
\[
\begin{aligned}
&\left(\left(I_n-\tfrac{1}{n}WA^\top\right)_L A(\beta^*-\hat{\beta}_{\lambda_1})\right)^\top\Sigma_{\delta L}^{-1}\left(\left(I_n-\tfrac{1}{n}WA^\top\right)_L A(\beta^*-\hat{\beta}_{\lambda_1})\right)\\
&=\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}\left(\left(I_n-\tfrac{1}{n}WA^\top\right)A(\beta^*-\hat{\beta}_{\lambda_1})\right)_i\left(\left(I_n-\tfrac{1}{n}WA^\top\right)A(\beta^*-\hat{\beta}_{\lambda_1})\right)_k\\
&\le\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}\left|\left(\left(I_n-\tfrac{1}{n}WA^\top\right)A(\beta^*-\hat{\beta}_{\lambda_1})\right)_i\right|\left|\left(\left(I_n-\tfrac{1}{n}WA^\top\right)A(\beta^*-\hat{\beta}_{\lambda_1})\right)_k\right|\\
&\le\left(\frac{n}{p\sqrt{1-n/p}}\left\|\left(I_n-\tfrac{1}{n}WA^\top\right)A(\beta^*-\hat{\beta}_{\lambda_1})\right\|_\infty\right)^2\cdot\frac{p^2(1-n/p)}{n^2}\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}. \quad (44)
\end{aligned}
\]
We have from Lemma 5 that \(\frac{p^2(1-n/p)}{n^2}\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}=O_P(1)\). Moreover, from (25) of Theorem 3 of [2], under the given conditions, \(\frac{n}{p\sqrt{1-n/p}}\left\|\left(I_n-\tfrac{1}{n}WA^\top\right)A(\beta^*-\hat{\beta}_{\lambda_1})\right\|_\infty=o_P(1)\). Since \(l\) is fixed as \(n,p\to\infty\), the proof is complete.
Result 2: We follow the same process as in the proof of Result 1. We have
\[
\begin{aligned}
&\left(\tfrac{1}{n}W_L A^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)^\top\Sigma_{\delta L}^{-1}\left(\tfrac{1}{n}W_L A^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)\\
&\le\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}\left|\tfrac{1}{n}W_i A^\top(\delta^*-\hat{\delta}_{\lambda_2})\right|\left|\tfrac{1}{n}W_k A^\top(\delta^*-\hat{\delta}_{\lambda_2})\right|\\
&\le\left(\frac{n}{p\sqrt{1-n/p}}\left\|\tfrac{1}{n}WA^\top(\delta^*-\hat{\delta}_{\lambda_2})\right\|_\infty\right)^2\cdot\frac{p^2(1-n/p)}{n^2}\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}. \quad (45)
\end{aligned}
\]
We have from (26) of Theorem 3 of [2] that, under the given conditions, \(\frac{n}{p\sqrt{1-n/p}}\left\|\tfrac{1}{n}WA^\top(\delta^*-\hat{\delta}_{\lambda_2})\right\|_\infty=o_P(1)\). Since \(l\) is fixed and, from Lemma 5, \(\frac{p^2(1-n/p)}{n^2}\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}=O_P(1)\) as \(n,p\to\infty\), the proof is complete.
Result 3: Recall that \(Z=\Sigma_{\delta L}^{-1/2}\left(I_n-\tfrac{1}{n}WA^\top\right)_L\eta\sim\mathcal{N}_l(0,I_l)\). We have
\[
\begin{aligned}
&2\left(\left(I_n-\tfrac{1}{n}WA^\top\right)_L\eta\right)^\top\Sigma_{\delta L}^{-1}\left(\left(I_n-\tfrac{1}{n}WA^\top\right)_L A(\beta^*-\hat{\beta}_{\lambda_1})\right)\\
&=2Z^\top\Sigma_{\delta L}^{-1/2}\left(\left(I_n-\tfrac{1}{n}WA^\top\right)_L A(\beta^*-\hat{\beta}_{\lambda_1})\right)\\
&\le2\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}\,|Z_i|\left|\left(\left(I_n-\tfrac{1}{n}WA^\top\right)A(\beta^*-\hat{\beta}_{\lambda_1})\right)_k\right|\\
&\le2\left(\frac{n}{p\sqrt{1-n/p}}\left\|\left(I_n-\tfrac{1}{n}WA^\top\right)A(\beta^*-\hat{\beta}_{\lambda_1})\right\|_\infty\right)\cdot\frac{n}{p\sqrt{1-n/p}}\cdot\frac{p^2(1-n/p)}{n^2}\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}\,|Z_i|. \quad (46)
\end{aligned}
\]
We have from (25) of Theorem 3 of [2] that, under the given conditions, \(\frac{n}{p\sqrt{1-n/p}}\left\|\left(I_n-\tfrac{1}{n}WA^\top\right)A(\beta^*-\hat{\beta}_{\lambda_1})\right\|_\infty=o_P(1)\). Since \(l\) is fixed, each \(|Z_i|=O_P(1)\), \(\frac{n}{p\sqrt{1-n/p}}\to0\) (as \(n\log n\) is \(o(p)\)), and, from Lemma 5, \(\frac{p^2(1-n/p)}{n^2}\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}=O_P(1)\) as \(n,p\to\infty\), the proof is complete.
Result 4: By arguments similar to (46), we have
\[
\begin{aligned}
&2\left(\left(I_n-\tfrac{1}{n}WA^\top\right)_L\eta\right)^\top\Sigma_{\delta L}^{-1}\left(\tfrac{1}{n}W_L A^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)
=2Z^\top\Sigma_{\delta L}^{-1/2}\left(\tfrac{1}{n}W_L A^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)\\
&\le2\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}\,|Z_i|\left|\tfrac{1}{n}W_k A^\top(\delta^*-\hat{\delta}_{\lambda_2})\right|\\
&\le2\left(\frac{n}{p\sqrt{1-n/p}}\left\|\tfrac{1}{n}WA^\top(\delta^*-\hat{\delta}_{\lambda_2})\right\|_\infty\right)\cdot\frac{n}{p\sqrt{1-n/p}}\cdot\frac{p^2(1-n/p)}{n^2}\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}\,|Z_i|. \quad (47)
\end{aligned}
\]
We have from (26) of Theorem 3 of [2] that, under the given conditions, \(\frac{n}{p\sqrt{1-n/p}}\left\|\tfrac{1}{n}WA^\top(\delta^*-\hat{\delta}_{\lambda_2})\right\|_\infty=o_P(1)\). Since \(l\) is fixed, each \(|Z_i|=O_P(1)\), \(\frac{n}{p\sqrt{1-n/p}}\to0\), and, from Lemma 5, \(\frac{p^2(1-n/p)}{n^2}\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}=O_P(1)\) as \(n,p\to\infty\), the proof is complete.
Result 5: Expanding the quadratic form, we have
\[
\begin{aligned}
&2\left(\left(I_n-\tfrac{1}{n}WA^\top\right)_L A(\beta^*-\hat{\beta}_{\lambda_1})\right)^\top\Sigma_{\delta L}^{-1}\left(\tfrac{1}{n}W_L A^\top(\delta^*-\hat{\delta}_{\lambda_2})\right)\\
&\le2\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}\left|\left(\left(I_n-\tfrac{1}{n}WA^\top\right)A(\beta^*-\hat{\beta}_{\lambda_1})\right)_i\right|\left|\tfrac{1}{n}W_k A^\top(\delta^*-\hat{\delta}_{\lambda_2})\right|\\
&\le2\left(\frac{n}{p\sqrt{1-n/p}}\left\|\left(I_n-\tfrac{1}{n}WA^\top\right)A(\beta^*-\hat{\beta}_{\lambda_1})\right\|_\infty\right)\left(\frac{n}{p\sqrt{1-n/p}}\left\|\tfrac{1}{n}WA^\top(\delta^*-\hat{\delta}_{\lambda_2})\right\|_\infty\right)\cdot\frac{p^2(1-n/p)}{n^2}\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}.
\end{aligned}
\]
We have from (25) and (26) of Theorem 3 of [2] that, under the given conditions, both scaled sup-norms above are \(o_P(1)\). Since \(l\) is fixed and, from Lemma 5, \(\frac{p^2(1-n/p)}{n^2}\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}=O_P(1)\) as \(n,p\to\infty\), the proof is complete.
Proof of Lemma 5:
Result 1: To prove this, we will exploit singular value bounds for the Rademacher matrix. First, we bound the sum by the maximal element:
\[
\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}\le k^2\left\|\Sigma_{\beta K}^{-1}\right\|_\infty, \quad (50)
\]
\[
\left\|\Sigma_{\beta K}^{-1}\right\|_\infty\le\left\|\Sigma_{\beta K}^{-1}\right\|_2. \quad (51)
\]
Note that, by definition, we have
\[
\left\|\Sigma_{\beta K}^{-1}\right\|_2=\sigma_{\max}\left(\Sigma_{\beta K}^{-1}\right)=\frac{1}{\sigma_{\min}(\Sigma_{\beta K})}=\frac{1}{\frac{c_{opt}^2}{n}\,\sigma_{\min}(A_K)^2}. \quad (52)
\]
Note that, for an arbitrary \(\epsilon_1>0\), using Theorem 1.1 of [4] for the mean-zero sub-Gaussian random matrix \(A\), we have
\[
P\left(\frac{c_{opt}}{\sqrt{n}}\,\sigma_{\min}(A_K)\le c_{opt}\epsilon_1\left(1-\sqrt{(k-1)/n}\right)\right)\le(c_6\epsilon_1)^{n-k+1}+(c_5)^p, \quad (53)
\]
where \(c_6>0\) and \(c_5\in(0,1)\) are constants depending on the sub-Gaussian norm of the entries of \(A\). For some small constant \(\psi\in(0,1)\), let \(\epsilon_1 c_6\triangleq\psi\). We then have
\[
P\left(\sigma_{\max}\left(\Sigma_{\beta K}^{-1}\right)\le\frac{c_6^2}{c_{opt}^2\psi^2\left(1-\sqrt{(k-1)/n}\right)^2}\right)\ge1-\left(\psi^{n-k+1}+(c_5)^p\right). \quad (54)
\]
Therefore, using (54), (52), (51) and (50), we get
\[
P\left(\sum_{j\in K}\sum_{l\in K}[\Sigma_{\beta K}^{-1}]_{lj}\le\frac{c_6^2 k^2}{\psi^2 c_{opt}^2\left(1-\sqrt{(k-1)/n}\right)^2}\right)\ge1-\left(\psi^{n-k+1}+(c_5)^p\right). \quad (55)
\]
This completes the proof of Result 1.
Result 2: We approach this proof in the same way as the proof of Result 1. Using the same steps as in (50), (51) and (52), we have
\[
\frac{p^2(1-n/p)}{n^2}\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}\le\frac{p^2(1-n/p)}{n^2}\cdot\frac{l^2}{\sigma_{\min}\left(\left(WA^\top/n-I_n\right)_L\right)^2}. \quad (56)
\]
For an arbitrary \(\epsilon_1>0\), using Theorem 1.1 of [4] for the mean-zero sub-Gaussian random matrix \(A\), we have
\[
P\left(\frac{c_{opt}}{\sqrt{p}}\,\sigma_{\min}(A^\top)\le c_{opt}\epsilon_1\left(1-\sqrt{n/p}\right)\right)\le(c_6\epsilon_1)^{n-k+1}+(c_5)^p, \quad (57)
\]
where \(c_6>0\) and \(c_5\in(0,1)\) are constants depending on the sub-Gaussian norm of the entries of \(A\). Since \(WA^\top/n=c_{opt}AA^\top/n\), we have
\[
P\left(\sigma_{\min}\left(WA^\top/n\right)\le c_{opt}\epsilon_1^2\left(\sqrt{p/n}-1\right)^2\right)\le(c_6\epsilon_1)^{n-k+1}+(c_5)^p. \quad (58)
\]
Further, since \(\sigma_{\min}\left(WA^\top/n-I_n\right)\ge\sigma_{\min}\left(WA^\top/n\right)-1\) and \(\sigma_{\min}\left(\left(WA^\top/n-I_n\right)_L\right)\ge\sigma_{\min}\left(WA^\top/n-I_n\right)\), we have
\[
P\left(\sigma_{\min}\left(\left(WA^\top/n-I_n\right)_L\right)\le c_{opt}\epsilon_1^2\left(\sqrt{p/n}-1\right)^2-1\right)\le(c_6\epsilon_1)^{n-k+1}+(c_5)^p. \quad (59)
\]
For some small constant \(\psi\in(0,1)\), let \(\epsilon_1 c_6\triangleq\psi\). Then, from (56) and (59), we have
\[
P\left(\frac{p^2(1-n/p)}{n^2}\sum_{i\in L}\sum_{k\in L}[\Sigma_{\delta L}^{-1}]_{ik}\le\frac{p^2(1-n/p)}{n^2}\cdot\frac{l^2}{\left(c_{opt}\epsilon_1^2(\sqrt{p/n}-1)^2-1\right)^2}\right)\ge1-\left\{(c_6\epsilon_1)^{n-k+1}+(c_5)^p\right\}. \quad (60)
\]
This completes the proof. ■
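Lemma 5 hinges on the smallest singular value of a tall random matrix staying bounded away from zero, so that \(\sigma_{\max}(\Sigma_{\beta K}^{-1})=n/(c_{opt}^2\sigma_{\min}(A_K)^2)\) remains \(O_P(1)\). A quick empirical sketch of that concentration for a Gaussian \(n\times k\) block (all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, trials = 400, 5, 200

# sigma_min(A_K)/sqrt(n) concentrates near 1 - sqrt(k/n) for an n x k Gaussian
# block with k fixed, i.e. it stays well away from 0 as n grows.
smins = np.array([
    np.linalg.svd(rng.standard_normal((n, k)), compute_uv=False)[-1] / np.sqrt(n)
    for _ in range(trials)
])
print(smins.min(), 1 - np.sqrt(k / n))
```

The minimum over all trials stays far above zero, matching the high-probability lower bound (53).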
\[
\tilde{A}=A+\Delta_1 A,
\]
\[
y=\hat{A}\beta^*+\eta=\tilde{A}\beta^*+(\Delta A-\Delta_1 A)\beta^*+\eta=\tilde{A}\beta^*+\tilde{\delta}^*+\eta, \quad (61)
\]
where \(\tilde{\delta}^*\triangleq(\Delta A-\Delta_1 A)\beta^*\) is the post-correction bit-flip vector. Here, \(\tilde{\delta}^*\) is also a sparse vector; however, its sparsity level is a random quantity given by \(\tilde{r}=\|\tilde{\delta}^*\|_0\). Here also, we estimate \(\beta^*\) and \(\tilde{\delta}^*\) using the Robust Lasso estimator, given as follows:
\[
\begin{pmatrix}\tilde{\beta}_{\lambda_1}\\\tilde{\delta}_{\lambda_2}\end{pmatrix}=\arg\min_{\beta,\delta}\ \frac{1}{2n}\left\|y-\tilde{A}\beta-\delta\right\|_2^2+\lambda_1\|\beta\|_1+\lambda_2\|\delta\|_1. \quad (62)
\]
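Problem (62) is a Lasso on the augmented design \([\tilde{A}\,|\,I_n]\) with separate penalties on \(\beta\) and \(\delta\). A hedged proximal-gradient (ISTA) sketch of one valid solver — not necessarily the solver used by the authors; all sizes, penalty values, and signal magnitudes below are illustrative:

```python
import numpy as np

def soft(x, t):
    """Entrywise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def robust_lasso(y, A, lam1, lam2, n_iter=6000):
    """ISTA for (1/2n)||y - A b - d||^2 + lam1 ||b||_1 + lam2 ||d||_1,
    run on the augmented design [A | I_n]."""
    n, p = A.shape
    B = np.hstack([A, np.eye(n)])
    L = np.linalg.norm(B, 2) ** 2 / n                  # Lipschitz constant of the gradient
    thresh = np.concatenate([np.full(p, lam1), np.full(n, lam2)]) / L
    theta = np.zeros(p + n)
    for _ in range(n_iter):
        grad = B.T @ (B @ theta - y) / n
        theta = soft(theta - grad / L, thresh)
    return theta[:p], theta[p:]                        # (beta_hat, delta_hat)

# Toy instance: sparse beta*, sparse bit-flip corruption delta*.
rng = np.random.default_rng(2)
n, p, s, r = 120, 300, 4, 3
A = rng.choice([-1.0, 1.0], size=(n, p))               # Rademacher pooling matrix
beta = np.zeros(p); beta[:s] = 2.0
delta = np.zeros(n); delta[:r] = 10.0
y = A @ beta + delta + 0.1 * rng.standard_normal(n)

bh, dh = robust_lasso(y, A, lam1=0.15, lam2=0.02)
print(np.linalg.norm(bh - beta) / np.linalg.norm(beta))
```

Note the design choice: the identity columns have unit norm while the columns of \(A\) have norm \(\sqrt{n}\), so \(\lambda_2\) must be much smaller than \(\lambda_1\) for the corruption entries to activate.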
hypothesis test. Let \(k\) be the cardinality of \(J\). In order to show \(\tilde{r}<r\) with high probability, we will show that \(\tilde{R}\subset R\) with high probability, as \(\tilde{R}\subset R\) implies \(\tilde{r}<r\). Let us first consider the power of the test, which we denote by \(\gamma_{n,p}\).
Note that, from Result 2 of Theorem 2 of [2], \(\frac{n}{p\sqrt{1-n/p}}\,\sigma\sqrt{\Sigma_{\delta,ii}}\to\sigma\) as \(n,p\to\infty\). Furthermore, \(z_{\alpha/2}\) is a constant for a given \(\alpha\). In order to ensure that the power \(\gamma_{n,p}\) goes to 1 as \(n,p\to\infty\), we need
\[
\frac{\frac{n}{p\sqrt{1-n/p}}\,|\delta_i^*|}{\frac{n}{p\sqrt{1-n/p}}\,\sigma\sqrt{\Sigma_{\delta,ii}}}\to\infty\quad\text{as }n,p\to\infty.
\]
This implies we need \(\frac{n}{p\sqrt{1-n/p}}\cdot\frac{|\delta_i^*|}{\sigma}\to\infty\) as \(n,p\to\infty\). Hence, under the condition \(\min_{i\in R}|\delta_i^*|=\omega\left(\frac{p\sqrt{1-n/p}}{n}\,\sigma\right)\), we have
\[
\lim_{n,p\to\infty}\gamma_{n,p}=\lim_{n,p\to\infty}\ 1-\left\{\Phi\left(z_{\alpha/2}-\frac{\frac{n}{p\sqrt{1-n/p}}\,\delta_i^*}{\frac{n}{p\sqrt{1-n/p}}\,\sigma\sqrt{\Sigma_{\delta,ii}}}\right)-\Phi\left(-z_{\alpha/2}-\frac{\frac{n}{p\sqrt{1-n/p}}\,\delta_i^*}{\frac{n}{p\sqrt{1-n/p}}\,\sigma\sqrt{\Sigma_{\delta,ii}}}\right)\right\}=1.
\]
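The limiting behaviour of the power can be sanity-checked numerically. A hedged sketch of the two-sided test power, using only the Python standard library; the noncentrality simplification \(m=c\,\delta_i^*/\sigma\) uses the stated limit of the denominator, and all numbers (\(n\), \(p\), \(\alpha\), the \(\delta_i^*\) values) are illustrative:

```python
from math import sqrt
from statistics import NormalDist

Phi = NormalDist().cdf

def power(delta_i, n, p, sigma=1.0, alpha=0.01):
    """gamma_{n,p} = 1 - {Phi(z - m) - Phi(-z - m)} with noncentrality
    m = c * delta_i / sigma, where c = n / (p * sqrt(1 - n/p))."""
    c = n / (p * sqrt(1 - n / p))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    m = c * delta_i / sigma
    return 1 - (Phi(z - m) - Phi(-z - m))

n, p = 500, 20000
# At delta = 0 the "power" equals the level alpha; it rises to 1 as
# delta grows faster than 1/c, matching the omega(...) condition above.
print(power(0.0, n, p), power(100.0, n, p), power(400.0, n, p))
```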
Algorithm 7 Correction for bit-flips following the Random Switch Model
(RSM) using W = A
Require: Measurement vector y, pooling matrix A, Lasso estimate β̂init , λ
and the set J of corrupted measurements estimated by ODrlt method
where
\[
\hat{\beta}_{\text{init}}=\arg\min_\beta\ \frac{1}{2n}\left\|y_{J^c}-A_{J^c}\beta\right\|_2^2+\lambda\|\beta\|_1. \quad (65)
\]
20: end if
21: end for
22: end for
23: end for
24: return à = A.
8.2 Rough notes
We will now evaluate the probability that the set of indices with effective bit-flips
post-correction is a subset of the set of indices with effective bit-flips initially.
This event is represented by R̃ ⊆ R. To find the probability of R̃ ⊆ R, we
condition it on the event that all the effective bit-flips were detected in the
detection stage by the marginal tests i.e., we condition R̃ ⊆ R on R ⊆ J.
Hence, using the law of total probability, we obtain (68).
We will now evaluate both the probabilities on the R.H.S. of (68) separately.
Note that \(\{R\subseteq J\}\) means that, for every index belonging to \(R\), the corresponding test rejects. Hence, the event \(\{R\subseteq J\}\) is equivalent to the event \(\{R\cap J^c=\emptyset\}\), i.e., none of the elements of \(R\) belongs to \(J^c\). This implies that the event \(\{R\cap J^c=\emptyset\}\) is equivalent to \(\left(\bigcup_{i\in R}\left\{|T_{H,i}|\le z_{\alpha/2}\,\middle|\,\delta_i^*\ne0\right\}\right)^c=\bigcap_{i\in R}\left\{|T_{H,i}|>z_{\alpha/2}\,\middle|\,\delta_i^*\ne0\right\}\). Hence, we have
\[
\begin{aligned}
P(R\subseteq J)&=P\left(\bigcap_{i\in R}\left\{|T_{H,i}|>z_{\alpha/2}\,\middle|\,\delta_i^*\ne0\right\}\right)=1-P\left(\bigcup_{i\in R}\left\{|T_{H,i}|\le z_{\alpha/2}\,\middle|\,\delta_i^*\ne0\right\}\right)\\
&\ge1-\sum_{i\in R}P\left(|T_{H,i}|\le z_{\alpha/2}\,\middle|\,\delta_i^*\ne0\right)=1-r(1-\gamma_{n,p}). \quad (69)
\end{aligned}
\]
Here, \(\gamma_{n,p}\) is the power of the test for a given \(n,p\). Now we evaluate \(P(\tilde{R}\subseteq R\,|\,R\subseteq J)\). Using Lemma 6, we have
\[
P(\tilde{R}\subseteq R\,|\,R\subseteq J)=P(\tilde{R}\cap R^c\cap J=\emptyset). \quad (70)
\]
We now define some notation. Recall that in the correction algorithm, Alg. 3, for each \(i\in J\) and for all \(j_1\in[p-1]\), \(j_2=j_1+1,\ldots,p\), we swap the elements \(a_{ij_1}\) and \(a_{ij_2}\). Let us denote the new bit-flip error at this location by \(\delta_i^*(j_1,j_2)\). We then perform Lasso followed by debiasing to obtain the debiased Lasso estimate and its corresponding test statistic, denoted \(T_{H,i}(j_1,j_2)\). Note that here we assume that a wrong swap creates a non-zero bit-flip error \(\delta_i^*(j_1,j_2)\). Lastly, define the set of tuples \(K=\{(j_1,j_2):j_1\in[p-1],\ j_2\in\{j_1+1,\ldots,p\},\ a_{ij_1}\ne a_{ij_2},\ \text{and}\ \beta_{j_1}^*\ne0\ \text{or}\ \beta_{j_2}^*\ne0\}\).
Note that the event \(\{\tilde{R}\cap R^c\cap J=\emptyset\}\) implies that none of the elements of \(J\setminus R\) is in \(\tilde{R}\). Hence, the event \(\{\tilde{R}\cap R^c\cap J=\emptyset\}\) implies that, for all \(i\in J\setminus R\), the hypothesis test w.r.t. \(T_{H,i}(j_1,j_2)\) rejects for all \((j_1,j_2)\). Hence, we have
\[
\begin{aligned}
P(\tilde{R}\cap R^c\cap J=\emptyset)&=P\left(\bigcap_{i\in J\setminus R}\ \bigcap_{(j_1,j_2)\in K}\left\{|T_{H,i}(j_1,j_2)|>z_{\alpha/2}\,\middle|\,\delta_i^*(j_1,j_2)\ne0\right\}\right)\\
&=1-P\left(\bigcup_{i\in J\setminus R}\ \bigcup_{(j_1,j_2)\in K}\left\{|T_{H,i}(j_1,j_2)|\le z_{\alpha/2}\,\middle|\,\delta_i^*(j_1,j_2)\ne0\right\}\right)\\
&\ge1-\sum_{i\in J\setminus R}\sum_{(j_1,j_2)\in K}P\left(|T_{H,i}(j_1,j_2)|\le z_{\alpha/2}\,\middle|\,\delta_i^*(j_1,j_2)\ne0\right)\\
&=1-\sum_{i\in J\setminus R}\sum_{(j_1,j_2)\in K}(1-\gamma_{n,p})\\
&\ge1-\frac{(p-1)(p-2)}{2}(k-r)(1-\gamma_{n,p}). \quad (71)
\end{aligned}
\]
Combining (68), (69) and (71), we get
\[
P(\tilde{R}\subseteq R)\ge\left\{1-r(1-\gamma_{n,p})\right\}\left\{1-\frac{(p-1)(p-2)}{2}(k-r)(1-\gamma_{n,p})\right\}. \quad (72)
\]
Hence, provided \(1-\gamma_{n,p}\) decays faster than \(p^{-2}\), we have \(P(\tilde{R}\subseteq R)\to1\) as \(n,p\to\infty\).
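A quick numeric reading of the bound (72): if \(1-\gamma_{n,p}\) decays faster than \(p^{-2}\), the right-hand side tends to 1. The decay rate \(p^{-3}\) and the values of \(r\) and \(k\) below are purely illustrative:

```python
def rhs_72(p, r, k, gamma):
    """Right-hand side of the bound (72)."""
    miss = 1.0 - gamma
    return (1 - r * miss) * (1 - (p - 1) * (p - 2) / 2 * (k - r) * miss)

# With 1 - gamma = p^(-3), the bound increases toward 1 as p grows.
for p in (10, 100, 1000):
    print(p, rhs_72(p, r=3, k=5, gamma=1 - p ** -3.0))
```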
Proof of Lemma 6: We show that, given \(B\subseteq C\) and \(A\subseteq C\), \(A\subseteq B\) holds if and only if \((C\setminus B)\cap A=\emptyset\). (In our setting, \(A=\tilde{R}\), \(B=R\), \(C=J\), and \(\tilde{R}\subseteq J\) holds by construction.)
(⇐) Assume \((C\setminus B)\cap A=\emptyset\), i.e., no element of \(A\) belongs to \(C\setminus B\). Take any \(x\in A\). Since \(A\subseteq C\), we have \(x\in C\), and since \(x\notin C\setminus B\), it must be that \(x\in B\). Hence \(A\subseteq B\).
(⇒) Assume \(A\subseteq B\). Since \(B\) and \(C\setminus B\) are disjoint, no element of \(A\) can belong to \(C\setminus B\). Therefore \((C\setminus B)\cap A=\emptyset\).
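The set identity in Lemma 6 can be brute-force checked on a small universe. Note the check also imposes \(A\subseteq C\) (which holds in our setting, since \(\tilde{R}\subseteq J\)); without it, the "if" direction can fail for elements of \(A\) outside \(C\):

```python
def subsets(universe):
    """Yield every subset of a finite universe via bitmasks."""
    us = list(universe)
    for mask in range(1 << len(us)):
        yield {us[i] for i in range(len(us)) if (mask >> i) & 1}

U = {0, 1, 2, 3}
# For all triples with B <= C and A <= C:  A <= B  <=>  (C \ B) & A == {}
ok = all(
    (A <= B) == (not ((C - B) & A))
    for C in subsets(U)
    for B in subsets(U) if B <= C
    for A in subsets(U) if A <= C
)
print(ok)  # True
```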
9 Miscl.
Let \(z=a+ib=r_z e^{i\theta_z}\) and \(w=c+id=r_w e^{i\theta_w}\), where \(r_z=\sqrt{a^2+b^2}\) and \(r_w=\sqrt{c^2+d^2}\) represent the magnitudes of \(z\) and \(w\), and \(\theta_z,\theta_w\) are their respective phases.
The squared magnitude of the difference is given by
\[
|z-w|^2=\left|r_z e^{i\theta_z}\right|^2+\left|r_w e^{i\theta_w}\right|^2-2\,\mathrm{Re}\left(r_z e^{i\theta_z}\cdot\overline{r_w e^{i\theta_w}}\right).
\]
Simplifying each term, we have \(\left|r_z e^{i\theta_z}\right|^2=r_z^2\) and \(\left|r_w e^{i\theta_w}\right|^2=r_w^2\). The cross term involves the complex conjugate:
\[
\mathrm{Re}\left(r_z e^{i\theta_z}\cdot\overline{r_w e^{i\theta_w}}\right)=\mathrm{Re}\left(r_z r_w e^{i(\theta_z-\theta_w)}\right)=r_z r_w\cos(\theta_z-\theta_w).
\]
Hence,
\[
|z-w|^2=r_z^2+r_w^2-2r_z r_w\cos(\theta_z-\theta_w).
\]
Finally, using \(r_z=\sqrt{a^2+b^2}\) and \(r_w=\sqrt{c^2+d^2}\), we have
\[
|z-w|^2=(a^2+b^2)+(c^2+d^2)-2\sqrt{a^2+b^2}\sqrt{c^2+d^2}\cos(\theta_z-\theta_w).
\]
This expression shows that the phase difference \(\theta_z-\theta_w\) is explicitly captured in the term \(\cos(\theta_z-\theta_w)\).
We now show that
\[
|z-w|^2=r_z^2+r_w^2-2r_z r_w\cos(\theta_z-\theta_w),
\]
where \(r_z=\sqrt{a^2+b^2}\), \(r_w=\sqrt{c^2+d^2}\), and the phases are \(\theta_z=\arctan(b/a)\) and \(\theta_w=\arctan(d/c)\). In Cartesian form, \(z-w=(a-c)+i(b-d)\), and the squared magnitude of \(u+iv\) is \(|u+iv|^2=u^2+v^2\). In the polar form, the cosine of the phase difference is
\[
\cos(\theta_z-\theta_w)=\frac{\mathrm{Re}(z\bar{w})}{|z|\,|w|}=\frac{ac+bd}{\sqrt{a^2+b^2}\sqrt{c^2+d^2}}.
\]
Substituting this into the polar form and simplifying,
\[
|z-w|^2=(a^2+b^2)+(c^2+d^2)-2\sqrt{a^2+b^2}\sqrt{c^2+d^2}\cdot\frac{ac+bd}{\sqrt{a^2+b^2}\sqrt{c^2+d^2}}.
\]
Cancelling the square roots and combining terms,
\[
|z-w|^2=(a^2+b^2)+(c^2+d^2)-2(ac+bd)=(a-c)^2+(b-d)^2.
\]
This shows the equivalence between the polar and Cartesian forms.
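The equivalence between the two forms can be verified numerically with the standard library (random test points are illustrative):

```python
import cmath
import math
import random

random.seed(3)
for _ in range(200):
    z = complex(random.uniform(-5, 5), random.uniform(-5, 5))
    w = complex(random.uniform(-5, 5), random.uniform(-5, 5))
    rz, tz = cmath.polar(z)                 # (magnitude, phase)
    rw, tw = cmath.polar(w)
    polar = rz**2 + rw**2 - 2 * rz * rw * math.cos(tz - tw)
    cart = (z.real - w.real)**2 + (z.imag - w.imag)**2
    assert math.isclose(polar, cart, rel_tol=1e-9, abs_tol=1e-9)
print("polar and Cartesian forms agree")
```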
where \(f(\cdot)\) and \(g(\cdot)\) are convex functions, and \(A\in\mathbb{R}^{m\times d}\) is a matrix, is given by
\[
\max_{u}\ -f^*\left(\frac{Au}{n}\right)-g^*(-u),
\]
where \(f^*(\cdot)\) and \(g^*(\cdot)\) are the Fenchel conjugates of \(f(\cdot)\) and \(g(\cdot)\), respectively.
Introduce an auxiliary variable \(v=\frac{1}{n}A^\top w\), transforming the problem into
\[
\min_{w,v}\ f(w)+g(v)\quad\text{s.t.}\quad v=\frac{1}{n}A^\top w.
\]
The Lagrangian for this problem is
\[
L(w,v,u)=f(w)+g(v)+u^\top\left(v-\frac{1}{n}A^\top w\right).
\]
Minimising over \(w\) gives \(\inf_w\left\{f(w)-\frac{1}{n}(Au)^\top w\right\}=-f^*\left(\frac{Au}{n}\right)\), and minimising over \(v\) gives \(\inf_v\left\{g(v)+u^\top v\right\}=-\sup_v\left\{(-u)^\top v-g(v)\right\}=-g^*(-u)\).
Step 3: Dual problem. Combining the results from the two steps, the dual function is
\[
D(u)=-f^*\left(\frac{Au}{n}\right)-g^*(-u).
\]
10 Optimal solution for W - looser assumption
Theorem 3 Let \(A\) be an \(n\times p\) matrix with \(d=\min_{j\in[p]}\left\{\frac{\|a_{.j}\|_2^2}{n}\right\}\), and let the coherence of \(A\) be
\[
\nu(A)=\max_{i\ne j}\frac{1}{n}\left|\langle a_{.i},a_{.j}\rangle\right|.
\]
Consider the optimisation problem given in (3):
\[
\min_w\ \|w\|_2\quad\text{subject to}\quad\left\|\frac{1}{n}A^\intercal w-e_j\right\|_\infty\le\mu,
\]
and assume that \(1>\mu\ge\frac{\nu(A)}{d+\nu(A)}\). Then \(w_j=\frac{n(1-\mu)}{\|a_{.j}\|_2^2}\,a_{.j}\) is an optimal solution.
Proof of Theorem 3:
Primal feasibility: From the assumption \(1>\mu\ge\frac{\nu}{d+\nu}\), we have \(d\mu+\mu\nu\ge\nu\), which implies \((1-\mu)\nu/d\le\mu\). Hence, this choice of \(w_j\) is primal feasible, since
\[
\left\|\frac{1}{n}A^\intercal\,\frac{n(1-\mu)}{\|a_{.j}\|_2^2}\,a_{.j}-e_j\right\|_\infty\le\max\left\{|\mu|,\ |(1-\mu)\nu/d|\right\}=\mu.
\]
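The feasibility argument can be checked numerically for a random Gaussian \(A\) (the sizes are illustrative, and \(\mu\) is set to the boundary value \(\nu/(d+\nu)\) allowed by the assumption):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 400
A = rng.standard_normal((n, p))

d = (np.linalg.norm(A, axis=0) ** 2 / n).min()   # d = min_j ||a_j||^2 / n
G = np.abs(A.T @ A) / n
np.fill_diagonal(G, 0)
nu = G.max()                                     # coherence nu(A)
mu = nu / (d + nu)                               # boundary of the assumed range

j = 0
a_j = A[:, j]
w_j = n * (1 - mu) / (a_j @ a_j) * a_j           # candidate optimal solution
e_j = np.zeros(p); e_j[j] = 1.0
resid = np.abs(A.T @ w_j / n - e_j).max()        # should be <= mu
print(resid, mu)
```

The \(j\)-th entry of the constraint residual equals \(\mu\) exactly, and every off-diagonal entry is at most \((1-\mu)\nu/d\le\mu\), so the printed residual matches \(\mu\) up to floating point.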
The Fenchel dual problem: One way to derive the dual is via Fenchel duality, which applies to an optimisation problem of the form
\[
\min_w\ f(w)+g\left(\frac{1}{n}A^\intercal w\right).
\]
Here we have
\[
f^*(u)=\sup_w\ \langle u,w\rangle-f(w)=\frac{n}{4}\|u\|^2
\]
and
\[
g^*(u)=\sup_w\ \langle u,w\rangle-g(w)=\sup_{\|w-e_j\|_\infty\le\mu}\langle u,w\rangle=u_j+\mu\|u\|_1.
\]
Dual feasibility: The point \(y=\frac{2(1-\mu)}{\|a_{.j}\|_2^2/n}\,e_j\) is feasible for the dual (trivially).
\[
P(L\ge c)\ge1-\frac{1}{p}. \quad (73)
\]
■
Proof of Theorem 4: Given \(L=\min_{j\in[p]}\left\{\frac{\|a_{.j}\|_2^2}{n}\right\}\), let us define, for all \(j\in[p]\), \(U_j=\frac{1}{n}\|a_{.j}\|_2^2=\frac{1}{n}\sum_{i=1}^n a_{ij}^2\). Since \(a_{ij}\sim\mathcal{N}(0,1)\) i.i.d. for all \(i\in[n],j\in[p]\), we have \(a_{ij}^2\sim\chi_1^2\) i.i.d., and therefore \(Y_j\triangleq\sum_{i=1}^n a_{ij}^2\sim\chi_n^2\) for all \(j\in[p]\). Thus \(U_j=\frac{1}{n}Y_j\).
Recall that the moment generating function (M.G.F.) of a \(\chi_n^2\) random variable is given by
\[
M_{Y_j}(\lambda)=(1-2\lambda)^{-n/2};\quad\lambda<1/2. \quad (74)
\]
For \(\lambda>0\) and \(\epsilon\in(0,1)\), using Markov's inequality, we have for \(j\in[p]\),
\[
P(U_j\le1-\epsilon)=P\left(e^{-\lambda Y_j}\ge e^{-\lambda n(1-\epsilon)}\right)\le e^{\lambda n(1-\epsilon)}\,\mathbb{E}\left[e^{-\lambda Y_j}\right]=e^{\lambda n(1-\epsilon)}(1+2\lambda)^{-n/2}. \quad (75)
\]
We now minimise the upper bound in (75) w.r.t. \(\lambda\) to tighten it. Let \(h(\lambda)\triangleq e^{\lambda n(1-\epsilon)}(1+2\lambda)^{-n/2}\) and define \(g(\lambda)\triangleq\ln(h(\lambda))=\lambda n(1-\epsilon)-\frac{n}{2}\ln(1+2\lambda)\). Note that, since \(\ln(\cdot)\) is a monotonically increasing function, minimising \(h(\lambda)\) w.r.t. \(\lambda\) is equivalent to minimising \(g(\lambda)\) w.r.t. \(\lambda\). Hence, setting \(g'(\lambda)=0\), we have
\[
g'(\lambda)=0\implies n(1-\epsilon)-\frac{n}{2}\cdot\frac{2}{1+2\lambda}=0\implies\hat{\lambda}=\frac{1}{2}\left(\frac{1}{1-\epsilon}-1\right)=\frac{\epsilon}{2(1-\epsilon)}. \quad (76)
\]
Using the bound in (79), we will obtain a lower bound on \(L=\min_{j\in[p]}\left\{\frac{\|a_{.j}\|_2^2}{n}\right\}=\min_{j\in[p]}U_j\). Recall that the \(U_j\) are independently and identically distributed for all \(j\in[p]\). Hence, for some constant \(l\in(0,1)\), we have
\[
P(L\ge l)=P\left(\min_{j\in[p]}U_j\ge l\right)=P(U_1\ge l,U_2\ge l,\ldots,U_p\ge l)=\prod_{j=1}^p P(U_j\ge l)=[P(U_1\ge l)]^p=(1-\delta)^p. \quad (80)
\]
Taking \(l=1-2\sqrt{\frac{\ln(1/\delta)}{n}}\) in (80), for arbitrarily small \(\delta\), we have
\[
P\left(L\ge1-2\sqrt{\frac{\ln(1/\delta)}{n}}\right)\ge(1-\delta)^p\approx1-p\delta. \quad (81)
\]
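The bound (81) is easy to probe empirically for Gaussian matrices (the sizes, \(\delta\), and trial count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, trials = 2000, 50, 200
delta = 1e-6                                   # so p * delta = 5e-5 is tiny
bound = 1 - 2 * np.sqrt(np.log(1 / delta) / n)

hits = 0
for _ in range(trials):
    A = rng.standard_normal((n, p))
    L = (np.linalg.norm(A, axis=0) ** 2 / n).min()   # min column norm^2 / n
    hits += (L >= bound)
print(hits / trials, bound)
```

With these values the bound is about 0.83, while each \(\chi^2_n/n\) column concentrates within a few percent of 1, so the bound should hold in essentially every trial.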
Now, for some constant \(c\in(0,1)\), we have the assumption \(n\ge\frac{8\ln(p)}{(1-c)^2}\). This assumption implies \(c\le1-2\sqrt{\frac{2\ln(p)}{n}}\). Hence, from (82), we have
\[
P(L\ge c)\ge1-\frac{1}{p}. \quad (83)
\]
This completes the proof. ■
Lemma 8 Let \(A\) be an \(n\times p\) random matrix with each of its elements independently and identically distributed from the standard normal distribution. Define \(\nu(A)=\max_{l\ne j\in[p]}\frac{\left|a_{.l}^\top a_{.j}\right|}{n}\), where \(a_{.j}\) is the \(j\)th column of \(A\). Then, for any \(\delta>0\),
\[
P\left(\nu(A)\ge4\sqrt{\frac{2\ln(p/\delta)}{n}}\right)\le\delta^2. \quad (84)
\]
■
Proof of Lemma 8: Let us define, for all \(l\ne j\in[p]\), \(U_{lj}=\frac{a_{.l}^\top a_{.j}}{n}=\frac{1}{n}\sum_{i=1}^n a_{il}a_{ij}\). Hence, \(\nu(A)=\max_{l\ne j}|U_{lj}|\). We will first find a tail bound on \(U_{lj}\). From Lemma 9, we have that the products \(a_{il}a_{ij}\) are sub-exponentially distributed with parameters \((2\sqrt{2},2)\). We now use Bernstein's inequality, given in Eqn. (2.20) of [5], on \(|U_{lj}|\) with \(v^*=2\sqrt{2}\) and \(b^*=b=2\) to get, for some \(t>0\),
\[
P(|U_{lj}|\ge t)\le2e^{-\frac{nt^2}{16}}. \quad (85)
\]
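An empirical look at the coherence tail bound of Lemma 8 (the sizes, \(\delta\), and trial count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, delta, trials = 1000, 40, 0.05, 100
thresh = 4 * np.sqrt(2 * np.log(p / delta) / n)   # threshold from (84)

exceed = 0
for _ in range(trials):
    A = rng.standard_normal((n, p))
    G = np.abs(A.T @ A) / n                       # |<a_l, a_j>| / n
    np.fill_diagonal(G, 0)
    exceed += (G.max() >= thresh)                 # did nu(A) exceed the threshold?
print(exceed / trials, thresh)
```

Each normalized inner product has standard deviation about \(1/\sqrt{n}\approx0.03\), far below the threshold \(\approx0.46\), so exceedances should be extremely rare, consistent with (84).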
We have \(X_1,X_2\sim\mathcal{N}(0,1)\) independently, and \(Z=X_1X_2\). Clearly, \(\mathbb{E}[Z]=\mathbb{E}[X_1X_2]=\mathbb{E}[X_1]\mathbb{E}[X_2]=0\). Hence, for \(\lambda\in\mathbb{R}\), we have
\[
\mathbb{E}[e^{\lambda Z}]=\mathbb{E}[e^{\lambda X_1X_2}]=\mathbb{E}\left[\mathbb{E}[e^{\lambda X_1X_2}\,|\,X_1]\right]=\mathbb{E}\left[e^{\frac{1}{2}\lambda^2X_1^2}\right]=\left(1-2\cdot\tfrac{1}{2}\lambda^2\right)^{-1/2}=(1-\lambda^2)^{-1/2};\quad|\lambda|<1. \quad (89)
\]
In (89), the third equality comes from the M.G.F. of a standard normal random variable, and the fourth equality comes from the M.G.F. of a chi-squared random variable, as \(X_1^2\sim\chi_1^2\).
Note that \((1-\lambda^2)^{-1/2}\le e^{4\lambda^2}=e^{\frac{(2\sqrt{2})^2\lambda^2}{2}}\) for \(|\lambda|\le1/2\). Hence, \(Z\) is sub-exponential with parameters \((2\sqrt{2},2)\). This completes the proof.
■
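Both the analytic bound and the closed-form M.G.F. (89) can be verified numerically (the evaluation point \(\lambda=0.3\) and sample size are illustrative):

```python
import numpy as np

# Analytic bound: (1 - lam^2)^(-1/2) <= exp(4 lam^2) on |lam| <= 1/2.
lams = np.linspace(-0.5, 0.5, 201)
mgf = (1.0 - lams**2) ** -0.5
bound_ok = np.all(mgf <= np.exp(4.0 * lams**2) + 1e-12)

# Empirical MGF of Z = X1 * X2 at lam = 0.3 against the closed form (89).
rng = np.random.default_rng(7)
Z = rng.standard_normal(1_000_000) * rng.standard_normal(1_000_000)
lam = 0.3
emp = np.exp(lam * Z).mean()
print(bound_ok, emp, (1.0 - lam**2) ** -0.5)
```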
References
[1] Supplemental material. Uploaded on Journal Portal.