HW 1
Prelude
Please submit your solution using:
    handin cs-418 hw1
Your solution should contain two files:
hw1.erl: Erlang source code for your solutions to the questions.
hw1.pdf: Your solutions to written questions.
A template for hw1.erl is available at
http://www.students.cs.ubc.ca/~cs-418/2023-1/hw/1/hw1.html.
A template for hw1_test.erl (test cases) and hw1x.beam (a reference implementation) will be released shortly.
Please submit code that compiles without errors or warnings. If your code does not compile, we might
give you zero points on all of the programming problems. If we fix your code to make it compile, we will
take off lots of points for that service. If your code generates compiler warnings, we will take off points for
that as well, but not as many as for code that doesn’t compile successfully. Any question about whether or
not your code compiled as submitted will be determined by trying it on CS department linux machines.
We will take off points for code that prints results unless we specifically asked for print-out. For this
assignment, the functions you write should return the specified values, but they should not print anything to
stdout. Using io:format when debugging is great, but you need to delete or comment out such calls before
submitting your solution. Your code must fail with some kind of error when called with invalid arguments.
This assignment is based on the first thirteen sections of Learn You Some Erlang – up through
More on Multiprocessing.
Why the _0 subscript? Because we’re going to write a better version below, and I’m saving the name
mpow for the better version.
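For reference, a brute-force mpow_0 can be written directly from the definition of modular exponentiation. This is a minimal sketch under my own assumptions about the interface (the template's version may differ in its guards and edge cases):

```erlang
% Brute-force modular exponentiation: X^N mod M using N multiplications.
% A sketch only -- the template's mpow_0 may handle edge cases differently.
mpow_0(_X, 0, _M) -> 1;
mpow_0(X, N, M) when is_integer(N), N > 0 ->
    (X * mpow_0(X, N-1, M)) rem M.
```

For example, mpow_0(2, 10, 1000) returns 24, since 2^10 = 1024. Taking rem M after every multiplication keeps the intermediate products small.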
(a) Russian Peasant exponentiation (5 points)
The Russian Peasant method is defined by the following recurrence:
X^0 = 1
X^N = (X^2)^(N/2),          if N is even
X^N = X * (X^2)^((N-1)/2),  if N is odd
Write mpow(X, N, M) using the Russian Peasant algorithm. Make sure that you take the intermediate results modulo M at each step; otherwise you can produce enormous intermediate results which will make your code run really slowly. In Erlang, N/2 produces a floating point value even if N is an even integer. Use div, truncating integer division, when you need an integer result.
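The recurrence above translates almost line-for-line into Erlang. This is a sketch under my own assumptions about edge cases (e.g. that N = 0 should yield 1), not necessarily the reference implementation:

```erlang
% Russian Peasant exponentiation: X^N mod M in O(log N) multiplications.
% Intermediate results are reduced mod M at each step, as the question asks.
mpow(_X, 0, _M) -> 1;
mpow(X, N, M) when N rem 2 == 0 ->
    mpow((X*X) rem M, N div 2, M);       % even case: (X^2)^(N/2)
mpow(X, N, M) ->
    (X * mpow((X*X) rem M, (N-1) div 2, M)) rem M.   % odd case
```

For example, mpow(2, 10, 1000) returns 24, agreeing with the brute-force result, after only about log2(10) recursive steps.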
(b) Is the Russian Peasant algorithm fast? (8 points)
i. (5 points) Use time_it:t from the course library to compare the execution times for
mpow_0(X, N, M) and mpow(X, N, M) for X=1234567, M=10000000, and N ∈ {1000, 10000,
100000, 1000000}. Please measure all times on thetis.students.cs.ubc.ca, and report
the times in a table.
The Russian Peasant algorithm is fast! Below are the recorded times when calling the implementations of mpow for different values of N:

    N        mpow_0          mpow
    1000     5.3  x 10^-5 s  1.71 x 10^-6 s
    10000    5.28 x 10^-4 s  2.26 x 10^-6 s
    100000   5.14 x 10^-3 s  2.55 x 10^-6 s
    1000000  timeout         2.78 x 10^-6 s

As we can see from the table, as we increase N by a factor of 10, the runtime of mpow_0 also increases by a factor of 10. However, the runtime of mpow only increases by a relatively small amount and stays on the order of 10^-6 s.
2. Rabin-Miller primality testing (25 points):
The Rabin-Miller algorithm is a cool algorithm, it is widely used (e.g. for generating RSA keys), and I'll use it to create an "embarrassingly parallel" problem for Question 4. This problem comes with a two-page introduction – I've tried to keep the number theory informal and simple, but hopefully it gives you enough understanding of the algorithm to appreciate it and to help you when debugging your implementation.
Motivation: Sometimes, we want to find some fairly large prime numbers. For example, RSA-2048 encryption uses a public key that is the product of two prime numbers, where the product should be a 2048-bit integer. Thus, each of the primes should be roughly a 1024-bit integer, or around 10^308.
How do we find primes with 300 or more digits?
The good news is that for numbers near a large integer, N, the fraction of such integers that are prime is roughly 1/log(N), where log is the natural logarithm. See the Prime Number Theorem if you want the details. To find a random prime around 10^J we just need, on average, to try log(10^J) ≈ 2.3*J numbers. Because we don't need to test even numbers, we can reduce that to half as many, i.e. 1.15*J, and with further "tricks", we could reduce the constant factor even more.
Given a large, odd number, Q, how can we determine if it's prime? Testing all factors up to Q is clearly impractical. We only need to test factors up to sqrt(Q), but if we are looking for prime numbers around 10^308 we still need to try 10^154 possible factors for a brute force approach. We need something better, and for this problem, we'll use the Rabin-Miller algorithm. I'll sketch the ideas behind the algorithm here, and I'll try to keep the number theory fairly informal.
The Rabin-Miller primality test is a randomized algorithm. There are three main ideas behind the
algorithm:
Fermat’s Little Theorem: If P is a prime and A is an integer that is not a multiple of P, then
A^(P-1) mod P = 1.
See Appendix 2.2.1 for a proof and some examples.
The Square Root of 1: If P is a prime and A is an integer such that A ∗ A mod P = 1, then either
A mod P = 1 or A mod P = P − 1.
See Appendix 2.2.2 for a proof and some examples.
Randomized computation: Fermat’s Little Theorem and the observation about squares mod P can be used to show that an alleged prime, Q, is composite, but can’t prove that Q is prime. The breakthrough that Rabin and Miller made is that they combined these two in a way such that the probability that a composite number Q passes both tests for a randomly chosen A is at most 1/4. If we choose K different values of A independently, then the probability of Q passing all K trials is at most 4^-K.
For example, if K=50, then the probability of incorrectly declaring Q to be prime is at most 4^-50 ≈ 7.9 x 10^-31. In other words, if we found one alleged prime per nanosecond using the Rabin-Miller test, we’d expect to have a non-prime slip through about once every 40 trillion years – about 3,000 times the age of the universe. If that’s not good enough for you, just choose K large enough to meet your high standards.
Rabin and Miller combined the Fermat-test and square-test to produce a test that guarantees that the probability of identifying a composite number as prime with a randomly chosen A is at most 1/4. That may not sound very reassuring, but trials with independently chosen values for A are independent. Thus, if we run N trials, the failure probability is at most 4^-N. By choosing N large enough, we can get an arbitrarily small probability of declaring a composite number to be prime. Yay!
In more detail, let Q be a number whose primality we wish to test. The main computation in the
Rabin-Miller algorithm is a test that you will implement with a function called rm_check(Q, B, S),
where D is a positive, odd number and S is a non-negative integer such that Q-1 == D*ipow(2, S).
The test performed by rm_check(Q, B, S) is defined by:
Let A be an integer chosen uniformly with 2 ≤ A ≤ Q-2. In my solution, I let
A = 1 + rand:uniform(Q-3).
Let B = mpow(A, D, Q). Note that mpow(A, Q-1, Q) == mpow(B, ipow(2, S), Q).
Let X be the list such that for each I with
1 ≤ I ≤ S+1, lists:nth(I, X) == mpow(B, ipow(2, I-1), Q).
Note that for 1 ≤ I ≤ S,
lists:nth(I+1, X) == misc:mod(lists:nth(I, X)*lists:nth(I,X), Q).
In other words, we can compute each element of X by squaring its predecessor and taking the
result mod Q. Furthermore,
lists:nth(S+1, X) == mpow(B, ipow(2, S), Q) == mpow(A, Q-1, Q).
Q passes the Rabin-Miller check for random witness A iff lists:nth(1, X) == 1 or lists:nth(I, X) == Q-1 for some I with 1 ≤ I ≤ S.
rabin_miller(Q, K): return true if Q passes K rounds of the Rabin-Miller test and false otherwise.
rabin_miller(Q, K) calls rm_decompose/1 and rm_check/4.
An implementation of rabin_miller(Q, K) is provided in the hw1.erl template file. It just
checks that the arguments are “reasonable” and handles some trivial cases to avoid special
cases in the code you will write for other functions.
An implementation of rabin_miller(Q) is provided in the hw1.erl template file. It just calls
rabin_miller(Q, 50).
rm_decompose(N): return D and S such that D is a positive, odd number, S is a non-negative integer,
and N == D*ipow(2, S).
• N must be a positive integer.
• rm_decompose(N) is called by rabin_miller/2.
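The decomposition N == D*ipow(2, S) has a natural recursive sketch: halve N until it becomes odd, counting the halvings. The {D, S} tuple return shape is my assumption; the template may specify a different one:

```erlang
% Factor N as D*2^S with D odd and S >= 0; returns {D, S}.
% A sketch -- the exact return shape is an assumption, not the required spec.
rm_decompose(N) when is_integer(N), N > 0, N rem 2 == 1 ->
    {N, 0};                              % N is already odd
rm_decompose(N) when is_integer(N), N > 0 ->
    {D, S} = rm_decompose(N div 2),      % strip one factor of 2
    {D, S+1}.
```

For example, rm_decompose(20) returns {5, 2}, since 20 == 5*2^2. The guards make a call with a bad N fail with a function_clause error, as the handout asks.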
rm_check(Q, D, S, K): Perform K trials of the Rabin-Miller test for Q.
• Return true iff all trials pass.
• D and S must satisfy D*ipow(2, S) = Q-1.
• For each trial, rm_check(Q, D, S, K) should generate a random integer value for A with
2 ≤ A ≤ Q-2.
• rm_check(Q, D, S, K) is called by rabin_miller/2.
• rm_check(Q, D, S, K) calls rm_check(Q, B, S) with B = mpow(A, D, Q).
rm_check(Q, B, S): Perform one trial of the Rabin-Miller test.
• Return true iff the trial passes.
• B should be mpow(A, D, Q) where Q-1 == D*ipow(2, S), as described above.
• rm_check(Q, B, S) is called by rm_check/4. I defined the interface between rm_check/4
and rm_check/3 to make testing easier.
• My implementation calls the all_adjacent function provided in the hw1.erl template file.
You are welcome but not required to use all_adjacent.
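One way to sketch rm_check/3 is to scan the squares of B directly, without the template's all_adjacent helper. This is a sketch assuming the pass condition described above (B ≡ 1, or some B^(2^i) ≡ Q-1 with 0 =< i < S); the rm_witness helper name is mine:

```erlang
% One Rabin-Miller trial. B = mpow(A, D, Q), with Q-1 == D*2^S.
% Passes iff B == 1 or B^(2^i) mod Q == Q-1 for some 0 =< i < S.
% A sketch that does not use the template's all_adjacent helper.
rm_check(_Q, 1, _S) -> true;
rm_check(Q, B, S) -> rm_witness(Q, B, S).

% Scan B, B^2, B^4, ..., B^(2^(S-1)) mod Q, looking for Q-1.
rm_witness(Q, B, S) when S > 0 ->
    case B == Q-1 of
        true  -> true;
        false -> rm_witness(Q, (B*B) rem Q, S-1)
    end;
rm_witness(_Q, _B, 0) -> false.
```

With Q=21, D=5, S=2, A=13 we get B=13 and rm_check(21, 13, 2) returns false (21 is composite); with the prime Q=13, D=3, S=2, A=5 we get B=8 and rm_check(13, 8, 2) returns true.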
you do implement the spawning, sending, and receiving here to get some experience with that level of message passing. All processes that you spawn should terminate when your par_primes returns a result.
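The spawn/send/receive plumbing usually follows the standard parent-collects-from-workers shape. Here is a generic sketch of that pattern (the function name par_collect and the Work callback are mine, not the required par_primes interface; par_primes still needs its own work-splitting):

```erlang
% Spawn NW workers, run Work(I) in each, and collect the results in order.
% Each worker terminates as soon as it has sent its reply.
par_collect(NW, Work) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), Work(I)} end)
            || I <- lists:seq(1, NW)],
    % Tagging each reply with the worker's pid lets the parent match
    % replies selectively, so results come back in spawn order.
    [receive {Pid, Result} -> Result end || Pid <- Pids].
```

For example, par_collect(3, fun(I) -> I*I end) returns [1, 4, 9]. Because every worker sends exactly one message and then exits, all spawned processes are gone once the result list is built.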
What choice for NW (from 4, 16, 32, 64, 128, 256) gets the highest speed-up? What is this peak
speed-up?
Using more than 64 workers, we were able to find around 1450 primes in under 1.2 seconds. Using a single worker, we were able to find 148 primes. As such, the speed-up is around 9.79 times for the parallel approach relative to the sequential approach.
This peak speed-up occurs because we can only make use of as many workers as there are cores in our machine. As such, using more than 64 workers will result in finding a similar number of primes in the same amount of time.
Figure 1: Solving a tridiagonal linear system (Part 1)

        [ 1   2   0   0   0 ]
        [ 3   8   1   0   0 ]
    A = [ 0   2   5  -1   0 ]
        [ 0   0   4  15  -3 ]
        [ 0   0   0  32   4 ]
Let A be an n × n matrix and y be a vector of n elements. We want to find x such that A*x = y. This can be done in linear time using Gaussian elimination (see Wikipedia). I describe the algorithm below, but first I'll state the questions so they don't get lost.
         [ 1   2   0   0   0 ]
         [ 0   2   1   0   0 ]
    A' = [ 0   2   5  -1   0 ]
         [ 0   0   4  15  -3 ]
         [ 0   0   0  32   4 ]

y' = [5, 7, 15, 57, 148]^T.

The matrix A above is tridiagonal. This leads to the sequence of Gaussian elimination steps depicted in Figure 1.
From the final equation in Figure 1, we get 10 ∗ x5 = 50 which yields x5 = 5. We return to the previous
equation from Figure 1 to get
16x4 − 3x5 = 49
Substituting 5 for x5 and solving for x4, we get x4 = 4. Continuing in this way, we complete the solution and get
    [x1, x2, x3, x4, x5] = [1, 2, 3, 4, 5].
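The back-substitution steps above fit a simple recursion: solve the rows below first, then use the solution of the row just below to solve the current row. Here is a sketch assuming each reduced row is represented as {Diag, Y} for the last row and {Diag, Upper, Y} otherwise (my own representation, not the required interface for tridiag_solve):

```erlang
% Back substitution for an upper-bidiagonal system, rows listed top-down.
% Row representation: {Diag, Upper, Y}, or {Diag, Y} for the last row.
back_sub([{D, Y}]) ->
    [Y / D];                              % last row: D*x_n = Y
back_sub([{D, U, Y} | Rest]) ->
    [Xnext | _] = Xs = back_sub(Rest),    % solve the rows below first
    [(Y - U*Xnext) / D | Xs].             % D*x_i + U*x_{i+1} = Y
```

For the last two reduced rows in the example, back_sub([{16, -3, 49}, {10, 50}]) returns [4.0, 5.0], matching x4 = 4 and x5 = 5 above.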
As noted above, you can find descriptions of this algorithm on Wikipedia or other sources. The presentation here is intended to point out the inherently recursive nature of the algorithm – that should make your life happier when coding in Erlang.
2 Supporting Materials
Section 2.1 gives a brief summary of the motivation for each question on this assignment.
2.1 Why?
Question 1: mpow
Russian-Peasant is a cool way to exponentiate – so, you should see it. Also, the Rabin-Miller primality test would be very slow if we didn't have a fast algorithm for exponentiation. It needs a bit more thought to write the code than the brute-force method does.
Note: I had considered asking you to measure the times for mpow(X, N, M) when M is larger than a single machine word (e.g., M needs 1024 to 2048 bits for RSA-2048 keys). The time should be somewhere between log(N)*log(M)*log(log(M)) if Erlang uses a “fast” multiplication algorithm (i.e. FFT based), and log(N)*log(M)^2 if Erlang uses a brute-force algorithm. I was expecting brute-force. My measurements were inconclusive; so, I didn’t ask the question. The timing measurements show execution times for large M that seem clearly faster than what I would expect for “brute force” and a clearly slower rate of growth than what I would expect for FFT. Perhaps Erlang is using Strassen’s algorithm for large multiplications? I’d need to try more experiments.
Question 2: Rabin-Miller
Another cool algorithm. Brings up more implementation design issues; so, it’s a natural next step for
bringing everyone up-to-speed with programming in Erlang. BTW, if you don’t like my “functional”
presentation of the algorithm, look up the Wikipedia entry (or other on-line sources) and you’ll find
pseudo-code with for-loops. Of course, make sure you cite your sources when you submit your solution.
Question 3: find_primes
Mostly just a step to get to the parallel version in Question 4. In other words, if your parallel version doesn't call find_primes, you probably did too much work.
Question 4: par_primes
This is a course on parallel programming. So, here’s something you can do in parallel.
Question 5: tridiag_solve
For this assignment, this is a question for thinking and coding recursively. In Homework 2, you will
see how to parallelize this algorithm using scan. When we work with CUDA and GPUs, matrices and
vectors play a prominent role. This question is intended to be a bit of a refresher on matrices and
vectors, looking at them from a computational perspective.
2.2 Details on Rabin-Miller
My goal here is to take the “magic” out of the algorithm and keep everything simple and informal. Anyone
who has taken MATH 312 or similar has probably seen all of these theorems before and may want to skip
this section, especially if you’re easily offended by informality. In the following, I will write X ≡p Y to
denote (X mod P ) = (Y mod P ).
Fermat’s Little Theorem: If P is prime, and A is an integer that is not a multiple of P, then
A^(P-1) ≡p 1.
When reasoning about arithmetic mod P , it is sufficient just to consider values in {0, . . . , (P − 1)}. If A is
such an integer, and A is not a multiple of P , that just means that A is in {1, . . . , (P − 1)}. Let SP be the
set {1, . . . , (P − 1)}.
1. If A ∈ SP, then for any integer i ≥ 0, A^i mod P ∈ SP. This is because P is prime, and the product of any two positive integers that are less than P cannot be a multiple of P.
2. If A ∈ SP, then all elements of the sequence A^0 mod P, A^1 mod P, A^2 mod P, ... must be elements of SP. By the pigeon-hole principle, there must be some value, B, that appears multiple times in the sequence. Choose B to be the first such value (note: we could show B = 1, but that’s slightly more work). Let C be the length of the cycle, i.e. C is the smallest positive integer such that B ≡p B*A^C. Thus, B, B*A mod P, B*A^2 mod P, ..., B*A^(C-1) mod P form a cycle.
3. Now we show that C must be a factor of P − 1. If C = P − 1, then we’re done. Otherwise, choose some element of SP that is not in the cycle, call it D. The sequence D*A^0 mod P, D*A^1 mod P, ..., D*A^(C-1) mod P must form a new cycle where no elements of this new cycle are elements of the original cycle. We can keep doing this until every element of SP is an element of one of these cycles; these cycles are all disjoint; and all cycles have length C. Thus, if E ∈ SP, E*A^C ≡p E, which implies A^C ≡p 1.
4. Let F be the total number of cycles found above. Because every element of SP is on exactly one cycle, F*C = |SP| = P − 1. Therefore
    A^(P-1) ≡p (A^C)^F ≡p 1^F = 1.
Most of the time, A^(Q-1) ≢q 1 if Q is not prime. For example, 12 is not prime; 5 is an integer that is not a multiple of 12; and
    mpow(5, 11, 12) = 5 ≠ 1.
Thus, 12 fails the Fermat-test, and we have shown that 12 is not prime.
Unfortunately, there are numbers called Fermat pseudoprimes that are composite, but pass the Fermat test for at least one choice of A. For example, 21 is not prime, 13 is an integer that is not a multiple of 21, and
mpow(13, 20, 21) = misc:mod(ipow(13, 20), 21)
= misc:mod(19004963774880799438801, 21)
= misc:mod(904998274994323782800*21 + 1, 21)
= 1.
In fact, there are pathological choices for Q where Q is composite and mpow(A, Q-1, Q) = 1 for all integers A that are relatively prime to Q. Such numbers are called Carmichael numbers. 561 is the smallest Carmichael number.
A ∗ A ≡p 1, given
A ∗ A − 1 ≡p 0, subtract 1 from both sides
(A + 1) ∗ (A − 1) ≡p 0, factoring
If the product of integers X and Y is a multiple of a prime P, then at least one of X or Y must be a multiple of P. Therefore, either (A + 1) or (A − 1) must be a multiple of P, which proves the claim.
The Square Test – some examples
The contrapositive of the observation that we made above is that if we can find an integer A such that A ≢q 1, A ≢q Q − 1, and A*A ≡q 1, then Q is not prime. We call this the square-test.
As described in Question 2, the Rabin-Miller algorithm factors Q-1 = D*ipow(2, S) and computes mpow(A, Q-1, Q) by first computing mpow(A, D, Q) and squaring the result S times. If we observe a value, X, such that X ≢q 1, X ≢q (Q − 1), and X*X ≡q 1, then by the square-test, Q is not a prime. OTOH, if after squaring S times, we don’t reach 1, then by the Fermat-test, Q is not prime. The proof that a composite Q escapes detection with a probability of at most 1/4, and that trials are independent for independently chosen values of A, is beyond the scope of this description.
Let's go back to our previous example showing that 21 passes the Fermat test when A=13. For the Rabin-Miller test:
Q = 21
Q-1 = 20
D = 5
S = 2
A = 13
mpow(13, 5, 21) = 13
misc:mod(13*13, 21) = 1
Since 13 ≢ 1 and 13 ≢ 20 but 13*13 ≡ 1 (mod 21), 21 fails the square-test, and Rabin-Miller refutes the primality of 21.
The Library, Errors, Guards, and other good stuff
The CPSC 418 Erlang Library: your code must run on the CS department linux machines.
To access the course library from the CS department machines, give the following command in the Erlang
shell:
1> code:add_path("/home/c/cs-418/public_html/resources/erl").
You can also set the path from the command line when you start Erlang. I’ve included the following in my
.bashrc so that I don’t have to set the code path manually each time I start Erlang:
function erl {
    /usr/bin/erl -eval 'code:add_path("/home/c/cs-418/public_html/resources/erl")' "$@"
}
See http://erlang.org/doc/man/erl.html for a more detailed description of the erl command and the
options it takes.
If you are running Erlang on your own computer, you can get a copy of the course library from
http://www.students.cs.ubc.ca/~cs-418/resources/erl/erl.tgz
Unpack it in a directory of your choice, and use code:add_path as described above to use it. Changes may
be made to the library to add features or fix bugs as the term progresses. I try to minimize the disruption
and will announce any such changes.
Compiler Errors: if your code doesn’t compile, it is likely that you will get a zero on all coding
questions. Please do not submit code that does not compile successfully. After grading all assignments that
compile successfully, we might look at some of the ones that don’t. This is entirely up to the discretion of
the instructor. If you have half-written code that doesn’t compile, please comment it out or delete it.
Compiler Warnings: your code should compile without warnings. In my experience, most of the Erlang
compiler warnings point to real problems. For example, if the compiler complains about an unused variable,
that often means I made a typo later in the function and referred to the wrong variable, and ended up not
using the one I wanted. Of course, the “base case” in a recursive function often has unused parameters – use
a _ to mark these as unused. Other warnings such as functions that are defined but not used, the wrong
number of arguments to an io:format call, etc., generally point to real mistakes in the code. We will take off
points for compiler warnings.
Printing to stdout: please don’t unless we specifically ask you to do so. If you include a short error
message when throwing an error, that’s fine, but not required. If you print anything for a case with normal
execution when no printing was specified, we will take off points.
Guards: in general, guards are a good idea. If you use guards, then your code will tend to fail close to
the actual error, and that makes debugging easier. Guards also make your intentions and assumptions part
of the code. Documenting your assumptions in this way makes it much easier if someone else needs to work
with your code, or if you need to work with your code a few months or a few years after you originally wrote
it. There are some cases where adding guards would cause the code to run much slower. In those cases, it
can be reasonable to use comments instead of guards. Here are a few rules for adding guards:
• If you need the guard to write easy-to-read patterns, use the guard. For example, to have separate
cases for N > 0 and N < 0.
• If adding the guard makes your code easier to read (and doesn’t have a significant run-time penalty),
use the guard.
• If a function is an “entry point” into your code (e.g. an exported function), it’s good to have your
assumptions about the arguments clearly stated. Ideally, you do this with guards.
– Often, a function can only be implemented for some values of its arguments. For example, we
might have:
sendSquare(Pid, N) -> Pid ! N*N.
A call such as sendSquare([1, 2, 3], cow) doesn’t make sense. Bad calls should throw an error (e.g. a badarg error). Please, don’t silently ignore bad arguments, for example:
sendSquare(Pid, N) ->
    if
        is_pid(Pid) and is_number(N) -> Pid ! {square, N*N};
        true -> "messed up actual arguments"   % Don't do this.
    end.
The caller might very well ignore the return value of sendSquare, and Pid might end up blocked, waiting for a message that will never arrive. Furthermore, writing tests to see if the return value is an error code is so C, while throwing explicit exceptions and writing error handlers (if you want to do something other than killing the process that threw the error) is so much easier to write, read, and maintain.
We will test your code on bad arguments and make sure that an error gets thrown.
• Adding lots of little guards to every helper function can clutter your code. Write the code that you
would want others to write if you are going to read it.
• In some cases, guards can cause a severe performance penalty. In that case, it’s better to use a wrapper
function so you can test the guards once and then go on from there. Or you can use comments;
comments don’t slow down the code. Any exported function should throw an error when called with
bad arguments.
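For the sendSquare example above, a guard-based version gets the error-throwing behaviour for free. This is a sketch of the idea:

```erlang
% With guards, a bad call such as sendSquare([1,2,3], cow) raises a
% function_clause error instead of being silently ignored.
sendSquare(Pid, N) when is_pid(Pid), is_number(N) ->
    Pid ! {square, N*N}.
```

A good call like sendSquare(SomePid, 3) delivers {square, 9}; any call that fails the guards crashes at the point of the mistake, which is exactly where you want to be when debugging.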
A common case for omitting guards occurs with tail-recursive functions. We often write a wrapper
function that initializes the “accumulator” and then calls the tail-recursive code. We export the wrapper,
but the tail-recursive part is not exported because the user doesn’t need to know the details of the tail-
recursive implementation. In this case, it makes sense to declare the guards for the wrapper function. If
those guarantee the guards for the tail-recursive code, and the tail recursive code can only be called from
inside its module, then we can omit the guards for the tail-recursive version. This way, the guards get
checked once, but hold for all of the recursive calls. Doing this gives us the robustness of guard checking
and the speed of tail recursion.
Unless otherwise noted or cited, the questions and other material in this homework problem set are copyright 2024 by Mark Greenstreet and are made available under the terms of the Creative Commons Attribution 4.0 International license http://creativecommons.org/licenses/by/4.0/.