Section 4.1 Big-O, Searching and Sorting¶

Big-O Notation¶

Runtime Analyses¶

What are the big-O runtimes of each of the following functions as a function of \(n\), where \(n\) is either the explicit parameter to the function or the size of the input container?

 void abdominalis(int n) {
     for (int i = 0; i < n; i++) {
         cout << '*' << endl;
     }
 }

 void algiricus(int n) {
     for (int i = 0; i < 137 * n + 42; i++) {
         cout << '*' << endl;
     }
 }

 void angustus(int n) {
     for (int i = 0; i < 271828; i++) {
         cout << '*' << endl;
     }
 }

 void barbouri(int n) {
     for (int i = 0; i < n; i++) {
         for (int j = 0; j < n; j++) {
             cout << '*' << endl;
         }
     }
 }

 void bargibanti(int n) {
     for (int i = 0; i < n; i++) {
         for (int j = 0; j < n; j++) {
             if (j % 2 == 0) { // j is even
                 cout << '*' << endl;
             } else {
                 cout << '?' << endl;
             }
         }
     }
 }

 void breviceps(int n) {
     for (int i = 0; i < n; i++) {
         for (int j = 0; j < n; j++) {
             if (j % 2 == 0) { // j is even
                 cout << '*' << endl;
             } else {
                 for (int k = 0; k < n; k++) {
                     cout << '?' << endl;
                 }
             }
         }
     }
 }

 void camelopardalis(const Vector<int>& v) {
     Vector<int> q;
     for (int i = v.size() - 1; i >= 0; i--) {
         q += v[i];
     }
     cout << v.size() << endl;
 }

 void capensis(const Vector<int>& v) {
     Set<int> q;
     for (int i = v.size() - 1; i >= 0; i--) {
         q += v[i];
     }
     cout << v.size() << endl;
 }

 void casscsio(Vector<int>& v) {
     while (v.size() >= 5) {
         v = v.subList(5);
     }
 }

Solution

Each iteration of the loop does a constant amount of work, and the number of iterations is \(n\). Therefore, we do \(O(n)\) total work.
This loop runs for \(137n + 42\) iterations, which is a linear function of the number of loop iterations. Therefore, the loop runs \(O(n)\) times. Since the work done per loop iteration is \(O(1)\), the total work done is \(O(n)\).
This loop always runs \(271,828\) times, regardless of what the value of \(n\) is. That means that the loop runs \(O(1)\) total times, with \(O(1)\) meaning “something that does not scale as a function of \(n\).” Each iteration does \(O(1)\) work, so the total work done here is \(O(1)\).
This is a great place to apply the maxim “when in doubt, work inside out!” The inner loop does \(O(n)\) total work. Since the outer loop runs \(O(n)\) times, the total work done is \(O(n) \text{ iterations} \times \frac{O(n) \text{ work}}{\text{iteration}} = O(n^2) \text{work.}\)
This is basically the same code as before, except that the innermost for loop now has an if statement inside it. However, that doesn’t matter for our purposes, because the for loop does \(O(1)\) work regardless of which branch we take. The code therefore still runs in time \(O(n^2)\).
Now things get a bit more interesting. Notice that the loop on \(k\) does \(O(n)\) total work, so we can rewrite the code like this:
```
void breviceps(int n) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (j % 2 == 0) { // j is even
                cout << '*' << endl;
            } else {
                // do O(n) work
            }
        }
    }
}
```
Now, notice that half the time, the loop on \(j\) does \(O(1)\) work (when \(j\) is even), and the other half the time it does \(O(n)\) work (when \(j\) is odd). Based on that, how much work does the loop on \(j\) do? The \(j\) loop runs a total of \(O(n)\) times. Half of those times it’ll be even, and half the times it’ll be odd (roughly). Since half of \(O(n)\) is still \(O(n)\), this means that there are \(O(n)\) iterations where the loop does \(O(1)\) work, and there are \(O(n)\) iterations where it does \(O(n)\) work. Collectively, this means that we do \(O(n) \times O(1) + O(n) \times O(n) = O(n^2)\) total work in the \(j\) loop, so we can rewrite our code like this:
```
void breviceps(int n) {
    for (int i = 0; i < n; i++) {
        // do O(n^2) work
    }
}
```
Across all \(n\) iterations of the outer loop, we therefore do a total of \(O(n^3)\) work.
Here, \(n\) refers to the total length of the input string v, measured in number of integers.

Our loop runs for a total of \(O(n)\) iterations. The question is how much work each iteration does. If you consult the documentation for the Vector type, you’ll find that appending a value to a Vector takes time \(O(1)\) and accessing an element of a Vector takes time \(O(1)\). Therefore, each loop iteration does \(O(1)\) work, so the total work done is \(O(n)\).
As before, the loop runs for \(O(n)\) times. How much work does each iteration do? The answer is not \(O(1)\). If you check the documentation for the Set type, you’ll find that adding to a set takes time \(O(\log n)\). Therefore, we do \(n\) iterations of the loop, and each iteration does \(O(\log n)\) work, so the total work done is \(O(n \log n)\).
With a little though, we can see that this loop runs about \(\frac{n}{5}\) total times, since each iteration makes the list five elements shorter. The question, then, is how much work each loop iteration does.

Importantly, the cost of Vector::subList is directly proportional to the length of the sublist being requested. That is, the work done is \(O(L)\), where \(L\) is the length of the list. If we work out the lengths of those lists, they’ll come back as \(n - 5\), \(n - 10\), \(n - 15\), \(n - 20\), etc. until we get down to a list of length (roughly) \(10\), then (roughly) \(5\), then (roughly) \(0\). So the question then becomes: what is \(0 + 5 + 10 + 15 + 20 + … + (n - 10) + (n - 5)\)?

Intuitively, this quantity counts the number of stars in this approximate triangle shape:
```
*****
**********
***************
********************
*************************
******************************
***********************************
                ...
```
This triangle has a base whose size is roughly \(n - 5\). Its height is roughly \(\frac{n}{5}\). Therefore, the area is something kinda sorta ish like \(\frac{n(n - 5)}{10}\), which is \(O(n^2)\). But we can see that this is \(O(n^2)\) in another way: we’re counting up the number of stars that make up a 2D figure, which is asking about its area. Area grows quadratically, so the work done here is \(O(n^2)\).

Another way to see this is to divide everything by five:

\[0 + 5 + 10 + 15 + \dots + (n - 10) + (n - 5) = 5(0 + 1 + 2 + 3 + \dots + (\frac{n}{5} - 2) + (\frac{n}{5} - 1))\]

That last sum is the one we saw in class: it works out to \(O(n^2)\). Therefore, the total work done is \(5 \cdot O(n^2) = O(n^2)\).

Changing Passwords¶

Looking to change your password? Rather than picking a password that’s a common word or phrase with a bunch of random numbers thrown in, consider using a multi-word password formed by choosing some sequence of totally random words out of the dictionary. Choosing four totally random words, it turns out, tends to be a pretty good way to make a password. Here’s a couple passwords you might make that way:

  RantingCollegersDenoteClinching
  VivificationPandectYawnedCarmine
  DetachednessHowlinglySportscastsVapored
  UnlearnedMockeriesTuskedChuckles
  SharpshootingPreyParaffinsLibeler

We generated these particular passwords using the following piece of code:

string makeRandomPassword(const Vector<string>& wordList) {
   string result;
   for (int i = 0; i < 4; i++) {
      int wordIndex = randomInteger(0, wordList.size() - 1);
      result += wordList[wordIndex];
   }
   return result;
}

When we ran this code, we used a word list containing about 120,000 words. There are other word lists available that we could have picked. The Basic English dictionary, for example, only has about 5,000 words. A more elaborate dictionary called ENABLE has about 180,000 words. It’s therefore not all that unreasonable for us to analyze the efficiency of the above code in terms of the number of words n in our word list.

Let \(n\) denote the number of words in wordList. What is the big-O time complexity of the above code as a function of \(n\)? You can assume that the words are short enough that the cost of concatenating four strings together is \(O(1)\) and that a random number can be generated in time \(O(1)\). Explain how you arrived at your answer. Your answer should be no more than 50 words long.
Let’s suppose that the above code takes 1ms to generate a password when given a word list of length 50,000. Based on your analysis from part 1, how long do you think the above code will take to generate a password when given a word list of length 100,000? Explain how you arrived at your answer. Your answer should be no more than 50 words long.

Now, let’s think about how someone might try to break your password. If they know that you chose four totally random English words, they could try logging in using every possible combination of four English words. Here’s some code that tries to do that:

string breakPassword(const Vector<string>& wordList) {
    for (string w1: wordList) {
        for (string w2: wordList) {
            for (string w3: wordList) {
                for (string w4: wordList) {
                    string guess = w1 + w2 + w3 + w4;
                    if (passwordIsCorrect(guess)) {
                        return guess;
                    }
                }
            }
        }
    }
    error("Couldn't break password.");
}

As before, let’s assume that the words in our word list are short enough that the cost of concatenating four of them together is \(O(1)\). Let’s also assume that calling the passwordIsCorrect function with a given password takes time \(O(1)\).

What is the worst-case big-O time complexity of the above piece of code as a function of \(n\), the number of words in the word list? Explain how you arrived at your answer. Your answer should be no more than 50 words long.
Imagine that in the worst case it takes 1,000 years to break a four-word password when given a word list of length 50,000. Based on your analysis from part 3, how long will it take, in the worst case, to break a password when given a word list of length 100,000? Explain how you arrived at your answer. Your answer should be no more than 50 words long.

Searching and Sorting¶

Binary Search Tracing¶

Trace the execution of the binary search algorithm looking for the value 137 in the following array of 15 items.

{ 29, 34, 77, 78, 100, 210, 240, 278, 281, 315, 444, 549, 864, 875, 930 }

How many elements does binary search end up looking at in the course of the search?

Counting Sort¶

The sorting algorithms we’ve used thus far are examples of comparison sorts, where we determine the relative ordering of elements by comparing them against one another. This allow us to use our sorting algorithms (selection sort, insertion sort, mergesort, etc.) to sort any types of objects that are comparable this way. We could sort integers, strings (compare them alphabetically), Vector<int>s, etc. However, the downside is that no comparison-based sorting algorithm can run faster than \(O(n \log n)\) on average.

Let’s suppose that we’re specifically sorting int values. Furthermore, suppose we’re sorting integers that range from 0 (inclusive) to some maximum value \(U\) (exclusive). Counting sort works as follows. First, we build a histogram that shows how many of each of the \(U\) possible numbers we have in our array. For example, here’s a sample input array whose values are all between 0 and 9, inclusive, and its histogram:

Array:      1 3 2 4 7 1 7 9 5 7 2 4 4 7 4 6 0 2 5 9 6

           +---+---+---+---+---+---+---+---+---+---+
Histogram: | 1 | 2 | 3 | 1 | 4 | 2 | 2 | 4 | 0 | 2 |
           +---+---+---+---+---+---+---+---+---+---+
             0   1   2   3   4   5   6   7   8   9

Once we have the histogram, we can sort the input array by writing out one copy of 0, two copies of 1, three copies of 2, one copy of 3, …, and two copies of 9.

The runtime of counting sort is \(O(n+U)\). If \(U\) is small——say, we’re sorting single-digit numbers——then this runtime may work out to \(O(n)\), faster than any comparison-based sorting algorithm! However, if \(U\) is large, then it may run much slower than, say, mergesort.

Write a function

void countingSort(Vector<int>& values);

that uses counting sort to sort the vector values. You can assume the values in the array are nonnegative.