Administrative information

Administrative course information is available here

We use the inf-2202-f16@list.uit.no mailing list to send important information.

We have the following rooms and hours:

Staff

GitHub

The course GitHub repository.

Lecture plan

Lecture Date Subject Lecturer
L1 Fri 19.08 Introduction Lars Ailo
L2 Fri 26.08 Threads and synchronization primitives Lars Ailo
L3 Thu 01.09 Guest lecture: Go Language Giacomo Tartari
L4 Fri 02.09 Parallel architectures Lars Ailo
L5 Fri 09.09 Parallel programs Lars Ailo
L6 Fri 16.09 Programming for performance Lars Ailo
L7 Fri 23.09 Parallel program performance evaluation Lars Ailo
L8 Fri 30.09 Performance evaluation Lars Ailo
L9 Fri 07.10 Cloud computing (no slides) Lars Ailo
L10 Thu 13.10 Guest lecture: Scala and Spark Inge Alexander Raknes
- Fri 21.10 Postponed -
L11 Fri 28.10 Data-intensive computing Lars Ailo
L12 Thu 03.11 Spark libraries Lars Ailo
L13 Fri 04.11 Guest lecture: The new Stallo Supercomputer Steinar Trædal-Henden
L14 Thu 10.11 Summary lecture Lars Ailo
- Thu 10.11 Course evaluation Jan Fuglesteg and Kai-Even Nilssen
- Thu 24.11 Exam -

Readings

All lecture notes are mandatory.

In addition, unless otherwise noted, the following are also mandatory readings:

  1. Introduction
    • None
  2. Threads and synchronization primitives (operating systems course recap):
    • Modern Operating Systems, 3rd ed. Andrew S. Tanenbaum. Prentice Hall. 2007. Chapters 2.2, 2.3, 2.5, 10.3, and 11.4.
    • As an alternative to MOS, any other operating systems textbook will do: read the chapters on threading, IPC mechanisms, and the classical IPC problems.
  3. Go
  4. Parallel architectures
    • Computer Organization and Design: The Hardware/Software Interface, 5th ed. David A. Patterson, John L. Hennessy. Morgan Kaufmann. 2011. Chapter 6: “Parallel Processors from Client to Cloud”.
  5. Parallel programs
    • None
  6. Programming for performance
    • None
  7. Parallel performance evaluation
  8. Performance evaluation
    • None
  9. Cloud computing
  10. Scala and Spark
  11. Data-intensive computing
  12. Spark libraries
    • Optional: lecture notes, papers and videos in the slide comments
  13. Stallo supercomputer
    • None
  14. Summary
    • None

The following are suggested additional readings:

  1. The Go Programming Language. Alan Donovan and Brian Kernighan. 2015.
  2. Learning Spark. Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia. O’Reilly. 2015.

  3. Parallel Computer Architecture: A Hardware/Software Approach. David Culler, J.P. Singh, Anoop Gupta. Morgan Kaufmann. 1998.
    • This book has a great introduction to parallel programming.
    • There is one copy in the library. Please be nice to your fellow students and do not keep that copy on loan for an extended period.
  4. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. R. K. Jain. Wiley. 1991.
    • A very good book about performance analysis.
    • There is one copy in the library. Please be nice to your fellow students and do not keep that copy on loan for an extended period.
  5. Computer Architecture: A Quantitative Approach, 5th ed. John L. Hennessy, David A. Patterson. Morgan Kaufmann. 2011.
    • This book has a thorough description of different parallel architectures.
    • You can purchase this book from your favourite bookstore.
  6. The Fourth Paradigm: Data-Intensive Scientific Discovery. Edited by Tony Hey, Stewart Tansley, and Kristin Tolle. 2010.
    • This collection of essays describes many of the opportunities and challenges for data-intensive computing in different scientific fields.
    • The book is freely available as an ebook.
  7. Advanced Analytics with Spark. Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills. O’Reilly. 2015.

Mandatory assignments

Project Start Due Subject Lecturer
P1 23.08 19.09 Concurrent B+tree Tim
P2 20.09 10.10 Deduplication system Tim
P3 11.10 07.11 PageRank on AWS Tim

Exercises

  1. Introduction
    1. None
  2. Threads and synchronization primitives
    1. Compare the overhead of forking a process vs. creating a Pthread
    2. Compare the overhead of forking a process vs. creating a Python thread
    3. Implement a solution to each of the following classical IPC problems using Pthreads/Python threads and semaphores/condition variables. Note that you also need to come up with a use case, test data, and useful output:
      1. Producer/consumer
      2. Reader/writer
      3. Sleeping barber
      4. Dining philosophers
    4. Modify the code from exercise 2.3 to use message passing.
  3. Go
    1. Take a tour of Go
    2. Implement the classical IPC problems from exercise 2.3 in Go (a channel-based producer/consumer sketch appears after this list).
  4. Parallel architectures
    1. None
  5. Parallel programs
    1. Implement a simplified BLAST search program in Go that does similarity search on two lists of random DNA sequences (see the similarity-search sketch after this list).
    2. Implement a heat distribution (SOR) program using Pthreads and/or Go (see the heat-distribution sketch after this list).
  6. Programming for performance
    1. Implement a tuple space in Python with semantics similar to Linda. Use your tuple space to implement a parallel version of Mandelbrot that uses dynamic assignment (pool of tasks); the pool-of-tasks pattern is sketched after this list.
  7. Parallel program performance evaluation
    1. Go through either the debunking or ninja paper and study how they did each of the “Steps for a performance evaluation study”.
  8. Performance evaluation
    1. None
  9. Cloud computing
    1. Create an account at AWS and calculate the approximate cost for analyzing 1TB and 1PB of data.
  10. Scala and Spark
    1. Run the provided WordCount in assignment 3 on AWS
    2. Implement grep in Scala and run it on AWS
  11. Data-intensive computing
    1. Implement word count in MapReduce and run it on AWS (see the Hadoop Streaming sketch after this list).
    2. Implement grep in MapReduce and run it on AWS.
  12. Spark libraries
    1. Refactor your assignment 3 code to use GraphX
  13. Stallo
    1. None
  14. Summary lecture
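
Example code sketches

The sketches below are not hand-out code for any exercise or assignment; they are minimal illustrations, in Go, of the techniques the exercises name. All sizes, counts, and parameter values are arbitrary example choices.

For exercises 2.3 and 3.2, a minimal producer/consumer sketch, assuming a buffered Go channel is an acceptable stand-in for the bounded buffer (the pthreads/Python variant would use semaphores or condition variables instead):

```go
// producer_consumer.go: minimal producer/consumer sketch for exercise 3.2.
// A buffered channel plays the role of the bounded buffer; producer/consumer
// counts and the buffer size are arbitrary example values.
package main

import (
	"fmt"
	"sync"
)

func producer(id int, buf chan<- int, items int, wg *sync.WaitGroup) {
	defer wg.Done()
	for i := 0; i < items; i++ {
		buf <- id*100 + i // blocks when the buffer is full
	}
}

func consumer(id int, buf <-chan int, wg *sync.WaitGroup) {
	defer wg.Done()
	for item := range buf { // blocks when the buffer is empty, ends when buf is closed
		fmt.Printf("consumer %d got item %d\n", id, item)
	}
}

func main() {
	const (
		producers = 2
		consumers = 3
		perProd   = 5
		bufSize   = 4
	)
	buf := make(chan int, bufSize)

	var prodWG, consWG sync.WaitGroup
	prodWG.Add(producers)
	for p := 0; p < producers; p++ {
		go producer(p, buf, perProd, &prodWG)
	}
	consWG.Add(consumers)
	for c := 0; c < consumers; c++ {
		go consumer(c, buf, &consWG)
	}

	prodWG.Wait() // all producers finished
	close(buf)    // signal consumers that no more items will arrive
	consWG.Wait()
}
```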
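For exercise 5.1, a sketch of a heavily simplified similarity search. It is not real BLAST: each query is scored against each subject by counting shared k-mers, and queries are statically assigned one goroutine each. Sequence lengths, sequence counts, and the k-mer length are arbitrary example values:

```go
// blast_sketch.go: simplified similarity search for exercise 5.1.
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

const k = 4 // k-mer length used for the similarity score

func randomSeq(n int) string {
	letters := []byte("ACGT")
	b := make([]byte, n)
	for i := range b {
		b[i] = letters[rand.Intn(len(letters))]
	}
	return string(b)
}

// kmers returns the set of k-mers occurring in s.
func kmers(s string) map[string]bool {
	set := make(map[string]bool)
	for i := 0; i+k <= len(s); i++ {
		set[s[i:i+k]] = true
	}
	return set
}

// score counts the k-mers of subject that also occur in the query's k-mer set.
func score(querySet map[string]bool, subject string) int {
	shared := 0
	for km := range kmers(subject) {
		if querySet[km] {
			shared++
		}
	}
	return shared
}

func main() {
	queries := make([]string, 8)
	subjects := make([]string, 100)
	for i := range queries {
		queries[i] = randomSeq(50)
	}
	for i := range subjects {
		subjects[i] = randomSeq(200)
	}

	type hit struct{ query, best, score int }
	results := make([]hit, len(queries))

	// Static assignment: one goroutine per query sequence.
	var wg sync.WaitGroup
	wg.Add(len(queries))
	for qi, q := range queries {
		go func(qi int, q string) {
			defer wg.Done()
			qset := kmers(q)
			best, bestScore := -1, -1
			for si, s := range subjects {
				if sc := score(qset, s); sc > bestScore {
					best, bestScore = si, sc
				}
			}
			results[qi] = hit{qi, best, bestScore}
		}(qi, q)
	}
	wg.Wait()

	for _, h := range results {
		fmt.Printf("query %d best matches subject %d (%d shared %d-mers)\n",
			h.query, h.best, h.score, k)
	}
}
```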
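For exercise 5.2, a heat-distribution sketch. Note that it uses a Jacobi-style update into a second grid rather than true SOR (which would add red-black ordering and a relaxation factor); rows are partitioned over a fixed number of goroutines and a WaitGroup serves as the per-iteration barrier. Grid size, iteration count, and boundary temperature are arbitrary:

```go
// heat_sketch.go: Jacobi-style heat distribution for exercise 5.2.
package main

import (
	"fmt"
	"sync"
)

const (
	n       = 128 // interior grid is n x n, surrounded by a fixed boundary
	iters   = 500
	workers = 4
)

func main() {
	// newGrid allocates an (n+2)x(n+2) grid; the top edge is held at 100 degrees.
	newGrid := func() [][]float64 {
		g := make([][]float64, n+2)
		for i := range g {
			g[i] = make([]float64, n+2)
		}
		for j := 0; j < n+2; j++ {
			g[0][j] = 100.0
		}
		return g
	}
	cur, next := newGrid(), newGrid()

	for it := 0; it < iters; it++ {
		var wg sync.WaitGroup
		rows := n / workers
		for w := 0; w < workers; w++ {
			lo := 1 + w*rows
			hi := lo + rows
			if w == workers-1 {
				hi = n + 1 // last worker takes any leftover rows
			}
			wg.Add(1)
			go func(lo, hi int) {
				defer wg.Done()
				for i := lo; i < hi; i++ {
					for j := 1; j <= n; j++ {
						next[i][j] = 0.25 * (cur[i-1][j] + cur[i+1][j] +
							cur[i][j-1] + cur[i][j+1])
					}
				}
			}(lo, hi)
		}
		wg.Wait() // barrier: every row updated before the grids are swapped
		cur, next = next, cur
	}

	fmt.Printf("temperature near the hot edge: %.2f\n", cur[1][n/2])
}
```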
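Exercise 6.1 asks for a Linda-style tuple space in Python; the sketch below only illustrates the dynamic-assignment (pool of tasks) pattern that the tuple space would be used for, with a Go channel standing in for the task pool and one task per image row. Image size and iteration limit are arbitrary:

```go
// mandelbrot_pool.go: dynamic assignment (pool of tasks) for exercise 6.1.
package main

import (
	"fmt"
	"sync"
)

const (
	width, height = 256, 256
	maxIter       = 200
	workers       = 4
)

// mandelRow computes the escape-iteration counts for one row of the image.
func mandelRow(y int) []int {
	row := make([]int, width)
	ci := -1.5 + 3.0*float64(y)/float64(height)
	for x := 0; x < width; x++ {
		cr := -2.0 + 3.0*float64(x)/float64(width)
		var zr, zi float64
		iter := 0
		for zr*zr+zi*zi <= 4.0 && iter < maxIter {
			zr, zi = zr*zr-zi*zi+cr, 2*zr*zi+ci
			iter++
		}
		row[x] = iter
	}
	return row
}

func main() {
	tasks := make(chan int, height) // the "pool of tasks": one task per row
	image := make([][]int, height)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for y := range tasks { // dynamic assignment: grab whatever row is next
				image[y] = mandelRow(y)
			}
		}()
	}

	for y := 0; y < height; y++ {
		tasks <- y
	}
	close(tasks)
	wg.Wait()

	fmt.Printf("computed %d rows; iterations at the centre: %d\n",
		height, image[height/2][width/2])
}
```

With static assignment each worker would instead get a fixed block of rows, which is exactly the load-imbalance problem the task pool avoids for Mandelbrot, since rows differ widely in cost.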
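For exercise 11.1, a word-count sketch written for Hadoop Streaming, which runs any executable that reads lines on stdin and writes tab-separated key/value pairs on stdout. The single binary acts as the mapper by default and as the reducer when given a `reduce` argument; that convention is this sketch's own choice, not part of any course hand-out:

```go
// wordcount_streaming.go: word count as a Hadoop Streaming mapper/reducer pair
// for exercise 11.1.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// mapper emits one "(word, 1)" pair per word on stdin.
func mapper() {
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		for _, word := range strings.Fields(sc.Text()) {
			fmt.Printf("%s\t1\n", strings.ToLower(word))
		}
	}
}

// reducer relies on Hadoop sorting the mapper output by key, so all counts for
// a given word arrive on consecutive lines.
func reducer() {
	sc := bufio.NewScanner(os.Stdin)
	curWord, curCount := "", 0
	flush := func() {
		if curWord != "" {
			fmt.Printf("%s\t%d\n", curWord, curCount)
		}
	}
	for sc.Scan() {
		parts := strings.SplitN(sc.Text(), "\t", 2)
		if len(parts) != 2 {
			continue // skip malformed lines in this sketch
		}
		n, _ := strconv.Atoi(parts[1])
		if parts[0] != curWord {
			flush()
			curWord, curCount = parts[0], 0
		}
		curCount += n
	}
	flush()
}

func main() {
	if len(os.Args) > 1 && os.Args[1] == "reduce" {
		reducer()
	} else {
		mapper()
	}
}
```

On EMR the same compiled binary would be shipped with the job and passed to the streaming jar as both the -mapper and the -reducer command (the latter with the `reduce` argument).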