Parallel and Distributed Systems Programming, 3 hours lecture.
Parallel programming techniques, architectures, and algorithms.
Course TA: Elif Acar
Textbook: Barry Wilkinson & Michael Allen, Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, 2nd edition, Prentice Hall, 2004
References:
- I. Foster, Designing and Building Parallel Programs, Addison-Wesley, 1995
- Introduction to Parallel Computing Tutorial, Lawrence Livermore National Laboratory, https://computing.llnl.gov/tutorials/parallel_comp/
- A. Grama, A. Gupta, G. Karypis and V. Kumar, Introduction to Parallel Computing, 2nd Ed.
- Calvin Lin & Larry Snyder, Principles of Parallel Programming, Prentice Hall, 2009
Objective: To introduce the theory and implementation of parallel programming techniques. Programming assignments use MPI (Message Passing Interface) on a 15-node Linux cluster and GPU (Graphics Processing Unit) programming.
Prerequisites: Data Structures and Algorithms, Computer Architecture, and C/Java Programming
PART I: BASIC TECHNIQUES
Chapter 1. Parallel Computers
Chapter 2. Message Passing Computing
Chapter 3. Embarrassingly Parallel Computations
Chapter 4. Partitioning and Divide-and-Conquer Strategies
Chapter 5. Pipelined Computations
Chapter 6. Synchronous Computations
Chapter 7. Load Balancing and Termination Detection
Chapter 8. Programming with Shared Memory
Chapter 9. Distributed Shared Memory Systems and Programming
PART II: ALGORITHMS and APPLICATIONS
Chapter 10. Sorting Algorithms
Chapter 11. Numerical Algorithms
Chapter 12. Image Processing
- Assignments: 40% (+ 5/10)
- Midterm Exam: 25% (+/- 5)
- Final Exam: 35% (+/- 5)
Homework #1 (due March 1st, 2012)
Homework #2 (due March 8th, 2012)
Homework #3 (due March 29th, 2012)
1. Write an MPI program to distribute MPI source files to all machines in your lamhosts file and to compile them automatically. To be more specific, your program should take two arguments: a source file name and a distribution list.
Note: When the MPI nodes are activated (e.g., with lamboot), MPI assigns a node id to each machine. Thus, you can use these node ids instead of taking a distribution list as an input parameter. In summary, you don't actually need a distribution list parameter. (Thanks to Murat Kurt for pointing this out.)
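One way to structure the distribution step is to generate one copy-and-compile command pair per machine listed in the lamhosts file. A minimal sketch in Python (not the MPI program itself; the host names are hypothetical, and scp/ssh stand in for whatever transport your cluster uses):

```python
def build_distribution_commands(source_file, hosts, compiler="mpicc"):
    """Build copy-and-compile command pairs, one pair per host.

    hosts would come from parsing your lamhosts file; in the actual
    assignment the distribution is driven from MPI using the node ids
    that lamboot assigns, rather than from a host list.
    """
    commands = []
    for host in hosts:
        # copy the source file to the remote machine ...
        commands.append(["scp", source_file, f"{host}:{source_file}"])
        # ... then compile it there
        commands.append(["ssh", host, f"{compiler} -o a.out {source_file}"])
    return commands

# Example with two hypothetical cluster nodes:
cmds = build_distribution_commands("hello.c", ["node0", "node1"])
```

Each pair could then be executed with subprocess.run; the MPI version would instead have each rank receive the file contents and invoke the compiler locally.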
Homework #4 (due April 5th, 2012)
Suppose you have N integers to be summed using p processes. You can implement an MPI program for the above using (1) separate send()s and recv()s, (2) a broadcast routine with separate recv()s to return partial results, and (3) scatter and reduce routines. Write MPI programs for all three implementations; execute, time, and compare them for N = 100, 1000, 10000 and p = 4, 8.
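The data decomposition is the same in all three variants; only the communication calls differ. A sketch of the scatter / partial-sum / reduce structure in plain Python, with the communication simulated (the assignment asks for the real MPI calls, e.g. MPI_Scatter and MPI_Reduce in the third variant):

```python
def block_partition(data, p):
    """Split data into p contiguous blocks, as a scatter would hand out.

    When len(data) is not divisible by p, the first (len(data) % p)
    blocks get one extra element.
    """
    base, rem = divmod(len(data), p)
    chunks, start = [], 0
    for r in range(p):
        size = base + (1 if r < rem else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks

def parallel_sum(data, p):
    chunks = block_partition(data, p)     # the "scatter" step
    partials = [sum(c) for c in chunks]   # each "process" sums its block
    return sum(partials)                  # the "reduce" step at the root
```

When timing the three MPI versions, the collective routines typically win as p grows, since the library can use tree-structured communication instead of p point-to-point messages.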
Homework #6 (due April 12th, 2012)
Write a fully synchronous parallel MPI implementation and a partially synchronous parallel MPI implementation of the problem described in Section 6.3.2 of the textbook. Experiment with different values of s in the convergence condition specified in Section 6.4. Write a report on your findings that includes the specific values of s you tested.
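Assuming Section 6.3.2 refers to the textbook's heat-distribution problem (a Jacobi-style iteration), the fully synchronous scheme can be sketched sequentially as below: every interior point is updated from the previous iteration's values, and the loop stops when no point changes by more than s. The partially synchronous version would relax the per-iteration barrier and let processes read slightly stale neighbor values.

```python
def jacobi_heat(grid, s, max_iters=10000):
    """Fully synchronous iteration on a square 2D grid.

    Boundary rows/columns are fixed; each interior point becomes the
    average of its four neighbors' previous-iteration values.
    Returns (grid, iterations); s is the convergence threshold.
    """
    n = len(grid)
    for it in range(1, max_iters + 1):
        new = [row[:] for row in grid]
        max_change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                    + grid[i][j - 1] + grid[i][j + 1])
                max_change = max(max_change, abs(new[i][j] - grid[i][j]))
        grid = new          # in MPI: exchange ghost rows, then barrier
        if max_change < s:  # in MPI: an allreduce of the local max changes
            return grid, it
    return grid, max_iters

# Hot top edge (100.0), everything else 0: heat diffuses into the interior.
g = [[100.0] * 8] + [[0.0] * 8 for _ in range(7)]
result, iters = jacobi_heat(g, s=0.001)
```

Smaller s means more iterations before convergence, which is exactly the trade-off the report should quantify.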
Homework #7 (due April 19th, 2012)
This homework is a set of OpenMP programming tasks.
UBI 520 Homework #8 (due May 3rd, 2012)
Java or .NET Thread Programming
Simulate the Gale-Shapley algorithm for stable matching. Create n m_threads and n w_threads, each representing a participant. Each thread is in either the free state or the engaged state; initially each thread is in the free state. Each thread, when it is started, creates a random preference list of the other gender. Additionally, each w_thread creates the inverse list (as shown in the lecture notes). Each thread keeps its state and its fiancée, if any.
Keep a shared counter number_of_free_man (initially n). When a man becomes engaged, decrease number_of_free_man by 1. When a man becomes disengaged, increase number_of_free_man by 1. But be careful about the order in case of a disengagement: do the increment first; otherwise number_of_free_man may go down to 0, causing the algorithm to terminate prematurely.
Algorithm for m_threads:
Each free m_thread offers to the first w_thread that has not rejected it yet. (Note: the order in which the free men propose does not matter.) If its offer is accepted, the m_thread sets its fiancée variable and goes into the engaged state, where it does not make offers as long as it is engaged. If rejected, the free m_thread goes on to offer to the next w_thread in its preference list.
Algorithm for w_threads:
Each w_thread that receives an offer must either reject or accept it. If it is in the free state, it accepts. If it accepts an offer, a w_thread becomes engaged and never becomes free again; however, it may change its fiancée when it gets a better offer. When there is no free m_thread, the algorithm terminates (when all men have fiancées, we are done).
Please make sure that you present:
- Your test results for n=5 (preference lists, intermediate steps, and final matchings)
- Run-time results for n = 5, 10, 20, 50, 100, giving the average and the longest thread execution times in milliseconds
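The thread logic above can be sketched as follows, in Python rather than Java/.NET, purely to illustrate the state machine. One coarse lock plus a condition variable stands in for finer-grained synchronization, and the w_thread's accept/reject decision is folded into the proposal while the lock is held:

```python
import random
import threading

def threaded_gale_shapley(n, seed=0):
    """One thread per man; women's accept/reject logic runs under the lock."""
    rng = random.Random(seed)
    men_pref = [rng.sample(range(n), n) for _ in range(n)]
    women_pref = [rng.sample(range(n), n) for _ in range(n)]
    # inverse list: rank[w][m] = position of man m in woman w's preferences
    rank = [[0] * n for _ in range(n)]
    for w in range(n):
        for pos, m in enumerate(women_pref[w]):
            rank[w][m] = pos

    cond = threading.Condition()
    fiance_of_m = [None] * n   # man -> woman, None while free
    fiance_of_w = [None] * n   # woman -> man, None while free
    next_idx = [0] * n         # next entry in each man's preference list
    free_count = [n]           # the shared number_of_free_man counter

    def man(m):
        with cond:
            while free_count[0] > 0:
                if fiance_of_m[m] is not None:
                    cond.wait()      # engaged: sleep until displaced or done
                    continue
                w = men_pref[m][next_idx[m]]   # first w not yet tried
                next_idx[m] += 1
                cur = fiance_of_w[w]
                if cur is None:                  # w is free: she accepts
                    fiance_of_w[w], fiance_of_m[m] = m, w
                    free_count[0] -= 1
                    cond.notify_all()
                elif rank[w][m] < rank[w][cur]:  # w prefers m: cur is freed
                    # Displacement and re-engagement are atomic under the
                    # lock, so free_count is unchanged here; the homework's
                    # increment-first rule matters with finer-grained locks.
                    fiance_of_m[cur] = None
                    fiance_of_w[w], fiance_of_m[m] = m, w
                    cond.notify_all()
                # else: rejected, loop and try the next preference

    threads = [threading.Thread(target=man, args=(m,)) for m in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return men_pref, women_pref, fiance_of_m
```

The resulting matching is a perfect matching with no blocking pair, which is also a convenient automatic check for your test runs.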
Homework #9 (due date: May …, 2012)
We want to find the minimum value of a function F(x) (it does not have to be an analytic function; it may also be the result of a simulation) on the interval [a, b]. To do this, we divide the interval [a, b] into n equal parts and compute the function's value at each of these points. We then look for the x value that produces the smallest of these computed values.
For the problem described above, write two different procedures, one running on the CPU and one on the GPU, that compute the local minima of the functions given below:
float func1(float x)
__host__ __device__ float fun2(float x)
__host__ __device__ float fun3(float x)
b, int funcIndex) /* funcIndex determines which function to use */
b, int funcIndex) /* funcIndex determines which function to use */
Explain the parallel algorithm you used to find the minimum on the GPU.
Report the running times for all three functions on the interval [0, 1], on the CPU and on the GPU with three different block configurations. Explain the differences in the running times.
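The search itself is "evaluate on a grid, then reduce with min". A sequential Python sketch of that structure (the GPU version would evaluate the points in parallel threads and combine results with a tree-shaped block reduction; the quadratic test function used below is only an illustration, not one of the assigned fun1/fun2/fun3):

```python
def grid_min(f, a, b, n):
    """Evaluate f at n+1 equally spaced points on [a, b]; return (x, f(x))
    for the smallest value found. On the GPU, each thread would evaluate
    one (or a strided set of) points, followed by a min-reduction."""
    best_x, best_y = a, f(a)
    for i in range(1, n + 1):
        x = a + (b - a) * i / n
        y = f(x)
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

def tree_min(pairs):
    """Pairwise (tree) reduction over (value, x) pairs -- the shape a
    CUDA block-level min reduction takes, halving the work each step."""
    while len(pairs) > 1:
        pairs = [min(pairs[i:i + 2]) for i in range(0, len(pairs), 2)]
    return pairs[0]
```

For example, grid_min(lambda x: (x - 0.3) ** 2, 0.0, 1.0, 1000) returns an x very close to 0.3; varying the block configuration in the CUDA version changes how this reduction is split between intra-block shared memory and a final pass over per-block minima.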
MPI tutorials: one, two, three, four, GroupComm
MPI user document: UBE_LAM_MPI.doc
Send any comments or suggestions.
Last revised on Feb 12, 1996.