PCNOTES.2024
Q.4.HOW ARE SYSTOLIC ARRAYS AND VECTOR PROCESSORS DIFFERENT FROM SIMD
AND PIPELINED PROCESSORS:
Systolic arrays and vector processors are specialized for parallel data processing but differ from
SIMD and pipelined processors in operation style and flexibility. **Systolic arrays** use a grid
of processors that rhythmically pass data between nodes, optimized for tasks like matrix
operations, common in AI accelerators. **Vector processors** handle data in batches,
executing the same operation across vectors of data in one instruction.
In contrast, **SIMD (Single Instruction, Multiple Data)** processors apply a single operation to
multiple data points simultaneously but lack the data flow structure of systolic arrays.
**Pipelined processors** split instruction execution into stages and overlap the stages of successive
instructions to improve throughput, rather than applying one operation to many data elements at once.
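To make the contrast concrete, here is a minimal C sketch (not from the original notes; the array size and the `omp simd` hint are illustrative assumptions): the loop applies the same addition across a small array, which a SIMD or vector unit can execute as one or a few wide instructions, while a scalar pipelined core steps through it one element at a time.

```c
#include <stdio.h>

#define N 8   /* made-up vector length for illustration */

int main(void) {
    float a[N], b[N] = {1, 2, 3, 4, 5, 6, 7, 8}, c[N] = {8, 7, 6, 5, 4, 3, 2, 1};

    /* A vector or SIMD unit can perform this whole update as one (or a few)
       wide vector instructions; a scalar pipelined core overlaps the stages
       of successive iterations but still issues one add per element. */
    #pragma omp simd            /* vectorization hint; compile with -fopenmp-simd */
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i];     /* same operation across a vector of data */

    for (int i = 0; i < N; i++) printf("%.0f ", a[i]);   /* prints all 9s */
    printf("\n");
    return 0;
}
```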
Q.12.SPMD: SPMD (Single Program, Multiple Data) is a parallel programming model where
multiple processes or threads execute the same program but operate on different data. Each
process or thread follows the same control flow, executing the same instructions, but with
potentially different data inputs. SPMD is commonly used in parallel computing environments,
such as distributed systems or GPU programming, to achieve parallelism across multiple
processing units while maintaining code simplicity and portability. It allows for efficient
exploitation of data parallelism in a variety of parallel computing applications.
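As an illustration only, here is a minimal SPMD sketch in C with MPI: every rank executes this same program, but the rank number selects a different slice of the data. The problem size and the partial-sum computation are arbitrary assumptions chosen for brevity.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Same program, different data: each process sums a different index range. */
    const int n = 1000;                          /* hypothetical problem size */
    int chunk = n / size;
    int lo = rank * chunk;
    int hi = (rank == size - 1) ? n : lo + chunk;

    long local = 0;
    for (int i = lo; i < hi; i++) local += i;

    long total = 0;
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("total = %ld\n", total);

    MPI_Finalize();
    return 0;
}
```

Typically this would be compiled with an MPI wrapper such as mpicc and launched with mpirun, so several processes run the identical binary on different data.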
Q.15.PRAM MODEL: The PRAM (Parallel Random Access Machine) model is a theoretical
framework for designing parallel algorithms. It assumes multiple processors working
synchronously, sharing a single memory with unrestricted access. PRAM allows analyzing
parallel complexity by categorizing memory access as exclusive (only one processor
accesses at a time) or concurrent. It's used to evaluate algorithm efficiency in parallel
computing. The four main types are:
1. **EREW (Exclusive Read Exclusive Write)**: Only one processor can read from or write to
a specific memory location at any time, eliminating conflicts.
2. **CREW (Concurrent Read Exclusive Write)**: Multiple processors can read from the same
memory location simultaneously, but only one can write to a location at a time.
3. **ERCW (Exclusive Read Concurrent Write)**: A less common model where only one
processor can read from a memory location at a time, but multiple processors can write
simultaneously.
4. **CRCW (Concurrent Read Concurrent Write)**: Allows multiple processors to read and
write to the same memory location simultaneously, with conflict resolution strategies (e.g.,
priority, random choice, or combining writes).
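A classic PRAM-style reduction can be sketched as follows (an illustrative example, not part of the original notes): in each of the log2(N) synchronous rounds, every active processor reads one cell and writes a different cell, matching the EREW restriction. OpenMP's implicit barrier stands in for the PRAM's lock-step synchronization, and the input array is a made-up example.

```c
#include <stdio.h>

#define N 8   /* made-up input size; a power of two keeps the sketch simple */

int main(void) {
    int a[N] = {3, 1, 4, 1, 5, 9, 2, 6};

    /* EREW-style parallel sum in log2(N) synchronous rounds. In round d,
       the processor handling index i reads a[i + d] and writes a[i]; no two
       processors touch the same cell within a round. */
    for (int d = 1; d < N; d *= 2) {
        #pragma omp parallel for
        for (int i = 0; i < N; i += 2 * d) {
            a[i] += a[i + d];           /* exclusive read, exclusive write */
        }                               /* implicit barrier = lock-step PRAM round */
    }

    printf("sum = %d\n", a[0]);         /* 31 */
    return 0;
}
```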
2. **Distributed Memory Programming**: In this model, each processor has its own local
memory, and data sharing is done via message passing. Processors communicate explicitly,
often using interfaces like MPI (Message Passing Interface), to exchange data. This approach
is highly scalable and suitable for clusters and large distributed systems. However, it can
introduce communication overhead and requires complex data management for coordination.
3. **Object-Oriented Programming (OOP)**: OOP organizes code into "objects" that combine
data and functions. This model emphasizes modularity, inheritance, encapsulation, and
polymorphism, making it easier to manage and scale complex applications. OOP is widely used
for building large, reusable, and maintainable systems. Examples include languages like Java,
C++, and Python, which provide structures for defining and interacting with objects.
5. **Data Flow Programming**: In data flow programming, the execution depends on the
availability of data rather than control flow. Processes or functions execute when all required
input data is available, allowing for natural parallelism. It’s ideal for applications with complex
data dependencies, such as signal processing and reactive programming. Data flow is used in
graphical environments like LabVIEW and some programming frameworks for workflow
automation.
6. **Data Parallel Programming**: This model divides a dataset into chunks and applies the
same operation to each part simultaneously across multiple processors, making it highly
effective for repetitive tasks like matrix operations or image processing. Data parallelism
leverages SIMD (Single Instruction, Multiple Data) architectures and is commonly used in fields
like machine learning and scientific computing, with frameworks like CUDA and OpenCL
enabling data-parallel processing on GPUs.
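A minimal data-parallel sketch in C with OpenMP (illustrative only; the array size and the scaling operation are arbitrary assumptions): the same arithmetic is applied to every element, and the iterations are divided across threads.

```c
#include <stdio.h>

#define N 1000000   /* hypothetical dataset size */

static float x[N], y[N];

int main(void) {
    for (int i = 0; i < N; i++) x[i] = (float)i;

    /* Data parallelism: identical operation on every element, with the
       iteration space split across the available threads. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = 2.0f * x[i] + 1.0f;

    printf("y[10] = %.1f\n", y[10]);    /* 21.0 */
    return 0;
}
```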
Q.39.DIFFERENTIATE BETWEEN SHARED MEMORY PARALLELISM, DISTRIBUTED
MEMORY PARALLELISM AND OBJECT ORIENTED PROGRAMMING:
Shared Memory Parallelism:
1. Shared memory parallelism involves multiple processing units accessing a common memory space.
2. Processes or threads communicate by reading and writing to shared memory locations.
3. Programs use threads or processes that share a common address space.
4. Synchronization mechanisms such as mutexes and semaphores are used to coordinate access to shared resources.
5. Limited scalability due to contention for shared resources and memory bandwidth.
6. **Examples**: OpenMP and pthreads are commonly used APIs for shared memory parallelism.
7. Typically implemented on multi-core processors or shared memory systems like symmetric multiprocessing (SMP) architectures.
8. All threads or processes have direct access to shared data, simplifying data sharing but requiring careful synchronization.
9. Easier to program compared to distributed memory parallelism due to the shared memory abstraction.
DISTRIBUTED MEMORY PARALLELISM:
1. Distributed memory parallelism involves multiple processing units, each with its own local memory.
2. Processes or nodes communicate by passing messages over a network.
3. Programs use message passing interfaces (MPI) to exchange data between processes running on different nodes.
4. Synchronization is achieved through message passing and coordination of distributed processes.
5. Offers high scalability, as the number of nodes can be increased without contention for shared resources.
6. **Examples**: MPI is a widely used standard for distributed memory parallelism.
7. Implemented on clusters or supercomputers where each node has its own memory and processor.
8. Requires explicit communication for data sharing between processes, leading to increased programming complexity.
9. More complex programming compared to shared memory due to explicit message passing and distributed data management.
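A minimal message-passing sketch in C (illustrative only, assuming exactly two MPI ranks): the value exists only in rank 0's local memory until it is explicitly sent, and rank 1 must explicitly receive it into its own memory.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        double payload = 42.0;   /* lives only in rank 0's local memory */
        MPI_Send(&payload, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        double payload;          /* rank 1 has its own, separate memory */
        MPI_Recv(&payload, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %.1f\n", payload);
    }

    MPI_Finalize();
    return 0;
}
```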
OBJECT-ORIENTED PROGRAMMING (OOP):
1. OOP is a programming paradigm based on the concept of objects, which encapsulate data and behavior.
2. Focuses on creating classes and objects to represent real-world entities and their interactions.
3. Encapsulates data within objects and restricts access to it through methods or functions.
4. Supports inheritance, allowing classes to inherit properties and behavior from other classes.
5. Supports polymorphism, enabling objects of different classes to be treated uniformly through inheritance and interfaces.
6. Promotes modularity by organizing code into reusable and understandable units (objects and classes).
7. Promotes code reusability through inheritance and composition, reducing redundancy and improving maintainability.
8. **Examples**: Java, C++, and Python are popular programming languages that support OOP principles.
9. Provides flexibility in designing software systems by supporting abstraction, encapsulation, and inheritance.
Q.41. WHAT ARE THE MAIN STRUCTURES AND TECHNIQUES USED FOR
PARALLELIZING A SEQUENTIAL PROGRAM:
1. **Task Decomposition**: Break the program down into smaller, independent tasks that can
execute concurrently. Each task should ideally be self-contained to minimize dependencies and
synchronization (see the sketch after this list).
2. **Data Decomposition**: Split data into segments so each processor handles a portion.
Common in data-parallel applications, this approach is useful for tasks that apply the same
operations across large datasets.
3. **Loop Parallelism**: Identify independent iterations within loops and execute them in
parallel. This is often done in computationally intensive loops.
4. **Pipelining**: Divide the program into stages, where each stage processes different data
simultaneously, allowing for overlapping computations.
5. **Synchronization Mechanisms**: Use locks, semaphores, or barriers to manage
dependencies and ensure consistency.
6. **Load Balancing**: Distribute tasks evenly across processors to prevent bottlenecks and
optimize performance.
7. **Parallel I/O**: Parallelize input/output operations to improve I/O performance in parallel
programs. Techniques such as parallel file systems, collective I/O, and asynchronous I/O enable
concurrent access to data storage devices and minimize I/O bottlenecks in parallel applications.
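The task-decomposition sketch referenced in point 1 above (an illustrative example in C with OpenMP; the two stage functions are made up and independent by construction):

```c
#include <stdio.h>

/* Two hypothetical, independent pieces of work extracted from a sequential
   program; because they share no data, they can run concurrently. */
static long stage_a(void) { long s = 0; for (int i = 0; i < 1000; i++) s += i; return s; }
static long stage_b(void) { long p = 1; for (int i = 1; i <= 20;  i++) p *= 2; return p; }

int main(void) {
    long a = 0, b = 0;

    #pragma omp parallel sections
    {
        #pragma omp section
        a = stage_a();            /* task 1 */
        #pragma omp section
        b = stage_b();            /* task 2, runs concurrently with task 1 */
    }

    printf("a = %ld, b = %ld\n", a, b);
    return 0;
}
```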
Q.42.WHAT ARE THE STEPS FOR CREATING A PARALLEL PROGRAMMING
ENVIRONMENT:
1. **System Setup**: Ensure the availability of multi-core processors, GPUs, or distributed
systems with network connectivity for data exchange, depending on the parallelism needs.
2. **Select a Programming Model**: Choose an appropriate parallel model (e.g., shared
memory, distributed memory, or data parallel) based on the hardware architecture and task
requirements.
3. **Choose Parallel Libraries and Tools**: Use libraries like OpenMP, MPI, or CUDA, along
with profiling tools to optimize and manage parallel tasks.
4. **Develop Parallel Code**: Identify parallelizable sections of the program, typically through
task or data decomposition, and restructure code accordingly.
5. **Implement Synchronization and Communication**: Use locks, semaphores, or message
passing to manage dependencies and data sharing.
6. **Test and Optimize**: Run and profile the program, adjusting load balancing,
synchronization, and memory usage for optimal performance.
7. **Deploying and Scaling**: Deploy parallel applications on the target hardware and scale
them to larger problem sizes or distributed environments. Monitor performance and scalability
metrics to ensure efficient utilization of resources and optimal performance across different
configurations.
Q.43.LOOP SCHEDULING: Loop scheduling refers to the technique used to distribute loop iterations
among parallel threads or processes in parallel computing environments. It aims to balance workload,
minimize overhead, and optimize performance by efficiently distributing loop iterations. Common loop
scheduling strategies include:
1. **Static Scheduling**: Loop iterations are divided statically at compile time among parallel threads
or processes. Each thread is assigned a fixed range of iterations, typically determined by the loop
bounds and the number of available threads.
2. **Dynamic Scheduling**: Loop iterations are distributed dynamically at runtime based on workload
distribution and load-balancing considerations. Threads or processes request and execute iterations
as they become available, reducing the potential for load imbalance.
3. **Guided Scheduling**: A hybrid approach that begins with large chunks of iterations and hands out
progressively smaller chunks as the remaining work shrinks, maintaining load balance while limiting
scheduling overhead.
4. **Chunk Sizing**: Determines the number of loop iterations assigned to each thread or process at a
time. Optimizing chunk size is crucial for minimizing overhead and maximizing performance: larger
chunks reduce scheduling overhead but may lead to load imbalance, while smaller chunks increase
scheduling overhead but improve load balancing.
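These strategies map directly onto OpenMP's schedule clause. The sketch below (illustrative, with an arbitrary triangular workload) uses dynamic scheduling with a chunk size of 16; swapping in static or guided shows the trade-offs described above.

```c
#include <stdio.h>

#define N 1000

int main(void) {
    double work[N];

    /* Iterations get more expensive as i grows, so a fixed static split would
       leave some threads idle; dynamic (or guided) scheduling rebalances at
       runtime. The chunk size of 16 is an arbitrary example value. */
    #pragma omp parallel for schedule(dynamic, 16)
    for (int i = 0; i < N; i++) {
        double v = 0.0;
        for (int j = 0; j < i; j++) v += j * 0.5;
        work[i] = v;
    }

    printf("work[N-1] = %.1f\n", work[N - 1]);
    return 0;
}
```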
Q.55 MOORE’S LAW: Moore's Law, proposed by Gordon Moore, is the observation that the number of
transistors on a microchip doubles approximately every two years, leading to increased
processing power and decreased cost per transistor. This exponential growth has driven
advancements in computing, making devices faster, smaller, and more affordable. However,
as transistor sizes approach physical limits, maintaining this pace has become challenging,
sparking innovation in alternative computing architectures.
Q.56.CACHE COHERENCE: Cache coherence refers to the consistency of data stored in local
caches of parallel processors or multi-core systems. In a system where multiple processors
cache the same memory location, cache coherence ensures that any changes made to that
location in one cache are reflected in all other caches. This is crucial to avoid discrepancies
and ensure correct program execution. Cache coherence protocols, such as MESI and MOESI,
help manage this consistency effectively.
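As a rough illustration (not from the notes), the C11 sketch below relies on the guarantee coherence provides: once the writer thread stores to the shared variable, the protocol invalidates or updates any stale copy in the reading core's cache, so the main thread observes 42 rather than an old value. The variable name and value are arbitrary.

```c
#include <stdio.h>
#include <stdatomic.h>
#include <pthread.h>

static atomic_int shared = 0;   /* may be cached by several cores */

static void *writer(void *arg) {
    (void)arg;
    /* The write lands in the writer core's cache; the coherence protocol
       (e.g., MESI) invalidates or updates other cached copies. */
    atomic_store(&shared, 42);
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, writer, NULL);
    pthread_join(t, NULL);
    printf("reader sees %d\n", atomic_load(&shared));   /* 42, not a stale 0 */
    return 0;
}
```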
2. **Adaptive Routing:**
- **Definition:** Adaptive routing dynamically selects the path for each packet based on real-time network conditions.
- **Routing Decision:** Paths are chosen based on current network congestion, link availability, and other factors.
- **Flexibility:** Adapts to changing network conditions, potentially reducing congestion and improving performance.
- **Complexity:** More complex algorithms and overhead for route selection and adaptation.
- **Examples:** Shortest-path algorithms such as Dijkstra's, or routing based on load balancing.
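A rough sketch of the idea in C (a hypothetical 5-node topology with made-up link latencies and congestion values): the effective cost of each link is its base latency plus a congestion penalty, and Dijkstra's algorithm then picks the cheapest path, so the chosen route adapts as congestion changes.

```c
#include <stdio.h>
#include <limits.h>

#define NODES 5
#define INF   INT_MAX

static int base[NODES][NODES] = {      /* base link latencies; 0 = no link */
    {0, 2, 0, 1, 0},
    {2, 0, 3, 0, 0},
    {0, 3, 0, 2, 1},
    {1, 0, 2, 0, 4},
    {0, 0, 1, 4, 0},
};
static int congestion[NODES][NODES];   /* would be updated from measurements */

int main(void) {
    int dist[NODES], done[NODES] = {0};
    int src = 0;

    congestion[0][1] = 5;              /* pretend the 0-1 link is congested */
    congestion[1][0] = 5;

    for (int i = 0; i < NODES; i++) dist[i] = (i == src) ? 0 : INF;

    /* Dijkstra over adaptive costs = base latency + congestion penalty. */
    for (int iter = 0; iter < NODES; iter++) {
        int u = -1;
        for (int i = 0; i < NODES; i++)
            if (!done[i] && (u == -1 || dist[i] < dist[u])) u = i;
        if (dist[u] == INF) break;
        done[u] = 1;
        for (int v = 0; v < NODES; v++) {
            if (base[u][v] == 0) continue;
            int cost = base[u][v] + congestion[u][v];
            if (dist[u] + cost < dist[v]) dist[v] = dist[u] + cost;
        }
    }

    for (int i = 0; i < NODES; i++) printf("cost to node %d: %d\n", i, dist[i]);
    return 0;
}
```

With the congested 0-1 link, the cheapest route to node 1 goes around it (0-3-2-1), illustrating how the path changes with network conditions.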
Q.59.MULTI-LEVEL CACHE MEMORY: Multi-level cache memory consists of multiple cache levels
organized hierarchically to improve memory access speed and efficiency. The levels typically
include L1, L2, and sometimes L3 caches, with each level offering progressively larger capacity
but slower access times. The cache hierarchy exploits the principle of locality, with frequently
accessed data stored in smaller, faster caches closer to the processor, while less frequently
accessed data resides in larger, slower caches or main memory. This arrangement optimizes
memory access latency and overall system performance.
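A small C sketch of the locality principle this hierarchy exploits (the array size and timing method are arbitrary choices): the row-major traversal streams through memory and mostly hits the fast upper cache levels, while the column-major traversal strides across rows and misses far more often, falling back to slower levels or main memory.

```c
#include <stdio.h>
#include <time.h>

#define N 1024

static double m[N][N];

int main(void) {
    double sum = 0.0;
    clock_t t0, t1;

    t0 = clock();
    for (int i = 0; i < N; i++)        /* cache-friendly: sequential addresses */
        for (int j = 0; j < N; j++)
            sum += m[i][j];
    t1 = clock();
    printf("row-major:    %ld ticks\n", (long)(t1 - t0));

    t0 = clock();
    for (int j = 0; j < N; j++)        /* cache-hostile: large strides per access */
        for (int i = 0; i < N; i++)
            sum += m[i][j];
    t1 = clock();
    printf("column-major: %ld ticks (sum=%.0f)\n", (long)(t1 - t0), sum);
    return 0;
}
```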