1. Unit III: Multiprocessors and Thread-Level Parallelism
By N.R.Rejin Paul, Lecturer/VIT/CSE
CS2354 Advanced Computer Architecture
2. Chapter 6. Multiprocessors and Thread-Level Parallelism
6.1 Introduction
6.2 Characteristics of Application Domains
6.3 Symmetric Shared-Memory Architectures
6.4 Performance of Symmetric Shared-Memory Multiprocessors
6.5 Distributed Shared-Memory Architectures
6.6 Performance of Distributed Shared-Memory Multiprocessors
6.7 Synchronization
6.8 Models of Memory Consistency: An Introduction
6.9 Multithreading: Exploiting Thread-Level Parallelism within a Processor
3. Taxonomy of Parallel Architectures
Flynn Categories
• SISD (Single Instruction Single Data)
– Uniprocessors
• MISD (Multiple Instruction Single Data)
– no commercial multiprocessor of this type has been built; multiple processors operate on a single data stream
• SIMD (Single Instruction Multiple Data)
– same instruction executed by multiple processors using different data streams
• Each processor has its data memory (hence multiple data)
• There’s a single instruction memory and control processor
– Simple programming model, Low overhead, Flexibility
– (Phrase reused by Intel marketing for media instructions ~ vector)
– Examples: vector architectures, Illiac-IV, CM-2
• MIMD (Multiple Instruction Multiple Data)
– Each processor fetches its own instructions and operates on its own data
– MIMD is the current winner; design emphasis concentrates on machines with <= 128 processors
• Use off-the-shelf microprocessors: cost-performance advantages
• Flexible: high performance for one application, running many tasks simultaneously
– Examples: Sun Enterprise 5000, Cray T3D, SGI Origin
4. MIMD Class 1: Centralized Shared-Memory Multiprocessor
Processors share a single centralized memory; processors and memory are interconnected by a bus.
• Also known as a "uniform memory access" (UMA) machine, because the time taken to access memory is the same from every processor, or a "symmetric (shared-memory) multiprocessor" (SMP)
– A symmetric relationship to all processors
– A uniform memory access time from any processor
• Scalability problem: less attractive for large processor counts
5. MIMD Class 2: Distributed-Memory Multiprocessor
Memory modules are associated with individual CPUs.
• Advantages:
– cost-effective way to scale memory bandwidth
– lower memory latency for local memory access
• Drawbacks
– longer communication latency for communicating data between processors
– software model more complex
6. 6.3 Symmetric Shared-Memory Architectures
Each processor has the same relationship to the single memory, and the machine usually supports caching of both private data and shared data.
Caching in shared-memory machines
• private data: data used by a single processor
– When a private item is cached, its location is migrated to the cache
– Since no other processor uses the data, the program behavior is identical to that in a uniprocessor
• shared data: data used by multiple processors
– When shared data are cached, the shared value may be replicated in multiple caches
– Advantages: reduced access latency and lower bandwidth demand on memory
– But because each cache handles loads and stores on its own, including when and how it writes values back, the copies in different caches may not stay consistent
– This induces a new problem: cache coherence
A coherent cache provides:
• migration: a data item can be moved to a local cache and used there in a transparent fashion
• replication: shared data that are being simultaneously read are copied into the reading caches
Both are critical to performance in accessing shared data.
7. Multiprocessor Cache Coherence Problem
• Informally:
– "A memory system is coherent if any read returns the most recent write"
– Coherence – defines what value can be returned by a read
– Consistency – determines when a written value will be returned by a read
– The informal definition is too strict and too difficult to implement
• Better:
– Write propagation: a written value must become visible to other caches ("any write must eventually be seen by a read")
– Write serialization: all writes are seen in the proper order by all caches
• Two rules to ensure this:
– "If P writes x and then P1 reads it, P's write will be seen by P1 if the read and write are sufficiently far apart"
– Writes to a single location are serialized: seen in one order
• The latest write will be seen
• Otherwise a processor could see writes in an illogical order (an older value after a newer value)
8. Example Cache Coherence Problem
– Processors see different values for u after event 3
[Figure: processors P1, P2, and P3, each with a private cache ($), share a bus to memory and I/O devices. P1 and P3 read u and cache u:5 (events 1 and 2), P3 then writes u = 7 (event 3), and P1 and P2 subsequently read u (events 4 and 5).]
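The stale read in this example is easy to reproduce in software. Below is a minimal Python sketch (my own illustration, not from the slides) of three private write-back caches with no coherence mechanism; the `Cache` class and its `read`/`write` methods are assumed names.

```python
# Minimal sketch (my own, not from the slides): three private write-back caches
# with no coherence protocol, replaying the example above.

class Cache:
    def __init__(self, memory):
        self.memory = memory          # shared backing store
        self.lines = {}               # address -> (value, dirty)

    def read(self, addr):
        if addr not in self.lines:    # miss: fetch from memory
            self.lines[addr] = (self.memory[addr], False)
        return self.lines[addr][0]

    def write(self, addr, value):
        self.lines[addr] = (value, True)   # write-back: memory is not updated yet

memory = {"u": 5}
p1, p2, p3 = Cache(memory), Cache(memory), Cache(memory)

print(p1.read("u"))   # event 1: P1 reads u -> 5
print(p3.read("u"))   # event 2: P3 reads u -> 5
p3.write("u", 7)      # event 3: P3 writes u = 7, only in its own cache
print(p1.read("u"))   # P1 still sees the stale 5
print(p2.read("u"))   # P2 fetches 5 from memory, since the write-back has not happened
```

Because P3's write stays in its own cache and nothing invalidates or updates the other copies, later reads keep returning the stale value 5.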
9. Defining Coherent Memory System
1. Preserve program order: A read by processor P to location X that follows a write by P to X, with no writes of X by another processor occurring between the write and the read by P, always returns the value written by P.
2. Coherent view of memory: A read by a processor to location X that follows a write by another processor to X returns the written value if the read and write are sufficiently separated in time and no other writes to X occur between the two accesses.
3. Write serialization: Two writes to the same location by any two processors are seen in the same order by all processors.
– For example, if the values 1 and then 2 are written to a location X by P1 and P2, processors can never read the value of the location X as 2 and then later read it as 1.
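Condition 3 can be phrased as a simple check on execution traces. The sketch below is my own illustration (not code from the book): given the global order in which writes to X occurred, it verifies that no processor observes an older value after a newer one.

```python
# Sketch (my own illustration) of the write-serialization rule: no processor
# may observe an older value of X after a newer one.

def respects_write_serialization(write_order, observations):
    """write_order: values written to X, in the order the writes occurred.
    observations: for each processor, the values its reads of X returned."""
    rank = {value: i for i, value in enumerate(write_order)}
    for per_processor in observations:
        ranks = [rank[value] for value in per_processor]
        if any(a > b for a, b in zip(ranks, ranks[1:])):
            return False      # an older value was seen after a newer one
    return True

# P1 writes 1 and then P2 writes 2 (the example above); the initial value is 0.
print(respects_write_serialization([0, 1, 2], [[1, 2], [0, 1, 2]]))  # True
print(respects_write_serialization([0, 1, 2], [[2, 1]]))             # False: 2 seen before 1
```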
10. Basic Schemes for Enforcing Coherence
• A program running on multiple processors will normally have copies of the same data in several caches
• Rather than trying to avoid sharing in software, SMPs use a hardware protocol to maintain coherent caches
– Migration and replication are key to the performance of shared data
• Migration – data can be moved to a local cache and used there in a transparent fashion
– Reduces both the latency to access shared data that is allocated remotely and the bandwidth demand on the shared memory
• Replication – for shared data being simultaneously read, each cache makes a copy of the data in its local cache
– Reduces both the latency of access and contention for reading shared data
11. Two Classes of Cache Coherence Protocols
1. Snooping — Every cache with a copy of the data also has a copy of the sharing status of the block, but no centralized state is kept
• All caches are accessible via some broadcast medium (a bus or switch)
• All cache controllers monitor or snoop on the medium to determine whether or not they have a copy of a block that is requested on a bus or switch access
2. Directory based — The sharing status of a block of physical memory is kept in just one location, the directory
12. Snoopy Cache-Coherence Protocols
• The cache controller "snoops" on all transactions on the shared medium (bus or switch)
– A transaction is relevant if it is for a block the cache contains
– The controller takes action to ensure coherence
• invalidate, update, or supply the value
– The action depends on the state of the block and the protocol
• Either get exclusive access before a write (write invalidate) or update all copies on a write
[Figure: processors P1 … Pn, each with a cache whose lines hold state, address (tag), and data, attached to a shared bus with memory and I/O devices; each cache controller performs a bus snoop on every transaction in addition to its own cache-memory transactions.]
13. Example: Write-through Invalidate
• Must invalidate before step 3
• Write update uses more broadcast-medium bandwidth; all recent MPUs use write invalidate
[Figure: the same three-processor example; P3's write of u = 7 goes through to memory and invalidates the other cached copies of u, so the subsequent reads by P1 and P2 return 7.]
14. Two Classes of Cache Coherence Protocols
• Snooping Solution (Snoopy Bus)
– Send all requests for data to all processors
– Processors snoop to see if they have a copy and respond accordingly
– Requires broadcast, since the caching information is at the processors
– Works well with a bus (natural broadcast medium)
– Dominates for small-scale machines (most of the market)
• Directory-Based Schemes (Section 6.5)
– A directory keeps track of what is being shared in a centralized place
– Distributed memory => distributed directory for scalability (avoids bottlenecks)
– Send point-to-point requests to processors via the network
– Scales better than snooping
– Actually existed BEFORE snooping-based schemes
15. Basic Snoopy Protocols
• Write strategies
– Write-through: memory is always up-to-date
– Write-back: snoop in the caches to find the most recent copy
There are two ways to maintain the coherence requirements using snooping protocols:
• Write Invalidate Protocol
– Multiple readers, single writer
– Write to shared data: an invalidate is sent to all caches, which snoop and invalidate any copies
– Read miss: a subsequent read will miss in the cache and fetch a new copy of the data
• Write Broadcast/Update Protocol
– Write to shared data: broadcast on the bus; processors snoop and update any copies
– Read miss: memory/cache is always up-to-date
• Write serialization: the bus serializes requests!
– The bus is a single point of arbitration
16. Examples of Basic Snooping Protocols
Assume neither cache initially holds X and the value of X in memory is 0.
[Tables: bus activity and cache/memory contents under the Write Invalidate protocol and under the Write Update protocol.]
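To make the contrast concrete, here is a hedged Python sketch of the two policies on the setup above (neither cache initially holds X, memory holds 0). It is an illustrative model of write-through snooping, not the book's protocol tables; `SnoopyCache`, `Bus`, and the `policy` flag are assumed names.

```python
# Hedged sketch contrasting write-invalidate and write-update snooping on a
# write-through bus. Neither cache holds X at the start; memory holds X = 0.

class SnoopyCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                      # addr -> value
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:           # read miss: fetch from memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.memory[addr] = value        # write-through keeps memory up to date
        for other in self.bus.caches:        # broadcast so the other controllers snoop
            if other is not self and addr in other.lines:
                if self.bus.policy == "invalidate":
                    del other.lines[addr]      # invalidate: drop the stale copy
                else:
                    other.lines[addr] = value  # update: push the new value

class Bus:
    def __init__(self, policy):
        self.policy = policy
        self.memory = {"X": 0}
        self.caches = []

for policy in ("invalidate", "update"):
    bus = Bus(policy)
    a, b = SnoopyCache(bus), SnoopyCache(bus)
    a.read("X"); b.read("X")                 # both caches now hold X = 0
    a.write("X", 1)                          # A writes X = 1
    print(policy, "->", b.read("X"))         # B sees 1 under either policy
```

Both policies leave the second cache reading 1; the difference is that invalidation forces a refetch on the next read, while update pushes the new value on every write, which is why write update consumes more broadcast bandwidth.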
17. An Example Snoopy Protocol
Invalidation protocol, write-back cache
• Each cache block is in one state (track these):
– Shared: the block can be read
– OR Exclusive: this cache has the only copy, it is writeable, and it is dirty
– OR Invalid: the block contains no data
– an extra state bit (shared/exclusive) associated with a valid bit and a dirty bit for each block
• Each block of memory is in one state:
– Clean in all caches and up-to-date in memory (Shared)
– OR Dirty in exactly one cache (Exclusive)
– OR Not in any caches
• Each processor snoops every address placed on the bus
– If a processor finds that it has a dirty copy of the requested cache block, it provides that cache block in response to the read request
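The protocol above can be written down as a small state machine per cache block. The following sketch is my own compact encoding of the Shared/Exclusive/Invalid transitions (in the spirit of Figure 6.11), not code from the text; the function names and the textual bus actions are assumptions.

```python
# Compact sketch of the three-state write-back invalidation protocol
# (Invalid / Shared / Exclusive, where Exclusive means the only, dirty copy).

INVALID, SHARED, EXCLUSIVE = "Invalid", "Shared", "Exclusive"

def cpu_request(state, op):
    """Next state and bus action for a CPU read/write on a block in `state`."""
    if op == "read":
        # A read miss places a read miss on the bus; hits leave the state alone.
        return (SHARED, "read miss") if state == INVALID else (state, None)
    if op == "write":
        # A write needs an exclusive copy; from Invalid or Shared, place a
        # write miss on the bus so other copies get invalidated.
        return (EXCLUSIVE, None if state == EXCLUSIVE else "write miss")
    raise ValueError(op)

def bus_snoop(state, bus_op):
    """Next state and action when another processor's miss is seen on the bus."""
    if state == EXCLUSIVE and bus_op == "read miss":
        return SHARED, "write back block; supply data"
    if state == EXCLUSIVE and bus_op == "write miss":
        return INVALID, "write back block; supply data"
    if state == SHARED and bus_op == "write miss":
        return INVALID, "invalidate copy"
    return state, None

# Example: a block starts Invalid, the CPU writes it, then a remote read miss arrives.
s, action = cpu_request(INVALID, "write"); print(s, action)   # Exclusive, write miss
s, action = bus_snoop(s, "read miss");     print(s, action)   # Shared, write back / supply data
```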
18. Cache Coherence Mechanism of the Example
Placing a write miss on the bus when a write hits in the shared state ensures an exclusive copy (the data is not transferred).
19. Figure 6.11 State Transitions for Each Cache Block
• The CPU may read/write hit or miss on the block
• The cache may place a write/read miss on the bus
• The cache may receive a read/write miss from the bus
[Figure 6.11: two transition diagrams per cache block, one for requests from the CPU and one for requests from the bus.]
21. 6.5 Distributed Shared-Memory Architectures
• Separate memory per processor
– Local or remote access via the memory controller
– The physical address space is statically distributed
Coherence Problems
• Simple approach: uncacheable
– Shared data are marked as uncacheable and only private data are kept in caches
– Very long latency to access memory for shared data
• Alternative: a directory for memory blocks
– A directory per memory tracks the state of every block in every cache
• which caches have copies of the memory block, dirty vs. clean, ...
– Two additional complications
• The interconnect cannot be used as a single point of arbitration like the bus
• Because the interconnect is message oriented, many messages must have explicit responses
22. Distributed Directory Multiprocessor
To prevent the directory from becoming the bottleneck, directory entries are distributed along with the memory; each directory keeps track of which processors have copies of its memory blocks.
23. Directory Protocols
• Similar to the snoopy protocol: three states
– Shared: one or more processors have the block cached, and the value in memory is up-to-date (as well as in all the caches)
– Uncached: no processor has a copy of the cache block (not valid in any cache)
– Exclusive: exactly one processor has a copy of the cache block, and it has written the block, so the memory copy is out of date
• That processor is called the owner of the block
• In addition to tracking the state of each cache block, we must track the processors that have copies of the block when it is shared (usually a bit vector for each memory block: 1 if the processor has a copy)
• Keep it simple(r):
– Writes to non-exclusive data => write miss
– The processor blocks until the access completes
– Assume messages are received and acted upon in the order sent
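A directory entry therefore needs only the block's state plus the set of sharers. A minimal sketch follows, assuming a Python set of processor ids in place of the bit vector the slide mentions; the class and field names are my own.

```python
# Sketch of a per-memory-block directory entry: a state plus a record of which
# processors hold copies (a set here, a bit vector in hardware).

from dataclasses import dataclass, field

UNCACHED, SHARED, EXCLUSIVE = "Uncached", "Shared", "Exclusive"

@dataclass
class DirectoryEntry:
    state: str = UNCACHED
    sharers: set = field(default_factory=set)   # processor ids holding a copy

    def owner(self):
        # In the Exclusive state the single sharer is the owner of the block.
        assert self.state == EXCLUSIVE and len(self.sharers) == 1
        return next(iter(self.sharers))

entry = DirectoryEntry()
entry.state, entry.sharers = SHARED, {0, 2}      # P0 and P2 cache the block; memory is up to date
```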
24. Messages for Directory Protocols
• Local node: the node where a request originates
• Home node: the node where the memory location and directory entry of an address reside
• Remote node: a node that has a copy of a cache block (exclusive or shared)
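Because the physical address space is statically distributed (slide 21), the home node of an address can be computed rather than looked up. A small sketch under assumed parameters; the block size and node count are made-up example values.

```python
# Sketch of computing an address's home node under block-interleaved placement.
# BLOCK_SIZE and NUM_NODES are assumed example values, not from the slides.

BLOCK_SIZE = 64      # bytes per cache block
NUM_NODES = 16       # processor/memory nodes

def home_node(addr):
    """Node whose memory (and directory entry) holds this address."""
    return (addr // BLOCK_SIZE) % NUM_NODES

addr = 0x12340
print(home_node(addr))   # home node for this block; the node issuing the request is the
                         # local node, and any other node caching the block is a remote node
```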
25. State Transition Diagram for an Individual Cache Block
• Comparing to snooping protocols:
– identical states
– the stimulus is almost identical
– a write to a shared cache block is treated as a write miss (without fetching the block)
– a cache block must be in the exclusive state when it is written
– any shared block must be up to date in memory
• Write miss: data fetch and selective invalidate operations are sent by the directory controller (broadcast in snooping protocols)
26. State Transition Diagram for the Directory
[Figure 6.29: transition diagram for a cache block.]
Three requests: read miss, write miss, and data write-back
27. Directory Operations: Requests and Actions
• A message sent to the directory causes two actions:
– Update the directory
– Send more messages to satisfy the request
• Block is in the Uncached state: the copy in memory is the current value; the only possible requests for that block are:
– Read miss: the requesting processor is sent the data from memory and is made the only sharing node; the state of the block is made Shared.
– Write miss: the requesting processor is sent the value and becomes the sharing node. The block is made Exclusive to indicate that the only valid copy is cached. Sharers indicates the identity of the owner.
• Block is Shared => the memory value is up-to-date:
– Read miss: the requesting processor is sent the data from memory and is added to the sharing set.
– Write miss: the requesting processor is sent the value. All processors in the set Sharers are sent invalidate messages, and Sharers is set to the identity of the requesting processor. The state of the block is made Exclusive.
28. Directory Operations: Requests and Actions (cont.)
• Block is Exclusive: the current value of the block is held in the cache of the processor identified by the set Sharers (the owner) => three possible directory requests:
– Read miss: the owner processor is sent a data fetch message, causing the state of the block in the owner's cache to transition to Shared and causing the owner to send the data to the directory, where it is written to memory and sent back to the requesting processor. The identity of the requesting processor is added to the set Sharers, which still contains the identity of the processor that was the owner (since it still has a readable copy). The state is made Shared.
– Data write-back: the owner processor is replacing the block and hence must write it back, making the memory copy up-to-date (the home directory essentially becomes the owner); the block is now Uncached, and the Sharers set is empty.
– Write miss: the block has a new owner. A message is sent to the old owner, causing the cache to send the value of the block to the directory, from which it is sent to the requesting processor, which becomes the new owner. Sharers is set to the identity of the new owner, and the state of the block remains Exclusive.
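The three cases above condense into one directory handler. The sketch below is an illustrative Python rendering of those actions, with messages stubbed out as prints; `handle`, `send`, and the message strings are assumed names, not the book's interface.

```python
# Illustrative sketch (my own) of the directory actions described above;
# interconnect messages are stubbed out as prints.

UNCACHED, SHARED, EXCLUSIVE = "Uncached", "Shared", "Exclusive"

class Entry:                      # state plus sharer set, as in the earlier sketch
    def __init__(self):
        self.state, self.sharers = UNCACHED, set()

def send(node, message):          # stand-in for a message on the interconnect
    print(f"-> node {node}: {message}")

def handle(entry, request, requester):
    if entry.state == UNCACHED:                       # memory holds the current value
        send(requester, "data from memory")
        entry.sharers = {requester}
        entry.state = SHARED if request == "read miss" else EXCLUSIVE

    elif entry.state == SHARED:                       # memory is up to date
        if request == "read miss":
            send(requester, "data from memory")
            entry.sharers.add(requester)
        elif request == "write miss":
            for p in entry.sharers - {requester}:
                send(p, "invalidate")
            send(requester, "data from memory")
            entry.sharers, entry.state = {requester}, EXCLUSIVE

    elif entry.state == EXCLUSIVE:                    # owner's cache holds the only valid copy
        owner = next(iter(entry.sharers))
        if request == "read miss":
            send(owner, "fetch (write back to memory)")
            send(requester, "data")
            entry.sharers.add(requester)
            entry.state = SHARED
        elif request == "data write-back":
            entry.sharers, entry.state = set(), UNCACHED
        elif request == "write miss":
            send(owner, "fetch/invalidate")
            send(requester, "data")
            entry.sharers = {requester}               # state stays Exclusive

e = Entry()
handle(e, "read miss", 0)    # Uncached -> Shared, sharers = {0}
handle(e, "write miss", 1)   # invalidate P0; Shared -> Exclusive, sharers = {1}
handle(e, "read miss", 2)    # fetch from owner P1; Exclusive -> Shared, sharers = {1, 2}
```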
29. Summary
Chapter 6. Multiprocessors and Thread-Level Parallelism
6.1 Introduction
6.2 Characteristics of Application Domains
6.3 Symmetric Shared-Memory Architectures
6.4 Performance of Symmetric Shared-Memory Multiprocessors
6.5 Distributed Shared-Memory Architectures
6.6 Performance of Distributed Shared-Memory Multiprocessors
6.7 Synchronization
6.8 Models of Memory Consistency: An Introduction
6.9 Multithreading: Exploiting Thread-Level Parallelism within a Processor