
Chap#8

General GPU Concepts

Q: What is a GPU?
A: A GPU (Graphics Processing Unit) is a specialized processor built for parallel processing. Originally designed for graphics, it is now widely used for general-purpose computation, offloading compute-intensive tasks from the CPU by executing many operations simultaneously.


Q: Why are GPUs good for parallel operations?
A: GPUs excel at parallel operations because they are built with a massive number of smaller, simpler execution units (cores). This architecture allows them to perform many calculations simultaneously.


Q: What is the main architectural difference between a CPU and a GPU regarding cores?
A: CPUs have a few powerful cores optimized for serial tasks and low latency. GPUs have hundreds or thousands of smaller cores designed for high-throughput parallel processing of many simpler tasks.


Q: How do modern GPUs (like NVIDIA's) achieve high performance with many cores?
A: Modern GPUs achieve high performance through a large number of cores, each capable of handling multiple threads concurrently. They are highly optimized for parallel floating-point operations on large datasets.


Q: What is the basic idea behind GPU evolution for increased parallelism?
A: The core idea is to use a large quantity (hundreds or thousands) of simpler processing units. These units execute the same instruction simultaneously on different pieces of data (SIMD paradigm).


Q: Besides graphics, what other field uses GPUs extensively?
A: High-Performance Computing (HPC) extensively uses GPUs. Their massive parallel processing capabilities are ideal for complex scientific simulations and data analysis.

GPU Components & Architecture

Q: List 6 main components of a GPU.
A: Six main components are the Graphics Processor (the core engine), the Frame Buffer (memory holding the image sent to the display), dedicated high-speed Video Memory, the Graphics BIOS, the Display connectors, and the Computer (bus) connector.

Q: What are the key components of GPU cores, and how are they structured for efficiency?
A: GPU cores replicate slimmed-down versions of CPU elements such as the Fetch/Decode unit, the ALU (for calculations), and registers (for local data). To save chip area, the Fetch/Decode logic is often shared across multiple ALUs. A GPU is built from Compute Units (CUs), each containing several Processing Elements (PEs), which are the basic execution units.


GPU Memory Hierarchy

Q: Name the five main memory regions accessible from a single work item on a GPU.
A: The five main memory regions are Registers (fastest, per work-item), Local Memory (shared within a work-group), Texture Memory (optimized for spatial locality), Constant Memory (cached, read-only for kernels), and Global Memory (largest, accessible by host & device).


Q: What are Registers in the GPU memory hierarchy?
A: Registers are the fastest, smallest, and most immediate level of memory on a GPU. Each work-item has its own private set of dedicated registers for very quick data access.


Q: What is Global Memory on a GPU, and what are its characteristics?
A: Global Memory is the largest memory space on the GPU, accessible by both the GPU and the host (CPU). It offers high bandwidth but has higher latency compared to other on-chip memories.


Q: What is Constant Memory, and what are its special properties?
A: Constant Memory is a read-only memory region for kernels, optimized for data that doesn't change during kernel execution. It's cached and supports efficient broadcasting of values to many work-items.


Q: What is Local Memory in the GPU hierarchy?
A: Local Memory (also called shared memory) is a small, fast on-chip memory shared among work-items within the same work-group. It allows for efficient data sharing and communication between these work-items.


Q: When is Texture Memory beneficial?
A: Texture Memory is useful when nearby data is accessed together, like in images. It has specialized caching and addressing modes that can reduce memory traffic and improve performance.

GPU Programming Concepts (General & OpenCL)

Q: What is the "host" and "device" in GPU computing?
A: In GPU computing, the "host" is typically the main CPU and its memory system. The "device" refers to the GPU (or other accelerator) and its dedicated memory.


Q: What is a "Work-Item" in GPU computing?
A: A Work-Item is the most basic unit of execution in GPU computing, representing a single thread. Many work-items execute the same kernel code in parallel on different data.


Q: What is a "Work-Group"?
A: A Work-Group is a collection of work-items that are scheduled to run concurrently on a single Compute Unit. Work-items within a work-group can cooperate using shared local memory and synchronization.


Q: What is OpenCL?
A: OpenCL (Open Computing Language) is an open standard framework for writing programs that can execute across heterogeneous platforms. It allows developers to use CPUs, GPUs, DSPs, and FPGAs for parallel computing.


Q: What is an OpenCL Kernel?
A: An OpenCL Kernel is a function written in a C-based language that executes on an OpenCL device (e.g., GPU). Many instances of this kernel (work-items) run in parallel to process data.
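A minimal kernel might look like the sketch below, written in OpenCL C. The name `vec_add` and its parameters are illustrative; kernel source is compiled for and launched on the device by host code, so this is not a standalone program.

```c
// Minimal OpenCL C kernel sketch (illustrative names): each work-item
// adds one pair of elements. All work-items run this same code; they
// differ only in the index returned by get_global_id().
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *out)
{
    size_t i = get_global_id(0);  // this work-item's position in the NDRange
    out[i] = a[i] + b[i];         // same code, different data per work-item
}
```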


Q: When writing OpenCL kernels, why is it important to specify memory address spaces like __global or __local?
A: Specifying address spaces like __global or __local is crucial because it tells the compiler where data resides (e.g., large off-chip memory vs. fast on-chip shared memory). This directly impacts performance and data accessibility for work-items.
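A kernel sketch showing the qualifiers side by side (names are illustrative; an OpenCL runtime is needed to build and launch it):

```c
// Sketch: address-space qualifiers in an OpenCL C kernel.
// __global  = large off-chip memory, visible to all work-items
// __constant = cached, read-only data broadcast to all work-items
// __local   = fast on-chip memory shared within one work-group
// no qualifier = __private (typically held in registers)
__kernel void scale_and_share(__global const float *in,
                              __constant float *factor,
                              __local float *tile,
                              __global float *out)
{
    size_t gid = get_global_id(0);   // position in the whole NDRange
    size_t lid = get_local_id(0);    // position within the work-group

    float x = in[gid] * factor[0];   // x lives in private memory (registers)
    tile[lid] = x;                   // stage it in fast local memory

    barrier(CLK_LOCAL_MEM_FENCE);    // wait until the whole group has written

    // Neighbouring work-items can now read each other's values cheaply.
    size_t next = (lid + 1) % get_local_size(0);
    out[gid] = tile[lid] + tile[next];
}
```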


Q: What is a "heterogeneous system" in the context of OpenCL?
A: A heterogeneous system in OpenCL consists of a host (CPU) connected to one or more OpenCL compute devices. These devices can be of different types, like GPUs, FPGAs, or DSPs.


Q: What is the general role of "Host Code" in OpenCL?
A: Host code, running on the CPU, manages the overall OpenCL application. It sets up devices, compiles kernels, manages memory transfers between host and device, and enqueues kernels for execution on the device.


Q: List 3 key steps the host code must perform to execute an OpenCL kernel.
A: 1. Discover and initialize OpenCL devices and create a context. 2. Compile the kernel source code into a program object. 3. Create memory buffers, transfer data to the device, set kernel arguments, and enqueue the kernel for execution.
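The three steps can be sketched as host code in C. Error checking is omitted, a kernel named "vec_add" is assumed to exist in the source string, and the OpenCL headers and a runtime are required, so treat this as an outline rather than a complete program.

```c
#include <CL/cl.h>

/* Outline of the host-side sequence (no error checking; "vec_add"
   and the kernel source string src are assumed to exist). */
void run_kernel(const char *src, const float *a, float *out, size_t n)
{
    /* Step 1: discover a device and create a context + queue. */
    cl_platform_id plat;  cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueueWithProperties(ctx, dev, NULL, NULL);

    /* Step 2: compile the kernel source into a program object. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vec_add", NULL);

    /* Step 3: create buffers, transfer data, set args, enqueue. */
    cl_mem d_a   = clCreateBuffer(ctx, CL_MEM_READ_ONLY,  n * sizeof(float), NULL, NULL);
    cl_mem d_out = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, n * sizeof(float), NULL, NULL);
    clEnqueueWriteBuffer(q, d_a, CL_TRUE, 0, n * sizeof(float), a, 0, NULL, NULL);
    clSetKernelArg(k, 0, sizeof(cl_mem), &d_a);
    clSetKernelArg(k, 1, sizeof(cl_mem), &d_out);
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, d_out, CL_TRUE, 0, n * sizeof(float), out, 0, NULL, NULL);
}
```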


Q: What is "occupancy" on a GPU, and why is it important?
A: Occupancy is the ratio of active work-groups to the maximum possible work-groups per compute unit. High occupancy is vital for performance as it helps hide memory latency by keeping the GPU's processing elements busy with other ready work.





MCQs




What does NDRange (N-Dimensional Range) define in OpenCL?
Answer: Total number of work-items executing a kernel


Which OpenCL function is used to discover available devices?
Answer: clGetDeviceIDs()


Which function retrieves detailed information about an OpenCL device?
Answer: clGetDeviceInfo()


What is the purpose of clCreateContext() in OpenCL?
Answer: To create an execution environment for OpenCL objects


What does clBuildProgram() do in OpenCL?
Answer: Compiles and links kernel source code into a program object


Which OpenCL function is commonly used to transfer data from host to device memory?
Answer: clEnqueueWriteBuffer()


How does an OpenCL kernel work-item know which data to process?
Answer: It queries its global ID using get_global_id()
