
Chap#11

Understanding Fault Tolerance (Slides 1-3)

Q: What is Fault Tolerant Distributed Computing and why is it important?

Fault Tolerant Distributed Computing is the design of distributed systems that can continue to function correctly and without interruption even when some components fail. Fault tolerance ensures the system remains operational, recovers from failures seamlessly, and maintains service availability without noticeable impact to the user.

Q: What are the common drawbacks or trade-offs of implementing fault tolerance?

A: Implementing fault tolerance can make the system slower, require more disk space, use more machines, and increase overall costs. There's always a trade-off between cost and the degree of fault tolerance.

Failure vs. Error (Slide 4)

Q: What is the difference between a system failure and an error?

Failure: Occurs when the system does not behave as expected (e.g., becomes unreachable or produces incorrect output).

Error: An incorrect state within the system that may lead to a failure. Errors can sometimes be detected and corrected before they cause a failure.

Phases of Fault Tolerance (Slide 5)

Q: What are the three main phases in handling faults for fault tolerance?

Error Detection: Identifying that an error has occurred.


Damage Confinement: Preventing the error from spreading to other parts of the system.


Error Recovery: Removing the error or its effects so the system can continue operating correctly.
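The three phases can be illustrated with a small in-memory store. This is only a sketch under stated assumptions: the class name `ChecksummedStore`, the CRC32 checksum, and the single backup copy are all hypothetical choices made for illustration, not something the slides prescribe.

```python
import zlib


class ChecksummedStore:
    """Toy key-value store (hypothetical) that walks through the three phases."""

    def __init__(self):
        self.primary = {}         # key -> (data, crc)
        self.backup = {}          # key -> (data, crc), assumed to stay intact
        self.quarantined = set()  # keys that must not be served right now
        self.events = []          # record of which phases ran

    def put(self, key, data):
        entry = (data, zlib.crc32(data))
        self.primary[key] = entry
        self.backup[key] = entry

    def get(self, key):
        data, crc = self.primary[key]
        if zlib.crc32(data) == crc:
            return data

        # Phase 1, error detection: the stored checksum no longer matches.
        self.events.append("detected")

        # Phase 2, damage confinement: mark the key so the corrupt
        # bytes are never handed to a caller.
        self.quarantined.add(key)
        self.events.append("confined")

        # Phase 3, error recovery: restore the entry from the backup copy.
        good_data, good_crc = self.backup[key]
        self.primary[key] = (good_data, good_crc)
        self.quarantined.discard(key)
        self.events.append("recovered")
        return good_data
```

A caller that reads a corrupted entry still receives the original bytes, and `events` lists the phases in the order they ran.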

Types of Faults (Slides 6-7)

Q: What are the three kinds of processor faults? Briefly describe each.

Fail Stop: The processor completely fails and stops responding. Other processors can usually detect this.


Slowdown: The processor runs more slowly than usual and may eventually stop working altogether.


Byzantine: The processor behaves arbitrarily. It might stop working, run slowly, or appear normal while silently giving wrong results or actively disrupting the work.
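A common way to tell the first two fault types apart is a heartbeat timeout. The sketch below is a minimal illustration: `HeartbeatMonitor` and its two thresholds are assumed names and values, and timestamps are passed in explicitly so the example stays deterministic. A Byzantine processor can keep sending heartbeats on time, so this kind of monitor cannot detect it.

```python
class HeartbeatMonitor:
    """Hypothetical timeout-based detector for fail-stop and slowdown faults."""

    def __init__(self, slow_after=2.0, dead_after=10.0):
        # Both thresholds (in seconds) are illustrative assumptions.
        self.slow_after = slow_after
        self.dead_after = dead_after
        self.last_seen = {}  # processor name -> time of last heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def status(self, node, now):
        if node not in self.last_seen:
            return "unknown"
        silence = now - self.last_seen[node]
        if silence >= self.dead_after:
            return "suspected fail-stop"
        if silence >= self.slow_after:
            return "slow"
        return "healthy"
```

The result is only a suspicion: a very slow processor and a stopped one look the same until a heartbeat finally arrives.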


Q: What are network faults? Give two examples.

Network faults occur when processors cannot communicate effectively. Examples include:

One-way Links: One processor can send messages to another, but cannot receive messages back from it.


Network Partition: A part of the network becomes completely isolated from other parts.

Attributes of a Fault Tolerant System (Slide 8)

Q: What are four important attributes of a fault-tolerant system?

Availability: The system is ready and able to perform its functions at any given moment.


Reliability: The system can operate continuously without failure over a specified period.


Safety: If the system fails, it does so without causing major disasters or negatively impacting other systems.


Maintainability: Failures can be easily detected and repaired.
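Availability and reliability are often put into numbers using mean time to failure (MTTF) and mean time to repair (MTTR). These formulas are standard textbook definitions rather than something taken from the slides, and the reliability formula additionally assumes a constant failure rate. The function names are hypothetical.

```python
import math


def availability(mttf_hours, mttr_hours):
    """Steady-state availability: fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def reliability(t_hours, mttf_hours):
    """Probability of running t_hours without any failure.

    Assumes a constant failure rate of 1 / MTTF (exponential model).
    """
    return math.exp(-t_hours / mttf_hours)
```

The two attributes can diverge: a system that fails briefly every hour but repairs itself in milliseconds has high availability and low reliability.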

Types and Classification of Failure (Slides 9-10)

Q: Name a few types of server failures and what they mean.

Crash failure: The server was working fine but suddenly stops completely.

Omission failure: The server doesn't reply to requests, either because it never receives them or because it never sends a response.

Timing failure: The server replies too soon or too late, not within the expected time.

Response failure: The server replies, but the answer is wrong.

Arbitrary failure (Byzantine): The server can do anything wrong at any time, including sending random or misleading responses.
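From a client's point of view, the first four failure types can be told apart by what is observed for a single request. The sketch below assumes the client knows the expected reply, an acceptable response-time window, and (through some other channel) whether the server process is still alive; all names and the window values are hypothetical. Arbitrary failures are left out on purpose, because one observation is not enough to identify them.

```python
from typing import Optional


def classify_observation(reply: Optional[str], elapsed: float, expected: str,
                         server_alive: bool, window=(0.01, 1.0)) -> str:
    """Map one request/response observation to a failure type."""
    if reply is None:
        # No answer at all: a halted server is a crash failure,
        # a running server that stays silent is an omission failure.
        return "omission failure" if server_alive else "crash failure"
    earliest, latest = window
    if not (earliest <= elapsed <= latest):
        return "timing failure"    # answered too soon or too late
    if reply != expected:
        return "response failure"  # answered in time, but incorrectly
    return "ok"
```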


Q: How are failures classified (e.g., by duration)?

A: Failures are classified as:

Transient: Appears once and then disappears on its own.


Intermittent: Appears, disappears, and then reappears repeatedly.


Permanent: Continues to exist until the faulty component is repaired or replaced.
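The duration of a failure decides how to react to it. Transient failures are usually handled by simply trying again, while a permanent failure outlasts any number of retries. A minimal sketch follows, assuming a hypothetical `TransientError` exception and illustrative retry limits; the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for a failure that may disappear on its own."""


def call_with_retry(operation, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Retry an operation with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                # Still failing after the last attempt: treat it as
                # permanent and let the caller repair or replace.
                raise
            sleep(base_delay * 2 ** attempt)
```

An intermittent failure sits in between: a retry may succeed now, but the same error is expected to come back later.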

Fault Tolerance Mechanisms (Slides 11-15)

Q: Name three main fault tolerance mechanisms in distributed systems.


Replication-based fault tolerance technique.


Process level redundancy technique.


Fusion-based redundancy technique.
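Process-level redundancy is commonly described as running the same computation in several redundant processes and comparing their outputs. The sketch below is one possible reading of that idea, not the definitive scheme: the redundant processes are modelled as plain Python callables, and `majority_vote` and `run_redundantly` are hypothetical names.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of the copies."""
    value, votes = Counter(outputs).most_common(1)[0]
    if votes <= len(outputs) // 2:
        raise RuntimeError("no majority among redundant outputs")
    return value


def run_redundantly(copies, arg):
    """Run every redundant copy on the same input and compare outputs."""
    return majority_vote([copy(arg) for copy in copies])
```

With three copies, one wrong output is outvoted by the other two. If every copy disagrees, no result can be trusted and the sketch raises an error.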


Q: What is the core idea of replication-based fault tolerance?

A: To replicate (copy) data onto other machines or servers. If one machine/server fails, the system can continue using a replica, preventing a total system stop.
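A minimal sketch of this idea, assuming in-memory replicas and hypothetical class names (`Replica`, `ReplicatedStore`, `ReplicaDown`): every write goes to all reachable replicas, and a read falls back to the next replica when one is down.

```python
class ReplicaDown(Exception):
    """Raised when a replica (or every replica) is unreachable."""


class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True  # flipped to False to simulate a failed machine

    def write(self, key, value):
        if not self.up:
            raise ReplicaDown(self.name)
        self.data[key] = value

    def read(self, key):
        if not self.up:
            raise ReplicaDown(self.name)
        return self.data[key]


class ReplicatedStore:
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        written = 0
        for replica in self.replicas:
            try:
                replica.write(key, value)
                written += 1
            except ReplicaDown:
                continue  # skip failed machines
        if written == 0:
            raise ReplicaDown("all replicas down")

    def read(self, key):
        for replica in self.replicas:
            try:
                return replica.read(key)
            except ReplicaDown:
                continue  # fail over to the next replica
        raise ReplicaDown("all replicas down")
```

Note that a replica which was down during a write comes back with stale data. This sketch does nothing to repair that.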


Q: What are two major problems or challenges with data replication?

Consistency: Ensuring all copies of the data remain consistent, especially when clients update data.


Degree of replication: Achieving high fault tolerance might require many replicas, which increases complexity and cost.
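The fusion-based technique listed among the mechanisms targets this replica cost. Instead of one full copy per machine, several machines share a single combined ("fused") backup. The sketch below uses XOR parity as a deliberately simple stand-in for a fusion scheme; it assumes equal-length data blocks and tolerates the loss of only one block, and the function names are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def fuse(blocks):
    """Combine all primary blocks into one fused backup block."""
    fused = bytes(len(blocks[0]))  # all-zero block of the same length
    for block in blocks:
        fused = xor_bytes(fused, block)
    return fused


def recover(surviving_blocks, fused):
    """Rebuild the single lost block from the survivors and the backup."""
    lost = fused
    for block in surviving_blocks:
        lost = xor_bytes(lost, block)
    return lost
```

Three primaries need one backup block here instead of three replicas. The price is paid at recovery time, when every surviving block has to be read to rebuild the lost one.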

MCQs


Q1: What are transient faults?
A) Permanent hardware failures
B) Faults that disappear on their own
C) Faults caused by software bugs
D) Continuous power supply issues
Answer: B

Q2: Which of the following is a technique used to handle transient faults?
A) Memory paging
B) Load balancing
C) Comparing process outputs
D) Data encryption
Answer: C

Q3: What does the Checkpoint and Rollback technique do?
A) Speeds up processing
B) Encrypts all data
C) Saves and restores the system state
D) Increases memory usage
Answer: C

Q4: What issue does the fusion-based technique aim to solve in replication?
A) Data inconsistency
B) Network delay
C) High cost of multiple backups
D) Security vulnerabilities
Answer: C

Q5: What is a major drawback of the fusion-based technique?
A) Poor data accuracy
B) High recovery overhead
C) Increased energy use
D) Slow normal operation
Answer: B
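The checkpoint and rollback technique from Q3 can be sketched in a few lines. This is a minimal in-memory illustration with hypothetical names (`CheckpointedState`, `run_step`); a real system would write checkpoints to stable storage rather than keep them in the same process.

```python
import copy


class CheckpointedState:
    """Keeps the last known-good copy of a mutable state."""

    def __init__(self, state):
        self.state = state
        self._saved = copy.deepcopy(state)

    def checkpoint(self):
        # Save the current state as the new known-good copy.
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        # Discard the current state and restore the saved copy.
        self.state = copy.deepcopy(self._saved)


def run_step(process, step):
    """Apply one step; checkpoint on success, roll back on failure."""
    try:
        step(process.state)
    except Exception:
        process.rollback()
        return False
    process.checkpoint()
    return True
```

A step that fails halfway leaves no trace: the state returns to what it was at the last successful checkpoint.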
