

Showing posts from April, 2025

Assignment questions

Q1. How can a company apply the CIA Triad (Confidentiality, Integrity, and Availability) to secure sensitive customer data?

A company can keep customer data safe by following three principles: confidentiality, integrity, and availability. Confidentiality means keeping data private, for example using passwords and encryption so only authorized people can see it; banks protect your personal information with strong security in this way. Integrity means making sure the data is correct and has not been altered by anyone; online stores, for example, verify that your payment details are not changed during a purchase. Availability means making sure the data or service is ready whenever people need it; companies like Amazon use backup systems so their website keeps working even if something goes wrong. Together, these three principles protect data and keep customers safe.

Q2. In the healthcare industry, how can organizations assess and manage cybersecurity risks such as ransomware attacks?

Hospitals and clinics hav...

2nd long

How can a company teach its employees to avoid viruses, including those that exploit multi-threading bugs or other system weaknesses?

Teach the Basics of Online Safety
First, companies should teach workers the basics of staying safe online: spotting fake emails, suspicious links, and unsafe downloads. Employees should also know how to create strong passwords and why they should not reuse the same password on different websites. These simple habits stop many types of viruses.

Explain How Systems Can Be Attacked
Employees who work with technology, such as developers or IT staff, should learn how attackers exploit system flaws to break in. One example is a multi-threading bug (a race condition), where two parts of a program run at the same time and interfere with each other. Attackers can use these errors to sneak into systems. If workers understand how these problems happen, they can build safer programs and protect the company.

Use Practice Tests and Examples
Another good way to teach is by using practice tests....
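To make the multi-threading point concrete, here is a minimal Python sketch (the counter, thread count, and iteration count are invented for the example) showing how an unsynchronized read-modify-write can lose updates, and how a lock prevents it:

```python
import threading

def run(increments, use_lock):
    """Increment a shared counter from 4 threads, with or without a lock."""
    state = {"count": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            if use_lock:
                with lock:
                    state["count"] += 1
            else:
                # read-modify-write is NOT atomic: two threads can read the
                # same old value, and one update is silently lost (a race)
                state["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]

print(run(100_000, use_lock=True))   # always 400000
print(run(100_000, use_lock=False))  # may be less than 400000 on some runs
```

The locked version is deterministic; the unlocked version only exposes the bug intermittently, which is exactly why race conditions are hard to catch in testing.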

1st long

How can the CIA triad (Confidentiality, Integrity, Availability) be applied during the Security SDLC in the development of a mobile banking app? Provide examples of how each principle can be integrated into the app's security measures throughout the development lifecycle.

The CIA Triad stands for Confidentiality, Integrity, and Availability. These are the three main goals of cybersecurity, and they matter throughout the Security Software Development Life Cycle (SDLC), especially when building something sensitive like a mobile banking app. The three principles help ensure the app is safe, trustworthy, and always working properly for users. Here is how each one is applied during the development process, with examples.

Confidentiality
Confidentiality means keeping users' personal and financial information private; only the right people should be able to see it. In a mobile banking app, this includes usernames, passwords, account numbers, and transaction details. During the planning and design stages, developers decide which data needs protection and how to protect it. They use encryption to hide the data when it is stored or sent across the internet. In the development stage, they add features like login with passwords, ...
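As one concrete illustration of the integrity principle, here is a small Python sketch (the field names and key are invented; HMAC-SHA-256 is one common technique, not necessarily what a given bank uses) that signs a transaction record so any tampering in transit is detected:

```python
import hmac
import hashlib
import json

SECRET_KEY = b"demo-key-do-not-use-in-production"  # assumption: a shared secret

def sign_transaction(txn: dict) -> str:
    """Return an HMAC-SHA256 tag over a canonical JSON encoding of txn."""
    payload = json.dumps(txn, sort_keys=True).encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def verify_transaction(txn: dict, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_transaction(txn), tag)

txn = {"from": "ACC-001", "to": "ACC-002", "amount": 250.00}
tag = sign_transaction(txn)
print(verify_transaction(txn, tag))   # True: record is untampered
txn["amount"] = 9250.00               # an attacker edits the amount
print(verify_transaction(txn, tag))   # False: tampering is detected
```

Confidentiality would additionally require encrypting the payload (e.g., TLS in transit), which this sketch deliberately leaves out to keep the integrity check in focus.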

Ch#4

1. What is data reduction?
Answer: Data reduction is the process of reducing the volume of data while producing the same or similar analytical results.

2. Why do we need data reduction?
Answer:
- Improves model performance (speed & accuracy)
- Helps in data visualization
- Reduces dimensionality
- Removes noise
- Leads to simpler, faster, and more accurate models

3. What is feature selection (aka attribute/variable selection)?
Answer: It's the process of selecting an optimal subset of features from the data that contribute most to the model, based on a specific evaluation criterion.

5. What are the main techniques for feature selection (data reduction)?
Answer:

Method | Description | Tools/Details
Wrapper Method | Uses a classifier to evaluate feature subsets based on their performance | Generates all possible subsets; uses a search technique to find the best one
Filter Method | Ranks features using an attribute evaluator and selects the top-ranked ones | Doesn't r...
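The filter method above can be sketched in pure Python; variance is used as the attribute evaluator purely for illustration (other evaluators, such as correlation or information gain, work the same way): rank each feature column by its score and keep the top k.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def filter_select(rows, k):
    """Rank feature columns by variance and return the indices of the top k.

    rows: list of equal-length numeric feature vectors.
    """
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    scores = [(variance(col), j) for j, col in enumerate(columns)]
    scores.sort(reverse=True)                  # highest variance first
    return sorted(j for _, j in scores[:k])    # selected feature indices

data = [
    [1.0, 5.0, 0.0],
    [1.0, 9.0, 0.1],
    [1.0, 2.0, 0.0],
]
print(filter_select(data, 2))  # [1, 2] — column 0 is constant, so it is dropped
```

Note how this never trains a classifier: that independence from the model is exactly what distinguishes the filter method from the wrapper method in the table.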

Ch#3

1. What is data integration?
Answer: Data integration is the process of collecting and combining data from different sources and bringing it together in a unified way so it can be analyzed, reported on, or used for decision-making. Think of it like gathering pieces of a puzzle from different boxes and putting them together to see the full picture.

2. What are common issues in data integration?
Answer:
- Schema Integration: merging metadata from different sources
- Entity Identification: matching real-world entities (e.g., A.cust-id ≡ B.cust-#)
- Data Value Conflicts: different units/scales (e.g., km vs miles)
- Redundant Data: same attributes with different names
- Inconsistencies: conflicting or duplicated information

4. What is data transformation?
Answer: Data transformation is the process of converting data from its original (raw) format into a format that is clean, consistent, and ready for analysis, mining, or storage. It often involves chang...
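The data-value-conflict issue above (km vs miles) can be sketched in Python; the record layout and field names here are invented for illustration:

```python
KM_PER_MILE = 1.609344

def to_km(record):
    """Normalize a distance record to kilometres before integration."""
    if record["unit"] == "mi":
        distance = record["distance"] * KM_PER_MILE
    else:
        distance = record["distance"]
    return {"id": record["id"], "distance_km": distance}

# Two sources using different units for the same attribute
source_a = [{"id": 1, "distance": 5.0, "unit": "km"}]
source_b = [{"id": 2, "distance": 5.0, "unit": "mi"}]

unified = [to_km(r) for r in source_a + source_b]
print(unified)  # id 2 becomes about 8.05 km, so both rows share one scale
```

Normalizing to a single unit at load time keeps the conflict from silently corrupting later analysis (e.g., averaging a mixed-unit column).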

Ch#2

Important Questions & Answers from CH2: Data Cleaning

1. Why is data preprocessing important?
Answer: Because real-world data is often incomplete, noisy, and inconsistent, and poor-quality data can lead to misleading results. Data preprocessing ensures that the data is clean, consistent, and ready for mining; it accounts for roughly 80% of the KDD effort.

2. What are the major tasks in data preprocessing?
Answer:
- Data Cleaning (handle missing, noisy, outlier data)
- Data Integration (combine data from multiple sources)
- Data Reduction (reduce volume while maintaining accuracy)
- Data Transformation (normalize, aggregate, discretize data)

3. What makes real-world data "dirty"?
Answer:
- Incomplete: missing values
- Noisy: errors or outliers (e.g., negative salary)
- Inconsistent: conflicting entries or formats (e.g., different rating scales or DOB mismatches)

4. How can we handle missing data?
Answer:
- Ignore the record (not recommended)
- ...
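One common alternative to ignoring records with missing values is mean imputation; a small pure-Python sketch (the salary column is invented for illustration):

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

salaries = [3000, None, 5000, 4000, None]
print(impute_mean(salaries))  # missing entries become 4000.0
```

Mean imputation keeps every record but can flatten real variation; which strategy is appropriate depends on why the values are missing.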

Ch#1

Important Questions & Answers from CH1: Overview KDD_ML

1. What is KDD (Knowledge Discovery in Databases)?
Answer: KDD is the automatic extraction of non-obvious, hidden knowledge from large volumes of data. It is an interactive and iterative process that includes selecting, cleaning, transforming data, and applying data mining.

2. How is data different from information and knowledge?
Answer:
- Data: raw facts (e.g., 10°C)
- Information: processed data with meaning (e.g., "It's cold")
- Knowledge: integrated information with relationships and patterns (e.g., "People wear jackets when it's 10°C")

3. Why is raw data rarely useful by itself?
Answer: Raw data has no direct benefit unless we can extract useful information from it for decision support or deeper understanding.

4. What are the main benefits of Knowledge Discovery?
Answer:
- Generate new insights
- Rapid response to problems
- Extract useful patterns
- Improve decision-making via system...

7th Chapter

Static Load Balancing means the workload (user requests, data access, etc.) is divided manually or in a fixed way between multiple servers. Once set, this division does not change automatically, even if one server gets overloaded while another stays underused.

In load balancing, a deterministic method means the system always sends a request to the same server based on a fixed rule. For example, if your computer's IP address is used to decide where your request goes, you will always be sent to the same server every time. This makes the process predictable and stable. The drawback is that if that server receives too many heavy tasks or busy users, it can become overloaded while other servers sit mostly idle.

A probabilistic method, on the other hand, works more like rolling a die: it does not always send the request to the same server, but chooses one based on weighted chances. For example, Server A might be picked 50% of the time, Server B 30%, and Server C 20%. Th...
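The two selection rules above can be sketched in Python; the server names and weights are made up for the example:

```python
import hashlib
import random

SERVERS = ["A", "B", "C"]
WEIGHTS = [0.5, 0.3, 0.2]   # probabilistic: 50% / 30% / 20%

def deterministic_pick(client_ip: str) -> str:
    """Fixed rule: the same client IP always hashes to the same server."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return SERVERS[digest[0] % len(SERVERS)]

def probabilistic_pick(rng=random) -> str:
    """Each request independently picks a server by weight."""
    return rng.choices(SERVERS, weights=WEIGHTS, k=1)[0]

# Deterministic: the same IP is always routed to one server
print({deterministic_pick("10.0.0.7") for _ in range(5)})  # a single server

# Probabilistic: successive requests may land on different servers
print([probabilistic_pick() for _ in range(5)])
```

The deterministic rule gives stability (useful for session affinity) at the cost of possible hot spots; the probabilistic rule spreads load on average but makes any individual request's destination unpredictable.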

Chapter 4 long

Introduction to Parallel Algorithm Models

Parallel algorithm models provide strategies for dividing a computational problem into smaller parts that can be processed simultaneously by multiple processors. The main goals are to speed up computation and handle larger problems. These models differ in how they divide the work (tasks or data) and manage the interaction between processors.

Individual Model Descriptions

1. Data Parallel Model
Core Concept: performing the same operation on different pieces of data at the same time.
How it Works: the main data set is divided into chunks; multiple processors execute the exact same instruction or sequence of instructions, but each processor works on its own chunk of data. Think "Single Instruction, Multiple Data" (SIMD).
Key Characteristics:
- Focuses on distributing the data across processors.
- All processors typically run the same program/task.
- Good for problems with large amounts of data that need similar processing (e.g., image proc...
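A small Python sketch of the data parallel model described above (threads stand in for processors, and the "brighten" operation and chunk count are invented for illustration): the same operation runs over every chunk of the data concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(chunk):
    """The SAME operation applied to every element of a chunk."""
    return [min(255, pixel + 40) for pixel in chunk]

def data_parallel_map(data, n_workers):
    """Split data into chunks and process them concurrently (SIMD-style)."""
    size = (len(data) + n_workers - 1) // n_workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(brighten, chunks)   # one instruction, many chunks
    return [px for chunk in results for px in chunk]

image = [0, 100, 200, 250, 30, 60]   # pretend grayscale pixel values
print(data_parallel_map(image, 3))   # [40, 140, 240, 255, 70, 100]
```

Note that the parallelism comes entirely from partitioning the data; every worker runs identical code, which is the defining trait of this model.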

2nd Chapter

Definition of SISD:
SISD (Single Instruction stream, Single Data stream) is a type of computer architecture where one processor carries out one instruction at a time on a single piece of data. It follows a step-by-step processing method and is part of the Von Neumann model.

Key Points about SISD:
- Single Processor: only one processor is used to perform tasks.
- Single Instruction Stream: instructions are handled one after another, not in parallel.
- Single Data Stream: only one set of data is processed at any given time.
- Sequential Execution: all operations happen one by one, in a straight line.
- No Parallel Processing: it cannot handle multiple tasks at the same time.
- Used in Basic Systems: this model is used in older or simpler computers.

Example of SISD: An example of an SISD system is the Intel 8085 microprocessor. It processes one instruction and one data set at a time, making it suitable for small, simple computing tasks.

Simple Definition: SIMD...

3rd chapter

User-Level Threads:
Managed by user-level libraries, not visible to the operating system.
🔹 Faster to create and manage, but can't run in parallel on multi-core CPUs.

Kernel-Level Threads:
Managed directly by the operating system kernel.
🔹 Slower to manage, but can run in parallel across multiple CPU cores.

Real-life Example:
User-Level: like tabs in a web browser managed by the browser itself.
Kernel-Level: like multiple apps running on your phone, each managed by the operating system.

🧵 Single-threaded Process
A single-threaded process has only one thread of execution, so it can perform only one task at a time. If the thread is busy or waiting (e.g., for user input or a file to load), the entire process is paused, which makes it slower when dealing with multiple tasks.
Example: a basic calculator app that waits for the user to press a button before performing a calculation.

🧵🧵 Multi-threaded Process
A multi-threaded ...
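The single- vs multi-threaded contrast above can be shown in a small Python sketch (the task names and sleep times are invented; `time.sleep` stands in for waiting on I/O such as a file load):

```python
import threading
import time

def wait_for_io(name, seconds, done):
    """Simulate a task that mostly waits (e.g., loading a file)."""
    time.sleep(seconds)
    done.append(name)

done = []

# Single-threaded: the second task cannot start until the first finishes
start = time.perf_counter()
wait_for_io("load-a", 0.1, done)
wait_for_io("load-b", 0.1, done)
sequential = time.perf_counter() - start      # roughly 0.2 s

# Multi-threaded: the two waits overlap
start = time.perf_counter()
threads = [threading.Thread(target=wait_for_io, args=(n, 0.1, done))
           for n in ("load-c", "load-d")]
for t in threads:
    t.start()
for t in threads:
    t.join()
concurrent = time.perf_counter() - start      # roughly 0.1 s

print(f"sequential: {sequential:.2f}s, threaded: {concurrent:.2f}s")
```

Even though CPython threads don't run Python bytecode in parallel, waiting releases the interpreter, so I/O-bound tasks like these genuinely overlap; CPU-bound work would need processes instead.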