
Ch#3: Data Integration & Data Transformation

1. What is data integration?

Answer:

Data integration is the process of collecting and combining data from different sources and bringing it together in a unified way so it can be analyzed, reported on, or used for decision-making.

Think of it like gathering pieces of a puzzle from different boxes and putting them together to see the full picture.
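The idea can be sketched in a few lines of Python. This is a minimal illustration, not a real integration pipeline; the source names and records are invented for the example.

```python
# Two hypothetical data sources keyed by customer id (illustrative data).
sales_db = {101: {"name": "Ali", "city": "LHR"},
            102: {"name": "Sara", "city": "KHI"}}
support_db = {101: {"tickets": 3}, 103: {"tickets": 1}}

def integrate(*sources):
    """Combine records from several sources into one unified view per key."""
    unified = {}
    for source in sources:
        for key, record in source.items():
            unified.setdefault(key, {}).update(record)
    return unified

customers = integrate(sales_db, support_db)
print(customers[101])  # {'name': 'Ali', 'city': 'LHR', 'tickets': 3}
```

Customer 101 appears in both sources, so the unified view merges the attributes from each; customers seen in only one source are kept as-is.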

2. What are common issues in data integration?

Answer:

  • Schema Integration: Merging metadata from different sources

  • Entity Identification: Matching real-world entities (e.g., A.cust-id ≡ B.cust-#)

  • Data Value Conflicts: Different units/scales (e.g., km vs miles)

  • Redundant Data: Same attributes with different names.

  • Inconsistencies: Conflicting or duplicated information
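Two of these issues, entity identification and data value conflicts, can be shown concretely. In this hedged sketch (all field names and records are invented), source A keys customers by "cust_id" and stores distance in km, while source B uses "cust_no" and miles for the same entities:

```python
MILES_TO_KM = 1.60934

source_a = [{"cust_id": 1, "dist_km": 12.0}]
source_b = [{"cust_no": 2, "dist_miles": 10.0}]

def normalize_b(record):
    """Map B's schema onto A's: rename the key attribute, convert units."""
    return {"cust_id": record["cust_no"],
            "dist_km": round(record["dist_miles"] * MILES_TO_KM, 2)}

merged = source_a + [normalize_b(r) for r in source_b]
print(merged[1])  # {'cust_id': 2, 'dist_km': 16.09}
```

Resolving the schema mismatch (cust_no → cust_id) and the unit conflict (miles → km) before merging is exactly the schema integration and value-conflict work listed above.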

4. What is data transformation?

Answer:

Data transformation is the process of converting data from its original (raw) format into a format that is clean, consistent, and ready for analysis, mining, or storage. It often involves changing the structure, format, or values of the data.

5. What are the key data transformation techniques?

Answer:

  • Smoothing: Removes noise (e.g., using binning, regression, clustering)

  • Aggregation: Summarization (e.g., constructing data cubes)

  • Generalization: Replacing low-level data with higher-level concepts

  • Attribute Construction: Creating new features/attributes

  • Normalization: Scaling data within a specific range
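The first technique, smoothing by binning, can be sketched as follows (the price list is a textbook-style example chosen for illustration): sort the values, split them into equal-frequency bins, and replace each value by its bin's mean.

```python
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # already sorted

def smooth_by_bin_means(values, bin_size):
    """Equal-frequency binning: replace each value by its bin's mean."""
    smoothed = []
    for i in range(0, len(values), bin_size):
        bin_vals = values[i:i + bin_size]
        mean = sum(bin_vals) / len(bin_vals)
        smoothed.extend([mean] * len(bin_vals))
    return smoothed

print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Each group of three values collapses to its mean, which removes small fluctuations (noise) while preserving the overall trend.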


6. Why is data normalization important?

Answer:

  • Speeds up processing

  • Reduces memory usage

  • Ensures fair contribution of attributes during mining
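The "fair contribution" point is easiest to see with a distance calculation. In this sketch (the ages, salaries, and scaling ranges are invented), salary's large range swamps age in the raw Euclidean distance; after min-max scaling, both attributes contribute:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# (age, salary) for two people
p1, p2 = (25, 50000), (45, 51000)
print(euclidean(p1, p2))  # ~1000.2 -- the salary gap dominates

def min_max(value, lo, hi):
    return (value - lo) / (hi - lo)

# Scale age into [0, 1] over 20-60, salary over 30000-90000
q1 = (min_max(25, 20, 60), min_max(50000, 30000, 90000))
q2 = (min_max(45, 20, 60), min_max(51000, 30000, 90000))
print(round(euclidean(q1, q2), 3))  # 0.5 -- age now matters too
```

Without normalization the 20-year age difference is invisible next to the 1000-unit salary difference; after scaling, the age difference drives the distance.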


7. What are the main normalization techniques?

Answer:

  • Min-Max: Scales data to a specified new range using x' = (x - min)/(max - min) × (new_max - new_min) + new_min. Example: values in the range 5-100 scaled to 0-1.

  • Z-score: Uses the mean and standard deviation, x' = (x - mean)/std; useful when min/max are unknown or when outliers would dominate min-max scaling.

  • Decimal Scaling: Divides each value by 10^j, where j is the smallest integer such that all |x'| < 1. Example: -500 → -0.5 (j = 3).
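All three techniques can be sketched in a few lines each (the input values below just reuse the examples from the table):

```python
import math

def min_max(values, new_min=0.0, new_max=1.0):
    """Scale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min
            for v in values]

def z_score(values):
    """Center on the mean and scale by the standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def decimal_scaling(values):
    """Divide by 10^j, with j the smallest integer making all |v'| < 1."""
    j = len(str(int(max(abs(v) for v in values))))
    return [v / 10 ** j for v in values]

print(min_max([5, 100]))        # [0.0, 1.0]
print(decimal_scaling([-500]))  # [-0.5]
```

Note that z-score output has mean 0 and standard deviation 1 rather than a fixed range, which is why it copes better with outliers than min-max.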

8. Give an example of generalization.

Answer:
Age: 20-25 → Age1, 26-30 → Age2, etc.
Location: a specific street address generalizes to its city (e.g., LHR), which in turn generalizes to its province (Punjab), etc.
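The age generalization above can be sketched directly; the Age1/Age2 labels follow the example, and the age ranges are the illustrative ones given there:

```python
def generalize_age(age):
    """Replace an exact age with a higher-level concept label."""
    if 20 <= age <= 25:
        return "Age1"
    if 26 <= age <= 30:
        return "Age2"
    return "Other"

ages = [21, 27, 24, 33]
print([generalize_age(a) for a in ages])  # ['Age1', 'Age2', 'Age1', 'Other']
```

The exact ages are replaced by coarser concept labels, which reduces detail but makes patterns at the group level easier to mine.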
