DS-U4-ARQ

Active recall is a learning technique in which you retrieve information from memory instead of rereading it. The questions below cover the topics in Unit 4: Predictive Data Analytics with Python, and each is followed by its answer. Try to answer each question yourself before reading on; doing so reinforces your understanding and retention of the material.

Active Recall Questions and Answers

Essential Python Libraries

Question: What are the primary uses of the Pandas library in Python?
- Answer: Pandas is primarily used for data manipulation and analysis. It provides data structures like DataFrames that allow for easy manipulation, cleaning, filtering, grouping, and aggregation of data.
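As a rough illustration of the filtering, grouping, and aggregation mentioned above, here is a minimal sketch. The `sales` DataFrame and its column names are made up for this example.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "units": [10, 5, 7, 3, 8],
})

# Filtering: keep rows where more than 4 units were sold
large_orders = sales[sales["units"] > 4]

# Grouping and aggregation: total units per region
units_per_region = sales.groupby("region")["units"].sum()

print(large_orders)
print(units_per_region)
```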
Question: How does NumPy enhance numerical computing in Python?
- Answer: NumPy provides support for large multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently. It significantly enhances numerical computing by offering fast array operations and broadcasting capabilities.
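The broadcasting and fast array operations described above can be sketched as follows. The array values are arbitrary and chosen only to make the result easy to check by hand.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)

# Broadcasting: the 1-D row is added to every row of the 2-D array
shifted = matrix + row

# Vectorized math: the column means are computed without a Python loop
col_means = matrix.mean(axis=0)

print(shifted)
print(col_means)
```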

Basic Examples

Question: How do you load a CSV file into a Pandas DataFrame?
- Answer: You can load a CSV file into a Pandas DataFrame using the pd.read_csv('file_path') function.
```
import pandas as pd
data = pd.read_csv('data.csv')
```
Question: What method in Pandas can be used to display the first few rows of a DataFrame?
- Answer: The head() method displays the first rows of a DataFrame (five by default; pass a number, e.g. head(10), to change this).
```
print(data.head())
```
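The two snippets above assume a file named data.csv exists. To try them without a file on disk, one option is to read from an in-memory string instead, as in this sketch. The CSV contents here are invented.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file; the contents are made up
csv_text = "name,score\nAda,90\nLin,85\nSam,78\nKai,92\n"

# read_csv accepts a file-like object as well as a path
data = pd.read_csv(io.StringIO(csv_text))

print(data.head(2))   # first two rows only
print(data.shape)     # (rows, columns)
```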

Data Preprocessing

Question: How do you remove duplicate rows in a Pandas DataFrame?
- Answer: You can remove duplicate rows using the drop_duplicates() method.
```
data = data.drop_duplicates()
```
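By default, drop_duplicates() compares every column and keeps the first occurrence. The sketch below, using a made-up DataFrame, also shows the `subset` and `keep` parameters, which change what counts as a duplicate and which copy survives.

```python
import pandas as pd

# Hypothetical data with one fully duplicated row and one repeated id
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "value": ["a", "a", "b", "c"],
})

# Default: drop rows that are identical in every column, keeping the first
exact = df.drop_duplicates()

# subset/keep: treat rows with the same id as duplicates, keep the last one
by_id = df.drop_duplicates(subset="id", keep="last")

print(exact)
print(by_id)
```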
Question: What is the purpose of the apply() function in Pandas?
- Answer: The apply() function applies a function along an axis of a DataFrame (to each column or to each row) or to each value of a Series. In the example below, it doubles every value in a single column.
```
data['column'] = data['column'].apply(lambda x: x * 2)
```
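To make the difference between the Series and DataFrame forms of apply() concrete, here is a small sketch. The DataFrame `df` and the lambda functions are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Series.apply: the function receives one value at a time
doubled = df["a"].apply(lambda x: x * 2)

# DataFrame.apply with axis=0 (the default): the function receives each column
col_ranges = df.apply(lambda col: col.max() - col.min())

# DataFrame.apply with axis=1: the function receives each row
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(doubled.tolist())
print(col_ranges.to_dict())
print(row_sums.tolist())
```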
Question: How do you handle missing values in a dataset using Scikit-learn's SimpleImputer?
- Answer: You can handle missing values by creating an instance of SimpleImputer, specifying the strategy (e.g., 'mean', 'median', 'most_frequent', 'constant'), and then fitting and transforming the dataset.
```
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
# fit_transform returns a 2-D array, so flatten it before assigning to one column
data['column'] = imputer.fit_transform(data[['column']]).ravel()
```
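In a predictive workflow, the imputer is usually fitted on the training data and then reused on new data, so that both are filled with the same value. A minimal sketch, assuming small made-up arrays named `X_train` and `X_test`:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical train/test arrays with missing values (np.nan)
X_train = np.array([[1.0], [3.0], [np.nan], [5.0]])
X_test = np.array([[np.nan], [2.0]])

imputer = SimpleImputer(strategy="mean")

# fit_transform learns the column mean from the training data (here 3.0)
# and fills the missing training value with it
X_train_filled = imputer.fit_transform(X_train)

# transform reuses the mean learned from the training data on new data
X_test_filled = imputer.transform(X_test)

print(imputer.statistics_)
print(X_test_filled)
```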

Types of Data Analytics

Question: What are the three main types of data analytics?
- Answer: The three main types of data analytics are Descriptive Analytics (what happened), Predictive Analytics (what is likely to happen), and Prescriptive Analytics (what action to take).
Question: What is the goal of predictive analytics?
- Answer: The goal of predictive analytics is to use historical data to make predictions about future events.
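As a toy illustration of the difference between describing past data and predicting from it, the sketch below summarizes five made-up sales figures and then extrapolates a straight-line trend one period ahead. The data and the choice of a linear trend are assumptions for this example, not a recommended forecasting method.

```python
import numpy as np

# Hypothetical historical data: sales for periods 1 through 5
periods = np.array([1, 2, 3, 4, 5])
sales = np.array([10.0, 12.0, 14.0, 16.0, 18.0])

# Descriptive: summarize what already happened
average_sales = sales.mean()

# Predictive: fit a straight-line trend and extrapolate to period 6
slope, intercept = np.polyfit(periods, sales, deg=1)
forecast = slope * 6 + intercept

print(average_sales)
print(forecast)
```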

Key Algorithms

Question: What is the Apriori algorithm used for?
- Answer: The Apriori algorithm is used for mining frequent itemsets and generating association rules in large datasets.
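The level-wise idea behind Apriori can be sketched in plain Python: find frequent single items, join them into larger candidates, and prune any candidate that has an infrequent subset. The `apriori` function, the basket data, and the use of a raw count as `min_support` are all assumptions made for this illustration; library implementations differ in interface and efficiency.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count how many transactions contain each candidate, keep the frequent ones
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items
    frequent = count({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result

# Hypothetical market-basket data
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(baskets, min_support=2)

# Confidence of the rule butter -> bread: support(bread, butter) / support(butter)
confidence = (
    frequent_itemsets[frozenset({"bread", "butter"})]
    / frequent_itemsets[frozenset({"butter"})]
)
print(frequent_itemsets)
print(confidence)
```

In this data every basket containing butter also contains bread, so the rule butter -> bread has the highest possible confidence.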
Question: What is the difference between linear regression and logistic regression?
- Answer: Linear regression predicts a continuous outcome variable from one or more predictor variables. Logistic regression is used for classification, most commonly binary: it models the probability that an observation belongs to a class.
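The contrast can be seen by fitting both models to the same predictor. In this sketch the hours, scores, and pass/fail labels are invented; the point is only that one model returns a number and the other returns a class with a probability.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical data: hours studied as the single predictor
hours = np.array([[1], [2], [3], [4], [5], [6]])
scores = np.array([52, 58, 61, 70, 74, 81])   # continuous outcome
passed = np.array([0, 0, 0, 1, 1, 1])         # binary outcome

# Linear regression predicts a number
linear = LinearRegression().fit(hours, scores)
predicted_score = linear.predict([[7]])[0]

# Logistic regression predicts a class and a probability for each class
logistic = LogisticRegression().fit(hours, passed)
predicted_class = logistic.predict([[7]])[0]
pass_probability = logistic.predict_proba([[7]])[0, 1]

print(predicted_score, predicted_class, pass_probability)
```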
Question: How does a decision tree classifier work?
- Answer: A decision tree classifier works by splitting the data into subsets based on the value of input features. Each node in the tree represents a decision point based on a feature, and the branches represent the outcomes of these decisions. The leaves of the tree represent the final classifications.
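One way to see the nodes, branches, and leaves described above is to fit a small tree and print its rules. This sketch uses Scikit-learn's DecisionTreeClassifier on the Iris dataset; the depth limit of 2 is an arbitrary choice to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# max_depth=2 keeps the tree small enough to read; it is an arbitrary choice
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Print the learned rules: each indented line is a split on one feature,
# and each "class:" line is a leaf
print(export_text(tree, feature_names=iris.feature_names))
```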

Introduction to Scikit-learn

Question: How do you install Scikit-learn in Python?
- Answer: You can install Scikit-learn using pip.
```
pip install scikit-learn
```
Question: What function in Scikit-learn is used to load the Iris dataset?
- Answer: The load_iris() function from the sklearn.datasets module is used to load the Iris dataset.
```
from sklearn.datasets import load_iris
data = load_iris()
```
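load_iris() returns a dictionary-like object rather than a DataFrame. The sketch below shows how the features and labels might be pulled out of it; the variable names `iris`, `X`, and `y` are conventions, not requirements.

```python
from sklearn.datasets import load_iris

iris = load_iris()

X = iris.data            # feature matrix, one row per flower
y = iris.target          # integer class labels

print(X.shape)
print(iris.feature_names)
print(iris.target_names)
```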
Question: How do you fit a logistic regression model using Scikit-learn?
- Answer: Use the LogisticRegression class: create an instance, then call its fit() method with the training features (X_train) and training labels (y_train), which must already be defined.
```
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
```
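The snippet above assumes X_train and y_train already exist. A self-contained sketch, assuming the Iris dataset and an 80/20 split, might look like this. The `test_size`, `random_state`, and `max_iter` values are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for evaluation; the split ratio and seed are arbitrary
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# max_iter is raised above the default to give the solver room to converge
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# score returns the fraction of test rows classified correctly
accuracy = model.score(X_test, y_test)
print(accuracy)
```

Evaluating on rows the model did not see during fitting gives a more honest estimate of how it would perform on new data.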

Working through these active recall questions reinforces the key concepts in Unit 4: Predictive Data Analytics with Python. Feel free to modify or expand them to better suit your study needs.