DS-U4-ARQ
Active recall is a learning technique in which you retrieve information from memory instead of passively rereading it. Here are some active recall questions on the topics in Unit 4: Predictive Data Analytics with Python, along with their answers. These questions can help you reinforce your understanding and retention of the material.
Active Recall Questions and Answers
Essential Python Libraries
-
Question: What are the primary uses of the Pandas library in Python?
- Answer: Pandas is primarily used for data manipulation and analysis. It provides data structures like DataFrames that allow for easy manipulation, cleaning, filtering, grouping, and aggregation of data.
-
Question: How does NumPy enhance numerical computing in Python?
- Answer: NumPy provides support for large multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently. It significantly enhances numerical computing by offering fast array operations and broadcasting capabilities.
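As a small illustration of the two answers above, here is a minimal sketch. The data and the column names `region` and `units` are made up for this example; only the Pandas and NumPy calls themselves come from the libraries.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data, invented for this example.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, 20, 30, 40],
})

# Pandas: filter rows with a boolean mask, then group and aggregate.
large_orders = sales[sales["units"] >= 20]
units_by_region = sales.groupby("region")["units"].sum()

# NumPy: broadcasting stretches the 1-D array across each row of the
# 2-D array, so no explicit Python loop is needed.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([10, 20, 30])
shifted = matrix + row

print(units_by_region)
print(shifted)
```

Broadcasting works here because the trailing dimension of `matrix` (3) matches the length of `row`.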
Basic Examples
-
Question: How do you load a CSV file into a Pandas DataFrame?
- Answer: You can load a CSV file into a Pandas DataFrame using the pd.read_csv('file_path') function.
    import pandas as pd
    data = pd.read_csv('data.csv')
-
Question: What method in Pandas can be used to display the first few rows of a DataFrame?
- Answer: The head() method displays the first few rows of a DataFrame (five by default).
    print(data.head())
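To try both calls without a file on disk, the sketch below reads CSV text from memory. The CSV contents are invented for illustration, and `io.StringIO` stands in for a real path such as 'data.csv'.

```python
import io
import pandas as pd

# In-memory stand-in for a file such as 'data.csv' (contents are made up).
csv_text = """name,score
Ana,90
Ben,85
Cy,70
Dee,95
Eli,60
Fay,88
"""

# read_csv accepts a file path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

# head() returns the first 5 rows by default; head(n) returns the first n.
first_five = data.head()
first_two = data.head(2)

print(first_five)
print(data.shape)
```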
Data Preprocessing
-
Question: How do you remove duplicate rows in a Pandas DataFrame?
- Answer: You can remove duplicate rows using the drop_duplicates() method.
    data = data.drop_duplicates()
-
Question: What is the purpose of the apply() function in Pandas?
- Answer: On a DataFrame, apply() applies a function along an axis (to each column or to each row). On a Series, as in the example here, it applies the function to each element.
    data['column'] = data['column'].apply(lambda x: x * 2)
-
Question: How do you handle missing values in a dataset using Scikit-learn's SimpleImputer?
- Answer: You can handle missing values by creating an instance of SimpleImputer, specifying the strategy ('mean', 'median', 'most_frequent', or 'constant'), and then fitting and transforming the dataset.
    from sklearn.impute import SimpleImputer
    imputer = SimpleImputer(strategy='mean')
    # fit_transform returns a 2-D array; ravel() flattens it back to one column
    data['column'] = imputer.fit_transform(data[['column']]).ravel()
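The three preprocessing steps above can be chained on a toy dataset. This is only a sketch: the DataFrame and its column names `city` and `price` are hypothetical, chosen so that there is one duplicate row and one missing value.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data: one exact duplicate row and one missing price.
data = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Goa"],
    "price": [10.0, 10.0, np.nan, 30.0],
})

# 1. Drop exact duplicate rows (the first occurrence is kept).
data = data.drop_duplicates().reset_index(drop=True)

# 2. Replace the missing price with the mean of the observed prices.
imputer = SimpleImputer(strategy="mean")
data["price"] = imputer.fit_transform(data[["price"]]).ravel()

# 3. Series.apply runs the function on each element of the column.
data["price_doubled"] = data["price"].apply(lambda x: x * 2)

print(data)
```

Removing duplicates before imputing matters in this example: with the duplicate row still present, the mean used to fill the gap would be pulled toward the repeated value.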
Types of Data Analytics
-
Question: What are the three main types of data analytics?
- Answer: The three main types of data analytics are Predictive Analytics, Descriptive Analytics, and Prescriptive Analytics.
-
Question: What is the goal of predictive analytics?
- Answer: The goal of predictive analytics is to use historical data to make predictions about future events.
Key Algorithms
-
Question: What is the Apriori algorithm used for?
- Answer: The Apriori algorithm is used for mining frequent itemsets and generating association rules in large datasets.
-
Question: What is the difference between linear regression and logistic regression?
- Answer: Linear regression predicts a continuous outcome variable from one or more predictor variables. Logistic regression is used for classification, most commonly binary classification, and predicts the probability that an observation belongs to a class.
-
Question: How does a decision tree classifier work?
- Answer: A decision tree classifier works by splitting the data into subsets based on the value of input features. Each node in the tree represents a decision point based on a feature, and the branches represent the outcomes of these decisions. The leaves of the tree represent the final classifications.
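Of the algorithms in this section, Apriori is the easiest to sketch without a library. The code below is a simplified, unoptimized illustration of its level-wise search, not a reference implementation. The function name `frequent_itemsets`, the `min_support` count threshold, and the shopping baskets are all assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets that appear in at least
    min_support transactions, using an Apriori-style level-wise search."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count how many transactions contain each candidate itemset,
        # keeping only those that reach the support threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = count({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Made-up shopping baskets for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = frequent_itemsets(baskets, min_support=2)

# Confidence of the association rule bread -> milk:
# support(bread and milk) / support(bread).
confidence = result[frozenset({"bread", "milk"})] / result[frozenset({"bread"})]

for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
print(f"confidence(bread -> milk) = {confidence:.2f}")
```

The prune step is what distinguishes Apriori from brute-force counting: a candidate is discarded without scanning the data if any of its subsets is already known to be infrequent.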
Introduction to Scikit-learn
-
Question: How do you install Scikit-learn in Python?
- Answer: You can install Scikit-learn using pip.
    pip install scikit-learn
-
Question: What function in Scikit-learn is used to load the Iris dataset?
- Answer: The load_iris() function from the sklearn.datasets module is used to load the Iris dataset.
    from sklearn.datasets import load_iris
    data = load_iris()
-
Question: How do you fit a logistic regression model using Scikit-learn?
- Answer: You can fit a logistic regression model using the LogisticRegression class. First, you create an instance of the class, then fit the model using the training data.
    from sklearn.linear_model import LogisticRegression
    model = LogisticRegression()
    # X_train and y_train are the training features and labels, prepared beforehand
    model.fit(X_train, y_train)
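The snippets in this section can be combined into one runnable sketch. The 25% hold-out split, the random seed, and `max_iter=1000` are choices made for this example rather than part of the answers above. Iris has three classes, and scikit-learn's LogisticRegression handles such multiclass targets as well as binary ones.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# load_iris() returns a Bunch: .data holds the features, .target the labels.
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 25% of the rows for evaluation; the seed value is arbitrary.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# max_iter is raised above the default to give the solver room to converge.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# score() reports accuracy; predict_proba() gives one probability per class.
accuracy = model.score(X_test, y_test)
probabilities = model.predict_proba(X_test[:3])

print(f"test accuracy: {accuracy:.2f}")
print(probabilities.round(3))
```

Evaluating on the held-out rows rather than the training rows gives a more honest estimate of how the model would do on unseen data.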
By using these active recall questions, you can reinforce your understanding of the key concepts in Unit 4: Predictive Data Analytics with Python. Feel free to modify or expand these questions to suit your study needs better.