Subject - Data Science
Course Objectives:
- To understand the need for Data Science
- To understand computational statistics in Data Science
- To study and understand the different technologies used for Data processing
- To understand and apply data modeling strategies
- To learn Data Analytics using Python programming
- To be conversant with advances in analytics
Course Outcomes:
On completion of the course, the learner will be able to:
- CO1: Analyze needs and challenges for Data Science
- CO2: Apply statistics for Data Analytics
- CO3: Apply the lifecycle of Data analytics to real world problems
- CO4: Implement Data Analytics using Python programming
- CO5: Implement data visualization using visualization tools in Python programming
- CO6: Design and implement Big Databases using the Hadoop ecosystem
Syllabus
Unit I - Introduction to Data Science
Unit II - Statistical Inference
Unit III - Data Analytics Life Cycle
Overview
- Objective: DS-U3-Objective
Resources
- Textbook Chapters: Google Classroom Notes
Syllabus Topics
- Introduction
- Data Analytics Life Cycle
- Data Analytical Architecture
- Introduction
- Phase 1 - Discovery
- Phase 2 - Data Preparation
- Phase 3 - Model Planning
- Phase 4 - Model Building
- Phase 5 - Communication Results
- Phase 6 - Operationalise
Previous Year Questions (PYQs)
- PYQs - (Data Analytics Life Cycle)
1. Explain Data Analytics life cycle with the h
Unit IV - Predictive Data Analytics with Python
Overview
- Objective: DS-U4-Objective
Syllabus Topics
- Introduction, Essential Python Libraries, Basic examples.
- Data Preprocessing: Removing Duplicates, Transformation of Data using function or mapping, replacing values, Handling Missing Data.
- Types of Data Analytics Model: Predictive, Descriptive and Prescriptive.
- Association Rules: Apriori Algorithm and FP growth
- Regression - Linear Regression, Logistic Regression.
- Classification - Naïve Bayes, Decision
Unit V - Data Analytics and Model Evaluation
Overview
- Objective: DS-U5-Objective
Syllabus Topics
- Clustering Algorithms: K-Means, Hierarchical Clustering, Time-series analysis.
- Introduction to Text Analysis: Text-Preprocessing, Bag of Words (BoW), TF-IDF and topics.
- Need and Introduction to social network analysis, Introduction to business analysis.
- Model Evaluation and Selection: Metrics for Evaluating Classifier Performance, Holdout Method and Random Sub sampling, Parameter Tuning and Optimisation, Result Interpretation
- Cl
Unit VI - Data Visualisation and Hadoop
Learning Resources
Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data
Data Science Project
For your Data Science project, consider building a "Predictive Health Analytics" application. Here's a step-by-step guide:
- Learning Part:
- Basics: Start with understanding the basics of data science and statistics.
- Python Programming: Learn Python for data analytics. Platforms like Codecademy or DataCamp offer interactive Python courses.
- Data Analytics Lifecycle: Understand the lifecycle of data analytics. Online courses, such as those on Coursera or edX, can guide you through this.
- Project: Predictive Health Analytics:
- Objective: Create a tool that predicts potential health issues based on historical health data.
- Steps:
- Data Collection: Gather health-related datasets. Kaggle is a good source for datasets.
- Data Cleaning: Learn to clean and preprocess data using Python libraries like Pandas.
- Predictive Model: Implement a predictive model using a machine learning library such as scikit-learn.
- Data Visualization: Use Python visualization tools (like Matplotlib or Seaborn) to chart health trends and insights.
- Advanced Concepts:
- Big Data: Explore Hadoop for handling large datasets. Online tutorials and documentation can help.
- Visualization Tools: Dive deeper into advanced visualization tools like Tableau for more sophisticated data presentation.
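The Data Cleaning and Predictive Model steps above could be sketched roughly as follows. This is a minimal illustration rather than a prescribed solution. The toy records, the column names (`age`, `bmi`, `smoker`, `at_risk`) and the helper names `clean` and `train_and_score` are all hypothetical, and logistic regression is just one possible model choice. A real project would load a dataset (for example with `pd.read_csv`) instead of typing rows by hand.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy records (assumption: invented purely for illustration).
# The last row duplicates the first, and one BMI value is missing.
raw = pd.DataFrame({
    "age":     [25, 32, 47, 51, 38, 62, 29, 55, 41, 66, 35, 58, 25],
    "bmi":     [22.0, 24.5, 31.0, 29.5, None, 33.0, 21.5,
                30.0, 26.0, 34.5, 23.0, 32.0, 22.0],
    "smoker":  ["no", "no", "yes", "yes", "no", "yes", "no",
                "no", "yes", "yes", "no", "yes", "no"],
    "at_risk": [0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})


def clean(df):
    """Remove duplicates, map text to numbers, and fill missing values."""
    out = df.drop_duplicates().copy()
    # Transformation by mapping: encode the yes/no column as 1/0.
    out["smoker"] = out["smoker"].map({"yes": 1, "no": 0})
    # Handle missing data: fill gaps in BMI with the column median.
    out["bmi"] = out["bmi"].fillna(out["bmi"].median())
    return out.reset_index(drop=True)


def train_and_score(df):
    """Fit a logistic regression and return its holdout accuracy."""
    features = df[["age", "bmi", "smoker"]]
    target = df["at_risk"]
    # Holdout method: keep 25% of the rows aside for evaluation.
    x_train, x_test, y_train, y_test = train_test_split(
        features, target, test_size=0.25, stratify=target, random_state=0
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(x_train, y_train)
    return accuracy_score(y_test, model.predict(x_test))


cleaned = clean(raw)
accuracy = train_and_score(cleaned)
print(f"Rows after cleaning: {len(cleaned)}")
print(f"Holdout accuracy: {accuracy:.2f}")
```

With so few rows the accuracy figure is not meaningful. The point is only the shape of the workflow: clean first, split the data, fit on the training part, and evaluate on the held-out part.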