My Blog.

Subject - Data Science

Course Objectives:

To understand the need for Data Science
To understand computational statistics in Data Science
To study and understand the different technologies used for Data processing
To understand and apply data modeling strategies
To learn Data Analytics using Python programming
To be conversant with advances in analytics

Course Outcomes

On completion of the course, the learner will be able to:

CO1: Analyze needs and challenges for Data Science
CO2: Apply statistics for Data Analytics
CO3: Apply the lifecycle of Data analytics to real world problems
CO4: Implement Data Analytics using Python programming
CO5: Implement data visualization using visualization tools in Python programming
CO6: Design and implement Big Databases using the Hadoop ecosystem

Syllabus

Unit I - Introduction to Data Science

Unit II - Statistical Inference

Unit III - Data Analytics Life Cycle

Overview
Objective: DS-U3-Objective

Resources
Textbook Chapters: Google Classroom Notes

Syllabus Topics
- Introduction
- Data Analytics Life Cycle
- Data Analytical Architecture
- Introduction
- Phase 1 - Discovery
- Phase 2 - Data Preparation
- Phase 3 - Model Planning
- Phase 4 - Model Building
- Phase 5 - Communication Results
- Phase 6 - Operationalise

Previous Year Questions (PYQs)
- PYQs - (Data Analytics Life Cycle)
  1. Explain Data Analytics life cycle with the h…

Unit IV - Predictive Data Analytics with Python

Overview
Objective: DS-U4-Objective

Syllabus Topics
- Introduction, Essential Python Libraries, Basic examples.
- Data Preprocessing: Removing Duplicates, Transformation of Data using function or mapping, replacing values, Handling Missing Data.
- Types of Data Analytics Model: Predictive, Descriptive and Prescriptive.
- Association Rules: Apriori Algorithm and FP growth.
- Regression - Linear Regression, Logistic Regression.
- Classification - Naïve Bayes, Decision…
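The preprocessing topics in this unit (removing duplicates, transformation by mapping, replacing values, handling missing data) can be sketched in a few lines. This is a minimal plain-Python illustration, not the course's own material: the `records` list, the column names, and the `-1` "not measured" sentinel are all hypothetical. In practice the same steps are usually done with a library such as pandas.

```python
from statistics import mean

# Hypothetical patient records; None marks a missing value.
records = [
    {"id": 1, "sex": "M", "age": 34, "glucose": 90},
    {"id": 2, "sex": "F", "age": None, "glucose": 110},
    {"id": 2, "sex": "F", "age": None, "glucose": 110},  # exact duplicate
    {"id": 3, "sex": "f", "age": 52, "glucose": -1},     # -1: assumed sentinel
]


def remove_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def transform(rows):
    """Map sex codes to one canonical label and replace the -1 sentinel."""
    sex_map = {"m": "male", "f": "female"}
    out = []
    for row in rows:
        row = dict(row)  # copy so the input rows stay untouched
        row["sex"] = sex_map.get(row["sex"].lower(), "unknown")
        if row["glucose"] == -1:
            row["glucose"] = None
        out.append(row)
    return out


def fill_missing(rows, column):
    """Fill missing values in a numeric column with the observed mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


clean = remove_duplicates(records)
clean = transform(clean)
clean = fill_missing(clean, "age")
clean = fill_missing(clean, "glucose")

for row in clean:
    print(row)
```

Mean imputation is only one option for missing data; dropping incomplete rows or using the median are equally common, and the right choice depends on the dataset.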

Unit V - Data Analytics and Model Evaluation

Overview
Objective: DS-U5-Objective

Syllabus Topics
- Clustering Algorithms: K-Means, Hierarchical Clustering, Time-series analysis.
- Introduction to Text Analysis: Text-Preprocessing, Bag of Words (BoW), TF-IDF and topics.
- Need and Introduction to social network analysis, Introduction to business analysis.
- Model Evaluation and Selection: Metrics for Evaluating Classifier Performance, Holdout Method and Random Subsampling, Parameter Tuning and Optimisation, Result Interpretation,
- Cl…
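Two of the evaluation topics listed for this unit, the holdout method and metrics for classifier performance, can be illustrated with a short plain-Python sketch. The function names, the 70/30 split, and the example labels below are assumptions made for illustration; libraries such as scikit-learn provide equivalent utilities.

```python
import random


def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the data, then split it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Holdout: 10 hypothetical samples, 30% held back for testing.
train, test = holdout_split(range(10))
print(len(train), len(test))

# Metrics on hypothetical true labels versus model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Random subsampling repeats the holdout split several times with different seeds and averages the resulting metrics, which reduces the dependence on one particular split.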

Unit VI - Data Visualisation and Hadoop

Learning Resources

Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data

Data Science Project

For your Data Science project, consider building a "Predictive Health Analytics" application. Here's a step-by-step guide:

Learning Part:
- Basics: Start with understanding the basics of data science and statistics.
- Python Programming: Learn Python for data analytics. Platforms like Codecademy or DataCamp offer interactive Python courses.
- Data Analytics Lifecycle: Understand the lifecycle of data analytics. Online courses on platforms such as Coursera or edX can guide you through this.
Project: Predictive Health Analytics:
- Objective: Create a tool that predicts potential health issues based on historical health data.
- Steps:
  - Data Collection: Gather health-related datasets. Kaggle is a good source for datasets.
  - Data Cleaning: Learn to clean and preprocess data using Python libraries like Pandas.
  - Predictive Model: Implement a predictive model with a machine learning library such as scikit-learn (for example, logistic regression or a decision tree).
  - Data Visualization: Use Python visualization tools (like Matplotlib or Seaborn) to plot health trends and present insights.
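To make the Predictive Model step concrete, here is a minimal sketch of logistic regression trained by gradient descent in plain Python. It stands in for what a library such as scikit-learn would do for you and is not a recommended production approach. The toy dataset (age and systolic blood pressure mapped to an "at risk" label), the learning rate, and the epoch count are all made-up assumptions.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy data: (age in years, systolic blood pressure) -> 1 if at risk.
X_raw = [(30, 110), (35, 118), (40, 115), (45, 122),
         (55, 138), (60, 145), (65, 142), (70, 150)]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_scaler(rows):
    """Return the (mean, standard deviation) of each column for z-scoring."""
    return [(mean(col), pstdev(col)) for col in zip(*rows)]


def scale(row, stats):
    """Standardize one row using the statistics learned from training data."""
    return [(value - mu) / sd for value, (mu, sd) in zip(row, stats)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the average log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(x, w, b):
    """Return 1 when the predicted probability is at least 0.5."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


stats = fit_scaler(X_raw)
X = [scale(row, stats) for row in X_raw]
w, b = train_logistic(X, y)

correct = sum(predict(x, w, b) == yi for x, yi in zip(X, y))
train_accuracy = correct / len(y)
print("training accuracy:", train_accuracy)

# Score a new, hypothetical patient with the same scaling as the training data.
new_patient = scale((68, 148), stats)
print("prediction:", predict(new_patient, w, b))
```

Training accuracy on eight hand-picked points says nothing about real-world performance; for an actual project, evaluate on held-out data and treat any health prediction as exploratory.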
Advanced Concepts:
- Big Data: Explore Hadoop for handling large datasets. Online tutorials and documentation can help.
- Visualization Tools: Dive deeper into advanced visualization tools like Tableau for more sophisticated data presentation.