My Blog.

Subject - Data Science

Course Objectives:

  • To understand the need for Data Science
  • To understand computational statistics in Data Science
  • To study and understand the different technologies used for Data processing
  • To understand and apply data modeling strategies
  • To learn Data Analytics using Python programming
  • To be conversant with advances in analytics

Course Outcomes

On completion of the course, the learner will be able to:

  1. CO1: Analyze needs and challenges for Data Science
  2. CO2: Apply statistics for Data Analytics
  3. CO3: Apply the lifecycle of Data analytics to real world problems
  4. CO4: Implement Data Analytics using Python programming
  5. CO5: Implement data visualization using visualization tools in Python programming
  6. CO6: Design and implement Big Databases using the Hadoop ecosystem

Syllabus

Unit I - Introduction to Data Science

Unit II - Statistical Inference

Unit III - Data Analytics Life Cycle

Overview:

  • Objective: DS-U3-Objective

Resources:

  • Textbook Chapters: Google Classroom Notes

Syllabus Topics:

  • Introduction
  • Data Analytics Life Cycle
  • Data Analytical Architecture
  • Introduction
  • Phase 1 - Discovery
  • Phase 2 - Data Preparation
  • Phase 3 - Model Planning
  • Phase 4 - Model Building
  • Phase 5 - Communication Results
  • Phase 6 - Operationalise

Previous Year Questions (PYQs):

  • PYQs - (Data Analytics Life Cycle)
    1. Explain Data Analytics life cycle with the h

Unit IV - Predictive Data Analytics with Python

Overview:

  • Objective: DS-U4-Objective

Syllabus Topics:

  • Introduction, Essential Python Libraries, Basic examples.
  • Data Preprocessing: Removing Duplicates, Transformation of Data using function or mapping, replacing values, Handling Missing Data.
  • Types of Data Analytics Model: Predictive, Descriptive and Prescriptive.
  • Association Rules: Apriori Algorithm and FP growth
  • Regression - Linear Regression, Logistic Regression.
  • Classification - Naïve Bayes, Decision
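
The preprocessing topics listed for Unit IV (removing duplicates, transforming data with a mapping, replacing values, handling missing data) can be sketched with pandas. This is a minimal illustration rather than course material: the DataFrame, its column names, and the sentinel value -999 are all invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names and values are invented for illustration.
df = pd.DataFrame({
    "city": ["pune", "pune", "mumbai", "delhi", "delhi"],
    "temp_c": [31.0, 31.0, -999.0, np.nan, 35.0],
})

# 1. Removing duplicates: drop rows that exactly repeat an earlier row.
df = df.drop_duplicates().reset_index(drop=True)

# 2. Transformation using a mapping: derive a new column from a dict lookup.
state_of = {"pune": "Maharashtra", "mumbai": "Maharashtra", "delhi": "Delhi"}
df["state"] = df["city"].map(state_of)

# 3. Replacing values: treat the assumed sentinel -999 as missing.
df["temp_c"] = df["temp_c"].replace(-999.0, np.nan)

# 4. Handling missing data: fill gaps with the column mean.
#    (df.dropna() would discard the incomplete rows instead.)
df["temp_c"] = df["temp_c"].fillna(df["temp_c"].mean())

print(df)
```

Filling with the mean is only one option; whether to fill or drop missing rows depends on how much data is missing and why.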

Unit V - Data Analytics and Model Evaluation

Overview:

  • Objective: DS-U5-Objective

Syllabus Topics:

  • Clustering Algorithms: K-Means, Hierarchical Clustering, Time-series analysis.
  • Introduction to Text Analysis: Text-Preprocessing, Bag of Words (BoW), TF-IDF and topics.
  • Need and Introduction to social network analysis, Introduction to business analysis.
  • Model Evaluation and Selection: Metrics for Evaluating Classifier Performance, Holdout Method and Random Subsampling, Parameter Tuning and Optimisation, Result Interpretation
  • Cl
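
Two of the model evaluation topics named above, the holdout method and metrics for classifier performance, can be illustrated in plain Python. This is a rough sketch under stated assumptions: the function names `holdout_split` and `classifier_metrics` are hypothetical helpers, and the labels and predictions are made-up values, not results from any real model.

```python
import random

def holdout_split(items, test_fraction=0.3, seed=0):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def classifier_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classifier_metrics(y_true, y_pred)
print(metrics)

# Holdout: keep 30% of ten sample indices aside for testing.
train, test = holdout_split(range(10), test_fraction=0.3)
print(len(train), len(test))
```

Random subsampling repeats the holdout split with different seeds and averages the resulting metrics, which gives a steadier estimate than a single split.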

Unit VI - Data Visualisation and Hadoop

Learning Resources

Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data


Data Science Project

For your Data Science project, consider building a "Predictive Health Analytics" application. Here's a step-by-step guide:

  1. Learning Part:
    • Basics: Start with understanding the basics of data science and statistics.
    • Python Programming: Learn Python for data analytics. Platforms like Codecademy or DataCamp offer interactive Python courses.
    • Data Analytics Lifecycle: Understand the lifecycle of data analytics. Online courses, such as those on Coursera or edX, can guide you through this.
  2. Project: Predictive Health Analytics:
    • Objective: Create a tool that predicts potential health issues based on historical health data.
    • Steps:
      • Data Collection: Gather health-related datasets. Kaggle is a good source for datasets.
      • Data Cleaning: Learn to clean and preprocess data using Python libraries like Pandas.
      • Predictive Model: Implement a predictive model using machine learning algorithms (e.g., scikit-learn).
      • Data Visualization: Use Python visualization tools (like Matplotlib or Seaborn) to chart health trends and insights.
  3. Advanced Concepts:
    • Big Data: Explore Hadoop for handling large datasets. Online tutorials and documentation can help.
    • Visualization Tools: Dive deeper into advanced visualization tools like Tableau for more sophisticated data presentation.
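
The cleaning and predictive model steps of the project could start from something like the sketch below. It is only a starting point under loud assumptions: the data is synthetic (generated with NumPy in place of a real Kaggle dataset), the column names `age`, `bmi`, `systolic_bp` and `at_risk` are invented, the rule that produces the label is arbitrary, and logistic regression is just one of many models scikit-learn offers.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real health dataset; every column is invented.
rng = np.random.default_rng(42)
n = 500
data = pd.DataFrame({
    "age": rng.integers(20, 80, size=n),
    "bmi": rng.normal(27, 4, size=n),
    "systolic_bp": rng.normal(125, 15, size=n),
})

# Arbitrary rule plus noise to create a label the model can learn.
risk_score = (0.04 * (data["age"] - 50)
              + 0.15 * (data["bmi"] - 27)
              + 0.03 * (data["systolic_bp"] - 125))
data["at_risk"] = (risk_score + rng.normal(0, 0.5, size=n) > 0).astype(int)

# Data cleaning: drop exact duplicates and rows with missing values.
data = data.drop_duplicates().dropna()

# Hold out 25% of the rows to evaluate the model on unseen data.
X = data[["age", "bmi", "systolic_bp"]]
y = data["at_risk"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Predictive model: scale the features, then fit logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Holdout accuracy: {accuracy:.2f}")
```

With a real dataset, the synthetic block would be replaced by `pd.read_csv(...)` on the downloaded file, and the cleaning step would need to reflect that dataset's actual quirks. Plotting the features against the label with Matplotlib or Seaborn would be a natural next step.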