Predictive Analysis Using R Lab (Practical)

Paper Code: 24DAC332
Credits: 4
Periods/week: 2
Max. Marks: 100.00
Objective: 

This course enables students to apply multivariate techniques in the R environment to a range of business cases. Students will learn the techniques that fall under multivariate analysis and will be able to build and apply selected predictive models for binary classification and time series.

 

Course Outcomes: 

Course Code: 24DAC332

Course Title: Predictive Analysis Using R Lab (Practical)

Course outcomes (at course level):

CO7. Install R and RStudio.

CO8. Apply the analysis tool R to solve practical problems in a variety of disciplines.

CO9. Apply basic and advanced statistical techniques used in data science research.

CO10. Analyze large sets of data to gain useful business understanding.

CO11. Explain and demonstrate R methods for data visualization.

CO12. Contribute effectively in course-specific interaction.

Learning and teaching strategies:

Approach in teaching: Interactive Lectures, Discussion, Demonstrations, Group activities, Teaching using advanced IT audio-video tools.

Learning activities for the students: Effective assignments, Giving tasks.

Assessment Strategies:

Class test, Semester end examinations, Practical Assignments, Individual and group projects.

 

12.00
Unit I: 
R and RStudio:

Logical Arguments, Missing Values, Characters, Factors and Numeric, Help in R, Vector to Matrix, Matrix Access, Data Frames, Data Frame Access, Basic Data Manipulation Techniques, Usage of various apply functions (apply, lapply, sapply and tapply), Outlier treatment.
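As an illustration of the apply family, missing-value handling and outlier treatment listed in this unit, the following is a minimal sketch in base R. The data frame `scores` and its column names are hypothetical, and the 1.5 × IQR rule is only one common convention for flagging outliers, assumed here for the example.

```r
# Hypothetical toy data; the column names are illustrative only.
scores <- data.frame(
  group = factor(c("A", "A", "B", "B", "B")),
  test1 = c(10, 12, NA, 15, 11),
  test2 = c(20, 22, 19, 95, 21)
)
num_cols <- scores[, c("test1", "test2")]

# Missing values: count the NAs in each numeric column.
na_counts <- sapply(num_cols, function(x) sum(is.na(x)))

# apply: row-wise means over a numeric matrix, ignoring NAs.
row_means <- apply(as.matrix(num_cols), 1, mean, na.rm = TRUE)

# lapply always returns a list; sapply simplifies to a vector when it can.
col_ranges <- lapply(num_cols, range, na.rm = TRUE)
col_means <- sapply(num_cols, mean, na.rm = TRUE)

# tapply: mean of test2 within each level of the factor `group`.
group_means <- tapply(scores$test2, scores$group, mean)

# Outlier flagging with the 1.5 * IQR rule (an assumed convention).
q <- quantile(scores$test2, c(0.25, 0.75))
fence <- 1.5 * IQR(scores$test2)
is_outlier <- scores$test2 < q[[1]] - fence | scores$test2 > q[[2]] + fence
```

Here `is_outlier` marks only the value 95, which could then be inspected, capped or removed depending on the analysis.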

 

12.00
Unit II: 
Measures of Central Tendency:

Mean, Mode and Median, Charts (Bar, Pie, Box Plot, Histogram, Stem and Leaf Diagram), Measures of dispersion (Range, Inter-Quartile Range, Standard Deviation, Skewness and Kurtosis), Standard Error of Mean and Confidence Intervals.

Discrete Probability Distributions: Binomial, Poisson. Continuous Probability Distributions: Normal Distribution and t-distribution. Sampling Distribution and Central Limit Theorem.
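The descriptive measures and distributions in this unit can be sketched in base R as follows. The vector `x` is made-up data, and `stat_mode` is a hypothetical helper written for this example, since base R has no built-in function for the statistical mode. Skewness and kurtosis are left out because base R does not provide them directly.

```r
# Made-up sample for illustration.
x <- c(12, 15, 11, 15, 18, 20, 15, 14)

x_mean <- mean(x)
x_median <- median(x)

# Hypothetical helper: most frequent value (first one if there are ties).
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}
x_mode <- stat_mode(x)

# Dispersion.
x_range <- diff(range(x))
x_iqr <- IQR(x)
x_sd <- sd(x)

# Standard error of the mean and a 95% t-based confidence interval.
n <- length(x)
se <- x_sd / sqrt(n)
ci <- x_mean + c(-1, 1) * qt(0.975, df = n - 1) * se

# Probability distributions: d* gives a density or mass, p* a cumulative probability.
p_binom <- dbinom(3, size = 10, prob = 0.5)  # P(X = 3) for Binomial(10, 0.5)
p_pois <- dpois(2, lambda = 3)               # P(X = 2) for Poisson(3)
p_norm <- pnorm(1.96)                        # P(Z <= 1.96) for the standard normal
p_t <- pt(1.96, df = n - 1)                  # same cut-off under a t-distribution

# Charts named in the unit, e.g. boxplot(x), hist(x), stem(x).
```

The interval `ci` matches what `t.test(x)$conf.int` reports, which is a convenient cross-check in the lab.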

 

12.00
Unit III: 
Parametric and non-parametric tests:

Tests for one sample, independent samples, paired samples, and two or more samples.
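A minimal sketch of how these test designs map onto base R functions is given below. The data are simulated purely for illustration, and the pairing of each parametric test with a rank-based counterpart is a common textbook choice rather than something prescribed by this syllabus.

```r
set.seed(1)  # simulated, hypothetical data

before <- rnorm(20, mean = 50, sd = 5)
after <- before + rnorm(20, mean = 2, sd = 3)

# One sample: is the mean of `before` different from 50?
one_sample_t <- t.test(before, mu = 50)
one_sample_w <- wilcox.test(before, mu = 50)

# Paired samples: the same units measured twice.
paired_t <- t.test(after, before, paired = TRUE)
paired_w <- wilcox.test(after, before, paired = TRUE)

# Two independent samples (Welch t-test by default, then Mann-Whitney).
indep_t <- t.test(before, after)
indep_w <- wilcox.test(before, after)

# More than two samples: one-way ANOVA and Kruskal-Wallis.
values <- c(rnorm(10, 50, 5), rnorm(10, 52, 5), rnorm(10, 55, 5))
group <- factor(rep(c("g1", "g2", "g3"), each = 10))
anova_fit <- aov(values ~ group)
kw <- kruskal.test(values ~ group)

# Each test object carries its statistic and p-value, e.g. paired_t$p.value.
```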

 

12.00
Unit IV: 
Analysis of Relationship:

Positive and Negative Correlation, Perfect Correlation, Correlation Matrix, Scatter Plots, Simple Linear Regression, R Square, Adjusted R Square, Testing of Slope, Standard Error of Estimate, Overall Model Fitness, Assumptions of Linear Regression, Multiple Regression, Coefficients of Partial Determination, Durbin-Watson Statistic, Variance Inflation Factor.
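The regression quantities in this unit can be sketched with the built-in `mtcars` data set, chosen here only for convenience. The variance inflation factor and the Durbin-Watson statistic are computed from their textbook formulas in base R, as an assumption of this example. Contributed packages offer ready-made functions, but none is named in this syllabus.

```r
data(mtcars)

# Correlation matrix and a scatter plot for a pair of variables.
cor_mat <- cor(mtcars[, c("mpg", "wt", "hp")])
# plot(mtcars$wt, mtcars$mpg)

# Simple linear regression: mpg explained by weight.
simple_fit <- lm(mpg ~ wt, data = mtcars)
simple_sum <- summary(simple_fit)
r2 <- simple_sum$r.squared
adj_r2 <- simple_sum$adj.r.squared
slope_p <- coef(simple_sum)["wt", "Pr(>|t|)"]  # t-test of the slope
see <- simple_sum$sigma                        # standard error of estimate

# Overall model fitness: the F statistic and its degrees of freedom.
f_stat <- simple_sum$fstatistic

# Multiple regression with two predictors.
multi_fit <- lm(mpg ~ wt + hp, data = mtcars)

# VIF for wt: 1 / (1 - R^2) from regressing wt on the other predictor(s).
vif_wt <- 1 / (1 - summary(lm(wt ~ hp, data = mtcars))$r.squared)

# Durbin-Watson statistic: sum of squared successive residual
# differences divided by the residual sum of squares.
e <- residuals(multi_fit)
dw <- sum(diff(e)^2) / sum(e^2)
```

In a simple regression, `r2` equals the squared correlation between the two variables. A value of `dw` near 2 suggests little first-order autocorrelation in the residuals.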

 

12.00
Unit V: 
Binary Classification versus Point Estimation:

Odds versus Probability, Logit Function, Classification Matrix, Individual Group Classification Efficiency, Overall Classification Efficiency, Nagelkerke R Square, Receiver Operating Characteristic Curve, Sensitivity, Specificity, Area Under ROC Curve, Cut-Offs, True Positive Rate and False Positive Rate.
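The classification measures in this unit can be sketched with a logistic regression on the built-in `mtcars` data, predicting transmission type `am` from weight `wt`. The data set, the 0.5 cut-off and the rank-based AUC formula are assumptions made for this example. Nagelkerke R Square is computed from the model deviances, which is valid here because the response is ungrouped binary data.

```r
data(mtcars)

# Logistic regression: probability that am = 1 as a function of wt.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
prob <- fitted(fit)

# Odds versus probability, and the logit (log-odds).
odds <- prob / (1 - prob)
logit <- log(odds)  # equals the model's linear predictor

# Classification matrix at an assumed cut-off of 0.5.
cutoff <- 0.5
predicted <- factor(as.integer(prob >= cutoff), levels = c(0, 1))
actual <- factor(mtcars$am, levels = c(0, 1))
conf_mat <- table(actual = actual, predicted = predicted)

# Group-wise and overall classification efficiency.
sensitivity <- conf_mat["1", "1"] / sum(conf_mat["1", ])  # true positive rate
specificity <- conf_mat["0", "0"] / sum(conf_mat["0", ])
false_positive_rate <- 1 - specificity
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Area under the ROC curve via the rank (Mann-Whitney) formula.
r <- rank(prob)
n1 <- sum(mtcars$am == 1)
n0 <- sum(mtcars$am == 0)
auc <- (sum(r[mtcars$am == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Nagelkerke R Square: Cox-Snell R Square rescaled by its maximum.
n <- nrow(mtcars)
cox_snell <- 1 - exp((deviance(fit) - fit$null.deviance) / n)
nagelkerke <- cox_snell / (1 - exp(-fit$null.deviance / n))
```

Changing `cutoff` trades sensitivity against specificity, and plotting the true positive rate against the false positive rate over all cut-offs traces the ROC curve.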

 

 

ESSENTIAL READINGS: 
  1. Maindonald, John and Braun, John, "Data Analysis and Graphics Using R", Cambridge University Press, 2007
  2. Gardener, Mark, "Beginning R: The Statistical Programming Language", Wiley India Pvt. Ltd., 2015
  3. Srinivasa K. G., Siddesh G. M., Shetty, "Statistical Programming in R", Oxford University Press, 2017
  4. Business Statistics: Naval Bajpai, Pearson
  5. Menard, S. (2002). Applied Logistic Regression Analysis. Thousand Oaks, CA: Sage.

 

REFERENCES: 

Suggested Readings:

  1. Business Statistics: Naval Bajpai, Pearson
  2. Menard, S. (2002). Applied Logistic Regression Analysis. Thousand Oaks, CA: Sage.

e-Resources:

  1. https://www.slideshare.net/
  2. https://nptel.ac.in/courses/106106222
  3. https://spoken-tutorial.org/??/
  4. www.kaggle.com

Journals:

  1. Journal of the Brazilian Computer Society, SpringerOpen
  2. Journal of Internet Services and Applications, SpringerOpen

 
