Predict 454







I am putting up this page to improve communication about PREDICT 454. PREDICT 454 is an advanced elective course, and it is not an appropriate course for every student. Here is a brief description of the course.



What PREDICT 454 is.

PREDICT 454 is a research-oriented course. Each student will participate in a team project on a team of four or five students, and the project constitutes one third of the total points for the quarter. The team project can be selected from the Team Project Guide or proposed by a student. Either way, at least four students must show an interest in the topic for it to be accepted as a team project.

Many of the stored data projects are former Kaggle and KDD Cup competitions. The fact that these are the default data sets should send a clear signal about the level of the course and the level of R skill needed. You will need to be able to clean and manipulate these data sets in R.

The primary learning objective in PREDICT 454 is to teach you how to think about modeling 'larger' data sets. The readings, individual assignments, and the team project are all oriented towards this one goal. Modeling larger data sets is very different from modeling small data sets.



What PREDICT 454 is not.

PREDICT 454 is not a course where we will 'teach you R' or show you how to fit a linear regression model in R. By the time you get to PREDICT 454, it is expected that you already know how to fit a linear regression model in R. It is also expected that if you had to fit a different model in R, such as a tree model or an SVM model, you would understand how to install the R package, look up the syntax in the R documentation or a book such as the ISL book, and fit the model. The ISL book has a toy example for every topic covered in the book. As an advanced student you should be able to implement that toy example on your own with the code provided in the book.
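As a rough illustration of that workflow (install a package, look up the syntax, fit the model), here is a minimal sketch. The choice of the rpart package and the built-in iris data set is my assumption for the example; it is not the code from the ISL book.

```r
# One-time install, left commented out because it needs network access.
# rpart is assumed here only as an example of a tree-model package.
# install.packages("rpart")
library(rpart)

# Look up the syntax before fitting: ?rpart or help(package = "rpart")

# Fit a classification tree on the built-in iris data.
tree.fit <- rpart(Species ~ ., data = iris, method = "class")

# Predict classes and tabulate them against the observed species.
tree.pred <- predict(tree.fit, newdata = iris, type = "class")
print(table(predicted = tree.pred, observed = iris$Species))
```

If these steps feel routine, you are at the level the course assumes.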



Prerequisites

The formal prerequisites are the following courses: PREDICT 420 and PREDICT 422.



The informal prerequisites are given by the following guidelines:

- In addition to these prerequisite courses each student needs to have a high level of maturity. PREDICT 454 is an unstructured course where students are expected to be capable of working independently on unstructured assignments with little direction from the instructor. Students are expected to be capable of maintaining appropriate student behavior while working in this unstructured environment.

- PREDICT 454 is not a good elective option for all students. Some students simply cannot handle working in the unstructured framework. As a student you need to make a decision to either have a good attitude and take advantage of the learning experience or not be a PREDICT 454 student.

- It is recommended that students take PREDICT 454 as a single course. The unstructured learning environment, the workload, and the coordination of team projects require a lot of time. If students must take a second course, then that second course should not be the capstone course.

- Students are expected to have a basic understanding of R. This basic understanding of R should include knowing how to: (1) read a data file into a data frame, (2) write a for loop, (3) write an R function, (4) make a basic graphical object, (5) fit an OLS regression model, and (6) install R packages. Students with no R experience should acquire basic R proficiency before taking PREDICT 454.

Students who do not meet the informal prerequisite guidelines will have more difficulty with the course than those who do meet the informal prerequisite guidelines.
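The six R basics named in the informal prerequisites can be sketched in a few lines. This is only an illustration under my own assumptions: the data file, the column names x and y, and the helper function center() are made up for the example.

```r
# (1) Read a data file into a data frame. A temporary CSV is written first
#     so that the example is self-contained; the file and columns are made up.
csv.path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:10, y = 2 * (1:10) + 1), csv.path, row.names = FALSE)
my.data <- read.csv(csv.path)

# (2) Write a for loop: accumulate the sum of x.
total <- 0
for (i in seq_len(nrow(my.data))) {
  total <- total + my.data$x[i]
}

# (3) Write an R function (center() is a hypothetical helper).
center <- function(v) {
  v - mean(v)
}
my.data$x.centered <- center(my.data$x)

# (4) Make a basic graphical object.
plot(my.data$x, my.data$y, xlab = "x", ylab = "y", main = "y versus x")

# (5) Fit an OLS regression model.
fit <- lm(y ~ x, data = my.data)
print(coef(fit))

# (6) Install and load an R package (commented out; it needs network access).
# install.packages("rpart")
# library(rpart)
```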



Quick R Knowledge Check

The Change Machine

Write an R function named change.machine() that takes an input value between 0 and 100 and returns an R list object of the number of quarters, dimes, nickels, and pennies required to provide the correct change. If the user inputs a number not in (0,100), then the function should print an error message to the terminal: 'The input value must be between 0 and 100.'
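For reference, one possible solution is sketched below. It is not the only acceptable answer, and it rests on assumptions the problem statement leaves open: the input is a whole number of cents, the endpoints 0 and 100 are accepted, and the fewest coins are wanted (largest coin first).

```r
change.machine <- function(cents) {
  # Assumption: cents is a single whole number, and 0 and 100 are allowed.
  valid <- is.numeric(cents) && length(cents) == 1 && !is.na(cents) &&
    cents >= 0 && cents <= 100 && cents == floor(cents)
  if (!valid) {
    cat("The input value must be between 0 and 100.\n")
    return(invisible(NULL))
  }
  # Work from the largest coin down, carrying the remainder forward.
  quarters <- cents %/% 25
  remainder <- cents %% 25
  dimes <- remainder %/% 10
  remainder <- remainder %% 10
  nickels <- remainder %/% 5
  pennies <- remainder %% 5
  list(quarters = quarters, dimes = dimes, nickels = nickels, pennies = pennies)
}

# Example calls: a valid input, then an invalid one that prints the message.
str(change.machine(68))
change.machine(150)
```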

A Customized Summary Function

Write an R function named my.summary() that will take a data frame of numeric variables and compute summary statistics for the variables. The summary statistics should be the quantiles (0.01,0.05,0.25,0.5,0.75,0.95,0.99), the mean, the variance, the min, and the max. The summary statistics should be output in an R data frame structure.

The change machine problem is a common programming problem used in many introductory programming courses. Writing a function such as my.summary() is an example of a common programming problem given in an R programming course. You may not know how to do these problems off the top of your head, but given an hour and an R book, you should be able to write these functions. If you cannot, then you would be well served to spend some more time on R fundamentals before you take PREDICT 454.
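One possible solution is sketched below. The output layout (one row per variable, one column per statistic), the column names, the use of R's default quantile definition, and the dropping of missing values are my assumptions; the problem statement does not fix them.

```r
my.summary <- function(df) {
  # Assumption: every column of df is numeric; stop early otherwise.
  stopifnot(is.data.frame(df), all(vapply(df, is.numeric, logical(1))))
  probs <- c(0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99)

  # Summary statistics for a single numeric vector, ignoring missing values.
  one.variable <- function(v) {
    q <- quantile(v, probs = probs, na.rm = TRUE, names = FALSE)
    names(q) <- paste0("q", probs)
    c(q, mean = mean(v, na.rm = TRUE), variance = var(v, na.rm = TRUE),
      min = min(v, na.rm = TRUE), max = max(v, na.rm = TRUE))
  }

  # vapply returns statistics in rows, so transpose to one row per variable.
  stats <- t(vapply(df, one.variable, numeric(length(probs) + 4)))
  result <- data.frame(variable = rownames(stats), stats,
                       check.names = FALSE, stringsAsFactors = FALSE)
  rownames(result) <- NULL
  result
}

# Example on three columns of the built-in mtcars data set.
print(my.summary(mtcars[, c("mpg", "hp", "wt")]))
```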



R Resources

If the Quick R Knowledge Check seems foreign to you, then you need to spend some time working on R before you enroll in PREDICT 454. Here are several good resources that are available to you.

[1] As an MSPA student you have access to the NU license for Lynda.com. Lynda.com has some very basic R instructional videos.

[2] The Johns Hopkins Data Science Specialization [link] has nine introductory R mini-courses.

[3] Watch videos from the authors of the ISL book. [link]