Predict 412







I am putting up this page to improve communication about PREDICT 412. There seems to be a general misunderstanding about what PREDICT 412 is and, just as importantly, what PREDICT 412 is not.



What PREDICT 412 is.

PREDICT 412 is a research-oriented course. Each student will work on a team of four or five students on a team project, which constitutes one third of the total points for the quarter. The team project can be selected from the Team Project Guide or proposed by a student. Either way, at least four students must show an interest in the topic for it to be accepted as a team project.

Many of the stored data projects are former Kaggle and KDD Cup competitions. The fact that these are the default data sets should send a clear signal about the level of the course and the level of R skill needed. You will need to be able to clean and manipulate these data sets in R.

The primary learning objective in PREDICT 412 is to teach you how to think about modeling 'larger' data sets. The readings, individual assignments, and the team project are all oriented towards this one goal. Modeling larger data sets is very different from modeling small data sets.



What PREDICT 412 is not.

PREDICT 412 is not a course where we will 'teach you R' or show you how to fit a linear regression model in R. By the time you get to PREDICT 412, it is expected that you already know how to fit a linear regression model in R. It is also expected that if you had to fit a different model in R, such as a tree model or an SVM, you would know how to install the R package, look up the syntax in the R documentation or in a book such as the ISL book, and fit the model. The ISL book has a toy example for every topic it covers. As an advanced student you should be able to implement that toy example on your own with the code provided in the book.



Prerequisites

The formal prerequisites are the following courses: CIS 317, CIS 435, PREDICT 401, PREDICT 410, and PREDICT 411.

* When PREDICT 420 comes online, it will become a prerequisite for PREDICT 412.

The informal prerequisites are given by the following guidelines:

- In addition to these prerequisite courses, each student needs to have a high level of maturity. PREDICT 412 is an unstructured course where students are expected to work independently on unstructured assignments with little direction from the instructor, and to maintain appropriate student behavior while working in this unstructured environment.

- PREDICT 412 is not a good elective option for all students. Some students simply cannot handle working in the unstructured framework. As a student you need to make a decision to either have a good attitude and take advantage of the learning experience or not be a PREDICT 412 student.

- It is recommended that students take PREDICT 412 as a single course. The unstructured learning environment, the workload, and the coordination of team projects require a lot of time. If students must take a second course, then that second course should not be the capstone course.

- Students are expected to have a basic understanding of R. This basic understanding of R should include knowing how to: (1) read a data file into a data frame, (2) write a for loop, (3) write an R function, (4) make a basic graphical object, (5) fit an OLS regression model, and (6) install R libraries. Students with no R experience should acquire basic R proficiency before taking PREDICT 412.
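As a rough illustration of the six R skills listed in the last guideline, here is a minimal sketch. It is not course material: it assumes the built-in `mtcars` data set stands in for a real data file, and the names `csv.path`, `cars`, `col.means`, `standardize`, and `fit` are hypothetical, chosen only for this example.

```r
# (1) Read a data file into a data frame.
# Assumption: mtcars is written to a temporary CSV so there is a file to read.
csv.path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv.path, row.names = FALSE)
cars <- read.csv(csv.path)

# (2) Write a for loop: compute the mean of every column.
col.means <- numeric(ncol(cars))
for (j in seq_along(cars)) {
  col.means[j] <- mean(cars[[j]])
}
names(col.means) <- names(cars)

# (3) Write an R function: center and scale a numeric vector.
standardize <- function(x) {
  (x - mean(x)) / sd(x)
}

# (4) Make a basic graphical object: a scatterplot sent to a temporary PDF.
pdf(tempfile(fileext = ".pdf"))
plot(cars$wt, cars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
invisible(dev.off())

# (5) Fit an OLS regression model.
fit <- lm(mpg ~ wt, data = cars)
summary(fit)

# (6) Install an R library (run once; requires an internet connection).
# install.packages("rpart")
# library(rpart)
```

If each of these steps reads as routine, the R portion of the informal prerequisites is probably covered.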

Students who do not meet the informal prerequisite guidelines will have more difficulty with the course than those who do meet the informal prerequisite guidelines.



Quick R Knowledge Check

The Change Machine

Write an R function named change.machine() that takes an input value between 0 and 100 and returns an R list object of the number of quarters, dimes, nickels, and pennies required to provide the correct change. If the user inputs a number not in (0,100), then the function should print an error message to the terminal: 'The input value must be between 0 and 100.'
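If you want something to compare your own attempt against, here is one possible sketch, not an official solution. It assumes the input is a whole number of cents to be returned as change, treats the interval (0,100) as open so that 0 and 100 are rejected, and uses a greedy largest-coin-first strategy. The argument name `cents` and the helper vector `coin.values` are hypothetical.

```r
change.machine <- function(cents) {
  # Assumption: 'cents' is a single whole number, the amount of change to return.
  valid <- is.numeric(cents) && length(cents) == 1 && !is.na(cents) &&
    cents > 0 && cents < 100
  if (!valid) {
    # Print the required message to the terminal and return nothing useful.
    cat("The input value must be between 0 and 100.\n")
    return(invisible(NULL))
  }

  # Coin values in cents, ordered from largest to smallest.
  coin.values <- c(quarters = 25, dimes = 10, nickels = 5, pennies = 1)

  # Greedy step: take as many of each coin as fits, then carry the remainder.
  change <- list()
  remaining <- cents
  for (coin in names(coin.values)) {
    change[[coin]] <- remaining %/% coin.values[[coin]]
    remaining <- remaining %% coin.values[[coin]]
  }
  change
}
```

For example, change.machine(68) returns a list with 2 quarters, 1 dime, 1 nickel, and 3 pennies, while change.machine(150) prints the error message.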

A Customized Summary Function

Write an R function named my.summary() that will take a data frame of numeric variables and compute summary statistics for the variables. The summary statistics should be the quantiles (0.01,0.05,0.25,0.5,0.75,0.95,0.99), the mean, the variance, the min, and the max. The summary statistics should be output in an R data frame structure.
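Again only as a sketch to check your own work against, and under a few assumptions: missing values are dropped before computing each statistic, the output has one row per variable and one column per statistic, and the column names (`p01`, `p05`, and so on) and the helper `one.variable` are hypothetical choices rather than anything the exercise requires.

```r
my.summary <- function(df) {
  # Assumption: the input is a non-empty data frame with only numeric columns.
  stopifnot(is.data.frame(df), ncol(df) >= 1,
            all(vapply(df, is.numeric, logical(1))))

  probs <- c(0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99)

  # Compute all eleven statistics for a single numeric vector.
  one.variable <- function(x) {
    c(quantile(x, probs = probs, na.rm = TRUE, names = FALSE),
      mean(x, na.rm = TRUE),
      var(x, na.rm = TRUE),
      min(x, na.rm = TRUE),
      max(x, na.rm = TRUE))
  }

  # sapply() gives one column per variable; transpose to one row per variable.
  stats <- t(sapply(df, one.variable))
  colnames(stats) <- c("p01", "p05", "p25", "p50", "p75", "p95", "p99",
                       "mean", "variance", "min", "max")
  as.data.frame(stats)
}
```

Calling my.summary(mtcars[, c("mpg", "hp")]) on the built-in mtcars data returns a data frame with two rows and eleven columns.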

The change machine problem is a common programming problem used in many introductory programming courses. Writing a function such as my.summary() is an example of a common programming problem given in an R programming course. You may not know how to do these problems off the top of your head, but given an hour and an R book, you should be able to write these functions. If you cannot, then you would be well served to spend some more time on R fundamentals before you take PREDICT 412.



R Resources

If the Quick R Knowledge Check seems foreign to you, then you need to spend some time working on R before you enroll in PREDICT 412. Here are several good, openly available resources.

[1] As an MSPA student, you have access to the NU license for Lynda.com. Lynda.com has some very basic R instructional videos.

[2] The Johns Hopkins Data Science Specialization [link] has nine introductory R mini-courses.

[3] Watch videos from the authors of the ISL book. [link]