Short Courses




Each short course runs from 9:00 AM to 4:30 PM on Wednesday.
Please see the registration page for information on pricing.



John Elder, "Tools for Discovering Patterns in Data"
James Thompson, "Simulation: A Modeler's Approach"


John Elder, Chief Scientist, Quantitative Solutions
elder@charlottesville.net
"Tools for Discovering Patterns in Data"
Wednesday, May 14, 1997  9:00-4:30 (break 12:00-1:30)

* Course Description
Find the useful information hidden in your data!  This course surveys the
leading computer-intensive methods for data analysis and inductive
modeling, drawn from Statistics, Machine Learning, and Data Mining.  Dr.
Elder will describe the key inner workings of various algorithms, compare
their merits, and (briefly) demonstrate their relative effectiveness on
practical applications.  We'll first review classical statistical
techniques, both linear and nonparametric, then outline the ways in which
these basic tools are modified and combined into more modern methods.  The
course pays particular attention to four powerful approaches: neural
networks, polynomial networks, kernels, and decision trees.  It uses actual
scientific and business problems to demonstrate useful accompanying
techniques (such as scientific visualization, resampling, and bundling)
employed by experienced analysts.

* Handouts
Comprehensive notes and the recent book chapter, "A Statistical Perspective
on Knowledge Discovery in Databases", by Elder & Pregibon.

* Instructor
John Elder is Chief Scientist of Quantitative Solutions, a Data Mining
research firm in Charlottesville, Virginia, and an Adjunct Professor at the
University of Virginia.  He has over a decade of experience developing and
applying adaptive, data-driven techniques to practical problems.  He has
been a researcher at Rice University and Director of Research at both an
engineering consulting firm and an investment management company.  Dr.
Elder has authored four book chapters and numerous articles on pattern
discovery, and is the technical chair of the Adaptive and Learning Systems
Group of the IEEE Systems, Man, and Cybernetics Society.

* Who Should Attend?
Those from industry and academia who work with data and wish to understand
recent developments in pattern discovery, data mining, and inductive
modeling.  At the conclusion of this course, one should be able to discern
the basic strengths of competing methods and select the appropriate tools
for one's applications.  Participants should have prior working experience
with computers and knowledge of, or interest in, applied statistical
techniques.

* Course Outline
     *Pattern Discovery: An Overview
          *Inducing Models from Data: Benefits and Dangers
          *The Data Mining Process
     *Classical Statistical Techniques
          *Regression
          *Discriminant Analysis
          *Nonparametric:
               *Scatterplot Smoothers
               *Nearest Neighbors
               *Kernels
     *Modern Methods
          *Neural Networks
          *Polynomial Networks
          *Decision Trees
     *Key General Tools:
          *Scientific Visualization
          *Resampling
          *Optimization
     *Data Issues
          *Case Diagnostics (Outlying, Influential, Leverage, & Missing points)
          *Feature Creation and Selection
     *(Brief) Outline of Other Methods
          *Projection Pursuit
          *ASH (Averaged Shifted Histograms)
          *MARS (Multivariate Adaptive Regression Splines)
          *RBF (Radial Basis Functions)
     *Comparing and Combining Methods
          *Matching an algorithm to your application
          *Bundling & Fusing models


* A note about the course scope:  Each of the major topics discussed could
clearly comprise a semester-long course if presented in full detail!  What
this (admittedly intensive) short course provides, however, is a broad
overview of the highlights, drawing connections between major developments
in the diverse fields that contribute to the emerging discipline of Data
Mining.  Previous participants have found this "big picture" to be
particularly useful for identifying avenues worthy of further exploration,
whether for research or practical problem-solving.



James Thompson, Dept. of Statistics, Rice University
thomp@stat.rice.edu
"Simulation: A Modeler's Approach"

Thompson, the author or co-author of seven books, is a Fellow of the
ASA and the IMS and an elected Member of the ISI. He has received
the ASA's Owen Award and the US Army's Wilks Medal for his work in
applied statistics.

Topics include:

1. Does modern computing provide model freedom?
2. SIMEST, a paradigm for parameter estimation in complex models.
3. SIMDAT, an algorithm for smooth resampling.
4. Epidemics, simulating disasters before they occur.
5. Formulating simulation-based strategies for options trading.
6. Who needs the Gibbs Sampler?
7. Does noise prevent chaos?

This page last maintained on 3/3/97

Contact Interface '97 via interface97@stat.rice.edu