Sponsoring Section/Society: ASA-SPES
Session Slot: Sunday, 4:00 - 5:50
Estimated Audience Size: 100-150
AudioVisual Request: Two Overheads
Session Title: Data Mining
Theme Session: Yes
Applied Session: Yes
Session Organizer: Edelstein, Herbert A. Two Crows Corporation
Address: 10500 Falls Road Potomac, MD 20854
Phone: (301) 983-3555
Fax: (301) 983-2554
Email: herb@twocrows.com
Session Timing: 110 minutes total:
Opening Remarks by Chair - 5 minutes
First Speaker - 35 minutes
Second Speaker - 35 minutes
Discussant - 15 minutes (or 10 or 20)
Floor Discussion - 10 minutes (or 5 or 15)
Session Chair: Small, Robert D. Two Crows Corporation and Duke Clinical Research Institute
Address: 2024 West Main Street Durham, NC 27705
Phone: (919) 286-8917
Fax: (919) 286-0570
Email: bob@twocrows.com
1. New Insights from Applying Data Mining to Urology
Tigrani, Vida S., UCSF School of Medicine
Address: UCSF School of Medicine Department of Urology Mountain View, CA
Phone:
Fax:
Email: vtigran@itsa.ucsf.edu
John, George H., Epiphany Marketing Software
Abstract: Data mining is an umbrella term for the process of discovering patterns in data, typically with the aid of algorithms that automate part of the search. Its methods come from disciplines such as statistics, artificial intelligence, visualization, and pattern recognition. Researchers analyzing medical data are typically well versed in basic statistics, primarily experimental design and hypothesis testing, but have not been exposed to data mining. This paper begins by surveying the medical data analysis literature and describing the statistical methods researchers commonly employ. We then describe some examples of data mining tools that would be useful in analyzing outcomes of medical procedures. Using a case study from urology, we show how standard statistics are applied and what additional insight can be gained from the data mining tools. We conclude that there is ample evidence that data mining methods can provide better out-of-sample results, better insights, or perhaps the same insights and results obtained more quickly, than basic statistical methods. Data mining methods should therefore become part of the standard medical data analysis toolbox.
2. Dynamic Similarity: Mining Collections of Trajectories
Grossman, Robert, Center for Data Mining, University of Illinois at Chicago
Address:
Phone:
Fax:
Email: Grossman@uic.edu
Abstract: An important challenge in data mining is to extend some of the standard data mining algorithms from static data to time-varying data. Phrased slightly differently, the challenge is to extend data mining algorithms from data sets consisting of vectors to data sets consisting of time-varying paths or trajectories.
In this talk we survey some of the results in this area related to the similarity problem (find all trajectories which are similar to a fixed trajectory) and the classification problem (classify this trajectory).
We focus in part on time-varying data that arises by sampling trajectories from an underlying family of parameterized dynamical systems. This is the dynamic similarity problem, and it occurs commonly in practice. For example, there are important applications to robotics and aeronautics. We describe a new algorithm to attack this problem and provide experimental data from these domains to demonstrate its scalability on collections ranging in size from 1 GB to 100 GB.
Discussant: Edelstein, Herbert A. Two Crows Corporation
Address: 10500 Falls Road Potomac, MD 20854
Phone: (301) 983-3555
Fax: (301) 983-2554
Email: herb@twocrows.com
List of speakers who are nonmembers: None