
asa.stat.comp.01

Sponsoring Section/Society: ASA-COMP

Session Slot: 2:00- 3:50 Sunday

Estimated Audience Size: 50-75

AudioVisual Request: None

Session Title: Data Mining

Theme Session: No

Applied Session: Yes

Session Organizer: Sun, Don X., Bell Labs, Lucent Technologies

Address: Bell Labs, Lucent Technologies, Room 2C-278, Murray Hill, NJ 07974-0636

Phone: 908-582-2149

Fax: 908-582-3340

Email: dxsun@research.bell-labs.com

Session Timing: 110 minutes total

First Speaker - 40 minutes

Second Speaker - 30 minutes

Third Speaker - 30 minutes

Floor Discussion - 10 minutes

Session Chair: Sun, Don X., Bell Labs, Lucent Technologies

Address: Bell Labs, Lucent Technologies, Room 2C-278, Murray Hill, NJ 07974-0636

Phone: 908-582-2149

Fax: 908-582-3340

Email: dxsun@research.bell-labs.com

1. Mining Large Databases in Telecommunications

Buja, Andreas, AT&T Labs - Research

Address: AT&T Labs - Research, Room C209, 180 Park Avenue, P.O. Box 971, Florham Park, NJ 07932-0971, USA

Phone: 973-360-8438

Fax: 973-360-8178

Email: andreas@research.att.com

Abstract: AT&T Labs has built up a datastore of roughly a terabyte of phone call records. With data of this size, the major intellectual effort shifts from modeling to data handling. The reasons are twofold: traditional statistical systems are not capable of handling data of this size, and traditional databases are useless because every analysis step involves a pass over a sizable fraction of the data. To address the demands posed by our datastore, we present a philosophy of small tools that preserves intimacy between users and data, in contrast to the philosophy of insulation and abstraction that underlies traditional databases.
Although a large fraction of our effort is spent handling data and developing tools for handling data, large data also pose interesting statistical problems. For example, the substantive problems (marketing, in this case) often do not call for a model of all of the data; it is sufficient to find and characterize relatively small subsets of interest. This is where data mining is more than a subarea of statistics: in statistics, we are trained to model all of the data at hand, while in data "mining" we try to find worthwhile subsets of the data and leave the rest alone.

2. Data Mining with Extended Symbolic Models

Apte, Chidanand, IBM T. J. Watson Research Center

Address: Data Abstraction Research, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598

Phone: 914-945-1024

Fax:

Email: apte@watson.ibm.com

Abstract: Symbolic modeling of data with decision trees and decision rules has a certain appeal to data mining application developers. The computational efficiency of the modeling methodology and the built-in explanatory nature of the resulting models are two often-cited reasons for preferring these methods. Traditionally, the applications of these methods have been restricted to classification modeling. Recent extensions employing ideas from statistics and machine learning have produced more general frameworks that retain these underlying characteristics but apply to a much wider class of applications. These extended symbolic modeling methodologies open exciting new application avenues, including regression, probabilistic modeling, non-myopic feature analysis, and the integration of data mining into knowledge-based frameworks. A synopsis of work in this area by the Data Abstraction Research group at IBM's T. J. Watson Research Center will be presented.

3. Tips for Data Mining Practitioners

Chu, Robert, SAS Institute Inc.

Address: SAS Institute Inc., Cary, NC 27513

Phone: 919-677-8000

Fax: 919-677-4444

Email: sasrcc@wnt.sas.com

Tideman, Susan, SAS Institute Inc.

Abstract: To get data mining results that improve your enterprise's "bottom line," it sometimes takes more than clean data, good tools and algorithms, and expertise in statistics and/or computing. SAS Institute's experience with SAS® Enterprise Miner™ is reflected in a set of tips for getting useful results under various circumstances. This paper, aimed at statisticians and computing professionals, will discuss ten of these tips in detail.

List of speakers who are nonmembers: None


David Scott
6/1/1998