Stat 640 Alternate Final Dataset

Astronomers are now cataloguing all of the objects in the sky. One group (ESO) uses the telescope on La Silla, Chile, to image the so-called Chandra Deep Field South (CDFS). An example image is available at http://www.mpia.de/COMBO/combo_index.html

The dataset consists of 3,438 galaxies in the CDFS for which all measurements are available. We are interested in the speed at which these galaxies are receding from ours. The 5th variable, the mean red-shift, is commonly used as a surrogate for this speed. Variables 2-7 are all related to the red-shift, so they should not be used as predictors. (Variable 1 is just a pointer to a larger database.)

A histogram of variable 5 shows a strongly bimodal distribution, so we define Y = I( x5 > 0.45 ) as an indicator of which cluster a galaxy falls into. Can we find a good prediction algorithm for Y from variables 8-30?

Variables BjMAG to S280MAG measure absolute magnitudes of the galaxy in 10 bands. (The telescope uses 17 different filters to measure different parts of the spectrum.) The remaining variables measure the observed brightness in 13 bands in sequence from 420 nm to 915 nm (violet to near infrared).

As an alternative goal, one could also predict the red-shift itself (variable 5) directly, rather than the indicator Y.
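
The definition of Y and the choice of predictors described above can be sketched as follows. This is only an illustration under stated assumptions: it assumes each record holds the 30 variables in the order the text numbers them, the rows are synthetic placeholders (not the real catalogue), and the nearest-centroid rule is a simple stand-in baseline chosen for brevity, not a method the assignment prescribes. The names indicator, split_row, fit_centroids, predict and fake_row are hypothetical.

```python
import random
from statistics import mean

CUTOFF = 0.45  # threshold on variable 5 given in the text


def indicator(redshift, cutoff=CUTOFF):
    """Y = I(x5 > cutoff): 1 for the high-red-shift cluster, else 0."""
    return 1 if redshift > cutoff else 0


def split_row(row):
    """Split one 30-variable record into (y, predictors).

    The text numbers variables from 1, so variable 5 is row[4] and
    variables 8-30 are row[7:30]. Variables 1-7 are deliberately left out.
    """
    return indicator(row[4]), list(row[7:30])


def fit_centroids(X, y):
    """Per-class mean of each predictor (assumed baseline, not the course's method)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest in squared distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def fake_row(rng, idx):
    """Synthetic stand-in record: a bimodal 'red-shift' plus 23 noisy predictors."""
    z = rng.uniform(0.05, 0.40) if rng.random() < 0.5 else rng.uniform(0.50, 1.20)
    other = [rng.random() for _ in range(7)]  # placeholders for variables 1-7
    other[0], other[4] = idx, z               # variable 1 = pointer, variable 5 = red-shift
    predictors = [5.0 * z + rng.gauss(0.0, 0.3) for _ in range(23)]
    return other + predictors


rng = random.Random(0)
rows = [fake_row(rng, i) for i in range(300)]
pairs = [split_row(r) for r in rows]
train, test = pairs[:200], pairs[200:]

centroids = fit_centroids([x for _, x in train], [y for y, _ in train])
hits = sum(predict(centroids, x) == y for y, x in test)
accuracy = hits / len(test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

With the real data, rows would be read from the dataset file instead of being generated, and any classifier could replace the centroid rule; the held-out split is what makes the accuracy figure an honest estimate of prediction quality.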