.po 1.1i
.he '''Rice Technical Report 311-87-1'
.fo ''-%-''
.EQ
delim &&
.EN
.ce 
\fBSoftware for Univariate and Bivariate Averaged Shifted Histograms\fR\**

.ce 5
David W. Scott
Department of Statistics
Rice University
P.O. Box 1892
Houston, TX  77251-1892


\fI0.  Abstract\fR
.pp
Software for computing one and two dimensional
averaged shifted histograms (Scott, 1985,
\fIAnnals of Statistics\fR, \fB13\fR:1024-40) is given.
The algorithms are written in Fortran \fB77\fR.
Also included are function definitions and
documentation for use in the interactive \fBS\fR system
from Bell Labs.  [Revised August 26, 1991 to
provide New S interfaces.]


\fI1.  Introduction\fR
.pp
The averaged shifted histogram is a computationally effective
algorithm for computing nonparametric kernel density estimates.
The main portion of this technical report is code, but here we
indicate the structure chosen.
In general, the averaged shifted histogram operates on bin counts
rather than the data themselves.
Of course in many applications the raw data are in fact bin counts.
Thus we have separated these two functions:  first compute
the bin counts;  second compute the averaged shifted histogram.
.pp
The bins are assumed to be regular intervals or rectangles.
For plotting purposes, we note that the values returned are
at the \fIcenter\fR of the bins and should be linearly
interpolated.
The routines have been designed so that the output can be
plotted directly using the \fBS\fR functions
\fBplot\fR and \fBpersp\fR in the univariate and bivariate
cases, respectively.
Notice that in the bivariate case, if the matrix fxy contains the
density estimates, and if x(i) is the center of i-th bin along
the x-axis and y(j) is the center of the j-th bin along the y-axis,
then the value of the averaged shifted histogram at the
bin center (x(i),y(j)) is z(i,j).
This follows the usual \fBS\fR convention.
.pp
Both algorithms use the quartic or biweight kernel
&K(x) ~=~ 15(1-x sup 2 ) sup 2 /16&
with a product of these kernels used in the bivariate case.
The original averaged shifted histogram used a triangle kernel.
[The parameter kopt defaults to the biweight kernel but
provides the option for other kernels in the polynomial class.]

\fI2.  Code and listings\fR
.pp
There are two binning and two averaging routines, one each
for the univariate and bivariate case.
Within the double lines are three routines for each function.
First is a documentation file for \fBS\fR.
For those not using \fBS\fR, note that the arguments are
described in order and that an example of correct usage
is also given.
Second is a function definition for the algorithm using
the \fBS\fR format.
One of the advantages of using \fBS\fR rather than
Fortran directly is that only the input arguments
need to be specified since the output
arguments are returned in a structure as defined in this second
routine.  Also work space allocation can be handled
dynamically in \fBS\fR.
Fortran programmers will have to provide this work space
in their calling program;  it will also be necessary to pass
arguments of actual array dimensions in the main program.
The third routine is the actual fortran listing of the algorithm.
I have tried to be efficient where it matters,
relying upon a fairly friendly optimizing compiler to avoid
recomputation of certain values inside loops.
This could be handled explicitly if desired.
The algorithms are fairly well documented and straight forward,
although it may be helpful to glance at the second \fBS\fR
function definition if additional information is needed.
The twelve listings follow.
Any comments or suggestions would be welcomed.
.(f
\**This research was partially supported by ONR contract N00014-85-K-0100
and ARO contract DAAG-29-85-K0212.
The author's electronic address is scottdw@rice.edu.
.)f
.bp
.po .5i
.ll 7.0i
.ps 6
.2c
.nf
There are 3 sections in the following:
	1.  The New S function definitions.
	2.  The Fortran subroutine code.
	3.  Help files.

# 5-8-91   source this file

# ash.s----------------------------------------------------------------------

bin1 <- function(x,ab=nicerange(x),nbin=50) {
  n <- length(x)
  if(ab[1]>=ab[2])
    fatal("Interval vector has negative orientation")
  if(nbin<=0)
    fatal("Number of bin intervals nonpositive")
  r<-.Fortran("bin1",
           as.single(x),
           as.integer(n),
           as.single(ab),
           as.integer(nbin),
           nc=integer(nbin),
           nskip=integer(1))
  list(nc=r$nc,ab=ab,nskip=r$nskip)
}

ash1 <- function(bins,m=5,kopt=c(2,2)){
  nc <- bins$nc
  ab <- bins$ab
  nbin <- length(nc)
  r <- .Fortran("ash1",
                as.integer(m),
                as.integer(nc),
                as.integer(nbin),
                as.single(ab),
		as.integer(kopt),
                t=single(nbin),
                f=single(nbin),
                single(m),
		ier=integer(1))
  if(r$ier==1) print("ash estimate nonzero outside interval ab")
  list(x=r$t,y=r$f,m=m,ab=ab,kopt=kopt,ier=r$ier)
}

nicerange <- function(x, beta = .1) {
	ab <- range(x)	# interval surrounding data
	del <- ((ab[2] - ab[1]) * beta)/2
	return(c(ab + c( - del, del)))
}

bin2 <- function(x,ab,nbin=c(20,20)){
  if(missing(ab)){
    ab <- t(array(c(nicerange(x[,1]),nicerange(x[,2])),c(2,2)))
  }
  n <- nrow(x)
  r <- .Fortran("bin2",
           as.single(x),
           as.integer(n),
           as.single(ab),
           as.integer(nbin),
           nc=integer(nbin[1]*nbin[2]),
           nskip=integer(1))
  list(nc=matrix(r$nc,nbin[1],nbin[2]),ab=ab,nskip=r$nskip)
}

ash2 <- function(bins,m=c(5,5),kopt=c(2,2)){
  nc <- bins$nc;  if(!is.matrix(nc)) stop("bins does not contain bin count matrix")
  ab <- bins$ab;  if(!is.matrix(ab)) stop("ab not a matrix - should be 2 by 2")
  nbin <- dim(nc)
  r <- .Fortran("ash2",
                as.integer(m),
                as.integer(nc),
                as.integer(nbin[1]),
                as.integer(nbin[2]),
                as.single(ab),
		as.integer(kopt),
                f=single(nbin[1]*nbin[2]),
                single(m[1]*m[2]),
		ier=integer(1))
  if(r$ier==1) print(" estimate nonzero outside ab rectangle")
  list(z=matrix(r$f,nbin[1],nbin[2]),
	x=center(ab[1,],nbin[1])[[1]],y=center(ab[2,],nbin[2])[[1]],
	ab=ab,m=m,kopt=kopt,ier=r$ier)
}

center <- function(ab,k) {
	h <- diff(ab)/k
	list( seq(ab[1]+h/2,by=h,length=k) ) }


c ash.f  Contains all the fortran routines
c ash.s  Contains the calling functions.
c bin1   Help for bin1
c ash1   Help for ash1
c bin2   Help for bin2
c ash2   Help for ash2

c To install this in a local S. 

c 1) get the all the files in some directory.
c 2) mkdir .Data 
c 3) mkdir .Data/.Help
c 4) cp bin1 ash1 bin2 ash2 .Data/.Help
c 5) f77 -c -o ash.o ash.f
c 6) S LOAD ash.o; S < ash.s   OR
c 7) dyn.load2("ash.o",syslibs="-lF77 -lm -lc"); source("ash.s")


c ash.f----------------------------------------------------------------------
c       April 8, 1986
c
c       Find bin counts of data array "x(n)" for ASH estimator
c
c       "nbin" bins are formed over the interval [a,b)
c
c       bin counts returned in array "nc"  -  # pts outside [a,b) = "nskip"

c ##### Copyright 1986 David W. Scott

        subroutine bin1 ( x , n , ab , nbin , nc , nskip )
        dimension x(n) , nc(nbin) , ab(2)

        nskip = 0
        a = ab(1)
        b = ab(2)

        do 5 i = 1,nbin
                nc(i) = 0
5       continue

        d = (b-a) / nbin

        do 10 i = 1,n
                k = (x(i)-a) / d + 1.0
                if (k.ge.1 .and. k.le.nbin) then
                        nc(k) = nc(k) + 1
                else
                        nskip = nskip + 1
                end if
10      continue

        return
        end


c       April 8, 1986
c
c       Computer ASH density estimate;  Quartic (biweight) kernel
c
c       Average of "m" shifted histograms
c       
c       Bin counts in array "nc(nbin)"  -  from routine "bin1"
c
c       "nbin" bins are formed over the interval [a,b)
c
c       ASH estimates returned in array "f(nbin)"
c
c       FP-ASH plotted at  a+d/2 ... b-d/2   where d = (b-a)/nbin
c
c       Note:  If "nskip" was nonzero, ASH estimates incorrect near boundary
c       Note:  Should leave "m" empty bins on each end of array "nc" so f OK

c ##### Copyright 1986 David W. Scott

        subroutine ash1 ( m, nc, nbin, ab, kopt, t, f, w, ier )
        dimension nc(nbin), ab(2), t(nbin), f(nbin), w(m), kopt(2)

        ier = 0
        a = ab(1)
        b = ab(2)
        n = 0

c-compute weights    cons * ( 1-abs((i/m))^kopt1)^kopt2
c             --  should sum to "m"   5-8-91
c                       w-array shifted by 1

        mm1 = m-1
        xm = m
c                     cons = sum of weights from -(m-1) to (m-1) = 1 + 2 (sum from 1 to m-1)

        w(1) = 1.0
        cons = 1.0
        do 5 i = 1,mm1
                w(i+1) = ( 1.0 - abs(i/xm)**kopt(1) )**kopt(2)
                cons = cons + 2*w(i+1)
5       continue
        cons = float(m)/cons
        do 6 i=1,m
           w(i) = cons*w(i)
6      continue

c-check if estimate extends beyond mesh

        do 7 i = 1,mm1
                if( nc(i)+nc(nbin+1-i) .gt. 0) ier = 1
7       continue

c-compute ash(m) estimate

        delta = (b-a) / nbin
        h = m*delta

        do 10 i = 1,nbin
                t(i) = a + (i-0.5)*delta
                f(i) = 0.0
                n = n + nc(i)
10      continue

        do 20 i = 1,nbin
                if (nc(i).eq.0) goto 20

                c = nc(i) / (n*h)
                do 15 k = max0(1,i-mm1) , min0(nbin,i+mm1)
                        f(k) = f(k) + c * w( iabs(k-i)+1 )
15              continue
20      continue

        return
        end


c       April 12, 1986          bin2.f
c
c       Find bin counts of data array "x(n,2)" for ASH estimator
c
c       "nbin(1)" by "nbin(2)" bins are formed
c
c       x:axis  [ ab(1,1) , ab(1,2) )
c       y:axis  [ ab(2,1) , ab(2,2) )   half-open
c
c       bin counts returned in array "nc"  -  # pts outside [a,b) = "nskip"

c ##### Copyright 1986 David W. Scott

        subroutine bin2 ( x , n , ab , nbin , nc , nskip )
        dimension x(n,2) , nbin(2) , nc(nbin(1),nbin(2)) , ab(2,2)

        nskip = 0
        ax = ab(1,1)
        bx = ab(1,2)
        ay = ab(2,1)
        by = ab(2,2)

        do 5 j = 1,nbin(2)
                do 4 i = 1,nbin(1)
                        nc(i,j) = 0
4               continue
5       continue

        dx = (bx-ax) / nbin(1)
        dy = (by-ay) / nbin(2)

        do 10 i = 1,n
                kx = (x(i,1)-ax) / dx + 1.0
                ky = (x(i,2)-ay) / dy + 1.0
                if (kx.ge.1 .and. kx.le.nbin(1) .and.
     1              ky.ge.1 .and. ky.le.nbin(2)) then
                                nc(kx,ky) = nc(kx,ky) + 1
                else
                        nskip = nskip + 1
                end if
10      continue

        return
        end


c       April 12, 1986          ash2.f
c
c       Computer ASH density estimate;  Product Quartic (biweight) kernel
c
c       Average of "m[1] by m[2]" shifted histograms
c       
c       Bin counts in matrix "nc"  -  from routine "nbin2"
c
c       ASH estimates returned in matrix "f"
c
c       FP-ASH plotted at  a+d/2 ... b-d/2   where d = (b-a)/nbin
c
c       Note:  If "nskip" was nonzero, ASH estimates incorrect near boundary
c       Note:  Should leave "m" empty bins on each end of array "nc" so f OK

c #### Copyright 1986 David W. Scott
c #### added kernel option kopt 5-8-91

        subroutine ash2 ( m, nc, nbinx, nbiny, ab, kopt, f, w, ier )
        dimension m(2) , nc(nbinx,nbiny) , ab(2,2) , f(nbinx,nbiny)
        dimension w( m(1) , m(2) ), kopt(2)

        ier = 0
        ax = ab(1,1)
        bx = ab(1,2)
        ay = ab(2,1)
        by = ab(2,2)

c-compute weights    cons * ( 1-abs(i/m)^kopt1)^kopt2
c           --  should sum to "m"   5-8-91
c                       w-array shifted by 1

        mx = m(1)
        my = m(2)
        mxm1 = mx-1
        mym1 = my-1
        xm = mx
        ym = my

c-----ASSUMES f dimensioned larger than m(1) or m(2)

c-put marginal weights in f array as work array

        f(1,1) = 1.0
        f(2,1) = 1.0
c                     consx = sum of weights from -(m-1) to (m-1) = 1 + 2 (sum from 1 to m-1)
        consx = 1.0
        consy = 1.0

        do 5 i = 1,mxm1
                f(1,i+1) = ( 1.0 - abs(i/xm)**kopt(1) )**kopt(2)
                consx = consx + 2*f(1,i+1)
5       continue
        consx = float(mx)/consx
        do 6 i = 1,mym1
                f(2,i+1) = ( 1.0 - abs(i/ym)**kopt(1) )**kopt(2)
                consy = consy + 2*f(2,i+1)
6       continue
        consy = float(my)/consy

c-computer product weight array (avoids later multiplications)

        do 3 j = 1,my
                do 2 i = 1,mx
                        w(i,j) = (consx*f(1,i)) * (consy*f(2,j))
2               continue
3       continue

c-compute ash(m) estimate

        n = 0
        do 10 j = 1,nbiny
                do 9 i = 1,nbinx
                        f(i,j) = 0.0
                        n = n + nc(i,j)
9               continue
10      continue
c-check if estimate extends beyond mesh

        ncheck = 0
        do 12 j = my , nbiny+1-my
                do 11 i = mx , nbinx+1-mx
                        ncheck = ncheck + nc(i,j)
11              continue
12      continue
        if (ncheck .ne. n) ier = 1

        dx = (bx-ax) / nbinx
        dy = (by-ay) / nbiny

        hx = mx*dx
        hy = my*dy

        do 20 j = 1,nbiny
            do 19 i = 1,nbinx

                if (nc(i,j).eq.0) goto 19

                c = nc(i,j) / (n*hx*hy)

                do 18 ky = max0(1,j-mym1) , min0(nbiny,j+mym1)
                    do 17 kx = max0(1,i-mxm1) , min0(nbinx,i+mxm1)
                        f(kx,ky) = f(kx,ky) + c *
     1                          w( iabs(kx-i)+1 , iabs(ky-j)+1 )
17                  continue
18              continue

19              continue
20      continue

        return
        end


bin1----------------------------------------------------------------------
.BG
.FN bin1
.TL
bin1: Function to compute array of bin counts for a data vector
.CS
bin1(x, ab, nbin=50)
.PP
.AG x
(input) data vector
.AG ab
(input vector of length 2):  half-open interval for bins [a,b).  If no
value is specified, the range of x is stretched by 5% at each end and
used the interval.
.AG nbin
(input integer):  number of bins desired. Default 50.
.AG opt
(input OPTIONAL logical):  to suppress output messages, set opt=TRUE
.RT
bin1 returns a list including the vector of integer bin counts and
the ab vector and the number of points outside the ab interval.
.EX
x <- rnorm(100)         # data vector
ab <- c(-5,5)           # bin interval
bins <- bin1(x,ab,10)     # bin x into 10 bins over ab
.KW density estimation
.WR
ash1----------------------------------------------------------------------

.BG
.FN ash1
.TL
ash1: Computes univariate averaged shifted histogram (polynomial kernel)
.CS
ash1(bins, m, kopt)
.PP
.AG bins
(input list) $nc=integer vector of bin counts and $ab=bin interval
.AG m
(input) optional integer smoothing parameter;  default=5.
.AG kopt
(input) vector of length 2 specifying the kernel, which
is proportional to ( 1 - abs(i/m)**kopt(1) )**kopt(2);
(2,2)=biweight (default);  (0,0)=uniform;  (1,0)=triangle;
(2,1)=Epanechnikov;  (2,3)=triweight.
.RT
returns structure suitable for input to "plot"
.RC x=t
vector of bin center locations
.RC y=f
vector of ash estimates
.RC ier
0=normal exit; 1=estimate nonzero outside interval ab
.EX
x <- rnorm(100)         # data
f <- ash1(bin1(x,nbin=50),5) # compute ash estimate
plot( f , type="l" )    # line plot of estimate
.KW density estimation
.WR

bin2----------------------------------------------------------------------

.BG
.FN bin2
.TL
bin2: Bin bivariate data x
.CS
bin2(x, ab, nbin)
.PP
.AG x
(input matrix with 2 columns) data sample
.AG ab
(input 2 x 2 matrix) rows 1 and 2 contain x and y axis bin intervals,
respectively.  If not specified, the ranges are stretched by 5%
at each end for each dimension.
.AG nbin
(input vector of length 2) number of bins along x and y axes. Default
is 20 by 20. 
.RT
bin2 returns a list including the bivariate bin matrix (see "persp"
function for orientation) and the number of points outside the ab rectangle.
.EX
x <- matrix( rnorm(200), 100 , 2)       # bivariate normal n=100
ab <- matrix( c(-5,-5,5,5), 2, 2)       # interval [-5,5) x [-5,5)
nbin <- c( 20, 20)                      # 400 bins
bins <- bin2(x, ab, nbin)               # bin counts,ab,nskip
.KW density estimation
.WR

ash2----------------------------------------------------------------------

.BG
.FN ash2
.TL
ash2: Compute bivariate ASH estimate (product polynomial kernel)
.CS
ash2(bins, m, kopt)
.PP
.AG bins
(input list) bin count matrix nc and interval matrix  ab from "bin2"
.AG m
(input integer vector of length 2) x and y direction smoothing
parameters. Default is 5 by 5. 
.RT
Matrix of ASH estimates returned.
Components x,y,z can be given to the contour function directly.
Other input variables returned in list for record keeping.
.EX
Continuing example from help("bin2")

1.  To use with "persp"

m <- c(5,5)
f <- ash2(bins,m)
persp( f$z / max(f$z) )

2.  To use with "contour" (continuing part 1)

contour( f )

# note:  it is a good idea to not plot contours at level = 0

.KW density estimation
.WR


.fi