
STAT 578 - ASSIGNMENT 3 - due date is on the course outline

 

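  1. Consider a regression model with an intercept, so that the regressors are of the form $\mathbf{x}_i' = (1, \mathbf{v}_i')$. Let $\mathbf{C} = \sum_{i=1}^{n} (\mathbf{v}_i - \bar{\mathbf{v}})(\mathbf{v}_i - \bar{\mathbf{v}})' / (n - 1)$ be the sample covariance matrix, and define the Mahalanobis distance $d_i$ by $d_i^2 = (\mathbf{v}_i - \bar{\mathbf{v}})' \mathbf{C}^{-1} (\mathbf{v}_i - \bar{\mathbf{v}})$. Show that

     $$d_i^2 = (n - 1)\left(h_{ii} - \frac{1}{n}\right),$$

     where $\mathbf{H}$ is the hat matrix formed from $\mathbf{x}_1, \ldots, \mathbf{x}_n$. Thus $h_{ii}$ also measures the distance of a regressor from the mean of the data.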
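Before attempting the proof, the identity relating the Mahalanobis distance to the diagonal of the hat matrix can be checked numerically. The following is only a sanity-check sketch on simulated data; the sample size, the number of regressors, and all variable names are assumptions made for illustration, and a numerical check is no substitute for the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 25, 3                            # hypothetical sample size and number of non-constant regressors
V = rng.normal(size=(n, q))             # rows are v_i'
X = np.column_stack([np.ones(n), V])    # rows are x_i' = (1, v_i')

# Diagonal of the hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)

# Squared Mahalanobis distances, using the sample covariance with divisor n - 1
C = np.cov(V, rowvar=False)
D = V - V.mean(axis=0)
d2 = np.einsum("ij,ij->i", D @ np.linalg.inv(C), D)

# Largest discrepancy between the two sides of the identity
max_gap = np.max(np.abs(d2 - (n - 1) * (h - 1.0 / n)))
print(max_gap)
```

The printed gap should be at the level of floating-point rounding error for any full-rank design.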

  2. Let $m$ be the median of a population with d.f. $F$, and let $\hat{m}$ be the sample median. One way to define $\hat{m}$ as a functional $T(F_n)$ of the e.d.f. $F_n$ is as $\hat{m} = F_n^{-1}(0.5)$. (Look in any mathematical statistics book to see how the inverse of a step function like $F_n$ is defined.) The population version of this is $T(F) = F^{-1}(0.5)$.
    (a) Assuming that $F$ has a density $f(x)$ with $f(m) > 0$, show that the Influence Function of $\hat{m}$ is

        $$IF(x) = \frac{1}{2 f(m)} \, \mathrm{sign}(x - m). \qquad (*)$$

        Here $\mathrm{sign}(\cdot)$ is defined in such a way as to be continuous from the left at 0.

    (b) Use (*) to exhibit the asymptotic distribution of the sample median, when sampling from a $N(\mu, \sigma^2)$ population.
    (c) Write $\hat{m}$ as an M-estimate, i.e. a solution to $\frac{1}{n}\sum_{i=1}^{n} \psi(x_i - \hat{m}) = 0$, for a particular choice of $\psi$. Show that the expression for the IF of an M-estimate, derived in class, becomes in this case

        $$IF(x) = \frac{\psi(x - m)}{E_F[\psi'(X - m)]}.$$

        This makes no sense for the $\psi$-function corresponding to the median (why not?). However, first evaluate the denominator for differentiable $\psi$-functions through an integration by parts; and then evaluate it at the $\psi$-function corresponding to the median. Verify that the result agrees with (*).
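A simulation can serve as a rough check on the asymptotic variance obtained from (*). The sketch below assumes the standard result that the asymptotic variance of an estimator equals the expected squared influence function, so that for the median it is $1 / (4 f(m)^2)$. The population parameters, sample size, and number of replications are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 2.0, 3.0          # hypothetical population parameters
n, reps = 401, 4000           # odd n, so the sample median is a single order statistic

samples = rng.normal(mu, sigma, size=(reps, n))
medians = np.median(samples, axis=1)

# Variance implied by (*): E[IF^2] = 1 / (4 f(m)^2), with f(m) = 1 / (sigma * sqrt(2 pi))
f_at_median = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
var_from_if = 1.0 / (4.0 * f_at_median**2)

# n times the Monte Carlo variance of the sample median should be close to this
var_simulated = n * medians.var(ddof=1)
ratio = var_simulated / var_from_if
print(var_from_if, var_simulated, ratio)
```

The ratio should be close to 1, up to Monte Carlo error and a small finite-sample effect.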

 

  3. Consider a straight line regression model. Suppose that one of the regressors, say $x_1$, is an outlier which is so far away from the rest of the sample that it is larger than $\bar{x}$, whereas all other $x_i$ are smaller than $\bar{x}$.
    (a) Show that the L1 line necessarily passes through $(x_1, y_1)$.
    (b) Use (a) to give an alternate proof (i.e. different from the one given in class for monotone M-estimators) that the breakdown point of the estimate is 0.
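The claim in (a) can be illustrated on a small simulated data set. The sketch below is not a proof, and it uses a brute-force search rather than a proper L1 solver: it relies on the standard fact that, for a straight line, some L1 minimizer interpolates at least two data points, so it is enough to try every pair. The data, the helper name `l1_line`, and the placement of the outlier at index 0 are all assumptions made for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: one far-out regressor value at index 0, then ten ordinary points
x = np.concatenate([[100.0], rng.uniform(0.0, 1.0, size=10)])
y = np.concatenate([[-50.0], 1.0 + 2.0 * x[1:] + rng.normal(0.0, 0.1, size=10)])
assert x[0] > x.mean() and np.all(x[1:] < x.mean())   # the condition stated in the problem

def l1_line(x, y):
    """Brute-force L1 fit: evaluate the line through every pair of points, keep the best."""
    best = None
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = np.sum(np.abs(y - intercept - slope * x))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

intercept, slope = l1_line(x, y)
residual_at_outlier = y[0] - intercept - slope * x[0]
print(intercept, slope, residual_at_outlier)
```

However wild the response at the outlying regressor value is made, the fitted line follows it, which is the mechanism behind part (b).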
 
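  4. Suppose that $\hat{\boldsymbol{\theta}}$ is an ordinary M-estimate of regression, where scale is known to equal 1 and hence is not estimated. Thus with $e_i = y_i - \mathbf{x}_i'\hat{\boldsymbol{\theta}}$ we have

     $$\sum_{i=1}^{n} \psi(e_i)\,\mathbf{x}_i = \mathbf{0}_{p \times 1}.$$

     To judge the influence of the $j$th data point we might recalculate the estimate after omitting this point from the data, thus obtaining a revised estimate $\hat{\boldsymbol{\theta}}_{(j)}$. This estimate satisfies the equation $\mathbf{F}(\hat{\boldsymbol{\theta}}_{(j)}) = \mathbf{0}_{p \times 1}$, where $\mathbf{F}(\boldsymbol{\theta}) = \sum_{i \neq j} \psi(y_i - \mathbf{x}_i'\boldsymbol{\theta})\,\mathbf{x}_i$. We can approximate $\hat{\boldsymbol{\theta}}_{(j)}$ by carrying out just one step of the Newton-Raphson method for solving $\mathbf{F}(\boldsymbol{\theta}) = \mathbf{0}$, starting with $\hat{\boldsymbol{\theta}}$.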

 

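    (a) Show that this one-step procedure results in the approximation

        $$\hat{\boldsymbol{\theta}}_{(j)} \approx \hat{\boldsymbol{\theta}} - \left[\sum_{i \neq j} \psi'(e_i)\,\mathbf{x}_i\mathbf{x}_i'\right]^{-1} \mathbf{x}_j\,\psi(e_j).$$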

 

    

 

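    (b) Show that the expression above reduces to

        $$\hat{\boldsymbol{\theta}}_{(j)} \approx \hat{\boldsymbol{\theta}} - \frac{\mathbf{M}^{-1}\mathbf{x}_j\,\psi(e_j)}{1 - \psi'(e_j)\,\mathbf{x}_j'\mathbf{M}^{-1}\mathbf{x}_j},$$

        where $\mathbf{M} = \sum_{i=1}^{n} \psi'(e_i)\,\mathbf{x}_i\mathbf{x}_i'$.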

 

                       

 

 

  5. Consider the Water Quality dataset, as described in §5 of SR&C and available on the course website. For each of the methods (i) Least Squares, (ii) Ordinary M-estimation with Huber’s $\psi_{1.5}$, (iii) GM estimation (3 step) with Huber’s $\psi_{1.5}$ and the ‘w1’ weights, (iv) GM estimation (3 step) with Huber’s $\psi_{1.5}$ and the ‘w2’ weights, (v) GM estimation (3 step) with the bisquare $\psi(\cdot\,; 4.68)$ and the ‘w1’ weights, (vi) MM-estimation, (vii) Least Squares after the removal of the Hackensack River observation:

 

    (a) Fit a regression model relating N to the 3 variables. (Do not use ‘Other’ - fit an intercept instead. Some of the R-functions cannot yet handle no-intercept models.) Present the coefficients in tabular form, so that they can be easily compared. Comment on your findings.
    (b) Plot the standardized residuals $e_i/\hat{\sigma}$ against the fitted values. Comment on your findings.
    (c) Carry out individual t-tests for the significance of the three regressors, using the MM fit and appropriate asymptotic approximations.
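For reference, the two $\psi$-functions named in the list of methods can be written out explicitly. The sketch below uses the usual textbook forms of the Huber and Tukey bisquare $\psi$-functions with the tuning constants 1.5 and 4.68 quoted above; the function names are hypothetical, and the course's own definitions may differ by a scaling convention.

```python
import numpy as np

def huber_psi(r, c=1.5):
    """Huber psi: the identity on [-c, c], constant at -c or +c outside."""
    return np.clip(r, -c, c)

def bisquare_psi(r, c=4.68):
    """Tukey bisquare psi: r * (1 - (r/c)^2)^2 on [-c, c], zero outside (redescending)."""
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) <= c, r * (1.0 - (r / c) ** 2) ** 2, 0.0)

r = np.array([-6.0, -2.0, 0.0, 1.0, 2.0, 6.0])
print(huber_psi(r))
print(bisquare_psi(r))
```

The Huber function bounds the influence of a large residual, whereas the bisquare removes it entirely once the scaled residual exceeds the tuning constant.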
 

 

  6. Develop a method of robust non-linear M-estimation. There is no one ‘right’ answer here - I just want you to do something that is sensible, and that agrees with what we have done in class in the linear case. In the model

     $$y_i = f(\mathbf{x}_i, \boldsymbol{\theta}) + \varepsilon_i$$

     propose an algorithm for determining an ordinary M-estimate of $\boldsymbol{\theta}$, i.e. a minimizer of $\sum_{i} \rho\left(\frac{y_i - f(\mathbf{x}_i, \boldsymbol{\theta})}{\sigma}\right)$ for a suitable function $\rho$. I suggest using the MAD to re-estimate scale after each of the regression-estimation steps. Show that, if your algorithm converges, then it converges to a solution of the original equations.
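One possible shape for such an algorithm is an iteratively reweighted Gauss-Newton scheme, sketched below. This is only one of many sensible choices and is not presented as the intended answer. The exponential mean function, the Huber weights with constant 1.5, the MAD normalization by 0.6745, the starting value, and every function name are assumptions made for illustration.

```python
import numpy as np

def f(x, theta):
    """Hypothetical mean function: theta[0] * exp(theta[1] * x)."""
    return theta[0] * np.exp(theta[1] * x)

def jacobian(x, theta):
    """Partial derivatives of f with respect to theta, one row per observation."""
    g = np.exp(theta[1] * x)
    return np.column_stack([g, theta[0] * x * g])

def mad_scale(r):
    """Normalized MAD: median |r - median(r)| / 0.6745."""
    return np.median(np.abs(r - np.median(r))) / 0.6745

def nonlinear_m_estimate(x, y, theta0, c=1.5, tol=1e-10, max_iter=200):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        r = y - f(x, theta)
        s = mad_scale(r)                                         # scale re-estimated at each step
        u = r / s
        w = np.minimum(1.0, c / np.maximum(np.abs(u), 1e-12))    # Huber weights psi(u) / u
        J = jacobian(x, theta)
        # Weighted Gauss-Newton step: solve (J' W J) delta = J' W r
        delta = np.linalg.solve(J.T @ (J * w[:, None]), J.T @ (w * r))
        theta = theta + delta
        if np.linalg.norm(delta) < tol:
            break
    return theta, s

rng = np.random.default_rng(4)
theta_true = np.array([2.0, -0.5])
x = np.linspace(0.0, 4.0, 40)
y = f(x, theta_true) + rng.normal(0.0, 0.05, size=x.size)
y[[5, 20, 33]] += 3.0                                            # three gross outliers

theta_hat, s_hat = nonlinear_m_estimate(x, y, theta0=[1.8, -0.4])
print(theta_hat, s_hat)
```

If the iteration stops with a zero step, then $J'Wr = 0$, and because $w_i r_i = s\,\psi(r_i/s)$ this is the estimating equation $\sum_i \psi(r_i/s)\,\partial f(\mathbf{x}_i, \boldsymbol{\theta})/\partial\boldsymbol{\theta} = \mathbf{0}$. That observation is the core of the convergence argument the problem asks for.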
  7. Suppose that one estimates a straight line for $x \in [-1, 1]$, obtaining the LS estimate $\hat{\beta}_0 + \hat{\beta}_1 x$. The design is symmetric, in that $-x$ is a design point whenever $x$ is a design point. Now suppose that the true response is quadratic: $E[Y \mid x] = \beta_0 + \beta_1 x + \beta_2 x^2$. Define the prediction bias at $x$ by $b(x) = E[\hat{\beta}_0 + \hat{\beta}_1 x] - \{\beta_0 + \beta_1 x + \beta_2 x^2\}$, and the overall bias by $B = \frac{1}{2}\int_{-1}^{1} b^2(x)\,dx$. Show that $b(x) = \beta_2(\mu_2 - x^2)$, where $\mu_2 = n^{-1}\sum_{i=1}^{n} x_i^2$, and that

     $$B = \beta_2^2\left\{\left(\mu_2 - \frac{1}{3}\right)^2 + \frac{4}{45}\right\}.$$

     Then show that the design with equally spaced design points $x_i = -a + \frac{2a(i-1)}{n-1}$, where $a = \sqrt{\frac{n-1}{n+1}}$, is a bias-minimizing design.