STAT 578: ASSIGNMENT 3 (due date is on the course outline)





where $\mathbf{H}$ is the hat matrix formed from $\mathbf{x}_1, \ldots, \mathbf{x}_n$. Thus $h_{ii}$ also measures the distance of a regressor from the mean of the data.
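The statement whose conclusion appears above is truncated in this copy. As a hedged illustration only, the sketch below checks the standard identity $h_{ii} = 1/n + d_i^2/(n-1)$, where $d_i^2 = (x_i - \bar{x})^2/s^2$ is the squared Mahalanobis distance with divisor $n-1$ in $s^2$, for a straight-line model with an intercept (an assumption; the original may treat several regressors). The data values and function names are hypothetical.

```python
def hat_diagonal(xs):
    """Diagonal of H = X (X'X)^{-1} X' for design rows x_i' = (1, x_i)."""
    n, sx, sxx = len(xs), sum(xs), sum(x * x for x in xs)
    det = n * sxx - sx * sx
    # (X'X)^{-1} = [[sxx, -sx], [-sx, n]] / det, so h_ii = (sxx - 2 sx x + n x^2) / det
    return [(sxx - 2 * sx * x + n * x * x) / det for x in xs]


def mahalanobis_sq(xs):
    """Squared Mahalanobis distance (x_i - xbar)^2 / s^2, with s^2 using divisor n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return [(x - xbar) ** 2 / s2 for x in xs]


xs = [1.0, 2.0, 2.5, 4.0, 7.0, 20.0]   # hypothetical values; 20.0 is far from the mean
n = len(xs)
for h, d2 in zip(hat_diagonal(xs), mahalanobis_sq(xs)):
    print(f"h_ii = {h:.4f}   1/n + d^2/(n-1) = {1 / n + d2 / (n - 1):.4f}")
```

The point farthest from $\bar{x}$ receives the largest leverage, and the leverages sum to the number of columns of $\mathbf{X}$ (here 2).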
Let $\theta$ be the median of a population with d.f. $F$, and let $\hat\theta$ be the sample median. One way to define $\hat\theta$ as a functional $T(F_n)$ of the e.d.f. $F_n$ is as $\hat\theta = F_n^{-1}(0.5)$. (Look in any mathematical statistics book to see how the inverse of a step function like $F_n$ is defined.) The population version of this is $T(F) = F^{-1}(0.5)$.
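(a) Assuming that $F$ has a density $f(x)$ with $f(\theta) > 0$, show that the Influence Function of $\hat\theta$ is
$$IF(x) = \frac{1}{2f(\theta)}\,\mathrm{sign}(x - \theta). \qquad (*)$$
Here $\mathrm{sign}(\cdot)$ is defined in such a way as to be continuous from the left at 0.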


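(b) The sample median is also an M-estimate of location, and for an M-estimate with score function $\psi$ the Influence Function is
$$IF(x) = \frac{\psi(x - \theta)}{E_F[\psi'(X - \theta)]}.$$
This makes no sense for the $\psi$ function corresponding to the median (why not?). However, first evaluate the denominator for differentiable $\psi$ functions through an integration by parts, and then evaluate it at the $\psi$ function corresponding to the median. Verify that the result agrees with (*).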
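A numerical sanity check of the Influence Function (*) of the sample median, assuming $F$ is standard normal (so $\theta = 0$). It compares the sensitivity curve of the median, computed on a deterministic pseudo-sample of normal quantiles, with $\mathrm{sign}(x)/(2f(0))$, and shows that the denominator $E_F[\psi'(X)]$ for the piecewise-linear stand-in $\psi_c(x) = \max(-1, \min(1, x/c))$, as well as $-\int \mathrm{sign}(x) f'(x)\,dx$, approaches $2f(0)$. This is an illustration under the normal assumption, not a proof; all function names are hypothetical.

```python
from statistics import NormalDist, median

Z = NormalDist()   # assumed population F = N(0, 1): theta = 0, f(theta) = Z.pdf(0)


def influence_median(x, theta=0.0):
    """Formula (*): sign(x - theta) / (2 f(theta)), with sign(0) = -1 (left-continuous)."""
    sign = 1.0 if x > theta else -1.0
    return sign / (2 * Z.pdf(theta))


def sensitivity_curve(x, n=1001):
    """(n + 1) * [median(sample + [x]) - median(sample)] for a deterministic
    pseudo-sample of N(0, 1) quantiles at i / (n + 1), i = 1, ..., n."""
    sample = [Z.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    return (n + 1) * (median(sample + [x]) - median(sample))


def denominator(c):
    """E_F[psi_c'(X)] for psi_c(x) = clip(x / c, -1, 1): psi_c' = 1/c on (-c, c),
    so the expectation is P(|X| < c) / c."""
    return (Z.cdf(c) - Z.cdf(-c)) / c


def by_parts_at_sign(lo=-10.0, hi=10.0, m=100_000):
    """-integral sign(x) f'(x) dx by the midpoint rule; for N(0, 1), f'(x) = -x f(x),
    so the integrand is |x| f(x)."""
    h = (hi - lo) / m
    return h * sum(abs(x) * Z.pdf(x) for x in (lo + (k + 0.5) * h for k in range(m)))


for x in (-5.0, 5.0):
    print(x, sensitivity_curve(x), influence_median(x))
for c in (1.0, 0.1, 1e-4):
    print(c, denominator(c), 2 * Z.pdf(0.0))
print(by_parts_at_sign(), 2 * Z.pdf(0.0))
```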
Consider a straight line regression model. Suppose that one of the regressors, say $x_1$, is an outlier which is so far away from the rest of the sample that it is larger than $\bar{x}$, whereas all other $x_i$ are smaller than $\bar{x}$.

(a) Show that the $L_1$ line necessarily passes through $(x_1, y_1)$.
(b) Use (a) to give an alternate proof (i.e. different from the one given in class for monotone M-estimators) that the breakdown point of the $L_1$ estimate is 0.
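The claim that the $L_1$ line passes through $(x_1, y_1)$, and the breakdown it implies, can be seen on a toy example. The sketch below uses hypothetical data satisfying the stated condition ($x_1 = 20$, $\bar{x} = 35/6$, all other $x_i < \bar{x}$) and finds an $L_1$ line by brute force over pairs of points, which works because some $L_1$ minimizer interpolates two observations. As $y_1$ is dragged away, the fitted line follows it, so the slope can be made arbitrarily large.

```python
from itertools import combinations


def l1_line(xs, ys):
    """Least absolute deviations line by brute force over pairs of data points.
    Some L1 minimizer interpolates two observations (a vertex of the underlying
    linear program), so searching all pairs finds a minimizer."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        icept = ys[i] - slope * xs[i]
        loss = sum(abs(y - icept - slope * x) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, icept, slope)
    return best[1], best[2]


# Hypothetical data: x_1 = 20 exceeds xbar = 35/6; every other x_i is below it.
xs = [20.0, 1.0, 2.0, 3.0, 4.0, 5.0]
base = [1.1, 1.9, 3.2, 3.9, 5.1]
for y1 in (0.0, 1e2, 1e6):   # drag y_1 further and further away
    a, b = l1_line(xs, [y1] + base)
    print(f"y1 = {y1:>9}: line = {a:.3f} + {b:.3f} x, residual at x_1 = {y1 - a - b * 20.0:.2e}")
```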
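Suppose that $\hat{\boldsymbol\theta}$ is an ordinary M-estimate of regression, where scale is known to equal 1 and hence is not estimated. Thus with $r_i = y_i - \mathbf{x}_i'\hat{\boldsymbol\theta}$ we have
$$\sum_{i=1}^{n} \psi(r_i)\,\mathbf{x}_i = \mathbf{0}_{p\times 1}.$$
To judge the influence of the $j$th data point we might recalculate the estimate after omitting this point from the data, thus obtaining a revised estimate $\hat{\boldsymbol\theta}_{(j)}$. This estimate satisfies the equation $\mathbf{F}(\hat{\boldsymbol\theta}_{(j)}) = \mathbf{0}_{p\times 1}$, where $\mathbf{F}(\boldsymbol\theta) = \sum_{i\neq j} \psi(y_i - \mathbf{x}_i'\boldsymbol\theta)\,\mathbf{x}_i$. We can approximate $\hat{\boldsymbol\theta}_{(j)}$ by carrying out just one step of the Newton-Raphson method for solving $\mathbf{F}(\boldsymbol\theta) = \mathbf{0}$, starting with $\hat{\boldsymbol\theta}$.

(a) Show that this one-step procedure results in the approximation
$$\hat{\boldsymbol\theta}_{(j)} \approx \hat{\boldsymbol\theta} - \Big[\sum_{i\neq j} \psi'(r_i)\,\mathbf{x}_i\mathbf{x}_i'\Big]^{-1} \mathbf{x}_j\,\psi(r_j).$$

(b) Show that the expression above reduces to
$$\hat{\boldsymbol\theta}_{(j)} \approx \hat{\boldsymbol\theta} - \frac{\mathbf{M}^{-1}\mathbf{x}_j\,\psi(r_j)}{1 - \psi'(r_j)\,\mathbf{x}_j'\mathbf{M}^{-1}\mathbf{x}_j},$$
where $\mathbf{M} = \sum_{i=1}^{n} \psi'(r_i)\,\mathbf{x}_i\mathbf{x}_i'$.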
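A hedged numerical check of the two one-step deletion approximations above for a straight-line fit with Huber's $\psi$ (tuning constant 1.5 assumed) and scale fixed at 1. The Newton form, which inverts $\sum_{i\neq j}\psi'(r_i)\mathbf{x}_i\mathbf{x}_i'$, and the reduced form, which uses the full-sample $\mathbf{M}$, should agree to rounding error, since they are related by the Sherman-Morrison identity. The data and all function names are hypothetical; the exact refit on the reduced data is printed alongside for comparison.

```python
def psi(r, c=1.5):
    """Huber's psi with tuning constant c (c = 1.5 is an assumption)."""
    return max(-c, min(c, r))


def dpsi(r, c=1.5):
    """psi'(r): 1 inside [-c, c], 0 outside."""
    return 1.0 if abs(r) <= c else 0.0


def wsum_xxT(X, w):
    """sum_i w_i x_i x_i' for 2-vectors x_i (returned as a 2x2 list)."""
    return [[sum(wi * xi[p] * xi[q] for xi, wi in zip(X, w)) for q in range(2)]
            for p in range(2)]


def solve2(A, v):
    """A^{-1} v for a 2x2 matrix A, by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * v[0] - A[0][1] * v[1]) / det,
            (A[0][0] * v[1] - A[1][0] * v[0]) / det]


def residuals(X, y, th):
    return [yi - xi[0] * th[0] - xi[1] * th[1] for xi, yi in zip(X, y)]


def newton_m(X, y, th, steps=30):
    """Newton-Raphson for sum_i psi(y_i - x_i' th) x_i = 0, with scale fixed at 1."""
    for _ in range(steps):
        r = residuals(X, y, th)
        g = [sum(psi(ri) * xi[p] for ri, xi in zip(r, X)) for p in range(2)]
        step = solve2(wsum_xxT(X, [dpsi(ri) for ri in r]), g)
        th = [th[0] + step[0], th[1] + step[1]]
    return th


def one_step_deletion(X, y, th, j):
    """One-step approximations to theta_hat_(j): the Newton form and the reduced form."""
    r = residuals(X, y, th)
    xj = X[j]
    # Newton form: th - [sum_{i != j} psi'(r_i) x_i x_i']^{-1} x_j psi(r_j)
    w_del = [0.0 if i == j else dpsi(ri) for i, ri in enumerate(r)]
    ua = solve2(wsum_xxT(X, w_del), xj)
    newton_form = [th[p] - ua[p] * psi(r[j]) for p in range(2)]
    # Reduced form: th - M^{-1} x_j psi(r_j) / (1 - psi'(r_j) x_j' M^{-1} x_j)
    u = solve2(wsum_xxT(X, [dpsi(ri) for ri in r]), xj)
    denom = 1.0 - dpsi(r[j]) * (xj[0] * u[0] + xj[1] * u[1])
    reduced_form = [th[p] - u[p] * psi(r[j]) / denom for p in range(2)]
    return newton_form, reduced_form


# Hypothetical data: straight line with intercept; the last point is an outlier.
X = [[1.0, float(x)] for x in range(10)]
y = [2.1, 2.4, 3.1, 3.4, 4.1, 4.4, 5.1, 5.4, 6.1, 15.0]
theta_ls = solve2(wsum_xxT(X, [1.0] * len(X)),
                  [sum(xi[p] * yi for xi, yi in zip(X, y)) for p in range(2)])
theta_hat = newton_m(X, y, theta_ls)   # start Newton-Raphson at the LS fit
for j in (0, 9):
    nf, rf = one_step_deletion(X, y, theta_hat, j)
    exact = newton_m(X[:j] + X[j + 1:], y[:j] + y[j + 1:], theta_hat)
    print(j, nf, rf, exact)
```

Because Huber's $\psi$ is piecewise linear, the one-step approximation coincides with the exact refit whenever deleting point $j$ leaves every other residual on the same piece of $\psi$.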
Consider the Water Quality dataset, as described in §5 of SR&C and available on the course website. For each of the methods (i) Least Squares, (ii) Ordinary M-estimation with Huber's $\psi_{1.5}$, (iii) GM estimation (3 step) with Huber's $\psi_{1.5}$ and the ‘w1’ weights, (iv) GM estimation (3 step) with Huber's $\psi_{1.5}$ and the ‘w2’ weights, (v) … estimation, (vii) Least Squares after the removal of the Hackensack River observation:

(a) Fit a regression model relating N to the 3 variables. (Do not use ‘Other’; fit an intercept instead. Some of the R functions cannot yet handle no-intercept models.) Present the coefficients in tabular form, so that they can be easily compared. Comment on your findings.
(b) Plot the standardized residuals against the fitted values. Comment on your findings.
(c) Carry out individual t-tests for the significance of the three regressors, using the MM fit and appropriate asymptotic approximations.
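For the individual $t$-tests, one common large-sample approximation (Huber's, here without his small-sample correction factor) takes $\widehat{\mathrm{Cov}}(\hat{\boldsymbol\theta}) \approx \hat\sigma^2\,\frac{n^{-1}\sum\psi^2(r_i/\hat\sigma)}{[n^{-1}\sum\psi'(r_i/\hat\sigma)]^2}(\mathbf{X}'\mathbf{X})^{-1}$ and refers $t_k = \hat\theta_k/\mathrm{se}_k$ to $N(0,1)$. The sketch below is an assumption about what "appropriate" means here, not necessarily the approximation used in class. It accepts whatever $\psi$ the MM fit uses (the bisquare with $c = 4.685$ is assumed), and the inputs shown are toy numbers, not the Water Quality data.

```python
import math
from statistics import NormalDist


def invert(A):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def bisquare_psi(u, c=4.685):
    """Tukey bisquare psi (c = 4.685 assumed, the usual 95%-efficiency constant)."""
    return u * (1 - (u / c) ** 2) ** 2 if abs(u) <= c else 0.0


def bisquare_dpsi(u, c=4.685):
    v = (u / c) ** 2
    return (1 - v) * (1 - 5 * v) if abs(u) <= c else 0.0


def asymptotic_t(X, resid, sigma, theta, psi, dpsi):
    """t_k = theta_k / se_k with Cov ~ sigma^2 * mean(psi^2) / mean(psi')^2 * (X'X)^{-1};
    two-sided p-values from the N(0, 1) reference distribution."""
    n, p = len(X), len(X[0])
    u = [r / sigma for r in resid]
    k = (sum(psi(ui) ** 2 for ui in u) / n) / (sum(dpsi(ui) for ui in u) / n) ** 2
    xtx_inv = invert([[sum(x[a] * x[b] for x in X) for b in range(p)] for a in range(p)])
    se = [sigma * math.sqrt(k * xtx_inv[a][a]) for a in range(p)]
    t = [th / s for th, s in zip(theta, se)]
    pvals = [2 * (1 - NormalDist().cdf(abs(tk))) for tk in t]
    return t, pvals


# Toy inputs only (intercept plus one regressor); substitute the MM fit's values.
X_toy = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
print(asymptotic_t(X_toy, [0.5, -0.4, 0.3, -0.2], 1.0, [1.4, 0.4],
                   bisquare_psi, bisquare_dpsi))
```

With a redescending $\psi$ the average of $\psi'$ can be small or even negative when many residuals are large, in which case this approximation should not be trusted.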
Develop a method of robust nonlinear M-estimation. There is no one ‘right’ answer here; I just want you to do something that is sensible, and that agrees with what we have done in class in the linear case. In the model
$$y_i = f(\mathbf{x}_i, \boldsymbol\theta) + \varepsilon_i$$
…
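One possible approach, offered only as a sketch of "something sensible" and not as the intended answer: mimic linear IRLS by solving $\sum_i \psi(r_i/s)\,\nabla_{\boldsymbol\theta} f(\mathbf{x}_i, \boldsymbol\theta) = \mathbf{0}$ with $r_i = y_i - f(\mathbf{x}_i, \boldsymbol\theta)$, alternating weighted Gauss-Newton steps (weights $w_i = \psi(u_i)/u_i$, $u_i = r_i/s$) with a normalized-MAD scale $s$. The exponential model $f(x, \boldsymbol\theta) = \theta_0 e^{\theta_1 x}$, Huber's $\psi$ with $c = 1.345$, the starting values and the data are all hypothetical choices.

```python
import math
from statistics import median


def huber_weight(u, c=1.345):
    """IRLS weight psi(u)/u for Huber's psi (c = 1.345 assumed)."""
    return 1.0 if abs(u) <= c else c / abs(u)


def f(x, th):
    """Hypothetical example model f(x, theta) = theta_0 * exp(theta_1 * x)."""
    return th[0] * math.exp(th[1] * x)


def grad_f(x, th):
    e = math.exp(th[1] * x)
    return e, th[0] * x * e   # (df/dtheta_0, df/dtheta_1)


def robust_nls(xs, ys, th, iters=200, tol=1e-10):
    """Solve sum_i psi(r_i / s) grad f(x_i, theta) = 0 by iteratively reweighted
    Gauss-Newton; s is re-estimated by the normalized MAD on every pass."""
    th = list(th)
    for _ in range(iters):
        r = [y - f(x, th) for x, y in zip(xs, ys)]
        m = median(r)
        s = 1.4826 * median([abs(ri - m) for ri in r])   # normalized MAD scale
        w = [huber_weight(ri / s) for ri in r]
        # weighted Gauss-Newton normal equations (J'WJ) delta = J'W r, J is n x 2
        a = b = d = g0 = g1 = 0.0
        for x, ri, wi in zip(xs, r, w):
            j0, j1 = grad_f(x, th)
            a += wi * j0 * j0
            b += wi * j0 * j1
            d += wi * j1 * j1
            g0 += wi * j0 * ri
            g1 += wi * j1 * ri
        det = a * d - b * b
        delta = ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)
        th = [th[0] + delta[0], th[1] + delta[1]]
        if max(abs(delta[0]), abs(delta[1])) < tol:
            break
    return th, s


# Hypothetical data: theta = (2, 0.3), small alternating noise, one gross outlier.
xs = [float(k) for k in range(10)]
ys = [2.0 * math.exp(0.3 * x) + (0.01 if k % 2 == 0 else -0.01) for k, x in enumerate(xs)]
ys[6] += 10.0
theta, scale = robust_nls(xs, ys, th=(1.9, 0.29))   # assumed starting values
print(theta, scale)
```

In the linear case $\nabla_{\boldsymbol\theta} f = \mathbf{x}_i$ and each pass reduces to an ordinary IRLS step. Gauss-Newton needs reasonable starting values; a robust preliminary fit would be a natural source of them.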


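Suppose that one estimates a straight line for $x \in [-1, 1]$, obtaining the LS estimate $\hat\theta_0 + \hat\theta_1 x$. Define the prediction bias at $x$ by
$$b(x) = E[\hat\theta_0 + \hat\theta_1 x] - \{\theta_0 + \theta_1 x + \theta_2 x^2\},$$
and the overall bias by
$$B = \frac{1}{2}\int_{-1}^{1} b^2(x)\,dx.$$
Show that $b(x) = \theta_2(\mu_2 - x^2)$, where …
$$B = \theta_2^2\left\{\left(\mu_2 - \frac{1}{3}\right)^2 + \frac{4}{45}\right\},$$
… is a bias-minimizing design.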
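A numerical check of the bias formula $B = \theta_2^2\{(\mu_2 - 1/3)^2 + 4/45\}$, with $B = \frac12\int_{-1}^{1} b^2(x)\,dx$ and $\mu_k = n^{-1}\sum x_i^k$, assuming a symmetric design ($\mu_1 = \mu_3 = 0$), which is the case in which $b(x) = \theta_2(\mu_2 - x^2)$. Since LS is linear in the responses, $E[\hat\theta_0 + \hat\theta_1 x]$ is the LS line fitted to the true means, so $B$ can be computed by quadrature and compared with the formula. The designs and parameter values are illustrative; under this formula any symmetric design with $\mu_2 = 1/3$, such as the two-point design $\pm 1/\sqrt{3}$, attains the minimum $4\theta_2^2/45$.

```python
def ls_line(xs, ys):
    """Least-squares intercept and slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def overall_bias(design, theta=(0.0, 0.0, 1.0), m=20_000):
    """B = (1/2) * integral_{-1}^{1} b(x)^2 dx by the midpoint rule, where
    E[theta_hat] is the LS line fitted to the true quadratic means."""
    t0, t1, t2 = theta

    def mean_y(x):
        return t0 + t1 * x + t2 * x * x

    a, b = ls_line(design, [mean_y(x) for x in design])
    h = 2.0 / m
    total = sum((a + b * x - mean_y(x)) ** 2 for x in (-1 + (k + 0.5) * h for k in range(m)))
    return 0.5 * total * h


def bias_formula(design, theta2=1.0):
    """theta_2^2 * {(mu_2 - 1/3)^2 + 4/45}, with mu_2 the mean of the x_i^2."""
    mu2 = sum(x * x for x in design) / len(design)
    return theta2 ** 2 * ((mu2 - 1 / 3) ** 2 + 4 / 45)


for design in ([-1.0, -0.5, 0.0, 0.5, 1.0], [-3 ** -0.5, 3 ** -0.5], [-1.0, 1.0]):
    print(design, overall_bias(design), bias_formula(design))
```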