STAT 578 - ASSIGNMENT 3 - due date is on the course outline



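Consider a regression model with an intercept, so that the regressors are of the form $\mathbf{x}_i' = (1, \mathbf{v}_i')$. Let $\mathbf{C} = \sum_{i=1}^{n} (\mathbf{v}_i - \bar{\mathbf{v}})(\mathbf{v}_i - \bar{\mathbf{v}})' / (n - 1)$ be the sample covariance matrix, and define the Mahalanobis distance by $d_i^2 = (\mathbf{v}_i - \bar{\mathbf{v}})' \mathbf{C}^{-1} (\mathbf{v}_i - \bar{\mathbf{v}})$. Show that

$$d_i^2 = (n - 1)\left(h_{ii} - \frac{1}{n}\right),$$

where $\mathbf{H}$ is the hat matrix formed from $\mathbf{x}_1, \ldots, \mathbf{x}_n$. Thus $h_{ii}$ also measures the distance of a regressor from the mean of the data.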
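The leverage identity in this problem can be checked numerically before it is proved. The sketch below is only an illustration: it assumes a single scalar regressor (so that $\mathbf{X}'\mathbf{X}$ is 2x2), and the data and function names are made up.

```python
# Numerical check of d_i^2 = (n - 1) * (h_ii - 1/n) for rows x_i' = (1, v_i).
# Assumption: one scalar regressor v_i, so X'X can be inverted by hand.

def leverages(v):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X' for rows (1, v_i)."""
    n = len(v)
    s1 = sum(v)
    s2 = sum(t * t for t in v)
    det = n * s2 - s1 * s1  # determinant of X'X
    # (X'X)^{-1} = (1/det) * [[s2, -s1], [-s1, n]], so
    # h_ii = (1, v_i) (X'X)^{-1} (1, v_i)'.
    return [(s2 - 2.0 * s1 * t + n * t * t) / det for t in v]

def mahalanobis_sq(v):
    """Squared Mahalanobis distances using the (n - 1)-divisor variance."""
    n = len(v)
    vbar = sum(v) / n
    c = sum((t - vbar) ** 2 for t in v) / (n - 1)
    return [(t - vbar) ** 2 / c for t in v]

v = [1.0, 2.0, 4.0, 7.0, 11.0]  # made-up illustrative data
n = len(v)
h = leverages(v)
d2 = mahalanobis_sq(v)
max_gap = max(abs(d - (n - 1) * (hi - 1.0 / n)) for d, hi in zip(d2, h))
print(max_gap)  # should be at rounding-error level
```

A gap at rounding-error level is consistent with the identity but is of course not a proof; the general multivariate case still needs the algebra.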

Let $\theta$ be the median of a population with d.f. $F$, and let $\hat\theta$ be the sample median. One way to define $\hat\theta$ as a functional $T(F_n)$ of the e.d.f. is as $\hat\theta = F_n^{-1}(0.5)$. (Look in any mathematical statistics book to see how the inverse of a step function like $F_n$ is defined.) The population version of this is $T(F) = F^{-1}(0.5)$.

(a) Assuming that $F$ has a density $f(x)$ with $f(\theta) > 0$, show that the Influence Function of $\hat\theta$ is

$$\mathrm{IF}(x) = \frac{\operatorname{sign}(x - \theta)}{2 f(\theta)}. \qquad (*)$$

Here $\operatorname{sign}(\cdot)$ is defined in such a way as to be continuous from the left at 0.

(b) Use (*) to exhibit the asymptotic distribution of the sample median, when sampling from a $N(\mu, \sigma^2)$ population.

(c) Write $\hat\theta$ as an M-estimate, i.e. a solution to $\frac{1}{n}\sum_{i=1}^{n} \psi(x_i - \hat\theta) = 0$, for a particular choice of $\psi$. Show that the expression for the IF of an M-estimate, derived in class, becomes in this case

$$\mathrm{IF}(x) = \frac{\psi(x - \theta)}{E_F[\psi'(X - \theta)]}.$$

This makes no sense for the $\psi$-function corresponding to the median (why not?). However, first evaluate the denominator for differentiable $\psi$-functions through an integration by parts; and then evaluate it at the $\psi$-function corresponding to the median. Verify that the result agrees with (*).
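A small simulation can serve as a sanity check on the answer obtained from (*). The asymptotic variance implied by an influence function is $E[\mathrm{IF}(X)^2]$, which for (*) is $1/(4 f(\theta)^2)$; for a normal population with standard deviation $\sigma$ this equals $\pi\sigma^2/2$. The sketch below compares that value with $n$ times the Monte Carlo variance of the sample median. The sample size, replication count, seed and function name are arbitrary choices, not part of the assignment.

```python
import math
import random
import statistics

def scaled_median_variance(n, sigma, reps, seed=0):
    """Monte Carlo estimate of n * Var(sample median) for N(0, sigma^2) data."""
    rng = random.Random(seed)
    medians = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        medians.append(statistics.median(sample))
    return n * statistics.pvariance(medians)

sigma = 2.0
simulated = scaled_median_variance(n=201, sigma=sigma, reps=2000)
theoretical = math.pi * sigma ** 2 / 2.0  # 1 / (4 f(theta)^2) for the normal
print(simulated, theoretical)
```

The two numbers should agree to within Monte Carlo error (a few percent at this replication count); the agreement improves as `n` and `reps` grow.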

Consider a straight line regression model. Suppose that one of the regressors, say $x_1$, is an outlier which is so far away from the rest of the sample that it is larger than $\bar{x}$, whereas all other $x_i$ are smaller than $\bar{x}$.

(a) Show that the $L_1$ line necessarily passes through $(x_1, y_1)$.
(b) Use (a) to give an alternate proof (i.e. different from the one given in class for monotone M-estimators) that the breakdown point of the estimate is 0.
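The claim about the $L_1$ line can be illustrated on a toy data set. The sketch below finds a least-absolute-deviations line by brute force, using the standard fact that some minimizer passes through two data points. The data are invented so that the first $x$ exceeds the mean and all others fall below it; this is an illustration, not a substitute for the proof.

```python
from itertools import combinations

def l1_line(x, y):
    """Brute-force L1 line: search all lines through pairs of data points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(abs(yk - intercept - slope * xk) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

# Made-up data: x[0] is the outlying regressor, the rest follow y ~ 1 + x.
x = [100.0, 0.0, 1.0, 2.0, 3.0, 4.0]
y = [5.0, 1.0, 2.1, 2.9, 4.2, 5.1]
xbar = sum(x) / len(x)
outlier_condition = x[0] > xbar and all(xi < xbar for xi in x[1:])

intercept, slope = l1_line(x, y)
resid_at_outlier = y[0] - intercept - slope * x[0]
print(outlier_condition, intercept, slope, resid_at_outlier)
```

Moving `y[0]` to any other value drags the fitted line along with it, which is the mechanism behind the zero breakdown point in part (b).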

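Suppose that $\hat{\boldsymbol\theta}$ is an ordinary M-estimate of regression, where scale is known to equal 1 and hence is not estimated. Thus with $r_i = y_i - \mathbf{x}_i'\hat{\boldsymbol\theta}$ we have

$$\sum_{i=1}^{n} \psi(r_i)\,\mathbf{x}_i = \mathbf{0}_{p \times 1}.$$

To judge the influence of the $j$th data point we might recalculate the estimate after omitting this point from the data, thus obtaining a revised estimate $\hat{\boldsymbol\theta}_{(j)}$. This estimate satisfies the equation $\mathbf{F}(\hat{\boldsymbol\theta}_{(j)}) = \mathbf{0}_{p \times 1}$, where $\mathbf{F}(\boldsymbol\theta) = \sum_{i \neq j} \psi(y_i - \mathbf{x}_i'\boldsymbol\theta)\,\mathbf{x}_i$. We can approximate $\hat{\boldsymbol\theta}_{(j)}$ by carrying out just one step of the Newton-Raphson method for solving $\mathbf{F}(\boldsymbol\theta) = \mathbf{0}$, starting with $\hat{\boldsymbol\theta}$.

(a) Show that this one-step procedure results in the approximation

$$\hat{\boldsymbol\theta}_{(j)} \approx \hat{\boldsymbol\theta} - \left[\sum_{i \neq j} \psi'(r_i)\,\mathbf{x}_i\mathbf{x}_i'\right]^{-1} \mathbf{x}_j\,\psi(r_j).$$

(b) Show that the expression above reduces to

$$\hat{\boldsymbol\theta}_{(j)} \approx \hat{\boldsymbol\theta} - \frac{\mathbf{M}^{-1}\mathbf{x}_j\,\psi(r_j)}{1 - \psi'(r_j)\,\mathbf{x}_j'\mathbf{M}^{-1}\mathbf{x}_j},$$

where $\mathbf{M} = \sum_{i=1}^{n} \psi'(r_i)\,\mathbf{x}_i\mathbf{x}_i'$.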

  

Consider the Water Quality dataset, as described in §5 of SR&C and available on the course website. For each of the methods (i) Least Squares, (ii) Ordinary M-estimation with Huber's $\psi_{1.5}$, (iii) GM estimation (3 step) with Huber's $\psi_{1.5}$ and the 'w1' weights, (iv) GM estimation (3 step) with Huber's $\psi_{1.5}$ and the 'w2' weights, (v) GM estimation (3 step) with the bisquare $\psi(r; 4.68)$ and the 'w1' weights, (vi) MM-estimation, (vii) Least Squares after the removal of the Hackensack River observation:

(a) Fit a regression model relating N to the 3 variables. (Do not use 'Other' - fit an intercept instead. Some of the R-functions cannot yet handle no-intercept models.) Present the coefficients in tabular form, so that they can be easily compared. Comment on your findings.
(b) Plot the standardized residuals against the fitted values. Comment on your findings.
(c) Carry out individual t-tests for the significance of the three regressors, using the MM fit and appropriate asymptotic approximations.
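For reference, the two $\psi$-functions named in the list of methods can be written out explicitly. The sketch below uses the usual textbook forms of Huber's $\psi_c$ and Tukey's bisquare $\psi(r; c)$ with the tuning constants 1.5 and 4.68 from the problem statement; the function names are illustrative, and the course's R functions may scale or parametrize them differently.

```python
def huber_psi(r, c=1.5):
    """Huber's psi: the identity on [-c, c], clipped to +/-c outside."""
    return max(-c, min(c, r))

def bisquare_psi(r, c=4.68):
    """Tukey's bisquare psi: r * (1 - (r/c)^2)^2 on [-c, c], zero outside."""
    if abs(r) > c:
        return 0.0
    u = r / c
    return r * (1.0 - u * u) ** 2

# Huber's psi is monotone; the bisquare redescends to zero for large residuals.
for r in (0.5, 1.5, 3.0, 5.0):
    print(r, huber_psi(r), bisquare_psi(r))
```

The redescending shape is why the bisquare fits can ignore a gross outlier entirely, whereas Huber's $\psi$ only bounds its contribution.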

Develop a method of robust non-linear M-estimation. There is no one 'right' answer here - I just want you to do something that is sensible, and that agrees with what we have done in class in the linear case. In the model

$$y_i = f(\mathbf{x}_i, \boldsymbol\theta) + \varepsilon_i$$

propose an algorithm for determining an ordinary M-estimate of $\boldsymbol\theta$, i.e. a minimizer of

$$\sum_{i=1}^{n} \rho\!\left(\frac{y_i - f(\mathbf{x}_i, \boldsymbol\theta)}{\sigma}\right)$$

for a suitable function $\rho$. I suggest using the MAD to re-estimate scale after each of the regression-estimation steps. Show that, if your algorithm converges, then it converges to a solution of the original equations.

Suppose that one estimates a straight line for $x \in [-1, 1]$, obtaining the LS estimate $\hat\theta_0 + \hat\theta_1 x$. The design is symmetric, in that $-x$ is a design point whenever $x$ is a design point. Now suppose that the true response is quadratic: $E[Y \mid x] = \theta_0 + \theta_1 x + \theta_2 x^2$. Define the prediction bias at $x$ by

$$b(x) = E\left[\hat\theta_0 + \hat\theta_1 x\right] - \left\{\theta_0 + \theta_1 x + \theta_2 x^2\right\},$$

and the overall bias by

$$B = \frac{1}{2}\int_{-1}^{1} b^2(x)\,dx.$$

Show that $b(x) = \theta_2\,(\mu_2 - x^2)$, where $\mu_2 = n^{-1}\sum_{i=1}^{n} x_i^2$, and that

$$B = \theta_2^2\left\{\left(\mu_2 - \frac{1}{3}\right)^2 + \frac{4}{45}\right\}.$$

Then show that the design with equally spaced design points $x_i = -a + \dfrac{2a(i-1)}{n-1}$, $i = 1, \ldots, n$, where $a = \sqrt{\dfrac{n-1}{n+1}}$, is a bias-minimizing design.