DATA7703, Assignment 3
2020 Semester 2, due 23:59 6 Oct
Instructions. Please read the instructions carefully — not following them may result in a penalty.
(a) Submit your solutions as a single PDF file on Blackboard. Go to Assessment, Assignment 3 to submit. If you don't know how to convert your file to a PDF, please search for a guide online. You can submit as many times as you want before the deadline. The last submission will be graded.
(b) Write down your name and student number on the first page of your solution report, and write down the question numbers for your solutions. For programming questions, you are welcome to submit your code files or output files in a separate zip file, but you must include both your code and relevant output in your submitted PDF file. Excessive code output may be penalised.
(c) Follow integrity rules, and provide citations as needed. You can discuss with your classmates, but you are required to write your solutions independently, and to specify who you have discussed with in your solution. If you do not know how to solve a problem, you can get 15% of the mark by writing down "I don't know".
You are encouraged to keep your solutions concise — these questions require thoughts, not long answers.
can be written as c(x) = +1 for x ≤ 5, and c(x) = −1 for x > 5.
Given threshold classifiers c_1, . . . , c_k, we say that a linear combination f = a_1 c_1 + . . . + a_k c_k of them represents a set S ⊆ ℝ if and only if the classifier c defined by c(x) = I(f(x) ≥ 0) satisfies c(x) = I(x ∈ S).
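This definition can be checked numerically. The following is a minimal sketch, assuming I(·) denotes the usual 0/1 indicator; the helper functions and the example set S = [0, +∞) are illustrative and not part of the question.

    import numpy as np

    def threshold_gt(a):
        # threshold classifier c(x) = I(x > a), acting on numpy arrays
        return lambda x: (x > a).astype(float)

    def threshold_lt(b):
        # threshold classifier c(x) = I(x < b)
        return lambda x: (x < b).astype(float)

    def linear_combination(coefs, classifiers):
        # f(x) = a_1 c_1(x) + ... + a_k c_k(x)
        return lambda x: sum(a * c(x) for a, c in zip(coefs, classifiers))

    def induced_classifier(f):
        # c(x) = I(f(x) >= 0)
        return lambda x: (f(x) >= 0).astype(float)

    # Numerically check whether a combination represents a candidate set S
    # on a grid of points (a sanity check, not a proof).
    x = np.linspace(-10, 10, 2001)
    f = linear_combination([1.0, -0.5], [threshold_gt(0.0), threshold_lt(0.0)])
    c = induced_classifier(f)
    indicator_S = (x >= 0).astype(float)    # I(x in S) for S = [0, +inf)
    print(np.all(c(x) == indicator_S))      # True: f represents S on this grid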
- (5 marks) For any a < b, consider the threshold classifiers c1(x) = I(x > a), c2(x) = I(x < b), and c3(x) = I(x < +∞). What is the set represented by f = c1 + c2 + 0.1c3?
- (10 marks) Find a linear combination of threshold classifiers to represent the two intervals (−2, −1) ∪ (1, 2).
Consider Wagging (weighted aggregating), a variant of bagging in which the basis models are combined using a weighted average: given models Y_1, . . . , Y_n, Wagging predicts with Ȳ = w_1 Y_1 + . . . + w_n Y_n, where the weights w_1, . . . , w_n are non-negative and sum to 1.
There are many possible ways to choose w_1, . . . , w_n in Wagging, and we consider how their values affect bias and variance below.
In the remainder of this question, assume that Y_1, . . . , Y_n are identically distributed with var(Y_i) = σ² for all i, and cov(Y_i, Y_j) = ρσ² for all 1 ≤ i ≠ j ≤ n.
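For intuition about the parts below, the effect of the weights on var(Ȳ) can also be estimated by simulation. The sketch below uses assumed, purely illustrative values (n = 2, ρ = 0.5, σ² = 1, Gaussian Y's) that are not part of the question.

    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed parameters for the simulation (illustrative values only).
    n, rho, sigma2 = 2, 0.5, 1.0
    cov = sigma2 * (np.full((n, n), rho) + (1 - rho) * np.eye(n))

    # Draw many realisations of (Y_1, ..., Y_n) with the assumed covariance.
    Y = rng.multivariate_normal(mean=np.zeros(n), cov=cov, size=200_000)

    for w in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
        Y_bar = Y @ np.array(w)            # weighted average w_1 Y_1 + w_2 Y_2
        print(w, round(Y_bar.var(), 3))    # empirical estimate of var(Y_bar)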
- (5 marks) Show that Wagging has the same bias as each individual model.
- (5 marks) Express var(Ȳ) in terms of w_1, . . . , w_n, ρ and σ². In addition, for n = 2, use your formula to evaluate var(Ȳ) when (w_1, w_2) equals (1, 0), (1/2, 1/2) and (0, 1) respectively.
In this question, you will study random forests, in which each split of each tree considers only m candidate features randomly selected from the d input features, and investigate how m affects the correlation between the trees' predictions and the performance of the forest. A code sketch outlining the overall workflow is given after the list of parts below.
- (5 marks) Load the California housing dataset provided in sklearn.datasets, and construct a random 70/30 train-test split. Set the random seed to a number of your choice to make the split reproducible. What is the value of 𝑑 here?
- (5 marks) Train a random forest of 100 decision trees using default hyperparameters. Report the training and test accuracies. What is the value of m used?
- (5 marks) Compute all the pairwise correlations between the test set predictions of the 100 trees, and report their average. Correlation refers to the Pearson correlation in this question.
- (5 marks) Repeat (b) and (c) (i.e., the two preceding parts) for m = 1 to d, and tabulate the training and test accuracies, and the average correlations, for all m values. In addition, plot the training and test accuracies against m in a single figure, and plot the average correlation against m in another figure.
- (5 marks) Describe how the average correlation changes as m increases. Explain the observed pattern.
- (5 marks) A data scientist claims that we should choose m such that the average correlation is smallest, because it gives us maximum reduction in the variance, thus maximum reduction in the expected prediction error. True or false? Justify your answer.
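The sketch below outlines one possible setup for the parts above. It assumes that RandomForestRegressor's max_features hyperparameter plays the role of m, that fetch_california_housing provides the dataset, and that the seed value 0 is an arbitrary choice.

    import numpy as np
    from sklearn.datasets import fetch_california_housing
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    # Load the data and make a reproducible 70/30 split (seed 0 is arbitrary).
    X, y = fetch_california_housing(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)
    d = X.shape[1]   # number of input features

    # Train a 100-tree forest; max_features is assumed to play the role of m.
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    rf.fit(X_train, y_train)
    print(rf.score(X_train, y_train), rf.score(X_test, y_test))  # R^2 scores

    # Pairwise Pearson correlations between the test-set predictions of the trees.
    tree_preds = np.array([tree.predict(X_test) for tree in rf.estimators_])
    corr = np.corrcoef(tree_preds)          # 100 x 100 correlation matrix
    iu = np.triu_indices_from(corr, k=1)    # upper-triangular (pairwise) entries
    print(corr[iu].mean())                  # average pairwise correlation

Looping this over max_features values from 1 to d gives the quantities needed for the tabulation and plots.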
In this question, you will implement an enhanced bagging algorithm that uses the out-of-bag (OOB) error to choose the number of basis models. The algorithm trains basis models one at a time on bootstrap samples, and stops adding models when either (a) the number of models reaches a pre-specified maximum number of models, or (b) i is at least 10 and the smoothed OOB error increases, that is, ε̄_i^OOB ≥ ε̄_{i−1}^OOB, where ε_i^OOB denotes the OOB error for the ensemble consisting of the first i models and ε̄_i^OOB denotes its smoothed value.
To help you implement the above enhanced bagging algorithm, a partial implementation with some comments has been provided below and also in the file bagging.py.
from sklearn.base import clone
import numpy as np


class OOBaggingClassifier:

    def __init__(self, base_estimator, n_estimators=200):
        '''
        Parameters
        ----------
        base_estimator: a probabilistic classifier that implements the
            predict_proba function, such as DecisionTreeClassifier
        n_estimators: the maximum number of estimators allowed.
        '''
        self.base_estimator_ = base_estimator
        self.n_estimators = n_estimators
        self.estimators_ = []
        self.oob_errors_ = []

    def fit(self, X, y, random_state=None):
        if random_state:
            np.random.seed(random_state)

        self.best_n = 0
        probs_oob = None  # you may use this to accumulate OOB predicted class probabilities
        for i in range(self.n_estimators):
            estimator = clone(self.base_estimator_)

            # construct a bootstrap sample

            # train on bootstrap sample

            # compute OOB error
            oob_error = ...  # replace ... with your code

            # save the OOB error and the new model
            self.oob_errors_.append(oob_error)
            self.estimators_.append(estimator)

            # stop early if smoothed OOB error increases (for the purpose of
            # this problem, we don't stop training when the criterion is
            # fulfilled, but simply set self.best_n to (i+1)).
            if (self.best_n == 0) and (OOB criterion):  # replace OOB criterion with your code
                self.best_n = (i+1)

    def errors(self, X, y):
        '''
        Parameters
        ----------
        X: an input array of shape (n_samples, n_features)
        y: an array of shape (n_samples,) containing the classes for the
            input examples

        Returns
        -------
        error_rates: an array of shape (n_estimators,), with error_rates[i]
            being the error rate of the ensemble consisting of the first
            (i+1) models.
        '''
        error_rates = []
        # compute all the required error rates
        return error_rates

    def predict(self, X):
        '''
        Parameters
        ----------
        X: an input array of shape (n_samples, n_features)

        Returns
        -------
        y: an array of shape (n_samples,) containing the predicted classes
        '''
        probs = None
        for estimator in self.estimators_:
            p = estimator.predict_proba(X)
            if probs is None:
                probs = p
            else:
                probs += p
        return np.argmax(probs, axis=1)
Note that in this question, we only consider bagging of probabilistic classifiers, and we use a different prediction method from the majority vote method discussed in lecture: we first use each individual classifier to predict the class distributions, then we average over all the predicted distributions, and predict the most likely class. This is implemented in the predict function, and you should use the same prediction method when computing the OOB prediction. If an example is not an OOB example for any of the basis models, predict the first class.
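As an illustration of the OOB prediction rule in the last sentence, the following hypothetical helper (not part of the provided skeleton) turns summed OOB class-probability predictions into OOB predictions. Here probs_oob is assumed to be an (n_train, n_classes) array of predicted probabilities summed over the models for which each training example is out-of-bag.

    import numpy as np

    def oob_predictions(probs_oob):
        # Averaging and summing give the same argmax, so summed probabilities
        # are enough to find the most likely class for each training example.
        preds = np.argmax(probs_oob, axis=1)
        never_oob = probs_oob.sum(axis=1) == 0   # not OOB for any basis model
        preds[never_oob] = 0                      # predict the first class
        return preds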
- (5 marks) Load the digit data set provided in sklearn.datasets, and construct a random 70/30 train-test split. Set the random seed to a number of your choice to make the split reproducible.
- (5 marks) A naive way to compute the OOB error is to store all bootstrap samples, and then use the definition of OOB error discussed in lecture to compute it. This requires a lot of memory for large datasets. Describe a method that computes the OOB error of the ensemble consisting of the first i models using the i-th bootstrap sample, but not previous bootstrap samples — this allows you to throw away a bootstrap sample once the i-th model has been trained and its OOB error has been computed. Introduce additional variables or data structures if needed, but the amount of memory used should be at most within a constant factor of that for the original dataset size (the constant factor needs to be independent of the number of basis models used).
- (10 marks) Implement the fit function to support the method for using the OOB error to determine the number of models as described above.
- (5 marks) Implement the errors function, which takes in a labeled dataset and returns the error rates of the ensembles consisting of the first i basis models, for each i from 1 to the number of basis models trained.
- (5 marks) Train your OOBaggingClassifier on the digit dataset, using a maximum of 200 decision trees. Report the number of selected models. Plot the OOB error and the test error against the number of models (from 1 to 200). What is the number of basis models chosen by the OOB error method?
- (5 marks) Repeat (f) (i.e., the previous part) two more times using different random seeds. Comment on the performance of the OOB error method for selecting the number of models.
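Once the fit and errors functions have been completed, the experiments in the last two parts could be run along the lines of the sketch below. The base estimator, seed values, and plotting details are illustrative choices; best_n, oob_errors_, and errors are the attributes and method defined in the skeleton above.

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Reproducible 70/30 split of the digits data (seed 0 is an arbitrary choice).
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # Train the enhanced bagging classifier (assumes fit/errors are implemented).
    bagger = OOBaggingClassifier(DecisionTreeClassifier(), n_estimators=200)
    bagger.fit(X_train, y_train, random_state=0)
    print('number of models selected:', bagger.best_n)

    # OOB error and test error against the number of models.
    test_errors = bagger.errors(X_test, y_test)
    plt.plot(range(1, len(bagger.oob_errors_) + 1), bagger.oob_errors_, label='OOB error')
    plt.plot(range(1, len(test_errors) + 1), test_errors, label='test error')
    plt.xlabel('number of models')
    plt.ylabel('error rate')
    plt.legend()
    plt.show()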