Build a decision tree for a given dataset

Introduction

We have already laid most of the groundwork for building decision trees and, at this point, we are only one well-crafted (recursive) algorithm away from building these trees! The description for Assignment 4 is shorter than those for the previous assignments, since we have covered most of what we need in these previous assignment descriptions. The amount of coding involved in Assignment 4 is also going to be less than in the previous assignments. However, the smaller implementation does not mean that you should postpone it! Assignment 4 requires familiarity with both (simple) tree-based linked structures as well as recursive programming. These concepts need time to digest. We therefore ask that you start working on the assignment as soon as you receive this description.

You will be implementing the following three tasks in Assignment 4. The tasks marked with a * will take more time to complete.

 

Task 1*.

Your first task is to build a decision tree for a given dataset. The implementation will be done in the DecisionTree class (whose template code has been provided to you). The nodes of the decision tree will be instances of a private, generic nested class, Node, defined inside DecisionTree.

The DecisionTree class will instantiate this generic class with VirtualDataSet as the type parameter. All the tree nodes that you will be working with are therefore instances of Node<VirtualDataSet>. The DecisionTree class has an instance variable, named root, which maintains a reference to the root of the tree.
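The template's listing of the nested class is not reproduced here. As a minimal, hypothetical sketch (the field and constructor names are assumptions; the template's may differ), such a generic node class could look like:

```java
// Hypothetical sketch of the nested node class; the actual field names in
// the provided template may differ.
class Node<T> {
    T data;              // payload: in the assignment, a VirtualDataSet
    Node<T>[] children;  // one child per partition; null for leaf nodes

    Node(T data) {
        this.data = data;
    }
}
```

In DecisionTree, the root variable would then be declared with type Node<VirtualDataSet>.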


You need to write a recursive build(...) method that constructs the tree; pseudo-code for this method is provided with the assignment.
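As a rough sketch of the usual recursive splitting scheme (pseudo-code only, not compilable as-is; the helper steps are assumptions rather than the template's actual API), build(...) proceeds along these lines:

```java
// Pseudo-code sketch, not compilable as-is; helper names are assumptions.
private void build(Node<VirtualDataSet> node) {
    // Base cases: stop splitting when
    //  (a) the node's dataset has only one attribute left, or
    //  (b) every datapoint agrees on the "class" attribute, or
    //  (c) no attribute yields a positive information gain.

    // Recursive case:
    //  1. Find the attribute with the highest information gain
    //     (using the Assignment 3 machinery, e.g., EntropyEvaluator).
    //  2. Partition the node's dataset over that attribute.
    //  3. Create one child Node<VirtualDataSet> per partition.
    //  4. Call build(...) recursively on each child.
}
```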


Task 2*.

Write a toString() method that provides an if-else representation of the decision tree. The toString() method will call a recursive helper with the signature toString(Node<VirtualDataSet> node, int indentDepth).
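To illustrate the recursion, here is a self-contained, hypothetical mini example (the class and method names are invented for illustration and are not the template's API). It shows how a recursive traversal with an indentDepth parameter yields the nested if/else text shown later in this section:

```java
// Hypothetical, self-contained illustration; names are NOT the template's API.
class PrintNode {
    String condition;     // split condition that induced this node's dataset
    String verdict;       // decision at a leaf; null for internal nodes
    PrintNode[] children; // null for leaf nodes

    PrintNode(String condition, String verdict, PrintNode[] children) {
        this.condition = condition;
        this.verdict = verdict;
        this.children = children;
    }
}

class TreePrinter {
    // One tab per level of depth.
    static String indent(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append('\t');
        }
        return sb.toString();
    }

    static String toText(PrintNode node, int indentDepth) {
        if (node.children == null) {
            // Leaf: emit the decision at the current indentation.
            return indent(indentDepth) + "class = " + node.verdict + "\n";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < node.children.length; i++) {
            PrintNode child = node.children[i];
            sb.append(indent(indentDepth))
              .append(i == 0 ? "if (" : "else if (")
              .append(child.condition)
              .append(") {\n")
              .append(toText(child, indentDepth + 1)) // deeper => more indent
              .append(indent(indentDepth))
              .append("}\n");
        }
        return sb.toString();
    }
}
```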


We want the if-else representation of a decision tree to be properly indented for readability. As we traverse deeper into the decision tree, the amount of indentation, captured by the indentDepth parameter, therefore has to increase (e.g., by one tab or one space). In the DecisionTree class, you have been provided with a simple method that creates the desired indent to use as a prefix at different depths during the traversal of the tree.

 

IMPORTANT: JDK 12 and upward provide an indent(int n) method for Strings. You are NOT allowed to use this method, as it would require JDK 12+ for compiling your program. For marking, you can expect the TAs to be using JDK 11 but not higher. If your program does not compile because it uses String's new indent(int n) method, you are solely responsible for any marks deducted due to non-compilation.

 

We now illustrate the output returned by the toString() method of the DecisionTree class. Consider a main(...) method that reads weather-nominal.csv, builds the decision tree, and prints it.
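The handout's own main(...) listing is omitted here; a plausible sketch (the constructor calls are assumptions based on the class names listed at the end of this document, not the template's confirmed signatures) is:

```java
public static void main(String[] args) throws Exception {
    // Hypothetical sketch: the exact constructors may differ in the template.
    ActualDataSet dataSet = new ActualDataSet(new CSVReader("weather-nominal.csv"));
    DecisionTree tree = new DecisionTree(dataSet);
    System.out.println("*** Decision tree for weather-nominal.csv ***\n");
    System.out.println(tree);
}
```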


The output generated by the above main(...) method is as follows:

*** Decision tree for weather-nominal.csv ***

if (outlook is 'sunny') {
    if (humidity is 'high') {
        class = no
    }
    else if (humidity is 'normal') {
        class = yes
    }
}
else if (outlook is 'overcast') {
    class = yes
}
else if (outlook is 'rainy') {
    if (windy is 'FALSE') {
        class = yes
    }
    else if (windy is 'TRUE') {
        class = no
    }
}


To be able to implement the toString(Node<VirtualDataSet> node, int indentDepth) method properly, you need to take note of three factors:

  1. In Assignments 2 and 3, we did not keep track of the split condition in the virtual datasets resulting from partitioning. Examples of split conditions in the above output are:
    • outlook is ’sunny’
    • windy is ’FALSE’
    • humidity <= 70
    • humidity > 70

Compared to the reference implementations you were provided with for Assignments 2 and 3, the template code for Assignment 4 has a slightly updated VirtualDataSet class. Specifically, VirtualDataSet now keeps track of the split condition that induced the dataset during the partitioning process. You can now obtain the split condition associated with a (virtual) dataset by simply calling the newly added getCondition() method. You will therefore not need to change VirtualDataSet.

  2. The build(...) method in Task 1 merely manages the splitting process and stops it where the process cannot or should not continue. That method, however, does not ascribe a decision to the leaf nodes of the decision tree. The decision (verdict) for each leaf node is computed by the toString() method. For weather-nominal.csv, the two possible decisions are: (i) class = no and (ii) class = yes. For weather-numeric.csv, the two possible decisions are: (i) play = no and (ii) play = yes.

  3. If the dataset is noisy or the attributes were not chosen properly by the data scientists, the leaf nodes in the decision tree may have datapoints that disagree on their "class" attribute. In other words, one may have a mix of yeses and noes in the leaves. In our weather-nominal example, for instance, we could have had situations where both class = no and class = yes are supported by the datapoints remaining in a given leaf node. For the purposes of this assignment, if multiple decisions are supported by a leaf node, toString(...) can arbitrarily pick either of them.¹

 

¹ The reference implementation that you will receive later for Assignment 4 simply returns, for each leaf node, the first value in the unique value set of the "class" attribute. More nuanced implementations are possible but are beyond the scope of this assignment.
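Following the footnote above, a leaf's verdict could be computed along these lines (pseudo-code sketch only; the accessor names are assumptions, not necessarily the template's API):

```java
// Sketch; getAttributeByName(...) and getUniqueValues() are assumed names.
String verdictFor(Node<VirtualDataSet> leaf) {
    Attribute classAttribute = leaf.data.getAttributeByName("class");
    return classAttribute.getUniqueValues()[0]; // first value in the unique set
}
```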

 

Task 3.

In this task, you will implement exception handling for the methods in DecisionTree, as well as for the methods in the three classes you developed in Assignment 3. Specifically, the methods in the following four classes require proper exception handling for all their edge cases:

 

    • DecisionTree.java (from the current assignment)
    • EntropyEvaluator.java (from Assignment 3)
    • GainInfoItem.java (from Assignment 3)
    • InformationGainCalculator.java (from Assignment 3)

 

For Task 3, we do not anticipate that you will need to define any new exception classes. The exception classes already provided by Java should suffice. In particular, the following exception classes are probably all that you need: IOException, IllegalArgumentException, ArrayIndexOutOfBoundsException, IllegalStateException, and NullPointerException.
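As an illustration only (the method name and the specific checks below are examples, not requirements taken from the template), an input-validating guard might look like this:

```java
// Hypothetical validation helper; the name and checks are examples only.
class Guards {
    static void requireNonEmptyRows(Object dataSet, int numberOfRows) {
        if (dataSet == null) {
            // Null references are an edge case for nearly every method.
            throw new NullPointerException("dataSet cannot be null");
        }
        if (numberOfRows <= 0) {
            // An empty dataset cannot be split or evaluated.
            throw new IllegalArgumentException(
                "dataset must contain at least one row, got: " + numberOfRows);
        }
    }
}
```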

 

Implementation

As in previous assignments, you cannot change any of the method signatures. You cannot add new public methods or variables either. You can, however, add new private methods to improve the readability or the organization of your code.

Guidance is provided in the template code in the form of comments. For the DecisionTree class, the locations where you need to write code have been clearly indicated with an inline comment that reads as follows:

// WRITE YOUR CODE HERE!

For Task 3 (exception handling), you need to decide where and how to update the code, based on what you have learned during the lectures and the labs. No guidance is provided in the code for exception handling.


Files

  • README.txt

– A text file that contains the names of the two partners for the assignment, their student IDs, section, and a short description of the assignment (one or two lines).

  • ActualDataSet.java
  • Attribute.java
  • AttributeType.java
  • CSVReader.java
  • DataReader.java
  • DataSet.java
  • DecisionTree.java (Aside from exception handling, the new code you write in Assignment 4 is localized to DecisionTree.java)
  • EntropyEvaluator.java
  • GainInfoItem.java
  • InformationGainCalculator.java
  • StudentInfo.java (Make sure to update the file, so that the display() method shows your personal information).
  • Util.java
  • VirtualDataSet.java