Have you played that guessing game where it asks you 20 questions and then guesses who you are thinking of? http://en.akinator.com/personnages/
In decision trees we use a criterion to split our data into parts and finally assign each sample to a class. Like in Akinator, if they ask 20 questions and each question has 2 options (yes or no), we can tell apart 2^20 people at best.
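The arithmetic behind that 2^20 figure can be checked with a small sketch. The function names here are made up for illustration, and it assumes every question is a strict yes/no question.

```python
def max_distinguishable(num_questions: int) -> int:
    """Upper bound on how many items yes/no questions can tell apart."""
    return 2 ** num_questions


def min_questions(num_items: int) -> int:
    """Smallest depth of a yes/no tree with at least num_items leaves."""
    # (n - 1).bit_length() is ceil(log2(n)) for n >= 1, using integers only.
    return (num_items - 1).bit_length()


print(max_distinguishable(20))  # 1048576
print(min_questions(10))        # 4
```

So 10 people need at least 4 questions, and only if every question splits the remaining candidates roughly in half.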
Questions are asked in a way that gives the best split. Say we have 10 people and we want the tree to have minimum depth; we would like to ask a question which splits this data into equal halves. In machine learning terminology, the quality of a split is measured by information gain, which is the drop in entropy from before the split to after it. Entropy measures how mixed a set of labels is, and it is highest when all classes are equally likely. Other measures can be used in place of entropy, e.g. Gini impurity, or variance reduction for regression trees. In layman's terms, this function just returns how well a question will split the data at a particular node.
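To make this concrete, here is a minimal sketch of entropy, Gini impurity and information gain for a yes/no split. The function names and the example labels are made up for illustration; real libraries have their own implementations.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Hypothetical labels: 5 people who go out to play, 5 who do not.
parent = ["yes"] * 5 + ["no"] * 5

# A question that separates the classes perfectly...
perfect_gain = information_gain(parent, ["yes"] * 5, ["no"] * 5)
# ...versus a question that leaves both halves as mixed as before.
useless_gain = information_gain(parent,
                                ["yes", "yes", "no", "no"],
                                ["yes"] * 3 + ["no"] * 3)
```

Here the parent has entropy 1 bit and Gini impurity 0.5. The perfect question recovers the whole bit as gain, while the useless one gains nothing, so the tree would pick the first.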
Algorithms like CART and ID3 are popular for building decision trees. This technique is particularly useful when we have nominal data, i.e. values with no order or distance between them, so two values can only be checked for equality. A popular toy example: given the weather (rainy, sunny and so on), how likely is someone to go out and play? We enumerate the observed combinations of conditions and the outcome associated with each one.
Then we create a tree where the columns of the table become the questions we ask at each node. The order in which we ask the questions is learned from the training data.
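The idea can be sketched in a few lines of Python in the style of ID3. This is a simplified illustration, not a full implementation (no pruning, no handling of unseen values). The table, the column names (outlook, windy, play) and the function names are all made up for this example.

```python
from collections import Counter
import math


def entropy(rows, target):
    """Entropy (in bits) of the target column over the given rows."""
    total = len(rows)
    counts = Counter(row[target] for row in rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_attribute(rows, attributes, target):
    """Pick the column whose split gives the highest information gain."""
    def gain(attr):
        remainder = 0.0
        for value in {row[attr] for row in rows}:
            subset = [r for r in rows if r[attr] == value]
            remainder += len(subset) / len(rows) * entropy(subset, target)
        return entropy(rows, target) - remainder
    return max(attributes, key=gain)


def build_tree(rows, attributes, target):
    """Return a leaf label, or a nested dict {column: {value: subtree}}."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # majority label
    attr = best_attribute(rows, attributes, target)
    remaining = [a for a in attributes if a != attr]
    return {attr: {value: build_tree([r for r in rows if r[attr] == value],
                                     remaining, target)
                   for value in {r[attr] for r in rows}}}


def classify(tree, row):
    """Walk the tree, answering one question per node, until a leaf is hit."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree


# Hypothetical training table: each column is a possible question.
rows = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no",  "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
tree = build_tree(rows, ["outlook", "windy"], "play")
answer = classify(tree, {"outlook": "rainy", "windy": "yes"})
```

On this table the tree asks about outlook first, because that column gives the larger information gain, and only asks about wind when the outlook is rainy. Note that classify raises a KeyError for a value that never appeared in training; a real implementation would need a fallback.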
Follow this tutorial for a computational example
For working code in Java