Sunday, March 27, 2016

Similarity in Classification

Classification and sequencing of data are often based on similarity, but not always. In alphabetical and numerical order, for example, position is determined by the letter or number rather than by similarity. Even so, similarity remains a popular tool for classification.

Take sports as an example. They can be placed into a number of categories, including ball sports, aquatic sports, winter sports and team sports. In some cases, the boundary between one category and another isn't so clear. Tennis, for example, can be classified as a team sport (doubles, mixed doubles) but also as a ball sport. Water polo can be classified as either an aquatic sport or a ball sport.

Sometimes the same data can be sequenced in different ways. If we are asked to sequence white, grey and black, we may go from lightest to darkest (white, grey, black) or from darkest to lightest (black, grey, white). Both orders are equally valid. However, if we have to sequence hair colours, we'll probably start with black hair and then move on to grey hair and white hair. The reason is that black hair is associated with youth and white hair with old age.

The order changes, though, when we sequence clouds. In this case we'll probably start with white clouds and then continue with grey clouds and black clouds. The reason is that grey and black clouds are associated with rain, and black clouds are considered the most likely to bring heavy rain.

Data can be classified and sequenced in a number of ways. Sometimes the boundary between one category and another isn't so clear. The same data may also be classified and sequenced differently depending on the context and the criteria used.

