Which Machine Learning Methods to Use? An Overview

You might have read books, journals, and blogs or watched YouTube videos about this topic, and each of them lists a different set of Machine Learning methods. Ever wondered which one to use, and when? Which one would be the most appropriate?

Well, the answer depends on what you are seeking. Let us have a look at various methods and how they compare with each other.

What are the typical methods of data analysis and machine learning?

Here, we do not draw a line between statistical tests, multivariate analysis, and other machine learning methods; they are all treated together.

Here is a list of various methods you might come across in books, articles, etc.

Association analysis
Cluster analysis
Correspondence analysis
Cross-tabulation
Decision tree, regression tree
Discriminant analysis
Ensemble learning
Factor analysis
k-nearest neighbors method
Multidimensional scaling
Neural network
Principal component analysis/singular value decomposition
Regression analysis
Self-organizing map
Statistical test
Support vector machine
Survival analysis
Time series analysis
And many others…

Machine learning methods can be broadly divided into four types

The four types come from two questions: is an answer given (supervised or unsupervised), and do you want to predict category values or something else?

Goal	Unsupervised	Supervised
Category values	Clustering	Classification (discrimination)
Other (continuous values when supervised)	Dimension reduction	Regression

Here,
Category value: A value indicating a category, such as YES / NO or a prefecture.
Continuous value: A numeric value such as sales or temperature. (Sales figures are integers, but they are treated as effectively continuous.)

Also,
Supervised: An objective variable is given as the answer.
Unsupervised: No objective variable is given as an answer.

As for the variables,
Objective variable: The value to be determined. Also called the ‘dependent variable’.
Explanatory variable: A value used to explain the objective variable. Also called an ‘independent variable’ or ‘feature’.
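The four types described above can be written down as a small lookup. This is only a sketch in Python: the function name `task_type` and its two boolean parameters are hypothetical, chosen for illustration.

```python
def task_type(has_objective_variable: bool, predicts_category: bool) -> str:
    """Return which of the four broad types a problem falls into."""
    if has_objective_variable:
        # Supervised: an answer (objective variable) is available for training.
        return "classification" if predicts_category else "regression"
    # Unsupervised: no answer is given, so we look for structure in the data.
    return "clustering" if predicts_category else "dimension reduction"


print(task_type(True, True))    # e.g. YES / NO with labeled examples
print(task_type(True, False))   # e.g. predicting sales or temperature
print(task_type(False, True))   # e.g. grouping records without labels
print(task_type(False, False))  # e.g. compressing many variables into a few
```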

So when should I use which?

Unsupervised learning:
- Group similar data → Cluster analysis
- Reduce the dimensions → Principal component analysis
- Besides these, there are other methods such as the self-organizing map.
Supervised learning:
- With supervised data, we can narrow down the method by our priorities. Do we need a basis for explaining the prediction, or do we care most about prediction accuracy?
- If you need explanatory ability (useful for business use):
  - Regression analysis, decision trees, and regression trees are mainly used
- When you need prediction accuracy:
  - Ensemble learning or support vector machines are mainly used
- When explanatory variables are not clear, such as image recognition
  - Deep learning (a part of neural networks) is popular lately
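To make "explanatory ability" concrete, here is a minimal sketch of regression analysis with a single explanatory variable, written in plain Python. The data (advertising spend versus sales) and the function name `fit_line` are made up for illustration; a real project would normally rely on a statistics or machine learning library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (explanatory) and sales (objective).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(ad_spend, sales)
print(f"sales = {intercept:.1f} + {slope:.1f} * ad_spend")
```

The fitted slope says how much the predicted sales change for each extra unit of spend. That direct reading is what a black-box model does not offer.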

A comparison of machine learning methods

Method	Discrimination (classification)	Regression	Explanatory power	Prediction accuracy	Remarks
Regression analysis	Bad	Good	Very good	So-so	A regression equation is obtained, so it is clear how much each explanatory variable affects the predicted value.
Decision tree, regression tree	Good	Good	Although no regression equation can be obtained, the influence of explanatory variables can be known	–	–
Neural network	Good	Good	Bad	Depends (see remarks)	The model is a black box, so it offers essentially no explanation. There is a limit to prediction accuracy, but it can be greatly improved by deep learning. In other words, it depends on the model and data.
Support vector machine	Good	Good	So-so	Good	It was the technique that drew attention before ensemble learning came out.
Ensemble learning	Good	Good	So-so	Good	Creates multiple learning models and takes an average or majority decision. The explanation is absent or very weak, but the accuracy is very good.
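The "average or majority decision" step of ensemble learning can be sketched in a few lines of Python. The model outputs below are hypothetical placeholders; training the individual models is left out, and the helper names `majority_vote` and `average` are chosen for illustration only.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine class labels from several models by majority decision."""
    # most_common(1) returns the (label, count) pair with the highest count.
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Combine numeric predictions from several models by averaging."""
    return sum(predictions) / len(predictions)


# Hypothetical outputs of three separately trained models for one sample.
class_votes = ["yes", "no", "yes"]
value_predictions = [98.0, 102.0, 100.0]

print(majority_vote(class_votes))   # classification: majority decision
print(average(value_predictions))   # regression: average
```

Because the final answer blends several models, no single model's reasoning explains it, which is one reason the explanatory power is rated lower than for regression analysis.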

Other
- Some methods are commonly used for specific applications
  - Data in chronological order
    - Time series analysis, state space model
  - Recommendations such as "people who bought this also bought..."
    - Association analysis
- A combination of techniques may also be used
  - For example, when there are many explanatory variables, use principal component analysis to reduce the dimensions (and so the number of variables), then apply a discrimination or regression method.
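As an illustration of association analysis for purchase recommendations, the sketch below computes two commonly used measures, support and confidence, in plain Python. The baskets are made-up data, and the function names are assumptions for this example rather than part of any particular library.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing `antecedent`, the share that also contain `consequent`."""
    both = support(transactions, set(antecedent) | set(consequent))
    # Assumes the antecedent appears in at least one transaction.
    return both / support(transactions, antecedent)


# Hypothetical purchase histories, one list per basket.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["bread", "milk"],
    ["milk", "jam"],
]

print(support(baskets, ["bread", "butter"]))       # 2 of the 4 baskets
print(confidence(baskets, ["bread"], ["butter"]))  # 2 of the 3 bread baskets
```

A rule such as "bread → butter" with high confidence is the kind of result that can drive a recommendation for people who bought bread.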

Conclusion

We took a quick look at various popular methods available in Machine Learning. You can compare them or try them yourself to see which one works best for you. Was it useful? Would you like more insights? Please comment below!
