Confused as to which ML method to use?

Which Machine Learning Method Should You Use? An Overview

You have probably read books, journals, and blogs or watched YouTube videos on this topic, and each of them lists a different set of machine learning methods. Have you ever wondered which one to use, and when? Which one is the most appropriate?

Well, the answer depends on what you are seeking. Let us have a look at various methods and how they compare with each other.

What are the typical methods of data analysis and machine learning?

In this article, we do not distinguish between statistical tests, multivariate analysis, and other machine learning methods; they are all treated together.

Here is a list of various methods you might come across in books, articles, etc.

  • Association analysis
  • Cluster analysis
  • Correspondence analysis
  • Cross-tabulation
  • Decision tree, regression tree
  • Discriminant analysis
  • Ensemble learning
  • Factor analysis
  • k-nearest neighbors method
  • Multidimensional scaling
  • Neural network
  • Principal component analysis/singular value decomposition
  • Regression analysis
  • Self-organizing map
  • Statistical test
  • Support vector machine
  • Survival analysis
  • Time series analysis
  • And many others…

Machine learning methods can be broadly divided into four types

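If you want to predict category values:

| Unsupervised | Supervised                      |
|--------------|---------------------------------|
| Clustering   | Discrimination (classification) |

Other than the above:

| Predict continuous values (supervised) | Other than that (unsupervised) |
|----------------------------------------|--------------------------------|
| Regression                             | Dimension reduction            |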

Here,
Category value: A value that indicates a category, such as YES / NO or a prefecture.
Continuous value: A value such as sales or temperature. (Sales figures are integers, but they can be treated as effectively continuous.)

Also,
Supervised: An objective variable is given as the answer.
Unsupervised: No objective variable is given as an answer.

As for the variables,
Objective variable: The value to be determined. It is also called the 'dependent variable'.
Explanatory variable: A value used to explain the objective variable. It is also called an 'independent variable' or a 'feature'.
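To make these terms concrete, here is a minimal sketch of a supervised problem with one explanatory variable (temperature) and a continuous objective variable (sales). The figures are invented for illustration, and `fit_line` is a hypothetical helper written for this example, not a function from any library.

```python
# Invented data for illustration only.
temperature = [20.0, 22.0, 25.0, 28.0, 30.0]   # explanatory variable
sales = [200.0, 220.0, 250.0, 280.0, 300.0]    # objective variable (the "answer")


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(temperature, sales)

# Predict the objective variable for a temperature that was not in the data.
predicted = intercept + slope * 26.0
print(intercept, slope, predicted)
```

Because the objective variable is supplied along with the explanatory variable, this is supervised learning. If the `sales` list were missing, only unsupervised methods such as clustering or dimension reduction could be applied to the temperatures.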

So when should I use which?

  • Unsupervised learning:
    • Group similar data together → Clustering
    • Reduce the dimensions → Principal component analysis
    • Besides these, there are other methods, such as self-organizing maps.
  • Supervised learning:
    • With supervised data, we can narrow down the method by our priorities. Do we need a basis for explaining the prediction, or do we want the highest possible prediction accuracy?
    • If you need explanatory power (useful in business settings):
      • Regression analysis, decision trees, and regression trees are mainly used
    • If you need prediction accuracy:
      • Ensemble learning or a support vector machine is mainly used
    • If the explanatory variables are not clear, as in image recognition:
      • Deep learning (a branch of neural networks) has become popular
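The rules of thumb above can be sketched as a small lookup function. This is only an illustration of the decision flow described in this article; `suggest_methods`, its arguments, and the accepted string values are all hypothetical names chosen for this example.

```python
def suggest_methods(supervised, goal=None, priority=None, features_clear=True):
    """Return candidate methods following the rules of thumb in this article.

    All argument names and accepted values are hypothetical.
    """
    if not supervised:
        if goal == "group":
            return ["clustering"]
        if goal == "reduce_dimensions":
            return ["principal component analysis"]
        # No specific goal given: list the unsupervised options mentioned above.
        return ["clustering", "principal component analysis", "self-organizing map"]

    # Supervised learning from here on.
    if not features_clear:
        # For example image recognition, where explanatory variables are unclear.
        return ["deep learning"]
    if priority == "explanation":
        return ["regression analysis", "decision tree", "regression tree"]
    if priority == "accuracy":
        return ["ensemble learning", "support vector machine"]
    raise ValueError("priority must be 'explanation' or 'accuracy'")


print(suggest_methods(supervised=True, priority="explanation"))
print(suggest_methods(supervised=False, goal="group"))
```

A real project would rarely be decided by a lookup like this alone, but it can serve as a first filter before comparing candidate methods on your own data.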

A comparison of machine learning methods

| Method | Discrimination (classification) | Regression | Explanatory power | Prediction accuracy | Remarks |
|--------|---------------------------------|------------|-------------------|---------------------|---------|
| Regression analysis | Bad | Good | Very good | So-so | A regression equation is obtained, so it is clear how much each explanatory variable affects the predicted value. |
| Decision tree, regression tree | Good | Good | | | Although no regression equation is obtained, the influence of the explanatory variables can be seen. |
| Neural network | Good | Good | Bad | N/A (see remarks) | The model is a black box, so it offers no explanation at all. Prediction accuracy has its limits, but it can be greatly improved by deep learning. In other words, it depends on the model and the data. |
| Support vector machine | Good | Good | So-so | Good | It was the technology that attracted the most attention before ensemble learning came out. |
| Ensemble learning | Good | Good | So-so | Good | Multiple learning models are created and their results are averaged or decided by majority vote. The explanation is absent or very weak, but the accuracy is very good. |
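The idea behind ensemble learning in the last row, combining several models by majority decision or by averaging, can be sketched in a few lines. The model outputs below are invented, and `majority_vote` and `average` are hypothetical helper names, not the API of any ensemble library.

```python
from collections import Counter


def majority_vote(predictions):
    """Classification: return the label predicted by the most models.

    On a tie, the label that appears first in the list wins.
    """
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Regression: return the mean of the models' numeric predictions."""
    return sum(predictions) / len(predictions)


# Invented outputs of three separately trained models for one input.
class_predictions = ["YES", "NO", "YES"]
value_predictions = [10.0, 20.0, 30.0]

print(majority_vote(class_predictions))
print(average(value_predictions))
```

The combined answer no longer comes from a single readable model, which is one way to see why the explanatory power of an ensemble is rated lower than that of a single regression equation or decision tree.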
  • Other
    • Some methods are commonly used for specific applications
      • Data in chronological order
        • Time series analysis, state space model
      • "Customers who bought this item also bought…" recommendations
        • Association analysis
    • A combination of techniques may also be used
      • For example, when there are many explanatory variables, use principal component analysis to reduce the dimensions (and so the number of variables), then apply a discrimination or regression method.
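As an illustration of the association analysis mentioned in the list above, the following sketch computes the two standard measures of an association rule, support and confidence, on a handful of invented shopping baskets. The basket contents and the function names are assumptions made for this example.

```python
# Invented purchase data: each set is one customer's basket.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


# Rule "bread -> butter": how often do bread buyers also buy butter?
print(support({"bread", "butter"}, baskets))
print(confidence({"bread"}, {"butter"}, baskets))
```

A rule with high support and high confidence is a candidate for a "customers who bought this also bought" recommendation. With real data you would normally use a dedicated algorithm rather than scanning every basket for every rule.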

Conclusion

We took a quick look at various popular methods available in machine learning. You can compare them or try them yourself to see which one works best for you. Was it useful? Would you like more insights? Please comment below!