Best Machine Learning Models
Supervised learning:
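Logistic regression is used for classification. It takes independent variables (categorical or numeric), estimates the probability that an example belongs to a class, and thresholds that probability to produce a categorical output. In its basic form the output is binary (yes/no, pass/fail, cat/dog); multinomial variants handle more than two classes.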
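As a rough illustration of the idea, the sketch below fits a one-feature logistic regression with plain gradient descent. The data (hours studied vs. pass/fail), the learning rate, the epoch count, and the function names are all assumptions made up for this example, not part of any particular library.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every example under the current parameters.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def predict_proba(x, w, b):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(w * x + b)

def predict(x, w, b, threshold=0.5):
    """Turn the probability into a binary label."""
    return int(predict_proba(x, w, b) >= threshold)

# Made-up data: hours studied and whether the student passed (1) or failed (0).
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)
print(predict(2.0, w, b), predict(7.0, w, b))
```

The model itself outputs a probability; the yes/no answer only appears once a threshold (0.5 here, an arbitrary choice) is applied.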
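Linear regression is used for regression, i.e., predicting a continuous numeric value. The algorithm assumes a linear relationship between the input and output variables and typically fits the line (or hyperplane) that minimizes the squared prediction error.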
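For a single input variable, the least-squares line can be computed directly. The following is a minimal sketch under that assumption; the data points and the names `fit_line`, `slope`, and `intercept` are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up, slightly noisy points lying close to y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # roughly 1.97 and 1.09

# Predict the output for an unseen input.
print(slope * 6.0 + intercept)
```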
A decision tree can be used for both classification and regression. It is intuitive to interpret and fast to evaluate. A decision tree is simply a set of cascading questions: at each step it splits the data into two groups that are each as homogeneous as possible, creating branches and, at the end, leaves that hold the predictions.
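The cascading questions can be sketched as a small recursive classifier. This version assumes numeric features, threshold questions of the form "is feature f <= t?", and Gini impurity as the measure of how mixed a group is. Those are common choices but only one of several possibilities, and the cat/dog measurements are made up.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 when all labels agree, larger when they are mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini the most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Split recursively; a leaf is just the majority label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the questions from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Made-up data: (weight in kg, ear length in cm).
rows = [(4.0, 6.0), (5.0, 7.0), (3.5, 6.5), (20.0, 10.0), (30.0, 12.0), (25.0, 11.0)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

tree = build_tree(rows, labels)
print(tree)
print(predict(tree, (4.2, 6.0)), predict(tree, (28.0, 11.0)))
```

On this toy data a single question about weight is enough, so the tree has one branch point and two leaves.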
Random forests, like decision trees, can be used for both classification and regression. A random forest consists of many decision trees, each built from a random sample of the data and a random subset of the features, that operate as an ensemble: their predictions are combined by majority vote (classification) or averaging (regression). Random forests frequently outperform a single decision tree because a large number of loosely correlated trees protect each other from individual errors.
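To make the ensemble idea concrete, here is a deliberately simplified sketch. Each "tree" is only a one-question stump, trained on a bootstrap sample (rows drawn with replacement) and a single randomly chosen feature, and the forest predicts by majority vote. Real random forests grow much deeper trees; the stump simplification, the data, the seed, and all function names are assumptions for this example.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature):
    """One-question tree on a single feature: (feature, threshold, left_label, right_label)."""
    # Fallback when no split helps: every row goes left and gets the majority label.
    best = (feature, float("inf"), majority(labels), majority(labels))
    best_score = gini(labels)
    n = len(rows)
    for t in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_score = score
            best = (feature, t, majority(left), majority(right))
    return best

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # sample rows with replacement
        feature = rng.randrange(len(rows[0]))            # pick one feature at random
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Every stump votes; the most common label wins."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)

# Made-up data: (weight in kg, ear length in cm).
rows = [(4.0, 6.0), (5.0, 7.0), (3.5, 6.5), (20.0, 10.0), (30.0, 12.0), (25.0, 11.0)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

forest = fit_forest(rows, labels)
print(predict_forest(forest, (3.0, 5.0)), predict_forest(forest, (35.0, 13.0)))
```

Because each stump sees different rows and a different feature, their mistakes tend not to coincide, which is what lets the vote correct individual errors.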
Unsupervised learning:
K-means clustering is an iterative technique that attempts to split a dataset into K separate and non-overlapping clusters (subgroups) such that each data point is part of only one of these clusters. The idea is to make the data points belonging to the same subgroup as similar as possible while keeping the clusters as separate as possible from each other.
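The iteration usually alternates two steps: assign every point to its nearest cluster center, then move each center to the mean of its assigned points. The sketch below assumes that scheme, and it takes the first K points as starting centers purely to keep the example deterministic (practical implementations usually initialize randomly or more carefully). The points are made up.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iters=100):
    """Return (labels, centroids) after alternating assignment and update steps."""
    centroids = [points[i] for i in range(k)]  # simplistic start: the first k points
    labels = [None] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Made-up 2D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

labels, centroids = kmeans(points, k=2)
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)
```

Each point ends up with exactly one label, which is the non-overlapping property described above.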
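The Apriori algorithm is used for mining frequent itemsets and generating association rules from transactional data. Association rules describe how strongly items are related, i.e., how often they occur together, typically measured by support and confidence. The frequent itemsets behind these rules are found with a breadth-first (level-wise) search: frequent itemsets of size k are used to build the candidates of size k+1.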
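A minimal sketch of the level-wise search follows, assuming the usual definitions: support is the fraction of transactions containing an itemset, and the confidence of a rule A -> B is support(A and B) / support(A). The shopping baskets, the thresholds, and the function names are invented for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while level:  # breadth-first: finish all size-k itemsets before size k+1
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent itemsets into candidates of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                 and support(c) >= min_support]
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), confidence))
    return rules

# Made-up shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = apriori(baskets, min_support=0.6)
rules = association_rules(frequent, min_confidence=0.9)
print(len(frequent))  # 8 frequent itemsets: 4 single items and 4 pairs
print(rules)          # [({'beer'}, {'diapers'}, 1.0)]
```

The prune step is what keeps the search tractable: any itemset containing an infrequent subset can be discarded without counting it.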