Often when I give a talk about machine learning, I get asked what the difference is between data mining and machine learning, which got me thinking about how the two differ. Data mining has been implemented as a tool in databases for a while. SSIS even has a data mining task to run prediction queries on an SSAS data source. Machine learning is commonly represented by Google’s self-driving car. After reading the article I linked about Google’s car, or studying the two disciplines, one can come to the understanding that they are not all that different. Both require the analysis of massive amounts of data to come to a conclusion. Google’s car uses that information to decide whether to stop or go. In data mining, software is used to identify patterns in data, which are then used to classify the data into groups.
Data Mining is a subset of Machine Learning
There are four general categories of Machine Learning: Anomaly Detection, Clustering, Classification, and Regression. To determine the results, algorithms are run against data to find the patterns that the data contains. The set of algorithms used in data mining tends to be more limited than the set used in machine learning. In essence, all data mining is machine learning, but not all machine learning is data mining.
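To make the four categories a little more concrete, here is a minimal sketch in plain Python. The function names, the sample numbers, and the simple techniques chosen (a z-score rule, a tiny one-dimensional k-means, a nearest-class-mean rule, and a least-squares line) are my own illustrative assumptions, not the algorithms any particular product uses.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Anomaly detection: flag values far from the mean (z-score rule)."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

def cluster_1d(values, k=2, iterations=10):
    """Clustering: a tiny 1-D k-means over unlabeled values (needs k >= 2)."""
    ordered = sorted(values)
    # Seed the centers with evenly spaced points from the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return groups

def classify(value, labeled):
    """Classification: assign the label whose class mean is closest."""
    by_label = {}
    for x, label in labeled:
        by_label.setdefault(label, []).append(x)
    return min(by_label, key=lambda lab: abs(value - mean(by_label[lab])))

def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx
```

Each function answers a different kind of question: which values look unusual, which values belong together, which known group a new value fits, and what number to expect next. Real tools use far more capable algorithms, but the questions stay the same.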
Goals of Machine Learning
There are some people who will argue that there is no difference between the two disciplines, as algorithms such as Naïve Bayes and decision trees are common to both, as is the process of finding the answers. While I understand the argument, I tend to disagree. Machine learning is designed to give computers the ability to learn without being explicitly programmed to do so, by extrapolating from the large amounts of data fed to them to come up with results which fit the patterns in that data. The goal of machine learning is what differentiates it from data mining, as it is designed to find meaning in the data based upon the patterns identified in the process.
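As a rough illustration of learning from data rather than from hand-written rules, here is a small Naïve Bayes sketch in plain Python. The customer records, the feature values, and the function names are all hypothetical, invented only for this example. Notice that no rule such as "monthly customers leave" appears anywhere in the code; the prediction comes entirely from counts gathered during training.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count how often each feature value appears with each label."""
    label_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return label_counts, value_counts, vocab

def predict(model, row):
    """Pick the label with the highest Laplace-smoothed probability."""
    label_counts, value_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        score = count / total  # prior probability of the label
        for i, value in enumerate(row):
            seen = value_counts[(i, label)][value]
            # Add-one smoothing so an unseen value never zeroes out the score.
            score *= (seen + 1) / (count + len(vocab[i]))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training data: (contract type, number of support calls).
rows = [
    ("monthly", "many"), ("monthly", "many"), ("monthly", "few"),
    ("yearly", "few"), ("yearly", "few"), ("yearly", "many"),
]
labels = ["leaves", "leaves", "stays", "stays", "stays", "stays"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("monthly", "many")))
print(predict(model, ("yearly", "few")))
```

Feeding the same functions a different set of records would produce different predictions without changing a line of code, which is the sense in which the computer "learns" here.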
Deriving Meaning from the Data
As more and more data is gathered, the goal of turning data into information is being widely pursued. The tools to do this have greatly improved as well. Just as Lotus 1-2-3 bears little resemblance to today’s spreadsheets, the tools that were initially used to create machine learning experiments bear little resemblance to the tools available today. As the science behind the study of data continues to improve, more and more people are taking advantage of new tools such as Azure Machine Learning to use data to answer all sorts of questions, from which customer is likely to leave (also known as customer churn) to whether it is time to shut down a machine for maintenance. Whatever you choose to call it, it’s a fascinating topic, and one I plan on spending more time pursuing.
Yours Always
Ginger Grant
Data aficionado et SQL Raconteur