As a professional using machine learning to analyze your business data, you take on two distinct roles. Of course you are a data analyst: you’re analyzing business data. But when you’re using machine learning, and in particular supervised learning, which is what we’re studying in this course, you are also a teacher.
First up, we need to talk about what machine learning is. What is the whole concept? I am guessing that you might know a bit about machine learning, or at a very minimum you’ve heard the term on social media, on the news, or in a blog. So here’s the definition, and there are many definitions, this is just one: machine learning is the field of study which gives computers the capability to learn without being explicitly programmed.
So let’s translate this into some brass-tacks ideas. First up, as the data analyst, you use computer code written by others. Where the definition says “explicitly programmed”, this is one aspect of it: you’re not actually going to write the machine learning code itself. That has already been written by other people. What you do is take that code and provide it with historical data, your business data. The code then uses the data to, quote unquote, learn patterns in the data. It looks at the data, executes some logic, does some magic, and eventually returns the learned patterns in a bundle, a reusable package. That’s what’s known as a model, a machine learning model, and you can use these models to create predictions.
Now, you will have to write some code. You need to write code to pull in your historical data, and code to wrangle, massage, and transform that data into a format that’s usable by the machine learning code written by other people. You then also have to stitch all of these processes together: pulling in the data, massaging it, transforming it, and linking it up to the code written by others. So you will write code, but you’re not explicitly writing the machine learning code, and you’re not explicitly writing code for the model. That is the magic of machine learning.
Beyond the cool insights machine learning gives you into the business, there is also this idea that you can get a lot of power: you can stand on the shoulders of giants without writing a bunch of code yourself. This will become far more clear as you progress through the course.
As I mentioned previously, when you use machine learning to analyze your business data you are more than a data analyst; you are also a teacher. So from time to time throughout the course, we’re going to use a classroom analogy like the one depicted here. What you see here is little Jimmy. He’s with his friends in class.
The teacher is in the foreground, and Jimmy has guessed incorrectly from a picture book that an animal is an elephant when in fact it’s an armadillo. This is the kind of analogy that I want you to keep in the back of your mind throughout this course: you are a teacher, and the computer, as we’ll see in a second, is the student.
If you go to the Wikipedia page for machine learning, you will find that machine learning is an umbrella term that encompasses many areas of study. Machine learning, also known as statistical learning in the statistics community, is wildly useful stuff, so not surprisingly it is an area of active academic research as well as industry research. A lot is going on in machine learning, generally speaking. However, the focus of this course will be one aspect of machine learning, which is known as supervised learning, and once again the easiest way to think about it is with this classroom metaphor. In this metaphor you are the supervisor, the teacher, managing the process. As I have mentioned previously, little Jimmy here is the machine: the student, the thing that learns.
So that’s it at a very high level. When you break it down, there are three constituent elements to this overall supervised learning process, and I can show this as a Venn diagram here. First up, you have data. Data is by far and away the single most important thing; it is the raw material of machine learning. It doesn’t matter how awesome your algorithm is, how awesome your training regimen is, or how much you really want to produce a useful machine learning model: if you don’t have the data, it doesn’t matter. Data trumps everything.
Next you have algorithms; we’re going to go into a little more detail on what algorithms are, in case you don’t know. We’ll also talk briefly about training. Training is a cornerstone of a later section of the course on how you make awesome decision tree models, so we’ll focus a lot on it there. The intersection of all three of these things, data plus algorithm plus training regimen, produces a model, and as I said before, a model is essentially just a pre-bundled package of goodness.
You can use that package to create predictions, and the predictions are based on the combination of the data that you fed the algorithm and the way you trained the algorithm to produce your model.
Data is the easiest thing to think about in all of this: it’s just a data frame. Here’s an example data frame for one of the data sets that we will be using in the lectures of this course. It shows you some rows of data and some columns, and that is all the data really is. Later on in the course we will talk a little about how you need to massage, transform, and engineer the data to be as useful as possible to machine learning algorithms.
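If it helps to see the rows-and-columns idea written down, here is a small sketch. It is only an illustration: it uses Python’s standard library rather than a real data frame library, and the customer records and column names are made up, not taken from the course data sets.

```python
import csv
import io

# Made-up customer records standing in for a real business data set.
csv_text = """customer_id,age,plan,renewed
101,34,basic,yes
102,51,premium,yes
103,27,basic,no
"""

# Each row becomes a dict keyed by column name:
# rows are individual records, columns are the fields each record has.
rows = list(csv.DictReader(io.StringIO(csv_text)))
columns = list(rows[0].keys())

print(f"{len(rows)} rows x {len(columns)} columns")
print("columns:", columns)

# Pulling out one column gives every customer's value for that field.
ages = [int(row["age"]) for row in rows]
print("age column:", ages)
```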
For now, this intuition about data is all you need. After data, the next important concept that we need to discuss is an algorithm. So what is an algorithm? It is simply a well-defined procedure or formula that takes in input and produces output. Now that’s kind of abstract, so let’s try to work a little more intuition into this. You can think of an algorithm like a detailed recipe that the computer follows in order to perform a task. For example, if you’re like me and you’re not a very good cook, you have a cookbook, you open it up to a particular recipe, and it tells you all the things that you need and exactly what to do. That’s what an algorithm is to a computer: a set of instructions that it follows.
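The recipe idea can be sketched in a few lines of code. This is an ordinary algorithm with no machine learning in it, chosen only to show input going in, fixed steps being followed, and output coming out. The function name `top_sellers` and the order data are hypothetical.

```python
from collections import Counter


def top_sellers(orders, n):
    """Return the names of the n best-selling products, most units first."""
    units_by_product = Counter()
    # Step 1: add up the units sold for each product.
    for product, units in orders:
        units_by_product[product] += units
    # Step 2: rank the products by total units, highest first.
    ranked = units_by_product.most_common(n)
    # Step 3: return just the product names.
    return [product for product, _ in ranked]


# Input: made-up (product, units sold) pairs.
orders = [("widget", 5), ("gadget", 2), ("widget", 3), ("gizmo", 1), ("gadget", 4)]

# Output: the two products with the most units sold.
best = top_sellers(orders, 2)
print(best)
```

Here every step was spelled out by hand, just like a recipe in a cookbook. That is the contrast with machine learning, where the steps for finding the pattern are already written and you supply the data.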
Since machine learning is, as I said on the last slide, wildly useful and well researched, it is no surprise that there are many, many algorithms in use today. Some of them are machine learning and some of them are not. Algorithms are everywhere; they are essentially the cornerstone of so much that we take for granted in the modern economy. For example, not surprisingly, Facebook has algorithms to do all kinds of things, like deciding which posts to show you in your Facebook feed. Amazon has lots and lots of algorithms. As you might imagine, being the world’s largest online retailer by far, they have tons of algorithms to help you find items that you might like while you’re shopping. Because their catalog is just so big, you’re not necessarily going to find everything you might want, or things that you don’t even know you want. So Amazon’s algorithms help you out with that automatically.
When it comes to this course, here are the key points to remember. One, there are many machine learning algorithms. In this course we’re going to study two: individual decision trees and an algorithm known as the random forest. There are many more machine learning algorithms than this; however, these two are, in my opinion, the best place for any professional to start.
If you want to learn other algorithms after this course, awesome; I would actively encourage you to do that. As I’ve already mentioned, as the data analyst you use these machine learning algorithms with your historical data. Once again, the combination of algorithm, data, and training regimen produces the model, and creating useful models is the focus of this course. Next up.