1-4 Supervised Learning

03-03

In this class The teacher is going to define what is probably the most common type of machine learning problem, which is supervised learning. Hell define supervised learning more formally later, but its probably best to explain or start with an example of what it is and well do the formal definition later. Lets say you want to predict housing prices. A while back(之前), a student collected data sets from the Institute of Portland Oregon(波特蘭俄勒岡研究所) . And lets say you plot(以圖表畫出，製圖;) a data set and it looks like this. Here on the horizontal axis(水平軸/線;), the size of different houses in square feet, and on the vertical axis(豎軸), the price of different houses in thousands of dollars. So. Given this data, lets say you have a friend who owns a house that is, say 750 square feet and hoping to sell the house and they want to know how much they can get for the house. So how can the learning algorithm help you? One thing a learning algorithm might be able to do is put a straight line through the data or to fit a straight line to the data and, based on that, it looks like maybe the house can be sold for maybe $150,000. But maybe this isnt the only learning algorithm you can use. There maght be a better one. For example, instead of sending a straight line to the data, we might decide that its better to fit a quadratic(二次的;) function or a second-order polynomial(二次多項式) to this data. And if you do that, and make a prediction here, then it looks like, well, maybe we can sell the house for closer to $200,000. One of the things well talk about later is how to choose and how to decide do you want to fit a straight line to the data or do you want to fit the quadratic function to the data and theres no fair picking(明確的選擇) whichever one gives your friend the better house to sell. But each of these would be a fine example of a learning algorithm. So this is an example of a supervised learning algorithm.

And the term supervised learning refers to the fact that we gave the algorithm a data set in which the right answer were given. That is, we gave it a data set of houses in which for every example in this data set, we told it what is the right price so what is the actual price that, that house sold for and the toss of the algorithm was to just produce more of these right answers such as for this new house, you know, that your friend may be trying to sell. To define with a bit more terminology this is also called a regression problem(回歸問題) and by regression problem the teacher means were trying to predict a continuous value output. Namely the price. So technically I guess prices can be rounded off(圓滿結束；) to the nearest cent. So maybe prices are actually discrete values(離散值)， but usually we think of the price of a house as a real number, as a scalar(標量的; 梯狀的; 分等級的; 數量的;) value, as a continuous value number and the term regression refers to the fact that were trying to perdict the sort of continuous value attribute(連續值屬性的種類)

Heres another supervised learning example, some friends and I were actually working on this eariler. Lets see you want to look at medical records and try to predict of a breast cancer as malignant(惡性) or benign. If someone discovers a breast tumor(瘤;), a lump in their breast, a malignant tumor is a tumor that is harmful and dangerous and a benign tumor is a tumor that is harmless. So obviously people care a lot about this. Lets see a collected data set and suppose in your data set you have on your horizontal axis the size of the tumor and on the vertical axis Im going to plot one or zero, yes or no, whether or not these are example of tumors weve seen before are malignant-which is one-or zero if nor malignant or benign. So lets say our data set looks like this where we saw a tumor of this size that turned out to be benign. One of this size, one of this size. And so on. And sadly we also saw a few malignant tumors, one of that size, one of that size, ... So on. So this example... I have five examples of benign tumors shown down here, and five examples of malignant tumors shown with a vertical axis value of one. And lets say we have a friend who tragically has a breast tumor, and lets say her breast tumor size is maybe somewhere around this value. The machine learning question is, can you estimate what is the probability, what is the chance that a tumor is malignant versus(對抗；) benign?

To introduce a bit more terminology this is an example of a classification problem. The term classification refers to the fact that here were trying to predict a discrete value output: zero or one, malignant or benign. And it turns out that in classification problems sometimes you can have more than two values for the two possible values for the output. As a concrete example maybe there are three types of breast cancers and so you may try to predict the discrete value of zero, one, two, or three with zero being benign. Benign tumor, so no cancer. And one may mean, type one cancer, like, you have three types of cancer, whatever type one means. And two may mean a second type of cancer, a three may mean a third type of cancer. But this would also be a classification problem, because this other discrete value set of output corresponding to, you know, no cancer, or cancer type one cancer type two, cancer type three. In calssification problems there is another way to plot this data. Let me show you what I mean. Let me use a slightly different set of symbold to plot this data.

So if tumor size is going to be the attribute that Im going to use to predict malignancy or benignness, I can also draw my data like this. Im going to use different symbols to denote(代表; 指代; 預示; 意思是;) my benign and malignant, or my negative and positive examples. So instead of drawing crosses, Im now going to draw Os for the benign tumors. Like so. And Im going to keep using Xs to denote my malignant tumors. Okay? I hope this is benigning to make sense. All I did was I took, you know, these, my data set on top and I just mapped it down(映射下來). To this real line like so. And started to use different symbols, circles and crosses, to denote malignant versus benign examples. Now, in this example we use only one feature or one attribute, mainly, the tumor size in order to predict whether the tumor is malignant or benign.

In other machine learning problems when we have more than one feature, more than one attribute. Heres an example. Lets say that instead of just knowing the tumor size, we know both the age of the patients and the tumor size. In that case maybe your data set will look like this where I may have a set of patients with those ages and that tumor size and they look like this. And a different set of patients, they look a little different, whose tumors turn out to be malignant, as denoted by crosses.

So, lets say you have a friend who tragically has a tumor. And maybe, their tumor size and age falls around there. So given a data set like this, what the learning algorithm might do is throw the straight line through the data to try to separate out the malignant tumors from the benign ones and, so the learning algorithm may decide to throw the straight line like that to separate the straight out the two classes of tumors. And. You know, with this, hopefully you can decide that your friends tumor is more likely to if its over there, that hopefully your learning algorithm will say that your friends tumor falls on this benign side and is therefore more likely to be benign than malignant. In this example we had two features, namely the age of the patient and the size of the tumor. In other machine learning problems we will often have more features, and my friends that work on this problem, they actually use other features kile these, which is clump thickness(腫塊厚度;腫塊密度;) of the breast tumor. Uniformity(均勻性;) of sell size of the tumor. Uniformity of sell shape of tumor, and so on, and other features ae well. And it turns out one of the interesting learning algorithms that well see in this class is a learning algorithm that can deal with, not just two or three or five features, but an infinite(無窮的) number of features. On this slide(幻燈片), Ive listed a total of five different features. Right, two on the axes(軸) and three more up here. But it turns out that for some learning problems, what you really want is not to use, like, three or five features. But instead, you want to use an infinite number of attributes, so that your learning algorithm has lots of attributes or features or cues(線索) which to make those predictions. So how do you deal with an infinite number of features. How do you even store an infinite number of things on the computer when your computer is gonna run out of memory. It turns out that when we talk about an algorithm called the Support Vector(支持向量機) Machine, there will be a neat mathamatical trick(簡潔的數學方法) that will allow a computer to deal with an infinite number of features. Imagine that I didnt just write down two features here and three features on the right. But, image that I wrote down an infinitely long list, I just kept writing more and more and more features. Like an infinite long list of features. Turns out, well be able to come up with an algorithm that can deal with that.

So, just to recap(概括). In this class well talk about supervised learning. And the idea is that, in supervised learning, in every example in our data set, we are told what is the correct answer that we would have quite liked the algorithms have predicted on that example. Such as the price of the house, or whether a tumor is malignant or benign. We also talked about the regression problem. And by regression, that means that our goal is to predict a continuous valued output. And we talked about the classification problem, where the goal is to predict a discrete value output. Just a quick wrap(纏繞;盤繞;) up question:

Suppose youre running a company and you want to develop learning algorithms to address each of two problems. In the first problem, you have a large inventory(清查; 存貨清單;) of identical(相同的) items. So imagine that you have thousands of copies of some identical items to sell within the next three months. In the second problem, problem two, youd like-- you have lots of users and you want to write software to examine each individual of your customers accounts. so each one of your customers accounts; and for each account, decide whether or not the account has been hacked or compromised. So, for each of these two problem, should they be treated as a classification problem, or as a regresssion problem?

For problem one, I would treat this as a regression problem, because if I have, you know, thousands of items, well, I would probably just treat this as a real value, as a continuous value. And treat, therefore, the number of items I sell, as a continuous value. And for the second problem, I would treat that as a classification problem, because I might say, set the value I want to predict with zero, to denote the account has not been hacked. And set the value one to denote an account that has been hacked into. So just like, you know, breast cancer, is, zero is benign, one is malignant, So I maght set this be zero or one depending on whether its been hacked, and have an algorithm try to predict each one of these two discrete values. And because theres a small number of discrete values, I would therefore treat it as a classification problem. So, thats it for supervised learning and in the next class the teacher will talk about unsupervised learning, which is the other major category of learning algorithms.

Thanks! see you tomorrow!