
Lecture 5.1. Basic Statistics and Critical Appraisal


Summary

This on-demand teaching session is relevant to medical professionals and covers the fundamentals of basic statistics and critical appraisal. Specifically, it looks at the different types of variables, defining independent and dependent variables, understanding the p-value, and the importance of sample size. This lecture is led by a medical student from the University of Manchester and is designed to be easy to understand and follow for all medical professionals, regardless of experience.

Generated by MedBot

Description

Week 5: ‘Basic Statistics and Critical Appraisal’ Part 1 by Sayan Biswas, 4th Year Medical Student

Feedback and certificates:

  • As part of this course, we want to continuously evaluate its success by receiving feedback from our audience.

Pre-Lecture Questionnaire: https://forms.gle/NQ4cFeDWLErinvg9A

Post-Lecture Questionnaire: https://forms.gle/qe3wkN3fEptZCTyQ8

  • To receive a Course Official Walter E Dandy Completion Certificate, you MUST complete all Pre- and Post-Lecture Forms (link in the description of each lecture).

Learning objectives

  1. Describe the differences between continuous, categorical, ordinal, and nominal variables and how to identify them.
  2. Explain the concept of independent and dependent variables and the use of a Cartesian plane to represent them.
  3. Explain the meaning of a p-value and how to interpret a p-value less than 0.05.
  4. Explain how a hypothesis influences a research study and how it contributes to the p-value calculation.
  5. Explain the importance of sample size during a research study, and the balance between independent and dependent variables.
Generated by MedBot


Computer generated transcript

Warning!
The following transcript was generated automatically from the content and has not been checked or corrected manually.

Hi, everyone. My name is Sayan. I'm a fourth-year medical student at the University of Manchester, and today I'll be talking to you all about basic statistics and critical appraisal. I want to give a big thanks to the Walter Dandy Society for allowing me to give this lecture. In terms of disclosures, the only one I have is that I'm one of the members of the AI Wing in neurosurgery at Salford Royal Hospital, and I have no other pertinent disclosures.

So what will we be covering today? Statistics is quite a big topic, and it's pretty daunting, so I've tried to break it down into seven fundamentals that I think are key for every medical student. We'll talk about the different types of variables; what the p-value means, since it gets thrown around a lot; what mean, median and mode are; cohort demographics, which are very important; and then some of the common tests that are used in research papers and how you can analyze them.

The first step for any statistical analysis, for any paper, any publication, any research you perform, is the identification and stratification of the variables. This is very important because it determines what tests you will use down the line. There are two important variable types. They are either continuous, numeric or quantitative variables (those terms are used synonymously), or they are categorical or qualitative variables. In this talk, I'll be referring to them as continuous variables and categorical variables.

A continuous variable, as the name suggests, is anything that's numerical: height, weight, age, BMI, and most of your blood tests, such as hemoglobin. Those are all continuous, numerical variables. You can also have something called a discrete continuous variable. The best example of this is your GCS score. It goes from 3 to 15, but it's not a categorical variable. It's on a continuous scale, but you can't score a GCS of, say, 12.5, and that's why it's discrete continuous.

Then you've got three types of categorical variables. The first is ordinal, or ordered, which means the categories have a rank. What I mean by that is a pain scale from 1 to 10: that is ordinal, so a two is higher than a one. There is a relationship between the values, and they are ordered in ascending or descending order. Another example is tumor, node and metastasis (TNM) staging: a T1 is better than a T2, which is better than a T3, and so on.

Then you've got nominal variables, which are unordered: there isn't an order to them. The most common example is anything with text. Names of employees, their birthdays, their email IDs: if you have them in a column, they are nominal variables, which means the birth date of one person does not affect the birth date of others.

Lastly, you've got binary or dichotomous categorical variables. That means the variable has only two classes, for example sex (male or female), smoking status (yes or no), and so on. A lot of the time you'll see binary and dichotomous used synonymously; they mean the same thing.

So those are the two most important variable types. Whenever you're analyzing research, or you're about to start research, you need to have a clear picture of what your continuous variables are and what your categorical variables are, because you cannot use a categorical variable as a continuous variable, or a continuous variable as a categorical variable, interchangeably.

Now that we have that in mind, what are independent and dependent variables? On a normal Cartesian plane with an x and a y axis, the image on the right shows where the dependent and independent variables are located. So what do they mean?
If you think about a research study, let's say we're trying to see whether patient factors determine whether or not a patient is going to have lung cancer. Those patient factors, or patient characteristics, are your feature variables or your predictor variables. They could include age, ethnicity, sex, smoking status, previous history of cancer, secondhand smoking, and so on, and you can have multiple. They're called independent variables because each of those variables is independent of the others, and they go on the x-axis, as you can see. The dependent variable in this case would be whether or not the patient has lung cancer, so that's a binary outcome variable. We're trying to see if that outcome variable is dependent on all of those predictors, or patient features, and that is why it's called the dependent variable, which goes on the y-axis.

If you want to go even more mathematical, in terms of defining functions you have y = f(x): y is a function of the variable x, which is why y depends on x, and why x is the independent variable and y is the outcome variable. You can have multiple different predictor variables, or you could have just one, and I'll move on to what that means in a bit. How I remember it is by the mnemonic DRY MIX: D and Y, dependent on the y-axis, and I and X, independent on the x-axis.

This is very important because whenever you're trying to do research or analyze research papers, you have to know which variables are your independent, or predictor, variables and which variable is your outcome variable. If your outcome variable is, let's say, continuous, then you can't use some of the tests. If your outcome variable is categorical, then you have to use a specific set of tests. The same goes for your independent variables.
You cannot use a continuous-variable analysis test on a categorical variable. It's also important to know how many independent variables you have, because the more variables you include in your analysis, the larger your sample size has to be, and I'll touch upon that later on. So research is a fine balance of knowing how many independent variables there are and how many dependent variables there are. For example, you could look not only at whether or not a patient has lung cancer but also at whether or not they died from the lung cancer, so you could have multiple different outcomes. An outcome can be categorical or continuous. A continuous outcome, for example, could be how much the hospitalization of a lung cancer patient cost. That's going to be continuous, £10,000 or £5,000, and it will change based on the patient characteristics. So that is how you must start any research: by identifying the dependent, or outcome, variable and the independent, or predictor, variables.

Next is the p-value. What is the p-value? We get this number of less than 0.05 thrown around. It's the boon and bane of medical research, so let me explain what the p-value is, so that you have a good idea of what it means and are not disheartened in cases where your research, or the research you're reading, does not follow that rule.

Before you start any research, you need to have a hypothesis. You can have two types of hypothesis: a null hypothesis and the alternate hypothesis. For example, sticking with our lung cancer theme, let's say that the older you are, the more likely you are to have lung cancer. That is a hypothesis that I am making before the start of my research, so that becomes the alternate hypothesis. It's saying that given a certain variable, I will have a certain outcome.
The null hypothesis states that age has no effect on whether or not a patient has lung cancer. Luckily, the null hypothesis has the word null in it, so we know it means that there is no correlation, causation or relationship between your predictor and your outcome. The two hypotheses are often represented as H0 (null) and H1 (alternate).

So what, then, is the p-value? The p-value is the probability of obtaining your results, or results more extreme, under the assumption that the null hypothesis is correct. Let's say you do your research, you have a large sample size, and you see that age predicts your likelihood of having cancer, with a p-value less than 0.05. A p-value, as you can see in this graph, is a probability, so it can range from 0 to 1, just as a probability can go from 0% to 100%. What the p-value states is the likelihood of what you've observed, given that there is no relationship between the two variables. So in this hypothetical study, even though we have seen that the older you are, the more likely you are to have lung cancer, a p-value of 0.05 states that there is a 5% probability of observing a result like this purely by chance if the null hypothesis is in fact true, the null hypothesis being that age does not affect your likelihood of having cancer.

The set p-value in medical research is 0.05, or 5%. The reason that's the cut-off that's used is that it represents a set of very unlikely observations. You can go lower with your p-value threshold, to 0.01 or 0.001, and that depends on the type of research you're doing, your sample size, the journal, and the type of paper you're trying to publish.
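To make the definition concrete, here is a minimal sketch of a p-value computed directly as "the probability of a result at least this extreme if the null hypothesis were true". It uses an exact permutation test on a difference in mean age, which is a technique chosen for illustration and not one named in the lecture. The ages and the function name `permutation_p_value` are made up.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_difference(sum_a):
        # Mean of the relabelled group A minus mean of the relabelled group B.
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_difference(sum(group_a)))
    relabelings = 0
    at_least_as_extreme = 0
    # Under the null hypothesis the group labels are interchangeable, so every
    # way of choosing which patients are called "group A" is equally likely.
    for chosen in combinations(range(len(pooled)), n_a):
        relabelings += 1
        sum_a = sum(pooled[i] for i in chosen)
        # Small tolerance guards against floating-point rounding.
        if abs(mean_difference(sum_a)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / relabelings


# Made-up ages, for illustration only (assumption, not lecture data).
ages_with_cancer = [62, 68, 71, 74, 77, 80, 66, 73]
ages_without_cancer = [41, 45, 50, 52, 55, 58, 47, 60]

p_value = permutation_p_value(ages_with_cancer, ages_without_cancer)
print(f"p = {p_value:.5f}")
```

The returned number answers only one question: how often would relabelling the patients at random produce a gap in mean age at least as large as the one observed? It says nothing directly about how likely either hypothesis is.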
So that's where it's a good thing, because it tells you that it would be quite unlikely to see what you've seen by chance alone if the null hypothesis were the truth and your alternate hypothesis were wrong. It's very important to differentiate this from the colloquial use in medical research, where it is used carelessly: a p-value less than 0.05 is taken to reject the null hypothesis. That is where the difference is. A p-value of 0.05 simply tells you that, if the null hypothesis were correct, there would be a 5% probability of seeing results like yours by chance. It's not saying that the null hypothesis is not true. But that is how it's used in medicine.

The 5%, or 0.05, was arbitrarily decided upon by consensus. That raises the question: what is the difference between a p-value of 0.049 and one of 0.051, which correspond to probabilities of 4.9% and 5.1%? Does that mean you can't reject the null hypothesis because it's 5.1% and not 5.0%? In actuality, what it represents is that the likelihood of the results you've observed, under the assumption that the null hypothesis is true, is 5.1%. This is where it's the bane of research, because we aim for that less than 0.05 and then conclude that we can reject the null hypothesis. That isn't strictly the case, because results like yours would still turn up 5% of the time with the null hypothesis being true. But that is how it's been used in medicine. Slowly but surely, a lot of journals and peer reviewers are moving to a smaller p-value threshold, because the smaller your p-value, the less likely it is that what you've observed came about by chance with the null hypothesis being true. We'll never be able to truly reject the null hypothesis, because having a probability of zero is an impossibility in medicine.
But you can have a very, very small number, and the smaller the p-value goes, the more plausible it is that what you've observed is because your hypothesis is correct. I insist everyone have a read about what's called the transposed conditional fallacy: the probability of what you observed given the hypothesis is not equal to the probability of the hypothesis given what you observed. So it's very important to know what the p-value stands for, and not to be disheartened if you do research where your p-value is not less than 0.05, because that threshold is simply arbitrarily set, and there could be multiple different reasons for it. It does not invalidate your research: with a higher threshold, say 0.1, a lot more things would become statistically significant. Currently in medicine, the aim for statistical significance is a p-value less than 0.05. You'll often see that threshold called an alpha value of 0.05. And hopefully I have been able to give you an idea of what the p-value represents.

So now I'll talk to you about some measures of central tendency. Before I move on to the different types, I want to talk to you about the normal distribution. As you can see in the top-right image here, the one in the middle is a symmetrical, normal distribution. Any distribution that is normal is by definition symmetrical. What that means is that if you draw a line down the middle, the left and right halves are mirror images, and a consequence of a normal distribution is that your measures of central tendency are at exactly the same point. What do I mean by this? You have three measures of central tendency: your mean, median and mode. What is the mean? The mean is simply an average. There are other types of mean, like a pooled mean, but the one that's used in medicine is the arithmetic mean.
For the arithmetic mean, you add up all your observations and then divide by your total number of observations. The problem with the mean is that it can be skewed by extreme values. If you've got everything going in sequence, 1, 2, 3, 4, 5, 6, 7, 8, and then your last value is 100, that shifts the entire mean, so it is very much affected by outlying data points. This is where you get to the question: is it a normal distribution or a non-normal distribution? A normal distribution, as you can see, is symmetrical on either end, so your mean is straight in the middle. It is not pulled to one side by extreme values, because in a symmetrical distribution the extreme values on one side are balanced by those on the other.

The graphs on the left and right, by contrast, are non-normal distributions that are negatively and positively skewed. What do I mean by this? A negative skew is also called a left skew: the tail stretches towards the left while the peak sits on the right. A positive skew is also called a right skew: the tail stretches towards the right while the peak sits on the left. It might be a bit difficult, but think of it like an MRI image, where what appears on one side of your screen is in fact the opposite side of the patient: when the peak is on the right of the graph it's a left, or negative, skew, and when the peak is on the left it's a right, or positive, skew.

The names don't matter so much in theory. But in practice, let's say you're talking about age. If you see your distribution is negatively skewed, then you have an older population, because if this end is 0 and this end is 80, your peak is around 80. If it's positively skewed, then the most common ages are in the younger population. So in terms of theory it doesn't matter, but in terms of practicality it's very important to know if it's negatively or positively skewed, because as soon as you know that, there are two things you can say. Number one, it's non-normal.
So it's not a normal distribution. Number two, you can have an idea of the peaks or the trends in the data. If it's negatively skewed, it's an older population, or some higher number; if it's positively skewed, it's a lower number. In these cases, as you can see, the mean can be affected by extreme values, and that is where your second measure of central tendency comes in, which is your median. The median is the middle value of the data set, the one that splits the data set in half, or what represents the 50th percentile. It is very useful for non-normal distributions, and it is not affected by extreme values because it's not asking for the sum of all your values. If you have 10 values, your median will still sit in the middle, between your fifth and sixth values, and it doesn't matter what your 1st and 10th values are. They could be 100. The median looks at the position rather than the value of each observation, compared with the mean, which looks at the sum of every single number, which is why the mean can be skewed by extreme values.

As you can see, in a normal distribution your median and your mean are the same. In a positively or negatively skewed distribution, the median is not the same as the mean, which is why, for non-normal distributions, the standard practice is to report both the mean and the median. That's important because it gives you a sense of the actual data. Let's say you do your analysis and you see the median is higher than the mean: then you know it's negatively skewed. If your median is less than the mean, then you know it's positively skewed. In an ideal situation, you'd want the mean to be as close to the median as possible, because then you know it's very close to a normal distribution, and there are certain assumptions that you can then carry forward with your data.

The mode is not used that often. It's just the most common value, or what represents the peak.
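The sequence-plus-outlier example from the talk can be checked in a few lines. This is a small sketch using Python's built-in `statistics` module. The helper name `describe_centre` is made up, and the mean-versus-median comparison is the rule of thumb from the talk and should not be read as a formal test of skewness.

```python
import statistics


def describe_centre(values):
    """Return (mean, median, hint about skew from comparing the two)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        skew = "positive (right) skew suggested"
    elif mean < median:
        skew = "negative (left) skew suggested"
    else:
        skew = "no skew suggested"
    return mean, median, skew


without_outlier = [1, 2, 3, 4, 5, 6, 7, 8]
with_outlier = without_outlier + [100]  # one extreme value added

for label, data in [("1 to 8", without_outlier), ("1 to 8, then 100", with_outlier)]:
    mean, median, skew = describe_centre(data)
    print(f"{label}: mean={mean:.2f}, median={median}, {skew}")

# The mode is simply the most frequent value.
print("mode:", statistics.mode([3, 5, 5, 5, 7, 9]))
```

Adding the single value of 100 drags the mean well above every other observation, while the median only moves from 4.5 to 5.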
Because the y-axis is frequency, the peak is the value that's represented the most, the most common one, and that is your mode.

Next, any time you hear mean, you'll hear the words standard deviation. All the standard deviation does is show the deviation of your data from the mean. It is calculated through the variance. The variance represents the spread of your data, and the square root of the variance is your standard deviation. So they represent much the same thing, but in medical research, standard deviation is the one that's used. You will also come across the standard error of the mean, or SEM. Think of it as the standard deviation of the mean: if you were to do this research 100 different times, how much would your mean deviate? That's your standard error of the mean.

Next, what is the IQR? IQR stands for interquartile range. Why is that important? When you think of the mean, always think of it in conjunction with the standard deviation, because that shows you how the data deviates from the mean. The IQR is used when you're reporting the median, because the median represents your 50th percentile. Think of the IQR as the standard deviation of the median, because what the IQR represents is your 25th percentile and your 75th percentile. You have your 50th percentile, which is your median; underneath that you've got the 25th, and above that you've got the 75th. So the interquartile range represents the range of values that encompasses the middle 50% of the distribution, i.e. between the 25th and the 75th percentile, and within that 50% of the distribution sits your median. So when you report the median, report it with the interquartile range; when you report the mean, report it with the standard deviation.

Next, what is the 95% confidence interval? This is not a measure of central tendency, but the reason I have included it here is because of this bottom-right graph.
This graph is from one of our papers. Just by looking at it, you can see that it shows the distribution of the accuracy of a model. That's all it's saying. As you can see, it's not a normal distribution. It is negatively skewed, very similar to this graph here. But unlike with that graph, I am not looking for the mean, median or mode. Here I'm looking at the 95% confidence interval. What does that represent? It's a marker of precision: how precise is your observation? Over here, our model showed an accuracy of 0.962, and our 95% confidence interval ranges from 0.9 all the way to 1. That is very good, because it shows us that our model is very precise, and an accuracy of 90% is not often achieved by clinicians.

What the interval represents is the range that will contain the true value 95% of the time. What I mean by that is, if you were to take this model and test it on different populations from around the world 100 times, the accuracy from about 95 of those 100 tests would fall between 0.9 and 1. That is what your 95% confidence interval represents. It is a marker of precision which shows that if you were to do this test, or evaluate what you're doing, in 100 different samples or at 100 different times, 95% of those times the true value would fall between these two numbers. So the narrower your confidence interval, the more precise your observation, and the wider your confidence interval, the less precise your observation. There isn't a cut-off for 95% confidence intervals, but the narrower, the better. As you can see over here, around 0.96 it's roughly plus or minus 0.06 in either direction. That's very good, because it's a narrow confidence interval that gives us quite a good idea of the precision of our test.
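As a rough illustration of how an interval gets narrower or wider, the sketch below computes an approximate 95% confidence interval for a mean as mean ± 1.96 × standard error. This normal-approximation formula is an assumption chosen for simplicity. The lecture does not say how the interval in the figure was computed, and a skewed distribution like that one may call for a different method. The accuracy values are invented.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Approximate 95% CI for the mean: mean +/- z * standard error."""
    mean = statistics.mean(values)
    # Standard error of the mean = sample standard deviation / sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean - z * sem, mean + z * sem


def interval_width(interval):
    lower, upper = interval
    return upper - lower


# Hypothetical accuracies from repeated evaluations of a model.
consistent_runs = [0.95, 0.96, 0.97, 0.96, 0.96, 0.97, 0.95, 0.96]
variable_runs = [0.60, 0.99, 0.75, 0.92, 0.55, 0.98, 0.70, 0.88]

for label, runs in [("consistent", consistent_runs), ("variable", variable_runs)]:
    lower, upper = mean_confidence_interval(runs)
    print(f"{label}: mean={statistics.mean(runs):.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

The runs that agree closely with each other give a narrow interval, and the runs that vary a lot give a wide one, which is the sense in which the interval is a marker of precision.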
If, however, this 95% confidence interval ran from 0.5 to 1, that would be covering half your probability range, and so it would not be a very precise observation, because what it's saying is that the value could be any number from 0.5 to 1. If you were to try it 100 different times, 95% of the results would fall within that range, and the other 5% of the time they would fall somewhere outside it. That is why it's very important. I won't go into the details of how it's calculated, but it's calculated using the mean, the standard error and things like that.

This is the crux of cohort demographics, and of analyzing and pre-processing your data to understand what your data represents. Now, you cannot do any of this on categorical data. What do I mean by this? There is no mean for your categorical data. There is no median for your categorical data. You could have a mode, but for that the variable would have to be multi-class, ordinal or nominal. It cannot be binary, because binary has only two classes, zero and one. And you can't have a standard deviation or an interquartile range. So you would use these techniques to analyze continuous data: age, BMI, your blood tests, the height of the patient. Those are all continuous data points, and you would try to see whether they are normally or non-normally distributed.

For categorical data, you would do frequency analysis, in terms of percentages: 10% of our cohort were male, or 50% of our cohort smoked. So you would do frequencies for your categorical variables and these measures of central tendency for your continuous variables. There are multiple reasons why you would do them. The first is to check whether or not your continuous variables are normally distributed, and the second is to perform cohort demographics. When you report your results, your Table 1 is almost always the cohort demographics.
Demographics answers questions such as: what was the mean age for this group? What was the median age for this group? If you see that your continuous variable is non-normal, then you would report the median and interquartile range.

So that's your variable analysis. First, you identify what types of variables you have, categorical or continuous. Next, you see which variables are your independent, or predictor, variables and which are your outcome, or dependent, variables. Once you have those two things done, you move on to the analysis of your variables. If it's a continuous variable, do mean, median and mode, and see whether it's normally or non-normally distributed. Then you can perform the equivalent analysis for categorical variables, but in that case you would just look at frequency. That is a big chunk of your methods section done already, and I will now give you an idea of what to move on to next.

From now on, from correlation analysis onward, I will be talking about the different statistical tests you can use. You could theoretically not memorize any of this and just refer to this lecture again and again, and through practice it becomes muscle memory. I wouldn't recommend that, because it's time-consuming, and it's good to know what these tests mean and what they do. You have to know what the types of variables are, what to do for continuous variables, and how to make your demographics, but it's even more important to know what to do once you have the data. Once you have cleaned everything, you have all your variables, you know what everything is, and you've done your cohort demographics and your mean, median and mode, how do you analyze something to test your hypothesis? This is where you're getting into the realms of hypothesis testing.
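The descriptive step summarised above (mean with standard deviation or median with interquartile range for continuous variables, percentages for categorical ones) might look something like the following sketch. The cohort values and the function names `summarise_continuous` and `summarise_categorical` are hypothetical, and `statistics.quantiles` is only one of several conventions for computing quartiles.

```python
import statistics
from collections import Counter


def summarise_continuous(values):
    """Mean with SD, and median with IQR (25th to 75th percentile)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": (q1, q3),
    }


def summarise_categorical(labels):
    """Frequency of each class as a percentage of the cohort."""
    counts = Counter(labels)
    return {label: round(100 * n / len(labels), 1) for label, n in counts.items()}


# Hypothetical cohort, for illustration only.
ages = [34, 45, 51, 58, 62, 64, 67, 70, 73, 79]
smoking_status = ["yes"] * 5 + ["no"] * 5

age_summary = summarise_continuous(ages)
print(f"Age: mean {age_summary['mean']:.1f} (SD {age_summary['sd']:.1f}), "
      f"median {age_summary['median']} (IQR {age_summary['iqr'][0]} to {age_summary['iqr'][1]})")
print("Smoking status (%):", summarise_categorical(smoking_status))
```

If a continuous variable looks non-normal, the median and IQR line is the one to carry into Table 1. Otherwise the mean and SD are usually reported.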
Going back to the hypothetical paper that I talked about: if we want to see whether age is a predictor of whether or not a patient will have lung cancer, age is continuous, and whether or not they'll have lung cancer is binary, so zero or one, no or yes. Now, in terms of correlation analysis, it's in the name itself. It simply tells you how two variables are correlated; it does not tell you anything about causality. In medical research, it is very hard to show causality, especially in most of the research that you will be doing as a medical student, which is retrospective. The correlation analyses I'll be talking to you about are tests to determine any association between two variables. This is called bivariable or bivariate analysis; the terms can be used synonymously.

The first one is Pearson correlation, the most common one that you'll use. It's represented by Figure 1. I still remember, when I started off in third year and was doing my APEP, I ran Pearson correlation on every single variable that I had. I had age, the sex of the patient, the ethnicity and all of that, and I ran Pearson on everything just to see if any two variables were correlated. I did find some interesting results, but they were all wrong, and I'll tell you why. Just because it says Pearson correlation, and just because any software you use will perform the test, does not mean it's right. You can only perform Pearson correlation correctly once two assumptions are met. The first assumption is that both variables being analyzed are continuous, so you cannot use Pearson correlation to, let's say, see if there's a correlation between the age of the patient and the sex of the patient. You cannot do that because the sex of the patient is categorical. It's a binary variable.
What you can do is see if there's any correlation between the age of the patient and the weight of the patient, because they're both continuous. The second assumption, and you might not understand this immediately, is that both variables are linearly correlated. Pearson is not suitable for data with non-linear relationships, for example quadratic or exponential ones. What do I mean by this? If you look at Figure 1, on the x and the y axes I have got continuous variables. You can tell they are continuous by looking at these dots: they sit between the numbers, and a variable is continuous if you can have half of a number. That is why I can perform Pearson: I've met assumption one. The second assumption is also met here, and the reason is this blue line and the distribution of the dots. You can see that the blue line approximates the distribution of the dots on the graph quite well, so you can say that these two variables are linearly correlated: if one variable increases, the other variable increases. That is what you see, an upward linear slope. Additionally, this gray boundary represents the 95% confidence interval.

So now everything joins up, because the Pearson correlation will report two things. It'll report the p-value, to see if it's less than 0.05 and therefore statistically significant, and the second is your r value. The closer your r value is to 1 (or to -1 for a downward slope), the better, because it's an indication of how linearly your two variables are correlated, and that then helps you calculate your 95% confidence interval. The tighter the 95% confidence interval, the more likely you are to have met assumption two, which means that even if we did this 100 different times, it would still be linearly correlated, and as you can see, this is where it would fall. So that's Pearson correlation.
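For readers who want to see what the r value is, here is a minimal from-scratch sketch of Pearson's correlation coefficient (the p-value and confidence interval that statistical software also reports are left out). The function name `pearson_r` and all data are hypothetical. The quadratic example shows the second assumption at work: a clear relationship that a straight line cannot capture gives an r of zero.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two continuous variables."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Covariance scaled by the spread of each variable (raises
    # ZeroDivisionError if either variable is constant).
    covariance = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return covariance / spread


# Hypothetical ages (years) and weights (kg): roughly linear.
ages = [25, 32, 41, 47, 53, 60, 68]
weights = [61, 66, 70, 74, 77, 83, 86]
print(f"age vs weight: r = {pearson_r(ages, weights):.3f}")

# A perfect quadratic relationship that Pearson misses entirely.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]
print(f"quadratic: r = {pearson_r(x, y):.3f}")
```

This is why a scatter plot comes first: an r near zero rules out a straight-line relationship only, not a relationship of some other shape.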
You have to be comparing two continuous variables. Now, how do you know beforehand whether two variables are linearly correlated? You don't. So one of the things you do, even before Pearson correlation, is a scatter plot of your continuous variables. From the distribution of the scatter plot you can see whether the relationship is exponential, quadratic or linear. Once you think it's linear, you can run Pearson correlation. Even if you don't think it's linear, say it's quadratic, you can still run the Pearson correlation. However, your results will not be what you expect. Your Pearson correlation might show no correlation at all, because a straight line does not approximate the distribution of your data. That does not mean there isn't a correlation, because you can see that there is, for example, an exponential one. That is when you move on to an exponential correlation test. So that's what you do when you want to compare two continuous variables to see if they're correlated.

Now, let's say you want to compare a continuous variable to a categorical variable to see if they're correlated. As I said, there are three types of categorical variable: ordinal, nominal and binary. You can perform something called the Spearman's correlation test, or the Spearman's rank test, and similar to the Pearson, you have to meet two assumptions. The first assumption is that you are comparing an ordinal categorical variable with a continuous one, so you cannot use this test to compare age and sex, because sex is binary. What you can compare, let's say, is age and a pain scale from 1 to 10. That 1-to-10 pain scale is ordinal, so you can do Spearman's correlation analysis. In Figure 2, I have compared an ordinal variable on the y-axis and a continuous variable on the x-axis. So how do you know that the y variable is ordinal?
Because if you look at the distribution of all of these points, they sit only at discrete values, 0, 1, 2, 3. They're not in between. But if you look at the x-axis, the way you know it's continuous is that these points lie in between the numbers. So that's continuous. Very good. The second assumption, and this is where Spearman's is a bit better than Pearson, is that it does not assume a linear relationship. It only asks that your relationship is monotonic, meaning that as one variable goes up the other consistently goes up or consistently goes down. So it's very well suited for ordinal categorical versus continuous, and you can also use it for data sets that are not normally distributed. The Pearson correlation works best for normal distributions. You can still do it for non-normal data, but Spearman's is aimed at non-normal or non-linear data, and so you can perform it for ordinal versus continuous.

So this is how I approach correlation analysis. The first thing you want to do is see if there's any association at all, and that's done through correlation analysis. I want to reiterate this again. You identify your variables: continuous, categorical, independent, dependent. You do your cohort demographics: your mean, median, mode, standard deviation and your frequencies. Then you move on to correlation analysis. If you have two continuous variables, use Pearson. If you've got a continuous and an ordinal categorical variable, use Spearman. So far, I have not told you what you can do if you have a binary categorical variable and a continuous one, and I won't talk about that because it's a bit beyond basic. But if anyone wants to know, you use the point-biserial correlation to compare a binary categorical variable and a continuous one. So that's correlation analysis done. I think I'll stop the first part of the lecture here, and then I'll continue in the second part, in the second video.
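The two remaining tests can both be sketched on top of Pearson's formula: Spearman's coefficient is Pearson's r computed on the ranks of the data, and the point-biserial coefficient is Pearson's r with the binary variable coded as 0 and 1. The code below is a hedged illustration with hypothetical function names and made-up data. It returns only the coefficients, with no p-values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient (see the definition of r above)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def point_biserial_r(binary_labels, values):
    """Point-biserial correlation: Pearson's r with a 0/1-coded variable."""
    if set(binary_labels) != {0, 1}:
        raise ValueError("binary_labels must contain both 0 and 1 and nothing else")
    return pearson_r(binary_labels, values)


# Hypothetical data: age (continuous) against a 1-to-10 pain score (ordinal).
ages = [23, 31, 38, 44, 52, 59, 66, 74]
pain_scores = [2, 3, 3, 5, 6, 6, 8, 9]
print(f"Spearman rho (age vs pain score): {spearman_rho(ages, pain_scores):.3f}")

# Hypothetical data: lung cancer (0 = no, 1 = yes) against age.
has_cancer = [0, 0, 0, 0, 1, 0, 1, 1]
print(f"Point-biserial r (cancer vs age): {point_biserial_r(has_cancer, ages):.3f}")
```

Because Spearman works on ranks, any relationship that only ever goes up (or only ever goes down) scores a perfect 1 (or -1) even when it is far from a straight line.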