Lecture 5.2. Basic Statistics and Critical Appraisal

Summary

This short on-demand teaching session, aimed at medical professionals, will discuss different types of variables, how to measure them, and the most common tests used in clinical research. The session will give attendees the knowledge to decide which tests to use for different types of variables, as well as how to form a hypothesis. It will cover tests such as the Shapiro-Wilk test, the Kolmogorov-Smirnov (KS) test, the Student's t-test, the independent t-test, Spearman's correlation, and the chi-squared test. Attendees will learn the difference between parametric and non-parametric tests and get a better understanding of one-tailed versus two-tailed hypotheses.

Generated by MedBot

Description

Week 5: ‘Basic Statistics and Critical Appraisal’ Part 2 by Sayan Biswas, 4th Year Medical Student

Feedback and certificates:

  • As part of this course, we want to continuously evaluate its success by receiving feedback from our audience.

Pre-Lecture Questionnaire: https://forms.gle/NQ4cFeDWLErinvg9A

Post-Lecture Questionnaire: https://forms.gle/qe3wkN3fEptZCTyQ8

  • To receive an official Walter E Dandy Course Completion Certificate, you MUST complete all Pre- and Post-Lecture Forms (link in the description of each lecture).

Learning objectives

  1. Demonstrate the ability to recognize when a data set is sufficiently normally distributed to use a parametric statistical test.

  2. Demonstrate an understanding of the difference between independent and paired t-tests, as well as their non-parametric equivalents.

  3. Understand the difference between a one-tailed and a two-tailed hypothesis, and the implications of choosing each.

  4. Demonstrate proficiency in the use of the chi-squared and Fisher's exact tests for comparing two categorical variables, and the Kruskal-Wallis test as a non-parametric comparison across more than two groups.

  5. Understand the different types and uses of one-way and repeated measures ANOVA.

Generated by MedBot

Computer generated transcript

Warning!
The following transcript was generated automatically from the content and has not been checked or corrected manually.

So, hi everybody. Welcome to the second part, the continuation of my talk on basic statistics and critical appraisal. Now I'll talk to you about the most common tests used in clinical research. So far I've talked about the different types of variables, how to measure them, measures of central tendency, and I touched upon correlation analysis. Now we're going to move a step ahead and look at comparative tests between variables.

As I mentioned, it is very important to know whether a continuous variable is normally or non-normally distributed. It is paramount, because the distribution of your data will define the use of parametric versus non-parametric tests. So what are they? If a data set or a continuous variable is normally distributed, you will use parametric statistical tests. If it is non-normally distributed, you will use non-parametric tests, and I'll come to this table in a second.

So how do you test for normality? As I said, you can look at the mean, median, and mode: if they are all the same, or roughly the same, then a reasonable assumption would be that the data demonstrate a normal distribution. You can also do histograms or density plots to see where the peak is and whether the distribution is negatively or positively skewed. That obviously involves a bit of visual analysis, so if you want quantitative answers, you can do two tests: the Shapiro-Wilk test or the KS (Kolmogorov-Smirnov) test. The Shapiro-Wilk test is normally done for smaller cohorts. I've written 50 here, but recent research has extrapolated it to around 5,000 as well, so you can use either test. In general practice I use all of these approaches: I start by looking at the histogram and density plots, then at the mean, median, and mode, then at the skewness of the data, and then I do the KS test and the Shapiro-Wilk test. It's the combination of all of them that gives the best answer as to whether the data are normally distributed.

If you take away one thing from this lecture, it should be this table. This table is single-handedly the most important thing you need to know to perform good-quality clinical research, or at least research that is up to par. So from this table, what have we already talked about? We've talked about the relationship between two continuous variables. They have to be on a continuous scale, and if you want to do a Pearson correlation, as I said, ideally you want the data to be normally distributed. And as I said, Spearman's correlation can also be used with ordinal categorical data, especially when the distribution is non-normal. So that is what I've already covered.

Another important question is what you do in terms of correlation analysis between categorical variables, because those tests are for two continuous variables, or for an ordinal categorical and a continuous variable. I mentioned previously that if you want to see the relationship between a continuous variable and a binary categorical variable, you can do the point-biserial correlation, but I won't talk about that much.
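As a concrete illustration of the workflow just described (not from the lecture itself), here is a minimal Python sketch using SciPy with made-up data: test a continuous variable for normality with the Shapiro-Wilk and KS tests, then choose Pearson or Spearman accordingly. Variable names and numbers are hypothetical.

# Illustrative sketch (made-up data): check normality, then pick the correlation test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
age = rng.normal(65, 10, 200)                      # hypothetical continuous variable
biomarker = 0.5 * age + rng.normal(0, 5, 200)      # hypothetical continuous variable

# Quantitative normality tests mentioned in the lecture
shapiro_stat, shapiro_p = stats.shapiro(age)                  # Shapiro-Wilk (smaller cohorts)
ks_stat, ks_p = stats.kstest(age, "norm",
                             args=(age.mean(), age.std()))    # Kolmogorov-Smirnov
print("Shapiro-Wilk p =", shapiro_p, "| KS p =", ks_p, "| skewness =", stats.skew(age))

# If the data look normally distributed, use Pearson; otherwise Spearman
if shapiro_p > 0.05:
    r, p = stats.pearsonr(age, biomarker)
    print("Pearson r =", r, "p =", p)
else:
    rho, p = stats.spearmanr(age, biomarker)
    print("Spearman rho =", rho, "p =", p)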
Now, if you want to see the relationship between two categorical variables, you do something called the chi-squared test. You can think of it as the Pearson, but for categorical data. At the end of it you get a chi-squared (X²) value and a p-value. The chi-squared test can be done for binary categorical variables and ordinal categorical variables, so comparing, let's say, sex and a pain scale, you could do a chi-squared test. There is another variation of the chi-squared called Fisher's exact test. Fisher's test is essentially a chi-squared made specifically for binary variables. So if you're comparing, say, sex to ethnicity and you've only got two ethnicity categories, then you could theoretically do a Fisher's test, but you have to make sure that both are binary categorical variables. If you want to compare a binary variable and any other ordinal categorical variable, you do a chi-squared. So that's that row, and I've already talked about it.

Now we come to the top of the table, which is comparing the means of two independent groups. What does this mean? I'll give you an example with our hypothetical study: you want to see if age is a good predictor of whether or not a patient will have lung cancer. So your outcome, or dependent variable, is lung cancer, and your independent variable is age. You then want to see, for those who do have lung cancer and those who do not, whether there is a difference in age. What you would do is an independent t-test. What that means is that the patients who have lung cancer are independent of the patients who don't. When you have two independent samples, you do an independent t-test, and you do that if age is normally distributed. If it is not, then you do the Mann-Whitney U test, which, as you can see, is the non-parametric alternative. So that is when you're comparing two mutually exclusive groups.

You do the paired t-test when you're comparing the means of two paired groups. For example, you have a cohort of patients in a trial, and you want to see the effect of a drug you're developing. You take this group of patients and, let's say, you look at their haemoglobin. The first group is before you initiated the drug, and the second group is the same cohort of patients after you've initiated the drug, and your outcome becomes the haemoglobin. Does their haemoglobin change? Is there a mean difference between the haemoglobins of the two cohorts? If there is, then you could say that the drug probably has some impact on the haemoglobin level. So that is the difference between an independent t-test and a paired t-test: an independent t-test is when you're comparing two independent groups, and a paired t-test is when you're comparing the same group before and after some intervention. The independent and paired t-tests both fall under the umbrella of the Student's t-test; the paired test is often called the paired-samples t-test, and the independent t-test is often called the independent-samples t-test. They've both got their non-parametric alternatives, and, as I mentioned, if the data are not normal you will have to use a non-parametric test. In non-normal distributions you're more interested in the median than the mean, because in a non-normal distribution the mean can be affected by outlying data while the median won't be, which is why these non-parametric tests compare the median between the two groups. So that's the difference.
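A minimal sketch of these comparative tests in Python (my own illustration with made-up data, using SciPy): a chi-squared and a Fisher's exact test on a 2x2 table, then the independent and paired t-tests alongside their non-parametric alternatives.

# Illustrative sketch (made-up data): the comparative tests described above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two categorical variables -> chi-squared (or Fisher's exact if both are binary)
table = np.array([[30, 20],     # e.g. rows = sex, columns = outcome yes/no
                  [25, 25]])
chi2, chi2_p, dof, expected = stats.chi2_contingency(table)
odds_ratio, fisher_p = stats.fisher_exact(table)             # 2x2 table only

# Continuous variable between two independent groups
age_cancer = rng.normal(70, 8, 50)
age_no_cancer = rng.normal(62, 8, 50)
t_stat, t_p = stats.ttest_ind(age_cancer, age_no_cancer)     # parametric (normal data)
u_stat, u_p = stats.mannwhitneyu(age_cancer, age_no_cancer)  # non-parametric alternative

# Same patients before and after an intervention (paired)
hb_before = rng.normal(120, 10, 40)
hb_after = hb_before + rng.normal(5, 3, 40)
pt_stat, pt_p = stats.ttest_rel(hb_before, hb_after)         # paired t-test
w_stat, w_p = stats.wilcoxon(hb_before, hb_after)            # Wilcoxon signed-rank

print(chi2_p, fisher_p, t_p, u_p, pt_p, w_p)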
The independent-samples and paired-samples t-tests compare the difference in the mean of two groups: for the independent test the groups are mutually exclusive, and for the paired test they are the same group before and after. Their non-parametric alternatives, the Mann-Whitney U and the Wilcoxon signed-rank test, compare the median between the two groups. For the Mann-Whitney U the two groups are independent; for the Wilcoxon signed-rank the two groups are before and after — they are paired, or matched. So that is the difference: a non-parametric test looks at the median, a parametric test looks at the mean. I hope that makes sense. So when you've got a categorical variable with two groups, you can use these tests to compare the difference in continuous data between those two groups.

Now, that often isn't the case. Sometimes your categorical data could have three classes. For example, I did some research on a scoring system and we had grades A, B, and C. For A, B, and C, if you want to see, say, the change in haemoglobin between the three of them, you cannot do a t-test. The assumption of the t-test is that there are a maximum of two groups, which is why you then have to move on to what we call an ANOVA — as you can see here, a one-way ANOVA or a repeated-measures ANOVA. So what do they mean? ANOVA stands for analysis of variance, and, as I mentioned, in a normal distribution you've got the variance, which describes the variation in the data; take the square root of the variance and you get your standard deviation. So when you have three groups, or three classes of a categorical variable — it could be ordinal, it could be nominal — what you do is an ANOVA. The ANOVA does the same thing the t-tests do, i.e. it compares the mean between the three groups and sees if there is any statistically significant difference. The non-parametric alternatives do the same with the median, and that's it.

Now, obviously, as I said, the independent t-test is between two independent groups and the paired t-test is within the same group, and the same thing applies for ANOVA. For example, in the grading system I talked about, A, B, and C are three mutually exclusive grades, so you would use a one-way ANOVA. Now let's say you want to look at three different time points, but for the same cohort. Look at a drug-treatment trial again: you have patients when they start with no drug, then a second time point two years after, and then a third time point, which is four years after. They are the same patients you're following up over time, so they're the same paired group, and you would use a repeated-measures ANOVA. It's in the name: you repeatedly measure, and then you perform the ANOVA to see if there's any statistically significant difference in the means. You've also got two-way ANOVAs and much more, depending on the number of factors you've got, and, as you can see, they've all got their non-parametric alternatives. So those are the most common parametric and non-parametric tests you will use in clinical research.
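To make the three-group case concrete, here is a small sketch (again my own illustration with made-up data, using SciPy): a one-way ANOVA across three grades and the Kruskal-Wallis test as its non-parametric alternative.

# Illustrative sketch (made-up data): comparing a continuous variable across three groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
hb_grade_a = rng.normal(130, 10, 40)
hb_grade_b = rng.normal(125, 10, 40)
hb_grade_c = rng.normal(118, 10, 40)

f_stat, anova_p = stats.f_oneway(hb_grade_a, hb_grade_b, hb_grade_c)   # parametric one-way ANOVA
h_stat, kw_p = stats.kruskal(hb_grade_a, hb_grade_b, hb_grade_c)       # non-parametric alternative

print("one-way ANOVA p =", anova_p, "| Kruskal-Wallis p =", kw_p)
# A repeated-measures ANOVA (same patients at several time points) needs a
# different routine, e.g. statsmodels' AnovaRM, rather than f_oneway.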
While I'm on this topic, there's one more thing I want to talk about, which is a one-tailed versus a two-tailed hypothesis. What does that mean? When you form a hypothesis, there are two ways of forming it. The first is saying, for example with age and lung cancer, that the older you are, the more likely you are to have lung cancer. That is a one-tailed hypothesis: one tail means you're saying that only as age increases is there an effect on lung cancer. A two-tailed hypothesis says that there is a relationship between age and lung cancer, without specifying whether it depends on being younger or older. That is the difference: a two-tailed hypothesis allows two directions of variation for a variable, and a one-tailed hypothesis is unidirectional. Now, I would always recommend you use a two-tailed hypothesis, because you can fall into cognitive and subconscious biases with a one-tailed hypothesis. Yes, you may be expecting a particular direction based on, say, previous literature, but because of sampling bias the effect in your data could go the opposite way; if you do a one-tailed test you won't see anything, and you will then report erroneous results. That then leads into type I and type II statistical errors, which I won't talk about here.

So those are your parametric and non-parametric tests. Know this table by heart, because that's how commonly it's used. You don't need to know all the assumptions — every test will have at least four assumptions. All you need to know is which one is parametric and which is non-parametric, whether the outcome has two groups or three, and whether those groups are paired or not. So those are the most common parametric and non-parametric tests you'll use: the parametric ones compare means, and the non-parametric ones compare medians.

Then we move on to a slightly different type of analysis in this table, called regression analysis — this row here. When you want to predict the value of one variable, your outcome variable, from a set of predictor variables, there are two types of regression analysis you can do. Regression analysis will give you a continuous answer; that's all it means. The two types of regression analysis are linear and logistic. I'll talk about logistic later, but I'll talk about linear now. Linear regression — it's in the name — maps a linear relationship between a set of variables and an output variable, but the relationship has to be linear. As a result, it's often used for simple modelling, where you expect a linear fit, and what happens is that for a given set of predictor variables — in our case age, sex, ethnicity, and so on — it predicts a continuous variable. This is where it gets interesting for our hypothetical study, where we're trying to see whether a set of variables will predict whether or not a patient has cancer: you cannot use linear regression. The reason is that your dependent variable is not continuous. Your dependent variable must be continuous; it cannot be categorical, because the answer of a linear regression is a continuous answer, but a categorical variable wants a zero or a one, a yes or a no. So that's very, very important.
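Here is a minimal sketch of a linear regression with a continuous outcome, of the kind described next (my own illustration with made-up data, using statsmodels; the variable names are hypothetical).

# Illustrative sketch (made-up data): linear regression for a continuous outcome.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
df = pd.DataFrame({
    "age": rng.normal(65, 10, n),
    "sex": rng.integers(0, 2, n),          # 0/1 coded binary predictor
})
df["biomarker"] = 2.0 * df["age"] + 5.0 * df["sex"] + rng.normal(0, 10, n)

X = sm.add_constant(df[["age", "sex"]])    # predictors plus an intercept term
model = sm.OLS(df["biomarker"], X).fit()   # continuous outcome -> ordinary least squares
print(model.summary())                     # coefficients, p-values, R-squared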
Let's say instead that you are trying to predict the level of a certain cancer biomarker in these patients — that is a continuous variable, because you're looking at a level. Then you could absolutely use linear regression: you would add age, sex, ethnicity, and other clinical characteristics, and your output would be the level of that lung-cancer biomarker in the patient's blood. So that is when you can use linear regression — your dependent, outcome variable must be continuous. You cannot use linear regression for a categorical outcome, and this is where the second type of regression analysis comes into play: logistic regression can be used for categorical dependent variables, and I'll talk about that now.

Logistic regression is single-handedly the most used statistical technique in modern medicine. Almost every single paper will have something to do with logistic regression, and that speaks volumes about how powerful this model is. This model, just like the linear regression model, falls under the umbrella of regression analysis, but to be more specific they both fall under the umbrella of GLMs — generalised linear models. What happens in logistic regression and linear regression is very well represented by this graph. The blue line is your linear regression model; at the end of the day it's simply a mathematical equation, and because it's linear, it's a straight line trying to conform to your data set. A logistic regression model ranges from 0 to 1 and forms this S-shaped curve. And this is where what a logistic regression shows becomes very important. A logistic regression, because it's got the word regression in it, must give you a continuous output, and that continuous output is a probability. So, as you can see here, logistic regression gives you a probabilistic output between zero and one: for a given set of inputs it gives you an output, that output being the probability — the probability of that patient, or that input, belonging to one of the two outcome categories.

So let me break it down even more. As I said, if you have a categorical output, or categorical dependent variable, you cannot do linear regression; you must do logistic regression. What logistic regression then does is show you, for a given set of characteristics, the probability that the patient has or does not have lung cancer. That's all it does: it gives you a probability between zero and one — that probability could be 0.99871. That is why it's called logistic regression: the answer is continuous. And for the maths fanatics here, this is the equation for it, where p is the probability.

You've got two types of logistic regression, univariable and multivariable, which is fairly self-explanatory. In univariable you use one predictor, one independent variable. For example, if you want to see if age is a good predictor of whether or not a patient will have lung cancer, you do a univariable logistic regression analysis. If you now want to see what the effect is of adding race, ethnicity, sex, and so on, you can add all those variables to perform a multivariable analysis. That is it; that is simply the difference between logistic and linear regression, and that's why it's so important to know what your dependent and independent variables are and whether they're categorical or continuous — because, in my head, if you tell me the output is categorical, there are already so many tests I can rule out: I can't do a linear regression on that, because the output is categorical.
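The sketch below (my own illustration with made-up data, again using statsmodels) fits a univariable and then a multivariable binary logistic regression; the key point from the lecture is that the fitted model returns a probability between 0 and 1 for each patient.

# Illustrative sketch (made-up data): univariable vs multivariable binary logistic regression.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
df = pd.DataFrame({
    "age": rng.normal(65, 10, n),
    "sex": rng.integers(0, 2, n),
})
# Simulate a binary outcome whose log-odds depend on age
logit = -10 + 0.15 * df["age"]
df["lung_cancer"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Univariable: one predictor (age)
uni = sm.Logit(df["lung_cancer"], sm.add_constant(df[["age"]])).fit(disp=False)

# Multivariable: several predictors
multi = sm.Logit(df["lung_cancer"], sm.add_constant(df[["age", "sex"]])).fit(disp=False)

probs = multi.predict(sm.add_constant(df[["age", "sex"]]))   # probabilities in [0, 1]
print(probs[:5])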
Then, when you talk about whether the output is ordinal or binary categorical, that defines the type of logistic regression you would do. As you can see here, there are four common types, binary being the most common. In our hypothetical scenario, our output is binary: yes or no, does the patient have lung cancer. You can also have multinomial, which is in the name itself: a nominal variable with multiple classes. Or it could be ordinal — self-explanatory — if the output variable has multiple grades. For example, in the study we did for a scoring system, the output was A, B, and C: three classes, and they are ordinal. The other type of logistic regression you can do is stepwise. I won't go much into it, but you can use a stepwise approach to see which predictors, or independent variables, are useful and which aren't.

So now that we know the types of variables we can use, and now that I've told you that it gives you a probability, what exactly happens in a logistic regression? This table is how a logistic regression output usually looks. It was produced using SPSS, which is a statistical analysis software, and it represents a univariable binary logistic regression analysis. Why is it univariable? Because it's only looking at the impact of platelets, which is continuous, and the output variable for this analysis is mortality — whether or not the patient died at five years. The columns in bold are the columns you need to know as a student, because they are the ones most often reported in medical papers and the ones that determine what the impact of that variable is. This significance value is your p-value: if you look at the platelet row, you see that the significance is 0.036, which is less than 0.05, so we know that this variable is statistically significant — it impacts the outcome in a significant manner.

The next thing we need to look at is Exp(B), the exponential of B, also called the odds ratio. The odds ratio is single-handedly the most useful piece of information in this entire table. The odds ratio tells you, or gives you an idea of, how that variable affects the outcome. As I said, in our hypothetical study our goal was to see whether age impacts whether the patient will have lung cancer, and the way we would determine that is by analysing the odds ratio. The odds ratio can take three kinds of values: it can be equal to one, more than one, or less than one, but it will never be negative. So what do I mean by this? An odds ratio tells you the odds of you having the outcome given a certain value of your independent variable. An odds ratio of one means that the variable has no impact at all on the outcome, because your odds change by a factor of one — i.e. your odds don't change at all, because anything multiplied by one is the number itself — and so your probability remains the same. An odds ratio of less than one means that it negatively affects your outcome: if it's less than one, you are less likely to have the outcome. So in this case, where I was looking at mortality and platelets, the odds ratio is less than one — it's 0.997. What that tells me is that the more platelets I have, the less likely I am to die at five years. So that is what the odds ratio shows: it tells you the odds of your outcome.
If your odds ratio is more than one, then it's positively prognostic — it has a positive impact on your outcome variable. So if my platelet odds ratio had been more than one, I would say the higher my platelets, the higher my odds of dying at five years. And so that is what the odds ratio represents. If it's equal to one, it's uninformative — it has no impact on the outcome. If it's more than one, it positively affects your outcome. If it's less than one, it negatively affects your outcome. That is how people conclude whether a variable is negatively or positively prognostic, and this is where you get into diagnostic and prognostic modelling using logistic regression, because it gives you an idea of how the probability of the outcome changes for a given set of input variables.

So before I move on, just to summarise: linear regression and logistic regression are two types of regression analysis. Linear regression is done on continuous output variables, and logistic regression is done for categorical output variables. Logistic regression has multiple different types depending on your outcome variable: if your outcome is binary, you do binary logistic regression; if it's ordinal, you do ordinal logistic regression; if it's nominal, you do multinomial logistic regression. Logistic regression gives you a probabilistic output between zero and one. The most important thing it produces is the odds ratio. The odds ratio can be more than one, equal to one, or less than one. If the odds ratio for a variable is one, it has no impact on the outcome; if it's more than one, it is positively prognostic for the outcome; if it's less than one, it's negatively prognostic for the outcome. That is as simple as I can make logistic and linear regression for you. One of the things you often see in papers is "if in doubt, do logistic regression" — given, obviously, that the outcome variable is categorical — and a lot of papers will just do logistic regression for the sake of doing it. Not every paper needs logistic regression, and not every paper uses it correctly.

Also, before I move on: the last two columns here are the 95% confidence interval for your odds ratio, and, as I mentioned, a 95% confidence interval is a marker of your precision. So let's say, in this case, the upper bound of my odds ratio were more than one — then I cannot conclude anything. What I mean by that is: my odds ratio from my data set is less than one, but if the analysis tells me that the 95% confidence interval also includes an odds ratio of more than one, then I have two contradictory findings — a finding that says it's negatively prognostic, and a finding that says that if you were to repeat this, 95% of the time you might see it as positively prognostic. And that is why, in that case, the variable would not be statistically significant. So these are a few things to keep in mind when interpreting odds ratios: always look at the 95% confidence interval, and if that confidence interval includes one, as well as values on the other side of one, then you have to be very careful. If your data set's odds ratio is less than one but the 95% confidence interval includes an odds ratio of more than one, then you're at a bit of a stalemate, because you can't come to a significant conclusion. You can't call it negatively prognostic, because it isn't: if you were to repeat this, 95% of the time you might see a positive prognostic impact. And the same applies the other way around: if you were to get an odds ratio of more than one but your lower confidence bound included an odds ratio of less than one, then the variable would be both positively and negatively prognostic at the same time. Luckily, in this case, for platelets, as you can see, the confidence interval does not cross over one, so it is still statistically significant, and you can conclude that the higher the platelets, the less likely you are to die at five years — at least for this analysis.
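As a rough illustration of where those numbers come from (my own sketch with simulated data, using statsmodels; the SPSS table in the lecture is not reproduced here), the odds ratios are the exponentiated coefficients of a fitted logistic regression, and their 95% confidence limits are the exponentiated confidence limits.

# Illustrative sketch (simulated data): odds ratios and 95% CIs from a logistic regression.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 400
platelets = rng.normal(250, 60, n)                         # hypothetical continuous predictor
logit = 1.5 - 0.01 * platelets                             # simulated true relationship
died_5y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = sm.add_constant(pd.DataFrame({"platelets": platelets}))
res = sm.Logit(died_5y, X).fit(disp=False)

odds_ratios = np.exp(res.params)            # the Exp(B) column in an SPSS-style table
ci = np.exp(res.conf_int())                 # 95% CI for each odds ratio
summary = pd.DataFrame({"OR": odds_ratios,
                        "CI lower": ci.iloc[:, 0],
                        "CI upper": ci.iloc[:, 1],
                        "p": res.pvalues})
print(summary)
# Reading it: an OR below 1 whose CI stays below 1 would be read as "higher platelets,
# lower odds of dying at five years"; a CI that crosses 1 is inconclusive.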
So, we're almost at the end. We've talked about modelling and testing a lot, especially during this era of COVID, when we had PCRs and everything come up. You would also see lateral flow tests, and you'd see 99% specificity and 98% sensitivity quoted for the PCR, and much lower figures, in the seventies and eighties, for the lateral flow. What does that mean, and what is it they're trying to tell you? I'll talk about this classification table in terms of a test, or a model, or any task that involves a prediction. As you can see, this is what we call a two-by-two matrix, or a classification table, or a confusion matrix — they are all the same thing. All it does is compare the ground truth of the data set, which is the original, with what your task, your model, or your test predicts. It is done to analyse the efficacy of that test or that model.

I'll talk about this in the context of the PCR for COVID. I have friends who were tested and were negative, and friends who were tested and were positive. It is not always the case that that answer is the truth, because these tests do not have 100% sensitivity or 100% specificity, which is why you can have false negatives and false positives. So what do all these terms mean? Let's say you have 100 patients, of whom 50 have COVID and 50 don't. You send all of them to a PCR test centre, they get tested, and the results come back as one or zero: they have COVID or they don't. You then match those results to the ground truth. Those who had COVID and whom the test correctly predicted to have COVID are true positives. Those who had COVID but whom the test predicted not to have COVID are false negatives — they are falsely negative: they have COVID, but the test said they don't. A similar rule applies for the other group: if you don't have COVID and the test says you don't, it's a true negative, but if you don't have COVID and the test says you do, that's a false positive. I would recommend you know this by heart — know the difference between a false positive and a false negative in relation to the predicted result and the original ground truth. It is from this classification matrix that we can calculate everything else. How do we know that PCRs have 98% sensitivity and 98% specificity? Because someone will have done studies where they looked at this classification matrix, did all the addition, subtraction, and multiplication, and came up with those numbers. So what do these numbers mean? One thing you can talk about is how accurate the test is.
In my measures of central tendency slide, I showed you accuracy. In this graph we were evaluating the accuracy of a model that we had created, and we see that the accuracy is 0.962; multiply that by 100 and you get 96% accuracy. All we did to calculate that was look at the original data set, see what our model predicted, build this classification matrix, and plug in this formula. Accuracy, represented by this red rectangle, simply looks at, out of all the patients — true positives, true negatives, false positives, false negatives — how many did you get correct? That includes the true positives and the true negatives, so it's the ratio of your correct predictions divided by all your predictions. Going back to our slide, where we saw an accuracy of 96%: what that tells you is that our model is able to give a correct prediction 96% of the time. It tells you the ability of a model to identify the true positives and the true negatives — i.e. it's a marker of its overall predictive capability.

So then, you might ask, what is the use of sensitivity and specificity? Sensitivity — the purple box in the equation — is your true positives divided by your true positives plus false negatives. Sensitivity and specificity tell you how good your test or your model is at picking up each class. The higher your sensitivity, the more likely you are to correctly identify those who actually have the disease. For example, a PCR has a sensitivity of 98%: what that tells me is that it correctly identifies the patients who do have COVID 98% of the time. So it's the ratio of the true positives divided by the true positives plus the false negatives — i.e. out of everyone who truly has the disease, the proportion the test actually detects. Specificity is the opposite, but for the negative category: it's your true negatives divided by your true negatives plus false positives, which is the green box here. Specificity tells us the ability of the model to correctly identify those who don't have the disease, calculated as the ratio of the true negatives to everyone who truly does not have the disease. Those are the three most important things you can get from a classification table: your sensitivity, your specificity, and your accuracy.

It is very important to understand these differences, because I can tell you how many times I was confused when people talked about, say, the D-dimer — a blood test you do in patients with suspected pulmonary embolism — being highly sensitive but not highly specific. I'll tell you what that means. It means that if you have a positive D-dimer, it is not necessarily the case that you have a PE — that is the low specificity. But the high sensitivity tells you that if your D-dimer comes back negative, it is highly unlikely that you have a pulmonary embolism. And so that is how sensitivity and specificity work. In the context of the COVID PCR tests, both are exceptionally high, around 98%, so the test will tell you with high reliability whether or not you actually have COVID. And so that is how you interpret sensitivity and specificity in the clinical context.
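The arithmetic behind these three measures is short enough to show directly. The counts below are made up purely for illustration.

# Illustrative sketch (made-up counts): sensitivity, specificity, and accuracy from a 2x2 table.
tp, fn = 49, 1     # people who truly have the disease: detected / missed
tn, fp = 47, 3     # people who truly don't: correctly cleared / falsely flagged

sensitivity = tp / (tp + fn)                  # of those WITH the disease, how many the test catches
specificity = tn / (tn + fp)                  # of those WITHOUT the disease, how many it clears
accuracy = (tp + tn) / (tp + tn + fp + fn)    # all correct calls over all calls

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, accuracy={accuracy:.2f}")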
All of this is theoretical, of course, but that is how it works in practice, and the same thing applies for MRI scans for stroke, for other infectious diseases, for D-dimers, for blood tests — for everything.

So those were the basic stats. Now, the critical appraisal. How do you appraise a paper, and what does it even mean to appraise a paper? In medical school, critical appraisal can be divided into two parts. The first is where you analyse the methods and the results, and the second is where you analyse the discussion. They are two very distinct skills. The first — analysis of the methods and results — is the one I am particularly good at, and it's the one I'll talk to you about: seeing how robust the methodology is and whether you think you would be able to reproduce those results. The second is analysing the discussion to see whether the interpretation of the results is similar to how you would have interpreted them, and, if it isn't, whether their interpretation is even valid given the wider literature.

So, talking about how you appraise methods and results — how would you go about it? The first question to ask is: what is the paper trying to answer? That question then leads to the sub-questions: what are the predictor variables, and what is the outcome variable? For example, if a paper is titled "Prediction of mortality in patients with lung cancer", you already know that the outcome variable is mortality, and you already know that these patients have to have lung cancer, because that is how the cohort is set. Often, though, you have to do a critical, line-by-line read of the methods section to work out what the predictor variables and the outcome variable are. Once you have that in your head, you then have to work out which are continuous variables and which are categorical: is the outcome variable continuous, is it categorical, is it ordinal categorical — what is it? So it's a stepwise process: identify the variables, then the types of variables. Once you've analysed those, you then look at the tests done by the paper. If they have done parametric tests, then you can assume that the variables are normally distributed, and you have to keep in mind that every test used should then be a parametric test. Then you move on to the regression analysis: is the output variable continuous or categorical? Then the last step: are these the most appropriate tests, and if not, what could they have done? That is the part that comes with experience — the more research you do, the more ideas you have, such as "I could have interpreted this differently" or "I could have done this analysis in this particular way". But the foundation remains the same: you look at the variables and see whether they're continuous or categorical, then you look at whether the authors have used parametric or non-parametric tests and whether they've stuck to that, and then you look at the type of regression analysis they've done. And that's how you appraise someone else's paper.
If you are writing the paper yourself, this is the pipeline I would suggest. You look at the cohort demographics — that will also help you determine the normality of the continuous variables. Then you move on to correlation tests, and to parametric or non-parametric comparative tests depending on the normality of the continuous variables. And then you move on to your regression analysis, in order to see how the set of independent input variables affects the outcome. So that's the process of having a strict methodology.

So now it's time to analyse a paper. I recommend all of you listening to this to go and have a look at this paper. It was from 2009, if I remember correctly, and for that time it's a good paper. What was this paper trying to do? It was trying to develop a scoring system to predict 30-day mortality in patients undergoing hip fracture surgery. Already, just from reading this, I can tell you two things. The first is that the outcome is 30-day mortality, so it's binary: yes or no, were they alive at 30 days. The second thing I can tell you is the cohort — patients undergoing hip fracture surgery. And so that is how you start building an idea. Often titles aren't this descriptive, and you have to read the paper before you have even a vague idea of what's happening, but in this paper, luckily, they told us from the beginning exactly what it is. So, as I said, they will probably look at multiple variables to predict the binary outcome, which is 30-day mortality. And I was right: if you look at Table 4, it shows all the variables they looked at — the age and sex of the patient, their admission haemoglobin, another scoring system, whether or not they were living in an institution, the number of comorbidities, and whether or not they had any concurrent malignancy at the time. So those were the variables they used as their independent, or predictor, variables.

If you read this paper, you'll see that in terms of methodology they've done cohort demographics and then multivariable stepwise logistic regression. As you can tell, cohort demographics means mean, median, mode, frequency analysis for the categorical variables, and seeing whether the data are normal or non-normal — and then they jump straight to logistic regression using all the variables. This is far from perfect, and I'll tell you why. This paper skipped a lot of steps. First, they did not analyse the characteristics of those who survived versus those who didn't. They also made no mention of their comparative tests — I assumed they were parametric — and there were no independent-samples or paired-samples tests, none of that. There were also no tests of normality, which is interesting, because the reader is then left to assume whether the tests are the right ones: if no normality tests are reported and you immediately see the authors using an independent-samples t-test, you're assuming they have a parametric, normal distribution, but that might not be the case. So it's very important to report your normality test results. And then, if you notice here, something interesting they have done in order to create the scoring system: they have binarised the continuous variables.
If you look at age here, it's not a continuous variable — it's been divided into two categories: 66 to 85 years, and more than or equal to 86 years. The same goes for haemoglobin: the two categories are more than 10 or less than 10. In doing so, you often lose a lot of information, because it might not be the case that the cut-offs you've used to binarise a continuous variable are in fact the best ones. You are losing what an 84-year-old and an 87-year-old have in common, and also what they don't have in common. What I mean by that is that it might not always be the case that everyone above the age of 87 behaves in the same way, compared with a continuous variable that can give you 87.1, 87.2, 87.3 years, and so on. So, as a result, they dichotomised their continuous variables using arbitrary cut-offs: anyone above the age of 85 is one category, everyone below is a different category. There was no robust methodology as to why those cut-offs were made, apart from the fact that they would be useful in clinical practice — and yes, it is often the case that something that is theoretically helpful might not be practically helpful. However, there are now growing techniques that help you do this better, such as LOESS curve analysis, which can be done in most statistical software and looks at the change in the precision of your results depending on the cut-off. Obviously, when this paper was published, I don't assume such techniques were available, so for its time this paper is good, but it could have been made much, much better.

Additionally, when you're trying to predict mortality, it becomes a time-to-event question, and you then have to move past logistic regression into survival analysis — because patients could have lived for more than 30 days; they could have lived for one year, so why not one year? This is what's called censoring: you've censored your analysis at 30 days. There are many more types of analysis you can do on top of logistic regression when you have a question of mortality and survival, and this is where you start moving into the realm of not-so-basic statistics. So, as I mentioned, there are some tests you will see used in almost every paper that are out of the scope of this lecture. I would love to talk more about them, but they aren't basic, and you don't need them to get through a standard paper.

The first is discrimination analysis. This is the graph on the right: a ROC curve, or receiver operating characteristic curve. All it plots, as you can see, is your sensitivity on the y-axis against one minus specificity on the x-axis, and it tells you how accurate a test or a model is. Then you have survival analysis — as I mentioned, whenever you have mortality or survival you can do so much more than a logistic regression: a Kaplan-Meier analysis, log-rank pairwise comparison tests, a Cox proportional hazards model, and you could go on and on. Similarly, you also have things called post hoc tests. For a lot of the t-tests and ANOVAs I mentioned, you can perform post hoc tests such as Tukey's HSD, the Bonferroni correction, etcetera. All they do is give you a more granular breakdown of the difference. What I mean by that is that the ANOVA, if you've got three categories, gives you a p-value for whether there is any overall difference in the means between the three categories, but you have to do a post hoc test to see where that significance lies: is it between groups one and two, between two and three, or between one and three? That's how it works, and that is why you do post hoc tests.
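For completeness, here is a small sketch of a post hoc test of that kind (my own illustration with made-up data; it uses statsmodels' pairwise Tukey HSD alongside SciPy's one-way ANOVA).

# Illustrative sketch (made-up data): one-way ANOVA followed by a Tukey HSD post hoc test.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(6)
group_a = rng.normal(130, 10, 40)
group_b = rng.normal(125, 10, 40)
group_c = rng.normal(115, 10, 40)

f_stat, p = stats.f_oneway(group_a, group_b, group_c)   # overall ANOVA p-value
values = np.concatenate([group_a, group_b, group_c])
labels = ["A"] * 40 + ["B"] * 40 + ["C"] * 40
post_hoc = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print("ANOVA p =", p)
print(post_hoc)   # table of pairwise mean differences, adjusted p-values, and CIs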
And lastly, you can also do logistic regression with some sort of penalisation — penalisation to prevent the model from overfitting to the data. So those are the different types of analysis that are not strictly necessary. And that's about it. I hope I have covered most of the basic stats and given you some idea of how to critically appraise. Thank you so much for listening to this lecture. You can contact me via email — I'm happy to answer any questions you have. And yeah, thank you so much for listening.