A brief introduction to statistics, delivered by Dr Simon Eaton. Recorded at the 69th BAPS Annual International Congress.
BAPS Congress 2023 | Introduction to Statistics | Dr Simon Eaton
Summary
This on-demand teaching session is a basic introduction to statistics for medical professionals, designed to help them interpret different types of data. Senior lecturer Simon Eaton emphasises understanding when to apply different statistical tests based on the characteristics of the data at hand, such as whether the data are continuous, dichotomous, or ordinal, and whether they are normally distributed. With easy-to-understand examples, he explores various statistical concepts including P values, data mining, and data transformation. Dr Eaton also highlights the importance of data interpretation and the potential for misleading statistics.
Description
Learning objectives
- To understand and interpret statistical data within the medical field, delineating between clinical and statistical significance.
- To differentiate between the types of data: continuous, dichotomous, and ordinal, and recognize when each is useful in medical research.
- To identify whether data is normally distributed or not by observing mean and median values, skewness and kurtosis measurements, and applying normality tests.
- To learn the difference between parametric and non-parametric tests and how to choose the appropriate test for analyzing data.
- To explore the concept of data pairing in medical statistics and how to appropriately analyze paired data.
Computer generated transcript
Warning!
The following transcript was generated automatically from the content and has not been checked or corrected manually.
Well, welcome everyone. Thank you for joining this session. I'm delighted to introduce Simon Eaton, who's the senior lecturer at the Institute of Child Health, who has helped me with some statistical problems and who I'm sure will help you learn some more about statistics.
So thank you. Oh, we've got some more attending now. So I'm going to give what's more or less a stats 101. I hope that some of this you will already know, but if you already know it, then I hope there are some new points for you as well, and some refresher points. So a disclaimer at the start: I don't have any statistical qualification. I didn't do any stats at school. I hated statistics, and there were no P values in my entire PhD thesis. But when you actually realize what stats can do for you, then you can become more interested in it.
So what is statistics? Collection, organization, analysis, interpretation and presentation of data. "There are three types of lies: lies, damn lies and statistics." Benjamin Disraeli. "Statistics is an opinion." Anyone know who that's a quotation from? Paolo De Coppi. So statistics themselves aren't actually that useful. You need to be able to interpret them. So this is a quotation from a book called Into the Kill Zone: A Cop's Eye View of Deadly Force, which I haven't read: it's clear that police shootings are extremely rare events and that few officers, less than one half of 1% each year, ever shoot anyone. Now that may in America seem to be a low statistic, but obviously in the UK, and probably in Belgium as well, we regard less than one half of 1% each year as quite a large amount. So statistics and numbers always require interpretation. So traditionally, we use a P value of 0.05 as a cut-off for significance, but clinical significance and statistical significance are not the same thing.
So if you invented a new way of inguinal hernia repair that resulted in a 1% mortality rate, then obviously that would be clinically quite significant, but it would be quite hard to demonstrate it statistically, because you'd need an awful lot of patients to show that. So one of the consequences of using a P value of 0.05 as a cut-off is that basically that's saying there's a one in 20 possibility (0.05 is 1 divided by 20) that the difference is by chance. So that means if you do 20 t tests, you're quite likely to find that at least one of them is significant. So if you look at this, and I don't expect you to read the table, just see that there are quite a lot of P values on there. So with 26 comparisons, it's unsurprising that we find some that are statistically significant according to the P value cut-off of 0.05. And you can see I've put rings around the ones that are significant there. So that's great for people that want to get a paper at BAPS, because the more comparisons I do, the more likely I am to find one that's significant, and that can be the headline of my BAPS abstract. But doing lots and lots of comparisons until you find a significant one is called data mining, or more charitably, exploratory hypothesis-generating analyses. So it's useful, but it's not definitive if you do that kind of looking around for a P value.
So we're going to go through some basic stats. So the first question is: what kind of data type am I looking at? Are my data continuous, like BP or weight; dichotomous, like alive or dead, full feeds or not on full feeds, recurrence or no recurrence; or ordinal? That means in order, so like a pain score or cancer staging. So if you've got continuous data, then the next question to ask is: are my data normally distributed? So what's a normal distribution? So this is what we call a Gaussian distribution.
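The point above about running 20 or 26 comparisons can be put into numbers with a short sketch. It assumes the tests are independent and that there is no real difference anywhere, which is a simplification; the function name is made up for illustration. Under those assumptions the chance of at least one "significant" result is about 64% for 20 tests.

```python
def chance_of_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent tests dips below alpha
    purely by chance, assuming no real differences exist."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (1, 6, 20, 26):
    print(f"{n:2d} comparisons: {chance_of_any_false_positive(n):.0%}")
```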
So if you take, say, weight or height or something like that, then the values will be distributed around the average. And you can see here some data: this is birth weight in 502 gastroschisis babies, and you can see that it actually approximates to a Gaussian curve. One thing you can do with graphs is hide data. So if you wanted to make the data look really good, you'd obviously do a graph like this, which is a mean and standard deviation, or standard error, sorry, here. But actually, if you want to display them honestly, then the best way to do it is a scatter plot, or you can see that box and whiskers is also quite useful, with a median, interquartile range and range. So you can see there are potentially some wrong values down here that have been input into the data sheet wrongly. But you can see that this approximates a normal distribution. So examples of data that are normally distributed: weight, height, BP, IQ scores. They're often normally distributed; that doesn't necessarily mean they're always normally distributed.
But there are other types of data that are not normally distributed. So here, for example, in the same dataset, looking at length of stay, you can see that although there's a peak at around 30 to 40 days, there's a long tail on there. Presumably these are babies that had an atresia or complex gastroschisis or whatever, which meant that they had a longer length of stay. So this doesn't fit a normal distribution. And you can see that again via some of these types of graphs, but not others. So obviously here with the scatter plot, you can see straight away that that's not normally distributed. With a box and whisker plot, you can see that because the two whiskers are very, very asymmetrical, the data are not normally distributed.
But hey, look in the middle: with the mean and standard deviation, you can't tell anything about whether that's normally distributed or not, because you're hiding the data. So examples of data that are not normally distributed: length of hospital stay, time to full feeds, things like that.
So how can I tell if they're normally distributed or not? Are the mean and median close together? Are skewness and kurtosis between minus one and plus one? And do the data pass a normality test? I'll just show you what these are. So again, with the data that we've already looked at, birth weight in this column here, you can see that the median is 2413 and the mean is 2421. So the median and the mean are fairly close together, a pretty good indication they're reasonably normally distributed. Length of stay, on the other hand: 35 and 53.7, quite a distance apart. Obviously, the problem with this is there's no definition of how close they have to be. But it's a pretty good indicator, as well as looking at the graphs that I've just shown you. Most statistical packages will also give you, as well as the median and the mean, these data called skewness and kurtosis, which just refer to the spread and the shape of the data. I'll just go back a slide: I said between minus one and plus one. So you can see that for birth weight, they both fulfil that criterion of being between minus one and plus one, whereas length of stay definitely doesn't. There are statistical tests for normality, but none of them is particularly great or definitive. So you can see that even birth weight passes three of the tests but not the fourth, whereas length of stay passes none of them. So they're all just indicators to help you.
So what do I do with normally distributed or non-normally distributed data? So if yes, you can do t tests and ANOVA, all the things that need normally distributed data.
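The informal checks described above (mean against median, skewness and kurtosis within minus one to plus one) can be sketched as follows. The data are simulated, not the gastroschisis dataset from the talk, and the function names are illustrative. The simple moment formulas used here can differ slightly from the sample-corrected versions that statistical packages report.

```python
import random
import statistics


def shape_summary(values):
    """Mean, median, skewness and excess kurtosis from simple moment formulas."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD keeps the formulas simple
    skewness = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    # Subtract 3 so that a normal distribution scores about 0 ("excess" kurtosis).
    kurtosis = sum((v - mean) ** 4 for v in values) / (n * sd ** 4) - 3.0
    return {"mean": mean, "median": statistics.median(values),
            "skewness": skewness, "kurtosis": kurtosis}


def within_rule_of_thumb(summary):
    """The talk's rough guide: skewness and kurtosis both between -1 and +1."""
    return abs(summary["skewness"]) <= 1 and abs(summary["kurtosis"]) <= 1


random.seed(0)
# Simulated stand-ins: roughly symmetric weights, long-tailed lengths of stay.
birth_weight_g = [random.gauss(2400, 500) for _ in range(500)]
length_of_stay_days = [random.lognormvariate(3.6, 0.8) for _ in range(500)]

for name, data in (("birth weight", birth_weight_g),
                   ("length of stay", length_of_stay_days)):
    summary = shape_summary(data)
    rounded = {key: round(value, 2) for key, value in summary.items()}
    print(name, rounded, "within rule of thumb:", within_rule_of_thumb(summary))
```

As in the talk, these are indicators rather than a definitive test, and with small samples they become unreliable.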
If the data are not normally distributed, you can either use nonparametric tests or you can transform them to normality. That sounds like magic. Sure, no safe. Move it down. I personally don't trust them. I personally just go by median, mean, skewness, kurtosis and looking at the data. So, great. And statistics is an opinion, remember, Paolo De Coppi. And again, it's a choice. It's like choosing your poison here. You could say, OK, well, it's not passed the Shapiro-Wilk test, so therefore I'm not going to, but it's a bit down to your feel of whether you think it's normally distributed or not; there isn't a definitive test.
If you read that a normality test was done, do you read any significance into it? Does it reassure you or not?
So there are lots of different sub-bits to that. If it's small numbers, then none of the tests are any good anyway; with small numbers, it's very difficult to spot whether anything is normally distributed or not. And so again, it's a bit of a sense check. If it makes sense and someone says it passed a normality test, then you're probably going to believe it. But if it doesn't, you might question it, especially if it's something you know is not really going to be normally distributed, like length of stay. I mean, with a small number of patients, you could do length of stay and find that it passed normality tests because you've just got small numbers. So it's a bit of a feel thing, I'm afraid.
So what does "transform to normality" mean, and what are nonparametric tests? So transforming to normality: there are various different transformations you can do. One of the most common ones is to take a log. So you can see here I've shown graphically what that is. So you can see the length of stay scale here is a logarithmic scale.
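A rough sketch of what taking a log does to long-tailed data: on simulated lengths of stay (not the talk's dataset), the mean and median sit much closer together after the transformation. The helper name is illustrative, and base 10 is an arbitrary choice; any log base behaves the same way here.

```python
import math
import random
import statistics

random.seed(1)
# Simulated long-tailed lengths of stay in days (all values are positive).
length_of_stay_days = [random.lognormvariate(3.6, 0.8) for _ in range(500)]
log_days = [math.log10(d) for d in length_of_stay_days]


def mean_median_gap(values):
    """Distance between mean and median, in units of the standard deviation."""
    gap = abs(statistics.fmean(values) - statistics.median(values))
    return gap / statistics.stdev(values)


raw_gap = mean_median_gap(length_of_stay_days)
log_gap = mean_median_gap(log_days)
print(f"gap on raw scale: {raw_gap:.2f} SD, gap on log scale: {log_gap:.2f} SD")

# Back-transforming the mean of the logs gives the geometric mean, which for
# skewed data like this lies below the ordinary (arithmetic) mean.
geometric_mean_days = 10 ** statistics.fmean(log_days)
print(f"geometric mean: {geometric_mean_days:.1f} days")
```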
And on the log scale you can see, just by looking at the data, that it looks more like a normal distribution. Obviously, that still doesn't really tell you anything, and it still looks a bit funny, to be honest. And now looking at length of stay having done a log transformation on it, you can see that the median and the mean are a lot closer together. Kurtosis is still failing, so it's still outside minus one to plus one, and it still fails on the normality tests. But I would probably be happy to do parametric statistics, t tests or whatever, on something like that, because it's closer to normality.
So the next question to ask is: are my data paired? I'll come back to nonparametric in a minute. So if you've got paired data, like before and after, then it's a lot more powerful to use the power of that pairing. So here's an example of some data on temperature before and after an intervention, and you can see that's not significant. But then when you look at the individual data points, and these are the same data, you can look at the data and see, well, I think there is a drop. And a paired t test will give you a significant P value, because you're more appropriately using the power of the data if it is paired. And just to emphasize it yet again, it shows you how useless just a mean and standard deviation or standard error is. It's not really that informative.
So nonparametric tests rely on rank rather than numerical value. And that's also useful to know where a test result is reported as out of range, if it's too high to measure or too low to measure. If you're doing parametric tests, you can't include that data point. But if you're using nonparametric tests, you can, because you can include it as a top-rank or a bottom-rank value. So a classic example of a nonparametric test is a Mann-Whitney test, which is an equivalent of a t test that depends on ranks.
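To illustrate why pairing matters in the before-and-after temperature example, here is a minimal sketch that computes the t statistic both ways on made-up temperatures (not the data from the slide). It uses the Welch form for the unpaired comparison, which is one common choice, and stops at the t statistic rather than a P value, since that would need a statistics package.

```python
import math
import statistics


def unpaired_t(a, b):
    """Welch t statistic: treats the two sets of readings as unrelated groups."""
    standard_error = math.sqrt(statistics.variance(a) / len(a)
                               + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / standard_error


def paired_t(before, after):
    """Paired t statistic: works on each patient's own before-minus-after change."""
    diffs = [x - y for x, y in zip(before, after)]
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical temperatures (degrees C) for the same 8 patients.
before = [36.2, 37.1, 38.0, 36.8, 37.6, 38.4, 36.5, 37.9]
after = [36.0, 36.8, 37.8, 36.5, 37.5, 38.0, 36.3, 37.6]

print(f"unpaired t = {unpaired_t(before, after):.2f}")
print(f"paired t   = {paired_t(before, after):.2f}")

# 2.365 is the two-sided 5% critical value of t with 7 degrees of freedom.
print("paired result beyond the 5% critical value:",
      abs(paired_t(before, after)) > 2.365)
```

Every patient drops by a small, consistent amount, so the spread between patients swamps the difference in the unpaired comparison but cancels out in the paired one.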
So here you can see some data on interleukin-6, and the lab will only report a range of between 1 and 100 pg per ml. So they've reported these three as out of range, because they're more than 100, and they've reported this value as below range, so not given a value. So here we can just assign the low one arbitrarily zero, so below everything else, and the high values arbitrarily high, and then we rank them in order. So 1002 is the highest. And then you can actually do statistics comparing the ranking of the two groups. And that's the way a nonparametric test works.
Why don't we use nonparametric tests all the time? It's rarely wrong to do a nonparametric test, but it's not very powerful, because it doesn't involve the magnitude of a difference, only higher or lower. So to a nonparametric test, 1.1 versus 133.2 is the same as 1.1 versus 1.15. It's just: that's higher and that's higher. That's the way a nonparametric test sees it. So you're not using the power of the data if you do a nonparametric test where it's appropriate to do a parametric one.
One-sided or two-sided test. OK, so what does that mean? With a normal distribution, we've got tails to the distribution. And so if you're asking the question, is treatment X different from treatment Y, where it could be greater or less than, you have a two-tailed test. Whereas if you're using the lower tail, it's: is treatment X less than treatment Y? Or the other tail: is treatment X more than treatment Y? But here's the thing. So here's a scenario: you've invented a new antihypertensive drug, which works in an animal model, and you try it out in human hypertensives. Your hypothesis is that it reduces BP. Do you perform a one-tailed test? Does anyone in the audience want to stick their hand up and say they're doing a one-tailed test? Yeah. So, but what if the drug increases BP? Would you care?
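The ranking idea from the interleukin-6 example can be sketched in a few lines. The values below are invented, the "<1" and ">100" labels are an assumed way of writing out-of-range results, and the functions are illustrative rather than any package's API. The sketch only computes the Mann-Whitney U statistics; turning them into a P value is left to a statistics package.

```python
import math


def to_number(reported):
    """Map out-of-range lab results to values below or above everything else."""
    if reported == "<1":
        return -math.inf
    if reported == ">100":
        return math.inf
    return float(reported)


def average_ranks(values):
    """Rank from 1 upwards; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, U for group_b) from the pooled ranks."""
    a = [to_number(v) for v in group_a]
    b = [to_number(v) for v in group_b]
    ranks = average_ranks(a + b)
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    return u_a, len(a) * len(b) - u_a


# Hypothetical IL-6 results for two groups, as a lab might report them.
controls = ["<1", "3.2", "5.0", "8.1", "12.4"]
patients = ["15.0", "40.2", ">100", ">100", ">100"]
print(mann_whitney_u(controls, patients))

# Only the ordering matters: shrinking 40.2 to 16.0 leaves U unchanged.
patients_smaller_gap = ["15.0", "16.0", ">100", ">100", ">100"]
print(mann_whitney_u(controls, patients_smaller_gap))
```

The second call makes the speaker's point about power: the test cannot tell a large difference from a small one as long as the order is the same.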
Of course you would care if it increased BP, because you're giving it to hypertensives and it's increasing their BP; you're interested in that. So if you would be interested in either direction, you should do a two-tailed test. So I've very rarely been able to justify to myself doing a one-tailed test at all. Pretending that you're only interested in one direction to get a significant result is... I've left it blank here, but it could be an expletive. Dodgy, I think, would be a charitable way of putting it.
What if I have more than two groups? Can I just do a series of t tests? So if I've got four groups here that I want to compare and do six t tests, then remember, because I'm doing six t tests, I've got a greater possibility of finding a significant P value by chance. So what we do is something called analysis of variance, or ANOVA, which I was bewildered by the first time I heard about it. It seemed really terrifying, but actually it's quite easy. So you've got a single P value overall, which is basically saying: is there evidence for a difference between the groups? Here, there's no evidence. But then you can also get what's called a post test, which is these individual differences. So that's the way to compare when you've got multiple groups. And there are nonparametric equivalents as well.
Sorry, yeah? If you have the same thing over time... OK, so it depends. Sure, questions. OK. So the question here is: if you're looking at the same patient again and again over time, do you just do an ANOVA? OK. So, like the paired t test, there's something called a repeated measures ANOVA. So that's the ideal way to look at it. I say ideal because there's a problem with it, which is that it relies on having a complete dataset. So if you had 10 patients that you measured five times, and you measured all of them five times, that's fine.
But if there's one patient you missed a measurement on, in theory you can't do repeated measures ANOVA. Other tests are available. But yeah, I'm conscious of time here, because Anna's got embryology part two to talk about, so I'll go through this relatively quickly.
Regression itself is quite a minefield, so don't try it at home unless you're with someone that knows what they're doing. But we do know about regression analysis. So simple linear regression you probably learned in maths at school: y equals mx plus c is the equation of the line that's the best fit of the points. We get an R squared value and a P value. The P value is asking: is the slope of the line significantly different from zero? So back to our gastroschisis babies. So here I've plotted gestational age against time to full feeds, and you can see that there's a significant but weak relationship. The R squared value is between zero and one, and zero is the lowest, so it's a weak correlation, and you can't really see anything particularly obvious there. And that's the equation for it. We can also do other parameters. So for birth weight, there's an even weaker but still significant correlation, with a different equation.
But what happens when we want to combine them? Can we combine them? So multiple linear regression is doing this with more than a single variable at once. So you can plot, as I've done here, one dependent and two independent variables in 3D space. So the Z axis is the time to full feeds, and then we've got the equivalent of those two graphs, birth weight and gestational age, but plotted on the same graph. OK. But then how do we do statistics on that? Well, we use a technique called multiple linear regression. And what we get is basically an equation that builds up from the constant, which is the same as the intercept in linear regression. We have significance, and we have these coefficients, which tell us how the fit varies with gestational age and with birth weight.
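For the simple case of one predictor, the best-fit line y = mx + c and its R squared can be computed by ordinary least squares with a few lines of arithmetic. The sketch below uses invented gestational ages and times to full feeds, not the talk's data, and does not compute the P value for the slope, which would normally come from a statistics package.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares fit of y = m*x + c; returns (m, c, r_squared)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    c = mean_y - m * mean_x
    # R squared: share of the variation in y that the line accounts for.
    ss_residual = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return m, c, 1.0 - ss_residual / ss_total


# Hypothetical values: gestational age (weeks) and time to full feeds (days).
gestational_age_weeks = [33, 34, 35, 36, 37, 38]
days_to_full_feeds = [60, 35, 52, 20, 41, 18]

slope, intercept, r_squared = fit_line(gestational_age_weeks, days_to_full_feeds)
print(f"days = {slope:.1f} * weeks + {intercept:.1f}, R squared = {r_squared:.2f}")
```

An R squared near zero means the line explains little of the scatter, even when the slope itself is statistically significant.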
So the coefficients are for gestational age here and birth weight here. So in other words, time to full feeds is 454 days, minus 11.8 times the gestational age in weeks, plus 0.009 for every gram of birth weight. And here we also get an R squared value, which tells us how good the fit is overall. So watch out for collinearity when you're doing this. Gestational age and birth weight are obviously quite closely correlated with each other, as you can see here, and sometimes this causes a regression analysis to fall absolutely flat on its face. So that's something you need to be careful of. And the other thing to bear in mind if you do regression analysis is to have a think about what all the confounders are. So I don't have to tell you as pediatric surgeons that atresia is a big confounder in gastroschisis time to full feeds. And unsurprisingly, when we throw complex gastroschisis in here, we see that complex gastroschisis is a much stronger predictor of time to full feeds than either birth weight or gestational age.
So the next lesson is: don't over-interpret the results, and don't confuse association with causality. So here you see a graph of units of alcohol per capita versus liver cirrhosis mortality. And you might think, OK, well, that's fair enough, because units of alcohol cause cirrhosis and that causes mortality. But then when I reveal what the real units are, which are the amount of chocolate consumed versus the number of Nobel Prizes, what looked like something causing something else becomes very clear: chocolate consumed does not cause Nobel Prize winning. The Christmas BMJ is also quite good for statistical fun. So here's a study from a couple of years ago where they looked at the distribution of heavy metal bands in cities in Finland and looked at all-cause hospital admissions and mortality.
They found, if you're interested, that cities with more heavy metal bands in Finland actually had a lower hazard ratio of mortality and hospital admissions, but we probably don't think that that's cause and effect. So the direction of causality is also not easy to establish. So an example here, a chicken-and-egg one, is that vitamin D levels are associated with depression. It could be that sunlight is required to make vitamin D, and that depressed individuals might go outside less and have lower vitamin D levels. Or it might be that lower vitamin D levels directly or indirectly cause depression, and people stay indoors. So you can see that it's not only difficult to separate association and causality, but it's also sometimes difficult to work out what the direction is.
So just think about your conclusions whenever you do any statistics. So if you did a new operation on five patients and none of them had a serious complication, you probably wouldn't conclude that the operation is safe. You might conclude that, on five patients, there's no evidence for harm. And statistics is like that. So it still depends on the strength of what you've shown, but really have a think about what your conclusions are in the light of what you've found.
I'm going to go really quickly through dichotomous data and then finish. So for binary outcomes, we use a chi-squared test or a Fisher's exact test, and that's on a two-by-two table like this one here. So this again is the gastroschisis dataset, and unsurprisingly, statistically, you're more likely to have a mortality with complex gastroschisis. Looking at three centers here, there's no statistical difference overall between the three centers. But what if you're interested in the individual centers? So then we've got this multiple comparisons problem. Center one versus center two, center two versus center three, center one versus center three: that's three comparisons.
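The arithmetic behind the three-center problem can be sketched as follows: count the pairwise comparisons, then divide the usual 0.05 cut-off by that count (a Bonferroni correction). The P values in the example are invented purely to show how the stricter cut-off is applied.

```python
from itertools import combinations


def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Stricter per-comparison cut-off: the overall alpha shared across all tests."""
    return alpha / n_comparisons


centers = ["center 1", "center 2", "center 3"]
pairs = list(combinations(centers, 2))  # every center against every other
threshold = bonferroni_threshold(0.05, len(pairs))
print(f"{len(pairs)} comparisons, cut-off {threshold:.3f}")

# Hypothetical P values for the three pairwise comparisons.
hypothetical_p_values = dict(zip(pairs, [0.030, 0.400, 0.012]))
for pair, p in hypothetical_p_values.items():
    verdict = "significant" if p < threshold else "not significant"
    print(pair, p, verdict)
```

In this made-up example, 0.030 would have counted as significant against 0.05 but does not survive the corrected cut-off.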
How do we deal with that multiple comparisons problem? We do what's called a Bonferroni correction, which sounds really complicated, but it's actually really easy. What you do is you change your significance cut-off from 0.05 to 0.05 divided by three, because we do three comparisons. So rather than using 0.05 as a cut-off, something has to be less than 0.017 to be significant using that. What if I want to look at more than one factor? Well, there's something called a logistic regression analysis. So if we combine those, then you can end up with odds ratios, but that's not for the faint-hearted.
So I'll finish with two slides. First of all, recommendations about software. Excel is really good for organizing data, but not for anything more than a basic t test, so it's not statistical software. But when you put data in Excel, code it: so 1, 0 rather than success, failure, rather than lots of text. Quite often, if I get a data sheet from someone that's coded things in text and strings, it's a bit of a pain in the bum, if you put it into SPSS or whatever, to have to recode it into numbers. GraphPad Prism is great for graphs and lots of statistical tests, but it doesn't do the really more complex stuff. SPSS and Stata are both advanced statistical software. They're kind of a bit old school. If I was learning stats from zero, I would use R: it's open source, free, flexible, lots of packages, really expandable, great graphics. But it's tough. It's a bit hardcore. Like a programming language, it isn't great for the casual user. So it's not really the place to start with. So that's just an example of some relatively simple stuff, actually, in R. But yeah, it's pretty unforgiving.
And then finally, what to read. SPSS help is really good. Prism has some good example dataset walkthroughs. This book, Medical Statistics, is just a really simple explanation of lots of things; I'm giving you the QR code for it there.
And there's a really nice website, from the States, that gives you lots of annotated output, data analysis examples, which statistics test. It's a really fantastic resource. So I'll leave that slide up in case you're interested in looking at those QR codes. Happy to take any questions. Joe?
Could you just say a bit about when you transform data so that you can test it with a parametric test? How do you then interpret that difference between your log-transformed data when you want to put it back into the sort of form that it came from?
OK. So the question here is about how to deal with log-transformed data after you've done a statistical test. So you're using the log-transformed data to make it more normally distributed, to do a statistical test, to get a P value. And so it kind of depends on which statistical test, how you then deal with it. The mean or the median will still be the same. So interpretation of a median of five versus 20 is still the same. How you interpret a fold change changes, because you're doing it in log space. How you interpret odds ratios changes, because you're doing it in log space. So it's quite context-specific, unfortunately, but a difference is still a difference. Any other questions?
We were just talking beforehand about courses versus worked examples: should we be going on a course, or what's your preference?
If I take my own example, I went on loads of stats courses and didn't learn anything, because I didn't have any data to play with and I wasn't interested, and it was, you know, ticking off a box. And I didn't do any stats at school, during my PhD or whatever. And then I suddenly realized that I was going to have to do it to be able to interpret some of my own data that was coming through. So I learned using my data, and I went to ask people. That, for me, was an ideal way of doing it. But obviously, you need to have someone that you can ask, that can help you do that. That's the ideal way.
If you go on a course, try and find a course that is a practical course that allows you to do worked examples with people. An even better one allows you to bring your own data along and get some advice on playing with it.
Every university department or R and D department should have a statistician that you can go to.
Yeah. But sometimes, unfortunately, statisticians cost money, and what statisticians will also typically do is, you'll come to them with some data and they will say, oh, you designed the study all wrong; go back and get the right data. So although statisticians can be very helpful, they can also be kind of like, I'm not interested in this because it's all wrong, and you're faced with two years of work and you need to get something out of it. Yes, the ideal is involvement from the start. There are two advantages to that, more than two advantages. First of all, you collect data in a sensible way. Secondly, you collect it prospectively rather than retrospectively. And thirdly, you've then got a statistician who you can come back to and say, well, actually, having done this now, can you help me? And they might help you.
And then just one final question from me: is AI going to make life much easier, or make this redundant? Will I be able to type into ChatGPT, what tests should I do, or, analyze this data?
What ChatGPT will tell you is what it can find on the web, which is what's the most commonly used test. So it's an information-gathering service. It's not going to tell you how to analyze data. It will tell you how other people analyzed some data. But in this AI world, an understanding of data and where it comes from is going to become increasingly important, not less important.
Excellent. Thank you very much, sir.