Slides S4 M2 SPSS Basic Level
Summary
This on-demand teaching session, titled "Manipulating Stats and Generating Reports," is a comprehensive training on statistical skills tailored for medical professionals. Led by Neuroscience experts Razan Youssef, Daniel Bou Najm, and Malak Al Bourji, this session delves deeply into descriptive statistics, data visualization, variable types, and data coding techniques, especially through SPSS. The session carefully elucidates concepts like nominal and ordinal variables, showing medical professionals how to optimize their statistical analyses and data interpretation. The session even provides manual coding guidelines and walks attendees through calculating measures of central tendency and dispersion, including the mean, median, mode, variance, and standard deviation. The course endorses practical data visualization techniques, including bar charts, pie charts, and histograms, so that professionals can efficiently present and interpret data in the healthcare setting. Through this on-demand teaching session, medical professionals can transform their data handling and reporting skills. The future of effective healthcare lies in the interpretation of statistics and this session is ideal for those aiming to make a substantial impact in that sphere.
Learning objectives
- By the end of the session, participants will be able to understand the basics of manipulating statistics and generating reports.
- Participants will learn how to differentiate between various types of variables such as nominal, ordinal, discrete and continuous variables and understand their applications in statistics.
- Participants will be able to understand and apply the process of coding variables and the importance of coding in data analysis.
- Participants will gain skills in generating descriptive statistics and data visualizations using software like SPSS or Excel.
- Participants will understand the concepts of measures of central tendency (mean, median, mode) and measures of dispersion (variance and standard deviation) and be able to calculate these statistical measures for data analysis.
Computer generated transcript
Warning!
The following transcript was generated automatically from the content and has not been checked or corrected manually.
Boost Your Potential Workshops: Manipulating Stats and Generating Reports (Basic Level), Session 2
Prepared and presented by: Razan Youssef, Daniel Bou Najm, and Malak Al Bourji (M2 Neuroscience, LU)
Under the supervision of Dr. Sarine El Daouk

Descriptive Statistics and Data Visualization

Variable Types
Categorical, nominal:
• Blood type: A, B, AB, O
• Gender: Male, Female
Categorical, ordinal:
• Education level: No formal education, Elementary / Middle School, High School, Bachelor's Degree, Master's Degree, Ph.D.
Numeric, discrete (can't be halved):
• Number of people
• Number of correct answers (MCQ)
Numeric, continuous:
• Age
• Height
• Weight

CODING OF VARIABLES

CODING
Coding is the process of translating data gathered from questionnaires or other sources into something that can be analyzed. It involves assigning a value to each piece of data; often the value is also given a label.
SPSS procedures work with numbers, not letters! We need to code categorical variables in SPSS. There is no need to code numeric data.

CODING Nominal Variables
For nominal variables, the order of the codes makes no difference.
Example: variable RESIDENCY
1 = Nabatieh
2 = South
3 = Beirut
4 = North
5 = Bekaa
6 = Akkar
7 = Baalbeck-Hermel
8 = Mount Lebanon
No ordered value is associated with each response, so the codes can be assigned at random.

CODING Nominal Variables
Common coding systems (code and label) for dichotomous variables: 0 = No and 1 = Yes, OR 1 = No and 2 = Yes.
Again the order does not matter, but for yes/no variables it is better to start coding from "No".

CODING Ordinal Variables
The coding process is similar to that for other categorical variables, except that the codes should follow the order of the categories.
Example: variable EDUCATION, possible coding:
1 = Did not graduate from high school
2 = High school graduate
3 = Some college or post-high school education
4 = College graduate
It could also be coded in reverse order (1 = College graduate through 4 = Did not graduate from high school).
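The coding rules above can also be tried outside SPSS. The following is only a minimal sketch in Python, not part of the workshop material: the code books copy the RESIDENCY and EDUCATION examples from the slides, while the function names (code_values, follows_order) and the choice to treat unknown labels as missing are assumptions made for illustration.

```python
# Code books taken from the slide examples.
RESIDENCY_CODES = {
    "Nabatieh": 1, "South": 2, "Beirut": 3, "North": 4,
    "Bekaa": 5, "Akkar": 6, "Baalbeck-Hermel": 7, "Mount Lebanon": 8,
}
EDUCATION_ORDER = [
    "Did not graduate from high school",
    "High school graduate",
    "Some college or post-high school education",
    "College graduate",
]
EDUCATION_CODES = {label: i for i, label in enumerate(EDUCATION_ORDER, start=1)}


def code_values(values, codebook):
    """Translate category labels into numeric codes.

    Labels that are not in the code book (or are not text) become None,
    which stands in for a missing value. This is an assumption of the sketch.
    """
    return [codebook.get(v.strip()) if isinstance(v, str) else None
            for v in values]


def follows_order(codebook, ordered_labels):
    """Check an ordinal code book: codes must strictly increase or
    strictly decrease along the natural order of the categories."""
    codes = [codebook[label] for label in ordered_labels]
    pairs = list(zip(codes, codes[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


if __name__ == "__main__":
    print(code_values(["Beirut", "Akkar", ""], RESIDENCY_CODES))  # [3, 6, None]
    print(follows_order(EDUCATION_CODES, EDUCATION_ORDER))        # True
```

For a nominal variable only code_values matters; follows_order is the extra check that applies to ordinal variables, and it accepts reverse coding as well.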
CODING Ordinal Variables
Example of BAD coding:
0 = Very satisfied
1 = Not satisfied
2 = Satisfied
3 = Neutral
The data has an inherent order but the coding does not follow it, so this is NOT appropriate coding for an ordinal categorical variable.
Correct way (or vice versa):
0 = Very satisfied
1 = Satisfied
2 = Neutral
3 = Not satisfied

HOW TO CODE ON SPSS
Coding can be done in Excel or in SPSS.
First, identify the variable you want to code and copy its categories from Excel or SPSS (so you can paste them directly into SPSS). Then, in SPSS:
1. Open SPSS > Transform > Recode into Different Variables.
2. Select the variable you want to recode and click the arrow to move it into the box.
3. Enter a name for the new variable (best to add NEW to the original name), then label it. Click Change.
4. Click Old and New Values.
5. Enter the category name as the old value and its code (1, 2, etc.) as the new value. Click Add after each category.
6. Once you are done, click Continue, then OK.

Once you name, label, and click Change, click "Old and New Values". It is better to give the new variable a unique name, for example one ending in "NEW" or "CODED".

You can copy the value from Excel.

Remember the code for a future step!
Male = 1
Female = 2
Then click OK.

You will see the new variable you created in the Variable View. Now let us add the values…

Click on the blue button, then:
• Recall the code number you assigned to each category.
• Enter the number in "Value".
• Enter the name of the category in "Label".
• Click Add.
• Repeat as necessary.

To confirm, click on Data View > Value Labels. Then click OK.

RUNNING DESCRIPTIVE STATISTICS: CATEGORICAL VARIABLES

Select Analyze > Descriptive Statistics > Frequencies.

In this example, there are 17 missing values in the data.

RUNNING DESCRIPTIVE STATISTICS: CONTINUOUS (NUMERIC) VARIABLES

MEASURES OF CENTRAL TENDENCY: MEAN, MEDIAN, and MODE
• Mean: The "average" number; found by adding all data points and dividing by the number of data points.
• Median: The middle number; found by ordering all data points and picking out the one in the middle (or the average of the two middle values when the number of data points is even).
• Mode: The most frequent number, that is, the number that occurs the highest number of times.
• Results are reported as Mean ± SD or Median ± IQR. Most of the time the mean is reported (especially if the sample size is large), while the median is reported when the sample size is small or the data are skewed.

MEASURES OF DISPERSION: VARIANCE AND STANDARD DEVIATION
• Standard deviation: how spread out the data is. You can think of it as "the average distance of the data from the mean".
• Variance: the square of the standard deviation; it also represents how dispersed the data is.

THERE ARE TWO WAYS:
1st way (to calculate the mean, median, and mode): Select Analyze > Descriptive Statistics > Frequencies.
Once you add your variable(s), click on the "Statistics" button, check the parameters you need, then click Continue.

2nd way (to calculate the mean, median, SD, IQR, and variance): Select Analyze > Descriptive Statistics > Explore > add your variable of interest to the "Dependent List" > click OK.

Example output:
Mean ± SD: 25.8 ± 4.1
Variance = 17.2
Median ± IQR: 26.1 ± 6.8

Exploring Data
Descriptive Statistics
• Categorical data: frequency, percentage/proportion
• Continuous data: Mean ± Standard Deviation, Median ± IQR
Graphical Illustrations
• Categorical data: bar charts, pie graphs (they can be in frequency or percentages, preferably %)
• Continuous data: histogram

• Bar charts and pie charts for categorical variables can be done in Excel (after cleaning the data).
• You can copy and paste the table output (frequencies) from SPSS to Excel and modify it to put your variables in rows and Frequency or Percentage in columns.
It is important to label the axes!
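The descriptive measures listed above can be checked by hand on a small data set. Below is a minimal sketch using only Python's standard library; it is not the SPSS procedure itself. The function names and the sample numbers are hypothetical, and the quartile method used by statistics.quantiles may differ slightly from the one SPSS uses, so the IQR can differ a little on small samples.

```python
import statistics
from collections import Counter


def describe_continuous(values):
    """Central tendency and dispersion for a continuous variable.

    Uses the sample variance and sample SD (division by n - 1).
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
    }


def frequency_table(codes):
    """Frequency (n) and valid percentage (%) for a coded categorical
    variable; None is treated as a missing value and excluded."""
    valid = [c for c in codes if c is not None]
    counts = Counter(valid)
    return {code: (n, 100 * n / len(valid))
            for code, n in sorted(counts.items())}


if __name__ == "__main__":
    ages = [22, 25, 25, 27, 31]            # hypothetical ages
    gender = [1, 2, 2, None, 1, 2]         # hypothetical codes, 1 = Male, 2 = Female
    print(describe_continuous(ages))
    print(frequency_table(gender))
```

The variance returned here is the square of the SD, which is the relationship stated in the slides.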
• To generate histograms for numeric variables in SPSS, click Graphs > Legacy Dialogs > Histogram, then add the variable of interest.

It is best to wrap text when pasting the graph into your report.

INFERENTIAL STATISTICS

Plan
A- Main Statistical Tests:
• Two-way Chi-square Test
• Two-independent sample T Test
• One-way ANOVA Test
• Pearson Correlation
B- Linear Regression:
• Simple Linear Regression
• Multiple Linear Regression

Flow Chart
Outcome / dependent variable is continuous:
• Exposure / independent variable is categorical with 2 groups: Two-independent sample T Test
• Exposure / independent variable is categorical with >2 groups: One-way ANOVA Test
• Exposure / independent variable is continuous: Pearson Correlation
Outcome / dependent variable is categorical:
• Exposure / independent variable is categorical with >= 2 groups: Two-way Chi-square Test

1. Chi-Square: χ2 Test of Independence

Chi-Square, χ2 Test of Independence
The chi-square test is used to determine whether there is an association between two categorical variables.
Conditions:
• Both variables should be categorical (Yes/No, Male/Female, …)
• Each variable can contain two or more groups (Low/Moderate/High, …)

Chi-Square, χ2 Test of Independence: Steps
1. Analyze → Descriptive Statistics → Crosstabs
2. Transfer the "exposure" variable into the row(s) and the "outcome" variable into the column(s)
3. In "Statistics", select "Chi-square". Then click Continue
4. In "Cells", select "Observed" under Counts and "Row" under Percentages

Chi-Square, χ2 Test of Independence: Output & Interpretation
3 tables will appear in the output sheet:
1) Case Processing Summary: this table shows the number of valid and missing cases
2) Crosstabulation: a descriptive statistics table
3) Chi-Square Tests: the p-value should be <0.05 to conclude that there is a significant association between the two variables

How to read the results?
In this case, the p-value = 0.000 (which is reported as "<0.001").
This means that the percentage of the outcome (Depression) differs significantly across the groups of the exposure (Social media use).

P-value
➢ The p-value indicates whether there is a statistically significant difference between 2 groups, or an association between 2 variables.
➢ The lower the p-value, the stronger the evidence for a difference or association.
➢ A p-value lower than 0.05 (5%) is considered significant.

2. Two Independent Sample T-test

Two Independent Sample T-test
This test is used to determine whether the mean value of an outcome variable is significantly different between two groups of participants.
Conditions:
1. The dependent variable (outcome) is a continuous variable. Ex: blood sugar level, blood pressure, scale total score, …
2. The independent variable (exposure) is a categorical variable with 2 groups. Ex: gender, …

Two Independent Sample T-test: Steps
1. Analyze → Compare Means → Independent-Samples T Test
2. Transfer the dependent/continuous variable into "Test Variable(s)" and the independent/categorical variable into "Grouping Variable"
3. Click on "Define Groups" and enter the code of each group as you coded it. In this example, 1 = Male and 2 = Female (you can verify in the Variable View → "Gender" → Values)

Two Independent Sample T-test: Output & Interpretation
2 tables will appear:
1) Group Statistics: a descriptive statistics table showing the frequency (N), mean, and standard deviation of each group
2) Independent Samples Test: it is divided into 2 parts:
• Levene's Test for Equality of Variances
• t-test for Equality of Means
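To make the two output tables less of a black box, the Group Statistics values and the t statistic can be computed by hand. This is a minimal sketch under stated assumptions, not SPSS's own code: the scores are hypothetical, the function names are made up for illustration, and the two t statistics shown are the textbook pooled-variance and Welch forms, which are assumed here to correspond to the "equal variances assumed" and "equal variances not assumed" results. The p-value is left out because Python's standard library has no t distribution; SPSS reports it under Sig. (2-tailed).

```python
import math
import statistics


def group_statistics(values):
    """N, mean and sample SD of one group (as in a Group Statistics table)."""
    return {"n": len(values),
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values)}


def t_equal_variances(a, b):
    """Pooled-variance t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def t_unequal_variances(a, b):
    """Welch t statistic with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va = statistics.variance(a) / na   # squared standard error of mean, group a
    vb = statistics.variance(b) / nb   # squared standard error of mean, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    male = [10, 12, 14]      # hypothetical scores for group coded 1
    female = [13, 15, 17]    # hypothetical scores for group coded 2
    print(group_statistics(male), group_statistics(female))
    print(t_equal_variances(male, female))
    print(t_unequal_variances(male, female))
```

When both groups have the same size and the same variance, as in this toy data, the two versions give the same t and the same degrees of freedom; they diverge as the variances become more unequal.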
Two Independent Sample T-test: Output & Interpretation
A) Levene's test for equality of variances:
• If the Levene's test p-value is > 0.05, read the p-value on the top line (there is no significant difference in the variances between the two groups → "Equal variances assumed")
• If the Levene's test p-value is <= 0.05, read the p-value on the bottom line (there is a significant difference in the variances between the two groups → "Equal variances not assumed")
B) T-test results:
• If p-value > 0.05 → no significant difference between groups
• If p-value <= 0.05 → significant difference between groups

In this example:
A) The p-value (Sig.) of Levene's test is 0.438 > 0.05 ➔ So, we continue reading the top row
B) The p-value (Sig. 2-tailed) of the t-test is 0.029 < 0.05 ➔ So, there is a significant difference in the mean of the outcome (Memory Satisfaction) between the two groups (Gender)

3. One-Way ANOVA (ANalysis Of VAriance)

One-way ANOVA Test
Same objective as the two independent sample T-test, but it differs by the number of groups of the categorical variable:
• The two independent sample T-test is used to show whether there is a difference between the mean values of two independent groups.
• ANOVA is used to compare the mean values of three or more independent groups.
Conditions:
• The dependent variable (outcome) is continuous
• The independent variable (exposure) is categorical with three or more groups

One-way ANOVA Test: Steps
1. Analyze → Compare Means → One-Way ANOVA
2. Transfer the dependent/continuous variable into "Dependent List" and the independent/categorical variable into "Factor"
3. Click on "Options", then select "Descriptive" and "Homogeneity of variance test"
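What the One-Way ANOVA procedure compares can be shown with a short calculation: the variability between the group means against the variability within the groups. The sketch below is an illustration only, using hypothetical scores and a made-up function name; it returns the F statistic and its degrees of freedom but not the p-value, since the F distribution is not in Python's standard library.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups,
    each group being a list of numeric values."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Between-groups sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: distance of each value from its own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


if __name__ == "__main__":
    # Hypothetical outcome scores for three groups of the exposure.
    groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    print(one_way_anova(groups))
```

A large F means the group means differ by more than the spread inside the groups would suggest; SPSS turns F and the two degrees of freedom into the p-value shown in its ANOVA table.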
One-way ANOVA Test: Output & Interpretation
3 tables will appear:
1) Descriptives: it shows the frequency (N), mean, SD, 95% confidence interval, …
2) Test of Homogeneity of Variances: a p-value > 0.05 is required to use the ANOVA test. In this example, p-value = 0.721
3) ANOVA: a p-value < 0.05 means that the mean of the dependent variable differs significantly among the groups of the independent variable. In this example, the p-value is <0.001, which means that there is a significant difference in the mean of memory satisfaction between at least 2 stages of depressive symptoms

4. Pearson Correlation

Pearson Correlation
Correlation is used to determine the relationship between two continuous variables.
➢ Positive correlation coefficient ➔ Both variables increase in value together
➢ Negative correlation coefficient ➔ One variable decreases in value while the other increases
This test calculates a coefficient called "Pearson's correlation coefficient (r)" that gives an idea of the strength of the association between the two variables.

Pearson Correlation: Steps
1. Analyze → Correlate → Bivariate
2. Transfer both continuous variables to the "Variables" section
3. Select "Pearson" under Correlation Coefficients

Pearson Correlation: Output & Interpretation
1 table will appear:
1) Read the p-value (Sig. 2-tailed):
• If >0.05: there is no statistically significant correlation between the 2 variables
• If <0.05: there is a correlation between the two variables
2) Read the "Pearson Correlation" coefficient:
• Only if p-value < 0.05
• This coefficient indicates the strength of the correlation
• The strength is assessed by comparing the absolute value of the coefficient to the Pearson coefficient table

Pearson coefficient "r" and strength of correlation:
0.00 – 0.19: Very weak
0.20 – 0.39: Weak
0.40 – 0.59: Moderate
0.60 – 0.79: Strong
0.80 – 1.00: Very strong

In this example:
• P-value <0.0001, Pearson coefficient r = -0.321
• The memory satisfaction and somatic symptoms variables are negatively correlated. However, this correlation is weak (r = -0.321, p<0.0001)

5. Simple Linear Regression

Simple Linear Regression
Simple linear regression is used to predict the value of a dependent variable (outcome) based on the value of an independent variable (predictor/explanatory factor). It makes predictions about the values of one variable based on the values of a second variable by generating a regression equation.
Conditions:
• The dependent variable (outcome) should be a continuous variable
• The independent variable (exposure) should also be a continuous variable

Simple Linear Regression: Steps
1. Analyze → Regression → Linear
2. Transfer the dependent variable into "Dependent" and the independent variable into "Independent(s)"

Simple Linear Regression: Output & Interpretation
4 tables will appear. The last two are:
3) ANOVA: determines whether this model is a good fit to predict the outcome. A p-value < 0.05 is essential. If not, linear regression can't be done
4) Coefficients: if the regression model was shown to be a good fit, this table makes it possible to generate a regression equation:
Y = B0 + B1*X
Memory_Satisfaction = 48.23 - 1.2 * Somatic_Symptoms

6. Multiple Linear Regression

Multiple Linear Regression
A standard multiple regression allows you to predict a dependent variable (outcome) based on multiple independent variables.
It is an extension of simple linear regression.
Conditions:
• The dependent variable (outcome) should be a continuous variable
• The independent variables can be both continuous and categorical (unlike in simple linear regression)

Multiple Linear Regression: Steps
1. Analyze → Regression → Linear
2. Transfer the dependent variable and all the independent variables that had a p-value <0.05 in the simple linear regression.
3. In Statistics, select "Confidence intervals" in addition to "Estimates" and "Model fit"

Multiple Linear Regression: Output & Interpretation
1) Variables Entered/Removed: lists the dependent variable along with the independent variables that were entered
2) Model Summary: R Square gives the percentage of the variability of the outcome that is explained by the independent variables. In this example, R Square = 0.194, which means that all the independent variables combined explain 19.4% of the variability of the dependent variable (Satisfaction)
3) ANOVA: determines whether this model is a good fit to predict the outcome. A p-value < 0.05 is essential. If not, multiple linear regression can't be done
4) Coefficients: if the regression model was shown to be a good fit, this table makes it possible to generate a regression equation. Only variables with a p-value <0.05 are included in the equation:
Y = B0 + B1*X1 + B2*X2 + …

In this example:
- Gender, Health evaluation and Depressive symptoms are the only variables included in the equation, as their p-value is < 0.05
Equation: Memory_Satisfaction = 48.63 - 4.244*Gender + 2.714*Health_Evaluation - 4.451*Depressive_Symptoms

Report Writing
How to present data results?

What is a Report? A report:
• is a structured document that presents information, findings, or results in a clear and organized manner.
• typically includes an introduction, a main body with details or analysis, and a conclusion.
• is used to communicate data, research, analysis, or recommendations to inform decision-making or provide insights on a specific topic.

Organizing and presenting statistical results
The statistical results (numbers) should be presented with appropriate visual aids.
• Histogram: continuous variable
• Bar chart: categorical variable (vertical or horizontal)
• Pie chart: categorical variable

DO NOT "COPY-PASTE" tables and charts DIRECTLY from the output of SPSS.

Parts of a Report:
1. Introduction
• Background for the topic: definitions, statistics, scale overview, etc.
• Objective: the specific aim of the report (it can often be known from the title of the scale used).
2. Methodology = Materials and Methods
Case of a SCALE:
• Identify the scale if used.
• How the scale was generated (details like how we share the scale and organize it).
• General interpretation of the total score: how can we know the result of the data analysis?
In any other general study:
• Ethical consideration/approval.
• Target population (common with the scale case).
• Variables (conditions, exposures), outcomes and their types (categorical or continuous).
• Statistical tests conducted.
• Significance level of the P-value (0.05).
3. a. Results:
• Mention the total sample size (N).
• Report the results of the main variables (variables of interest).
• Report the results of the outcomes. Use graphs, charts and tables.
• Write a title for each figure (below it) and each table (above it).
• Figures and tables must be numbered in order.
• Write a simple and direct interpretation (significant difference or interpretation of values) of the most important results in each visual aid.
• Case of total and individual scores (scales): split the results into 2 parts.
3. b. What numbers must be reported in the tables?
Descriptive Statistics:
• Categorical Variables: Frequency (n) and percentage (%).
• Continuous Variables: mean and Standard Deviation (SD).
Statistical Tests:
• Two-Sample T-Test: mean and SD for the 2 groups, mean difference, 95% Confidence Interval (CI), and P-value.
• One Way ANOVA Test: frequency (n), means, SD and P-value.
• Pearson & Spearman Correlation: correlation coefficient and P-value.
• Two-way Chi-Square Test: frequency (n), percentage (%) and P-value.
• Simple & Multiple linear regression: B coefficient, P-value, 95% CI and adjusted R square.
Scale:
• Total score
• Individual scores for each item/question of the scale.

Scale
Total score:
• Total score of the scale (overall)
• Total score of the scale in each academic year
Domains:
• Total score of each domain
• Total score of each subdomain in each academic year
Visual aids: tables, bar graphs, dot plots

[Example tables: categorical variable reporting (N=141) and continuous variable reporting (Table 3, N=141)]

4. Conclusion
• General conclusion: can be recognized from the total score in the case of a scale.
• Strengths of the results.
• Limitations of the results (or data processing).
• Recommendations based on the limitations.

5. Tips for Reporting

THANK YOU
If you have any questions, do not hesitate to contact us!
razanbyoussef@gmail.com
daniel.bounajm496@gmail.com
malakbourji060@gmail.com