Data Screening: Between Groups Output Assignment

Points: 50

Data preparation for any analysis begins with data screening and tests of assumptions. Data screening ensures the data are correct, and the tests of assumptions ensure the data are suited to the type of analysis to be conducted. Module 3 has two assignments: this one is to conduct data screening for an analysis of group differences, and the other is to conduct the analysis of group differences itself. This post will guide you through the data screening for a between-groups test that uses an independent-measures t-test.

As a reminder, an independent-measures t-test tests for a difference in a continuous dependent variable (i.e., interval or ratio level of measurement) between two, and only two, groups. For this example, the independent-measures t-test assesses the following research question and hypotheses:

RQ: To what extent, if any, is there a difference of Neuroticism between genders?

H0: There is not a significant difference of Neuroticism between genders.
H1: There is a significant difference of Neuroticism between genders.

Based upon the research question and hypotheses, there are two variables for this assignment. The first is the independent variable (IV) gender, a categorical variable. Because we will be testing between two groups (males and females), the variable can also be called a dichotomous variable, which simply means a categorical variable with two and only two groups. The second is the dependent variable (DV) Neuroticism, as measured by the IPIP-50, a continuous variable with an interval level of measurement.

Prior to beginning the tests, you may consider splitting the file so you are viewing only the classes (e.g., male/female, White/African American, etc.) in which you are interested. Here is a brief video demonstrating how to successfully split files in SPSS.

To guide us through the data screening and testing of assumptions, which I will show concurrently, I will use the following assumptions (Laerd, 2021):

1. You have one dependent variable that is measured at the continuous level (i.e., the interval or ratio level).
2. You have one independent variable that consists of two categorical, independent groups (i.e., a dichotomous variable).
3. You have independence of observations.
4. There are no significant outliers in the two groups of your independent variable in terms of the dependent variable.
5. Your dependent variable is approximately normally distributed for each group of the independent variable.
6. You have homogeneity of variances.

Data Screening: Frequency tables

Frequency tables are used to see the approximate distribution of the variables. The frequency tables will show you how many males and females are in the data set for the IV gender, and an approximate distribution of the DV Neuroticism.
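If you would like to check the SPSS counts by hand, the idea behind a frequency table can be sketched in a few lines of Python. This is only an illustration: the `frequency_table` function and the eight gender values are made up for the example and are not part of the course dataset or of SPSS.

```python
from collections import Counter

# Hypothetical gender values for eight participants (not the course dataset).
gender = ["Male", "Female", "Male", "Male", "Female", "Male", "Female", "Male"]


def frequency_table(values):
    """Return (category, count, percent) rows, most frequent category first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]


table = frequency_table(gender)
for category, n, percent in table:
    # One row per group: label, frequency, and percent of the sample.
    print(f"{category:<8}{n:>4}{percent:>8.1f}%")
```

The same counting applies to the DV, where each distinct Neuroticism score becomes a row in the table.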

With the frequency tables completed, the remaining data screening tasks are included within the tests of assumptions.

You have one dependent variable that is measured at the continuous level (i.e., the interval or ratio level).

As described in the hypothesis and variable definition, the analysis is a test of whether there is a difference of Neuroticism, the dependent variable, between groups of gender. By research design, the dependent variable is Neuroticism, which is measured by the IPIP-50 with an interval level of measure, which is continuous. Therefore, Assumption 1 is satisfied.

You have one independent variable that consists of two categorical, independent groups (i.e., a dichotomous variable).

As described in the hypothesis and variable definition, the test of group differences will be between groups of gender, which are male and female. By research design, the independent variable is gender, a dichotomous variable. Therefore, Assumption 2 is satisfied.

You have independence of observations.

By research design, the two groups are unique and separate. As such, members from one group (e.g., males) cannot contribute data to the other group (e.g., females). By research design, Assumption 3 is met.

There are no significant outliers in the two groups of your independent variable in terms of the dependent variable.

Outliers are tested with boxplots. In SPSS, an outlier (a value more than 1.5 times the interquartile range, or IQR, beyond the edge of the box) is displayed as a circle, and an extreme outlier (more than 3 times the IQR beyond the edge of the box) is displayed as an asterisk. To test this assumption, create a boxplot of the DV (Neuroticism) for each of the IV groups (male and female).
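The 1.5 IQR and 3 IQR rule can be sketched in Python as follows. The scores and the `classify_outliers` function are hypothetical, and the quartiles here come from Python's `statistics.quantiles`, which may not match the quartile method SPSS uses for its boxplots, so values close to a boundary could be classified differently.

```python
from statistics import quantiles

# Hypothetical Neuroticism scores for one gender group (not the course dataset).
scores = [18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 41, 60]


def classify_outliers(values):
    """Label each score 'ok', 'outlier' (> 1.5 IQR past the box), or 'extreme' (> 3 IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    labelled = []
    for x in values:
        # Distance beyond the nearer edge of the box (0 if the score is inside it).
        distance = max(q1 - x, x - q3, 0)
        if distance > 3 * iqr:
            labelled.append((x, "extreme"))   # shown as an asterisk
        elif distance > 1.5 * iqr:
            labelled.append((x, "outlier"))   # shown as a circle
        else:
            labelled.append((x, "ok"))
    return labelled


labels = classify_outliers(scores)
for score, label in labels:
    if label != "ok":
        print(score, label)
```

For the assignment, you would run this check separately for each group, just as the boxplot is drawn separately for males and females.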

Your dependent variable is approximately normally distributed for each group of the independent variable.

Normality may be tested using a variety of approaches, including the Shapiro-Wilk test, which is a statistical test, or the distribution may be assessed visually, such as with a histogram or a normal Q-Q plot. The video will demonstrate all three approaches.
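To see what a normal Q-Q plot is doing, here is a rough Python sketch that pairs each sorted score with the value a normal distribution would expect at that rank, then summarizes how close the points are to a straight line. It is not the Shapiro-Wilk test, the scores are hypothetical, and the plotting position `(i - 0.5) / n` is one common choice that may differ from the formula SPSS uses.

```python
from math import sqrt
from statistics import NormalDist, mean


def qq_points(values):
    """Pair each sorted score with the standard-normal quantile expected at its rank."""
    ordered = sorted(values)
    n = len(ordered)
    # (i - 0.5) / n is an assumed plotting position; other formulas exist.
    expected = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, ordered))


def pearson_r(pairs):
    """Correlation between expected and observed values; near 1 means a straight line."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores: one roughly symmetric group, one with a long right tail.
symmetric = [18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 32]
skewed = [18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 41, 60]

points = qq_points(symmetric)
r_symmetric = pearson_r(points)
r_skewed = pearson_r(qq_points(skewed))
print(f"symmetric r = {r_symmetric:.3f}, skewed r = {r_skewed:.3f}")
```

The group with the long right tail produces points that bend away from the line, which shows up as a lower correlation. In SPSS you would judge this by eye from the plot itself.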

You have homogeneity of variances.

Homogeneity of variances means the variance of the dependent variable is approximately the same in each group. It is tested with Levene’s test, which is part of the independent-measures t-test output. Levene’s test has a null hypothesis of equal variances between groups, so a result that is not statistically significant (p ≥ .05) indicates homogeneity of variances, meeting Assumption 6.
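For those curious about what Levene’s test computes, the statistic can be sketched in Python as a one-way ANOVA on each score's absolute distance from its own group mean. The `levene_w` function and both groups of scores are hypothetical, and centering on the group mean (rather than the median) is an assumption about the variant SPSS reports. The sketch stops at the test statistic; the p value comes from comparing it with an F distribution, which SPSS does for you.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic using absolute deviations from each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Step 1: replace every score with its absolute distance from its group mean.
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    group_means = [mean(zs) for zs in deviations]
    grand_mean = mean([z for zs in deviations for z in zs])
    # Step 2: compute the one-way ANOVA F ratio on those distances.
    between = sum(len(zs) * (m - grand_mean) ** 2
                  for zs, m in zip(deviations, group_means))
    within = sum((z - m) ** 2
                 for zs, m in zip(deviations, group_means) for z in zs)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical scores: same mean in both groups, but very different spread.
males = [20, 22, 24, 26, 28]
females = [10, 17, 24, 31, 38]

w = levene_w(males, females)
print(f"Levene's W = {w:.2f} with df = (1, {len(males) + len(females) - 2})")
```

A larger W means the groups differ more in spread. Two groups with identical spread give W = 0, regardless of how far apart their means are.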

The results of Levene’s test appear in the first two columns of the results shown below (Figure 1). In this example, the significance is p < .001, so the result is statistically significant and the null hypothesis of Levene’s test is rejected. These results indicate that equal variances cannot be assumed.

Figure 1

SPSS Results for an Independent-Samples t-Test

The following video describes the process of data screening and tests of assumptions 1-5. The test for assumption 6 is shown as part of the Group Differences: Results Assignment.

Writing up the assignment

Review the assignment instructions. You are to identify the two variables you will use in the assignment. These variables must come from the dataset in Week 2 Data Screening Assignment as you described in the Week 2 Quiz: Pick Topic assignment. One must be categorical, the other continuous (quantitative). For this example, the variables are stated below.

Independent variable: Gender (Male/Female), a dichotomous (categorical) level of measurement

Dependent variable: Neuroticism, as measured by IPIP-50, an interval level of measurement

Writing up the narrative should begin with an introduction.

Data screening was accomplished for the variables of gender and Neuroticism from the EDCO 745 course dataset to test the null hypothesis that there is not a significant difference of Neuroticism between genders.

The next sentences will describe the data screening and tests of assumptions and any notable results from each.

Frequency tables were created for gender and IPIP-Neuroticism (see Tables 1-2). Results of the frequency tables indicated slightly more male (n = 704) than female (n = 596) participants (Table 1). Tests of assumptions were completed for an independent-measures t-test. Assumption 1, which states that the dependent variable must be a continuous variable, was satisfied by research design, as Neuroticism is an interval level of measurement, which is continuous. By research design, the independent variable is gender, which has two groups (male and female) and is a dichotomous variable. Therefore, Assumption 2, which is that the independent variable must be categorical with only two groups, is satisfied. By research design, the two groups are unique and separate. As such, members from one group (e.g., males) cannot contribute data to the other group (e.g., females). By research design, Assumption 3 is met. Assumption 4 is that no significant outliers may be present, which was tested by boxplots of Neuroticism between the two groups of gender. Outliers were present; however, they were not removed. Assumption 5, which is that the dependent variable must be approximately normally distributed for each group of the independent variable, was tested visually using histograms of Neuroticism for each gender group. Neuroticism was normally distributed across both groups. Assumption 6 is that the groups must show homogeneity of variances, which was tested using Levene’s test. The results of Levene’s test were statistically significant; therefore, equal variances cannot be assumed. To accommodate the violation of this last assumption, the equal variances not assumed row of the independent-measures t-test (Welch’s t-test) will be used to test for differences.

Note that within the write-up, each statement references the table or figure that supports it. This also requires one to correctly label each of the tables and figures prior to submitting the assignment. Also, please note that in the sample write-up the tables are formatted according to APA style; they are not copied directly from SPSS, because SPSS output is not in APA format.

Submitting the assignment
When submitting this assignment, you must first describe the data screening assignment. The write-up should be descriptive of the variables and the activities used to screen the data, along with a description of the results. All submissions must be a single Microsoft Word document. Do not submit the SPSS file.