# Statistical Thinking as a Volunteer Teacher (2)

I wanted to know how the grading method of each test, given to each of the three classes, influenced the class average score for that test.

The first step was to create a suitable measure of the grading model used for each test. I conceived of two complementary measures:

- **The Question Count (QC) measure**, which counts the total number of test question ‘demands’. For instance, if two questions together ask the student to outline three different things, the QC is 3. Likewise, if a question asks the student to give and explain one component of culture, the QC is 2 (one for giving the component, and another for explaining it). The reasoning is that the greater the QC, the greater the demand on the student’s memorization ability.
- **The Question Value Concentration (QVC) measure**, which measures the extent to which the grade value of each question (in QC terms) is dispersed or concentrated. This is measured as the Coefficient of Variation (CoV) of the grade value of each question. The reasoning is that when marks are concentrated within one or a few questions, a student can pass more easily by focusing on those ‘priority’ questions.

Hence, the combination of a low QC and a high QVC (i.e. a higher CoV) would bode best for the student in terms of ease of passing the test.

The interaction variable I thought would be highly important is based on class average memorization ability (CAMA), where memorization ability is the degree to which a student can memorize and remember things with ease. I measured this by taking random samples out of each class (drawn using EXCEL’s random number generation operation) and giving each sampled student an indicative test. This test had ten items (selected carefully by me; a mix of mono- and disyllabic words), which I listed to the participant, who was to repeat them as soon as I had finished listing them. A student’s memorization ability (MA) is simply the percentage of items remembered out of all the items I listed, and CAMA is the class average of these MA scores. The interaction variable is simply QC divided by CAMA, a measure of memorization burden. The greater QC/CAMA is, the greater the burden of memorization on the student relative to their memorization ability.
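The measures described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are my own, the numbers are made up rather than taken from my tests, and I assume the CoV uses the sample standard deviation.

```python
from statistics import mean, stdev


def question_count(demands_per_question):
    """QC: total number of distinct 'demands' across all questions."""
    return sum(demands_per_question)


def question_value_concentration(marks_per_demand):
    """QVC: coefficient of variation of the marks.

    Assumption: sample standard deviation divided by the mean.
    """
    return stdev(marks_per_demand) / mean(marks_per_demand)


def memorization_ability(items_recalled, items_listed=10):
    """MA: percentage of the listed items a student repeated back."""
    return 100 * items_recalled / items_listed


def memorization_burden(qc, cama):
    """Interaction variable: QC relative to class average MA."""
    return qc / cama


# Hypothetical test: three questions carrying 2, 1 and 2 demands.
qc = question_count([2, 1, 2])

# Hypothetical marks attached to the questions.
qvc = question_value_concentration([4, 2, 4])

# Hypothetical recall counts (out of ten) for four sampled students.
cama = mean([memorization_ability(r) for r in [5, 6, 7, 4]])

burden = memorization_burden(qc, cama)
print(qc, round(qvc, 3), cama, round(burden, 3))
```

A higher `burden` means the test asks for more relative to what the class can comfortably memorize.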

Because this was a comparison of class averages, there could only be six data points (the JSS1, JSS2 and JSS3 class average scores for each of the 1st CA test and the 2nd CA test). Hence, regression analysis could not be used, and I had to rely solely on simple correlation analysis.

The resulting correlations showed that the signs of all the Pearson coefficients matched the theoretical expectations:

Hypothesis 1: The greater the test QC, the lower the expected average score (i.e. a negative relationship). Evidence: The correlation between QC and average score is -0.29 and is statistically significant.

Hypothesis 2: The lower the CoV, the lower the expected average score (i.e. a positive relationship). Evidence: The correlation between CoV and average score is 0.36.

Hypothesis 3: The greater QC/CAMA, the lower the expected average score (i.e. a negative relationship). Evidence: The correlation between QC/CAMA and average score is -0.31.

Interestingly, the interaction variable with the largest correlation coefficient in absolute terms is (QC/CAMA)/CoV, at -0.48. This suggests that a greater question count relative to memorization ability may lead to a lower expected score, and more so when the dispersion of question marks is higher (i.e. concentration is lower).
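For readers who want to reproduce this kind of analysis, here is a minimal sketch of a Pearson coefficient and the usual t-statistic attached to it. It rests on assumptions: the helper names are my own, and the t-test shown is the standard textbook one for a correlation, which may not be the exact procedure behind the figures above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t-statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# Two-sided 5% critical value of Student's t with 4 degrees of freedom (n = 6).
T_CRIT_DF4 = 2.776

# The coefficients reported in this post, each based on six data points.
reported = {
    "QC": -0.29,
    "CoV": 0.36,
    "QC/CAMA": -0.31,
    "(QC/CAMA)/CoV": -0.48,
}
t_values = {name: t_statistic(r, 6) for name, r in reported.items()}

for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}, |t| > {T_CRIT_DF4}: {abs(t) > T_CRIT_DF4}")
```

With six points, a coefficient needs |t| above roughly 2.776 (equivalently, |r| above about 0.81) to pass a two-sided 5% test of this kind, which is a useful yardstick when reading coefficients from so small a sample.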

The systematic correspondence of the coefficient signs with their theoretical expectations supports the view that the results are not spurious, and the statistical significance supports treating the results as preliminary but reliable information, in the absence of a more comprehensive analysis based on a larger data set.

The practical implication is that I have to consider more carefully the technicalities of setting test questions and the individual value of each question. There is significant cross-class variation in memorization ability, which interacts with my grading system and may become an important determinant of cross-class variation in test performance, thereby imposing exogenous constraints on students. In fact, the class averages show that CAMA rises with class level, and intuitively at a decreasing rate (JSS1 = 52, JSS2 = 66, JSS3 = 68, i.e. gains of 14 and then 2 points). However, t-tests show ambiguous evidence for the statistical significance of these differences, while F-tests for differences between variances strongly indicate no difference in MA variance between any pair of classes.
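The two kinds of test mentioned above can be sketched as follows. This is an illustration under assumptions: I use Welch's form of the t-statistic (one of several variants), the function names are my own, and the MA scores are made up, chosen only so that their means equal the reported 52 and 66.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for the difference between two sample means."""
    se_a = variance(a) / len(a)
    se_b = variance(b) / len(b)
    return (mean(a) - mean(b)) / sqrt(se_a + se_b)


def variance_ratio_f(a, b):
    """F-statistic for equality of variances (larger variance on top)."""
    var_a, var_b = variance(a), variance(b)
    return max(var_a, var_b) / min(var_a, var_b)


# Made-up MA scores (percentages) for illustration only.
jss1_ma = [40, 50, 60, 50, 60]  # mean 52
jss2_ma = [60, 70, 60, 70, 70]  # mean 66

t_stat = welch_t(jss1_ma, jss2_ma)
f_stat = variance_ratio_f(jss1_ma, jss2_ma)
print(round(t_stat, 2), round(f_stat, 2))
```

Each statistic would then be compared against the critical value of the relevant t or F distribution for the sample sizes involved.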

Additionally, I found a high correlation (0.79) between the standard deviation of test scores within each class and the standard deviation of MA within each class, although the number of observations is very small (n = 3). The results of the previous correlation analysis, as well as insights from the analysis recorded in my last blog post, reinforce this finding. Together they imply that I really need to pay attention to the variance of students’ ability within each class by adopting more diverse and inclusive teaching styles and techniques. I must also recognize cross-class variation and adopt a more inclusive grading system, one that takes into account cross-class variation in average abilities as well as within-class variance in testing abilities.