
Running head: STUDENTS’ RATINGS AND INSTRUCTIONAL QUALITY

Students’ Ratings of Teachers in Higher Education: A Reliable Measurement of Instructional Quality?

Fadi El Kallab

American University of Beirut



INTRODUCTION

At a conference I attended last week at the American University of Beirut, entitled “Implications of Widening Access to Higher Education in the Arab World: Implementing a Holistic Lifelong Learning Policy,” the speaker noted that the higher education sector in the region is still characterized by discrepancies between countries at several levels, namely student enrollment, admission criteria, learning strategies and program outcome expectations. The speaker concluded that much is still needed in higher education to ensure equitable access and to widen enrollment opportunities.

Despite the many challenges, there is no doubt that the higher education sector in the Arab world is growing and that universities will achieve higher enrollment figures in the future, owing to increasing awareness of the importance of attending college and of its impact on opportunities in the labor market. Consequently, universities will have to provide excellence in teaching and foster a learning atmosphere that allows for students’ personal and professional fulfillment and equips them with the knowledge the labor market requires. From this perspective stems the importance of designing and implementing innovative curricula, part of which is the constant review and evaluation of programs, namely the evaluation of the quality of the delivered instruction.


Traditionally, students’ evaluations of teachers in higher education have been used as an essential tool for measuring instructional quality, through ratings of the course and the instructor’s performance. These evaluations have also played an important role in career-related decisions, where administrators have relied on them for faculty salary, tenure and promotion. The aim of this paper is to examine the reliability and validity of such a measurement by reviewing some of the factors that, according to the literature on the topic, call this reliability into question. Indeed, many researchers have argued that these evaluations are subject to numerous biases and that using them as the basis of faculty career-related decisions can be unjust, as they do not reliably mirror instructional quality. As someone involved in university teaching, I found this topic particularly significant, as it gave me better insight into the driving factors behind such ratings.

Historically, students’ evaluations of teaching date back to the early twentieth century, when researchers started to examine whether they corresponded to those of peers and alumni (Wachtel, 1998). In that review, the writer identifies four stages that characterize their use over the course of the century. According to the author, from 1927 to 1960 they were studied primarily by researchers at Purdue University, and in the 1960s students’ participation in them was optional. The 1970s marked their endorsement by researchers as a worthy and valid tool for measuring instructional quality, and from the 1980s to the end of the twentieth century research on the topic developed considerably, confirming the validity of such evaluations and addressing methodological concerns. Interestingly, the author points out the rich and contradictory argumentation in the literature for and against their use, expressing mainly teachers’ doubts and reservations about their validity, and details the various variables that are known to have interfered with it.

In what follows, a selection of four empirical studies will highlight some of the factors that are known to influence students’ ratings of teachers, thereby biasing the results and challenging their validity. Each study reaches conclusions that either confirm or reject the hypotheses formulated by its authors at the outset. However, the findings, as the writers themselves suggest, may not be generalizable, owing to constraints inherent in the experimental conditions or, as evaluated below, to methodological and interpretational flaws.

A REVIEW OF FOUR STUDIES

In the first study, Ambady and Rosenthal (1993) attempt to weigh the impact of teachers’ nonverbal behavior on students’ end-of-semester ratings. Their method consisted of showing short excerpts of silent videos of instructors to student judges and asking them to rate the instructors on different behavioral aspects. These ratings would then be compared with the actual end-of-semester students’ ratings to analyze the degree to which they can predict them. To reach significant conclusions, the authors adopt a structured methodological approach. They start by examining whether people can form consensual judgments of strangers just by watching short silent video clips of them; the influence of physical attractiveness and nonverbal behavior is examined to assess their impact on such judgments. They then evaluate these judgments to determine whether they can predict an external variable, in this case students’ ratings of teachers, a criterion whose external validity they discuss in order to establish that it is a worthy measure of comparison.

The writers start by restating the findings of previous studies that examined the accuracy of strangers’ judgments of character traits. These studies demonstrated that a high degree of consensus in judgments of strangers can be reached, especially when compared with people’s self-reported traits. However, previous studies used mostly videos with sound, and the authors believe that sound is a component that can interfere with forming a valid judgment of the effect of nonverbal behavior and physical attractiveness, which may be skewed by it. These conclusions formed a valid starting point for the writers’ study. According to them, a worthy avenue of exploration that extends previous research is to form judgments by rating behavior in silent videos and then compare these judgments with a new variable: students’ ratings. The argument provided to underline the validity of this new variable is that such ratings are consistent over time and have real-life significance, as they are used in promotion, tenure and salary-increase decisions.

The college teachers who took part in the research were six women and seven men at the graduate level, teaching classes of undergraduate students at a teaching laboratory center where their performance was observed by members of that center. The classes had already been videotaped. To vary the sample of teachers, the authors chose instructors who had received different observations and ratings from members of the teaching center. The sections covered were diverse, including humanities, languages, and social and natural sciences, and contained between 8 and 20 students. For each teacher, a total of 30 seconds of silent video was prepared: 10 seconds from each of the beginning, middle and end of the class.

The ratings were divided into two categories: ratings of “molar nonverbal behaviors,” that is, overall personality traits (attentive, competent, active…), and ratings of “molecular nonverbal behaviors,” that is, behavioral gestures at the micro level such as head nods, head shakes, smiles and frowns. The raters of each category were different, but all were female: nine paid undergraduate students rated the teachers individually on the first category and two raters on the second. The authors justify the choice of female raters on the basis of a study that concluded that women are better decoders of nonverbal behavior than men. Furthermore, to measure the effect of physical attractiveness and to determine whether correlations between judges’ ratings and students’ ratings might be credited to it, two female judges were asked to rate the teachers’ physical attractiveness from a still video frame.



The results of the research were interesting. Out of the fifteen observed molar nonverbal behaviors, ten correlated significantly with the students’ evaluations of teachers: teachers who received higher end-of-semester evaluations were judged as more active (0.77), enthusiastic (0.76), confident (0.82), dominant (0.79) and optimistic (0.84), with these correlations reported as significant (p < .005). The other dimensions, which did not reach statistical significance, also correlated with the students’ ratings criterion. At the molecular level, the results were examined to determine their influence on the ratings of molar behavior and on students’ ratings of teachers. They show that frowning (-0.61) had a significant negative influence on global molar ratings, and that fidgeting also correlated negatively with teacher effectiveness ratings. Some molecular behaviors, however, were slightly positively related to students’ ratings (laughing, for example, 0.34). As for physical attractiveness, the results show that it had a limited effect on students’ ratings.
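
To make the correlational analysis just described more concrete, the following minimal sketch (written in Python, with invented numbers rather than Ambady and Rosenthal’s actual data) shows how judges’ composite ratings of the silent clips could be correlated with end-of-semester student ratings using the Pearson coefficient reported in the study.

# A minimal sketch of the thin-slice correlation analysis; the values below
# are invented for illustration and are NOT the study's data (one value per
# teacher, thirteen teachers as in the study).
from scipy.stats import pearsonr

# Composite "molar" ratings from judges who saw only short silent clips
judge_ratings = [6.1, 4.8, 7.2, 5.5, 6.8, 4.2, 5.9, 7.0, 4.9, 6.3, 5.1, 6.7, 5.4]
# End-of-semester student ratings for the same thirteen teachers
student_ratings = [4.3, 3.1, 4.6, 3.8, 4.5, 2.9, 3.9, 4.7, 3.3, 4.2, 3.5, 4.4, 3.6]

r, p = pearsonr(judge_ratings, student_ratings)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")  # a large positive r would mirror the 0.7-0.8 range reported above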

Overall, the results show that strangers’ ratings of teachers aligned significantly with the ratings of students who had spent an entire semester with those teachers, a finding that underlines the importance of nonverbal behavior as a carrier of influential messages in an educational setting. These results remained valid even after accounting statistically for the “physical attractiveness” dimension.

In an attempt to analyze further the impact of nonverbal behavior on students’ ratings of teachers, the authors reduced the length of each of the previous video clips from 10 seconds to two shorter versions of five and two seconds. The shorter videos were found to correlate positively with students’ ratings, and as accurately as the longer 10-second clips. Here too, according to the authors and contrary to the findings of other research, there was no significant relation between physical attractiveness and students’ ratings of teachers.



Although the writers admit that the results of the present research may not generalize to everyday-life situations, given the specific academic setting in which the study was conducted, the study shows that, under specific conditions, strangers’ judgments can be accurate when based on behavioral observations. The results also show that students’ ratings of teachers are not necessarily based on a teacher’s academic competencies but on “affective” nonverbal behavior communicated to students: teachers who are better communicators of nonverbal information may be rated highly by students.

EVALUATION OF THE STUDY: A simple search on the internet shows that both writers are notable academic authorities on the subject discussed. Nalini Ambady, who died at the age of 54 from leukemia, was a professor in the departments of psychology at Harvard, Tufts and Stanford, and an expert on nonverbal behavior and interpersonal perception. Robert Rosenthal, also a professor of psychology at Harvard and the University of California, specializes in nonverbal communication as well. The abundant existing literature on nonverbal communication assesses its impact in contexts ranging from the personal to the professional; this study complements the few that have attempted to measure its impact in an academic context. Although what precedes underlines the seriousness of this work, in my opinion there are some factors that may call the resulting findings into question. The writers treat the students’ ratings variable as an externally valid criterion for comparing and confirming the influence of nonverbal behavior on judgments. Yet while students’ ratings have gained increasing importance, especially in academia, a great deal of literature suggests their vulnerability by underlining the influence of many factors on their results. Class size (Bedard & Kuhn, 2008) and grading leniency and inflation (Greenwald & Gillmore, 1997) are some of the variables that have been found to weigh on these results. As for the methodology, in my opinion, some gaps may further call the findings into question. The writers claim that the choice of female judges is justified by the fact that women are considered better decoders of nonverbal behavior. However, Canary and Babin-Gallagher (2015) claim that certain dimensions of nonverbal behavior are better deciphered by men than by women. Furthermore, I think that nonverbal behavior can only be understood contextually, from both males and females. Using silent clips that are edited randomly and extracted from their ambient context may convey inaccurate behavioral messages that distort the experimental conditions.

Perrett (2013) examines another factor thought to interfere with the validity of students’ ratings of teachers in higher education. Through an empirical study, he explores and compares two different environments in which students’ ratings can take place, the online administration of ratings as opposed to the on-paper one, along with the concerns and difficulties related to the online environment.

The writer starts by stating the findings and conclusions of previous research papers that examined the challenges of administering students’ ratings online compared with in class. According to the author, some of these studies suggest that online administration resulted in significantly lower response rates and required better preparation of students and staff for completing the procedure. Conversely, other research showed that online administration with adequate logistical preparation can yield good response rates, with noticeably higher participation among students with stronger academic records, thus challenging the common suspicion among teachers that online ratings attract greater participation from students who received low grades and want to get back at their teacher.

In his study, the author proposes to use the class as the unit of analysis, contrary to previous studies in which individual students constituted the reference unit; a teacher with poor academic performance would be viewed as such by almost all of his or her students. Using the class mean of students’ responses and the appropriate statistical methods allows this approach to be implemented successfully. The author formulates closed-ended questions on a seven-point scale covering different evaluation aspects of the course and the instructor. He addresses three concerns to be analyzed: (1) the influence of online versus on-paper administration on response rates; (2) the concern that students with low attendance will distort the course and instructor ratings in an online evaluation context, since their opinion will be an uninformed one; and (3) the concern that undergraduate students would give the same answer to most questions, not taking enough time to understand each question and instead forming a general impression, as opposed to graduate students, who would give more varied responses.

To address these issues, the writer formulated a set of hypotheses contrasting the online and class-administered environments, to which statistical models were applied to test their validity and significance. These are: (1) course and instructor ratings are lower in an online environment; (2) response rates are lower as well in the same environment; (3) attendance is a determining factor of course and instructor ratings; and (4) undergraduate students’ answers show less variation than graduate students’ answers.

The study was conducted at a large university during the fall semester of 2009 and covered only statistics courses. In multiple-section courses, half of the sections were randomly assigned the online survey and the other half the paper-administered one, while in single-section courses the choice of survey type was left to the instructor. In the end, thirty-six undergraduate sections and twenty-five graduate ones were administered the online survey, whereas nine undergraduate sections and eleven graduate ones were given the paper survey.

The results show, first, that the mean course ratings from the online and paper surveys were not statistically different (online undergraduate course rating 3.7 versus 3.5 on paper; online graduate rating 4.5 versus 4.3 on paper), underlining the fact that online administration does not distort such ratings. Undergraduate ratings averaged lower than graduate ones because, at the graduate level, the rated courses could be part of a major program. Concerning response rates, both contexts displayed similar results, contradicting the belief that online administration attracts lower student participation (online undergraduate 71.1% versus paper 68.8%; online graduate 60.4% versus paper 61.8%). As for the effect of attendance on ratings, after examining the answers to a questionnaire item measuring students’ course attendance, the research concluded that no relation exists between attendance and ratings. As to whether students give little thought to the questions and merely record general impressions, and to comparing the answers of graduate and undergraduate students, the results showed that the variability of answers was approximately the same in both groups (variability averaged over course sections was 2.14 for undergraduates and 2.07 for graduates).
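
To illustrate the class-as-unit-of-analysis approach described above, the following is a minimal sketch (in Python, using invented numbers, not Perrett’s data): individual responses are first averaged into a class mean, and the online and paper class means are then compared with a two-sample t-test.

# Minimal sketch: the class mean, not the individual student, is the unit of
# analysis; all numbers are invented for illustration.
from statistics import mean
from scipy.stats import ttest_ind

online_sections = [[3.5, 4.0, 3.0, 4.5], [4.0, 3.5, 3.5], [3.0, 4.0, 4.5, 3.5]]
paper_sections = [[3.5, 3.0, 4.0], [4.0, 4.5, 3.5, 3.0], [3.0, 3.5, 3.5]]

online_means = [mean(s) for s in online_sections]  # one mean per class section
paper_means = [mean(s) for s in paper_sections]

t, p = ttest_ind(online_means, paper_means, equal_var=False)  # Welch's t-test
print(f"online = {mean(online_means):.2f}, paper = {mean(paper_means):.2f}, t = {t:.2f}, p = {p:.3f}")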

According to the author, the conclusions of this study demonstrate that online and paper evaluations do not differ significantly in their impact on students’ ratings, making online surveys as suitable as paper ones in terms of administration. The writer nevertheless notes some limitations to the generalizability of the study, citing the group sizes used in the research and the class size in each group as possible influencing factors.

EVALUATION OF THE STUDY: Jamis Perrett is a professor of Applied Statistics and Research Methods at the University of Northern Colorado. His research interests include statistical computer programming and applied statistics. In my opinion, the background of the researcher clearly focuses the interest of the study on developing the appropriate statistical approach for each of the proposed hypotheses, without fully taking into account the surrounding variables. To point out some of them: the stark difference in group sizes (as indicated by the writer himself) between undergraduates in the two contexts (36 sections online versus 9 sections on paper) and graduates (24 sections online versus 11 sections on paper) seriously compromises the experimental conditions. Furthermore, the class size in each section is not reported, which obscures the possible biasing effect of this mediating variable on the findings (Bedard & Kuhn, 2008). The exclusive use of statistics classes in this study may also call its findings into question, since interest in and liking of the course have been found to have an indirect effect on global ratings of instruction, as will be seen in the results of the next study.

There is no doubt that instructors’ behavioral performance in the classroom weighs greatly on students’ perceived quality of instruction and of the class. In this third empirical study, Greimel-Fuhrmann and Geyer (2003) propose to assess the impact of teaching behavior on end-of-semester students’ ratings. Several mediating biases are explored along the way, namely students’ opinion of the ratings and their consequences, liking of the teacher, and liking of the course itself.

As is customary in empirical studies, the writers start by discussing the previous literature on the topic and underline the fact that there is little consensus about the validity and reliability of students’ ratings, especially since other findings have determined that the ratings are not related to effective student learning. The literature also establishes several biases in these ratings, among them grading leniency and liking of the teacher. According to the authors, their intention in the present study is to assess the impact of interest in the course subject and liking of the teacher, as well as other possible biasing variables such as school climate, on the global ratings of instruction. The study was conducted in Austrian commercial colleges, using accounting classes and the direct instruction model as the basis for the research, with 2,121 participating students.

The writers’ primary objective is to understand students’ view of good instructional quality as well as their opinion of the ratings themselves. Unlike Perrett (2013), they decided at this stage to generate hypotheses addressing these issues by conducting interviews with students, rather than formulating hypotheses and then testing them. In a second stage, a questionnaire was designed to understand the factors that weighed on the ratings of teachers. The complete version of the questionnaire used for the present study contained items covering general evaluations of teachers, items measuring specific aspects of teaching performance, and items measuring possible biasing variables.

The parameters of good teaching derived from these interviews showed that a good teacher, in general, is one who provides understandable explanations supported by examples, fosters a learning environment with concern for students’ learning, and has positive character traits such as humor and friendliness. As for students’ opinion of the usefulness of students’ ratings, one third of those questioned doubted their validity and usefulness and believed that they do not lead to a change in instructors’ behavior.

The data collected from this qualitative stage about instructional behavior were classified according to three factors: (1) subject-oriented instruction, (2) student-oriented instruction and (3) classroom management, each of which was tested to determine whether it correlated with global ratings of teachers. Using bivariate correlation coefficients of teaching behavior with the evaluation of the teacher, the results show that all three factors correlate with the global evaluation of the teacher, with a strong correlation in the case of subject-oriented teaching behavior (0.707, significant at the 0.01 level). High correlations of some biasing factors with students’ ratings of the teacher were also noted: interest in the course (0.507), liking of the teacher (0.687) and students’ belief that the teacher takes the evaluations seriously (0.499), with a lower correlation for general factors such as class well-being (0.154). To measure the simultaneous effect of biasing factors and mediating variables on students’ ratings, the writers developed another statistical model based on exploratory factor analyses. In this new setting, subject-oriented behavior still retained the highest direct effect on students’ ratings, and it was also found to have an indirect effect on ratings through interest in the course subject and liking of the teacher. Student-oriented behavior was found to be highly correlated with liking of the teacher, and its indirect effect on students’ ratings is much higher than its calculated direct effect. Other variables, such as school climate, were found to be insignificantly correlated with students’ ratings.
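
The distinction between direct and indirect effects referred to above can be illustrated with a small sketch. The code below (Python, run on simulated data rather than the authors’ Austrian data set) estimates a simple mediation model in which subject-oriented teaching behavior influences the global rating both directly and indirectly through interest in the course; the indirect effect is approximated, as in standard mediation analysis, by the product of the two path coefficients.

# Simple mediation sketch with simulated (not real) data:
# behavior -> interest -> rating, plus a direct behavior -> rating path.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
behavior = rng.normal(size=n)                                  # subject-oriented behavior
interest = 0.5 * behavior + rng.normal(scale=0.8, size=n)      # mediator: interest in the course
rating = 0.4 * behavior + 0.3 * interest + rng.normal(scale=0.7, size=n)  # global rating

a = sm.OLS(interest, sm.add_constant(behavior)).fit().params[1]            # path a: behavior -> interest
res = sm.OLS(rating, sm.add_constant(np.column_stack([behavior, interest]))).fit()
c_prime, b = res.params[1], res.params[2]                                  # direct path c' and path b

print(f"direct effect c' = {c_prime:.2f}, indirect effect a*b = {a * b:.2f}")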

In conclusion, the study confirms that subject-oriented teaching behavior had the greatest impact on students’ ratings, as did students’ interest in the course. The three main categories of teaching behavior described above were also found to have indirect influences on students’ ratings through mediating variables such as liking of the teacher and interest in the course. It can also be concluded that subject-oriented class instruction is the kind most favored by students, a finding that invites teachers to organize their classes around, among other aspects, clear explanations and effective examples.

EVALUATION OF THE STUDY: Bettina Greimel-Fuhrmann is an assistant professor in the department of Business Education at the Vienna University of Economics whose research covers evaluation and instructional quality, and Aloys Geyer is a professor of Operations Research at the same university interested in methodological issues in empirical research. In my opinion, this interdisciplinary approach to the topic is a welcome one that adds to the objectivity of the methodology. However, some of the research’s conditions may cast doubt on the generalizability of the results. The experiment was conducted in a single-course context (accounting classes), and the results show an indirect effect of course liking on overall students’ ratings, which means that students who do not like accounting classes but are required to take them as part of their program of study will give lower global ratings. As indicated in the study, the sole instructional model examined is direct instruction, which also means that using another instructional strategy might alter the findings. Finally, the research analyzed the factors that may influence global ratings and yielded results without comparing them to an external criterion such as actual students’ ratings, which in my opinion leaves the results at the hypothetical level.

In this fourth study, MacNell, Driscoll and Hunt (2014) return to the ever-present concern of the effect of gender, and the bias it implies, on students’ ratings of their teacher. The recent publication date of this research attests to the timeliness of the topic and underlines the fact that it adds to the doubts about the validity of students’ ratings in higher education. In this study, the authors propose to measure the gender effect using an online class setting as the primary tool of experimentation, in which assistant instructors each teach two groups under different gender identities and the ratings given by students are then interpreted. This environment gives researchers better insight into the gender effect, since the same instructor can be presented to students under different names and photos, thus permitting them to measure the effect of the “perceived gender of the instructor” variable. Although they consider their research a pilot study, the results collected are, in their opinion, significant.

According to the authors, the concept of gender is not linked to the biological connotations of sex (male and female) but rather denotes a social or cultural role and is linked to expectations within those specific categories. They note that Western societies still consider women inferior to men in terms of capacities and ability to achieve, and that a university setting grants no exception in this regard: students have, on various occasions, favored male instructors. In this context, the authors find that previous studies on the same topic presented some methodological gaps. Their findings have consistently shown differences in students’ ratings between male and female teachers, without, however, firmly attributing these differences to gender bias. Furthermore, flaws in the design of those studies may call their results into question. The educational context in which the studies usually take place may carry its own biases: male instructors seem to hold an advantage in higher education institutions when it comes to teaching positions as well as the kinds of classes taught. The online class environment answers these concerns and presents the advantage of isolating the mediating variables of teaching behavior and style that are always present in a face-to-face class context.

In an educational environment, male instructors are generally expected to be professional and objective, whereas female instructors are assumed to be warm and accessible. Students’ ratings center on such expectations and reflect any perceived violations of these traits. The impact of gender on rating differences is further underlined by the fact that male instructors who do not adhere to their expected gender role are not sanctioned by students, whereas female instructors who do not are. The obligations in higher education are more taxing on female than on male instructors: a female instructor will not be taken seriously if, beyond her expected gender role, she does not also assume an attitude of “professionalism and objectivity.” Such gender differences and perceptions manifest themselves particularly in students’ ratings. These hypotheses are important to mention here because, according to the authors, the results collected confirm them.

The experiment was conducted in an introductory anthropology/sociology course, part of a summer program at a large university, built around four discussion groups taught by one male and one female assistant instructor. Group sizes ranged between eight and twelve students. Each assistant was in charge of two groups, revealing his or her actual gender identity in one group and an exchanged identity in the other. Course content was delivered through a learning management system in the form of assigned readings and presentations, and contact with instructors took place through emails or posted comments. The instructors, who created similar biographical information, maintained a close presence in each discussion group, giving the necessary comments and feedback and varying their teaching strategies, agreed on the same grading criteria, and posted grades at the same time. The authors are trying here to minimize the probable mediating effects of teaching style underlined in other studies. At the end of the semester, students were asked to rate the instructors on different aspects, and over 90% of the students submitted their feedback. Instructors were evaluated on twelve aspects covering teaching effectiveness, interpersonal traits, communication skills and overall quality as a teacher. Using appropriate statistical models, the authors confirmed that the questions measure the same underlying variable, the overall “evaluation of the instructor’s teaching.” Evidence of gender bias was then analyzed by comparing students’ mean responses to these questions by actual and by perceived instructor gender.

The data collected revealed interesting information in several regards: there was a significant difference in the ratings between the perceived male and the actual female instructor (obviously the same person), and no significant difference between the actual male and female instructors. Significant differences were also observed when comparing the perceived-male with the perceived-female ratings, the latter being lower on all twelve measured aspects, with statistically significant differences for the enthusiastic, fair, gives praise, professional, prompt and respectful aspects. The mean of students’ ratings by actual and perceived instructor gender shows a significant difference only between the two perceived genders, with the perceived male receiving 4.24 and the perceived female 3.7 on a scale of 5. It is interesting to note that, despite the fact that both instructors made sure to post their feedback at the same time, the perceived female instructor received 3.55 out of 5 on promptness while the perceived male received 4.35. Confirming the hypotheses formulated above, the authors noted as well that the perceived male instructor achieved higher ratings on interpersonal skills, suggesting that females, who are expected to have these skills, are not rewarded when they display them, whereas males, who are associated with different expectations, achieve higher ratings when they do.
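
To make the comparison of mean ratings concrete, here is a minimal sketch (Python, with fabricated example rows rather than MacNell, Driscoll and Hunt’s data) of how students’ overall ratings can be averaged by perceived and by actual instructor gender, the two groupings contrasted in the study.

# Fabricated example ratings (1-5 scale); each row is one student's overall rating.
import pandas as pd

data = pd.DataFrame({
    "actual_gender": ["F", "F", "F", "F", "M", "M", "M", "M"],
    "perceived_gender": ["M", "M", "F", "F", "M", "M", "F", "F"],
    "rating": [4.3, 4.1, 3.6, 3.8, 4.2, 4.4, 3.7, 3.5],
})

# Mean rating by the gender students believed the instructor to be,
# then by the instructor's actual gender, for comparison.
print(data.groupby("perceived_gender")["rating"].mean())
print(data.groupby("actual_gender")["rating"].mean())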

In conclusion, the authors argue that these findings should be taken into account in tenure and promotion decisions concerning women instructors, and they question once again the validity of students’ ratings, despite the weight these carry in instructors’ career-related decisions. Therefore, according to them, the ratings system used for assessing instructional quality should be reevaluated.

EVALUATION OF THE STUDY: The brief biography of the authors on the first page of the article shows that all three are sociology academics. More specifically, one of MacNell’s fields of interest is food access and justice, Driscoll focuses on the environmental impacts of industrial agriculture, and Hunt researches engaging teaching practices in an academic environment. In my opinion, the findings here confirm, as suggested by the authors themselves, previous research findings about the role played by gender in biasing students’ ratings. However, despite the authors’ claim that the findings may be generalizable and have external validity, I would like to point out some gaps in the methodology and in the results as well. To restate a point made earlier, the limited variety of the classes used in the experimental setting is a mediating factor that may itself bias the ratings. In addition, the low number of students per section and the small number of sections work against the research conditions. It would be interesting to study such ratings in the context of Massive Open Online Courses (MOOCs), where per-class enrollment is very high and the courses offered are much more diverse. Another methodological concern may be that the instructors themselves should not have been aware of the experimental conditions, for fear of biasing the results, even unintentionally, when interacting with students. Furthermore, knowing the gender distribution of the students in each section would have been important in this study’s context. As for the results, I think that generalizing gender-bias findings in any research should not stop at the perceived-gender level; it should extend to actual gender as well. In the results of the study, we surprisingly notice that in the actual male and female categories, students’ ratings on most aspects were higher (even if not significantly so) for the actual female.

CONCLUSION

The present paper attempted to assess the worthiness of students’ ratings of teachers in a higher education context. The four studies reviewed here analyzed some of the factors that may bias these ratings and make this measurement unreliable. Ambady and Rosenthal (1993) evaluated the impact of nonverbal behavior and physical attractiveness on these ratings, concluding that movements and facial expressions communicate implicit messages captured by students, regardless of the teachers’ academic competencies. Perrett (2013) examined the administration of ratings in two different environments, online as opposed to in class, and concluded that, contrary to other research findings, both environments yield significantly similar ratings. Greimel-Fuhrmann and Geyer (2003) examined teaching behavior and style and determined that subject-oriented class instruction is the kind preferred by students and the one that guarantees high ratings. Finally, MacNell, Driscoll and Hunt (2014) examined gender bias in students’ ratings and concluded that it undermines their impartiality. As evaluated earlier, the findings of the above studies hold true in specific, narrow contexts that may not be generalizable and are plagued with deficiencies at the methodological and interpretational levels, which leaves the reader with the firm impression that such ratings should not constitute the sole basis for measuring instructional quality.

A personal experience in a higher education context adds to my doubts about the reliability of these ratings. I teach the same general music appreciation class at the Lebanese American University to two consecutive sections on the same days, using the same method of instruction and, naturally, the same teaching behavior and style. At the end of the previous semester, my students were asked to rate my instruction on different aspects. The survey was administered online, response counts were almost equal, and the ratings nonetheless showed some interesting differences (Appendix 1). In the satisfaction-level evaluation of the course, the third item, which assesses how well I manage class discussions, differed by 15.6% between sections. Item eight, which asked students whether I evaluated their performance fairly, also differed by 15.6%. On the last item, evaluating my performance, the ratings differed by 20%, and the overall course evaluation rating differed by 31.2%, a significant difference in my opinion. As for the class workload item, the difference between sections was 15.4%. Naturally, these differences come as no surprise and should be understood through the lens of the numerous biases highlighted in the studies discussed above.

Centra (1973) compared teachers’ self-ratings in college with their students’ ratings and concluded that the two presented notable differences and that teachers tended, in general, to overrate themselves. While the comparison in his study serves only to demonstrate a point, it is the concept of comparison itself that, in my opinion, suggests a suitable solution to the issue investigated in the present paper. If students’ ratings of instructional quality in college have generated a rich literature in which their validity is debated, then using them as the sole input to essential faculty career decisions may disadvantage people with serious academic competencies. An approach that envisages constant comparison and cross-checking of ratings may be the best way to measure instructional quality.

I propose a three-part framework for assessing the quality of instruction properly. As a first step, conducting formative surveys to generate periodic ratings based on course units, rather than summative end-of-semester ratings based on the entire course, may yield more consistent results; the final rating could then be calculated as the mean of these periodic ratings. The rationale is that students who assess teachers periodically have in mind the teacher’s recent class performance and his or her explanations of the current course unit, which allows them to make better judgments on the questioned aspects.

Second, cross-checking a teacher’s ratings against other individuals’ ratings of the same teacher creates a more informed opinion of instructional quality and better justifies any significant career-related decision. In this regard, the course coordinator has an important role to play. The course coordinator is a key figure within departments, with a specific set of responsibilities related to the course, and many colleges assign coordinators important tasks related to students’ learning and course faculty. Although course coordinators may not have employment-decision privileges, their close relation with course students and course faculty enables them to form accurate ideas about course progress, as well as good knowledge of the course faculty, their character traits and their instructional strategies, through occasional attendance of their classes. This second part of the framework therefore consists in asking the course coordinator to complete the same survey as the students, rating the teacher on the same aspects, so that both ratings can later be compared.

Naturally, these two measures will generate constraints related to the additional administrative processes and the larger amount of data to be collected and analyzed. This calls, as the third and completing part of the framework, for the creation in each college of a department of instructional excellence and feedback, whose task is to collect all instruction-relevant feedback (students’ ratings and those of the course coordinator, for example), analyze it, and draw adequate conclusions to be used in building a strategy of quality instruction. Among the responsibilities of this department also falls the task of helping teachers improve their teaching performance by making them aware of the significance of the ratings and by providing training on different instructional aspects such as class management, instructional strategies, efficient interaction with students and nonverbal skills.

The proposed three-part framework offers a possible alternative to the long-debated students’ ratings and a more reliable way of measuring instructional quality. In this way, administrators are given firmer ground to stand on for any faculty career-related decision.
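
As a rough illustration of the first two parts of this framework, the short sketch below (Python, with invented unit ratings and a hypothetical coordinator score) computes a teacher’s final student rating as the mean of periodic, unit-based formative ratings and reports the gap between that aggregate and the course coordinator’s rating on the same scale.

# Sketch of parts one and two of the proposed framework; all values are invented.
from statistics import mean

unit_ratings = {"unit 1": 4.1, "unit 2": 3.8, "unit 3": 4.4, "unit 4": 4.0}  # formative, per course unit
student_rating = mean(unit_ratings.values())  # part 1: final rating = mean of the periodic ratings

coordinator_rating = 3.6  # part 2: hypothetical coordinator rating on the same aspects
gap = student_rating - coordinator_rating     # a large gap flags a case for closer review

print(f"student rating = {student_rating:.2f}, coordinator = {coordinator_rating:.2f}, gap = {gap:.2f}")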


REFERENCES

Ambady, N., & Rosenthal, R. (1993). Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness. Journal of Personality and Social Psychology, 64(3), 431-441.

Bedard, K., & Kuhn, P. (2008). Where class size really matters: Class size and student ratings of instructor effectiveness. Economics of Education Review, 27(3), 253-265.

Canary, D., & Babin-Gallagher, B. (2015). Interpersonal communication, sex and gender differences in. In The concise encyclopedia of communication (1st ed., pp. 298-299). John Wiley & Sons.

Centra, J. (1973). Self-ratings of college teachers: A comparison with student ratings. Journal of Educational Measurement, 10(4), 287-295.

Greenwald, A., & Gillmore, G. (1997). Grading leniency is a removable contaminant of student ratings. American Psychologist, 52(11), 1209-1217.

Greimel-Fuhrmann, B., & Geyer, A. (2003). Students' evaluation of teachers and instructional quality: Analysis of relevant factors based on empirical evaluation research. Assessment & Evaluation in Higher Education, 28(3), 229-238.

MacNell, L., Driscoll, A., & Hunt, A. (2014). What's in a name: Exposing gender bias in student ratings of teaching. Innovative Higher Education.

Perrett, J. (2013). Exploring graduate and undergraduate course evaluations administered on paper and online: A case study. Assessment & Evaluation in Higher Education, 38(1), 85-93.

Wachtel, H. (1998). Student evaluation of college teaching effectiveness: A brief review. Assessment & Evaluation in Higher Education, 23(2), 191-211.

APPENDIX 1: RATINGS OF STUDENTS IN A MUSIC CLASS

1- FALL 2014 – SECTION 13:

Project Audience: 31
Responses Received: 9
Response Ratio: 29.03%

2- FALL 2014 – SECTION 16:

Project Audience: 32
Responses Received: 10
Response Ratio: 31.25%

