
Running head: STUDENTS’ RATINGS AND INSTRUCTIONAL QUALITY

Students’ Ratings of Teachers in Higher Education: A Reliable Measurement of Instructional Quality?

Fadi El Kallab

American University of Beirut



INTRODUCTION

At a conference I attended last week at the American University of Beirut, entitled “Implications of Widening Access to Higher Education in the Arab World: Implementing a Holistic Lifelong Learning Policy,” the speaker noted that the higher education sector in the region is still characterized by discrepancies between countries at several levels, namely student enrollment, admission criteria, learning strategies and program outcome expectations. The speaker concluded that much is still needed in higher education to ensure equitable access and to widen enrollment opportunities.

Despite the many challenges, there is no doubt that the higher education sector in the Arab world is growing and that universities will achieve higher enrollment figures in the future, owing to increasing awareness of the importance of attending college and of its impact on opportunities in the labor market. Consequently, universities will have to provide excellence in teaching and foster a learning atmosphere that allows for students’ personal and professional fulfillment and equips them with the knowledge the labor market requires. From this perspective stems the importance of designing and implementing innovative curricula, part of which is the constant review and evaluation of programs, namely the evaluation of the quality of the delivered instruction.


Traditionally, students’ evaluations of teachers in higher education have been used as an essential tool for measuring instructional quality, through ratings of the course and the instructor’s performance. These evaluations have also played an important role in career-related decisions, where administrators have relied on them for faculty salary, tenure and promotion. The aim of this paper is to examine the reliability and validity of such a measurement by reviewing some of the factors that, according to the literature on the topic, call this reliability into question. Indeed, many researchers have argued that these evaluations are subject to numerous biases and that using them as the basis of faculty career-related decisions can be unjust, as they do not reliably mirror instructional quality. As someone involved in university teaching, I found this topic particularly significant, as it gave me better insight into the driving factors behind such ratings.

Historically, students’ evaluations of teaching date back to the early twentieth century, when researchers started to examine whether they corresponded to those of peers and alumni (Wachtel, 1998). In that review, the writer identifies four stages that characterize their use over the course of the century. According to the author, from 1927 to 1960 they were studied primarily by researchers at Purdue University, and in the 1960s students’ participation in them was optional. The 1970s marked their endorsement by researchers as a worthy and valid tool for measuring instructional quality, and from the 1980s to the end of the twentieth century research on the topic developed considerably, confirming the validity of such evaluations and addressing methodological concerns. Interestingly, the author points out the rich and contradictory argumentation in the literature for and against their use, expressing mainly teachers’ doubts and reservations about their validity, and details the various variables that are known to have interfered with it.

In what follows, a selection of four empirical studies will highlight some of the factors that are known to influence students’ ratings of teachers, thereby biasing the results and challenging their validity. Each study reaches conclusions that either confirm or reject the hypotheses formulated by its authors at the outset. However, the findings, as the writers themselves suggest, may not be generalizable, owing to constraints inherent in the experimental conditions or, as evaluated below, to methodological and interpretational flaws.

A REVIEW OF FOUR STUDIES

In the first study, Ambady and Rosenthal (1993) attempt to weigh the impact of teachers’ nonverbal behavior on students’ end-of-semester ratings. Their method consisted of showing short excerpts of silent videos of instructors to student judges and asking them to rate the instructors on different behavioral aspects. These ratings would then be compared with the actual end-of-semester students’ ratings to analyze the degree to which they can predict them. To reach significant conclusions, the authors adopt a structured methodological approach. They start by examining whether people can form consensual judgments of strangers just by watching short silent video clips of them; the influence of physical attractiveness and nonverbal behavior is examined to assess their impact on such judgments. They then evaluate these judgments to determine whether they can predict an external variable, in this case students’ ratings of teachers, a criterion whose external validity they discuss in order to establish that it is a worthy measure of comparison.

The writers start by restating the findings of previous studies that examined the accuracy of strangers’ judgments of character traits. These studies demonstrated that a high degree of consensus in judgments of strangers can be reached, especially when compared with people’s self-reported traits. However, previous studies used mostly videos with sound, and the authors believe that sound is a component that can interfere with forming a valid judgment of the effect of nonverbal behavior and physical attractiveness, which may be skewed by it. These conclusions formed a valid starting point for the writers’ study. According to them, a worthy avenue of exploration that extends previous research is to form judgments by rating behavior in silent videos and then compare these judgments with a new variable: students’ ratings. The argument provided to underline the validity of this new variable is that such ratings are consistent over time and have real-life significance, as they are used in promotion, tenure and salary-increase decisions.

The college teachers who took part in the research were six women and seven men at the graduate level, teaching classes of undergraduate students at a teaching laboratory center where their performance was observed by members of that center. The classes had already been videotaped. To vary the sample of teachers, the authors chose instructors who had received different observations and ratings from members of the teaching center. The sections covered were diverse, including humanities, languages, and social and natural sciences, and contained between 8 and 20 students. For each teacher, a total of 30 seconds of silent video was prepared: 10 seconds from each of the beginning, middle and end of the class.

The ratings were divided into two categories: ratings of “molar nonverbal behaviors,” that is, overall personality traits (attentive, competent, active…), and ratings of “molecular nonverbal behaviors,” that is, behavioral gestures at the micro level such as head nods, head shakes, smiles and frowns. The raters of each category were different, but all were female: nine paid undergraduate students rated the teachers individually on the first category and two raters on the second. The authors justify the choice of female raters on the basis of a study that concluded that women are better decoders of nonverbal behavior than men. Furthermore, to measure the effect of physical attractiveness and to determine whether correlations between judges’ ratings and students’ ratings might be credited to it, two female judges were asked to rate the teachers’ physical attractiveness from a still video frame.



The results of the research were interesting. Out of the fifteen observed molar nonverbal behaviors, ten correlated significantly with the students’ evaluations of teachers: teachers who received higher end-of-semester evaluations were judged as more active (0.77), enthusiastic (0.76), confident (0.82), dominant (0.79) and optimistic (0.84), with these correlations reported as significant (p < .005). The other dimensions, which did not reach statistical significance, also correlated with the students’ ratings criterion. At the molecular level, the results were examined to determine their influence on the ratings of molar behavior and on students’ ratings of teachers. They show that frowning (-0.61) had a significant negative influence on global molar ratings, and that fidgeting also correlated negatively with teacher effectiveness ratings. Some molecular behaviors, however, were slightly positively related to students’ ratings (laughing, for example, 0.34). As for physical attractiveness, the results show that it had a limited effect on students’ ratings.
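
To make the correlational analysis just described more concrete, the following minimal sketch (written in Python, with invented numbers rather than Ambady and Rosenthal’s actual data) shows how judges’ composite ratings of the silent clips could be correlated with end-of-semester student ratings using the Pearson coefficient reported in the study.

# A minimal sketch of the thin-slice correlation analysis; the values below
# are invented for illustration and are NOT the study's data (one value per
# teacher, thirteen teachers as in the study).
from scipy.stats import pearsonr

# Composite "molar" ratings from judges who saw only short silent clips
judge_ratings = [6.1, 4.8, 7.2, 5.5, 6.8, 4.2, 5.9, 7.0, 4.9, 6.3, 5.1, 6.7, 5.4]
# End-of-semester student ratings for the same thirteen teachers
student_ratings = [4.3, 3.1, 4.6, 3.8, 4.5, 2.9, 3.9, 4.7, 3.3, 4.2, 3.5, 4.4, 3.6]

r, p = pearsonr(judge_ratings, student_ratings)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")  # a large positive r would mirror the 0.7-0.8 range reported above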

Overall, the results show that strangers’ ratings of teachers aligned significantly with the ratings of students who had spent an entire semester with those teachers, a finding that underlines the importance of nonverbal behavior as a carrier of influential messages in an educational setting. These results remained valid even after accounting statistically for the “physical attractiveness” dimension.

In an attempt to analyze further the impact of nonverbal behavior on students’ ratings of teachers, the authors reduced the length of each of the previous video clips from 10 seconds to two shorter versions of five and two seconds. The shorter videos were found to correlate positively with students’ ratings, and as accurately as the longer 10-second clips. Here too, according to the authors and contrary to the findings of other research, there was no significant relation between physical attractiveness and students’ ratings of teachers.



Although the writers admit that the results of the present research may not generalize to everyday-life situations, given the specific academic setting in which the study was conducted, the study shows that, under specific conditions, strangers’ judgments can be accurate when based on behavioral observations. The results also show that students’ ratings of teachers are not necessarily based on a teacher’s academic competencies but on “affective” nonverbal behavior communicated to students: teachers who are better communicators of nonverbal information may be rated highly by students.

EVALUATION OF THE STUDY: A simple search on the internet shows that both writers are notable academic authorities on the subject discussed. Nalini Ambady, who died at the age of 54 from leukemia, was a professor in the departments of psychology at Harvard, Tufts and Stanford, and an expert on nonverbal behavior and interpersonal perception. Robert Rosenthal, also a professor of psychology at Harvard and the University of California, specializes in nonverbal communication as well. The abundant existing literature on nonverbal communication assesses its impact in contexts ranging from the personal to the professional; this study complements the few that have attempted to measure its impact in an academic context. Although what precedes underlines the seriousness of this work, in my opinion there are some factors that may call the resulting findings into question. The writers treat the students’ ratings variable as an externally valid criterion for comparing and confirming the influence of nonverbal behavior on judgments. Yet while students’ ratings have gained increasing importance, especially in academia, a great deal of literature suggests their vulnerability by underlining the influence of many factors on their results. Class size (Bedard & Kuhn, 2008) and grading leniency and inflation (Greenwald & Gillmore, 1997) are some of the variables that have been found to weigh on these results. As for the methodology, in my opinion, some gaps may further call the findings into question. The writers claim that the choice of female judges is justified by the fact that women are considered better decoders of nonverbal behavior. However, Canary and Babin-Gallagher (2015) claim that certain dimensions of nonverbal behavior are better deciphered by men than by women. Furthermore, I think that nonverbal behavior can only be understood contextually, from both males and females. Using silent clips that are edited randomly and extracted from their ambient context may convey inaccurate behavioral messages that distort the experimental conditions.

Perrett (2013) examines another factor thought to interfere with the validity of students’ ratings of teachers in higher education. Through an empirical study, he explores and compares two different environments in which students’ ratings can take place, the online administration of ratings as opposed to the on-paper one, along with the concerns and difficulties related to the online environment.

The writer starts by stating the findings and conclusions of previous research papers that examined the challenges of administering students’ ratings online compared with in class. According to the author, some of these studies suggest that online administration resulted in significantly lower response rates and required better preparation of students and staff for completing the procedure. Conversely, other research showed that online administration with adequate logistical preparation can yield good response rates, with noticeably higher participation among students with stronger academic records, thus challenging the common suspicion among teachers that online ratings attract greater participation from students who received low grades and want to get back at their teacher.

In his study, the author proposes to use the class as the unit of analysis, contrary to previous studies in which individual students constituted the reference unit; a teacher with poor academic performance would be viewed as such by almost all of his or her students. Using the class mean of students’ responses and the appropriate statistical methods allows this approach to be implemented successfully. The author formulates closed-ended questions on a seven-point scale covering different evaluation aspects of the course and the instructor. He addresses three concerns to be analyzed: (1) the influence of online versus on-paper administration on response rates; (2) the concern that students with low attendance will distort the course and instructor ratings in an online evaluation context, since their opinion will be an uninformed one; and (3) the concern that undergraduate students would give the same answer to most questions, not taking enough time to understand each question and instead forming a general impression, as opposed to graduate students, who would give more varied responses.

To address these issues, the writer formulated a set of hypotheses contrasting the online and class-administered environments, to which statistical models were applied to test their validity and significance. These are: (1) course and instructor ratings are lower in an online environment; (2) response rates are lower as well in the same environment; (3) attendance is a determining factor of course and instructor ratings; and (4) undergraduate students’ answers show less variation than graduate students’ answers.

The study was conducted at a large university during the fall semester of 2009 and covered only statistics courses. In multiple-section courses, half of the sections were randomly assigned the online survey and the other half the paper-administered one, while in single-section courses the choice of survey type was left to the instructor. In the end, thirty-six undergraduate sections and twenty-five graduate ones were administered the online survey, whereas nine undergraduate sections and eleven graduate ones were given the paper survey.

The results show, first, that the mean course ratings from the online and paper surveys were not statistically different (online undergraduate course rating 3.7 versus 3.5 on paper; online graduate rating 4.5 versus 4.3 on paper), underlining the fact that online administration does not distort such ratings. Undergraduate ratings averaged lower than graduate ones because, at the graduate level, the rated courses could be part of a major program. Concerning response rates, both contexts displayed similar results, contradicting the belief that online administration attracts lower student participation (online undergraduate 71.1% versus paper 68.8%; online graduate 60.4% versus paper 61.8%). As for the effect of attendance on ratings, after examining the answers to a questionnaire item measuring students’ course attendance, the research concluded that no relation exists between attendance and ratings. As to whether students give little thought to the questions and merely record general impressions, and to comparing the answers of graduate and undergraduate students, the results showed that the variability of answers was approximately the same in both groups (variability averaged over course sections was 2.14 for undergraduates and 2.07 for graduates).
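
To illustrate the class-as-unit-of-analysis approach described above, the following is a minimal sketch (in Python, using invented numbers, not Perrett’s data): individual responses are first averaged into a class mean, and the online and paper class means are then compared with a two-sample t-test.

# Minimal sketch: the class mean, not the individual student, is the unit of
# analysis; all numbers are invented for illustration.
from statistics import mean
from scipy.stats import ttest_ind

online_sections = [[3.5, 4.0, 3.0, 4.5], [4.0, 3.5, 3.5], [3.0, 4.0, 4.5, 3.5]]
paper_sections = [[3.5, 3.0, 4.0], [4.0, 4.5, 3.5, 3.0], [3.0, 3.5, 3.5]]

online_means = [mean(s) for s in online_sections]  # one mean per class section
paper_means = [mean(s) for s in paper_sections]

t, p = ttest_ind(online_means, paper_means, equal_var=False)  # Welch's t-test
print(f"online = {mean(online_means):.2f}, paper = {mean(paper_means):.2f}, t = {t:.2f}, p = {p:.3f}")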

According to the author, the conclusions of this study demonstrate that online and paper evaluations do not differ significantly in their impact on students’ ratings, making online surveys as suitable as paper ones in terms of administration. The writer nevertheless notes some limitations to the generalizability of the study, citing the group sizes used in the research and the class size in each group as possible influencing factors.

EVALUATION OF THE STUDY: Jamis Perrett is a professor of Applied Statistics and Research Methods at the University of Northern Colorado. His research interests include statistical computer programming and applied statistics. In my opinion, the background of the researcher clearly focuses the interest of the study on developing the appropriate statistical approach for each of the proposed hypotheses, without fully taking into account the surrounding variables. To point out some of them: the stark difference in group sizes (as indicated by the writer himself) between undergraduates in the two contexts (36 sections online versus 9 sections on paper) and graduates (24 sections online versus 11 sections on paper) seriously compromises the experimental conditions. Furthermore, the class size in each section is not reported, which obscures the possible biasing effect of this mediating variable on the findings (Bedard & Kuhn, 2008). The exclusive use of statistics classes in this study may also call its findings into question, since interest in and liking of the course have been found to have an indirect effect on global ratings of instruction, as will be seen in the results of the next study.

There is no doubt that instructors’ behavioral performance in the classroom weighs greatly on students’ perceived quality of instruction and of the class. In this third empirical study, Greimel-Fuhrmann and Geyer (2003) propose to assess the impact of teaching behavior on end-of-semester students’ ratings. Several mediating biases are explored along the way, namely students’ opinion of the ratings and their consequences, liking of the teacher, and liking of the course itself.

As is customary in empirical studies, the writers start by discussing the previous literature on the topic and underline the fact that there is little consensus about the validity and reliability of students’ ratings, especially since other findings have determined that the ratings are not related to effective student learning. The literature also establishes several biases in these ratings, among them grading leniency and liking of the teacher. According to the authors, their intention in the present study is to assess the impact of interest in the course subject and liking of the teacher, as well as other possible biasing variables such as school climate, on the global ratings of instruction. The study was conducted in Austrian commercial colleges, using accounting classes and the direct instruction model as the basis for the research, with 2,121 participating students.

The writers’ primary objective is to understand students’ view of good instructional quality as well as their opinion of the ratings themselves. Unlike Perrett (2013), they decided at this stage to generate hypotheses addressing these issues by conducting interviews with students, rather than formulating hypotheses and then testing them. In a second stage, a questionnaire was designed to understand the factors that weighed on the ratings of teachers. The complete version of the questionnaire used for the present study contained items covering general evaluations of teachers, items measuring specific aspects of teaching performance, and items measuring possible biasing variables.

The parameters of good teaching derived from these interviews showed that a good teacher, in general, is one who provides understandable explanations supported by examples, fosters a learning environment with concern for students’ learning, and has positive character traits such as humor and friendliness. As for students’ opinion of the usefulness of students’ ratings, one third of those questioned doubted their validity and usefulness and believed that they do not lead to a change in instructors’ behavior.

The data collected from this qualitative stage about instructional behavior were classified according to three factors: (1) subject-oriented instruction, (2) student-oriented instruction and (3) classroom management, each of which was tested to determine whether it correlated with global ratings of teachers. Using bivariate correlation coefficients of teaching behavior with the evaluation of the teacher, the results show that all three factors correlate with the global evaluation of the teacher, with a strong correlation in the case of subject-oriented teaching behavior (0.707, significant at the 0.01 level). High correlations of some biasing factors with students’ ratings of the teacher were also noted: interest in the course (0.507), liking of the teacher (0.687) and students’ belief that the teacher takes the evaluations seriously (0.499), with a lower correlation for general factors such as class well-being (0.154). To measure the simultaneous effect of biasing factors and mediating variables on students’ ratings, the writers developed another statistical model based on exploratory factor analyses. In this new setting, subject-oriented behavior still retained the highest direct effect on students’ ratings, and it was also found to have an indirect effect on ratings through interest in the course subject and liking of the teacher. Student-oriented behavior was found to be highly correlated with liking of the teacher, and its indirect effect on students’ ratings is much higher than its calculated direct effect. Other variables, such as school climate, were found to be insignificantly correlated with students’ ratings.
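
The distinction between direct and indirect effects referred to above can be illustrated with a small sketch. The code below (Python, run on simulated data rather than the authors’ Austrian data set) estimates a simple mediation model in which subject-oriented teaching behavior influences the global rating both directly and indirectly through interest in the course; the indirect effect is approximated, as in standard mediation analysis, by the product of the two path coefficients.

# Simple mediation sketch with simulated (not real) data:
# behavior -> interest -> rating, plus a direct behavior -> rating path.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
behavior = rng.normal(size=n)                                  # subject-oriented behavior
interest = 0.5 * behavior + rng.normal(scale=0.8, size=n)      # mediator: interest in the course
rating = 0.4 * behavior + 0.3 * interest + rng.normal(scale=0.7, size=n)  # global rating

a = sm.OLS(interest, sm.add_constant(behavior)).fit().params[1]            # path a: behavior -> interest
res = sm.OLS(rating, sm.add_constant(np.column_stack([behavior, interest]))).fit()
c_prime, b = res.params[1], res.params[2]                                  # direct path c' and path b

print(f"direct effect c' = {c_prime:.2f}, indirect effect a*b = {a * b:.2f}")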

In conclusion, the study confirms that subject-oriented teaching behavior had the greatest impact on students’ ratings, as did students’ interest in the course. The three main categories of teaching behavior described above were also found to have indirect influences on students’ ratings through mediating variables such as liking of the teacher and interest in the course. It can also be concluded that subject-oriented class instruction is the kind most favored by students, a finding that invites teachers to organize their classes around, among other aspects, clear explanations and effective examples.

EVALUATION OF THE STUDY: Bettina Greimel-Fuhrmann is an assistant professor in the department of Business Education at the Vienna University of Economics whose research covers evaluation and instructional quality, and Aloys Geyer is a professor of Operations Research at the same university interested in methodological issues in empirical research. In my opinion, this interdisciplinary approach to the topic is a welcome one that adds to the objectivity of the methodology. However, some of the research’s conditions may cast doubt on the generalizability of the results. The experiment was conducted in a single-course context (accounting classes), and the results show an indirect effect of course liking on overall students’ ratings, which means that students who do not like accounting classes but are required to take them as part of their program of study will give lower global ratings. As indicated in the study, the sole instructional model examined is direct instruction, which also means that using another instructional strategy might alter the findings. Finally, the research analyzed the factors that may influence global ratings and yielded results without comparing them to an external criterion such as actual students’ ratings, which in my opinion leaves the results at the hypothetical level.

In this fourth study, MacNell, Driscoll and Hunt (2014) return to the ever-present concern of the effect of gender, and the bias it implies, on students’ ratings of their teacher. The recent publication date of this research attests to the timeliness of the topic and underlines the fact that it adds to the doubts about the validity of students’ ratings in higher education. In this study, the authors propose to measure the gender effect using an online class setting as the primary tool of experimentation, in which assistant instructors each teach two groups under different gender identities and the ratings given by students are then interpreted. This environment gives researchers better insight into the gender effect, since the same instructor can be presented to students under different names and photos, thus permitting them to measure the effect of the “perceived gender of the instructor” variable. Although they consider their research a pilot study, the results collected are, in their opinion, significant.

According to the authors, the concept of gender is not linked to the biological connotations of sex (male and female) but rather denotes a social or cultural role and is linked to expectations within those specific categories. They note that Western societies still consider women inferior to men in terms of capacities and ability to achieve, and that a university setting grants no exception in this regard: students have, on various occasions, favored male instructors. In this context, the authors find that previous studies on the same topic presented some methodological gaps. Their findings have consistently shown differences in students’ ratings between male and female teachers, without, however, firmly attributing these differences to gender bias. Furthermore, flaws in the design of those studies may call their results into question. The educational context in which the studies usually take place may carry its own biases: male instructors seem to hold an advantage in higher education institutions when it comes to teaching positions as well as the kinds of classes taught. The online class environment answers these concerns and presents the advantage of isolating the mediating variables of teaching behavior and style that are always present in a face-to-face class context.

In an educational environment, male instructors are generally expected to be professional and objective, whereas female instructors are assumed to be warm and accessible. Students’ ratings center on such expectations and reflect any perceived violations of these traits. The impact of gender on rating differences is further underlined by the fact that male instructors who do not adhere to their expected gender role are not sanctioned by students, whereas female instructors who do not are. The obligations in higher education are more taxing on female than on male instructors: a female instructor will not be taken seriously if, beyond her expected gender role, she does not also assume an attitude of “professionalism and objectivity.” Such gender differences and perceptions manifest themselves particularly in students’ ratings. These hypotheses are important to mention here because, according to the authors, the results collected confirm them.

The experiment was conducted in an introductory anthropology/sociology course, part of a summer program at a large university, built around four discussion groups taught by one male and one female assistant instructor. Group sizes ranged between eight and twelve students. Each assistant was in charge of two groups, revealing his or her actual gender identity in one group and an exchanged identity in the other. Course content was delivered through a learning management system in the form of assigned readings and presentations, and contact with instructors took place through emails or posted comments. The instructors, who created similar biographical information, maintained a close presence in each discussion group, giving the necessary comments and feedback and varying their teaching strategies, agreed on the same grading criteria, and posted grades at the same time. The authors are trying here to minimize the probable mediating effects of teaching style underlined in other studies. At the end of the semester, students were asked to rate the instructors on different aspects, and over 90% of the students submitted their feedback. Instructors were evaluated on twelve aspects covering teaching effectiveness, interpersonal traits, communication skills and overall quality as a teacher. Using appropriate statistical models, the authors confirmed that the questions measure the same underlying variable, the overall “evaluation of the instructor’s teaching.” Evidence of gender bias was then analyzed by comparing students’ mean responses to these questions by actual and by perceived instructor gender.

The data collected revealed interesting information in several regards: there was a significant difference in the ratings between the perceived male and the actual female instructor (obviously the same person), and no significant difference between the actual male and female instructors. Significant differences were also observed when comparing the perceived-male with the perceived-female ratings, the latter being lower on all twelve measured aspects, with statistically significant differences for the enthusiastic, fair, gives praise, professional, prompt and respectful aspects. The mean of students’ ratings by actual and perceived instructor gender shows a significant difference only between the two perceived genders, with the perceived male receiving 4.24 and the perceived female 3.7 on a scale of 5. It is interesting to note that, despite the fact that both instructors made sure to post their feedback at the same time, the perceived female instructor received 3.55 out of 5 on promptness while the perceived male received 4.35. Confirming the hypotheses formulated above, the authors noted as well that the perceived male instructor achieved higher ratings on interpersonal skills, suggesting that females, who are expected to have these skills, are not rewarded when they display them, whereas males, who are associated with different expectations, achieve higher ratings when they do.
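
To make the comparison of mean ratings concrete, here is a minimal sketch (Python, with fabricated example rows rather than MacNell, Driscoll and Hunt’s data) of how students’ overall ratings can be averaged by perceived and by actual instructor gender, the two groupings contrasted in the study.

# Fabricated example ratings (1-5 scale); each row is one student's overall rating.
import pandas as pd

data = pd.DataFrame({
    "actual_gender": ["F", "F", "F", "F", "M", "M", "M", "M"],
    "perceived_gender": ["M", "M", "F", "F", "M", "M", "F", "F"],
    "rating": [4.3, 4.1, 3.6, 3.8, 4.2, 4.4, 3.7, 3.5],
})

# Mean rating by the gender students believed the instructor to be,
# then by the instructor's actual gender, for comparison.
print(data.groupby("perceived_gender")["rating"].mean())
print(data.groupby("actual_gender")["rating"].mean())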

In conclusion, the authors argue that these findings should be taken into account in tenure and promotion decisions concerning women instructors, and they question once again the validity of students’ ratings, despite the weight these carry in instructors’ career-related decisions. Therefore, according to them, the ratings system used for assessing instructional quality should be reevaluated.

EVALUATION OF THE STUDY: The brief biography of the authors on the first page of the article shows that all three are sociology academics. More specifically, one of MacNell’s fields of interest is food access and justice, Driscoll focuses on the environmental impacts of industrial agriculture, and Hunt researches engaging teaching practices in an academic environment. In my opinion, the findings here confirm, as suggested by the authors themselves, previous research findings about the role played by gender in biasing students’ ratings. However, despite the authors’ claim that the findings may be generalizable and have external validity, I would like to point out some gaps in the methodology and in the results as well. To restate a point made earlier, the limited variety of the classes used in the experimental setting is a mediating factor that may itself bias the ratings. In addition, the low number of students per section and the small number of sections work against the research conditions. It would be interesting to study such ratings in the context of Massive Open Online Courses (MOOCs), where per-class enrollment is very high and the courses offered are much more diverse. Another methodological concern may be that the instructors themselves should not have been aware of the experimental conditions, for fear of biasing the results, even unintentionally, when interacting with students. Furthermore, knowing the gender distribution of the students in each section would have been important in this study’s context. As for the results, I think that generalizing gender-bias findings in any research should not stop at the perceived-gender level; it should extend to actual gender as well. In the results of the study, we surprisingly notice that in the actual male and female categories, students’ ratings on most aspects were higher (even if not significantly so) for the actual female.

CONCLUSION

The present paper attempted to assess the worthiness of students’ ratings of teachers in a higher education context. The four studies reviewed here analyzed some of the factors that may bias these ratings and make this measurement unreliable. Ambady and Rosenthal (1993) evaluated the impact of nonverbal behavior and physical attractiveness on these ratings, concluding that movements and facial expressions communicate implicit messages captured by students, regardless of the teachers’ academic competencies. Perrett (2013) examined the administration of ratings in two different environments, online as opposed to in class, and concluded that, contrary to other research findings, both environments yield significantly similar ratings. Greimel-Fuhrmann and Geyer (2003) examined teaching behavior and style and determined that subject-oriented class instruction is the kind preferred by students and the one that guarantees high ratings. Finally, MacNell, Driscoll and Hunt (2014) examined gender bias in students’ ratings and concluded that it undermines their impartiality. As evaluated earlier, the findings of the above studies hold true in specific, narrow contexts that may not be generalizable and are plagued with deficiencies at the methodological and interpretational levels, which leaves the reader with the firm impression that such ratings should not constitute the sole basis for measuring instructional quality.

A personal experience in a higher education context adds to my doubts about the reliability of these ratings. I teach the same general music appreciation class at the Lebanese American University to two consecutive sections on the same days, using the same method of instruction and, naturally, the same teaching behavior and style. At the end of the previous semester, my students were asked to rate my instruction on different aspects. The survey was administered online, response counts were almost equal, and the ratings nonetheless showed some interesting differences (Appendix 1). In the satisfaction-level evaluation of the course, the third item, which assesses how well I manage class discussions, differed by 15.6% between sections. Item eight, which asked students whether I evaluated their performance fairly, also differed by 15.6%. On the last item, evaluating my performance, the ratings differed by 20%, and the overall course evaluation rating differed by 31.2%, a significant difference in my opinion. As for the class workload item, the difference between sections was 15.4%. Naturally, these differences come as no surprise and should be understood through the lens of the numerous biases highlighted in the studies discussed above.

Centra (1973) compared teachers’ self-ratings in college with their students’ ratings and concluded that the two presented notable differences and that teachers tended, in general, to overrate themselves. While the comparison in his study serves only to demonstrate a point, it is the concept of comparison itself that, in my opinion, suggests a suitable solution to the issue investigated in the present paper. If students’ ratings of instructional quality in college have generated a rich literature in which their validity is debated, then using them as the sole input to essential faculty career decisions may disadvantage people with serious academic competencies. An approach that envisages constant comparison and cross-checking of ratings may be the best way to measure instructional quality.

I propose a three-part framework for assessing the quality of instruction properly. As a first step, conducting formative surveys to generate periodic ratings based on course units, rather than summative end-of-semester ratings based on the entire course, may yield more consistent results; the final rating could then be calculated as the mean of these periodic ratings. The rationale is that students who assess teachers periodically have in mind the teacher’s recent class performance and his or her explanations of the current course unit, which allows them to make better judgments on the questioned aspects.

Second, cross-checking a teacher’s ratings against other individuals’ ratings of the same teacher creates a more informed opinion of instructional quality and better justifies any significant career-related decision. In this regard, the course coordinator has an important role to play. The course coordinator is a key figure within departments, with a specific set of responsibilities related to the course, and many colleges assign coordinators important tasks related to students’ learning and course faculty. Although course coordinators may not have employment-decision privileges, their close relation with course students and course faculty enables them to form accurate ideas about course progress, as well as good knowledge of the course faculty, their character traits and their instructional strategies, through occasional attendance of their classes. This second part of the framework therefore consists in asking the course coordinator to complete the same survey as the students, rating the teacher on the same aspects, so that both ratings can later be compared.

Naturally, these two measures will generate constraints related to the additional administrative processes and the larger amount of data to be collected and analyzed. This calls, as the third and completing part of the framework, for the creation in each college of a department of instructional excellence and feedback, whose task is to collect all instruction-relevant feedback (students’ ratings and those of the course coordinator, for example), analyze it, and draw adequate conclusions to be used in building a strategy of quality instruction. Among the responsibilities of this department also falls the task of helping teachers improve their teaching performance by making them aware of the significance of the ratings and by providing training on different instructional aspects such as class management, instructional strategies, efficient interaction with students and nonverbal skills.

The proposed three-part framework offers a possible alternative to the long-debated students’ ratings and a more reliable way of measuring instructional quality. In this way, administrators are given firmer ground to stand on for any faculty career-related decision.
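
As a rough illustration of the first two parts of this framework, the short sketch below (Python, with invented unit ratings and a hypothetical coordinator score) computes a teacher’s final student rating as the mean of periodic, unit-based formative ratings and reports the gap between that aggregate and the course coordinator’s rating on the same scale.

# Sketch of parts one and two of the proposed framework; all values are invented.
from statistics import mean

unit_ratings = {"unit 1": 4.1, "unit 2": 3.8, "unit 3": 4.4, "unit 4": 4.0}  # formative, per course unit
student_rating = mean(unit_ratings.values())  # part 1: final rating = mean of the periodic ratings

coordinator_rating = 3.6  # part 2: hypothetical coordinator rating on the same aspects
gap = student_rating - coordinator_rating     # a large gap flags a case for closer review

print(f"student rating = {student_rating:.2f}, coordinator = {coordinator_rating:.2f}, gap = {gap:.2f}")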


REFERENCES

Ambady, N., & Rosenthal, R. (1993). Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness. Journal of Personality and Social Psychology, 64(3), 431-441.

Bedard, K., & Kuhn, P. (2008). Where class size really matters: Class size and student ratings of instructor effectiveness. Economics of Education Review, 27(3), 253-265.

Canary, D., & Babin-Gallagher, B. (2015). Interpersonal communication, sex and gender differences in. In The concise encyclopedia of communication (1st ed., pp. 298-299). John Wiley & Sons.

Centra, J. (1973). Self-ratings of college teachers: A comparison with student ratings. Journal of Educational Measurement, 10(4), 287-295.

Greenwald, A., & Gillmore, G. (1997). Grading leniency is a removable contaminant of student ratings. American Psychologist, 52(11), 1209-1217.

Greimel-Fuhrmann, B., & Geyer, A. (2003). Students' evaluation of teachers and instructional quality: Analysis of relevant factors based on empirical evaluation research. Assessment & Evaluation in Higher Education, 28(3), 229-238.

MacNell, L., Driscoll, A., & Hunt, A. (2014). What's in a name: Exposing gender bias in student ratings of teaching. Innovative Higher Education.

Perrett, J. (2013). Exploring graduate and undergraduate course evaluations administered on paper and online: A case study. Assessment & Evaluation in Higher Education, 38(1), 85-93.

Wachtel, H. (1998). Student evaluation of college teaching effectiveness: A brief review. Assessment & Evaluation in Higher Education, 23(2), 191-211.

APPENDIX 1: RATINGS OF STUDENTS IN A MUSIC CLASS

1- FALL 2014 – SECTION 13:

Project Audience: 31
Responses Received: 9
Response Ratio: 29.03%

2- FALL 2014 – SECTION 16:

Project Audience: 32
Responses Received: 10
Response Ratio: 31.25%

