ISSN 0023-8333
Yuly Asencion-Delaney
Northern Arizona University
Research on the acquisition of Spanish's two copulas, ser and estar, provides an understanding of the interaction among syntax, semantics, pragmatics, morphology, and
vocabulary during development (e.g., Geeslin, 2003a, 2003b; Guntermann, 1992; Ryan
& Lafford, 1992). Recent research suggests that linguistic features in the surrounding discourse influence learners' copula choice. We present a corpus-based analysis
of the lexico-grammatical features co-occurring with copula + adjective usage among
foreign-language learners of Spanish at three levels of instruction. Findings revealed
the following: (a) both ser + adjective and estar + adjective occur at all levels in contexts where
little linguistic complexity typically occurs; (b) ser + adjective appears in descriptive
and evaluative discourse; and (c) estar + adjective is present in narrations, descriptions,
and hypothetical discourse.
Keywords second language acquisition; Spanish interlanguage; learner corpus; corpus
linguistics; grammatical development; ser and estar; copula choice
Introduction
Studying the acquisition of Spanish copulas, ser and estar, interests second
language acquisition (SLA) researchers because it requires studying syntax, semantics, pragmatics, morphology, and vocabulary during development
We wish to thank Dr. Roy St. Laurent of the Northern Arizona University Statistical Consulting
Lab for his valuable assistance in the design of the statistical analyses of this project. Any errors
reside solely with us. Our thanks also go to Dr. Vincent and Dr. Ojeda for their financial support for
transcribing the texts written by the learners.
Correspondence concerning this article should be addressed to Joe Collentine, Northern Arizona
University, Modern Languages, Box 6004, Flagstaff, AZ 86011. Internet: Joseph.Collentine@nau.edu
Language Learning 60:2, June 2010, pp. 409–445
© 2010 Language Learning Research Club, University of Michigan
DOI: 10.1111/j.1467-9922.2010.00563.x
Collentine and Asencion-Delaney
In the following section, we not only briefly delimit what corpus-based research
can reveal about SLA but also describe important corpus assumptions
and techniques. Because our analysis compares learner data to corpus-based
native-speaker models, we also describe relevant perspectives that recent corpus
research has uncovered about the nature of Spanish discourse.
Not only does a corpus-based approach lend itself to questions of L2 discourse development, but the techniques also permit empirical comparisons
between learner behaviors and native-speaker models. For instance, using an
English learner corpus and two British native-speaker corpora, Siyanova and
Schmitt (2007) found that, in informal speech, learners are less likely to use
two-word verb constructs (e.g., run into, put off) than are native English speakers. One advantage of comparing learner performance to native-speaker models
is that the SLA researcher can make empirically defensible and testable assumptions about the end state of the acquisition process, an approach we adopt in
the present study.
Myles (2005) and Myles and Mitchell (2004) lamented that SLA research
has not been quick to embrace new technologies for collecting and analyzing
data, especially as it relates to corpus linguistics. They argued that corpus linguistics complements the current research by examining large amounts of data
with relative ease, thus increasing the generalizability of findings (Rutherford
& Thomas, 2001). Still, some notable corpus-based SLA research has contributed to our understanding of the role of context in language development (Belz,
2004; Collentine, 2004; Granger, Hung, & Petch-Tyson, 2002; Klein & Perdue,
1997). Some corpus-based research has also examined ser and estar.
Corpus-Based S/E Findings
Corpus-based S/E research provides some evidence that learners' copula choice
is sensitive to contextual factors. There is also reason to suspect that Spanish
copula + adjective segments are distributed across different discourse types. Cheng,
Lu, and Giannakouros (2008) examined a corpus of Mandarin Chinese L1
learners of Spanish. They showed how advanced learners' copula choice varies
according to the pragmatic intent of the surrounding discourse that the learners themselves
produce. They reported that exploratory writing evoked greater estar + adjective usage and that estar + adjective is compatible with the semantic and
pragmatic goals of narratives or descriptions. Collentine (2008), in an invited
commentary article on Cheng et al. (2008), examined whether copula + adjective segments might serve discernible discourse functions in native
Spanish discourse. His analysis uncovered a significant interaction between
copula and text type. Ser + adjective was relatively frequent in almost all types
Table 1 Discourse dimensions and features targeted in the learner-native speaker comparison (cf. Biber et al., 2006)
Discourse type
Lexico-grammatical features
Informationally rich
Hypothetical
Subjunctive use
Conditional use
Future use
Verbs of obligation and causation (e.g., dejar, permitir, hacer + infinitive)
Infinitives not preceded by a verb or article
Verbs followed by an infinitive
Progressive aspect (imperfect use or present participle)
Dependent que clauses
Narrative
Descriptive
Clitic usage
Imperfect tense/aspect
Preterit tense/aspect
Possessives
Third-person pronouns
Reflexive se and changes of state
Infinitives not preceded by a verb or article
Verbs followed by an infinitive
Defined as words at least as long as the average word length in the dataset, plus that
calculation's standard deviation, plus one character, that is, six or more characters.
Informationally rich discourse is one that conveys large amounts of information densely. Derived nouns, adjectives, multisyllabic words, and passives
convey information in a decidedly encyclopedic fashion. Another important
type of discourse in Spanish is hypothetical discourse, which communicates possibilities and counterfactual information. This type is not found in English analyses (cf. Biber
& Conrad, 2001), perhaps because Spanish has a neatly defined mood system
with readily discernible inflections. Hypothetical discourse is characterized by
features such as verbs in the subjunctive and the conditional. The other two
discourse types identified by Biber et al. (2006), narratives and descriptions, are well known to most readers.
Research Questions
The present study adds to our understanding of how contextual variables interact
with learners' use of attributive sentences during acquisition. Although the field
has a good idea of the communicative factors that motivate copula choice, we
do not know how each copula + adjective segment works with other lexical and
grammatical structures to communicate coherent discourse. To address this gap
in the literature and to understand the discursive function that ser + adjective
and estar + adjective segments serve over time, we provide a corpus-based
analysis of the lexico-grammatical features that predict the use of these two
segments with foreign-language (i.e., at-home) learners in the first, second, and
third years of the university level. More specifically, we address the following
research questions:
1. What are the lexico-grammatical features that co-occur with ser + adjective
usage? What are the discursive functions that these co-occurring features
serve?
2. What are the lexico-grammatical features that co-occur with estar + adjective usage? What are the discursive functions that these co-occurring
features serve?
To address these questions, we present the results of a series of regression
analyses predicting the occurrence of each copula + adjective segment from
a variety of lexico-grammatical features (see the Corpus Description section).
We predict that ser + adjective and estar + adjective segments will have
distinct lexico-grammatical associations that change over time. Specifically,
we posit that ser + adjective segments will appear in simple discourse (e.g., highly
descriptive, listlike discourse) and that estar + adjective segments will become
increasingly associated with discursive complexity. However, we expect the
association of estar + adjective with a particular discourse type to be more
difficult to identify, because previous research suggests that, with this construct, even advanced
learners are more sensitive to contextual (i.e., pragmatic) constraints than are
native speakers.
Method
Corpus Description
This study used a 432,511-word learner corpus of written Spanish, comprising
edited and nonedited compositions collected from English-speaking Spanish
learners at three levels of instruction: first year (230,270 words), second year
(109,224 words), and third year (93,017 words). The compositions were not
elicited through tasks designed for this study; rather, they were writing
samples used for assessment purposes. Students wrote letters, narratives, descriptions, summaries, and argumentative essays both in and out of class as well
as on exams. Topics related to the textbook themes (e.g., family, childhood)
and the cultural readings assigned in class. Each text was tagged for numerous
lexical and grammatical features (see above).
To determine which lexico-grammatical features co-occur with ser + adjective and estar + adjective usage, we considered a total of 75 potential predictor
variables, each operationalized as a regular expression. In corpus
studies, variables refer to the linguistic features in the texts being analyzed. This
study's predictor variables included various lexical features, such as adjectives
other than the ones in the copula + adjective frame (e.g., derived adjectives,
adjectives in postnominal position), nouns (e.g., derived nouns, feminine nouns,
masculine nouns), adverbs (e.g., adverbs of place, adverbs of time), and verb
classes (e.g., verbs in the imperfect aspect, past participles). They also included morphosyntactic features such as dependent clauses, noun phrase configurations
(e.g., article plus noun), pronoun usage (e.g., third-person clitics), and
a variety of verb phrases (e.g., verbs of communication, verbs of knowledge).
The set of variables considered covered all parts of speech and common morphosyntactic constructs studied by learners, as well as additional constructs
studied in Biber et al. (2006).
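Each predictor was operationalized as a regular expression. The minimal sketch below illustrates that idea only. The three patterns and the `count_features` helper are hypothetical, because the article does not reproduce its 75 expressions. The sketch also runs on untagged text, whereas the study's corpus was tagged. Real patterns would need those tags to separate, for example, the demonstrative esta from the verb está.

```python
import re

# Hypothetical feature patterns for illustration only; they are NOT the
# study's actual regular expressions, which the article does not list.
FEATURE_PATTERNS = {
    # Derived nouns ending in -ción(es), -dad(es), or -miento(s)
    # (e.g., constitución, serenidad, procesamiento). Crude: it also
    # matches underived words such as "ciudad".
    "noun_derived": re.compile(
        r"\b\w+(?:ci[oó]n(?:es)?|dad(?:es)?|mientos?)\b", re.IGNORECASE
    ),
    # An accented form of estar followed by one word: a rough stand-in for
    # an estar + adjective segment (untagged text cannot confirm the
    # adjective). Unaccented "esta" is excluded because it is usually the
    # demonstrative "this".
    "estar_segment": re.compile(
        r"\b(?:estoy|estás|está|estamos|están|estaba|estaban|estuvo)\s+\w+",
        re.IGNORECASE,
    ),
    # "que" as a separate word (so "porque" and "aunque" are not counted).
    "que": re.compile(r"\bque\b", re.IGNORECASE),
}


def count_features(text: str) -> dict[str, int]:
    """Return the raw number of matches for each hypothetical feature."""
    return {name: len(p.findall(text)) for name, p in FEATURE_PATTERNS.items()}


sample = ("La constitución es importante. Mi mamá está cansada porque "
          "la universidad dice que hay procesamiento.")
print(count_features(sample))
```

Raw counts like these would still have to be normed by text length before they enter a regression.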
Data Analyses
Learner Models Analysis
To identify the types of lexico-grammatical features that learners use with
ser + adjective and estar + adjective segments, and to identify which variables distinguish among the three levels of learners, we constructed two regression
models of lexico-grammatical regressors predicting copula + adjective usage:
a ser + adjective learner model and an estar + adjective learner model. We
constructed a separate regression model for each copula + adjective segment, rather
than, for instance, a single regression model in which the choice between the
two is the dependent variable. We did so because previous research suggests that the
factors motivating ser + adjective usage are not the same as those
motivating estar + adjective usage (cf. Guntermann, 1992). The process involves screening a set of potential predictor variables for the standard assumptions
of linear regression, submitting the reduced set to a best-subsets analysis
(rather than a stepwise procedure) to identify the so-called best subset, and,
finally, comparing the predictor variables' ability to distinguish among the three
levels of learners in terms of copula + adjective usage.
We employed a standard data-screening process, identifying which of the
potential predictor variables had honest correlations with the criterion variables,
thus discarding the following: (a) variables that had no correlation with a criterion variable (by examining correlation coefficients and scatter plots between
a potential predictor variable and the criterion); (b) variables that represented
inflated correlations (i.e., where two features correlated highly with each other
and constituted too high an overlap in semantic or structural properties, so as
to avoid collinearity problems in the final model selection phase);1 and (c) variables that represented deflated correlations, that is, predictor variables that
had a highly reduced range of responses to the criterion variable (e.g., those
variables whose frequency was very small, such as n = 2, regardless of the
level of the participant or the genre). This screening of the data yielded a list
of 58 potential linguistic variables (37 for ser + adjective and 21 for estar +
adjective) that could be meaningful for the regression analyses to be performed.
Table 2 shows the preliminary list of variables.
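The three screening criteria can be sketched in code, assuming each predictor is a list of per-document counts. The thresholds `r_min`, `r_max`, and `min_total` are hypothetical, because the article reports none. The authors also inspected scatter plots and judged semantic or structural overlap by hand, which no code can replicate. The sketch keeps only the mechanical part of steps (a) to (c).

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen(predictors: dict[str, list[float]], criterion: list[float],
           r_min: float = 0.05, r_max: float = 0.9,
           min_total: float = 3) -> list[str]:
    """Return the names of predictors that survive a hypothetical screening."""
    # (a) Drop predictors with (near-)zero correlation with the criterion.
    kept = {k: x for k, x in predictors.items()
            if abs(pearson(x, criterion)) >= r_min}
    # (c) Drop "deflated" predictors: features that hardly ever occur.
    kept = {k: x for k, x in kept.items() if sum(x) >= min_total}
    # (b) For each highly intercorrelated pair, keep the member that
    #     correlates more strongly with the criterion (ties keep the first).
    dropped: set[str] = set()
    for (n1, x1), (n2, x2) in combinations(kept.items(), 2):
        if n1 in dropped or n2 in dropped:
            continue
        if abs(pearson(x1, x2)) > r_max:
            r1 = abs(pearson(x1, criterion))
            r2 = abs(pearson(x2, criterion))
            dropped.add(n2 if r1 >= r2 else n1)
    return [k for k in kept if k not in dropped]


criterion = [1, 2, 3, 4, 5, 6]
features = {
    "a": [1, 2, 3, 4, 5, 7],
    "a_copy": [2, 4, 6, 8, 10, 14],  # perfectly collinear with "a"
    "noise": [3, 1, 2, 2, 1, 3],     # uncorrelated with the criterion
    "rare": [0, 0, 0, 0, 0, 1],      # occurs only once
}
print(screen(features, criterion))
```

Running this toy example keeps only the first feature. The noise, rare, and duplicate columns each fail a different criterion.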
We used best-subsets analyses to derive the two regression models for ser +
adjective and estar + adjective. Social scientists frequently employ stepwise
procedures for building regression models. These variable-selection procedures work adequately for reducing a small set of potential predictor
variables to a smaller, more meaningful set (e.g., a subset that does not have a
high degree of overlap). However, statisticians do not favor stepwise analyses when the
initial pool of predictor variables is extremely large (Miller, 2002), as in the
present case. Following Rencher (2002), we instead employed a best-subsets
analysis for building the two models for predicting ser + adjective and estar +
adjective. With a large number of predictor variables, the principal advantage of a best-subsets approach over statistical/stepwise regression is that
best-subsets approaches reduce the number of predictor variables
by comparing various combinations of variables. The stepwise procedure, in contrast, attempts the reduction by considering each potential
predictor variable individually. The best-subsets approach has been shown to
produce fewer spurious results than stepwise procedures when reducing a large
set of potential predictor variables.

Table 2 Linguistic variables used in the study after initial data screening
Variable class
Ser + adjective
Estar + adjective
Noun
noun - derived
noun - feminine
noun - masculine
noun - singular
Adjective
adjective - singular
adjective - type 1
adjective - type 2
adjective - derived
adjective - feminine
adjective - masculine
adjective - plural
adjective - postnominal (e.g., una casa grande 'a large house')
adjective - prenominal (e.g., una bella mansión 'a beautiful mansion')
adjective - singular
adjective - type 1 (descriptive adjective with four inflections: masculine, feminine, singular, and plural; e.g., blanco/a(s) 'white')
adjective - type 2 (descriptive adjective with two inflections: singular and plural; e.g., interesante(s), liberal(es) 'interesting, liberal')
Pronoun
Verbs
clitic - preverbal
pronoun - third person
que subordinator
article noun segment
possessive adjective
SE plus 3rd-singular verb
verb - Gustar-like
verb - third person
verb - knowledge
verb - past participle
verb - present participle
verb - suasive (e.g., querer 'want', mandar 'order')
verb - probability (e.g., creer 'believe', negar 'deny', dudar 'doubt')
Adverbs or adverbial clauses
adverb - time
adverbial clauses - contingency
adverbial clauses - time
adverb - place
adverb - time
adverbial clauses - contingency
adverbial clauses - time
Total: 37 (ser + adjective); 21 (estar + adjective)
Note. None of the adjectives in this list followed either of the two copulas.

With large pools of potential predictor variables, the number of possible combinations is enormous (2^k subsets for k candidates). In such cases stepwise regression
may never consider combinations of predictor variables that are equally good at
predicting the occurrence of the response variable (i.e., the dependent variable)
in question.2 Because this analysis is computationally intensive and not available in many commercial software packages for the social sciences, we used
the statistical package R and its best-subsets regression package to perform the
analysis (see Dalgaard, 2008).3
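A brute-force sketch in plain Python can illustrate the idea behind a best-subsets search. It is not the R package the authors used. For every subset of a given size, the sketch fits ordinary least squares with an intercept and keeps the subset with the smallest residual sum of squares. An adjusted R² then allows the winners of different sizes to be compared. Exhaustive enumeration is only feasible for small pools, since k candidates yield 2^k subsets, and dedicated implementations prune the search instead of visiting every subset. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(columns: list[list[float]], y: list[float]) -> float:
    """Residual sum of squares of an OLS fit of y on an intercept plus `columns`."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations; fails if predictors are collinear
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def adjusted_r2(rss_value: float, y: list[float], k: int) -> float:
    """Adjusted R^2 for a model with k predictors plus an intercept."""
    n = len(y)
    mean = sum(y) / n
    tss = sum((v - mean) ** 2 for v in y)
    return 1 - (rss_value / (n - k - 1)) / (tss / (n - 1))


def best_subsets(predictors: dict[str, list[float]], y: list[float], max_size: int):
    """For each size 1..max_size, return (subset, RSS) with the lowest RSS."""
    best: dict[int, tuple[tuple[str, ...], float]] = {}
    for k in range(1, max_size + 1):
        for subset in combinations(predictors, k):
            score = rss([predictors[name] for name in subset], y)
            if k not in best or score < best[k][1]:
                best[k] = (subset, score)
    return best


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
c = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0]
y = [2 * v + 1 for v in a]  # toy response that depends on `a` only
for size, (names, score) in best_subsets({"a": a, "b": b, "c": c}, y, 2).items():
    print(size, names, round(adjusted_r2(score, y, size), 3))
```

Unlike a stepwise procedure, which adds or removes one variable at a time, this search evaluates whole combinations. That contrast is the one drawn in the paragraph above.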
We employed what is termed a subgroup regression analysis to determine
which of the variables in the two models predicting ser + adjective and estar +
adjective usage distinguished among the three levels (Hardy, 1993). The process employs indicator variables (sometimes called dummy variables) to add
categorical predictor variables, called differential intercept coefficients, to the model described earlier. This reveals the effect of each group on each
predictor variable (i.e., the unique contribution of each level in our study to
each coefficient calculated for the predictor variables), producing k − 1 difference (predictor) variables, where k represents the number of groups.4
Because this group-level coefficient effect process is derived from two regression models, we adjusted the alpha for significant coefficient differences to .025 (i.e., $1 - (1 - .05)^{1/2} \approx .025$). Strictly speaking, this formula is the Šidák correction, which yields .0253; the Bonferroni value proper is .05/2 = .025, so the two criteria are practically equivalent here.
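The dummy coding and the alpha adjustment can be sketched as follows. We assume the subgroup model includes both differential intercepts (one dummy per non-reference level) and differential slopes (dummy × predictor interactions). That is our reading of "the unique contribution of each level ... to each coefficient", not a detail the article spells out. The helper names, the level labels, and the choice of first-year as the reference group are hypothetical.

```python
def subgroup_design(x: list[float], levels: list[str], reference: str) -> list[list[float]]:
    """Design rows: intercept, x, then for each non-reference level a
    differential-intercept dummy and a differential-slope term (dummy * x)."""
    others = sorted(set(levels) - {reference})  # the k - 1 non-reference groups
    rows = []
    for xi, level in zip(x, levels):
        dummies = [1.0 if level == g else 0.0 for g in others]
        rows.append([1.0, xi] + dummies + [d * xi for d in dummies])
    return rows


def bonferroni(alpha: float, m: int) -> float:
    """Bonferroni per-test alpha for m comparisons."""
    return alpha / m


def sidak(alpha: float, m: int) -> float:
    """Sidak per-test alpha: 1 - (1 - alpha)^(1/m)."""
    return 1 - (1 - alpha) ** (1 / m)


levels = ["first", "second", "third", "first"]
for row in subgroup_design([1.0, 2.0, 3.0, 4.0], levels, reference="first"):
    print(row)
print(bonferroni(0.05, 2), round(sidak(0.05, 2), 4))
```

With three levels, the design gains two dummies and two interaction columns, matching k − 1 = 2 difference variables. The two alpha formulas differ only in the fourth decimal place.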
Native-Speaker Model Comparison
To identify objectively the types of discourse represented by the lexico-grammatical structures that the best-subsets analysis (dis)associated with each copula + adjective segment, we compared the two copula + adjective learner
models with the native-speaker discourse model described in Table 1. Our
analysis measured the extent to which the learners' discourse possessed indicators of informational richness, hypothetical discourse, narrative discourse,
and descriptive discourse.
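The comparison relies on normed frequencies and sign-weighted z-score totals per document. The following is a minimal sketch of that computation, assuming norming per 10,000 words and standardization with the sample standard deviation across documents. Both choices are our assumptions, and the article does not specify the SD formula. The feature names and counts are invented toy values.

```python
from statistics import mean, stdev


def normed(count: int, words: int, per: int = 10_000) -> float:
    """Rate of a feature per `per` words of text."""
    return count * per / words


def z_scores(values: list[float]) -> list[float]:
    """Standardize across documents (sample SD); a constant column gives zeros."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return [0.0] * len(values)
    return [(v - m) / s for v in values]


def z_totals(feature_counts: dict[str, list[int]], word_counts: list[int],
             signs: dict[str, int]) -> list[float]:
    """One total per document: the sum of sign-weighted z-scores of its features."""
    totals = [0.0] * len(word_counts)
    for name, counts in feature_counts.items():
        rates = [normed(c, w) for c, w in zip(counts, word_counts)]
        for i, z in enumerate(z_scores(rates)):
            totals[i] += signs[name] * z
    return totals


counts = {"feature_a": [1, 2, 3], "feature_b": [3, 2, 1]}  # hypothetical counts
words = [1000, 1000, 1000]
print(z_totals(counts, words, {"feature_a": +1, "feature_b": -1}))
```

A feature with a negative sign in a model lowers a document's total when the feature is frequent. This is the weighting by +/− sign described in the next paragraph.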
As described earlier, we calculated the normed frequency of each of these variables
in each text of the learner corpus, scaled to a rate per 10,000 words.
We then assessed, in three steps, the extent to which documents with high
concentrations of each copula + adjective model's features correlated with high concentrations of each of the four native-speaker discourse types. (1) For
each document, we calculated z-score totals for both the ser + adjective and the
estar + adjective models. (2) For each document, we calculated a z-score total
for each of the four discourse types in Table 1. (3) We regressed the four discourse-type z-score totals against each of the copula-model z-score totals, along
with subregression analyses to assess differences among the three levels. A
z-score value for any document on a given variable, be it a criterion variable as
in step 1 or a regressor as in step 2, represents the extent to which that variable
is represented in that document vis-à-vis all other documents. Summing a set
of z-scores produces a value representing the extent to which a document has a
concentration of that set of variables (see Biber et al., 2006, as well as Biber and
Conrad, 2001, for in-depth discussions of this technique). Thus, summing the
z-scores for each document on the variables representing, say, narrative discourse
indicates how narrative that document is. Likewise, the z-score totals for the set of
regressors in the ser + adjective model and for the set in
the estar + adjective model yield values indicating how closely each document
represents each model. (Of course, all
z-scores here must be weighted according to their +/− sign in the model.) The
regression and subregression analyses answer the following question: When
documents reflect the ser + adjective model and the estar + adjective model,
Estimate | Std. error | t test | p
81.371 | 9.011 | 9.030 | .000
.050 | .020 | 2.470 | .010
.040 | .020 | 2.180 | .030
.100 | .020 | 5.420 | .000
.170 | .020 | 10.010 | .000
.200 | .020 | 9.370 | .000
.150 | .020 | 9.680 | .000
.070 | .030 | 2.580 | .010
.020 | .010 | 2.030 | .040
.020 | .010 | 2.640 | .010
.050 | .010 | 5.800 | .000
.060 | .040 | 1.560 | .120
.040 | .030 | 1.500 | .130
.060 | .010 | 12.380 | .000
.040 | .010 | 3.730 | .000
.070 | .050 | 1.570 | .120
.040 | .020 | 1.530 | .130
.110 | .060 | 1.860 | .060
.040 | .020 | 2.460 | .010
.070 | .030 | 2.300 | .020
.090 | .040 | 2.340 | .020
.080 | .040 | 1.980 | .050
that at all levels in contexts/discourses where ser + adjective segments appear, learners generally use adjectives in a variety of inflections. Interestingly,
however, the positive correlation with type-2 adjectives (i.e., adjectives with
only two inflections: singular and plural) tempers this conclusion, because they
are also significantly associated with the criterion. Finally, although various
morphological properties of adjectives are associated with ser + adjective, this construction is not associated with more complex uses of adjectives. Indeed, ser +
adjective is disassociated with adjectives that appear in either prenominal (e.g.,
bella casa 'beautiful house') or postnominal position (e.g., casa grande 'large
house').
An analysis of the two nominal regressors indicates that a certain degree
of morphological nominal complexity occurs where ser + adjective segments
predominate, as both had a significant positive association with the criterion
variable. The association with feminine nouns links the criterion variable to gender-inflectional processes. The association with
derived nouns, in contrast, links it to semantically dense forms. Derived nouns package semantic information
densely, as they have a base/root morpheme and an additional derivational morpheme (e.g., constitu-ción, sereni-dad, procesa-miento).
It is important to note, however, that this is the only indication of an association between ser + adjective and semantically dense forms. As with the adjectival regressors,
neither of these two nominal regressors distinguished among the three levels.
This suggests that the association of ser + adjective with a certain degree of morphological complexity holds from the beginning through more advanced levels of
instruction.
Subject pronouns for the most part also appeared where there was a preponderance of ser + adjective segments, although the subregression analysis
revealed that this regressor significantly distinguished among the three levels
of learners. The subregression analysis revealed that for the first-year learners
subject pronouns were positively associated with ser + adjective (beta = 0.06;
std error = 0.001), that for the second-year learners there was no association at
all (beta = 0.001; std error = 0.017), and that for the third-year learners there
was a disassociation with the criterion variable (beta = −0.06; std error =
0.043); the analysis also revealed that the significant difference came from the
first-year learners rather than the other two (t = 3.00; p = .003), meaning that
the association of ser + adjective with subject pronoun use was primarily due
to the first-year-learner data.
The best-subsets analysis identified two adverbial constructions as important contributing predictors of overall ser + adjective usage: adverbs of place
and adverbial clauses of cause. Although neither of the two contributed significantly on an individual basis, adverbs of place significantly distinguished
among the three levels of learners in terms of predicting when ser + adjective
would occur. The subregression analysis indicated that for the first-year learners, adverbs of place were disassociated with ser + adjective (beta = −0.12;
std error = 0.05), whereas these adverbs were (positively) associated with the
criterion at the second (beta = 0.07; std error = 0.06) and third years (beta =
0.06; std error = 0.09), with the significant difference being attributed to the
difference between the first-year and second-year individual contributions to
the model (t = 2.45; p = .015).
There were six grammatical features of verbs that predicted ser + adjective
usage at the three levels. For the most part, verbal variables were disassociated
with ser + adjective. Similar to the adverbial regressors, three were important
enough to be included in the ser + adjective model but did not individually
indicates that only 5% of the variance in the Spanish learners' use of estar +
adjective could be explained by this regression model. This indicates that the
association of estar + adjective with other lexico-grammatical features is weak
within the interlanguage at all levels of learners. The model did, however, account for
a significant amount of the overall variation in estar + adjective usage, F(10,
1590) = 8.42, p < .0001.
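The reported figures can be cross-checked with the standard identity that links the overall regression F test to R²: F = (R²/k) / ((1 − R²)/df_resid). Here k = 10 predictors and df_resid = 1,590, which together imply 1,601 observations. The helper names are our own.

```python
def f_from_r2(r2: float, k: int, df_resid: int) -> float:
    """Overall regression F statistic from R^2, k predictors, and residual df."""
    return (r2 / k) / ((1 - r2) / df_resid)


def r2_from_f(f: float, k: int, df_resid: int) -> float:
    """Invert the same identity: R^2 = k F / (k F + df_resid)."""
    return k * f / (k * f + df_resid)


print(round(f_from_r2(0.05, 10, 1590), 2))  # using the rounded 5% figure
print(round(r2_from_f(8.42, 10, 1590), 4))  # R^2 implied by the reported F
```

Plugging in the rounded 5% gives F ≈ 8.37, close to the reported 8.42. Conversely, the reported F implies R² ≈ .050. The two statistics are therefore mutually consistent.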
As observed in Table 4, most of these 10 variables were significant predictors. The subgroup regression analysis revealed
that four regressors significantly distinguished among the three levels of learners: type-2 adjectives (i.e., adjectives with singular and plural inflection), article
noun segments, preverbal clitics, and possessive adjectives. It is interesting to
note that this group of variables is entirely different from the group of significant regressors for ser + adjective. These differences
are considered below in the interpretation of the variables, where we discuss
all 10 variables by grouping them into three lexico-grammatical regressor categories: nominal (noun and adjectival), verbal, and syntactic variables.
In contrast to ser + adjective segments, estar + adjective is associated with
decidedly basic grammatical properties. For example, noun phrases in discourse
where estar + adjective occurs usually comprise nouns preceded by articles
or possessive determiners (e.g., mi mamá 'my mother', la universidad 'the
university) and adjectives that have only two inflections (e.g., inteligente 'intelligent') or adjectives in their singular form (alta 'tall' [feminine]). Three of
the four level-distinguishing regressors identified in the subregression analysis
were nominal in nature.

Table 4 Best subset regression model for estar + adjective
Coefficient | Estimate | Std. error | t test | p
(Constant) | 4.460 | 3.518 | 1.267 | .205
adjective - singular | .010 | .003 | 2.459 | .014
adjective - type 2ᵃ | .020 | .009 | 2.616 | .009
noun - singular | .000 | .002 | 1.716 | .086
article noun segmentᵃ | .010 | .002 | 3.544 | .000
possessive adjectiveᵃ | .010 | .003 | 3.669 | .000
verb - Gustar-like | .020 | .008 | 2.419 | .016
verb - present participle | .030 | .011 | 2.364 | .018
verb - probability | .020 | .011 | 1.871 | .062
clitics - preverbalᵃ | .020 | .005 | 3.617 | .000
adverbial clauses - cause | .020 | .009 | 2.326 | .020
ᵃ Significantly distinguished among the three levels of learners in the subgroup regression analysis.

Type-2 adjectives were found to distinguish significantly between first- and third-year learners (t = 2.73; p = .006), indicating
that the trend to associate inflectionally simple adjectives with estar + adjective appears to become stronger as learners progress in their acquisition of
Spanish. This predictor variable was disassociated with the criterion variable
for the first-year students (beta = −0.01, std error = 0.009) and was positively
associated with estar + adjective for the second (beta = 0.01, std error =
0.021) and third years (beta = 0.13, std error = 0.046). The article noun segment significantly distinguished only between first- and second-year learners
(t = 3.30; p = .001). This regressor was weakly associated with the criterion
variable for first-year (beta = 0.002; std error = 0.003) and third-year students
(beta = 0.003; std error = 0.011) and only slightly more associated with estar + adjective for second-year students (beta = 0.018; std error = 0.005).
Finally, possessive adjectives significantly distinguished between second-year
and third-year learners (t = 2.78; p = .005). This regressor was found to be
weakly associated with the criterion variable for the first year (beta = 0.008; std
error = 0.003) and the third year (beta = 0.003; std error = 0.011) and only
slightly more associated with estar + adjective in the second-year writing
(beta = 0.041; std error = 0.010).
Among verbal regressors, the significant predictor variables also showed
no evidence that complexity is associated with the criterion. Although Gustar-like verbs are usually associated with complex syntax, in the learners' writing
this variable is negatively associated with the occurrence of estar + adjective.
The other grammatical verb form, the present participle, is expected to co-occur
with estar + adjective because it is mostly used with estar to form
the progressive aspect. Indeed, its beta coefficient was the highest of those
regressors included in the best-subsets analysis (0.030).
Two syntactic features were positively associated with estar + adjective.
Preverbal clitics were positively associated with estar + adjective at all levels, perhaps the only indication of complexity associated with this phrase structure.
The other syntactic regressor, causal adverbial clauses (which usually started
with the conjunction porque), also predicted criterion usage. Preverbal clitics
were the only syntactic regressor that distinguished significantly between learners' use of estar + adjective at different levels. This variable was
weakly associated with the criterion for first-year learners (beta = 0.006; std
error = 0.007). The association increased modestly yet significantly (t = 2.31; p = .021)
into the second and third years, being greater for second-year (beta = 0.033; std error = 0.011) and third-year (beta = 0.056; std error =
0.019) learners.
Estimate | Std. error | t test | p
0.001 | 0.131 | 9.007 | .995
0.079 | 0.045 | 1.776 | .076
0.303 | 0.039 | 7.766 | .000
0.376 | 0.124 | 3.030 | .002
0.350 | 0.114 | 3.060 | .002
Estimate sign | Estimate | Std. error | t test | p
+ | 81.371 | 9.011 | 9.030 | .000
+ | 0.129 | 0.026 | 4.954 | .000
+ | 0.059 | 0.023 | 2.574 | .010
+ | 0.550 | 0.072 | 7.592 | .000
+ | 0.135 | 0.067 | 2.025 | .043
old. She is from Oregon and she is a brunette, short and very beautiful. She
wears a green t-shirt and blue jeans. Her clothes cost a lot of dollars. She likes to
dance and sing for me. She likes tennis. Jessica's mother is beautiful, intelligent
and short. Her name is Valerie. We play tennis a lot. She is good. We learn it at
the University. She wears a green shirt and blue jeans at the university . . .)
(2) yo soy bien porque yo soy amo con novia, selena. ella [es] bonita y
simpatica. ella [es] soltera y practicar. ella es alta y la ropa es mocha colores.
mi muchacha lleva rojo gora, blanco jacqueta, azul jeans, y negro sandalias.
ella es mi amora. selena (stays) con madre en casa grande. la familia [es] baja.
la madre [es] rica y lista y soltera . . . (I am well because I am in love with
my girlfriend, Selena. She is beautiful and nice. She is single and practical.
She is tall and she wears clothes in a lot of colors. My girl wears a red cap,
white jacket, blue jeans, and black sandals. She is my love. Selena stays with
her mother in Casa Grande. Her family is small. Her mother is rich, smart and
single . . .)
In both of these samples we see simple discourse, grammar, and lexicon,
with few verbs other than the copula and an overuse of subject pronouns.
Additionally, although there are numerous adjectives in both samples,
noun + adjective segments are scarce. These first-year samples
are nonnarrative and contain almost no conjecture.
Among second-year learners, ser + adjective segments appear in list fashion
in discourse with few conjunctions expressing interpropositional relationships
(e.g., ser + adjective + que 'copula + adjective + that'). Such loosely connected discourse not only describes people, places, and concepts but also
conveys evaluations of and reactions to events and states. As the learner model
suggests, there is a marked absence of epistemic verbs marking
stance (verbs of knowledge, e.g., pienso que 'I think that'; verbs of perception,
e.g., vemos que 'we see that'; verbs of communication, e.g., se dice que 'it is said
that'). Instead, copula + adjective segments present (seemingly) indisputable
assertions. Structurally speaking, subject pronouns are omitted to mark
continuity; still, there are various referents and allusions to the things the learners do
frequently. This probably accounts for why ser + adjective segments are associated with a mix of descriptive and narrative features. Finally, the derivational
sophistication, and thus the semantic density, of the nouns employed is slightly
greater at this level, although these nouns are mostly cognates. The following
is an argumentative essay that a second-year student wrote on the topic of short
stories.
(3) . . . este cuento es un ejemplo que muchos padres estan usando la television como ninera. pienso que esto es un problema porque los jovenes no saben
si es realidad o no. los ninos no reciben la atencion que necesitan para crecer.
tambien pienso que los jovenes necesitan atencion y amor en los primeros
anos mas que de cuando [son] maduros porque cuando son jovenes ellos no
saben que [es] malo o que [es] bueno. tambien, la television [es] mala para
los padres. para los adultos la puede ser un escape tan ellos no tienen hacer
trabajo, o cosas diferentes que necesitan hacer durante el dia. pero, tambien
pienso que hay diferentes programas que [son] buenas. hay programas que
ensena como cocinar, leer (para los ninos), y que dice que esta haciendo en el
mundo hoy. no todos de los programas de television [son] mala. pero yo pienso
que [es] malo usar la mas de necesario. (. . . this story is an example that many
parents are using the television as babysitters. I think this is a problem because
young people don't know whether it is real life or not. Children do not receive
enough attention to grow up. I also think that young people need attention and
love in their first years of life more than when they are mature because when
they are young they don't know what is good or what is bad. Also, television
is bad for parents. For adults it can be an escape because they don't have to do
their work or the different things they need to do during the day. But, I also
think that there are different programs that are good. There are programs that
teach you how to cook, to read (for children) and that tell you what is being
done in the world today. Not all the TV programs are bad, but I think it is bad
to use it more than necessary.)
With the third-year learners, ser + adjective is less frequent, as reflected by a
lower overall average z-score for ser + adjective. It now appears alongside other
third-person verbs and adjectives modifying nouns. The discourse is descriptive and evaluative in nature, with references to relevant events, producing
a mix of descriptive and narrative elements. The following texts are expository
essays that students wrote in a third-year course about different occupations.
(4) al principio de su vida, el bebe atleta es una hija diferente de sus
hermanas. el grito del bebe [es] mas fuerte, el apetito mas famelico y el
cuerpo pequeno mas musculoso que los otros bebes . . . de repente, en la escuela
primaria, es la estrella de su partido de futbol y la parte necesaria entre su
equipo de basquetbol. al fin, no se puede negar todos los hechos, ella es
atleta. [es] seguro que hay cualidades particulares para las atletas; factores
que definen las mujeres que aman los deportes . . . mientras que la atleta esta
entrenandose, se come un dietetico rico con una variedad de las frutas y las
verduras. sin las vitaminas y minerales de estas comidas, el cuerpo no funciona
mejor . . . se come mucho pescado y tofu, [es] justo porque los dos son comidas
saludables sin mucha grasa . . . en el concepto de la diversion, el cuerpo de la
atleta es su templo. por eso, no pasan los viernes bebiendo cerveza y fumando
subject pronouns are scarce, perhaps due to topic continuity. As with the first-year learners, we see little expression of stance via epistemic verbs, and statements
are given as unqualified facts.
Regarding the estar + adjective segments, their principal discourse functions appear to be narration and description within narration. In the first year,
estar + adjective mostly appears in fixed expressions such as estoy feliz
'I am happy' or is used in descriptive contexts where ser is required, with
adjectives such as bonita 'pretty' and grande 'large'. The following examples
come from in-class letters that learners wrote to a friend. The examples relate
life events as well as describe familiar people and places.
(6) querida maria, hola! [estoy] muy feliz porque yo tengo un novio nueva.
su nombre es Pete. Pete tiene veinte anos. mi novio es de indiana. Pete es
moreno y alto. mi novio es muy inteligente y optimista. (Dear Mary, Hello! I am
very happy because I have a new boyfriend. His name is Pete. Pete is twenty years
old. My boyfriend is from Indiana. Pete has dark hair and is tall. My boyfriend
is very intelligent and optimistic.)
(7) hola aubrey! fue a costa rica para un semana. fue a un hotel en la
playa dominical de costa rica. la playa dominical [estuvo] mas bonita! viajo
con mis padres y mi hermano. fue en un avion y lo [estuvo] mas grande. dormi
en un hotel en la playa. el mar [estuvo] muy largo y yo pesque mucho. me
gustaron las comidas mucho! (Hello Aubrey! I went to Costa Rica for a week.
I went to a hotel in the Dominical beach in Costa Rica. The Dominical beach
was very beautiful. I traveled with my parents and my brother. I went by plane
and it was very big. I slept in a hotel by the beach. The ocean was very big and
I fished a lot. I liked the meals very much!)
Learners couple their assessments of people's states with causes embedded in porque 'because' adverbial clauses. The semantic density of these texts is
attributable to the use of various long cognates, which describe
places, disciplines, actions, or events.
In second-year writing, estar + adjective is used in narrative and descriptive discourse that is detached from the writer. This writing is elicited by tasks
in which students must summarize events and describe characters in readings
or audiovisual material. The description of events favors the use of the present
participle. The summarizing task also allows students to speculate about characters' motives or actions by using verbs of probability such as creer 'to believe'
and causal adverbial clauses introduced by the conjunction porque 'because'. These are the types of behaviors that account for the hypothetical
nature identified for estar + adjective.
and (b) the function of attributive sentences in terms of orders of acquisition in different learning contexts (Gunterman, 1992; Ryan & Lafford, 1992;
VanPatten, 1985, 1987) and (c) the contextual and semantic factors that predict
learner usage of this construct as compared to native speakers (Geeslin, 2003a,
2005), the present study is the first to provide a corpus-based analysis of the
lexico-grammatical features that co-occur with Spanish copula (i.e., ser
and estar) + adjective usage, and thus of the different discursive functions that
ser + adjective and estar + adjective segments play at three learner levels
and in comparison to native-speaker models. The study delves into important
learner issues identified in the S/E research but not fully developed to date, for example, the discourse types learners associate with copula
usage (Gunterman, 1992) and the strong influence of contextual cues on copula
choice (Geeslin, 2003a, 2003b). The results overall revealed the following: (a) Both ser +
adjective and estar + adjective are associated with simple discourse at all
levels; (b) ser + adjective appears in descriptive and evaluative discourse; and (c) estar + adjective is present in
narrations, descriptions, and hypothetical discourse where, nonetheless, little
linguistic complexity typically occurs.
Specifically, findings showed that the model predicting ser + adjective
usage identified more variables (n = 21) and accounted for more variation
(41%) than the estar + adjective model, which identified only 10 predictors
and accounted for 5% of the variation. It seems that at beginning levels of instruction, learners
find ser + adjective more communicatively productive and thus more easily
associated with a large array of features within their interlanguage, although
these features are basic grammatical and lexical items. Ser + adjective is one
of the first copula segments taught and is recycled over several semesters,
whereas estar + adjective is primarily used at beginning levels in routines
and formulaic expressions such as estar + bien 'well', mal 'bad', ocupado 'busy', enfermo 'sick'. In this sense, the
input provided by teachers, materials, and other students in the class through
task completion emphasizes ser + adjective over estar + adjective
constructions and, therefore, encourages more ser + adjective usage. These
findings are in line with early SLA studies on ser/estar acquisition, which found
that ser + adjective was acquired well before estar + adjective (Gunterman,
1992; Ryan & Lafford, 1992; VanPatten, 1987), presumably because of the
higher frequency and saliency of ser + adjective in instructional and naturalistic
input.
It was also found that many of the lexico-grammatical predictor variables in
both models were characteristics of simple discourse and did not differentiate learners' copula + adjective usage among the three levels of instruction.
Learners at all levels seem to use copula + adjective as a discourse tool, for example, to communicate evaluations such as es importante, lástima 'it's important, it's a shame', and
so forth. However, when the discourse becomes more syntactically and grammatically complex, ser + adjective segments are absent and estar + adjective
segments become more prevalent. On the one hand, these observations contrast
with the behavior of native speakers, who use ser + adjective for evaluative purposes in a wide
variety of discourse types, simple or complex; on the other hand, they are consistent
with native speakers' propensity to use estar + adjective in more complex discourse
(Collentine, 2008).
The ser + adjective model was mostly associated with adjective and grammatical/lexical verb variables. Various morphological properties of adjectives
(e.g., feminine, plural) were associated with ser + adjective, whereas more complex adjectival syntactic processes (e.g., prenominal or postnominal adjectives)
emerged as disassociated. Most of the verbal variables reflecting complex syntax (e.g., periphrastic future, past subjunctive, Gustar-like verbs) were disassociated from the copula construction and started to emerge as associated with
ser + adjective at advanced levels of instruction. Other features, such as null subjects, also indicated some grammatical sophistication at advanced levels, where
ser + adjective became less frequently used. As for the discursive functions
served by the co-occurrence of the variables in the predictive model for ser +
adjective, the disassociation of verbs of observation and communication from
the construction indicated a discourse that was nonepistemic/nonhypothetical
in nature. Comparisons with native speakers' discourse showed that learners
used ser + adjective in discourse that is highly descriptive in nature and accompanied by story-telling elements, especially at advanced levels of instruction.
These findings corroborate those of Gunterman (1992), who examined learners
in study-abroad contexts where ser + adjective was indicative of descriptive
discourse. Spanish learners, regardless of their level, associate an evaluative stance
with ser + adjective.
The estar + adjective regression analysis revealed a weak association with
other lexico-grammatical features. This indicates that throughout the early to
middle stages of acquisition, this phrase structure is weakly integrated into the
interlanguage as a productive, necessary tool for the types of
communication in which learners engage. In other words, the use of estar +
adjective segments is not obviated (or evoked, cognitively speaking) when
learners use their standard repertoire of lexico-grammatical tools. All told, the
story is complicated for estar + adjective, which ultimately might account
for its late acquisition. On the one hand, it appears where there is little associated inflectional sophistication (recall that Spanish is a highly inflectional
language, both in verbal and nominal constructs). The few variables associated
with estar + adjective suggest that it appears in discourse lacking in overall
inflectional sophistication (e.g., type-2 adjectives or adjectives with singular
and plural inflections, singular nouns, negative association with Gustar-like
verbs). Its use places significant processing demands on learners, as shown
by Geeslin (2003a, 2003b), who noted that learners use estar + adjective
segments according to pragmatic factors (which require a consideration of a
multitude of contextual variables) rather than according to semantic/lexical
constraints (which are local to the copula + adjective phrase structure). With
this in mind, it is not unreasonable to conjecture that learners are more likely
to have the cognitive resources to employ it when other structural demands
are not overwhelming. On the other hand, estar + adjective segments usually
occurred in discourse that was semantically dense (probably because it was
based on source texts), with hypothetical elements (e.g., verbs of probability, causal
adverbial clauses), narrative features (e.g., present participles), and descriptive
features (e.g., type-2 adjectives). Like study-abroad learners (see Gunterman,
1992), learners in an instructional context also use estar + adjective when they
need to fulfill communicative functions that go beyond description. Learners'
awareness of and experience with different kinds of discourse (e.g., narration,
argumentation) at advanced levels of instruction might explain these associations,
rather than learners' acquisition of discrete grammatical and lexical items,
given the simplicity of their interlanguage, as Lafford (2004) asserted for the
gains observed among learners studying abroad. Cheng et al. (2008) concluded that
more abstract registers can evoke greater estar + adjective usage. In this study,
learners at advanced levels of instruction were asked to complete written tasks
in which they summarized a story or argued in favor of a position. We have
no way of knowing whether the task demands affected learners' estar + adjective
usage; however, the results suggest that the heavy processing demands of discourse in which referents and events were detached from
the writer, and in some cases based on readings with grammar and vocabulary
beyond the learners' linguistic knowledge, could have led to more estar + adjective
use. The semantic and pragmatic goals of narrative and hypothetical discourse seem to entail more consideration of the states of referents and
changes in the background of a story or situation, and are thus more compatible
with estar + adjective.
The findings in our study provide evidence of the influence of lexico-grammatical and discourse predictors on learners' copula + adjective usage, as
attested in previous studies (Cheng et al., 2008; Geeslin, 2003a, 2003b, 2005).
In classroom settings, learners' written discourse in response to
Notes
1 Screening data for inflated correlations is difficult in the linguistic sciences. Whereas
two words might represent the same part of speech (e.g., adjectives), the inflectional
morphology of adjectives can represent important distinctions for learners, such as
those that are singular and those that are plural. Additionally, natural language is
extremely redundant as communication systems go, and so our initial screening
process sought to balance semantic and structural collinearity considerations.
2 The analysis identifies the optimal combination of regressors for a criterion
variable by comparing the values of Mallows' Cp across all possible combinations
of predictor variables. Cp reflects both a given model's bias (i.e., how well it
predicts the criterion variable) and the variance of its predictions. The
best-subsets analysis compares numerous combinations of variables, identifying
(a) how many and (b) which predictor variables balance bias and variance such
that the combination's mean-square error is small. The resulting model has small
bias with the fewest predictor variables: it contains a highly reduced set of
predictors whose combination yields predicted values of the criterion variable
that are closest to the observed values. Statisticians recommend best-subsets
analysis when the potential number of predictor variables is large because
stepwise methods can miss models that balance bias and variance at least as well
as the single model they produce.
3 R is an open-source statistical package based on Bell Labs' (proprietary) S
programming language, a standard among statisticians for statistical programming
(see http://www.r-project.org/). R is gaining popularity in academic
circles because of its reliability, statistical accuracy, and flexibility (it offers
numerous [tested] add-on modules) and because it is freely available under an
open-source license.
4 As a simplified example, because this process estimates the coefficient for
each level, we can derive individual level effects in the following fashion. For
instance, if the X1 coefficient were 8.0 for level 1, 5.0 for level 2, and 3.0 for level 3,
and if the process calculates the difference coefficient for X1 between levels 1 and 2
to be 3.0 (i.e., 8.0 − 5.0 = 3.0) and between levels 2 and 3 to be 2.0 (i.e., 5.0
− 3.0 = 2.0), we infer the difference between levels 1 and 3 by summing these
two difference coefficients, or 5.0 (i.e., (8.0 − 5.0) + (5.0 − 3.0)). See Hardy
(1993) for details.
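The best-subsets procedure described in Note 2 can be illustrated with a small sketch. This Python is not the authors' analysis (which was run in R; see Note 3) and uses hypothetical toy data rather than the learner corpus. It assumes the standard definition of Mallows' Cp, Cp = SSE_p / s² − n + 2p, where SSE_p is the residual sum of squares of a candidate model with p coefficients (intercept included), n is the number of observations, and s² is the mean-square error of the full model. The helper names ols_sse and best_subsets_cp are hypothetical, the least-squares solver is a plain-Python stand-in for a statistics library, and ranking by smallest Cp is only one common decision rule (another is to prefer the smallest model whose Cp is close to p).

```python
from itertools import combinations


def ols_sse(X, y):
    """Residual sum of squares of an OLS fit of y on the columns of X plus an
    intercept. Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan
    elimination with partial pivoting; assumes A has full column rank."""
    n = len(y)
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(A[r][i] * A[r][j] for r in range(n)) for j in range(p)]
        + [sum(A[r][i] * y[r] for r in range(n))]
        for i in range(p)
    ]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    beta = [M[i][p] / M[i][i] for i in range(p)]  # M is now diagonal
    resid = [y[r] - sum(b * a for b, a in zip(beta, A[r])) for r in range(n)]
    return sum(e * e for e in resid)


def best_subsets_cp(X, y, names):
    """Score every non-empty subset of predictors with Mallows' Cp and return
    (Cp, p, subset_names) tuples sorted from smallest to largest Cp."""
    n, k = len(y), len(names)
    s2 = ols_sse(X, y) / (n - k - 1)  # error variance estimated from the full model
    scored = []
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            Xs = [[row[j] for j in cols] for row in X]
            p = size + 1  # coefficients including the intercept
            cp = ols_sse(Xs, y) / s2 - n + 2 * p
            scored.append((cp, p, [names[j] for j in cols]))
    return sorted(scored, key=lambda t: t[0])


# Hypothetical toy data: y depends on x1 and x2 only; x3 is an irrelevant predictor.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
x3 = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.0, 0.3, -0.3, 0.1]
y = [1 + 2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]
X = [list(t) for t in zip(x1, x2, x3)]

ranking = best_subsets_cp(X, y, ["x1", "x2", "x3"])
for cp, p, subset in ranking:
    print(f"Cp = {cp:10.2f}  p = {p}  {subset}")
```

Because s² is estimated from the full model, the full model's Cp always equals its own p (here 4), so a smaller subset is preferred only when its Cp falls below that value. Exhaustive enumeration examines 2^k − 1 subsets for k predictors, so a brute-force sketch like this one is practical only for a handful of predictors.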
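The arithmetic in Note 4 can be checked with a few lines of Python. This sketch covers only the summation step and assumes the per-level coefficients are already known; it does not fit the dummy-variable regression itself (see Hardy, 1993). The function names level_differences and cumulative_difference are hypothetical.

```python
def level_differences(coefs):
    """Difference coefficients between adjacent levels: coefs[i] - coefs[i + 1]."""
    return [a - b for a, b in zip(coefs, coefs[1:])]


def cumulative_difference(diffs, start, end):
    """Difference between levels `start` and `end` (0-based indices, start < end),
    recovered by summing the adjacent difference coefficients between them."""
    return sum(diffs[start:end])


# Illustrative X1 coefficients for levels 1, 2, and 3, taken from the example in Note 4.
x1_by_level = [8.0, 5.0, 3.0]
diffs = level_differences(x1_by_level)

print(diffs)                               # [3.0, 2.0]
print(cumulative_difference(diffs, 0, 2))  # 5.0
print(x1_by_level[0] - x1_by_level[2])     # 5.0, the same value computed directly
```

Summing adjacent differences telescopes to the direct difference between the first and last level, which is why the two printed totals agree.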
References
Belz, J. (2004). Learner corpus analysis and the development of foreign language
proficiency. System, 32, 577–597.
Biber, D., & Conrad, S. (2001). Introduction: Multi-dimensional analysis and the study
of register variation. In S. Conrad & D. Biber (Eds.), Variation in English:
Multi-dimensional studies (pp. 3–13). London: Longman.
Biber, D., Davies, M., Jones, J., & Tracy-Ventura, N. (2006). Spoken and written
register variation in Spanish: A multi-dimensional analysis. Corpora, 1, 1–37.
Cheng, C., Lu, H., & Giannakouros, P. (2008). The uses of Spanish copulas by
Chinese-speaking learners in a free writing task. Bilingualism: Language and
Cognition, 11, 301–317.
Collentine, J. G. (2004). The effects of learning contexts on morphosyntactic and
lexical development. Studies in Second Language Acquisition, 26, 227–248.
Collentine, J. G. (2008). The role of discursive features in SLA modeling and
grammatical frequency: A response to Cheng, Lu and Giannakouros. Bilingualism:
Language and Cognition, 11, 319–321.
Dalgaard, P. (2008). Introductory statistics with R. New York: Springer.
Fernández Leborans, M. J. (1999). La predicación: las oraciones copulativas. In I.
Bosque & V. Demonte (Eds.), Gramática descriptiva de la lengua española (pp.
2354–2460). Madrid: Espasa.
Geeslin, K. (2002). The second language acquisition of copula choice and its
relationship to language change. Studies in Second Language Acquisition, 24,
419–451.
Geeslin, K. (2003a). A comparison of copula choice in advanced and native Spanish.
Language Learning, 53, 703–764.
Geeslin, K. (2003b). The role of adjectival features in the second language acquisition
of copula choice. In P. Kempchinsky & C. Piñeros (Eds.), Theory, practice and
acquisition: Papers from the 6th Hispanic Linguistics Symposium and the
5th Conference on the Acquisition of Spanish and Portuguese (pp. 332–351).
Medford, MA: Cascadilla Press.
Geeslin, K. (2005). Crossing disciplinary boundaries to improve the analysis of second
language data: A study of copula choice with adjectives in Spanish. Munich:
LINCOM Europa Publishers.
Geeslin, K., & Guijarro-Fuentes, P. (2006). Second language acquisition of variable
structures in Spanish and Portuguese speakers. Language Learning, 56, 53–107.
Granger, S., Hung, J., & Petch-Tyson, S. (Eds.). (2002). Computer learner corpora,
second language acquisition and foreign language teaching. Amsterdam:
Benjamins.
Gunterman, G. (1992). An analysis of interlanguage development over time: Part II,
ser and estar. Hispania, 75, 1294–1303.
Halliday, M. A. K. (1970). Language structure and language function. In J. Lyons
(Ed.), New horizons in linguistics (pp. 140–165). Harmondsworth, UK: Penguin
Books.
Hardy, M. A. (1993). Regression with dummy variables. Sage University Papers, QASS
# 07-093. Newbury Park, CA: Sage.
Klein, W., & Perdue, C. (1997). The basic variety (Or: Couldn't natural languages be
much simpler?). Second Language Research, 13, 301–347.
Lafford, B. A. (2004). The effect of the context of learning on the use of
communication strategies by learners of Spanish as a second language. Studies in
Second Language Acquisition, 26, 201–225.
Leonetti, M. (1994). Ser y estar: estado de la cuestión. Barataria, 1, 182–205.
Luján, M. (1981). The Spanish copulas as aspectual indicators. Lingua, 54, 165–210.
Miller, A. (2002). Subset selection in regression. Boca Raton, FL: Chapman &
Hall/CRC.
Myles, F. (2005). Interlanguage corpora and second language acquisition research.
Second Language Research, 21, 373–391.
Myles, F., & Mitchell, R. (2004). Using information technology to support empirical
SLA research. Journal of Applied Linguistics, 1, 169–196.
Rencher, A. (2002). Methods of multivariate analysis. New York: Wiley-Interscience.
Rutherford, W., & Thomas, M. (2001). The Child Language Data Exchange System in
research on second language acquisition. Second Language Research, 17, 195–212.
Ryan, J., & Lafford, B. (1992). The acquisition of lexical meaning in a study abroad
environment: Ser + estar and the Granada experience. Hispania, 75, 714–722.
Shehadeh, A. (2002). Comprehensible output, from occurrence to acquisition: An
agenda for acquisitional research. Language Learning, 52, 597–649.
Silva-Corvalán, C. (1986). Bilingualism and language change: The extension of estar
in Los Angeles Spanish. Language, 62, 587–608.
Silva-Corvalán, C. (1994). Language contact and change: Spanish in Los Angeles.
Oxford: Clarendon Press.
Siyanova, A., & Schmitt, N. (2007). Native and nonnative use of multi-word versus
one-word verbs. IRAL, 45, 119–139.
Swain, M. (1985). Communicative competence: Some roles of comprehensible input
and comprehensible output in its development. In S. Gass & C. Madden (Eds.),
Input in second language acquisition (pp. 235–253). Rowley, MA: Newbury House.
VanPatten, B. (1985). The acquisition of ser and estar in adult second language
learners: A preliminary investigation of transitional stages of competence.
Hispania, 68, 399–406.
VanPatten, B. (1987). The acquisition of ser and estar: Accounting for developmental
patterns. In B. VanPatten, T. Dvorak, & J. Lee (Eds.), Foreign language learning: A
research perspective (pp. 61–75). New York: Newbury House.