
Computer Simulation Experiments to Assess the Performance of Measures of Quantity of Pottery
Author(s): Clive Orton
Source: World Archaeology, Vol. 14, No. 1, Quantitative Methods (Jun., 1982), pp. 1-20
Published by: Taylor & Francis, Ltd.
Stable URL: http://www.jstor.org/stable/124371

Computer simulation experiments to assess the performance of measures of quantity of pottery


Clive Orton

Introduction

The need for the quantification of groups of pottery has been stressed in recent years (e.g. Young 1980: 5). Many modern methods of study require assemblages to be described in terms of the proportions of different 'types' present - 'types' meaning source, form or functional types, depending on the use to which the information is to be put. For example, seriation studies require the proportions of various forms (e.g. Millett 1979), studies of distribution require the proportions of pottery from different sources (Hodder and Orton 1976: 104-19, 164-7) and studies of activities carried out at different parts of a site require the proportions of functional types. Studies of other aspects of pottery may also require a proportional breakdown of pottery assemblages (e.g. Redman 1979). In each of these cases, the proportions are used as a way of comparing two or more assemblages: the breakdown of just one group seems to be of little value.

Measures of quantity

Various ways of quantifying pottery have been used or advocated. They are all measures of the quantity of pottery, and can be divided into two groups: those which seek to measure the number of vessels (considering all vessels to be equal in a quantitative sense) and those which seek to measure the amount of pottery as such (counting large or heavy vessels as more pottery than small or light ones). The point of using the first approach is often seen as the estimation of the numbers of vessels of each type represented. Various estimates have been proposed, e.g.
(a) minimum number of vessels (Vince 1977: 63): vessels are reconstructed as far as possible. Fragments which do not join but which could feasibly come from the same vessel are counted as part of the same vessel. Variants of this estimate may use rim or base sherds, handles, other distinctive features, or some combination of these.
(b) maximum number of vessels: vessels are reconstructed as far as possible. Any fragment which does not recognisably belong to the same vessel as another fragment is counted as a separate vessel.
(c) hybrid estimates: since (a) will in general underestimate the number of vessels represented, and (b) generally overstates it, various hybrids have been suggested to meet the case when (a) and (b) are not equal.

© R.K.P. 1982 0043-8243/82/1401-001


(d) vessel-equivalents (Bloice 1971; Egloff 1973; Orton 1975): in principle, each sherd is expressed as a fraction of a complete vessel, and these fractions are summed. In practice this is rarely possible, and instead estimated vessel-equivalents (eves) are calculated, based on rim and/or base sherds, handles or some other distinctive feature.
(e) sherd count: sometimes based on all sherds, sometimes only on rim sherds.
(f) weight: sometimes used for the sake of speed and simplicity, and sometimes advocated as a way of giving more emphasis to larger vessels (Hulthen 1974).
Other related measures, e.g. surface area (Glover 1972: 92-6; Hulthen 1974) or displacement volume (Hinton 1977: 231), have also been suggested.

Previous work

Several attempts have been made to assess the relative merits of these methods. Solheim (1960) suggested that sherd count and weight together give more information than either separately. Hinton (1977) compared total sherd count, rim sherd count, weight and volume, concluding that sherd count was probably the most accurate, but weight was the fastest, while rim sherd count seemed unreliable and the measurement of volume was messy. However, his conclusions were arrived at by comparing the outcomes of the experiment with his expectation of what the figures should show. Glover (1972: 93-6) and Millett (1980b) both demonstrated significant correlations between different measures. Glover concluded that 'any one would be quite adequate as a measure of pottery frequency' (1972: 96) and Millett favoured sherd weight for mainly practical reasons. These assessments share the weakness that the measures are compared either with each other or with the author's expectations, and not with any objective standard.

Criteria

The assessment presented here is based on two assumptions:
(i) that the pottery as found is of interest not intrinsically, but as representing past activities. This point has been expressed in various ways - as the difference between 'sample' and 'population' (Cherry, Gamble and Shennan 1978: 1-8), or between 'archaeological' and 'systemic' context (Schiffer 1972), and perhaps most forcibly by Binford (1981). A measure of quantity of pottery should therefore take into account the processes that turn a 'population' into a 'sample', or at least should behave consistently under a variety of processes.
(ii) that the value of a measure of quantity should be seen in terms (a) of its behaviour as an estimator of the proportions of different types of pottery in an assemblage, expressed by its bias and variability (either as standard deviation or combined into the mean squared error), or more generally by the frequency distribution of the estimate that it produces, and (b) of its behaviour in comparing such proportions from different assemblages, which may have been created under a wide variety of circumstances.
It was decided to examine four of the above measures - sherd count (e), weight (f), vessels represented (assuming that this can be accurately counted, since (a) and (b) are both dependent on the skill of the individual worker, and thus virtually impossible to model), and eves (d), based on rim sherds for the sake of simplicity, although the argument could be modified to take account of other estimators.
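The four measures chosen for examination can be illustrated with a minimal sketch. The sherd records, field layout and function names below are invented for illustration and are not taken from the programs described in this paper; eves are computed from rim percentages only, as in the text.

```python
from collections import defaultdict

# Hypothetical sherd records: (vessel_id, vessel_type, weight_g, rim_percent).
# rim_percent is 0 for body sherds; all values are invented for illustration.
SHERDS = [
    ("v1", "A", 40.0, 25.0),
    ("v1", "A", 35.0, 30.0),
    ("v2", "A", 20.0, 0.0),
    ("v3", "B", 80.0, 50.0),
    ("v3", "B", 60.0, 0.0),
    ("v4", "B", 90.0, 10.0),
]


def measures_by_type(sherds):
    """Return {type: {measure: quantity}} for the four measures examined."""
    totals = defaultdict(lambda: {"sherd_count": 0, "weight": 0.0,
                                  "vessels_represented": set(), "eves": 0.0})
    for vessel_id, vessel_type, weight, rim_percent in sherds:
        t = totals[vessel_type]
        t["sherd_count"] += 1
        t["weight"] += weight
        t["vessels_represented"].add(vessel_id)
        t["eves"] += rim_percent / 100.0  # rim fraction of a whole vessel
    for t in totals.values():
        # Replace the set of vessel ids by the number of distinct vessels.
        t["vessels_represented"] = len(t["vessels_represented"])
    return dict(totals)


def proportion(measures, vessel_type, measure):
    """Proportion of one type according to one measure."""
    total = sum(m[measure] for m in measures.values())
    return measures[vessel_type][measure] / total


MEASURES = measures_by_type(SHERDS)
for name in ("sherd_count", "weight", "vessels_represented", "eves"):
    print(name, round(proportion(MEASURES, "A", name), 3))
```

Even on this toy assemblage the four measures give different proportions for type A, which is the situation the assessment is designed to examine.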

Structure of present work

An initial assessment, the results of which will be discussed below, examined just the algebraic properties of these four measures (Orton 1975). It looked only at the question of bias, and not that of variability, and even that under very restrictive conditions, the main ones being that all vessels of the same type would break into the same number of rim sherds, and that excavation could be seen as random sampling from a hypothetical 'target population'. The results nevertheless gave interesting indications as to how the measures could be expected to behave, but no firm conclusions could be drawn.

To look at the question more thoroughly, computer simulation is needed because the statistical distributions involved are too complex to handle algebraically, and there is apparently no general statistical theory of broken objects which could be employed. Work on the problem recommenced in 1980, when computing facilities became available as part of an SRC research fellowship at the Institute of Archaeology, University of London. The project, which is still (December 1981) in progress, can be divided into four stages:
(i) modelling: the construction of stochastic models of the processes that transform whole vessels in use into the sherds found in an archaeological context,
(ii) simulation: imitation of these processes under a wide range of circumstances,
(iii) verification: matching the outcomes of carefully designed simulations against chosen excavated assemblages,
(iv) assessment: the analysis of the outcomes using standard archaeological methods (listed above, pp. 1-2). Since we know the 'populations' of vessels, we can see which methods give the 'best' inferences about them from the excavated 'samples'.

In this work, the modelling and simulation are not seen as ends in themselves, but simply as a way of providing controlled data which can be used to assess the performance of the various methods of analysis. They are therefore kept as simple as possible, consistent with the need to provide reasonably realistic data. The temptation to become deeply involved in the minutiae of how and where a pot breaks, for example, has been avoided.
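Because the simulated 'population' is known, the bias, standard deviation and mean squared error of any estimate can be computed directly from repeated runs. The sketch below shows one way this might be done; the function names are hypothetical and the toy binomial sampler merely stands in for a real simulation.

```python
import random
import statistics


def summarise_estimator(estimates, true_value):
    """Bias, standard deviation and mean squared error of repeated estimates."""
    mean_est = statistics.fmean(estimates)
    bias = mean_est - true_value
    sd = statistics.pstdev(estimates)  # population SD of the simulated runs
    mse = statistics.fmean((e - true_value) ** 2 for e in estimates)
    return {"bias": bias, "sd": sd, "mse": mse}


def simulate_estimates(true_p, n_vessels, runs, rng):
    """Toy stand-in for a simulation: sample proportion of type A vessels."""
    out = []
    for _ in range(runs):
        hits = sum(rng.random() < true_p for _ in range(n_vessels))
        out.append(hits / n_vessels)
    return out


rng = random.Random(1982)
estimates = simulate_estimates(true_p=0.3, n_vessels=30, runs=20, rng=rng)
summary = summarise_estimator(estimates, true_value=0.3)
print(summary)
```

With the population form of the standard deviation, the mean squared error equals the squared bias plus the variance, which is why the two can be 'combined' into a single figure.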

Modelling

The destructive processes are seen as one or more discrete 'events', each of which may cause breakage to a vessel or some part of it. The first event is generally either breakage in use or primary disposal of a complete (but perhaps foul) vessel. Subsequent events may include secondary disposal and disturbance by later activity on the site. Excavation is modelled as a sampling process: the pottery contained in a context is a sample of all the pottery derived from the parent population, and if only part of a context can be excavated then sub-sampling occurs. At this stage, no explicit account is taken of the problem of either (i) the selected re-use of fragments of a broken vessel (e.g. as counters) or (ii) the possibility of breakage into fragments too small to be retrieved or recorded. However, a method of analysis should be able to detect either of these if it occurs on a significant scale.

Breakage

The basic idea was inspired by Kirby and Kirby (1976), who experimented with trampling on sherds of various sizes and recorded the sizes of the sherds produced. They presented their
results as a matrix showing the numbers of sherds of certain sizes produced by subjecting 100 sherds of a given size to one trampling event (see Table 1). Matrices to show the effect of any number of events can be generated by matrix multiplication - there is no need to repeat the experiment.

Table 1 Matrix to show the effect of one treading event on groups of 100 sherds of various sizes (from Kirby and Kirby 1976: 237)

Size class before event (cm): 16-32 8-16 4-8 2-4 1.4-2 1-1.4 0.5-1
Size class after event (cm):
8-16 0 11 -
4-8 230 252 29 -
2-4 400 450 243 38 -
1.4-2 --- 165 108 143 82 -
1-1.4 500 ------> 100 35 50 29 95 -
1 30 45 23 19 17 13 100
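The remark that several events can be handled by matrix multiplication can be sketched as follows. The 3 x 3 matrix is invented for illustration (it is not Kirby and Kirby's table) and is expressed per sherd rather than per 100 sherds; the function names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def after_events(one_event, events):
    """Expected sherds per starting sherd after a number of events."""
    n = len(one_event)
    # Start from the identity matrix: zero events leaves every sherd as it is.
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(events):
        result = mat_mul(result, one_event)
    return result


# Hypothetical expected-number matrix for three size classes (large, medium,
# small); row i gives sherds produced per sherd of class i in one event.
# These numbers are invented and are not taken from Kirby and Kirby's table.
ONE_EVENT = [
    [0.5, 1.0, 2.0],
    [0.0, 0.6, 1.5],
    [0.0, 0.0, 1.0],
]

TWO_EVENTS = after_events(ONE_EVENT, 2)
for row in TWO_EVENTS:
    print([round(x, 2) for x in row])
```

The result is deterministic: the same starting sherds always give the same expected counts, which is the limitation the stochastic version is meant to remove.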

This approach is of interest because it provides a way of generating sherds from vessels via a number of events, but has the disadvantage of being deterministic. That is, given a number of vessels of a certain type, subjected to the same number of events, it will always yield the same number of sherds, with the same size distribution. What is needed is a stochastic version, which will yield the same results on average, but will vary from one occasion to another in a way that might be expected in real life. We can build up a suitable process as follows:
(i) assume events occur in discrete time, at t = 0, 1, 2 ... (intervals between these may vary widely),
(ii) assume the size of each sherd can be measured in terms of a size variable, e.g. maximum diameter, weight, or (for rim sherds) percentage of complete rim. This variable will be called 'weight', using the word in a statistical sense. Denote it by w.
(iii) assume that the values of this variable can be divided into ranges, called 'states' (e.g. 1-2 cm, 5-10 per cent). Each sherd belongs to exactly one state. There are m states in all, with weights $w_1, w_2, \ldots, w_m$.
(iv) suppose that a sherd in state i undergoes an event at time t, generating L sherds in states $i_1, \ldots, i_L$. Then, since the total weight of pottery is unchanged,

$$\sum_{l=1}^{L} w_{i_l} = w_i .$$

Denote by $p_{ij}$ the probability that a sherd generated by this event is in state j, i.e.

$$p_{ij} = \mathrm{pr}(i_l = j), \quad 1 \le l \le L .$$

Arranging the states 1, ..., m so that $w_1 > w_2 > \ldots > w_m > 0$, then $p_{ij} = 0$ if j < i, and the probabilities can be written as a matrix

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ 0 & p_{22} & \cdots & p_{2m} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & p_{mm} \end{pmatrix}$$

The average size of a sherd generated from one of size $w_i$ is

$$\bar{w}_{i\cdot} = \sum_{j=1}^{m} p_{ij} w_j = (Pw)_i ,$$

and the average number of sherds generated is

$$\bar{n}_{i\cdot} = w_i / \bar{w}_{i\cdot} ,$$

so that the expected number generated in the kth state is

$$n_{ik} = p_{ik}\,\bar{n}_{i\cdot} = p_{ik} w_i \Big/ \sum_{j=1}^{m} p_{ij} w_j ,$$

which is directly related to the matrix shown by Kirby and Kirby. The matrix P and vector w together define a stochastic process which can be used to model the effects of successive events. It bears a superficial resemblance to a Markov Chain (Cox and Miller 1965: 76), but the $p_{ij}$ are not generally transition probabilities. However, if all the $w_i$ are set equal, the process becomes a Markov Chain. It can therefore be seen as a generalisation of this better-known stochastic process, and is here called a Kirby process, defined by a probability matrix P and a weighting vector w. It might be possible to generalise the process by relaxing the constraint that the weight of pottery is unchanged by breakage to one that the weight is not increased, i.e.

$$\sum_{l=1}^{L} w_{i_l} \le w_i ,$$

so taking into account factors of attrition and loss which often affect the archaeological record.
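The expectations that follow from a probability matrix P and a weighting vector w can be checked numerically. The sketch below uses an invented three-state process (the values of P and w are assumptions for illustration only) and hypothetical function names; it computes the mean generated size $(Pw)_i$, the mean number of sherds, and the expected number in each state.

```python
def mean_sherd_size(P, w, i):
    """(Pw)_i: average size of a sherd generated from a sherd in state i."""
    return sum(p * wj for p, wj in zip(P[i], w))


def mean_number_of_sherds(P, w, i):
    """Average number generated: w_i divided by the mean generated size."""
    return w[i] / mean_sherd_size(P, w, i)


def expected_counts(P, w, i):
    """n_ik = p_ik * (mean number of sherds) for every state k."""
    n_bar = mean_number_of_sherds(P, w, i)
    return [p * n_bar for p in P[i]]


# Hypothetical three-state process; states are ordered so that
# w1 > w2 > w3 > 0 and P is upper triangular. Values are invented.
w = [100.0, 40.0, 10.0]
P = [
    [0.0, 0.5, 0.5],
    [0.0, 0.2, 0.8],
    [0.0, 0.0, 1.0],
]

counts = expected_counts(P, w, 0)
print(counts)
# Expected weight generated; it should equal w[0] when weight is conserved.
print(sum(c * wk for c, wk in zip(counts, w)))
```

The final line is a check on the conservation constraint: the expected counts, weighted by state, add up to the weight of the parent sherd.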


Retrieval

The retrieval of archaeological material is frequently treated implicitly as an example of random sampling. Unease about this assumption has been transferred to the idea of population, giving rise to the idea of a 'target' population (Cherry et al., 1978: 5), which can be loosely if unkindly defined as the population that we need in order that our sample may be considered random. Otherwise there seems no logical connection between population and sample, and quantification falls flat on its face. The undefined notion of 'representative sample' does not seem to help.

So there is a need for a more general form of sampling, of which random sampling would be one example among many, and within which framework archaeological samples could be tested for their randomness. In order to generalise, it is useful to look in detail at random sampling. A simple random sample can be defined as a set of observations drawn from a population in such a way that every possible observation has an equal chance of being drawn at every trial (Davies and Goldsmith 1972: 28); for more complicated schemes (e.g. stratified sampling, sampling with probability proportional to size), the probabilities may vary but they are known or at least calculable. This definition includes an implicit assumption of independence: the chance of drawing an observation at a trial is not affected by the details of which observations have been drawn in previous trials. This second part of the definition seems the most questionable in archaeological situations: for example, one might feel that finding one sherd from a certain vessel increases one's chances of finding another sherd from that vessel as against a sherd from a different vessel. By relaxing this condition we can create a new family of sampling techniques, defined by the procedure:
1 specify initial probabilities and number of selections to be made,
2 select next observation (i.e. sherd),
3 change relative probabilities of selecting remaining observations according to some rule, and reduce number of selections to be made by one,
4 go back to 2.
Such a procedure is called 'recursive' and these techniques are therefore here called recursive sampling. Many formulations are possible, according to the rule chosen to change the probabilities after each selection. So far, only one rule has been used even experimentally: the selection of any number of sherds from a particular vessel increases the probability of selecting a remaining sherd from the same vessel by a factor k, as against sherds from other vessels. By varying the factor k we can imitate a range of archaeological situations, e.g.
k < 0: rarely is more than one sherd per vessel found ('dispersed'),
k = 0: random sampling,
k > 0: several sherds from the same vessel are commonly found together,
k >> 0: mainly complete (but broken) vessels found.
Thus the randomness (in this respect) of a sample can be tested by attempting to model the observed distribution of fragment sizes (the word fragment is used to mean a number of joining sherds) by distributions generated under various values of k, and testing k = 0.

Trying to detect whether a sample is random from the evidence of the sample itself sounds like the proverbial 'pulling oneself up by one's bootstraps'. But one can detect non-randomness by examining the distribution of sizes of recovered fragments. Paradoxically, this approach works only because pottery breaks readily, and would not work on unbroken material.
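The four-step procedure can be sketched in a few lines. The text does not state exactly how the factor k enters the probabilities, so the weighting used here (relative weight 1 + k for remaining sherds of a vessel already selected, 1 otherwise, which makes k = 0 simple random sampling) is an assumption, as are the function and variable names.

```python
import random


def recursive_sample(sherd_vessels, n_select, k, rng):
    """Select sherd indices one at a time, re-weighting after each draw.

    sherd_vessels[i] is the vessel that sherd i belongs to.
    ASSUMPTION: a remaining sherd whose vessel has already been hit gets
    relative weight 1 + k and all others weight 1, so k = 0 is simple
    random sampling without replacement. The source does not give the rule
    in this detail.
    """
    if 1 + k <= 0:
        raise ValueError("k must be greater than -1 under this weighting")
    remaining = list(range(len(sherd_vessels)))  # step 1
    hit_vessels = set()
    chosen = []
    for _ in range(min(n_select, len(remaining))):
        weights = [1 + k if sherd_vessels[i] in hit_vessels else 1
                   for i in remaining]
        pick = rng.choices(range(len(remaining)), weights=weights)[0]  # step 2
        sherd = remaining.pop(pick)
        hit_vessels.add(sherd_vessels[sherd])  # step 3: changes later weights
        chosen.append(sherd)
    return chosen  # step 4 is the loop itself


# Hypothetical assemblage: 10 vessels broken into 6 sherds each.
vessels = [v for v in range(10) for _ in range(6)]
rng = random.Random(0)
for k in (0, 15):
    sample = recursive_sample(vessels, 12, k, rng)
    print(k, len({vessels[i] for i in sample}), "vessels represented")
```

Under this assumed rule, larger values of k tend to concentrate the sample in fewer vessels, which corresponds to the situation where several sherds from the same vessel are found together.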

Simulation

The next step is to write computer programs which will imitate these models of Kirby processes and recursive sampling.

Kirby process

Given the matrix P and the vector w, it is apparently simple to write a routine to simulate the effect of a breakage event. For example, if we start with a whole rim, with $w_1$ = 100 per cent, and a 'top row' of probabilities $p_{11}, p_{12}, \ldots, p_{1m}$, we can generate a random number in [1,100] which will have probability $p_{1i}$ of being in the ith state, to be the size of the first sherd, say $s_{11}$. In the same way we can generate the sizes of the 2nd, 3rd, etc. sherds. But at some point the sum of the generated sizes will exceed 100 (it is very unlikely that they will sum to exactly 100), i.e. for some number r

$$\sum_{i=1}^{r} s_{1i} > 100 .$$

Alternatively, we can see this problem as having to subtract a sherd of size $s_{1r}$ from a fragment of size $100 - \sum_{i=1}^{r-1} s_{1i}$, which is smaller than $s_{1r}$ (see fig. 1). This last fragment we call the 'left-over', and the problem of coping with it is the 'left-over problem'. If we treat it as if it were the last sherd to be generated, it will on average be smaller than required, and the number of sherds generated will be on average greater than specified by the matrix P. On the other hand, if we merge it with the previous sherd, we obtain a sherd which is on average too large, and the number of sherds generated will on average be less than specified. This difficulty is overcome by use of a stopping rule, which sometimes merges the left-over with the previous sherd, and sometimes makes it a new last sherd. The simple rule of merging if the left-over is smaller than the previous sherd seems to work satisfactorily. More refined rules may be developed as work proceeds. The program KIRBY can simulate the breakage of complete rims into an average of six sherds each at a rate of about 400/second.

Recursive sampling

The program WREXHAM (a pun on 'rec sam' = recursive sampling) which simulates this is very simple, consisting of the stages set out on p. 6 above, using a random number generator to perform the selection. There is more difficulty in choosing values of the parameters which describe the circumstances of the sampling, which are:
(a) initial size of assemblage,
(b) proportion of 'type A' vessels in assemblage (we use the convention that all vessels of interest for the time being are called 'type A' and the rest 'type B'),
(c) sampling fraction,

Figure 1 Illustration of the 'left-over' problem. Sherds of sizes 14%, 27%, 24% and 24% have been generated, leaving 11%, but the required size of the next is 16%.

(d) numbers of sherds per rim of different types of vessels,
(e) relative weights of different types of vessels,
(f) k (p. 6).
Each parameter can take many values and in real life almost any combination of them could occur. Since we wish to assess the different measures across a wide spectrum of archaeological situations we need to perform simulations under many combinations of the parameters. The values chosen for initial investigation are as follows:
(a) three sizes of assemblage: 10, 30 and 100 vessels. Assemblages larger than 100 vessels take up much computer time, and it should be possible to extrapolate from 100 vessels to larger groups.
(b) five proportions of type A: 1 per cent, 3 per cent, 10 per cent, 30 per cent and 50 per cent. There is no point in simulating for higher proportions since this can be done by exchanging type A and type B. The parameters (a) and (b) are combined into twelve codes.
(c) four sampling fractions: 1 per cent, 3 per cent, 10 per cent, 30 per cent. A quick look at some excavated groups suggested that sampling fractions of more than 30 per cent are rare (see p. 10 below). The parameters (a)-(c) together give a range of excavated assemblage size from 0.1 to 30 eves.
(d) the parameter of number of sherds per rim is treated at three levels of complexity:
(i) all rims of one type have the same number of sherds. This assumption is used to provide a direct comparison with the algebraic results (Orton 1975, and p. 14 below).


(ii) the number of sherds per rim varies, but all sherds from one rim are the same size,
(iii) the numbers and sizes of rim sherds vary, being generated by a suitable Kirby process.
Initially, simulation was carried out at level (i). More recently, some level (ii) simulations have been done, while level (iii) is still in the future, when the programs KIRBY and WREXHAM are linked. Four combinations of average numbers of sherds per rim of 5 and 10 are used.
(e) fortunately, it is not necessary to vary the weights of vessels within the simulations. Different relative weights can be applied to the outcome of any simulation when the results are analysed.
(f) the first simulations were carried out with k = 0 (i.e. random sampling). Work with k = 1 is now in progress and other values will also be used.
The number of combinations of parameters (a)-(e) is 12 × 4 × 3 × 4 = 576. Twenty simulations are carried out for each combination of the parameters.
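The size of the experimental design can be checked by enumerating the combinations. In the sketch below only the numbers of levels (12, 4, 3, 4) and the twenty repetitions come from the text; the labels are placeholders, and since the text does not say which parameter supplies the three-level factor it is left unnamed here.

```python
from itertools import product

# Placeholder labels; only the numbers of levels are taken from the text.
size_proportion_codes = [f"code{i}" for i in range(1, 13)]  # (a) and (b)
sampling_fractions = [0.01, 0.03, 0.10, 0.30]               # (c)
three_level_factor = ["level1", "level2", "level3"]         # not named in text
sherds_per_rim = [(5, 5), (5, 10), (10, 5), (10, 10)]       # assumed pairs
RUNS_PER_COMBINATION = 20

grid = list(product(size_proportion_codes, sampling_fractions,
                    three_level_factor, sherds_per_rim))
total_runs = len(grid) * RUNS_PER_COMBINATION
print(len(grid), total_runs)
```

Enumerating the grid in this way also gives a convenient loop over which a simulation routine could be called once per combination and repetition.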

Verification

This section asks, 'Do Kirby processes and recursive sampling generate groups of pottery that look like real groups?' To answer this, we need to look in detail at some large groups of excavated pottery which have been recorded in a way which can yield the required statistical information. The site of Aldgate, excavated by the Museum of London, Department of Urban Archaeology, in 1974 (Thompson 1975) meets these needs. The pottery, which dates mainly to the seventeenth and eighteenth centuries, has been catalogued and written up by the author and Jacqui Pearce (Orton and Pearce, in Thompson, forthcoming). Three of the largest groups are chosen as representing a variety of circumstances of deposition and disturbance:

Context 1103: a cesspit, containing a relatively small group (38 vessels represented = 23 eves) of mostly complete but broken pottery. It appears to have undergone only one breakage event and to have a high sampling fraction. The value of k might be expected to be high.

Context 1156: the packing round a well, containing a larger group (166 vessels represented; 35 eves) of more broken pottery. It appeared to have undergone more than one breakage event, and to have a medium sampling fraction. The value of k might be expected to be low.

Contexts 1241, 1262 and 1293 (referred to as 1241): fill of a large cellar, much removed for a later foundation, with large group (223 vessels represented; 36 eves) of generally rather broken and incomplete pottery. The number of breakage events and the sampling fraction both appear to be fairly low. The value of k might be high, if the amount of mixing taking place before the main disturbance (the later foundation) were small.

Four main fabric groups were present in these contexts, coded:
B = 'border ware' (from north Hampshire/west Surrey)
C = 'coarse red ware' (from London)
D = 'delftware' (from London)
F = 'fine red ware' (from Essex).
B, C and F are earthenwares, while D is a tin-glazed ware and, having been fired twice, is far more friable than the other fabrics. Differences in the matrix P for the earthenwares and the delftware are therefore to be expected, and are confirmed by a chi-squared test. As no significant differences between B, C and F are found, these fabric groups are treated as one. Significant
differences are found between different forms, but appear to reflect the production of different forms in different fabrics. Similarly, observed differences between vessels of different diameters appear to relate to the different forms produced to these diameters, and so to the fabrics again. A major factor in the observed differences is the plates, which (i) are mostly of delftware, (ii) have large rim diameters and (iii) break into large numbers of rim sherds. The rather complicated relationships can be reflected adequately by assuming one Kirby process for all the earthenware and another for the delftware.

A Kirby process can be verified only for assemblages with a large number of reasonably complete vessels - in this case, only for Context 1103. Since for this context we expect k = 1, we can verify only the 'top row' of the matrix P. The agreement between the simulated and observed rim sherds is very good, which is to be expected as the parameters (the $p_{ij}$) are estimated directly from the data. Indeed, over-fitting could be more of a problem than lack of fit. The basic assumption, that the same matrix can be applied to model successive breakage events, remains untested (see Table 2).

Table 2 Example of probability matrix P, based on Context 1103

Size class after event (% of rim)    Size class before event (% of rim): 100
51-100                               -
41-50                                0.02
31-40                                0.03
21-30                                0.21
11-20                                0.47
6-10                                 0.24
1-5                                  0.03
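Taking the top row in Table 2 as the probabilities for a whole rim, one breakage event with the simple merging rule (merge the left-over with the previous sherd if it is smaller than that sherd) might be sketched as below. This is not the KIRBY program: the function names are hypothetical, and drawing whole-number sizes uniformly within a size class is an assumption made for illustration.

```python
import random

# Top row of P from Table 2: (low %, high %, probability) per size class.
TOP_ROW = [(41, 50, 0.02), (31, 40, 0.03), (21, 30, 0.21),
           (11, 20, 0.47), (6, 10, 0.24), (1, 5, 0.03)]


def draw_size(rng):
    """Pick a size class by its probability, then a size within it.

    ASSUMPTION: sizes are whole percentages, uniform within a class.
    """
    weights = [p for _, _, p in TOP_ROW]
    low, high, _ = rng.choices(TOP_ROW, weights=weights)[0]
    return rng.randint(low, high)


def break_rim(rng, whole=100):
    """Break one complete rim into sherd sizes that sum to `whole`."""
    sherds = []
    remaining = whole
    while remaining > 0:
        size = draw_size(rng)
        if size <= remaining:
            sherds.append(size)
            remaining -= size
        else:
            # Left-over problem: the drawn sherd is larger than what is left.
            if sherds and remaining < sherds[-1]:
                sherds[-1] += remaining   # merge with the previous sherd
            else:
                sherds.append(remaining)  # keep as a new last sherd
            remaining = 0
    return sherds


rng = random.Random(1103)
rims = [break_rim(rng) for _ in range(1000)]
print(sum(len(r) for r in rims) / len(rims), "sherds per rim on average")
```

Whichever branch of the stopping rule is taken, the sherd sizes of each rim add up to the whole rim, so the constraint that breakage leaves the total unchanged is respected.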

Recursive samples can be fitted by varying (i) the number of vessels of each type (called $n_A$, $n_B$), (ii) the sampling fraction (called f) and (iii) k, taking as given the numbers of sherds per vessel ($s_A$ and $s_B$) suggested by the data. The strategy used is to start with k = 0 (random sampling) and to obtain 'best fit' values of $n_A$, $n_B$ and f. If even the best fit does not seem adequate (using the Kolmogorov-Smirnov goodness-of-fit test, see e.g. Lindgren 1962: 300-4), k is increased and the procedure repeated. Fitting is attempted at the first two levels of complexity (see pp. 8-9), as a link with the KIRBY program is not yet effected. If level (i) is used, one tries to fit the distribution of the percentages of each rim present, while at level (ii) it is usually best to fit the distribution of the number of sherds present per rim. This is because the total numbers of sherds per rim commonly observed, between 5 and 10, all give percentages falling in the 10-20 per cent range, and if few sherds per rim are present (as in Contexts 1156 and 1241) there is a 'hump' in the distribution at this interval.

For Context 1103, the distribution of the percentage of each rim present is bimodal (see fig. 2 and Table 3). Seventeen rims are complete or almost so, while twenty-one other rims contribute just over six eves. The 'best' estimates of $n_A = 33$, f = 0.64 for k = 0 (there is too little delftware to be tested statistically) give an appallingly bad fit - at best the Kolmogorov-Smirnov statistic $D_{max}$ has a value of 0.39. The fit can be improved by increasing k, decreasing f and increasing $n_A$, but even the 'best' combination (k = 15, $n_A = 60$, f = 0.40) has an unacceptably bad fit (see fig. 2). Increasing the value of k clearly increases the variance of the distribution, but it seems unlikely that any increase could produce a consistently bimodal distribution. The 'upper' part of the distribution (more than 50 per cent of rim present) has on

Figure 2 Distribution of percentages of each rim present in Context 1103. Histogram shows actual data; circles and continuous lines, the best-fit frequency curve for k = 0; crosses and broken lines, the best-fit frequency curve for any k.

Table 3 Distribution of percentage of each vessel present in Context 1103 (excluding delftware), with best fit distributions for k = 0 and any value of k

Size class (% of rim): 1-16, 17-33, 34-50, 51-66, 67-83, 84-100
Actual no. of vessels: 7, 2, 6, 2, 16
Fitted no. of vessels, k = 0: 2, 2, 9, 11, 7, 2
Fitted no. of vessels, any k: 3, 1, 4, 7, 4, 14

average 6.5 sherds per rim, while the 'lower' part has on average 10, suggesting that there are two samples, one having undergone one breakage event and one two or possibly more. Using this model, the 'best' values of the parameters are $n_A = 24$, f = 0.87 and $n_A = 14$, f = 0.08 respectively, fitting the data very well ($D_{max} = 0.08$) both for k = 0 and k = 1. Thus the pottery from Context 1103 can be seen as two random samples: one almost entire population of presumably primary deposition and one small sample of probably secondary material.

The distribution of percentages of rims present in Context 1156 most closely resembles the 20 per cent sample of a set of experimental random samples, so this value is used as a starting point. The 'best' fit for k = 0 is given by $n_A = 165$, $n_B = 25$, f = 0.20, $s_A = 6$, $s_B = 11$, or $n_A = 160$, $n_B = 25$, f = 0.20, $s_A = 7$, $s_B = 10$, both giving $D_{max} = 0.065$. The fit can be improved marginally by increasing k to k = 2, when $n_A = 300$, $n_B = 45$, f = 0.15, $s_A = 6$, $s_B = 11$ gives $D_{max} = 0.055$ (see fig. 3 and Table 4). However, neither of these results differs significantly (95 per cent confidence level) from the observed distribution.
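The statistic $D_{max}$ used to judge these fits is the largest difference between the observed and fitted cumulative distributions. A minimal sketch for counts grouped into the same size classes follows; the counts are invented for illustration, the function name is hypothetical, and a value computed from grouped data can only approximate one computed from the individual percentages.

```python
def ks_dmax(observed_counts, fitted_counts):
    """Largest gap between two cumulative proportions over shared classes."""
    obs_total = sum(observed_counts)
    fit_total = sum(fitted_counts)
    obs_cum = fit_cum = 0.0
    dmax = 0.0
    for obs, fit in zip(observed_counts, fitted_counts):
        obs_cum += obs / obs_total
        fit_cum += fit / fit_total
        dmax = max(dmax, abs(obs_cum - fit_cum))
    return dmax


# Hypothetical counts of vessels per size class (not taken from the tables).
observed = [40, 30, 20, 10]
fitted = [30, 30, 25, 15]
dmax = ks_dmax(observed, fitted)
print(round(dmax, 3))
```

In a fitting loop of the kind described, this value would be recomputed for each trial combination of the parameters and the combination with the smallest value retained.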

Figure 3 Key as fig. 2

Table 4 Distribution of percentage of each vessel present in Context 1156 with best fit distributions for k = 0 and any value of k

Size class (% of rim): 1-14, 15-28, 29-42, 43-57, 58-71, 72-85, 86-100
Actual no. of vessels: 79, 47, 20, 9, 4, 4, 3
Fitted no. of vessels (k = 0, any k): 72 36 20 11 1 76 58 29 12 1 2 -

The distribution of the percentages of rims present in Context 1241 most closely resembles the 10 per cent sample of the set of experimental random samples, so this value is used as a starting point. The 'best' fit for k = 0 is given by $n_A = 136$, $n_B = 100$, f = 0.15, $s_A = 9$, $s_B = 11$, giving $D_{max} = 0.080$. A better fit can be found by increasing k to k = 1, when $n_A = 180$, $n_B = 140$, f = 0.13, $s_A = 9$, $s_B = 11$ gives $D_{max} = 0.055$, but again neither result differs significantly from the observed distribution (see fig. 4 and Table 5).

The need for values of k greater than zero (i.e. for other than random sampling) has not been conclusively demonstrated from these groups. In general, both random and recursive samples fit the data adequately. The simulated samples often lack the few really high percentages observed in the excavated groups, because they operate at level (i) and to a lesser extent level (ii). A more realistic starting point (e.g. actually using a Kirby process) would generate the

Figure 4 Distribution of percentages of each rim present in Context 1241. Key as fig. 2

Table 5 Distribution of percentage of each vessel present in Context 1241 with best fit distributions for k = 0 and any value of k

Size class (% of rim)   Actual no. of vessels   Fitted, k = 0   Fitted, any k
1-10                    86                      74              76
11-20                   60                      60              69
21-30                   29                      32              38
31-40                   13                      14              13
41-50                   9                       1               5
51-60                   2                       -               1
61-70                   2                                       -
71-80                   1
81-90
91-100                  3                       -

occasional very large rim sherd and hence a large percentage, fitting the observations better and probably making higher values of k less necessary than they appear from the above simulations. However, many more groups will have to be examined before we can say that random sampling is adequate as a model for most situations.

Results

The simulated groups of pottery are examined by the program ANALYSE, which estimates the proportion of type A vessels in the original population using the four measures on pp. 1-2, assuming equal average weights of type A and type B ($w_A = w_B$). The program WTANAL recalculates the results for several simulations on different assumptions about relative weights (e.g. $w_A = 2w_B$). Three aspects of the results are presented here: (i) bias, (ii) relative variability of the measures and (iii) variability of estimates at different combinations of the parameters.

Bias

The algebraic work (Orton 1975) suggested that (as large-sample approximations)

(a) sherd count gives estimates biased by the proportion

$$\frac{s_A (n_A + n_B)}{s_A n_A + s_B n_B},$$

(b) weight gives estimates biased by the proportion

$$\frac{w_A (n_A + n_B)}{w_A n_A + w_B n_B},$$

(c) vessels represented gives estimates with a variable bias, depending on $f$ as well as $s_A$ and $s_B$, tending to 1 (i.e. unbiased) as $f \to 100$ per cent and tending to

$$\frac{s_A (n_A + n_B)}{s_A n_A + s_B n_B}$$

as $f \to 0$ per cent,

(d) eves gives unbiased estimates.

The outcomes of the simulations are in broad agreement with these results, although some small-sample effects are apparent when $n_A + n_B \le 30$, e.g.

(i) a small positive bias in all estimates of the proportion of type A when $s_A > s_B$ and $w_A$ [...] $w_B$,

(ii) a negative bias in estimates based on weight when $w_A > w_B$ and $s_A < s_B$ (this tends to cancel out (i) when $w_A > w_B$ and $s_A > s_B$).

They are probably not important as they occur only when variability is high. Increasing $k$ to $k = 1$ does not appear to affect bias.

On this basis, eves is the best measure, being unbiased. The bias in estimates based on sherd count or weight is predictable, and could be estimated and corrected for reasonably easily. The use of vessels represented gives rise to serious problems, because of the involvement of $f$ in the bias. It becomes very difficult to compare assemblages, since irrelevant factors (such as $f$) become confused with the factors we seek to estimate (e.g. $n_A/(n_A + n_B)$). Comparison of assemblages quantified by different measures also has its problems, since differences in proportions may be genuine or may reflect differences between the measures.

Relative variability

At the first level of complexity (p. 8), sherd count is equivalent to eves when $s_A = s_B$, and weight is equivalent to eves when $w_A = w_B$, and only a few comparisons are possible. In general there is little difference between the measures when the sampling fraction is low ($f < 3$ per cent), but vessels represented has a lower standard deviation than the others when $f = 10$ per cent and often a much lower one when $f = 30$ per cent, especially when $n_A/(n_A + n_B)$ is small ($\le 3$ per cent). Sherd count has a lower standard deviation than weight or eves when $s_A < s_B$, especially when $n_A \ll n_B$, but a greater s.d. when $s_A > s_B$. Weight has a greater standard deviation than eves when $w_A > w_B$ and a lower one when $w_A < w_B$. These last two observations suggest that as a rule of thumb the standard deviation of estimates based on weight increases with the weight of individual sherds of type A.

In general, these considerations do not seem to outweigh the conclusions based on bias, except that when the sampling fraction is high and $s_A = s_B$, vessels represented can give very good estimates.

Initial results at the second level of complexity suggest a general increase in variability over the first level (as one would expect), although this effect is often obscured by sampling variation. Sherd count seems to give a very slightly lower standard deviation than eves when $s_A = s_B$; otherwise the pattern of relationships between the measures is much as before. The increase of $k$ to $k = 1$ seems to increase variability in general, without (on current evidence) favouring any one measure in particular.
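The large-sample bias factors (a) and (b) given under Bias share one algebraic form, so a single helper covers both. The sketch below is illustrative only: the names `bias_factor` and `expected_estimate` are introduced here and are not the author's, and the parameters ($n_A + n_B = 100$, $s_A = 10$, $s_B = 5$) are chosen to match the setting of Table 7.

```python
def bias_factor(per_vessel_a, per_vessel_b, n_a, n_b):
    """Large-sample bias factor x_A (n_A + n_B) / (x_A n_A + x_B n_B).

    Pass sherds per vessel (s_A, s_B) for sherd count, or average
    weight per vessel (w_A, w_B) for weight.
    """
    return per_vessel_a * (n_a + n_b) / (per_vessel_a * n_a + per_vessel_b * n_b)


def expected_estimate(per_vessel_a, per_vessel_b, n_a, n_b):
    """True proportion of type A multiplied by the bias factor."""
    true_proportion = n_a / (n_a + n_b)
    return true_proportion * bias_factor(per_vessel_a, per_vessel_b, n_a, n_b)


# Sherd count with s_A = 10, s_B = 5 and 100 vessels in all.
for n_a in (1, 3, 10, 30, 50):
    est = expected_estimate(10, 5, n_a, 100 - n_a)
    print(f"n_A = {n_a:2d}: expected sherd-count estimate {est:.3f}")
```

For $n_A$ = 1, 3, 10, 30 and 50 the formula gives roughly 0.020, 0.058, 0.182, 0.462 and 0.667, which can be set beside the average sherd-count estimates reported in Table 7 (0.020, 0.057, 0.177, 0.458, 0.669).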

Figure 5 90 per cent empirical confidence limits for estimates of proportion of 'type A' pottery in assemblage, as this proportion varies from 1% to 50% (horizontal axis: actual % 'type A' pottery). Parameters are $n_A + n_B = 100$, $f = 30\%$, $s_A = s_B = 5$. Circles = estimates based on weight or eves, crosses = estimates based on vessels represented, plusses = estimates based on rim sherd count.


Variability of estimates

One aim of the work is to produce tables showing confidence limits of estimated percentages for various sizes of assemblage, which could be used in assessing the significance of differences observed between assemblages. It would be impossible to show a full range of tables here, but provisional specimen tables are given as Tables 6 and 7 and illustrated in figs 5 and 6.
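To make the idea of empirical confidence limits concrete, the following is a deliberately simplified stand-in for the kind of experiment described, not a reconstruction of the ANALYSE program. It assumes that every vessel of a type breaks into the same number of equal rim sherds and that each sherd is retained independently with probability $f$ (random sampling, $k = 0$). All function names are hypothetical, and the 90 per cent limits are taken as the 5th and 95th percentiles of the simulated estimates.

```python
import random
import statistics


def simulate_once(n_a, n_b, s_a, s_b, f, rng):
    """One simulated assemblage; returns three estimates of the type A share."""
    eves = {"A": 0.0, "B": 0.0}     # summed fractions of rim recovered
    count = {"A": 0, "B": 0}        # rim sherds recovered
    vessels = {"A": 0, "B": 0}      # vessels with at least one sherd recovered
    for kind, n_vessels, n_sherds in (("A", n_a, s_a), ("B", n_b, s_b)):
        for _ in range(n_vessels):
            kept = sum(rng.random() < f for _ in range(n_sherds))
            count[kind] += kept
            eves[kind] += kept / n_sherds
            vessels[kind] += kept > 0

    def share(tally):
        total = tally["A"] + tally["B"]
        # An empty assemblage is reported as 0 rather than dividing by zero.
        return tally["A"] / total if total else 0.0

    return {"eves": share(eves), "count": share(count), "vessels": share(vessels)}


def empirical_interval(values):
    """Empirical 90 per cent limits: the 5th and 95th percentiles."""
    cuts = statistics.quantiles(values, n=20)
    return cuts[0], cuts[-1]


rng = random.Random(1982)  # fixed seed so the run is repeatable
runs = [simulate_once(10, 90, 10, 5, 0.30, rng) for _ in range(1000)]
for measure in ("eves", "count", "vessels"):
    estimates = [run[measure] for run in runs]
    low, high = empirical_interval(estimates)
    print(f"{measure:8s} mean {statistics.mean(estimates):.3f} "
          f"limits {low:.3f}-{high:.3f}")
```

Reading the limits directly off the simulated estimates avoids assuming that they are normally distributed.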

Figure 6 90 per cent empirical confidence limits for estimates of proportion of 'type A' pottery in assemblage, as this proportion varies from 1% to 50% (horizontal axis: actual % 'type A' pottery). Parameters are $n_A + n_B = 100$, $f = 30\%$, $s_A = 10$, $s_B = 5$.

Key as Fig. 2.

Table 6, with an assemblage size of 30 eves or 150 rim sherds, shows a useful level of precision for proportions of type A from 3 per cent upwards. A comparison of the standard deviations with the empirical confidence intervals (which are calculated directly from histograms of the estimates) shows a degree of non-normality in the distributions. Since we have set $s_A = s_B$ and $w_A = w_B$ for this table, there are no problems of bias.

Table 7 cannot be compared directly with Table 6 as it is (provisionally) based on calculations at a lower level of complexity (pp. 8-9). The assemblage size is again 30 eves, but the number of rim sherds varies from 150 to 250. Again, useful levels of precision are achieved for proportions from 3 per cent upwards, and even those for 1 per cent are usable with caution. Table 7 illustrates the


Table 6 Estimates and confidence intervals for $n_A$ when $n_A$ = 1, 3, 10, 30, 50, $f$ = 30 per cent, $n_A + n_B = 100$, $s_A = s_B = 5$, second level of complexity

nA/n    Weight, eves                    Vessels represented             Count
        av.     s.d.    range           av.     s.d.    range           av.     s.d.    range
0.01    0.010   0.006   0-0.020         0.011   0.005   0-0.013         0.010   0.006   0-0.020
0.03    0.031   0.013   0.012-0.054     0.032   0.007   0.024-0.040     0.029   0.013   0.010-0.050
0.10    0.104   0.025   0.06-0.14       0.100   0.016   0.08-0.12       0.096   0.022   0.06-0.13
0.30    0.296   0.024   0.26-0.34       0.294   0.021   0.26-0.33       0.286   0.024   0.25-0.33
0.50    0.495   0.037   0.44-0.55       0.495   0.024   0.47-0.54       0.496   0.033   0.45-0.54

Confidence intervals are empirical 90 per cent intervals.

Table 7 Estimates and confidence intervals for $n_A$ when $n_A$ = 1, 3, 10, 30, 50, $f$ = 30 per cent, $n_A + n_B = 100$, $s_A = 10$, $s_B = 5$, first level of complexity

nA/n    Weight, eves                    Vessels represented             Count
        av.     s.d.    range           av.     s.d.    range           av.     s.d.    range
0.01    0.010   0.005   0.0025-0.0175   0.012   0.0004  0.0113-0.0128   0.020   0.010   0.005-0.035
0.03    0.029   0.009   0.015-0.041     0.034   0.005   0.024-0.037     0.057   0.016   0.030-0.076
0.10    0.097   0.012   0.075-0.115     0.116   0.005   0.1075-0.125    0.177   0.020   0.144-0.200
0.30    0.297   0.025   0.25-0.35       0.333   0.015   0.31-0.36       0.458   0.029   0.41-0.51
0.50    0.503   0.024   0.47-0.55       0.539   0.017   0.51-0.56       0.669   0.021   0.64-0.71

Confidence intervals are empirical 90 per cent intervals.

practical effects of bias when $s_A \neq s_B$ (in this case, $s_A = 2s_B$), with the true proportion falling outside the confidence interval three times each for sherd count and vessels represented.

Table 6 also shows that the standard deviations obtained by using rim sherd count are very close to those that can be calculated in the usual way from the numbers of sherds. This is expected when $k = 0$, but provides an indirect argument for the existence of values of $k > 0$. If $k = 0$, we could in principle increase the precision greatly by counting all sherds and not just rim sherds (typically, in the complete vessels from Aldgate, there are four to five times as many sherds in total as there are rim sherds, and a higher ratio would be expected in more broken pottery). Although some improvement in precision is to be expected, since we are increasing the total information available, this scale of improvement (i.e. a halving or more of standard deviation) is not to be expected, and correlation between the presence of rim and body sherds is the reason. As a reductio ad absurdum we could increase the precision of our estimates by breaking each sherd in half!
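The 'halving or more' mentioned above follows from the square-root law for independent counts. As a rough sketch, assuming that 'the usual way' means the binomial formula $\sqrt{p(1 - p)/N}$ for $N$ independently retained sherds (the text does not spell the formula out, and no finite-population correction is applied here), counting four to five times as many sherds divides the standard deviation by $\sqrt{4} = 2$ to $\sqrt{5} \approx 2.24$. The function name `binomial_sd` is introduced for illustration.

```python
import math


def binomial_sd(p, n_sherds):
    """Standard deviation of a proportion estimated from n independent sherds."""
    return math.sqrt(p * (1.0 - p) / n_sherds)


rim_sherds = 150   # assemblage size quoted for Table 6
p = 0.10           # proportion of type A

sd_rims = binomial_sd(p, rim_sherds)
for ratio in (4, 5):   # total sherds per rim sherd, as observed at Aldgate
    sd_all = binomial_sd(p, ratio * rim_sherds)
    print(f"{ratio}x as many sherds: s.d. divided by {sd_rims / sd_all:.2f}")
```

The same arithmetic is what makes the reductio work: under independence, breaking every sherd in half doubles $N$ and so appears to cut the standard deviation by a factor of $\sqrt{2}$, although no information has been gained.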

Sample size

One aim of the work is to establish a minimum sample size below which it is simply not worth quantifying pottery. Preliminary study of the results suggests that a more useful approach is the minimum quantity of pottery of type A upon which one can base an estimate of its proportion in the population. So far, it appears that this limit is about 1 eve; e.g. if type A forms 10 per cent of our sample, we need an assemblage of 10 eves to estimate with reasonable precision, but if it is 25 per cent we need only 4 eves (10 per cent of 10 = 25 per cent of 4 = 1 eve). This is likely to be a useful rule of thumb.

This approach also helps in the interpretation of negative evidence. For example, if a type is 3 per cent of the population, we need a sample of at least 30 eves to be reasonably (95 per cent) sure of detecting it at all; if 10 per cent, a sample of at least 10 eves is needed. Arguments based on the absence of a type which is never very common are suspect unless based on samples of these sizes.
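The 1-eve rule of thumb reduces to a one-line calculation: the assemblage should contain at least $1/p$ eves, where $p$ is the proportion of type A. The helper below is a sketch only; `min_assemblage_eves` and its `threshold_eves` parameter are names introduced here, with the default threshold taken from the provisional 1-eve figure in the text.

```python
def min_assemblage_eves(proportion_a, threshold_eves=1.0):
    """Assemblage size (in eves) needed to hold threshold_eves of type A."""
    if not 0.0 < proportion_a <= 1.0:
        raise ValueError("proportion_a must be greater than 0 and at most 1")
    return threshold_eves / proportion_a


for p in (0.10, 0.25):
    print(f"type A at {p:.0%}: about {min_assemblage_eves(p):.0f} eves needed")
```

The sample sizes quoted for negative evidence (for example 30 eves for a type forming 3 per cent of the population) are a separate criterion and are not covered by this helper.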

Summary and conclusions

The attempt to answer an apparently simple archaeological question - When and how should one quantify pottery? - has led to some interesting statistical innovations. The Kirby process, a generalisation of a standard stochastic process (the Markov chain), merits further theoretical investigation and is likely to find applications outside archaeology. Recursive sampling is a new technique about which relatively little is known (only one of many possible formulations has been studied here) and which should lead to greater realism in modelling archaeological problems. The complete results will be of practical use, because they will enable archaeologists to assess the validity of arguments based on comparisons of pottery assemblages.

The outcomes of all simulations, and the data from Aldgate which have been used as controls, have been stored on magnetic tape and are available on request.

This paper was given as a Research Seminar at the Institute of Archaeology, University of London, in November 1981, and I should like to thank all those who commented on it. The figures were drawn by Rob Ellis. The work forms part of a project carried out under SRC Grant GR/B 09377.

1.i.1982

Institute of Archaeology
London

References

Binford, L. R. 1981. Bones: Ancient Men and Modern Myths. New York: Academic Press.

Bloice, B. J. 1971. Note in G. J. Dawson, Montague Close Part 2, London Archaeologist. 1(11): 250-1.

Cherry, J. F., Gamble, G. and Shennan, S. 1978. Sampling in Contemporary British Archaeology. Oxford: British Archaeological Reports 50.

Cox, D. R. and Miller, H. D. 1965. The Theory of Stochastic Processes. London: Methuen.

Davies, O. L. and Goldsmith, P. L. (eds). 1972. Statistical Methods in Research and Production. 4th ed. Edinburgh: Oliver & Boyd.

Egloff, B. J. 1973. A method for counting ceramic rim sherds. American Antiquity. 38(3): 351-3.

Glover, I. C. 1972. Excavations in Timor. PhD thesis, Australian National University.

Hinton, D. A. 1977. 'Rudely made earthen vessels' of the twelfth to fifteenth centuries AD. In Ceramics and Early Commerce (ed. D. P. S. Peacock). London: Academic Press, pp. 221-38.

Hodder, I. and Orton, C. R. 1976. Spatial Analysis in Archaeology. Cambridge University Press.

Hulthen, B. 1974. On choice of element for determination of quantity of pottery. Norwegian Archaeological Review. 7(1): 1-5.

Kirby, A. and Kirby, M. 1976. Geomorphic processes and the surface survey of archaeological sites in semi-arid areas. In Geoarchaeology (ed. D. Davidson and M. Shackley). London: Duckworth, pp. 229-53.

Lindgren, D. W. 1962. Statistical Theory. New York: Macmillan.

Millett, M. 1979. The dating of Farnham (Alice Holt) pottery. Britannia. 10: 121-77.

Millett, M. 1980a. An approach to the functional interpretation of pottery. In Pottery and the Archaeologist (ed. M. Millett), Institute of Archaeology Occasional Paper no. 4, pp. 35-48.

Millett, M. 1980b. How much pottery? In Pottery and the Archaeologist (ed. M. Millett), pp. 77-80.

Orton, C. R. 1975. Quantitative pottery studies: some progress, problems and prospects. Scientific Archaeology. 16: 30-5.

Orton, C. R. and Pearce, J. E. (forthcoming). The pottery. In Excavations at Aldgate (ed. A. Thompson). London and Middlesex Archaeological Society Special Paper.

Redman, C. L. 1979. Description and inference with late medieval pottery from Qsar es-Seghir, Morocco. Medieval Ceramics. 3: 63-79.

Schiffer, M. B. 1972. Archaeological context and systemic context. American Antiquity. 37(2): 156-65.

Solheim, W. G. 1960. The use of sherd weights and counts in the handling of archaeological data. Current Anthropology. 1: 325-9.

Thompson, A. 1975. An excavation at Aldgate. London Archaeologist. 2(12): 317-19.

Vince, A. G. 1977. Some aspects of pottery quantification. Medieval Ceramics. 1: 63-74.

Young, C. J. (ed.). 1980. Guidelines for the Processing and Publication of Roman Pottery from Excavations. Directorate of Ancient Monuments and Historic Buildings, Occasional Paper no. 4.

Abstract

C. R. Orton

Computer simulation experiments to assess the performance of measures of quantity of pottery

Reasons for quantifying pottery, and the various measures of quantity that are available, are reviewed, leading to a statement of the criteria that such measures should satisfy. Two new models for the breakage and retrieval of pottery - Kirby processes and recursive sampling - are presented, and fitted to large groups of post-medieval pottery from a site at Aldgate, City of London. These models are used to assess the performance of four measures of quantity - sherd count, weight, vessels represented and vessel-equivalents - under a wide variety of conditions, using computer simulation. Initial results show that only 'vessel-equivalents' is unbiased under a range of conditions, but suggest that 'vessels represented' may often give the lowest sampling error. The simulations are also used to suggest the minimum size of sample that can give reliable results, again under a range of conditions.
