An exploration into the management of high volumes of complex knowledge in the social sciences and humanities.
Sheila M. Embleton, Dorin Uritescu, Eric S. Wheeler
Sheila M. Embleton
Department of Languages, Literatures and Linguistics, York University
Dorin Uritescu
Co-editor of the source atlas, Noul Atlas lingvistic român. Crisana; Department of French, Glendon College, York University
Eric S. Wheeler
ITEC program, York University; Managing partner, Wheeler and Young Inc.
Agenda
- The problem of high-volume, complex data in social sciences and humanities
- Predecessor projects: English, Finnish dialect data
- Use of Multidimensional Scaling (MDS) to consolidate data
- Interactive, media-rich presentation
Problem
In social sciences/humanities, data is often characterized by:
- high volume
- multiple variables or dimensions
- no a priori model
Dialectology provides a good exemplar.
2003 Embleton, Uritescu, Wheeler. Romanian Online Dialect Atlas
Dialectology
Dialect atlases
- Record the details in maps
- Many maps needed to make an atlas
- Recovery of individual facts is possible, but...
- Global understanding of the situation is lost in the volume of details
English
- The Computer Developed Linguistic Atlas of England
- Applied MDS to already computerized data
English: results
- 2-D map of dialect locations
- No geographic information used
- Close correspondence to geography (as expected)
- Highlighted further problems of handling and understanding high volumes of data
Notes on the English MDS map:
- Northern counties at top
- Mid and southern counties below
- Somerset and Devon (South-west) are out of place (in the East)
- Star-bursts, colours, and dotted lines all help interpret the map data
Finnish
Kettunen (1940)
- Project: data computerization (largely done)
- Stage II: application of MDS (not yet done)
Map 1 (parts)
Ambiguity
?
Resolution
- Make an editorial decision: X, not Y
- Mark as AMBIGUOUS: X or Y
- Get more input: X (says expert)
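One way to keep these three resolutions distinguishable in the computerized data is to store, for each doubtful map entry, the candidate readings together with how the doubt was settled. The following Python sketch is illustrative only: the names `Status`, `Reading`, and the three helper functions are hypothetical and are not taken from the atlas projects described here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence, Tuple


class Status(Enum):
    EDITORIAL = "editorial decision"  # the editors chose X, not Y
    AMBIGUOUS = "ambiguous"           # recorded as X or Y
    EXPERT = "expert input"           # an expert says X


@dataclass(frozen=True)
class Reading:
    """One map entry whose source symbol has several possible readings."""
    candidates: Tuple[str, ...]  # all plausible readings, e.g. ("X", "Y")
    chosen: Optional[str]        # None while the entry stays ambiguous
    status: Status


def _checked(candidates: Sequence[str], choice: str) -> Tuple[str, ...]:
    # A resolution must pick one of the readings that were actually possible.
    if choice not in candidates:
        raise ValueError(f"{choice!r} is not one of {list(candidates)!r}")
    return tuple(candidates)


def editorial_decision(candidates: Sequence[str], choice: str) -> Reading:
    return Reading(_checked(candidates, choice), choice, Status.EDITORIAL)


def mark_ambiguous(candidates: Sequence[str]) -> Reading:
    return Reading(tuple(candidates), None, Status.AMBIGUOUS)


def expert_input(candidates: Sequence[str], choice: str) -> Reading:
    return Reading(_checked(candidates, choice), choice, Status.EXPERT)
```

Keeping the candidate readings alongside the chosen one means a later analysis can still tell a settled value from an unresolved one, and can revisit an editorial decision if more input arrives.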
Lesson
In transforming data from one medium to another, even well-structured data will have unexpected pitfalls:
- Design the data transformation carefully
- Prototype your system; find the problems early
- Plan to work iteratively
Apply innovative contemporary methods in dialect geography to an online set of Romanian dialect data.
Romanian language
RODA: Part 1
Create an online version of The New Romanian Linguistic Atlas. Crisana (Stan & Uritescu 1996)
RODA Prototype 1
RODA: Part 2
Allow plug-in applications and other analyses of data, e.g. apply Multidimensional Scaling to dialect data:
- Statistical technique
- Consolidate large amounts of data
- Complement to traditional analyses of small amounts of data
Multidimensional Scaling
- Statistical technique (Torgerson 1952)
- Used in sociology, psychology, marketing
- Reveals the scales along which data varies; gives a data-space
- Uses distances [(dis)similarities] among responses of subjects
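The distances that MDS starts from can be computed directly from coded responses. As a minimal sketch, assuming each location's answers are coded as one categorical value per feature, the dissimilarity between two locations can be taken as the number of features on which they differ (a Hamming distance). The site names and responses below are invented for illustration, and this distance measure is an assumption, not necessarily the one used in the projects described here.

```python
from typing import Dict, List, Sequence, Tuple


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the features on which two response vectors differ."""
    if len(a) != len(b):
        raise ValueError("response vectors must cover the same features")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(
    responses: Dict[str, Sequence[str]]
) -> Tuple[List[str], List[List[int]]]:
    """Return the sorted site names and their pairwise distance matrix."""
    sites = sorted(responses)
    matrix = [[hamming(responses[p], responses[q]) for q in sites] for p in sites]
    return sites, matrix


# Hypothetical data: three sites, four features, one coded variant per feature.
responses = {
    "site1": ["a", "b", "a", "c"],
    "site2": ["a", "b", "b", "c"],
    "site3": ["b", "a", "b", "c"],
}
sites, D = distance_matrix(responses)
for name, row in zip(sites, D):
    print(name, row)
```

With these invented responses, site1 and site2 differ on one feature, while site1 and site3 differ on three, so site3 would be placed farther from site1 on the resulting map.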
MDS
Axioms of a metric:
- $d(X,X) = 0$
- $d(X,Y) = d(Y,X)$
- $d(X,Y) > 0$ if $X \neq Y$
- $d(X,Y) \leq d(X,C) + d(C,Y)$ for all points $C$
The distance matrix reflects these rules.
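The four axioms can be checked mechanically for any square matrix of distances. The following Python sketch does so for a small hypothetical three-point matrix; the function name `metric_violations` and the sample values are illustrative assumptions, not data from the atlas.

```python
from typing import List, Sequence


def metric_violations(d: Sequence[Sequence[float]]) -> List[str]:
    """List every place where the matrix d breaks one of the metric axioms."""
    n = len(d)
    problems: List[str] = []
    for i in range(n):
        if d[i][i] != 0:
            problems.append(f"d({i},{i}) is not 0")
        for j in range(n):
            if d[i][j] != d[j][i]:
                problems.append(f"d({i},{j}) != d({j},{i})")
            if i != j and d[i][j] <= 0:
                problems.append(f"d({i},{j}) is not positive")
            for c in range(n):
                # Triangle inequality: a detour through c is never shorter.
                if d[i][j] > d[i][c] + d[c][j]:
                    problems.append(f"d({i},{j}) > d({i},{c}) + d({c},{j})")
    return problems


# Hypothetical distances among three points.
sample = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]
print(metric_violations(sample))  # an empty list means all axioms hold
```

A check of this kind is cheap to run before scaling, since a matrix that breaks the axioms usually points to a coding or transcription problem in the underlying data.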
Example distance matrix for five points A to E (upper triangle shown; the matrix is symmetric):

     A   B   C   D   E
A    0  25  10  12  24
B        0  11  17  10
C            0  11  15
D                0  26
E                    0
MDS
- n+1 points generate an n-dimensional space
- MDS can reduce that high-dimensional space to 2 (or 3) dimensions
- Result: complex data can be viewed as a map
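The reduction step can be illustrated with classical (Torgerson) scaling: square the distances, double-center them, and use the leading eigenvectors as map coordinates. The following Python sketch is a minimal illustration using only the standard library. The power-iteration eigen-solver, the function names, and the four-point example (corners of a hypothetical 3-by-4 rectangle) are assumptions chosen for brevity; the sketch also assumes the distances are close to Euclidean, so that the leading eigenvalues are non-negative. Real atlas work would more likely rely on a numerical library.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def double_center(dist: Matrix) -> Matrix:
    """Turn a distance matrix into the centered inner-product matrix B."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]  # equals column means (symmetric)
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def mat_vec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def leading_eigenpair(m: Matrix, iterations: int = 1000,
                      seed: int = 0) -> Tuple[float, List[float]]:
    """Approximate the dominant eigenvalue and unit eigenvector of m."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    v = [rng.uniform(-1.0, 1.0) for _ in m]
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract
            return 0.0, v
        v = [x / norm for x in w]
    value = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return value, v


def classical_mds(dist: Matrix, dims: int = 2) -> Matrix:
    """Return one coordinate row per point, in `dims` dimensions."""
    b = double_center(dist)
    n = len(b)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        value, vec = leading_eigenpair(b)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i][k] = scale * vec[i]
        # Deflate: remove the component just found before the next pass.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return coords


# Hypothetical distances among the corners of a 3-by-4 rectangle.
D = [
    [0.0, 3.0, 5.0, 4.0],
    [3.0, 0.0, 4.0, 5.0],
    [5.0, 4.0, 0.0, 3.0],
    [4.0, 5.0, 3.0, 0.0],
]
for point in classical_mds(D):
    print([round(x, 3) for x in point])
```

Because the four example points really do lie in a plane, the 2-D coordinates reproduce the input distances (up to rotation and reflection). With real dialect data the 2-D map is only an approximation, and the size of the discarded eigenvalues gives a rough idea of how much is lost.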
MDS
- to 2
  - All 169 features included (and taken in relevant subsets)
  - Finnish, Romanian provide large data sets that can do the same
- The online atlas provides a framework for accessing and presenting data
- Other applications can work within the framework to transform or process the data, such as:
  - MDS
Summary
- Humanities and Social Sciences deal with large, complex data sets
- Explore methods to access, process, present this kind of data
- Solutions include:
  - MDS
References
Embleton, Sheila M. and Eric S. Wheeler (2000). Computerized Dialect Atlas of Finnish: Dealing with Ambiguity. J. of Quantitative Linguistics 7.3, pp. 227-231.
Embleton, Sheila M. and Eric S. Wheeler (1997a). Multidimensional Scaling and the SED Data. In Wolfgang Viereck and Heinrich Ramisch, The Computer Developed Linguistic Atlas of England 2. Tuebingen: Max Niemeyer Verlag.
Embleton, Sheila M. and Eric S. Wheeler (1997b). Finnish Dialect Atlas for Quantitative Studies. J. of Quantitative Linguistics 4.1-3, pp. 99-102.
Schiffman, Susan S., M. Lance Reynolds, Forrest W. Young (1981). Introduction to Multidimensional Scaling. Theory, Methods, and Applications. New York: Academic Press. 411 pp.
Torgerson, W. S. (1952). Multidimensional scaling: 1. Theory and method. Psychometrika 17, pp. 401-419.
Stan, Ionel and Dorin Uritescu (1996). Noul Atlas lingvistic român. Crisana. Vol. I. Bucharest: Romanian Academy Press. (2003. Vol. II. Bucharest: Romanian Academy Press.)
Uritescu, Dorin (1983). "Asupra repartitiei dialectale a graiurilor dacoromâne. Graiul din Oas" / "On the Dialect Structure of Daco-Romanian. The Dialect of Oas". In Materiale si cercetari dialectale II. Cluj-Napoca: The University of Cluj-Napoca, pp. 231-246.
Uritescu, Dorin (1984a). Subdialectul crisean. In V. Rusu (ed.), Tratat de dialectologie româneasca. Craiova: Scrisul românesc, pp. 284-320, 916-930.
Uritescu, Dorin (1984b). Graiul din Tara Oasului. In V. Rusu (ed.), Tratat de dialectologie româneasca. Craiova: Scrisul românesc, pp. 390-399, 964-967.
Wheeler, Eric S. (2002). Zipf's Law and Why It Works Everywhere. Glottometrica 4, pp. 45-48.
Wheeler, Eric S. (2003). Multidimensional Scaling to Visualize Text Separation. Glottometrica 6, forthcoming.
Wheeler, Eric S. (nd). Multidimensional scaling. Chapter in Reinhard Koehler (ed.), Handbook in Quantitative Linguistics, forthcoming.