
Romanian Online Dialect Atlas

An exploration into the management of high volumes of complex knowledge in the social sciences and humanities.
Sheila M. Embleton, Dorin Uritescu, Eric S. Wheeler

2003 Embleton, Uritescu, Wheeler


Sheila M. Embleton
Department of Languages, Literatures and Linguistics, York University

Dorin Uritescu
Co-editor of source atlas: Noul Atlas lingvistic român. Crisana.
Department of French, Glendon College, York University

Eric S. Wheeler
ITEC program, York University; Managing partner, Wheeler and Young Inc.


Supported (2003-2006) by a grant from: Social Sciences and Humanities Research Council (Canada)


Agenda
- The problem of high-volume, complex data in social sciences and humanities
- Predecessor projects: English, Finnish dialect data
- Use of Multidimensional Scaling (MDS) to consolidate data
- Interactive, media-rich presentation

Problem
In social sciences/humanities, data is often characterized by:
- high volume
- multiple variables or dimensions
- no a priori model
Dialectology provides a good exemplar.

Dialectology
- Explain the variations in linguistic usage across geography
- Simple example:
  - church vs. kirk (< OE cirice)
- More realistic problem:
  - 169 features in 313 locations (SED)
  - 213 features in 400+ locations (Finnish)

Dialect atlases
- Record the details in maps
- Many maps needed to make an atlas
- Recovery of individual facts is possible, but...
- Global understanding of the situation is lost in the volume of details

English
- Survey of English Dialects (SED)
  - 169 features at 313 locations
- Computer Developed Linguistic Atlas of England
- Applied MDS to already computerized data

English: results
- 2-D map of dialect locations
- No geographic information used
- Close correspondence to geography (as expected)
- Highlighted further problems of handling and understanding high volumes of data

English Dialect Map


- Northern counties at top
- Mid and southern counties below
- Somerset, Devon (South-west) are out of place (in East)
- Star-bursts, colours, dotted lines all help interpret map data

Finnish


Kettunen (1940)
- The Dialect Atlas of Finland
  - 213 maps x 530 locations
  - Up to 16 features per map
  - Typically 1-3 features per location
  - ~120,000 data items
- Project: data computerization (largely done)
- Stage II: application of MDS (not yet done)

Map 1 (parts)


Special software to facilitate accurate data entry


Ambiguity


Resolution
- Make editorial decision: X, not Y
- Mark as AMBIGUOUS: X or Y
- Get more input: X (says expert)
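
The three resolution paths above could be recorded alongside each data item so that later analyses know how a reading was arrived at. The following is a minimal sketch under stated assumptions, not the data-entry software described in these slides: the `Reading` record, the `Status` values, and the location names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Status(Enum):
    AMBIGUOUS = "ambiguous"            # left open: X or Y
    EDITORIAL = "editorial decision"   # editor chose X, not Y
    EXPERT = "expert input"            # an outside expert chose X


@dataclass
class Reading:
    """One atlas entry whose transcription may be uncertain (hypothetical record)."""
    location: str
    candidates: Tuple[str, ...]        # all plausible readings, e.g. ("X", "Y")
    status: Status = Status.AMBIGUOUS
    chosen: Optional[str] = None
    note: str = ""

    def resolve(self, value: str, status: Status, note: str = "") -> None:
        """Record a decision; the chosen value must be one of the candidates."""
        if status is Status.AMBIGUOUS:
            raise ValueError("resolving requires EDITORIAL or EXPERT status")
        if value not in self.candidates:
            raise ValueError(f"{value!r} is not one of {self.candidates}")
        self.chosen, self.status, self.note = value, status, note


# An item left marked as ambiguous, and one resolved by expert input.
open_item = Reading("location_17", ("X", "Y"))
expert_item = Reading("location_42", ("X", "Y"))
expert_item.resolve("X", Status.EXPERT, note="says expert")
```

Keeping the unresolved candidates in the record means an ambiguous item can still be revisited later, which fits the iterative way of working recommended on the next slide.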


Lesson
In transforming data from one medium to another, even well-structured data will have unexpected pitfalls:
- Design data transformation carefully
- Prototype your system; find the problems early
- Plan to work iteratively

Romanian Online Dialect Atlas: Crisana


- Apply innovative contemporary methods in dialect geography to an online set of Romanian dialect data.

Romanian language
- Key to understanding the evolution of all Romance languages
  - Early branch, distinct from French-Spanish-Italian line
- Exemplar of non-hierarchical dialect variation and linguistic continua
  - Transition areas contain mixtures of dialect features and specific features

RODA: Part 1
Create online version of The New Romanian Linguistic Atlas. Crisana (Stan & Uritescu 1996)
- Available on internet and CD
- Default interpretations
- Interactive interface to data
  - custom select data for a map
- Add audio clips to illustrate data

RODA Prototype 1


RODA: Part 2
Allow plug-in applications and other analyses of data, e.g. apply Multidimensional Scaling to dialect data:
- Statistical technique
- Consolidate large amounts of data
- Complement to traditional analyses of small amounts of data

Multidimensional Scaling


- Statistical technique (Torgerson 1952)
- Used in sociology, psychology, marketing
- Reveals the scales along which data varies; gives a data-space
- Uses distances [(dis)similarities] among responses of subjects
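
The distances that MDS works from can be derived from atlas data in several ways. The sketch below shows one possibility, and is not necessarily the measure used in the projects described here: each location is represented by the set of variants recorded there, and the dissimilarity between two locations is taken to be the Jaccard distance between those sets. The location names and variant labels are invented for illustration.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(responses: dict) -> dict:
    """Map every ordered pair of locations to a dissimilarity."""
    names = sorted(responses)
    d = {(name, name): 0.0 for name in names}
    for x, y in combinations(names, 2):
        d[(x, y)] = d[(y, x)] = jaccard_distance(responses[x], responses[y])
    return d


# Hypothetical responses: the variants attested at three locations.
responses = {
    "loc1": {"church", "variant_a"},
    "loc2": {"kirk", "variant_a"},
    "loc3": {"kirk", "variant_b"},
}
D = distance_matrix(responses)
```

In this toy data, loc1 and loc3 share no variants and so are maximally distant, while loc2 sits between them.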

MDS
Axioms of a metric:
- d(X,X) = 0
- d(X,Y) = d(Y,X)
- d(X,Y) > 0 if X ≠ Y
- d(X,Y) ≤ d(X,C) + d(C,Y) for all points C
Matrix reflects these rules (upper triangle shown; the lower triangle follows by symmetry):

     A   B   C   D   E
A    0  25  10  12  24
B        0  11  17  10
C            0  11  15
D                0  26
E                    0
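
A short sketch of how the four axioms might be checked mechanically on a full square distance matrix. The helper name `check_metric` and the two small matrices are hypothetical illustrations, not data from the atlas projects.

```python
from itertools import product


def check_metric(d, tol=1e-9):
    """Return a list of axiom violations found in the square matrix d."""
    n = len(d)
    problems = []
    # d(X,X) = 0
    for i in range(n):
        if abs(d[i][i]) > tol:
            problems.append(f"d({i},{i}) is not 0")
    # d(X,Y) = d(Y,X), and d(X,Y) > 0 if X != Y
    for i, j in product(range(n), repeat=2):
        if i < j:
            if abs(d[i][j] - d[j][i]) > tol:
                problems.append(f"asymmetric at ({i},{j})")
            if d[i][j] <= 0:
                problems.append(f"non-positive distance at ({i},{j})")
    # d(X,Y) <= d(X,C) + d(C,Y) for all points C
    for i, j, c in product(range(n), repeat=3):
        if d[i][j] > d[i][c] + d[c][j] + tol:
            problems.append(f"triangle inequality fails: d({i},{j}) > d({i},{c}) + d({c},{j})")
    return problems


# Hypothetical examples: the first satisfies all axioms, the second does not.
good = [[0, 3, 4],
        [3, 0, 5],
        [4, 5, 0]]
bad = [[0, 1, 9],
       [1, 0, 2],
       [9, 2, 0]]   # 9 > 1 + 2, so the direct route is longer than the detour

good_report = check_metric(good)
bad_report = check_metric(bad)
```

Dissimilarities computed from real survey data do not always satisfy the triangle inequality, so a check of this kind can be a useful early diagnostic.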


MDS
- n+1 points generate an n-dimensional space
- MDS can reduce that high-dimensional space to 2 (or 3) dimensions
- Result: complex data can be viewed as a map
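
The reduction step can be illustrated with classical (Torgerson-style) MDS: square the distances, double-centre them, and keep the eigenvectors belonging to the largest eigenvalues. This is a minimal sketch using NumPy and an invented four-point example; it is an assumption that this variant matches what was applied to the dialect data, since other MDS variants exist.

```python
import numpy as np


def classical_mds(d, k=2):
    """Embed n points in k dimensions from an n x n distance matrix."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering      # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)             # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]              # indices of the k largest
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # guard against tiny negatives
    return eigvecs[:, top] * scale


# Hypothetical test case: four corners of a 3 x 4 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(dist, k=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

Because the input distances here come from points that really lie in a plane, the 2-D result reproduces them exactly, up to rotation and reflection. With real dialect data the 2-D map is only an approximation of the full distance matrix.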


MDS
- Can use MDS to consolidate data
  - English: 312 dimensions reduced to 2
  - All 169 features included (and taken in relevant subsets)
  - Finnish, Romanian provide large data sets that can do the same

Interactive, media-rich presentation


Objectives
- Make data accessible, useful to a wide research audience
Methods
- Interactive selection of data
- Constructive presentation of data
- Addition of audio and other media
Online is much more than a book!

Framework and Applications

- Online atlas provides a framework for accessing and presenting data
- Other applications can work within the framework to transform or process the data, such as:
  - MDS data consolidation
  - Tools to analyze dialect variants of phonemes (proposed)
  - Others
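
One way to picture the framework-plus-applications arrangement is a small registry: the atlas owns the data and the selection logic, and each application is a function that receives the selected data. Everything in this sketch (the `Atlas` class, its methods, the `variant_counts` plug-in, and the sample data) is hypothetical and is not the RODA design itself.

```python
from typing import Callable, Dict, List

Dataset = Dict[str, Dict[str, str]]     # location -> {feature: recorded variant}
Plugin = Callable[[Dataset], object]    # an application that processes selected data


class Atlas:
    """Hypothetical framework: holds the data and runs registered applications."""

    def __init__(self, data: Dataset):
        self._data = data
        self._plugins: Dict[str, Plugin] = {}

    def register(self, name: str, plugin: Plugin) -> None:
        self._plugins[name] = plugin

    def select(self, features: List[str]) -> Dataset:
        """Return only the requested features for every location."""
        return {
            loc: {f: v for f, v in feats.items() if f in features}
            for loc, feats in self._data.items()
        }

    def run(self, name: str, features: List[str]) -> object:
        return self._plugins[name](self.select(features))


def variant_counts(data: Dataset) -> Dict[str, Dict[str, int]]:
    """Example plug-in: count how often each variant of each feature occurs."""
    counts: Dict[str, Dict[str, int]] = {}
    for feats in data.values():
        for feature, variant in feats.items():
            per_feature = counts.setdefault(feature, {})
            per_feature[variant] = per_feature.get(variant, 0) + 1
    return counts


atlas = Atlas({
    "loc1": {"church_word": "church", "other_feature": "a"},
    "loc2": {"church_word": "kirk", "other_feature": "a"},
    "loc3": {"church_word": "kirk", "other_feature": "b"},
})
atlas.register("variant_counts", variant_counts)
result = atlas.run("variant_counts", ["church_word"])
```

An MDS application could be registered in the same way, taking the selected features and returning map coordinates instead of counts.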



Summary
- Humanities and Social Sciences deal with large, complex data sets
- Explore methods to access, process, present this kind of data
- Solutions include:
  - MDS-type processing
  - Online, interactive, rich presentation
- Example: Romanian Online Dialect Atlas

References
- Embleton, Sheila M. and Eric S. Wheeler (2000). Computerized Dialect Atlas of Finnish: Dealing with Ambiguity. J. of Quantitative Linguistics 7.3, pp. 227-231.
- Embleton, Sheila M. and Eric S. Wheeler (1997a). Multidimensional Scaling and the SED Data. In Wolfgang Viereck and Heinrich Ramisch, The Computer Developed Linguistic Atlas of England 2. Tuebingen: Max Niemeyer Verlag.
- Embleton, Sheila M. and Eric S. Wheeler (1997b). Finnish Dialect Atlas for Quantitative Studies. J. of Quantitative Linguistics 4.1-3, pp. 99-102.
- Schiffman, Susan S., M. Lance Reynolds, Forrest W. Young (1981). Introduction to Multidimensional Scaling: Theory, Methods, and Applications. New York: Academic Press. 411 pp.
- Torgerson, W. S. (1952). Multidimensional scaling: 1. Theory and method. Psychometrika 17, pp. 401-419.
- Stan, Ionel and Dorin Uritescu (1996). Noul Atlas lingvistic român. Crisana. Vol. I. Bucharest: Romanian Academy Press. (2003. Vol. II. Bucharest: Romanian Academy Press.)
- Uritescu, Dorin (1983). "Asupra repartitiei dialectale a graiurilor dacoromâne. Graiul din Oas" / "On the Dialect Structure of Daco-Romanian. The Dialect of Oas". In Materiale si cercetari dialectale II. Cluj-Napoca: The University of Cluj-Napoca, pp. 231-246.
- Uritescu, Dorin (1984a). Subdialectul crisean. In V. Rusu (ed.), Tratat de dialectologie româneasca. Craiova: Scrisul românesc, pp. 284-320, 916-930.
- Uritescu, Dorin (1984b). Graiul din Tara Oasului. In V. Rusu (ed.), Tratat de dialectologie româneasca. Craiova: Scrisul românesc, pp. 390-399, 964-967.
- Wheeler, Eric S. (2002). Zipf's Law and Why It Works Everywhere. Glottometrica 4, pp. 45-48.
- Wheeler, Eric S. (2003). Multidimensional Scaling to Visualize Text Separation. Glottometrica 6, forthcoming.
- Wheeler, Eric S. (n.d.). Multidimensional scaling. Chapter in Reinhard Koehler (ed.), Handbook in Quantitative Linguistics, forthcoming.
