
The Relational Model

Serge Abiteboul
INRIA Saclay, Collège de France, and ENS Cachan

20/03/2012

Organization
The principles: abstraction, universality, independence
Abstraction: the relational model
Universality: main functionalities
Independence: the views revisited
Optimization
Complexity and expressiveness
Conclusion

The principles

DBMS

Goal: the management of large amounts of data
Large amounts of data: database
Software that does this: DBMS

Database management systems
Characteristics of the data
Persistence over time (years)
Size (giga, tera, etc.)
Shared among many users and programs
May be distributed geographically
Heterogeneous storage: hard disk, network

Mediation
The data management system acts as a mediator between intelligent users and objects that store information

Where and at what time can I see a film with Bogart?

∃t, d (Film(t, d, "Bogart") ∧ Séance(t, s, h))

int get(int key) { int hash = (key % TABLE_SIZE); while (table[hash] != NULL && table[hash]->getKey() != key) hash = (hash ...

The questions are translated into first-order logic and then into programs with precise and unambiguous syntax and semantics
Alice does not want to write this program; she does not have to

1st principle: abstraction

Data model
Definition language for describing the data
Manipulation language (queries and updates)

Simple data structure
Relations
Trees
Graphs

Formal language for queries
Logics
Declarative vs. procedural
Graphical languages

Complex graphical queries with MS Access

Towards abstraction: high-level data models

The relational model: Codd 1970
Data are represented as tables
Queries are expressed in relational calculus: declarative
In practice, a richer language: SQL
Very successful both scientifically and industrially
Commercial systems such as Oracle, IBM's DB2
Popular free software like MySQL
DBMS on personal computers such as MS Access

2nd principle: universality

DBMSs are designed to capture all data in the world for all kinds of applications
Powerful languages
Rich functionalities: see further
To avoid duplicating development effort

In reality
Less structured data are often stored in files
Very demanding applications require specialized software
Today, more and more specialized systems

Towards universality

We need services such as
Concurrency and transactions
Reliability and security
Data distribution
More

Scaling
Volume of data
Volume of requests

Performance
Response time: the time per operation
Throughput: the number of operations per time unit

Large variety of applications with important needs for data management
Two main classes

OLTP: Online Transaction Processing
Transactional
E-commerce, banking, etc.
Simple transactions, known in advance
Very high load in number of transactions per second

OLAP: Online Analytical Processing
Decision making
Business intelligence queries
Often very complex queries involving aggregate functions
Multidimensional queries: e.g., date, country, product
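An OLAP-style multidimensional query can be illustrated on a toy table. This is a minimal sketch using Python's built-in sqlite3 module; the table name sales, its columns, and all rows are hypothetical and chosen only to mirror the date, country, product dimensions named above.

```python
import sqlite3

# Hypothetical fact table with three dimensions and one measure.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (day TEXT, country TEXT, product TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2012-03-19", "FR", "book", 10),
        ("2012-03-19", "FR", "pen", 2),
        ("2012-03-20", "FR", "book", 5),
        ("2012-03-20", "US", "book", 7),
    ],
)

# Aggregate the measure along two of the three dimensions.
totals = conn.execute(
    "SELECT country, product, SUM(amount) FROM sales "
    "GROUP BY country, product ORDER BY country, product"
).fetchall()
print(totals)
```

An OLTP workload on the same table would instead consist of many short inserts and point lookups.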

3rd principle: independence (physical/logical/external)

ANSI-SPARC Architecture (1975): 3 levels
Separation into three levels
Physical level: physical organization of data on disk, disk management, schemas, indexes, transactions, log
Logical level: logical organization of data in a schema, query and update processing
External level: views, API, programming environments

Independence
Physical: we can change the physical organization without changing the logical level
Logical: we can evolve the logical level without modifying the applications
External: we can change or add views without affecting the logical level

[Figure: three stacked levels, from top to bottom: external level (views), logical level, physical level]

Abstraction
The relational model

Data are organized in relations

Queries are expressed in relational calculus

qHB = {s, h | ∃d, t (Film(t, d, "Humphrey Bogart") ∧ Séance(t, s, h))}
In practice, using a syntax that is easier to understand: SQL
select salle, heure
from Film, Séance
where Film.titre = Séance.titre and acteur = 'Humphrey Bogart'
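The SQL query above can be tried on a toy instance. The following is a minimal sketch using Python's built-in sqlite3 module. The sample rows, the column name realisateur for the second attribute of Film, and the unaccented spelling Seance are assumptions made for illustration; they do not come from the slides.

```python
import sqlite3

# Toy instance; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Film (titre TEXT, realisateur TEXT, acteur TEXT);
    CREATE TABLE Seance (titre TEXT, salle TEXT, heure TEXT);
    INSERT INTO Film VALUES
        ('Casablanca', 'Michael Curtiz', 'Humphrey Bogart'),
        ('Casablanca', 'Michael Curtiz', 'Ingrid Bergman'),
        ('Vertigo', 'Alfred Hitchcock', 'James Stewart');
    INSERT INTO Seance VALUES
        ('Casablanca', 'Salle 1', '20:00'),
        ('Vertigo', 'Salle 2', '18:00');
""")

# Same query as on the slide: where and when is a Bogart film shown?
rows = conn.execute("""
    SELECT salle, heure
    FROM Film, Seance
    WHERE Film.titre = Seance.titre AND acteur = 'Humphrey Bogart'
""").fetchall()
print(rows)
```

The query states what is wanted (rooms and times) and leaves the choice of join strategy to the system.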

Queries are translated into algebraic expressions and evaluated efficiently

The main predecessors

Trees
IMS, IBM, late 60s and 70s
Still widely used
A hierarchy of records with keys
Supplier(sno, sname, sadd)
Part(pno, pname, qty, price)

Graphs
Codasyl
A graph of records with keys
Supplier(sno, sname, sadd)
Part(pno, pname)
Order(ono, qty, price)

Little abstraction
Languages
Navigational
Procedural
Record at a time

The main successors: semistructured data models

Trees
XML
Exchange format for the Web
Standard
Query languages: XPath, XQuery
Developing very fast

Graphs
Semantic Web & RDF
Format for representing knowledge
Standard
Query language: SPARQL
Developing very fast

Abstraction
Logic foundations
High-level languages
We will discuss them

Universality: functionalities
Here with a very relational viewpoint

Performance and scaling

The core of the problem
Be able to support
Terabytes of data
Millions of requests per day

For this, two main tools
Optimization
Parallelism

Dependencies
Laws about the data
To protect data
To optimize queries
To design schemas
To explain data

Examples
Séance[titre] ⊆ Film[titre]: inclusion dependency. Only known films are shown
Séance: salle, heure → titre: functional dependency. Only one movie is shown at a time in a theater

Logical formulas
tgds (tuple-generating dependencies): ∀t, s, h (Séance(t, s, h) → ∃d, a Film(t, d, a))
egds (equality-generating dependencies): ∀t, t', s, h (Séance(t, s, h) ∧ Séance(t', s, h) → t = t')

Some of the most sophisticated developments in db theory
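The two example dependencies can be checked mechanically on a small instance. This is a minimal sketch in Python with relations represented as sets of tuples; the function names and the sample rows are hypothetical, and the unaccented name seance is used for the Séance relation.

```python
# Relations as sets of tuples; the rows are invented for illustration.
film = {("Casablanca", "Curtiz", "Bogart"), ("Vertigo", "Hitchcock", "Stewart")}
seance = {("Casablanca", "Salle 1", "20:00"), ("Vertigo", "Salle 2", "18:00")}

def inclusion_holds(seance, film):
    """Seance[titre] is included in Film[titre]: every shown title is a known film."""
    return {t for (t, s, h) in seance} <= {t for (t, d, a) in film}

def fd_holds(seance):
    """salle, heure -> titre: each (salle, heure) pair determines one titre."""
    seen = {}
    for (t, s, h) in seance:
        if seen.setdefault((s, h), t) != t:
            return False
    return True

ok_inclusion = inclusion_holds(seance, film)
ok_fd = fd_holds(seance)
# A second film in the same room at the same time violates the FD.
violates_fd = not fd_holds(seance | {("Vertigo", "Salle 1", "20:00")})
print(ok_inclusion, ok_fd, violates_fd)
```

The inclusion check corresponds to the tgd and the functional dependency check to the egd.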

Dependencies and schema design

Use simple dependencies up to complex semantic data models
Help choose a better relational schema

Person  Child  Car
John    Toto   BMW
John    Toto   2 chevaux
John    Zaza   BMW
John    Zaza   2 chevaux
Sue     Lulu   null
Sue     Mimi   null

Update anomalies
Null values
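One way to read the example table is that children and cars are independent facts about a person, so the flat schema repeats values and needs nulls. The following is a minimal sketch in Python of the usual decomposition into two relations. The decomposition itself, the relation names person_child and person_car, and the use of None for the missing car values are assumptions for illustration.

```python
# The flat table; None stands for a null value.
flat = [
    ("John", "Toto", "BMW"), ("John", "Toto", "2 chevaux"),
    ("John", "Zaza", "BMW"), ("John", "Zaza", "2 chevaux"),
    ("Sue", "Lulu", None), ("Sue", "Mimi", None),
]

# Decompose into two relations; nulls are simply not stored.
person_child = {(p, c) for (p, c, _) in flat}
person_car = {(p, car) for (p, _, car) in flat if car is not None}

# Joining on Person recovers the rows that had a car.
rejoined = {
    (p, c, car)
    for (p, c) in person_child
    for (p2, car) in person_car
    if p == p2
}
print(len(person_child), len(person_car), len(rejoined))
```

After the decomposition, adding a car for John is a single insertion rather than one row per child.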

Concurrency and transactions: ACID

Atomicity: the sequence of operations is indivisible; in case of failure, either all operations are completed or all are canceled
Consistency: the consistency property ensures that any transaction the database performs will take it from one consistent state to another (so only consistent data will be written to the database)
Isolation: when two transactions A and B are executed at the same time, the changes made by A are not visible to B until transaction A is completed and validated (commit)
Durability: once validated, the state of the database must be permanent, and no technical problem should lead to canceling the operations of a transaction
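Atomicity can be observed with a small experiment. This is a minimal sketch using Python's built-in sqlite3 module; the account table, the transfer function, and the amounts are hypothetical. It relies on the documented behavior that using a connection as a context manager commits on success and rolls back when an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst: either both updates happen or neither."""
    try:
        with conn:  # commit on success, roll back on exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "A", "B", 60)      # succeeds
failed = transfer(conn, "A", "B", 60)  # would make A negative: rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the credit to B had already been executed, yet it is undone together with the rejected debit.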

Recovery from failures

The DBMS must be resilient to failures
A variety of techniques
Journal
Backup copies
Shadow pages

Hot standby: a second system running simultaneously
Availability: users should not have to wait beyond what is seen as reasonable for an application

Distributed data

Typically the case
When integrating several data sources
Organizations with many branches
Activities involving several companies
When using distribution to get better performance

Query processing over distributed data
Data localization & global query optimization
Data fragmentation
Typically horizontal partitioning

Distributed transactions
Two-phase commit
Typically too heavy for Web applications
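Horizontal partitioning and data localization can be sketched in a few lines. This is a minimal illustration in Python, assuming a partitioning by key modulo the number of nodes; the function names, the orders rows, and the choice of modulo partitioning are hypothetical and not taken from the slides.

```python
def partition(rows, key, n_nodes):
    """Horizontal partitioning: each row goes to exactly one fragment."""
    fragments = [[] for _ in range(n_nodes)]
    for row in rows:
        fragments[key(row) % n_nodes].append(row)
    return fragments

# Hypothetical rows: (customer id, amount).
orders = [(1, 30), (2, 15), (3, 99), (4, 7), (5, 42)]
fragments = partition(orders, key=lambda r: r[0], n_nodes=2)

def lookup(fragments, customer_id):
    """Data localization: a selection on the key touches a single fragment."""
    frag = fragments[customer_id % len(fragments)]
    return [r for r in frag if r[0] == customer_id]

print(fragments)
print(lookup(fragments, 3))
```

A query that does not constrain the partitioning key has to visit every fragment and merge the results.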

More
Security
Protect content against unauthorized users (humans or programs)
Confidentiality: access control, authentication, authorization

Data monitoring
Data cleaning
Data mining
Data streaming
Spatio-temporal data
Etc.

Independence: views

Views
Definition:
Function f: Database → View

One of the most fundamental topics in databases

[Figure: database states db1 to db6 mapped to view states v1 and v2]

View definition

Classical query
Define view

Implicit definition and recursion
Datalog
Dependencies (tgds)

Mix between explicit/implicit: Active XML
<state s=Colorado> <resort n=Aspen> <sc> Unisys.com/snow(Aspen) </sc> <sc> Yahoo.com/GetHotels(Aspen) </sc> </resort> </resorts>

[Figure: tree view of the document: a state node (n: Colorado) with two resort nodes (n: Aspen, n: Lake Tahoe) and the values 1 meter and 2 meters]

To materialize or not

Intentional
Update: do nothing
Query: complex

Materialized
Update: propagate
Base → view: costly view maintenance
View → base: ambiguous
Query: simple

Query vs. update: the database tradeoff
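The tradeoff between the two kinds of views can be sketched as two small classes. This is a minimal illustration in Python; the class names, the sample base relation, and the naive full recomputation in refresh are assumptions, and real systems use incremental view maintenance instead.

```python
class IntentionalView:
    """Virtual view: nothing is stored, the query is recomputed on each read."""
    def __init__(self, base, query):
        self.base, self.query = base, query

    def read(self):
        return self.query(self.base)

class MaterializedView:
    """Stored view: reads are cheap, base updates must be propagated."""
    def __init__(self, base, query):
        self.base, self.query = base, query
        self.data = query(base)

    def read(self):
        return self.data

    def refresh(self):
        # Naive maintenance: recompute the whole view.
        self.data = self.query(self.base)

def titles(db):
    return sorted({t for (t, _) in db})

base = [("Casablanca", "Salle 1"), ("Vertigo", "Salle 2")]
virtual = IntentionalView(base, titles)
stored = MaterializedView(base, titles)

base.append(("Notorious", "Salle 1"))  # update the base
stale = stored.read()                  # the update is not propagated yet
fresh = virtual.read()                 # recomputed, always up to date
stored.refresh()
print(stale, fresh, stored.read())
```

The intentional view pays at query time, while the materialized view pays at update time.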

Integration: view over several bases

Intentional: mediator
Queries are complex

Materialized: warehouse
Updates are complex

Definitions
Global as view: v = f(db1, ..., dbn)
Local as view: dbi = fi(v) for each i
Arbitrarily complex constraints between the databases and the views
Sometimes called alignments between them

Optimization


The reasons for the success

The queries are based on relational calculus, a logical language that is simple and understandable by people, especially in variants such as SQL
A calculus query can easily be translated into an algebra expression that is simple to evaluate (Codd's theorem)
Relational algebra is a limited model of computation (it does not allow computing arbitrary functions). That is why it is possible to optimize the evaluation of algebraic expressions
Finally, for this language, parallelism allows scaling to very large databases (class AC0)

Rewriting algebraic expressions

(a) For each f in Film, for each s in Séance do ...: complexity in n²
(b) If few tuples pass the selection: complexity in n
(c) Using the index: constant complexity
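The three evaluation strategies can be compared by counting elementary steps on synthetic data. This is a minimal sketch in Python; the data generator, the step counters, and the use of dictionaries as stand-ins for indexes are assumptions for illustration.

```python
from collections import defaultdict

# Synthetic data: n films with one showing each; only film0 stars Bogart.
n = 200
film = [(f"film{i}", "dir", "Bogart" if i == 0 else "other") for i in range(n)]
seance = [(f"film{i}", f"salle{i}", "20:00") for i in range(n)]

def nested_loop(actor):
    """(a) Compare every Film tuple with every Seance tuple."""
    steps, out = 0, []
    for (t, d, a) in film:
        for (t2, s, h) in seance:
            steps += 1
            if a == actor and t == t2:
                out.append((s, h))
    return out, steps

def selection_first(actor):
    """(b) Filter Film first, then scan Seance only for the selected titles."""
    steps, out, selected = 0, [], []
    for (t, d, a) in film:
        steps += 1
        if a == actor:
            selected.append(t)
    for t in selected:
        for (t2, s, h) in seance:
            steps += 1
            if t == t2:
                out.append((s, h))
    return out, steps

# (c) Dictionaries play the role of indexes on Film.acteur and Seance.titre.
film_by_actor = defaultdict(list)
for (t, d, a) in film:
    film_by_actor[a].append(t)
seance_by_title = defaultdict(list)
for (t, s, h) in seance:
    seance_by_title[t].append((s, h))

def with_index(actor):
    steps, out = 0, []
    for t in film_by_actor.get(actor, []):
        steps += 1
        out.extend(seance_by_title.get(t, []))
    return out, steps

print(nested_loop("Bogart")[1], selection_first("Bogart")[1], with_index("Bogart")[1])
```

All three plans return the same answer; only the amount of work differs.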

A possible query plan (without index)

Optimization
Using access structures
Hash
B-trees

Using sophisticated algorithms
Join

Cost evaluation to select an execution plan
Problem: the search space is too large
Technique: rewrite queries based on heuristics to explore only part of it

Optimization & scaling using parallelism

These problems can greatly benefit from parallelism
Typically divide the data
This is not true for all problems

[Figure: the data is divided and a filter f is applied to each part]
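Dividing the data and filtering each part independently can be sketched as follows. This is a minimal illustration in Python using the standard concurrent.futures module; the function name parallel_filter and the round-robin split are assumptions. Threads in CPython mainly illustrate the structure here, since CPU-bound work is not actually sped up by them.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_filter(rows, predicate, n_parts=4):
    """Split rows into n_parts chunks, filter each chunk on its own, concatenate."""
    chunks = [rows[i::n_parts] for i in range(n_parts)]

    def filter_chunk(chunk):
        return [r for r in chunk if predicate(r)]

    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        parts = list(pool.map(filter_chunk, chunks))
    return [r for part in parts for r in part]

rows = list(range(20))
evens = sorted(parallel_filter(rows, lambda x: x % 2 == 0))
print(evens)
```

A filter parallelizes well because each tuple is tested independently of all the others; this does not hold for every problem.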

Complexity and expressivity

Complexity
http://www.cs.rice.edu/~vardi/papers/sigmod08.pdf

Complexity: for a fixed query q,
Testing, given (I, t), whether t is in q(I), as a function of the size of I
Focus on Boolean queries so as not to depend on the output size

Separate the dependency on the sizes of data/query
Very different; if mixed, the dependency on the query typically hides the dependency on the data
Data complexity: as a function of the size of the data (query fixed)
Query complexity: as a function of the size of the query (data fixed)

Data complexity

Relational calculus is in logspace
The test can be performed using space logarithmic in the size of the data
This is primarily because the arity of tables is fixed, so a tuple uses log space

logspace ⊆ NC (⊆ ptime)
Good potential for parallelization; see further

Query complexity

The complexity is pspace
Intuition: an intermediate result may be very large if it is the join of many relations
Depends more on the number of variables used in the query than on its actual size
Naive evaluation of πA(R ⋈ S) requires more space than that of πA(R) ∩ πA(S)

Polynomial in the treewidth
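The space difference between the two plans shows up on a small example. This is a minimal sketch in Python, assuming R(A, B) and S(A, C) share only the attribute A, which is the case in which the two expressions are equivalent; the relation contents are invented for illustration.

```python
# Assumption: R(A, B) and S(A, C) share only the attribute A.
R = {(a, b) for a in range(3) for b in range(50)}
S = {(a, c) for a in range(1, 4) for c in range(50)}

# Naive plan: materialize the join, then project on A.
join = {(a, b, c) for (a, b) in R for (a2, c) in S if a == a2}
naive = {a for (a, b, c) in join}

# Rewritten plan: project each relation first, then intersect.
pushed = {a for (a, b) in R} & {a for (a, c) in S}

print(len(R), len(S), len(join), naive, pushed)
```

Both plans return the same set, but the naive plan builds an intermediate result much larger than either input.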

Parallel complexity

Data complexity: constant parallel time, AC0
A complexity class used in circuit complexity
The problems that may be solved with circuits of constant depth and polynomial size, with unlimited fan-in AND gates and OR gates

Expressivity
One cannot compute transitive closure
Add a fixpoint
Inflationary: fixpoint
Or not: while

Vardi theorem: with an order on the domain, fixpoint = ptime and while = pspace
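Transitive closure is the standard example of a query that needs a fixpoint. This is a minimal sketch in Python of an inflationary iteration: derived pairs are only ever added, and the loop stops when nothing new appears. The function name and the sample edge set are hypothetical.

```python
def transitive_closure(edges):
    """Inflationary fixpoint: add derived pairs until nothing new appears."""
    closure = set(edges)
    while True:
        new = {(x, z) for (x, y) in closure for (y2, z) in edges if y == y2}
        if new <= closure:
            return closure
        closure |= new

edges = {(1, 2), (2, 3), (3, 4)}
tc = transitive_closure(edges)
print(sorted(tc))
```

The number of iterations depends on the data, which is why no single relational algebra expression computes this for all inputs.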

Expressivity in the absence of order

One cannot test whether a relation has an even number of tuples
Abiteboul-Vianu
Characterization of what can be computed with fixpoint and while
Theorem: fixpoint = while iff ptime = pspace

Conclusion


Conclusion
And then: always question everything
Revisit the models, languages, principles

Why?
To scale to ever more data and queries
To support extreme applications that cannot be supported by standard technology: Google, Visa transactions
To facilitate application development
To offer more in terms of performance, reliability, security, etc.

Conclusion

Relational model
Entries in relations = atomic values
Data are regular
ACID
Universal
Data are persistent
Data are static
Constraints are static (FDs, etc.)

Beyond
Entries are sets of values
Missing data, probabilistic data
Semistructured
Weaker concurrency
Specialized: noSQL
Queries on data flows
Data & behavior: object databases
Active databases, triggers

Thank you!
