Gluster Cache Tier

Authors:
Dan Lambright <dlambrig@redhat.com>
Joseph Fernandes <josferna@redhat.com>

The cache tiering feature described here is a first step towards implementing the data classification vision [3]. The feature adds caching capabilities to volumes via a new "tiering" translator. The code is structured such that over time it can evolve to do more than caching. The code shall facilitate swapping out one set of functionality (caching) for another (e.g. data placement according to file type, archival). Caching is one application of tiering.

The tiering translator is based on DHT and rebalance logic. It uses large portions of the DHT translator, structured in the same manner as the NUFA and switch translators. The design borrows ideas from Ceph's tiering implementation [4].

The translator logically splits a volume into two subvolumes: "hot" and "cold". The "hot subvolume" is treated as a cache for the "cold subvolume".

Possible use cases:

1. Hot subvolumes are SSDs; cold subvolumes are normal disks.
2. Hot subvolumes are normal disks; cold subvolumes are erasure coded.
3. The hot subvolume is backed up more frequently than the cold tier.

A file may reside on either a hot subvolume or a cold subvolume. A file cannot be split between the two subvolumes. Directories exist on all subvolumes according to existing DHT logic.

An existing Gluster volume may be modified via a CLI command to have a hot cache. For example, an SSD volume could be dynamically added to an existing erasure coded volume. The SSDs on the hot tier may be replicated/distributed as in any normal volume. When a hot volume is added to an existing cold volume, DHT's fix-layout procedure is invoked to synchronize directories between the hot and cold subvolumes, and the data movement process is started. This process resembles the remove-brick and add-brick logic.

A cached Gluster volume may be detached. In this case, the data movement process changes state to move all data from the hot tier to the cold one, in much the same way as when a brick is removed. When all data has been moved, a commit CLI command may complete the detach operation.

On lookups, the hot subvolume is checked first. If the file is not found, the cold subvolume is checked. This resembles DHT's behaviour, where a negative result received from the subvolume selected by the hash table causes the other subvolumes to be consulted.
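The lookup order can be modelled in a few lines. This is a minimal sketch, not translator code: subvolumes are assumed to be plain dictionaries mapping a path to metadata, and `TierLookup`, `lookup` and the `probes` counter are hypothetical names used only to show the cost of a negative result.

```python
from typing import Dict, Optional, Tuple

Metadata = Dict[str, int]


class TierLookup:
    """Hypothetical toy model of the tier lookup order: hot first, then cold."""

    def __init__(self, hot: Dict[str, Metadata], cold: Dict[str, Metadata]) -> None:
        self.hot = hot    # path -> metadata on the hot subvolume
        self.cold = cold  # path -> metadata on the cold subvolume
        self.probes = 0   # number of subvolumes consulted so far

    def lookup(self, path: str) -> Tuple[Optional[str], Optional[Metadata]]:
        # The hot subvolume is always consulted first.
        self.probes += 1
        if path in self.hot:
            return "hot", self.hot[path]
        # A miss on the hot tier costs an extra probe of the cold tier, much as
        # DHT consults other subvolumes after a miss on the hashed subvolume.
        self.probes += 1
        if path in self.cold:
            return "cold", self.cold[path]
        # Negative result on both tiers (ENOENT in a real file system).
        return None, None
```

A lookup for a file that exists nowhere always costs two probes; that is the negative-result overhead the design accepts in exchange for not maintaining a location index.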

The advantage of checking the hot subvolume first is that it eliminates the need to maintain a live index mapping the location of files between tiers. The disadvantage is that it adds overhead on negative results. A future implementation could employ an index to remove that overhead. Such an index would not necessarily need to be persistent; it could be a hint of the data's location.

On creates, either the hot or the cold subvolume would always be selected, according to configuration. If the option is not set, new files are always created on the hot subvolume.

On writes to existing files, data is directed to the subvolume containing the file. If the hot subvolume is full (as defined by a configurable watermark), the write is forwarded to the cold subvolume.
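The create and write placement rules can be sketched as two small decision functions. This is a minimal sketch under stated assumptions: the configuration option is modelled as an optional string (`None` meaning "not set"), the watermark as a percentage of the hot subvolume's capacity, and all function names are hypothetical. How an existing hot copy is handled when a write is forwarded is not specified in the text and is left out.

```python
from typing import Optional, Set


def choose_create_tier(create_option: Optional[str]) -> str:
    """Pick the subvolume for a new file (hypothetical option values)."""
    if create_option is None:
        return "hot"  # option not set: new files always go to the hot tier
    if create_option not in ("hot", "cold"):
        raise ValueError(f"unknown tier {create_option!r}")
    return create_option


def hot_above_watermark(used: int, capacity: int, watermark_pct: float) -> bool:
    """True when hot-tier utilization, as a percentage, reaches the watermark."""
    return used * 100.0 / capacity >= watermark_pct


def choose_write_tier(path: str, hot_files: Set[str], hot_used: int,
                      hot_capacity: int, watermark_pct: float) -> str:
    """Direct a write to the subvolume holding the file, unless hot is 'full'."""
    if path not in hot_files:
        return "cold"  # the file lives on the cold subvolume
    if hot_above_watermark(hot_used, hot_capacity, watermark_pct):
        return "cold"  # hot subvolume is full: forward the write to cold
    return "hot"
```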

On reads or overwrites, if the file resides on the cold subvolume, the file may become a candidate for promotion to the hot subvolume. A file would be accessed one or more times, according to a configurable policy, before being moved to the hot subvolume.

When a cache is added to an existing volume, a data movement thread is started on the servers. This is done in a similar manner as when removing a brick; the thread runs modified DHT migration logic. The data mover is responsible for migrating files between the hot and cold subvolumes. Roughly, a file is migrated according to three factors:

1. heat: how often the file has been accessed.
2. fullness: whether the hot subvolume's capacity has been reached.
3. user action: an admin may request to flush all data from the hot to the cold tier.
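The three factors can be combined into a single per-file decision. The sketch below is hypothetical: the text does not say how the factors are prioritized, so this version assumes that a user flush overrides everything, that a full hot tier blocks promotion and demotes files with no recent accesses, and that otherwise a cold file is promoted once its heat reaches a threshold. `FileState.heat` stands for "accesses in the current window", also an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    PROMOTE = "promote"
    DEMOTE = "demote"


@dataclass
class FileState:
    tier: str  # "hot" or "cold"
    heat: int  # accesses in the current window (hypothetical metric)


def decide(f: FileState, hot_full: bool, flush_requested: bool,
           promote_threshold: int = 1) -> Action:
    # Factor 3, user action: a flush sends every hot file to the cold tier.
    if flush_requested:
        return Action.DEMOTE if f.tier == "hot" else Action.NONE
    if f.tier == "hot":
        # Factor 2, fullness: make room by demoting files that are not warm.
        if hot_full and f.heat == 0:
            return Action.DEMOTE
        return Action.NONE
    # Factor 1, heat: promote a sufficiently hot cold file, if there is room.
    if f.heat >= promote_threshold and not hot_full:
        return Action.PROMOTE
    return Action.NONE
```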

The steps to move files between the subvolumes leverage existing rebalance logic. However, the existing migration logic shall be generalized to support different use cases: the triggers to migrate a file shall be driven by the application. This generalization is called the Data Movement Framework (DM) and is described in more detail below.

Promotion refers to files slated to move from the cold volume to the hot volume. The default policy is to promote any file that is accessed on the cold tier to the hot tier. A future policy may be to maintain a hit count per file, and only promote files that exceed such a count.
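Both promotion policies fit one small class. In this minimal sketch (class and parameter names are hypothetical), `min_hits=1` reproduces the default policy of promoting on any access, while a larger value models the possible future hit-count policy. Counts are kept in memory and reset once a file is selected, both of which are assumptions.

```python
from collections import Counter


class PromotionPolicy:
    """Decides when a file accessed on the cold tier should be promoted."""

    def __init__(self, min_hits: int = 1) -> None:
        if min_hits < 1:
            raise ValueError("min_hits must be >= 1")
        self.min_hits = min_hits  # 1 = default policy: promote on any access
        self.hits: Counter = Counter()  # gfid -> accesses seen on the cold tier

    def on_cold_access(self, gfid: str) -> bool:
        """Record one read/overwrite; return True if the file should be promoted."""
        self.hits[gfid] += 1
        if self.hits[gfid] >= self.min_hits:
            del self.hits[gfid]  # reset once the file is handed to the mover
            return True
        return False
```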

Demotion refers to files slated to move from the hot subvolume to the cold subvolume. Multiple policies shall be accepted as plugins to the DM to drive demotion. The default policy is to track all files in the hot tier in a volatile data structure and periodically choose any such files that have not been accessed within some period. That duration, and the rate at which files are demoted, shall be tunable parameters.
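The default demotion policy can be sketched with an LRU list, here Python's `OrderedDict` kept in access order. This is a minimal, volatile, in-memory sketch; the class name and the `idle_seconds` and `max_per_run` parameters are hypothetical stand-ins for the two tunables named in the text (the idle duration and the demotion rate).

```python
import time
from collections import OrderedDict
from typing import Callable, List


class LRUDemoter:
    """Tracks hot-tier files in LRU order and picks idle ones for demotion."""

    def __init__(self, idle_seconds: float, max_per_run: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_seconds = idle_seconds  # tunable: idle time before demotion
        self.max_per_run = max_per_run    # tunable: demotion rate per cycle
        self.clock = clock
        self._lru: "OrderedDict[str, float]" = OrderedDict()  # gfid -> last access

    def touch(self, gfid: str) -> None:
        # Refresh the timestamp and move the file to the most-recent end.
        self._lru[gfid] = self.clock()
        self._lru.move_to_end(gfid)

    def forget(self, gfid: str) -> None:
        self._lru.pop(gfid, None)

    def pick_demotions(self) -> List[str]:
        now = self.clock()
        victims: List[str] = []
        # Least recently used entries come first, so stop at the first recent one.
        for gfid, last in self._lru.items():
            if now - last < self.idle_seconds or len(victims) >= self.max_per_run:
                break
            victims.append(gfid)
        for gfid in victims:
            del self._lru[gfid]
        return victims
```

Memory grows with the number of hot files, which is exactly the limitation the next paragraph raises for large hot volumes.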

One mechanism would be to track each file residing in the hot volume in an LRU list. This may be impractical if the hot volume is large, due to high memory requirements, but acceptable otherwise.
Another mechanism could be to track files recently touched within a period of time in a space-friendly Bloom filter. Some number of such Bloom filters could be maintained, one per period, and any file not captured in any of them, down to the oldest filter, would be demoted. This solution may be difficult to tune, as it is unclear how to select the number of filters and their refresh rate in a way that best utilizes the total volume capacity and ingress of I/O. However, it would have less overhead than an LRU list and would therefore be a better choice for large hot volumes.
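The rotating Bloom filter idea can be sketched as follows. Assumptions: one filter per period, the newest filter records current accesses, `rotate()` is called once per period and drops the oldest filter, and bit positions come from SHA-256 with double hashing. Sizes and names are hypothetical. Because Bloom filters have no false negatives, a false positive can only keep a file hot longer; it never demotes a recently used file.

```python
import hashlib
from collections import deque
from typing import Deque, Iterable, List


class BloomFilter:
    def __init__(self, num_bits: int = 8192, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> List[int]:
        # Double hashing: derive k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class RotatingAccessTracker:
    """One Bloom filter per period; the newest records current accesses."""

    def __init__(self, num_filters: int, **bloom_args: int) -> None:
        self.bloom_args = bloom_args
        self.filters: Deque[BloomFilter] = deque(
            [BloomFilter(**bloom_args) for _ in range(num_filters)],
            maxlen=num_filters)

    def record_access(self, gfid: str) -> None:
        self.filters[-1].add(gfid)

    def rotate(self) -> None:
        # Start a new period; deque(maxlen=...) discards the oldest filter.
        self.filters.append(BloomFilter(**self.bloom_args))

    def demotion_candidates(self, hot_files: Iterable[str]) -> List[str]:
        # A file absent from every filter was not touched in any tracked period.
        return [f for f in hot_files if not any(f in bf for bf in self.filters)]
```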

Demotion is also initiated if the hot volume's utilization crosses a watermark, i.e. is close to being full. The watermark shall be a tunable parameter expressed as a percentage of the total size.
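Watermark-triggered demotion can be expressed as a small selection function. The sketch assumes that files are given as `gfid -> (size_bytes, last_access)`, that the least recently accessed files go first, and that demotion stops as soon as projected utilization is back at or below the watermark. All three choices are assumptions; the text only fixes the watermark as a percentage of total size.

```python
from typing import Dict, List, Tuple


def watermark_demotions(files: Dict[str, Tuple[int, float]], capacity: int,
                        watermark_pct: float) -> List[str]:
    """Return gfids to demote until hot-tier usage is at or below the watermark."""
    used = sum(size for size, _ in files.values())
    limit = capacity * watermark_pct / 100.0
    victims: List[str] = []
    # Oldest access first (hypothetical ordering choice).
    for gfid, (size, _last) in sorted(files.items(), key=lambda kv: kv[1][1]):
        if used <= limit:
            break
        victims.append(gfid)
        used -= size
    return victims
```

For example, with 90 of 100 bytes used and a 75% watermark, demoting the single oldest 40-byte file is enough.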

The data structure which maintains state on the hot volume shall be persistent. This will avoid cluttering the hot volume with data that may never get demoted in the case of a restart after failures.

The overhead of migration may be large. Read caching will be supported, in which data on the hot tier is not persistent. Such tiers need not support replication, and their migration overhead will be lower. Data on the hot tier would periodically be invalidated (deleted) by the DM rather than migrated (demoted). Writes may be written to both the hot tier and the cold tier. Cold data would be copied to the hot tier based on usage patterns.
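A read-cache tier differs from a migrating tier mainly in that the hot copy is disposable. In this minimal sketch (hypothetical class; both tiers are in-memory dictionaries), writes go through to the cold tier and also update any cached copy, cold data is copied (not moved) to the hot tier after `promote_after` reads, an assumed stand-in for "usage patterns", and invalidation simply deletes the hot copy.

```python
from typing import Dict


class ReadCacheTier:
    """Hot tier holds disposable copies; the cold tier always has the data."""

    def __init__(self, promote_after: int = 2) -> None:
        self.cold: Dict[str, bytes] = {}     # authoritative, persistent copy
        self.hot: Dict[str, bytes] = {}      # non-persistent cached copy
        self.promote_after = promote_after   # hypothetical usage threshold
        self._reads: Dict[str, int] = {}

    def write(self, path: str, data: bytes) -> None:
        # Write-through: cold always gets the data; a cached copy is updated
        # as well, so the two tiers never disagree.
        self.cold[path] = data
        if path in self.hot:
            self.hot[path] = data

    def read(self, path: str) -> bytes:
        if path in self.hot:
            return self.hot[path]
        data = self.cold[path]
        self._reads[path] = self._reads.get(path, 0) + 1
        if self._reads[path] >= self.promote_after:
            self.hot[path] = data   # copy, not move: cold keeps its copy
            del self._reads[path]
        return data

    def invalidate(self, path: str) -> None:
        # DM-driven invalidation: drop the cached copy; nothing to migrate back.
        self.hot.pop(path, None)
```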

The first version of the tiering feature will not work with snapshots (TBD). A cache tier may first be detached from a volume in order to take a snapshot.

To rebalance (add/remove a brick), the cache must be paused via the CLI. Promotion/demotion activity cannot run in parallel with rebalancing.

Geo-replication cannot be enabled while a tier is being attached or detached.

Data Movement Framework (DM)

The Data Movement Framework will be responsible for the movement of data within and out of Gluster. It will be present on each node of Gluster and will have two components:

1. Data Movement Trigger (DMT):
   This triggers the data movement of an object/file from a Source Storage Unit (SSU) to a Destination Storage Unit (DSU). The storage units can be considered as Gluster bricks/volumes or Ceph or OSS storage units. The DMT creates a Data Movement Request (DMR).
   The two triggers of data movement are as follows:
   1. IO Path Trigger:
      This trigger is set in the IO path, i.e. in glusterfsd, and fires when a file meets a specified Data Movement Rule. For example, the file has not been accessed for a long time, and the Tiering Rule says that it needs to be made read-only and pushed to a slower tier. Once the file is selected, a Data Movement Request is submitted to the Data Movement Service.
   2. Scanner Trigger:
      This trigger walks through the specified Source Storage Unit(s) and selects objects/files depending on the specified Data Movement Rule. Once an object/file is selected, it creates a Data Movement Request and submits it to the Data Movement Service.
   The behavior of the triggers is defined by the trigger plugins:
   1. DM Selection Plugin:
      This plugin defines the selection criteria of a file/object for data movement. It refers to the Data Movement Rules. For example, one rule may say move *.txt from one tier to another. Another rule may define rebalancer rules, i.e. correctness of layout, proper use of space and so on. This plugin is used by both the IO Path and Scanner triggers.
   2. DM Traverse Plugin:
      This plugin defines the traversal mechanism/algorithm that is used to walk through the storage unit. For example, one traverse plugin may actually walk the file system, while another may just use a prepopulated list of files/objects (like a Bloom filter), or something else. This is used only by the Scanner Trigger.
   Walking the file system to find files that meet the criteria for movement is neither efficient nor performant, as we have to visit every directory and dentry in the namespace. We need a more efficient method to find files that meet the movement criteria:
   1. Using a memory-based Bloom filter/red-black tree:
      Pros:
         1. Fast, as we have the entries in memory.
         2. No extra storage space required.
      Cons:
         1. Not scalable: the number of entries is limited.
         2. Not persistent.
         3. Not crash consistent.
   2. Using a database/store (SQL or NoSQL):
      Pros:
         1. Persistent.
         2. Crash consistent (tunable).
         3. Scalable (depends on the DB used).
         4. Choice of data store: SQL or NoSQL.
      Cons:
         1. Uses third-party software.
         2. Requires storage space.
         3. Needs to be consistent across replicas.
         4. Less performant compared to a memory store.
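To illustrate the database/store option, the sketch below keeps per-file access records in SQLite, one possible SQL choice (the text leaves SQL vs NoSQL open). The table name `file_heat`, its columns and the helper functions are hypothetical. The UPSERT clause requires SQLite 3.24 or later.

```python
import sqlite3
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS file_heat (
               gfid        TEXT PRIMARY KEY,
               last_access REAL NOT NULL,
               hits        INTEGER NOT NULL DEFAULT 0
           )""")
    return conn


def record_access(conn: sqlite3.Connection, gfid: str, ts: float) -> None:
    # Insert a new row, or bump the hit count and timestamp of an existing one.
    conn.execute(
        """INSERT INTO file_heat (gfid, last_access, hits) VALUES (?, ?, 1)
           ON CONFLICT(gfid) DO UPDATE SET
               last_access = excluded.last_access,
               hits = hits + 1""",
        (gfid, ts))
    conn.commit()


def idle_files(conn: sqlite3.Connection, older_than: float, limit: int) -> List[str]:
    """Oldest-first list of files not accessed since `older_than`."""
    rows = conn.execute(
        "SELECT gfid FROM file_heat WHERE last_access < ? "
        "ORDER BY last_access LIMIT ?",
        (older_than, limit))
    return [row[0] for row in rows]
```

A scanner can then ask the store for candidates instead of walking the namespace; the cost moves to keeping the store fed and, as noted above, consistent across replicas.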
   The data store can be fed synchronously (directly from a server xlator like changelog) or asynchronously (by parsing the changelog files and then feeding the data store).
   Synchronous input will be fast and the freshness of data will be high, but there is now an overhead on the FOP path to write to the data store. The overhead can be reduced via data store write optimizations.
   Asynchronous input will be slow and the freshness of data will suffer, but the FOP path will not have the overhead of feeding the data store.
   The changelog lag in the FOP path should also be considered here. For freshness, the changelog can maintain a cached view of recently changed files, based on a Bloom filter or red-black tree.
   The scanner will be agnostic to the underlying data store, i.e. memory, data store or both (changelog + data store + read cache), because the scanner will only deal with APIs.
   The database/store API can be used by a variety of data maintenance services, such as Bitrot scanners, Compliance, Encryption and Dedupe scanners, and also by Backup ISVs.

   Note:
   In a tiering scenario, the Data Movement Triggers are the promotion/demotion mechanism. The DM Selection Plugin provides the selection logic for promotion/demotion, i.e. identifying an object/file to be promoted/demoted.
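The division of labour between the Scanner Trigger and its plugins can be illustrated with plain callables. This is a minimal sketch: all names (`glob_rule`, `walk_traverse`, `list_traverse`, `scanner_trigger`, `Request`) are hypothetical, plugins are modelled as Python functions rather than loadable modules, and `Request` is a cut-down stand-in for a DMR.

```python
import fnmatch
import os
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

# Hypothetical plugin signatures: a selection plugin is a predicate on a path;
# a traverse plugin yields candidate paths from a storage unit.
SelectionPlugin = Callable[[str], bool]
TraversePlugin = Callable[[str], Iterable[str]]


def glob_rule(pattern: str) -> SelectionPlugin:
    """Selection plugin for a rule such as 'move *.txt from one tier to another'."""
    return lambda path: fnmatch.fnmatch(os.path.basename(path), pattern)


def walk_traverse(root: str) -> Iterator[str]:
    """Traverse plugin that really walks the file system (the slow option)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def list_traverse(paths: List[str]) -> TraversePlugin:
    """Traverse plugin backed by a prepopulated list instead of a walk."""
    return lambda _root: iter(paths)


@dataclass
class Request:  # minimal stand-in for a Data Movement Request
    source: str
    ssu: str
    dsu: str


def scanner_trigger(ssu: str, dsu: str, traverse: TraversePlugin,
                    select: SelectionPlugin,
                    submit: Callable[[Request], None]) -> None:
    """Walk the SSU with the traverse plugin; submit a request per selected file."""
    for path in traverse(ssu):
        if select(path):
            submit(Request(path, ssu, dsu))
```

In a tiering setup, `select` would carry the promotion or demotion logic, and `submit` would hand requests to the Data Movement Service.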

   Data Movement Request (DMR):
   A DMR consists of the following information:
      Source object/file ID: GFID, file name or object ID.
      Source Storage Unit ID: brick ID, volume ID or Ceph/OSS storage unit ID.
      Destination Storage Unit ID: brick ID, volume ID or Ceph/OSS storage unit ID.
      Data Movement Rule ID: the data movement rule that is applicable to the file/object that is to be moved.
      Data Movement Op ID: the operation ID generated by the management daemon (glusterd) for a specific data movement task that has been initiated. If the data movement is not initiated by glusterd, this will be NULL or 0, i.e. the data movement is a result of the IO Path Trigger.
   Note:
   These parameters will help the Data Movement Service decide which plugins to load.
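The DMR fields map naturally onto an immutable record. In this sketch the class, the `from_io_path` property and the rule-ID plugin lookup are hypothetical; the text only says the parameters help the DMS choose plugins, so keying the choice on the rule ID is an assumption. NULL is modelled as `None`, and both `None` and `0` mark an IO Path Trigger request.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DataMovementRequest:
    source_id: str               # GFID, file name or object ID
    ssu_id: str                  # brick ID, volume ID or Ceph/OSS storage unit ID
    dsu_id: str                  # same kinds of IDs as the SSU
    rule_id: str                 # applicable Data Movement Rule
    op_id: Optional[int] = None  # set by glusterd; None/0 = IO Path Trigger

    @property
    def from_io_path(self) -> bool:
        """True when the request was not part of a glusterd-initiated task."""
        return not self.op_id  # covers both None (NULL) and 0


def plugins_for(dmr: DataMovementRequest, registry: Dict[str, str],
                default: str = "file-copy") -> str:
    """Pick a plugin set by rule ID (hypothetical lookup scheme)."""
    return registry.get(dmr.rule_id, default)
```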

2. Data Movement Service (DMS):
   This is responsible for the actual movement of data. The Data Movement Triggers submit Data Movement Requests to this service, and it processes these requests using the appropriate plugins, as indicated by the Data Movement Request. There will be one DMS running on each node of Gluster.
   The Data Movement Service has the following components:
   1. DMR Pipelines/Queues:
      This represents the queue(s) where DMRs first land in the Data Movement Service when the Data Movement Triggers submit them. This may be a single queue or several queues, depending on how the DMS is configured, i.e. one queue for all the movement jobs, one queue per Data Movement ID, etc.
   2. DM Thread/Task Coordinator:
      This is the first consumer of the DMRs that are submitted to the DMR queue. It may be a single-threaded or multithreaded entity, depending on how it is configured, i.e. one instance of the DM Thread Coordinator for all data movement jobs, one instance per Data Movement ID, etc. The DM Thread Coordinator picks a DMR from the DMR queue and then spawns a DM Thread to process the DMR. It also updates the DM Stat with the new DMR, i.e. records that an object/file has been added to the data movement job.
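The queue-plus-coordinator arrangement can be sketched with Python's `queue` and `threading` modules. This minimal sketch assumes a single DMR queue and a single coordinator thread that spawns one DM thread per request, as the text describes; a real service would more likely bound concurrency with a pool. `Coordinator`, `DMStatCounter` and the `None` shutdown sentinel are hypothetical.

```python
import queue
import threading
from collections import Counter
from typing import Any, Callable, List


class DMStatCounter:
    """Tiny stand-in for DM Stat: counts requests per state."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.counts: Counter = Counter()

    def update(self, state: str) -> None:
        with self._lock:
            self.counts[state] += 1


class Coordinator:
    def __init__(self, move: Callable[[Any], None], stat: DMStatCounter) -> None:
        self.requests: "queue.Queue[Any]" = queue.Queue()  # the DMR queue
        self.move = move                                    # work done per DMR
        self.stat = stat
        self.workers: List[threading.Thread] = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def submit(self, dmr: Any) -> None:
        self.requests.put(dmr)  # called by the triggers

    def stop(self) -> None:
        self.requests.put(None)  # sentinel: no more requests
        self._thread.join()
        for worker in self.workers:
            worker.join()

    def _run(self) -> None:
        while True:
            dmr = self.requests.get()
            if dmr is None:
                return
            self.stat.update("queued")  # file/object added to the job
            worker = threading.Thread(target=self._process, args=(dmr,))
            self.workers.append(worker)
            worker.start()

    def _process(self, dmr: Any) -> None:
        try:
            self.move(dmr)
            self.stat.update("done")
        except Exception:
            self.stat.update("failed")
```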

   3. DM Thread:
      The Data Movement Thread is the one which will actually do the movement of data from the source to the destination, using the following plugins:
      a. Pre Movement Action Plugin:
         This plugin is used for any task that needs to be done before the movement of the file/object, e.g. marking the file/object with special flags, notifying some service that the file is marked for movement, or creating the source data checksum.
      b. Post Movement Action Plugin:
         This plugin is used for any task that needs to be done after the movement of the file/object, e.g. validating the data using source and destination checksums, deleting the file from the source, or notifying some service that the data has been migrated.
      c. Data Movement Plugin:
         This plugin is used for the actual data movement, i.e. we can have an iterator which reads data sequentially from the source and then writes it sequentially to the destination, or offload the movement task to some other service and synchronously or asynchronously monitor its status until it is done (this will be discussed in detail later).
      d. Read Plugin:
         This plugin defines the way to read from the source. Since the source may be any storage unit, i.e. a Gluster brick/volume or a Ceph/OSS storage unit, the read mechanism will differ.
      e. Write Plugin:
         This plugin defines the way to write to the destination. Since the destination may be any storage unit, i.e. a Gluster brick/volume or a Ceph/OSS storage unit, the write mechanism will differ.
      The decision to use the appropriate plugin is taken by the DM Thread using the information in the DMR. All the plugins are defined by the use case specified by the DMR. If there is no action to be done, then the corresponding plugin need not be defined at all.
      Apart from the data movement, the DM Thread also notifies the DM Stat about the progress of the data movement. This may be synchronous or asynchronous.
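The DM Thread's use of its five plugins can be sketched as follows. The plugin signatures are hypothetical, not a Gluster API: read plugins yield byte chunks, write plugins consume an iterable of chunks, and the pre/post action plugins are optional hooks, matching "need not be defined at all". The source checksum is taken before the move and compared with a checksum of the bytes handed to the write plugin; a real post-movement plugin would more likely re-read the destination.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical plugin signatures (not an existing Gluster API):
ReadPlugin = Callable[[str], Iterator[bytes]]         # yields chunks of a file
WritePlugin = Callable[[str, Iterable[bytes]], None]  # must consume every chunk


@dataclass
class DMPlugins:
    read: ReadPlugin
    write: WritePlugin
    pre: Optional[Callable[[str], None]] = None        # pre-movement action
    post: Optional[Callable[[str, str], None]] = None  # post-movement action


def sha256_chunks(chunks: Iterable[bytes]) -> str:
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def sequential_copy(source: str, dest: str, p: DMPlugins) -> str:
    """Data movement plugin: stream read-plugin chunks into the write plugin.

    Returns the SHA-256 of the bytes handed to the write plugin.
    """
    h = hashlib.sha256()

    def tee(chunks: Iterator[bytes]) -> Iterator[bytes]:
        for chunk in chunks:
            h.update(chunk)
            yield chunk

    p.write(dest, tee(p.read(source)))
    return h.hexdigest()


def dm_thread(source: str, dest: str, p: DMPlugins) -> str:
    if p.pre is not None:
        p.pre(source)                             # e.g. mark the file
    src_sum = sha256_chunks(p.read(source))       # source checksum before moving
    moved_sum = sequential_copy(source, dest, p)
    if moved_sum != src_sum:                      # e.g. source changed mid-move
        raise IOError(f"checksum mismatch moving {source} -> {dest}")
    if p.post is not None:
        p.post(source, dest)                      # e.g. delete the source copy
    return moved_sum
```

Usage: with a dictionary as a toy storage unit, `read` can slice the stored bytes into chunks and `write` can join them back, and a `post` hook that deletes the source turns the copy into a migration.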

   4. DM Stat:
      This component keeps track of the data movement tasks that are done by the DMS. It provides the status and statistics of the data movement tasks that are in progress on the node. It provides different mechanisms to query it and respond in a desired format (for example, a JSON object). This can be used by the Gluster CLI and glusterd to get the status of ongoing or completed data movement tasks. The DM Stat may also keep a persistent copy of the status and statistics (this can be configured accordingly).
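A DM Stat that answers queries as JSON could look like the sketch below. The task states, field names and the `persist` method are hypothetical; `persist` writes to a temporary file and renames it with `os.replace`, so a reader such as a CLI never sees a half-written status file.

```python
import json
import os
import threading
from typing import Any, Dict, Optional


class DMStat:
    """Per-node record of data movement tasks, queryable as JSON."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # DM threads may report concurrently
        self._tasks: Dict[str, Dict[str, Any]] = {}

    def add(self, task_id: str, bytes_total: int) -> None:
        with self._lock:
            self._tasks[task_id] = {"state": "queued",
                                    "bytes_total": bytes_total,
                                    "bytes_moved": 0}

    def progress(self, task_id: str, bytes_moved: int) -> None:
        with self._lock:
            task = self._tasks[task_id]
            task["bytes_moved"] = bytes_moved
            done = bytes_moved >= task["bytes_total"]
            task["state"] = "completed" if done else "in-progress"

    def fail(self, task_id: str) -> None:
        with self._lock:
            self._tasks[task_id]["state"] = "failed"

    def query(self, state: Optional[str] = None) -> str:
        """JSON summary of all tasks, or only those in `state`."""
        with self._lock:
            tasks = {k: dict(v) for k, v in self._tasks.items()
                     if state is None or v["state"] == state}
        return json.dumps({"count": len(tasks), "tasks": tasks}, sort_keys=True)

    def persist(self, path: str) -> None:
        """Optional persistent copy of the current status."""
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(self.query())
        os.replace(tmp, path)  # atomic rename on POSIX file systems
```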

A special use case: BLOCK level data movement
Until now, we have discussed moving objects/files one by one from the source to the destination, by traversing the source storage units and reading/writing on a per-object/file basis. This is the use case when we are considering movement of selected files/objects and when the storage units are heterogeneous in nature, e.g.:
1. DAS (Direct Attached Storage), where the source and destination are on different nodes.
2. SAN storage from different hardware vendors.
3. The SSU is a Gluster brick/volume and the DSU is a Ceph/OSS storage unit.
Now, if we have the following scenario:
1. All data needs to be evacuated from the SSU to the DSU, and
2. the storage is DAS with the SSU and DSU on the same node, OR SAN storage from a common hardware vendor (or different but compatible vendors),
then we can use block level data movement by offloading the data movement to the block storage level, either via LVM utilities or utilities provided by the SAN hardware vendor for block level data movement.
The advantages of this approach are:
1. Block data movement is fast compared to file/object data movement.
2. The data movement happens internally in the SAN array or in the SAN network, and the IP network which is used by Gluster for IO will not be affected.
Block level data movement can be accommodated in the framework proposed above, i.e.:
1. The DM Trigger will be glusterd, which will identify the block-level-compatible SSU and DSU, raise the appropriate DMR and submit it to the DMS.
2. The DM Thread Coordinator will pick it from the DMR queue and spawn the DM Thread. The DM Thread will use the block utilities from LVM or the SAN vendor(s) and issue the data movement (via CLI or library) through the appropriate Data Movement Plugins. The DM Stat will be updated by the DM Thread Coordinator and the DM Thread on a regular basis with the progress of the data movement.
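For the LVM case, a block-level Data Movement Plugin could simply shell out to `pvmove`, which moves allocated physical extents from a source physical volume to a destination physical volume in the same volume group (`pvmove SourcePV [DestinationPV]`). This is a minimal sketch under the assumption that the SSU and DSU map to two such PVs. `BlockDMR` and `block_move_plugin` are hypothetical, the device names in the usage are examples only, and the default dry-run mode returns the command instead of running it.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class BlockDMR:
    source_pv: str  # physical volume backing the SSU (hypothetical mapping)
    dest_pv: str    # physical volume backing the DSU, same volume group


def lvm_pvmove_argv(req: BlockDMR) -> List[str]:
    # pvmove SourcePV DestinationPV: extents move at the block level,
    # without touching the IP network used by Gluster for IO.
    return ["pvmove", req.source_pv, req.dest_pv]


def block_move_plugin(req: BlockDMR, dry_run: bool = True) -> str:
    argv = lvm_pvmove_argv(req)
    if dry_run:
        return shlex.join(argv)  # show what would run (Python 3.8+)
    subprocess.run(argv, check=True)  # needs root and real LVM devices
    return "done"
```

Usage: `block_move_plugin(BlockDMR("/dev/sdb1", "/dev/sdc1"))` returns `"pvmove /dev/sdb1 /dev/sdc1"`. Progress reporting to the DM Stat would still have to be layered on top.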

Some Links:
http://wiki.linuxservertech.com/index.php?action=artikel&cat=3&id=158&artlang=en

[1] Bloom filter on Ceph
[2] Ceph tiering feature
[3] Data classification feature page
[4] http://ceph.com/docs/master/rados/operations/cachetiering/
