
Booting Up

Linux boot on x86-based hardware is set into motion when the BIOS loads the Master Boot Record (MBR) from the boot device. Code resident in the MBR looks at the partition table and reads a Linux bootloader such as GRUB, LILO, or SYSLINUX from the active partition. The final stage of the bootloader loads the compressed kernel image and passes control to it. The kernel uncompresses itself and turns on the ignition.


Figure 2.1. Linux boot sequence on x86-based hardware.

x86-based processors have two modes of operation, real mode and protected mode. In real mode, you can access only the first 1MB of memory, that too without any protection. Protected mode is sophisticated and lets you tap into many advanced features of the processor such as paging. The CPU has to pass through real mode en route to protected mode. This road is a one-way street, however. You can't switch back to real mode from protected mode.

The first-level kernel initializations are done in real mode assembly. Subsequent startup is performed in protected mode by the function start_kernel() defined in init/main.c, the source file you modified in the previous chapter. start_kernel() begins by initializing the CPU subsystem. Memory and process management are put in place soon after. Peripheral buses and I/O devices are started next. As the last step in the boot sequence, the init program, the parent of all Linux processes, is invoked. Init executes user-space scripts that start necessary kernel services. It finally spawns terminals on consoles and displays the login prompt.

Kernel Mode and User Mode


Some operating systems, such as MS-DOS, always execute in a single CPU mode, but UNIX-like operating systems use dual modes to effectively implement time-sharing. On a Linux machine, the CPU is either in a trusted kernel mode or in a restricted user mode. All user processes execute in user mode, whereas the kernel itself executes in kernel mode. Kernel mode code has unrestricted access to the entire processor instruction set and to the full memory and I/O space. If a user mode process needs these privileges, it has to channel requests through device drivers or other kernel mode code via system calls. User mode code is allowed to page fault, however, whereas kernel mode code isn't.

Process Context and Interrupt Context

Kernel code that services system calls issued by user applications runs on behalf of the corresponding application processes and is said to execute in process context. Interrupt handlers, on the other hand, run asynchronously in interrupt context. Process contexts are not tied to any interrupt context and vice versa.

Kernel code running in process context is preemptible. An interrupt context, however, always runs to completion and is not preemptible. Because of this, there are restrictions on what can be done from interrupt context. Code executing from interrupt context cannot do the following:

- Go to sleep or relinquish the processor
- Acquire a mutex
- Perform time-consuming tasks
- Access user space virtual memory

HZ and Jiffies

jiffies holds the number of times the system timer has popped since the system booted. The kernel increments the jiffies variable HZ times every second.

The time_after() macro compares the current value of jiffies with the requested timeout, taking care to account for wraparound due to overflows. Related functions available for doing similar comparisons are time_before(), time_before_eq(), and time_after_eq().

jiffies is defined as volatile, which asks the compiler not to optimize access to the variable. This ensures that jiffies, which is updated by the timer interrupt handler during each tick, is reread during each pass through a polling loop. Two other functions that facilitate sleep-waiting are wait_event_timeout() and msleep(). Both of them are implemented with the help of schedule_timeout(). wait_event_timeout() is used when your code desires to resume execution if a specified condition becomes true or if a timeout occurs. msleep() sleeps for the specified number of milliseconds.

Spinlocks and Mutexes

A code area that accesses shared resources is called a critical section. Spinlocks and mutexes (short for mutual exclusion) are the two basic mechanisms used to protect critical sections in the kernel. Let's look at each in turn. A spinlock ensures that only a single thread enters a critical section at a time. Any other thread that desires to enter the critical section has to remain spinning at the door until the first thread exits. Note that we use the term thread to refer to a thread of execution, rather than a kernel thread.

Allocating Memory

kmalloc() is a memory allocation function that returns contiguous memory from ZONE_NORMAL. The prototype is as follows:

void *kmalloc(size_t size, gfp_t flags);

Here size is the number of bytes to allocate, and flags is a mode specifier. All supported flags are listed in include/linux/gfp.h (gfp stands for get free pages), but these are the commonly used ones:

1. GFP_KERNEL: Used by process context code to allocate memory. If this flag is specified, kmalloc() is allowed to go to sleep and wait for pages to get freed up.
2. GFP_ATOMIC: Used by interrupt context code to get hold of memory. In this mode, kmalloc() is not allowed to sleep-wait for free pages, so the probability of successful allocation with GFP_ATOMIC is lower than with GFP_KERNEL.

Because memory returned by kmalloc() retains the contents from its previous incarnation, there could be a security risk if it's exposed to user space. To get zeroed kmalloc'ed memory, use kzalloc().

If you need to allocate large memory buffers, and you don't require the memory to be physically contiguous, use vmalloc() rather than kmalloc():
void *vmalloc(unsigned long count);

Here count is the requested allocation size. The function returns kernel virtual addresses. vmalloc() enjoys bigger allocation size limits than kmalloc() but is slower and can't be called from interrupt context. Moreover, you cannot use the physically discontiguous memory returned by vmalloc() to perform Direct Memory Access (DMA). High-performance network drivers commonly use vmalloc() to allocate large descriptor rings when the device is opened.

Table 2.1. Summary of Data Structures

HZ -- include/asm-your-arch/param.h -- Number of times the system timer ticks in 1 second
loops_per_jiffy -- init/main.c -- Number of times the processor executes an internal delay-loop in 1 jiffy
timer_list -- include/linux/timer.h -- Used to hold the address of a routine that you want to execute at some point in the future
timeval -- include/linux/time.h -- Timestamp
spinlock_t -- include/linux/spinlock_types.h -- A busy-locking mechanism to ensure that only a single thread enters a critical section
semaphore -- include/asm-your-arch/semaphore.h -- A sleep-locking mechanism that allows a predetermined number of users to enter a critical section
mutex -- include/linux/mutex.h -- The new interface that replaces semaphore
rwlock_t -- include/linux/spinlock_types.h -- Reader-writer spinlock
page -- include/linux/mm_types.h -- Kernel's representation of a physical memory page

Table 2.2. Summary of Kernel Programming Interfaces

time_after()/time_after_eq()/time_before()/time_before_eq() -- include/linux/jiffies.h -- Compares the current value of jiffies with a specified future value
schedule_timeout() -- kernel/timer.c -- Schedules a process to run after a specified timeout has elapsed
wait_event_timeout() -- include/linux/wait.h -- Resumes execution if a specified condition becomes true or if a timeout occurs
DEFINE_TIMER() -- include/linux/timer.h -- Statically defines a timer
init_timer() -- kernel/timer.c -- Dynamically defines a timer
add_timer() -- include/linux/timer.h -- Schedules the timer for execution after the timeout has elapsed
mod_timer() -- kernel/timer.c -- Changes timer expiration
timer_pending() -- include/linux/timer.h -- Checks if a timer is pending at the moment
udelay() -- include/asm-your-arch/delay.h, arch/your-arch/lib/delay.c -- Busy-waits for the specified number of microseconds
rdtsc() -- include/asm-x86/msr.h -- Gets the value of the TSC on Pentium-compatible processors
do_gettimeofday() -- kernel/time.c -- Obtains wall time
local_irq_disable() -- include/asm-your-arch/system.h -- Disables interrupts on the local CPU
local_irq_enable() -- include/asm-your-arch/system.h -- Enables interrupts on the local CPU
local_irq_save() -- include/asm-your-arch/system.h -- Saves interrupt state and disables interrupts
local_irq_restore() -- include/asm-your-arch/system.h -- Restores interrupt state to what it was when the matching local_irq_save() was called
spin_lock() -- include/linux/spinlock.h, kernel/spinlock.c -- Acquires a spinlock
spin_unlock() -- include/linux/spinlock.h -- Releases a spinlock
spin_lock_irqsave() -- include/linux/spinlock.h, kernel/spinlock.c -- Saves interrupt state, disables interrupts and preemption on the local CPU, and locks the critical section to regulate access by other CPUs
spin_unlock_irqrestore() -- include/linux/spinlock.h, kernel/spinlock.c -- Restores interrupt state and preemption and releases the lock
DEFINE_MUTEX() -- include/linux/mutex.h -- Statically declares a mutex
mutex_init() -- include/linux/mutex.h -- Dynamically declares a mutex
mutex_lock() -- kernel/mutex.c -- Acquires a mutex
mutex_unlock() -- kernel/mutex.c -- Releases a mutex
DECLARE_MUTEX() -- include/asm-your-arch/semaphore.h -- Statically declares a semaphore
init_MUTEX() -- include/asm-your-arch/semaphore.h -- Dynamically declares a semaphore
down() -- arch/your-arch/kernel/semaphore.c -- Acquires a semaphore
up() -- arch/your-arch/kernel/semaphore.c -- Releases a semaphore
atomic_inc()/atomic_inc_and_test()/atomic_dec()/atomic_dec_and_test()/clear_bit()/set_bit()/test_bit()/test_and_set_bit() -- include/asm-your-arch/atomic.h -- Atomic operators to perform lightweight operations
read_lock()/read_unlock()/read_lock_irqsave()/read_lock_irqrestore()/write_lock()/write_unlock()/write_lock_irqsave()/write_lock_irqrestore() -- include/linux/spinlock.h, kernel/spinlock.c -- Reader-writer variant of spinlocks
down_read()/up_read()/down_write()/up_write() -- kernel/rwsem.c -- Reader-writer variant of semaphores
read_seqbegin()/read_seqretry()/write_seqlock()/write_sequnlock() -- include/linux/seqlock.h -- Seqlock operations
kmalloc() -- include/linux/slab.h, mm/slab.c -- Allocates physically contiguous memory from ZONE_NORMAL
kzalloc() -- include/linux/slab.h, mm/util.c -- Obtains zeroed kmalloc'ed memory
kfree() -- mm/slab.c -- Releases kmalloc'ed memory
vmalloc() -- mm/vmalloc.c -- Allocates virtually contiguous memory that is not guaranteed to be physically contiguous

Chapter 3. Kernel Facilities

Kernel Threads

Creating a Kernel Thread

To create a kernel thread, use kernel_thread():
ret = kernel_thread(mykthread, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD);

The flags specify the resources to be shared between the parent and child threads. CLONE_FILES specifies that open files are to be shared, and CLONE_SIGHAND requests that signal handlers be shared.

The thread starts by invoking daemonize(), which performs initial housekeeping and changes the parent of the calling thread to a kernel thread called kthreadd. Each Linux thread has a single parent. If a parent process dies without waiting for its child to exit, the child becomes a zombie process and wastes resources. Reparenting the child to kthreadd avoids this and ensures proper cleanup when the thread exits.

Because daemonize() blocks all signals by default, use allow_signal() to enable delivery if your thread desires to handle a particular signal. There are no signal handlers inside the kernel, so use signal_pending() to check for signals and take appropriate action.

Process States and Wait Queues

The following code region puts mykthread to sleep while waiting for events:

add_wait_queue(&myevent_waitqueue, &wait);
for (;;) {
    /* ... */
    set_current_state(TASK_INTERRUPTIBLE);
    schedule();          /* Relinquish the processor */
    /* Point A */
    /* ... */
}
set_current_state(TASK_RUNNING);
remove_wait_queue(&myevent_waitqueue, &wait);

Wait queues hold threads that need to wait for an event or a system resource. Threads in a wait queue go to sleep until they are woken up by another thread or an interrupt handler that is responsible for detecting the event. Queuing and dequeuing are respectively done using add_wait_queue() and remove_wait_queue(), and waking up queued tasks is accomplished via wake_up_interruptible().

A kernel thread (or a normal process) can be in any of the following process states: running, interruptible, uninterruptible, zombie, stopped, traced, or dead. These states are defined in include/linux/sched.h:

- A process in the running state (TASK_RUNNING) is in the scheduler run queue and is a candidate for getting CPU time allotted by the scheduler.
- A task in the interruptible state (TASK_INTERRUPTIBLE) is waiting for an event to occur and is not in the scheduler run queue. When the task gets woken up, or if a signal is delivered to it, it reenters the run queue.
- The uninterruptible state (TASK_UNINTERRUPTIBLE) is similar to the interruptible state except that receipt of a signal will not put the task back into the run queue.
- A stopped task (TASK_STOPPED) has stopped execution due to receipt of certain signals.
- If an application such as strace is using the ptrace support in the kernel to intercept a task, it'll be in the traced state (TASK_TRACED).
- A task in the zombie state (EXIT_ZOMBIE) has terminated, but its parent did not wait for the task to complete. An exiting task is either in the EXIT_ZOMBIE state or the dead (EXIT_DEAD) state.

You can use set_current_state() to set the run state of your kernel thread.

Helper Interfaces

One example is the implementation of the doubly linked list library. Many drivers need to maintain and manipulate linked lists of data structures. The kernel's list interface routines eliminate the need for chasing list pointers and debugging messy problems related to list maintenance. Let's learn to use helper interfaces such as lists, hlists, work queues, completion functions, notifier blocks, and kthreads.

Linked Lists

To weave doubly linked lists of data structures, use the functions provided in include/linux/list.h. Essentially, you embed a struct list_head inside your data structure:
#include <linux/list.h>

struct list_head {
    struct list_head *next, *prev;
};

struct mydatastructure {
    struct list_head mylist;  /* Embedded list link */
    /* ... */                 /* Actual fields */
};

mylist is the link that chains different instances of mydatastructure. If you have multiple list_heads embedded inside mydatastructure, each of them constitutes a link that renders mydatastructure a member of a new list. You can use the list library to add or delete membership from individual lists.


Table 3.1. Linked List Manipulation Functions

INIT_LIST_HEAD() -- Initializes the list head
list_add() -- Adds an element after the list head
list_add_tail() -- Adds an element to the tail of the list
list_del() -- Deletes an element from the list
list_replace() -- Replaces an element in the list with another
list_entry() -- Obtains a pointer to the structure containing a given list node
list_for_each_entry()/list_for_each_entry_safe() -- Simpler interfaces to loop through all nodes in the list
list_empty() -- Checks whether there are any elements in the list
list_splice() -- Joins one list with another

Chapter 4. Char Drivers


The contact details of the driver are exported to user space via the /dev directory:
bash> ls -l /dev
total 0
crw-------  1 root root  5,  1 Jul 16 10:02 console
...
lrwxrwxrwx  1 root root        3 Oct  6 10:02 cdrom -> hdc
...
brw-rw----  1 root disk  3,  0 Oct  6  2007 hda
brw-rw----  1 root disk  3,  1 Oct  6  2007 hda1
...
crw-------  1 root tty   4,  1 Oct  6 10:20 tty1
crw-------  1 root tty   4,  2 Oct  6 10:02 tty2

The first character in each line of the ls output denotes the driver type: c signifies a char driver, b stands for a block driver, and l denotes a symbolic link. The numbers in the fifth column are called major numbers, and those in the sixth column are minor numbers. A major number broadly identifies the driver associated with the device, whereas a minor number pinpoints the exact device serviced by the driver; the kernel uses the minor number to determine exactly which device is being referred to.


THE INTERNAL REPRESENTATION OF DEVICE NUMBERS

The dev_t type (defined in <linux/types.h>) is used to hold device numbers, both the major and minor parts. dev_t is a 32-bit quantity with 12 bits set aside for the major number and 20 for the minor number.

Make use of the set of macros found in <linux/kdev_t.h>. To obtain the major or minor parts of a dev_t, use:

MAJOR(dev_t dev);
MINOR(dev_t dev);

If, instead, you have the major and minor numbers and need to turn them into a dev_t, use:

MKDEV(int major, int minor);

ALLOCATING AND FREEING DEVICE NUMBERS

One of the first things your driver will need to do when setting up a char device is to obtain one or more device numbers to work with. The necessary function for this task is register_chrdev_region, which is declared in <linux/fs.h>:
int register_chrdev_region(dev_t first, unsigned int count, char *name);

first is the beginning device number of the range you would like to allocate. The minor number portion of first is often 0, but there is no requirement to that effect. count is the total number of contiguous device numbers you are requesting. Note that, if count is large, the range you request could spill over to the next major number; but everything will still work properly as long as the number range you request is available. Finally, name is the name of the device; it will appear in /proc/devices and sysfs.

The kernel will happily allocate a major number for you on the fly, but you must request this allocation by using a different function:
int alloc_chrdev_region(dev_t *dev, unsigned int firstminor, unsigned int count, char *name);

With this function, dev is an output-only parameter that will, on successful completion, hold the first number in your allocated range. firstminor should be the requested first minor number to use; it is usually 0. The count and name parameters work like those given to register_chrdev_region.

Regardless of how you allocate your device numbers, you should free them when they are no longer in use. Device numbers are freed with:


void unregister_chrdev_region(dev_t first, unsigned int count);

The usual place to call unregister_chrdev_region would be in your module's cleanup function. To check the major number allotted in dynamic allocation, look for your driver's name in /proc/devices. Here's the code used in scull's source to get a major number:


if (scull_major) {
    dev = MKDEV(scull_major, scull_minor);
    result = register_chrdev_region(dev, scull_nr_devs, "scull");
} else {
    result = alloc_chrdev_region(&dev, scull_minor, scull_nr_devs, "scull");
    scull_major = MAJOR(dev);
}
if (result < 0) {
    printk(KERN_WARNING "scull: can't get major %d\n", scull_major);
    return result;
}

This code lives inside the init function and allocates the major number for your device. Use the following command to create the device node from a shell prompt:


mknod /dev/scull0 c major_no minor_no

SOME IMPORTANT DATA STRUCTURES

Let's take a step further and peek inside a char driver. From a code flow perspective, char drivers have the following:

- An initialization (or init()) routine that is responsible for initializing the device and seamlessly tying the driver to the rest of the kernel via registration functions.
- A set of entry points (or methods) such as open(), read(), ioctl(), llseek(), and write(), which directly correspond to I/O system calls invoked by user applications over the associated /dev node.
- Interrupt routines, bottom halves, timer handlers, helper kernel threads, and other support infrastructure.

These are largely transparent to user applications. From a data flow perspective, char drivers own the following key data structures:

1. A per-device structure. This is the information repository around which the driver revolves.

struct scull_dev {
    struct scull_qset *data;   /* Pointer to first quantum set */
    int quantum;               /* The current quantum size */
    int qset;                  /* The current array size */
    unsigned long size;        /* Amount of data stored here */
    unsigned int access_key;   /* Used by sculluid and scullpriv */
    struct semaphore sem;      /* Mutual exclusion semaphore */
    struct cdev cdev;          /* Char device structure */
};

2. struct cdev, a kernel abstraction for character drivers. This structure is usually embedded inside the per-device structure referred to previously.
static void scull_setup_cdev(struct scull_dev *dev, int index)
{
    int err, devno = MKDEV(scull_major, scull_minor + index);

    cdev_init(&dev->cdev, &scull_fops);
    dev->cdev.owner = THIS_MODULE;
    dev->cdev.ops = &scull_fops;
    err = cdev_add(&dev->cdev, devno, 1);
    /* Fail gracefully if need be */
    if (err)
        printk(KERN_NOTICE "Error %d adding scull%d", err, index);
}

3. struct file_operations, which contains the addresses of all driver entry points.


struct file_operations scull_fops = {
    .owner   = THIS_MODULE,
    .llseek  = scull_llseek,
    .read    = scull_read,
    .write   = scull_write,
    .ioctl   = scull_ioctl,
    .open    = scull_open,
    .release = scull_release,
};

4. struct file, which contains information about the associated /dev node. struct file is defined in <linux/fs.h>.
5. The inode structure is used by the kernel internally to represent files. Therefore, it is different from the file structure that represents an open file descriptor. There can be numerous file structures representing multiple open descriptors on a single file, but they all point to a single inode structure. Only two fields of this structure are of interest for writing driver code:
dev_t i_rdev;

For inodes that represent device files, this field contains the actual device number.
struct cdev *i_cdev;

struct cdev is the kernel's internal structure that represents char devices; this field contains a pointer to that structure when the inode refers to a char device file.
CHAR DEVICE REGISTRATION

The kernel uses structures of type struct cdev to represent char devices internally. Before the kernel invokes your device's operations, you must allocate and register one or more of these structures. To do so, your code should include <linux/cdev.h>, where the structure and its associated helper functions are defined.

There are two ways of allocating and initializing one of these structures. If you wish to obtain a standalone cdev structure at runtime, you may do so with code such as:


struct cdev *my_cdev = cdev_alloc( ); my_cdev->ops = &my_fops;

Once the cdev structure is set up, the final step is to tell the kernel about it with a call to:

int cdev_add(struct cdev *dev, dev_t num, unsigned int count);

Here, dev is the cdev structure, num is the first device number to which this device responds, and count is the number of device numbers that should be associated with the device. To remove a char device from the system, call:


void cdev_del(struct cdev *dev);

Clearly, you should not access the cdev structure after passing it to cdev_del.
void scull_setup_cdev(struct scull_dev *dev, int index)
{
    int err, devno;

    devno = MKDEV(scull_major, scull_minor + index);
    cdev_init(&dev->cdev, &scull_fops);
    dev->cdev.owner = THIS_MODULE;
    dev->cdev.ops = &scull_fops;
    err = cdev_add(&dev->cdev, devno, 1);
    if (err)
        printk(KERN_NOTICE "Error %d adding scull%d", err, index);
    dev->max_size = 1024 * 8;
    dev->cur_size = 0;
}

DEVICE REGISTRATION IN SCULL

OLDER WAY:

The classic way to register a char device driver is with:

int register_chrdev(unsigned int major, const char *name,
                    struct file_operations *fops);

Here major is the major number of interest, name is the name of the driver (it appears in /proc/devices), and fops is the default file_operations structure. A call to register_chrdev registers minor numbers 0-255 for the given major, and sets up a default cdev structure for each. If you use register_chrdev, the proper function to remove your device(s) from the system is:

int unregister_chrdev(unsigned int major, const char *name);

major and name must be the same as those passed to register_chrdev, or the call will fail.

OPEN AND RELEASE

The Open Method

The open method is provided for a driver to do any initialization in preparation for later operations. In most drivers, open should perform the following tasks:

- Check for device-specific errors (such as device-not-ready or similar hardware problems)
- Initialize the device if it is being opened for the first time
- Update the f_op pointer, if necessary
- Allocate and fill any data structure to be put in filp->private_data

The prototype for the open method is:


int (*open)(struct inode *inode, struct file *filp);

The inode argument has the information we need in the form of its i_cdev field, which contains the cdev structure we set up before. The only problem is that we do not normally want the cdev structure itself; we want the scull_dev structure that contains that cdev structure. Enter the container_of macro, defined in <linux/kernel.h>:


container_of(pointer, container_type, container_field);

This macro takes a pointer to a field of type container_field, within a structure of type container_type, and returns a pointer to the containing structure. In scull_open, this macro is used to find the appropriate device structure:


struct scull_dev *dev;    /* device information */

dev = container_of(inode->i_cdev, struct scull_dev, cdev);
filp->private_data = dev; /* for other methods */

Once it has found the scull_dev structure, scull stores a pointer to it in the private_data field of the file structure for easier access in the future.

int scull_open(struct inode *inode, struct file *filp)
{
    struct scull_dev *dev;

    dev = container_of(inode->i_cdev, struct scull_dev, cdev);
    filp->private_data = dev;    /* For other methods */
    if ((filp->f_flags & O_ACCMODE) == O_WRONLY) {
        scull_trim(dev);         /* Ignore errors */
    }
    return 0;
}

The Release Method

The role of the release method is the reverse of open. Sometimes you'll find that the method implementation is called device_close instead of device_release. Either way, the device method should perform the following tasks:

- Deallocate anything that open allocated in filp->private_data
- Shut down the device on last close

The basic form of scull has no hardware to shut down, so the code required is minimal:


int scull_release(struct inode *inode, struct file *filp) { return 0; }

You may be wondering what happens when a device file is closed more times than it is opened. After all, the dup and fork system calls create copies of open files without calling open; each of those copies is then closed at program termination.

The answer is simple: not every close system call causes the release method to be invoked. Only the calls that actually release the device data structure invoke the method, hence its name. The kernel keeps a counter of how many times a file structure is being used. Neither fork nor dup creates a new file structure (only open does that); they just increment the counter in the existing structure. The close system call executes the release method only when the counter for the file structure drops to 0, which happens when the structure is destroyed. This relationship between the release method and the close system call guarantees that your driver sees only one release call for each open.

scull's MEMORY USAGE

The region of memory used by scull, also called a device, is variable in length. The more you write, the more it grows; trimming is performed by overwriting the device with a shorter file.

The scull driver introduces two core functions used to manage memory in the Linux kernel, kmalloc and kfree, defined in <linux/slab.h>. A call to kmalloc attempts to allocate size bytes of memory; the return value is a pointer to that memory or NULL if the allocation fails. The flags argument is used to describe how the memory should be allocated. Allocated memory should be freed with kfree. You should never pass anything to kfree that was not obtained from kmalloc. It is, however, legal to pass a NULL pointer to kfree.


void *kmalloc(size_t size, int flags);
void kfree(void *ptr);

int scull_trim(struct scull_dev *dev)
{
    kfree(dev->data);
    dev->data = NULL;
    return 0;
}

scull_trim is also used in the module cleanup function to return memory used by scull to the system.
READ AND WRITE
READ AND WRITE

The read and write methods both perform a similar task, that is, copying data from and to application code. Therefore, their prototypes are pretty similar:

ssize_t read(struct file *filp, char __user *buff,
             size_t count, loff_t *offp);
ssize_t write(struct file *filp, const char __user *buff,
              size_t count, loff_t *offp);

filp is the file pointer and count is the size of the requested data transfer. The buff argument points to the user buffer holding the data to be written or the empty buffer where the newly read data should be placed. Finally, offp is a pointer to a long offset type object that indicates the file position the user is accessing. The return value is a signed size type.

The buff argument to the read and write methods is a user-space pointer. Therefore, it cannot be directly dereferenced by kernel code. There are a few reasons for this restriction:

- Depending on which architecture your driver is running on, and how the kernel was configured, the user-space pointer may not be valid while running in kernel mode at all. There may be no mapping for that address, or it could point to some other, random data.
- Even if the pointer does mean the same thing in kernel space, user-space memory is paged, and the memory in question might not be resident in RAM when the system call is made. Attempting to reference the user-space memory directly could generate a page fault, which is something that kernel code is not allowed to do. The result would be an "oops", which would result in the death of the process that made the system call.
- The pointer in question has been supplied by a user program, which could be buggy or malicious. If your driver ever blindly dereferences a user-supplied pointer, it provides an open doorway allowing a user-space program to access or overwrite memory anywhere in the system. If you do not wish to be responsible for compromising the security of your users' systems, you cannot ever dereference a user-space pointer directly.

Your driver must be able to access the user-space buffer in order to get its job done. This access must always be performed by special, kernel-supplied functions, however, in order to be safe. We introduce some of those functions, which are defined in <asm/uaccess.h>. The code for read and write in scull needs to copy a whole segment of data to or from the user address space. This capability is offered by the following kernel functions, which copy an arbitrary array of bytes and sit at the heart of most read and write implementations:
unsigned long copy_to_user(void __user *to, const void *from, unsigned long count); unsigned long copy_from_user(void *to, const void __user *from, unsigned long count);

The role of the two functions is not limited to copying data to and from user space: they also check whether the user-space pointer is valid. If the pointer is invalid, no copy is performed; if an invalid address is encountered during the copy, on the other hand, only part of the data is copied. In both cases, the return value is the amount of memory still to be copied. The scull code looks for this error return, and returns -EFAULT to the user if it's not 0.

As far as the actual device methods are concerned, the task of the read method is to copy data from the device to user space (using copy_to_user), while the write method must copy data from user space to the device (using copy_from_user). Each read or write system call requests transfer of a specific number of bytes, but the driver is free to transfer less data. Both the read and write methods return a negative value if an error occurs. A return value greater than or equal to 0, instead, tells the calling program how many bytes have been successfully transferred. If some data is transferred correctly and then an error happens, the return value must be the count of bytes successfully transferred, and the error does not get reported until the next time the function is called.
The Read Method

The return value for read is interpreted by the calling application program:

- If the value equals the count argument passed to the read system call, the requested number of bytes has been transferred. This is the optimal case.
- If the value is positive, but smaller than count, only part of the data has been transferred. This may happen for a number of reasons, depending on the device. Most often, the application program retries the read. For instance, if you read using the fread function, the library function reissues the system call until completion of the requested data transfer.
- If the value is 0, end-of-file was reached (and no data was read).
- A negative value means there was an error. The value specifies what the error was, according to <linux/errno.h>. Typical values returned on error include -EINTR (interrupted system call) or -EFAULT (bad address).
- It may also be the case that there is no data yet, but it will arrive later. In this case, the read system call should block.

If the current read position is greater than the device size, the read method of scull returns 0 to signal that there's no data available (in other words, we're at end-of-file). This situation can happen if process A is reading the device while process B opens it for writing, thus truncating the device to a length of 0. Process A suddenly finds itself past end-of-file, and the next read call returns 0.
ssize_t scull_read(struct file *filp, char __user *buf,
                   size_t count, loff_t *f_pos)
{
    struct scull_dev *dev = filp->private_data;
    ssize_t retval = 0;

    if (*f_pos > dev->cur_size)
        goto out;
    if (*f_pos + count > dev->cur_size)
        count = dev->cur_size - *f_pos;
    /* Copy from the current read position, not the buffer start */
    if (copy_to_user(buf, (char *)dev->data + *f_pos, count)) {
        retval = -EFAULT;
        goto out;
    }
    *f_pos += count;
    retval = count;
out:
    return retval;
}

The Write Method

write, like read, can transfer less data than was requested, according to the following rules for the return value:

- If the value equals count, the requested number of bytes has been transferred.
- If the value is positive, but smaller than count, only part of the data has been transferred. The program will most likely retry writing the rest of the data.
- If the value is 0, nothing was written. This result is not an error, and there is no reason to return an error code. Once again, the standard library retries the call to write.
- A negative value means an error occurred; as for read, valid error values are those defined in <linux/errno.h>.


ssize_t scull_write(struct file *filp, const char __user *buf,
                    size_t count, loff_t *f_pos)
{
    struct scull_dev *dev = filp->private_data;
    ssize_t retval = 0;

    /* Allocate the 8KB device buffer on first use, instead of leaking
     * a fresh allocation on every write */
    if (!dev->data) {
        dev->data = kmalloc(8192, GFP_KERNEL);
        if (!dev->data)
            goto out;
        memset(dev->data, 0, 8192);
    }
    if (*f_pos > dev->max_size)
        goto out;
    if (*f_pos + count > dev->max_size)
        count = dev->max_size - *f_pos;
    /* Write at the current file position */
    if (copy_from_user((char *)dev->data + *f_pos, buf, count)) {
        retval = -EFAULT;
        goto out;
    }
    *f_pos += count;
    retval = count;
    if (dev->cur_size < *f_pos)
        dev->cur_size = *f_pos;
out:
    return retval;
}

COMPLETE PROGRAM OF CHARACTER DRIVER


#include <linux/init.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/kdev_t.h>
#include <linux/slab.h>
#include <asm/uaccess.h>
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/cdev.h>
#include "scull.h"
#include "myioctl.h"

MODULE_LICENSE("Dual BSD/GPL");

#define NO_OF_DEVICES 4

int scull_major = 0, scull_minor = 0;
dev_t dev1;
struct scull_dev dev[NO_OF_DEVICES];

/* File operations structure. Defined in linux/fs.h */
static struct file_operations scull_fops = {
    .owner   = THIS_MODULE,   /* Owner */
    .read    = scull_read,    /* Read method */
    .write   = scull_write,   /* Write method */
    .open    = scull_open,    /* Open method */
    .release = scull_release, /* Release method */
    .ioctl   = scull_ioctl,   /* Ioctl method */
};

/*
 * Freeing the data area
 */
int scull_trim(struct scull_dev *dev)
{
    kfree(dev->data);
    dev->data = NULL;   /* was '\0' */
    return 0;
}

/*
 * Open CDEV device
 */
int scull_open(struct inode *inode, struct file *filp)
{
    struct scull_dev *dev;

    dev = container_of(inode->i_cdev, struct scull_dev, cdev);
    filp->private_data = dev;
    if ((filp->f_flags & O_ACCMODE) == O_WRONLY)
        scull_trim(dev);
    return 0;
}

/*
 * Release CDEV device
 */
int scull_release(struct inode *inode, struct file *filp)
{
    return 0;
}

/*
 * Read data from specified CDEV device
 */
ssize_t scull_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos)
{
    struct scull_dev *dev = filp->private_data;
    ssize_t retval = 0;

    if (*f_pos > dev->cur_size)
        goto out;
    if (*f_pos + count > dev->cur_size)
        count = dev->cur_size - *f_pos;
    if (copy_to_user(buf, dev->data, count)) {
        retval = -EFAULT;
        goto out;
    }
    *f_pos = *f_pos + count;
    retval = count;
out:
    return retval;
}

/*
 * Write data to specified CDEV devices
 */
ssize_t scull_write(struct file *filp, char __user *buf, size_t count, loff_t *f_pos)
{
    struct scull_dev *dev = filp->private_data;
    ssize_t retval = 0;

    dev->data = kmalloc(8192, GFP_KERNEL);  /* was 8192*sizeof(char *): over-allocation */
    if (!dev->data)
        goto out;
    memset(dev->data, 0, 8192);
    if (*f_pos > dev->max_size)
        goto out;
    if (*f_pos + count > dev->max_size)
        count = dev->max_size - *f_pos;
    if (copy_from_user(dev->data, buf, count)) {
        retval = -EFAULT;
        goto out;
    }
    *f_pos = *f_pos + count;
    retval = count;
    if (dev->cur_size < *f_pos)
        dev->cur_size = *f_pos;
out:
    return retval;
}

/*
 * Ioctl to specified CDEV devices
 */
int scull_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
    int err = 0, retval = 0;
    struct scull_dev *dev = filp->private_data;

    if (_IOC_TYPE(cmd) != SCULL_IOC_MAGIC)
        return -ENOTTY;
    if (_IOC_NR(cmd) > SCULL_IOC_MAXNR)
        return -ENOTTY;
    if (_IOC_DIR(cmd) & _IOC_READ)
        err = !access_ok(VERIFY_WRITE, (void __user *)arg, _IOC_SIZE(cmd));
    else if (_IOC_DIR(cmd) & _IOC_WRITE)
        err = !access_ok(VERIFY_READ, (void __user *)arg, _IOC_SIZE(cmd));
    if (err)
        return -EFAULT;

    switch (cmd) {
    case SCULL_IOCTCURSIZE:
        retval = __put_user(dev->cur_size, (int __user *)arg);
        break;
    case SCULL_IOCTMAXSIZE:
        if (!capable(CAP_SYS_ADMIN))
            return -EPERM;
        retval = __get_user(dev->max_size, (int __user *)arg);
        break;
    }
    return retval;
}

void scull_setup_cdev(struct scull_dev *dev, int index)
{
    int err, devno;

    devno = MKDEV(scull_major, scull_minor + index);
    cdev_init(&dev->cdev, &scull_fops);
    dev->cdev.owner = THIS_MODULE;
    dev->cdev.ops = &scull_fops;
    err = cdev_add(&dev->cdev, devno, 1);
    if (err)
        printk(KERN_NOTICE "Error %d adding scull %d", err, index);
    dev->max_size = 1024 * 8;
    dev->cur_size = 0;
}

static int scull_init(void)
{
    int result, i;

    result = alloc_chrdev_region(&dev1, scull_minor, NO_OF_DEVICES, "chrdev");
    if (result < 0) {
        printk(KERN_WARNING "can't get major %d", scull_major);
        return result;
    }
    scull_major = MAJOR(dev1);  /* assign before printing (was printed first) */
    printk(KERN_ALERT "got major %d", scull_major);
    for (i = 0; i < NO_OF_DEVICES; i++)
        scull_setup_cdev(&dev[i], i);
    return 0;
}

static void scull_exit(void)
{
    int i;

    for (i = 0; i < NO_OF_DEVICES; i++)
        cdev_del(&dev[i].cdev);
    unregister_chrdev_region(dev1, NO_OF_DEVICES);
}

module_init(scull_init);
module_exit(scull_exit);

CHAPTER 5: Advanced Char Driver Operations


IOCTL

The ioctl driver method has a prototype that differs somewhat from the user-space version:

int (*ioctl) (struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg);

The inode and filp pointers are the values corresponding to the file descriptor fd passed on by the application and are the same parameters passed to the open method. The cmd argument is passed from the user unchanged, and the optional arg argument is passed in the form of an unsigned long, regardless of whether it was given by the user as an integer or a pointer. If the invoking program doesn't pass a third argument, the arg value received by the driver operation is undefined. Because type checking is disabled on the extra argument, the compiler can't warn you if an invalid argument is passed to ioctl, and any associated bug would be difficult to spot.
CHOOSING THE IOCTL COMMANDS

The ioctl command numbers should be unique across the system in order to prevent errors caused by issuing the right command to the wrong device. If each ioctl number is unique, the application gets an EINVAL error rather than succeeding in doing something unintended.

To help programmers create unique ioctl command codes, these codes have been split up into several bitfields. The first versions of Linux used 16-bit numbers: the top eight were the "magic" numbers associated with the device, and the bottom eight were a sequential number, unique within the device.

The approved way to define ioctl command numbers uses four bitfields, which have the following meanings. New symbols introduced in this list are defined in <linux/ioctl.h>.

type
The magic number. Just choose one number (after consulting ioctl-number.txt) and use it throughout the driver. This field is eight bits wide (_IOC_TYPEBITS).

number
The ordinal (sequential) number. It's eight bits (_IOC_NRBITS) wide.

direction
The direction of data transfer, if the particular command involves a data transfer. The possible values are _IOC_NONE (no data transfer), _IOC_READ, _IOC_WRITE, and _IOC_READ|_IOC_WRITE (data is transferred both ways). Data transfer is seen from the application's point of view; _IOC_READ means reading from the device, so the driver must write to user space.

size
The size of user data involved. The width of this field is architecture dependent, but is usually 13 or 14 bits. You can find its value for your specific architecture in the macro _IOC_SIZEBITS.

The header file <asm/ioctl.h>, which is included by <linux/ioctl.h>, defines macros that help set up the command numbers as follows: _IO(type,nr) (for a command that has no argument), _IOR(type,nr,datatype) (for reading data from the driver), _IOW(type,nr,datatype) (for writing data), and _IOWR(type,nr,datatype) (for bidirectional transfers). The type and number fields are passed as arguments, and the size field is derived by applying sizeof to the datatype argument.

Here is how some ioctl commands are defined in scull. In particular, these commands set and get the driver's configurable parameters.
/* Use 'k' as magic number */
#define SCULL_IOC_MAGIC 'k'
/* Please use a different 8-bit number in your code */
#define SCULL_IOCRESET _IO(SCULL_IOC_MAGIC, 0)

/*
 * S means "Set" through a ptr,
 * T means "Tell" directly with the argument value
 * G means "Get": reply by setting through a pointer
 * Q means "Query": response is on the return value
 * X means "eXchange": switch G and S atomically
 * H means "sHift": switch T and Q atomically
 */
#define SCULL_IOCSQUANTUM _IOW(SCULL_IOC_MAGIC, 1, int)
#define SCULL_IOCSQSET    _IOW(SCULL_IOC_MAGIC, 2, int)
#define SCULL_IOCTQUANTUM _IO(SCULL_IOC_MAGIC, 3)
#define SCULL_IOCTQSET    _IO(SCULL_IOC_MAGIC, 4)
#define SCULL_IOCGQUANTUM _IOR(SCULL_IOC_MAGIC, 5, int)
#define SCULL_IOCGQSET    _IOR(SCULL_IOC_MAGIC, 6, int)
#define SCULL_IOCQQUANTUM _IO(SCULL_IOC_MAGIC, 7)
#define SCULL_IOCQQSET    _IO(SCULL_IOC_MAGIC, 8)
#define SCULL_IOCXQUANTUM _IOWR(SCULL_IOC_MAGIC, 9, int)
#define SCULL_IOCXQSET    _IOWR(SCULL_IOC_MAGIC, 10, int)
#define SCULL_IOCHQUANTUM _IO(SCULL_IOC_MAGIC, 11)
#define SCULL_IOCHQSET    _IO(SCULL_IOC_MAGIC, 12)

#define SCULL_IOC_MAXNR 14

USING THE IOCTL ARGUMENTS

How do you use the extra argument? If it is an integer, it's easy: it can be used directly. If it is a pointer, however, some care must be taken. When a pointer is used to refer to user space, we must ensure that the user address is valid. An attempt to access an unverified user-supplied pointer can lead to incorrect behavior, a kernel oops, system corruption, or security problems.

The copy_from_user and copy_to_user functions can be used to safely move data to and from user space. Those functions can be used in ioctl methods as well, but ioctl calls often involve small data items that can be more efficiently manipulated through other means. To start, address verification (without transferring data) is implemented by the function access_ok, which is declared in <asm/uaccess.h>:


int access_ok(int type, const void *addr, unsigned long size);

The first argument should be either VERIFY_READ or VERIFY_WRITE, depending on whether the action to be performed is reading the user-space memory area or writing it. The addr argument holds a user-space address, and size is a byte count. If ioctl, for instance, needs to read an integer value from user space, size is sizeof(int). If you need to both read and write at the given address, use VERIFY_WRITE, since it is a superset of VERIFY_READ.

Unlike most kernel functions, access_ok returns a boolean value: 1 for success (access is OK) and 0 for failure (access is not OK). If it returns false, the driver should usually return -EFAULT to the caller.

There are a couple of interesting things to note about access_ok. First, it does not do the complete job of verifying memory access; it only checks to see that the memory reference is in a region of memory that the process might reasonably have access to. In particular, access_ok ensures that the address does not point to kernel-space memory. Second, most driver code need not actually call access_ok. The memory-access routines described later take care of that for you.
int err = 0, tmp;
int retval = 0;

/*
 * extract the type and number bitfields, and don't decode
 * wrong cmds: return ENOTTY (inappropriate ioctl) before access_ok( )
 */
if (_IOC_TYPE(cmd) != SCULL_IOC_MAGIC) return -ENOTTY;
if (_IOC_NR(cmd) > SCULL_IOC_MAXNR) return -ENOTTY;

/*
 * the direction is a bitmask, and VERIFY_WRITE catches R/W
 * transfers. `Type' is user-oriented, while
 * access_ok is kernel-oriented, so the concept of "read" and
 * "write" is reversed
 */
if (_IOC_DIR(cmd) & _IOC_READ)
    err = !access_ok(VERIFY_WRITE, (void __user *)arg, _IOC_SIZE(cmd));
else if (_IOC_DIR(cmd) & _IOC_WRITE)
    err = !access_ok(VERIFY_READ, (void __user *)arg, _IOC_SIZE(cmd));
if (err) return -EFAULT;

put_user(datum, ptr)
__put_user(datum, ptr)

These macros write the datum to user space; they are relatively fast and should be called instead of copy_to_user whenever single values are being transferred. put_user checks to ensure that the process is able to write to the given memory address. It returns 0 on success, and -EFAULT on error. __put_user performs less checking (it does not call access_ok), but can still fail if the memory pointed to is not writable by the user. Thus, __put_user should only be used if the memory region has already been verified with access_ok.

As a general rule, you call __put_user to save a few cycles when you are implementing a read method, or when you copy several items and, thus, call access_ok just once before the first data transfer.

get_user(local, ptr)
__get_user(local, ptr)

These macros are used to retrieve a single datum from user space. They behave like put_user and __put_user, but transfer data in the opposite direction. __get_user should only be used if the address has already been verified with access_ok.
INTRODUCTION TO SLEEPING

When a process is put to sleep, it is marked as being in a special state and removed from the scheduler's run queue. The process will not be scheduled on any CPU and, therefore, will not run. A couple of rules must be kept in mind to be able to code sleeps in a safe manner:

1. Never sleep when you are running in an atomic context. An atomic context is simply a state where multiple steps must be performed without any sort of concurrent access. What that means, with regard to sleeping, is that your driver cannot sleep while holding a spinlock, seqlock, or RCU lock. You also cannot sleep if you have disabled interrupts.

2. Your process cannot sleep unless it is assured that somebody else, somewhere, will wake it up. Finding your sleeping process is accomplished through a data structure called a wait queue. A wait queue is just what it sounds like: a list of processes, all waiting for a specific event.

In Linux, a wait queue is managed by means of a "wait queue head," a structure of type wait_queue_head_t, which is defined in <linux/wait.h>. A wait queue head can be defined and initialized statically with:

DECLARE_WAIT_QUEUE_HEAD(name);

or dynamically as follows:

wait_queue_head_t my_queue;
init_waitqueue_head(&my_queue);

SIMPLE SLEEPING

The simplest way of sleeping in the Linux kernel is a macro called wait_event (with a few variants); it combines handling the details of sleeping with a check on the condition a process is waiting for. The forms of wait_event are:


wait_event(queue, condition)
wait_event_interruptible(queue, condition)
wait_event_timeout(queue, condition, timeout)
wait_event_interruptible_timeout(queue, condition, timeout)

In all of the above forms, queue is the wait queue head to use. Notice that it is passed by value. The condition is an arbitrary boolean expression that is evaluated by the macro before and after sleeping; until condition evaluates to a true value, the process continues to sleep.

If you use wait_event, your process is put into an uninterruptible sleep which, as we have mentioned before, is usually not what you want. The preferred alternative is wait_event_interruptible, which can be interrupted by signals. This version returns an integer value that you should check; a nonzero value means your sleep was interrupted by some sort of signal, and your driver should probably return -ERESTARTSYS. The final versions (wait_event_timeout and wait_event_interruptible_timeout) wait for a limited time; after that time period (expressed in jiffies) expires, the macros return with a value of 0 regardless of how the condition evaluates.

The basic function that wakes up sleeping processes is called wake_up. It comes in several forms (but we look at only two of them now):
void wake_up(wait_queue_head_t *queue);
void wake_up_interruptible(wait_queue_head_t *queue);

wake_up wakes up all processes waiting on the given queue. The other form (wake_up_interruptible) restricts itself to processes performing an interruptible sleep.


static DECLARE_WAIT_QUEUE_HEAD(wq);
static int flag = 0;

ssize_t sleepy_read(struct file *filp, char __user *buf, size_t count, loff_t *pos)
{
    printk(KERN_DEBUG "process %i (%s) going to sleep\n",
           current->pid, current->comm);
    wait_event_interruptible(wq, flag != 0);
    flag = 0;
    printk(KERN_DEBUG "awoken %i (%s)\n", current->pid, current->comm);
    return 0; /* EOF */
}

ssize_t sleepy_write(struct file *filp, const char __user *buf, size_t count, loff_t *pos)
{
    printk(KERN_DEBUG "process %i (%s) awakening the readers...\n",
           current->pid, current->comm);
    flag = 1;
    wake_up_interruptible(&wq);
    return count; /* succeed, to avoid retrial */
}

Since wait_event_interruptible checks for a condition that must become true, we use flag to create that condition. It is interesting to consider what happens if two processes are waiting when sleepy_write is called. Since sleepy_read resets flag to 0 once it wakes up, you might think that the second process to wake up would immediately go back to sleep.


BLOCKING AND NONBLOCKING OPERATIONS

In the case of a blocking operation, which is the default, the following behavior should be implemented in order to adhere to the standard semantics:

If a process calls read but no data is (yet) available, the process must block. The process is awakened as soon as some data arrives, and that data is returned to the caller, even if there is less than the amount requested in the count argument to the method.

If a process calls write and there is no space in the buffer, the process must block, and it must be on a different wait queue from the one used for reading. When some data has been written to the hardware device, and space becomes free in the output buffer, the process is awakened and the write call succeeds, although the data may be only partially written if there isn't room in the buffer for the count bytes that were requested.

Both these statements assume that there are both input and output buffers. The input buffer is required to avoid losing data that arrives when nobody is reading. In contrast, data can't be lost on write, because if the system call doesn't accept data bytes, they remain in the user-space buffer. Even so, the output buffer is almost always useful for squeezing more performance out of the hardware.

A BLOCKING I/O EXAMPLE(PIPE)

Within a driver, a process blocked in a read call is awakened when data arrives; usually the hardware issues an interrupt to signal such an event, and the driver awakens waiting processes as part of handling the interrupt. The scullpipe driver works differently, so that it can be run without requiring any particular hardware or an interrupt handler. We chose to use another process to generate the data and wake the reading process; similarly, reading processes are used to wake writer processes that are waiting for buffer space to become available.

The device driver uses a device structure that contains two wait queues and a buffer.

struct scull_pipe {
    wait_queue_head_t inq;   /* read queue */
    wait_queue_head_t outq;  /* write queue */
    char *buffer, *end;      /* begin of buf, end of buf */
    int buffersize;
    int maxsize;
    char *rp, *wp;           /* where to read, where to write */
    struct cdev cdev;
};

The read implementation manages both blocking and nonblocking input and looks like this:
ssize_t scull_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos)
{
    int flag = 0;
    struct scull_pipe *dev = filp->private_data;

    while (dev->rp == dev->wp) {        /* nothing to read */
        if (wait_event_interruptible(dev->inq, (dev->rp != dev->wp)))
            return -ERESTARTSYS;        /* was -1: signal, let the fs layer handle it */
    }
    if (dev->wp > dev->rp)
        count = mini(count, (size_t)(dev->wp - dev->rp));
    else                                /* the write pointer has wrapped */
        count = mini(count, (size_t)(dev->end - dev->rp));
    if (dev->rp + count == dev->end)
        flag = 1;                       /* include the final byte at *end */
    if (copy_to_user(buf, dev->rp, (count + flag)))
        return -EFAULT;                 /* was -1 */
    dev->rp += count;
    if (dev->rp == dev->end) {
        dev->rp = dev->buffer;
        if (dev->wp == dev->end)
            dev->wp = dev->buffer;
    }
    wake_up_interruptible(&dev->outq);  /* awaken any writers */
    return (count + flag);
}

The if statement that contains the wait_event_interruptible call checks for this case. This statement ensures the proper and expected reaction to signals, which could have been responsible for waking up the process (since we were in an interruptible sleep). If a signal has arrived and it has not been blocked by the process, the proper behavior is to let upper layers of the kernel handle the event. To this end, the driver returns -ERESTARTSYS to the caller.


ADVANCED SLEEPING
HOW A PROCESS SLEEPS

If you look inside <linux/wait.h>, you see that the data structure behind the wait_queue_head_t type is quite simple; it consists of a spinlock and a linked list. What goes onto that list is a wait queue entry, which is declared with the type wait_queue_t. This structure contains information about the sleeping process and exactly how it would like to be woken up.

The first step in putting a process to sleep is usually the allocation and initialization of a wait_queue_t structure, followed by its addition to the proper wait queue, so that the wakeup will be able to find the right processes.

The next step is to set the state of the process to mark it as being asleep. There are several task states defined in <linux/sched.h>. TASK_RUNNING means that the process is able to run, although it is not necessarily executing in the processor at any specific moment. There are two states that indicate that a process is asleep: TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE; they correspond, of course, to the two types of sleep.

The call to use is:
void set_current_state(int new_state);

Inoldercode,youoftenseesomethinglikethisinstead:
current->state = TASK_INTERRUPTIBLE;

But changing current directly in that manner is discouraged; such code breaks easily when data structures change.

Giving up the processor is the final step, but there is one thing to do first: you must check the condition you are sleeping for. Consequently, down inside code that sleeps, you typically see something such as:


if (!condition) schedule( );

The call to schedule is, of course, the way to invoke the scheduler and yield the CPU. Whenever you call this function, you are telling the kernel to consider which process should be running and to switch control to that process if necessary. So you never know how long it will be before schedule returns to your code.


MANUAL SLEEPS

Programmers can still code a manual sleep in that manner if they want to; <linux/sched.h> contains all the requisite definitions. The first step is the creation and initialization of a wait queue entry. That is usually done with this macro:


DEFINE_WAIT(my_wait);

in which my_wait is the name of the wait queue entry variable. You can also do things in two steps:
wait_queue_t my_wait;
init_wait(&my_wait);

But it is usually easier to put a DEFINE_WAIT line at the top of the loop that implements your sleep.

The next step is to add your wait queue entry to the queue, and set the process state. Both of those tasks are handled by this function:


void prepare_to_wait(wait_queue_head_t *queue,wait_queue_t *wait, int state);

Here, queue and wait are the wait queue head and the process entry, respectively. state is the new state for the process; it should be either TASK_INTERRUPTIBLE (for interruptible sleeps, which is usually what you want) or TASK_UNINTERRUPTIBLE (for uninterruptible sleeps).

After calling prepare_to_wait, the process can call schedule after it has checked to be sure it still needs to wait. Once schedule returns, it is cleanup time. That task, too, is handled by a special function:


void finish_wait(wait_queue_head_t *queue, wait_queue_t *wait);

Thereafter, your code can test its state and see if it needs to wait again.

The spacefree function determines how much space is left in the pipe.
static int spacefree(struct scull_pipe *dev)
{
    if (dev->rp == dev->wp)
        return dev->maxsize - 1;
    return ((dev->rp + dev->maxsize - dev->wp) % dev->maxsize) - 1;
}

The scull_write method is similar to the read function.
static ssize_t scull_write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos)
{
    int flag = 0;
    struct scull_pipe *dev = filp->private_data;

    while (spacefree(dev) == 0) {       /* buffer full */
        if (wait_event_interruptible(dev->outq, (spacefree(dev) != 0)))
            return -ERESTARTSYS;        /* was -1: signal, let the fs layer handle it */
    }
    count = mini(count, (size_t)spacefree(dev));
    if (dev->wp >= dev->rp)
        count = mini(count, (size_t)(dev->end - dev->wp));
    else                                /* the write pointer has wrapped */
        count = mini(count, (size_t)(dev->rp - dev->wp - 1));
    if (dev->wp + count == dev->end)
        flag = 1;                       /* include the final byte at *end */
    if (copy_from_user(dev->wp, buf, (count + flag)))
        return -EFAULT;                 /* was -1 */
    dev->wp += count;
    wake_up_interruptible(&dev->inq);   /* awaken any readers */
    if (dev->wp == dev->end)
        if (dev->rp != dev->buffer)
            dev->wp = dev->buffer;
    return (count + flag);
}

This code looks similar to the read method, except that we have pushed the code that sleeps into a separate function called scull_getwritespace. Its job is to ensure that there is space in the buffer for new data, sleeping if need be until that space comes available. Once the space is there, scull_p_write can simply copy the user's data there, adjust the pointers, and wake up any processes that may have been waiting to read data.


/* Wait for space for writing; caller must hold device semaphore. On
 * error the semaphore will be released before returning. */
static int scull_getwritespace(struct scull_pipe *dev, struct file *filp)
{
    while (spacefree(dev) == 0) { /* full */
        DEFINE_WAIT(wait);

        up(&dev->sem);
        if (filp->f_flags & O_NONBLOCK)
            return -EAGAIN;
        PDEBUG("\"%s\" writing: going to sleep\n", current->comm);
        prepare_to_wait(&dev->outq, &wait, TASK_INTERRUPTIBLE);
        if (spacefree(dev) == 0)
            schedule( );
        finish_wait(&dev->outq, &wait);
        if (signal_pending(current))
            return -ERESTARTSYS; /* signal: tell the fs layer to handle it */
        if (down_interruptible(&dev->sem))
            return -ERESTARTSYS;
    }
    return 0;
}

If space is available without sleeping, this function simply returns. Otherwise, it must drop the device semaphore and wait. The code uses DEFINE_WAIT to set up a wait queue entry and prepare_to_wait to get ready for the actual sleep.

To finish up, we call finish_wait. The call to signal_pending tells us whether we were awakened by a signal; if so, we need to return to the user and let them try again later. Otherwise, we reacquire the semaphore, and test again for free space as usual.


EXCLUSIVE WAIT

We have seen that when a process calls wake_up on a wait queue, all processes waiting on that queue are made runnable. Often, it is possible to know ahead of time that only one of the processes being awakened will succeed in obtaining the desired resource, and the rest will simply have to sleep again.

If the number of processes in the wait queue is large, this "thundering herd" behavior can seriously degrade the performance of the system. In response to real-world thundering herd problems, the kernel developers added an "exclusive wait" option to the kernel. An exclusive wait acts very much like a normal sleep, with two important differences:

When a wait queue entry has the WQ_FLAG_EXCLUSIVE flag set, it is added to the end of the wait queue. Entries without that flag are, instead, added to the beginning.

When wake_up is called on a wait queue, it stops after waking the first process that has the WQ_FLAG_EXCLUSIVE flag set.

The end result is that processes performing exclusive waits are awakened one at a time, in an orderly manner, and do not create thundering herds. Putting a process into an interruptible exclusive wait is a simple matter of calling prepare_to_wait_exclusive:
void prepare_to_wait_exclusive(wait_queue_head_t *queue, wait_queue_t *wait, int state);

This call, when used in place of prepare_to_wait, sets the "exclusive" flag in the wait queue entry and adds the process to the end of the wait queue. Note that there is no way to perform exclusive waits with wait_event and its variants.


The details of waking up

The actual behavior that results when a process is awakened is controlled by a function in the wait queue entry. The default wakeup function sets the process into a runnable state and, possibly, performs a context switch to that process if it has a higher priority. Device drivers should never need to supply a different wake function.


wake_up(wait_queue_head_t *queue);
wake_up_interruptible(wait_queue_head_t *queue);

wake_up awakens every process on the queue that is not in an exclusive wait, and exactly one exclusive waiter, if it exists. wake_up_interruptible does the same, with the exception that it skips over processes in an uninterruptible sleep. These functions can, before returning, cause one or more of the processes awakened to be scheduled.


wake_up_nr(wait_queue_head_t *queue, int nr);
wake_up_interruptible_nr(wait_queue_head_t *queue, int nr);

These functions perform similarly to wake_up, except they can awaken up to nr exclusive waiters, instead of just one. Note that passing 0 is interpreted as asking for all of the exclusive waiters to be awakened, rather than none of them.


wake_up_all(wait_queue_head_t *queue);
wake_up_interruptible_all(wait_queue_head_t *queue);

This form of wake_up awakens all processes, whether they are performing an exclusive wait or not.
wake_up_interruptible_sync(wait_queue_head_t *queue);

Normally, a process that is awakened may preempt the current process and be scheduled into the processor before wake_up returns. In other words, a call to wake_up may not be atomic. If the process calling wake_up is running in an atomic context, this rescheduling does not happen. Normally, that protection is adequate. If, however, you need to explicitly ask to not be scheduled out of the processor at this time, you can use the "sync" variant of wake_up_interruptible. This function is most often used when the caller is about to reschedule anyway. Very few drivers ever need to call anything except wake_up_interruptible.


/* CHAR DRIVER WITH PIPE */

#include <linux/init.h>
#include <linux/module.h>
#include <linux/types.h>
#include <linux/kdev_t.h>
#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/cdev.h>
#include <asm/uaccess.h>
#include <linux/wait.h>
#include <linux/sched.h>

MODULE_LICENSE("Dual BSD/GPL");

/* FUNCTION DECLARATIONS */
int scull_open(struct inode *inode, struct file *filp);
int scull_close(struct inode *inode, struct file *filp);
ssize_t scull_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos);
size_t mini(size_t count, size_t siz);
static ssize_t scull_write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos);

int major = 0;
int minor = 0;
int i;

/* FILE OPERATION STRUCTURE */
struct file_operations scull_fops = {
    .owner   = THIS_MODULE,
    .open    = scull_open,
    .read    = scull_read,
    .write   = scull_write,
    .release = scull_close,
};

/* PIPE DEVICE STRUCTURE */
struct scull_pipe {
    wait_queue_head_t inq;
    wait_queue_head_t outq;
    char *buffer, *end;
    int buffersize;
    int maxsize;
    char *rp, *wp;
    struct cdev cdev;
};

struct scull_pipe dev[4]; /* MAXIMUM FOUR DEVICES */

static void scull_setup_cdev(struct scull_pipe *dev, int i)
{
    int err, devno = MKDEV(major, minor + i);

    dev->maxsize = 1024 * 8;
    cdev_init(&dev->cdev, &scull_fops);
    dev->cdev.owner = THIS_MODULE;
    dev->cdev.ops = &scull_fops;
    err = cdev_add(&dev->cdev, devno, 1);
    init_waitqueue_head(&(dev->inq));
    init_waitqueue_head(&(dev->outq));
    dev->buffer = kmalloc(dev->maxsize, GFP_KERNEL);
    if (dev->buffer == NULL)
        printk(KERN_ALERT "error");
    memset(dev->buffer, 0, dev->maxsize);
    dev->rp = dev->buffer;
    dev->wp = dev->buffer;
    dev->end = dev->buffer + dev->maxsize - 1;
    if (err)
        printk(KERN_NOTICE "Error %d adding scull", err);
}

/* MODULE INITIALIZATION FUNCTION */
static int scull_init(void)
{
    int result;
    dev_t devno;    /* was declared int; device numbers are dev_t */

    /* GET MAJOR NUMBER */
    if (major) {
        devno = MKDEV(major, minor);
        result = register_chrdev_region(devno, 4, "pipe");  /* was (major,1): register all four minors */
    } else {
        result = alloc_chrdev_region(&devno, minor, 4, "pipe");
        major = MAJOR(devno);
    }
    if (result < 0)
        return -1;
    for (i = 0; i < 4; i++)
        scull_setup_cdev(&dev[i], i);
    return 0;
}

/* OPEN FUNCTION */
int scull_open(struct inode *inode, struct file *filp)
{
    struct scull_pipe *ptr;

    ptr = container_of(inode->i_cdev, struct scull_pipe, cdev);
    filp->private_data = ptr;
    return 0;
}

/* CLOSE FUNCTION */
int scull_close(struct inode *inode, struct file *filp)
{
    return 0;
}

/* READ FUNCTION */
ssize_t scull_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos)
{
    int flag = 0;
    struct scull_pipe *dev = filp->private_data;

    while (dev->rp == dev->wp) {        /* nothing to read */
        if (wait_event_interruptible(dev->inq, (dev->rp != dev->wp)))
            return -ERESTARTSYS;        /* was -1 */
    }
    if (dev->wp > dev->rp)
        count = mini(count, (size_t)(dev->wp - dev->rp));
    else
        count = mini(count, (size_t)(dev->end - dev->rp));
    if (dev->rp + count == dev->end)
        flag = 1;
    if (copy_to_user(buf, dev->rp, (count + flag)))
        return -EFAULT;                 /* was -1 */
    dev->rp += count;
    if (dev->rp == dev->end) {
        dev->rp = dev->buffer;
        if (dev->wp == dev->end)
            dev->wp = dev->buffer;
    }
    wake_up_interruptible(&dev->outq);
    return (count + flag);
}

static int spacefree(struct scull_pipe *dev)
{
    if (dev->rp == dev->wp)
        return dev->maxsize - 1;
    return ((dev->rp + dev->maxsize - dev->wp) % dev->maxsize) - 1;
}

/* WRITE FUNCTION */
static ssize_t scull_write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos)
{
    int flag = 0;
    struct scull_pipe *dev = filp->private_data;

    while (spacefree(dev) == 0) {       /* buffer full */
        if (wait_event_interruptible(dev->outq, (spacefree(dev) != 0)))
            return -ERESTARTSYS;        /* was -1 */
    }
    count = mini(count, (size_t)spacefree(dev));
    if (dev->wp >= dev->rp)
        count = mini(count, (size_t)(dev->end - dev->wp));
    else
        count = mini(count, (size_t)(dev->rp - dev->wp - 1));
    if (dev->wp + count == dev->end)
        flag = 1;
    if (copy_from_user(dev->wp, buf, (count + flag)))
        return -EFAULT;                 /* was -1 */
    dev->wp += count;
    wake_up_interruptible(&dev->inq);
    if (dev->wp == dev->end)
        if (dev->rp != dev->buffer)
            dev->wp = dev->buffer;
    return (count + flag);
}

size_t mini(size_t count, size_t siz)
{
    if (count > siz)
        return siz;
    else
        return count;
}

/* MODULE EXIT FUNCTION */
static void scull_exit(void)
{
    for (i = 0; i < 4; i++)
        cdev_del(&dev[i].cdev);
    unregister_chrdev_region(MKDEV(major, minor), 4);  /* was unregister_chrdev: pair with *_region */
}

module_init(scull_init);
module_exit(scull_exit);

CHAPTER 16:

BLOCK DRIVERS
A block driver provides access to devices that transfer randomly accessible data in fixed-size blocks: disk drives, primarily. Block drivers have a distinct interface and their own particular challenges.

A block is a fixed-size chunk of data, the size being determined by the kernel. A sector, in contrast, is a small block whose size is usually determined by the underlying hardware. The kernel expects to be dealing with devices that implement 512-byte sectors. If your device uses a different size, the kernel adapts and avoids generating I/O requests that the hardware cannot handle. Whenever the kernel presents you with a sector number, it is working in a world of 512-byte sectors. If you are using a different hardware sector size, you have to scale the kernel's sector numbers accordingly.
REGISTRATION

Block drivers must use a set of registration interfaces to make their devices available to the kernel.
Block Driver Registration

The first step taken by most block drivers is to register themselves with the kernel. The function for this task is register_blkdev, which is declared in <linux/fs.h>:

int register_blkdev(unsigned int major, const char *name);

The arguments are the major number that your device will be using and the associated name (which the kernel will display in /proc/devices). If major is passed as 0, the kernel allocates a new major number and returns it to the caller. As always, a negative return value from register_blkdev indicates that an error has occurred.

The corresponding function for canceling a block driver registration is:

int unregister_blkdev(unsigned int major, const char *name);

Here, the arguments must match those passed to register_blkdev, or the function returns -EINVAL and does not unregister anything. The call to register_blkdev is entirely optional; in future kernels, register_blkdev may be removed altogether.

static int sbull_init(void)
{
    int result;

    major = register_blkdev(major, "block");
    if (major < 0)
        return -1;
    for (i = 0; i < 4; i++) {
        result = sbull_setup(&dev[i]);
        if (result < 0)
            return -1;
    }
    return 0;
}

DISK REGISTRATION

While register_blkdev can be used to obtain a major number, it does not make any disk drives available to the system. There is a separate registration interface that you must use to manage individual drives.

Block device operations

Char devices make their operations available to the system by way of the file_operations structure. A similar structure is used with block devices; it is struct block_device_operations, which is declared in <linux/fs.h>.


int (*open)(struct inode *inode, struct file *filp);
int (*release)(struct inode *inode, struct file *filp);

Functions that work just like their char driver equivalents; they are called whenever the device is opened and closed. A block driver might respond to an open call by spinning up the device, locking the door (for removable media), etc. If you lock media into the device, you should certainly unlock it in the release method.


int (*ioctl)(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg);

Method that implements the ioctl system call. Block driver ioctl methods are fairly short.
int (*media_changed) (struct gendisk *gd);

Method called by the kernel to check whether the user has changed the media in the drive, returning a nonzero value if so. This method is only applicable to drives that support removable media.
int (*revalidate_disk) (struct gendisk *gd);

The revalidate_disk method is called in response to a media change; it gives the driver a chance to perform whatever work is required to make the new media ready for use.
struct module *owner;

A pointer to the module that owns this structure; it should usually be initialized to THIS_MODULE.

There are no functions that actually read or write data. In the block I/O subsystem, these operations are handled by the request function.

struct block_device_operations sbull_fops = {
    .owner   = THIS_MODULE,
    .open    = open1,
    .release = close1,
};

The gendisk structure

struct gendisk (declared in <linux/genhd.h>) is the kernel's representation of an individual disk device. In fact, the kernel also uses gendisk structures to represent partitions, but driver authors need not be aware of that. There are several fields in struct gendisk that must be initialized by a block driver:


int major;
int first_minor;
int minors;

Fields that describe the device number(s) used by the disk. At a minimum, a drive must use at least one minor number. If your drive is to be partitionable, you want to allocate one minor number for each possible partition as well. A common value for minors is 16, which allows for the full-disk device and 15 partitions. Some disk drivers use 64 minor numbers for each device.


char disk_name[32];

Field that should be set to the name of the disk device.
struct block_device_operations *fops;

Set of device operations from the previous section.
struct request_queue *queue;

Structure used by the kernel to manage I/O requests for this device.

int flags;

A (little-used) set of flags describing the state of the drive. If your device has removable media, you should set GENHD_FL_REMOVABLE. CD-ROM drives can set GENHD_FL_CD. If, for some reason, you do not want partition information to show up in /proc/partitions, set GENHD_FL_SUPPRESS_PARTITION_INFO.


sector_t capacity;

The capacity of this drive, in 512-byte sectors.
void *private_data;

Block drivers may use this field for a pointer to their own internal data.

The kernel provides a small set of functions for working with gendisk structures. struct gendisk is a dynamically allocated structure that requires special kernel manipulation to be initialized; drivers cannot allocate the structure on their own. Instead, you must call:


struct gendisk *alloc_disk(int minors);

The minors argument should be the number of minor numbers this disk uses; note that you cannot change the minors field later and expect things to work properly. When a disk is no longer needed, it should be freed with:

void del_gendisk(struct gendisk *gd);

Allocating a gendisk structure does not make the disk available to the system. To do that, you must initialize the structure and call add_disk:


void add_disk(struct gendisk *gd);

INITIALIZATION IN sbull (Device Structure)

sbull allows a major number to be specified at compile or module load time. If no number is specified, one is allocated dynamically. Since a call to register_blkdev is required for dynamic allocation, sbull does so:


sbull_major = register_blkdev(sbull_major, "sbull");
if (sbull_major <= 0) {
    printk(KERN_WARNING "sbull: unable to get major number\n");
    return -EBUSY;
}

The sbull device is described by an internal structure:
struct sbull_dev {
    int size;                     /* Device size in sectors */
    u8 *data;                     /* The data array */
    short users;                  /* How many users */
    short media_change;           /* Flag a media change? */
    spinlock_t lock;              /* For mutual exclusion */
    struct request_queue *queue;  /* The device request queue */
    struct gendisk *gd;           /* The gendisk structure */
};

Several steps are required to initialize this structure and make the associated device available to the system.
memset(dev, 0, sizeof(struct sbull_dev));
dev->size = nsectors * hardsect_size;
dev->data = vmalloc(dev->size);
if (dev->data == NULL) {
    printk(KERN_NOTICE "vmalloc failure.\n");
    return;
}
spin_lock_init(&dev->lock);

It is important to allocate and initialize a spinlock before the next step, which is the allocation of the request queue.
dev->queue = blk_init_queue(sbull_request, &dev->lock);

sbull_request is our request function, the function that actually performs block read and write requests. When we allocate a request queue, we must provide a spinlock that controls access to that queue. The lock is provided by the driver rather than the general parts of the kernel because, often, the request queue and other driver data structures fall within the same critical section; they tend to be accessed together. blk_init_queue can fail, so you must check the return value before continuing.

Finally, initialize and install the corresponding gendisk structure (this is done in the sbull_setup function).


dev->gd = alloc_disk(SBULL_MINORS);
if (!dev->gd) {
    printk(KERN_NOTICE "alloc_disk failure\n");
    goto out_vfree;
}
dev->gd->major = sbull_major;
dev->gd->first_minor = which * SBULL_MINORS;
dev->gd->fops = &sbull_ops;
dev->gd->queue = dev->queue;
dev->gd->private_data = dev;
snprintf(dev->gd->disk_name, 32, "sbull%c", which + 'a');
set_capacity(dev->gd, nsectors * (hardsect_size / KERNEL_SECTOR_SIZE));
add_disk(dev->gd);

SBULL_MINORS is the number of minor numbers each sbull device supports.
THE BLOCK DEVICE OPERATIONS

The open and release methods

To implement the simulated media removal, sbull must know when the last user has closed the device. A count of users is maintained by the driver. It is the job of the open and close methods to keep that count current. When an inode refers to a block device, the field i_bdev->bd_disk contains a pointer to the associated gendisk structure; this pointer can be used to get to a driver's internal data structures for the device.


static int sbull_open(struct inode *inode, struct file *filp)
{
    struct sbull_dev *dev = inode->i_bdev->bd_disk->private_data;

    filp->private_data = dev;
    spin_lock(&dev->lock);
    spin_unlock(&dev->lock);
    return 0;
}

The task of the release method is, in contrast, to decrement the user count and, if indicated, start the media removal timer:
static int sbull_release(struct inode *inode, struct file *filp)
{
    struct sbull_dev *dev = inode->i_bdev->bd_disk->private_data;

    spin_lock(&dev->lock);
    spin_unlock(&dev->lock);
    return 0;
}

SUPPORTING REMOVABLE MEDIA

The block_device_operations structure includes two methods for supporting removable media. If you are writing a driver for a non-removable device, you can safely omit these methods. The media_changed method is called (from check_disk_change) to see whether the media has been changed; it should return a nonzero value if this has happened.


int sbull_media_changed(struct gendisk *gd)
{
    struct sbull_dev *dev = gd->private_data;

    return dev->media_change;
}

The revalidate method is called after a media change; its job is to do whatever is required to prepare the driver for operations on the new media. After the call to revalidate, the kernel attempts to reread the partition table and start over with the device. The sbull implementation simply resets the media_change flag and zeroes out the device memory to simulate the insertion of a blank disk.


int sbull_revalidate(struct gendisk *gd)
{
    struct sbull_dev *dev = gd->private_data;

    if (dev->media_change) {
        dev->media_change = 0;
        memset(dev->data, 0, dev->size);
    }
    return 0;
}

THE IOCTL METHOD (not used in our program; used only with a real device)

Block devices can provide an ioctl method to perform device control functions. The sbull ioctl method handles only one command, a request for the device's geometry:
int sbull_ioctl(struct inode *inode, struct file *filp,
                unsigned int cmd, unsigned long arg)
{
    long size;
    struct hd_geometry geo;
    struct sbull_dev *dev = filp->private_data;

    switch (cmd) {
    case HDIO_GETGEO:
        /*
         * Get geometry: since we are a virtual device, we have to make
         * up something plausible. So we claim 16 sectors, four heads,
         * and calculate the corresponding number of cylinders. We set
         * the start of data at sector four.
         */
        size = dev->size * (hardsect_size / KERNEL_SECTOR_SIZE);
        geo.cylinders = (size & ~0x3f) >> 6;
        geo.heads = 4;
        geo.sectors = 16;
        geo.start = 4;
        if (copy_to_user((void __user *) arg, &geo, sizeof(geo)))
            return -EFAULT;
        return 0;
    }
    return -ENOTTY;  /* unknown command */
}

The kernel is not concerned with a block device's geometry; it sees the device simply as a linear array of sectors.
REQUEST PROCESSING

Introduction to the request method

The block driver request method has the following prototype:
void request(request_queue_t *queue);

This function is called whenever the kernel believes it is time for your driver to process some reads, writes, or other operations on the device. The request function does not need to actually complete all of the requests on the queue before it returns; indeed, it probably does not complete any of them for most real devices. It must, however, make a start on those requests and ensure that they are all, eventually, processed by the driver.

Every device has a request queue. This is because actual transfers to and from a disk can take place far away from the time the kernel requests them, and because the kernel needs the flexibility to schedule each transfer at the most propitious moment (grouping together, for instance, requests that affect sectors close together on the disk). And the request function, you may remember, is associated with a request queue when that queue is created.


dev->queue = blk_init_queue(sbull_request, &dev->lock);

Thus, when the queue is created, the request function is associated with it. We also provided a spinlock as part of the queue creation process. Whenever our request function is called, that lock is held by the kernel. As a result, the request function is running in an atomic context. The queue lock also prevents the kernel from queuing any other requests for your device while your request function holds the lock. Under some conditions, you may want to consider dropping that lock while the request function runs. If you do so, however, you must be sure not to access the request queue, or any other data structure protected by the lock, while the lock is not held. You must also reacquire the lock before the request function returns.


A simple request method

By default, sbull uses a method called sbull_request, which is meant to be an example of the simplest possible request method.

static void sbull_request(request_queue_t *q)
{
    struct request *req;

    while ((req = elv_next_request(q)) != NULL) {
        struct sbull_dev *dev = req->rq_disk->private_data;
        if (!blk_fs_request(req)) {
            printk(KERN_NOTICE "Skip non-fs request\n");
            end_request(req, 0);
            continue;
        }
        sbull_transfer(dev, req->sector, req->current_nr_sectors,
                       req->buffer, rq_data_dir(req));
        end_request(req, 1);
    }
}

The kernel provides the function elv_next_request to obtain the first incomplete request on the queue; that function returns NULL when there are no requests to be processed. Note that elv_next_request does not remove the request from the queue. If you call it twice with no intervening operations, it returns the same request structure both times.

A block request queue can contain requests that do not actually move blocks to and from a disk. Most block drivers do not know how to handle such requests and simply fail them; sbull works in this way as well. The call to blk_fs_request tells us whether we are looking at a filesystem request, one that moves blocks of data. If a request is not a filesystem request, we pass it to end_request:


void end_request(struct request *req, int succeeded);

When we dispose of non-filesystem requests, we pass succeeded as 0 to indicate that we did not successfully complete the request. Otherwise, we call sbull_transfer to actually move the data, using a set of fields provided in the request structure:

sector_t sector;

The index of the beginning sector on our device. Remember that this sector number, like all such numbers passed between the kernel and the driver, is expressed in 512-byte sectors. If your hardware uses a different sector size, you need to scale sector accordingly.

unsigned long nr_sectors;

The number of (512-byte) sectors to be transferred.

char *buffer;

A pointer to the buffer to or from which the data should be transferred.

rq_data_dir(struct request *req);

This macro extracts the direction of the transfer from the request; a zero return value denotes a read from the device, and a nonzero return value denotes a write to the device.

The sbull driver can implement the actual data transfer with a simple memcpy call; our data is already in memory. The function that performs this copy operation (sbull_transfer) also handles the scaling of sector sizes and ensures that we do not try to copy beyond the end of our virtual device:


static void sbull_transfer(struct sbull_dev *dev, unsigned long sector,
                           unsigned long nsect, char *buffer, int write)
{
    unsigned long offset = sector * KERNEL_SECTOR_SIZE;
    unsigned long nbytes = nsect * KERNEL_SECTOR_SIZE;

    if ((offset + nbytes) > dev->size) {
        printk(KERN_NOTICE "Beyond-end write (%ld %ld)\n", offset, nbytes);
        return;
    }
    if (write)
        memcpy(dev->data + offset, buffer, nbytes);
    else
        memcpy(buffer, dev->data + offset, nbytes);
}

With this code, sbull implements a complete, simple RAM-based disk device.

sbull executes requests synchronously, one at a time. High-performance disk devices are capable of having numerous requests outstanding at the same time. Since we process only the first request in the queue, we can never have multiple requests being fulfilled at a given time.


REQUEST QUEUES

Queue creation and deletion

A request queue is a dynamic data structure that must be created by the block I/O subsystem. The function to create and initialize a request queue is:


request_queue_t *blk_init_queue(request_fn_proc *request, spinlock_t *lock);

The arguments are the request function for this queue and a spinlock that controls access to the queue. To return a request queue to the system (at module unload time, generally), call blk_cleanup_queue:


void blk_cleanup_queue(request_queue_t *);

Queueing functions

You must hold the queue lock before you call these functions. The function that returns the next request to process is elv_next_request:

struct request *elv_next_request(request_queue_t *queue);

It returns a pointer to the next request to process, or NULL if no more requests remain to be processed. elv_next_request leaves the request on the queue but marks it as being active; this mark prevents the I/O scheduler from attempting to merge other requests with this one once you start to execute it.

To actually remove a request from a queue, use blkdev_dequeue_request:
void blkdev_dequeue_request(struct request *req);

Should you need to put a dequeued request back on the queue for some reason, you can call:
void elv_requeue_request(request_queue_t *queue, struct request *req);

Queue control functions

Functions that can be used by a driver to control how a request queue operates include:


void blk_stop_queue(request_queue_t *queue); void blk_start_queue(request_queue_t *queue);

If your device has reached a state where it can handle no more outstanding commands, you can call blk_stop_queue to tell the block layer. After this call, your request function will not be called until you call blk_start_queue.


void blk_queue_bounce_limit(request_queue_t *queue, u64 dma_addr);

Function that tells the kernel the highest physical address to which your device can perform DMA. If a request comes in containing a reference to memory above the limit, a bounce buffer will be used for the operation. You can make use of the predefined symbols BLK_BOUNCE_HIGH (use bounce buffers for high-memory pages), BLK_BOUNCE_ISA (the driver can DMA only into the 16-MB ISA zone), or BLK_BOUNCE_ANY (the driver can perform DMA to any address). The default value is BLK_BOUNCE_HIGH.
void blk_queue_max_sectors(request_queue_t *queue, unsigned short max);

Used to set the maximum size of any request in (512-byte) sectors; the default is 255.
void blk_queue_max_phys_segments(request_queue_t *queue, unsigned short max);

How many segments your driver is prepared to cope with; this may be the size of a statically allocated scatterlist.
void blk_queue_max_hw_segments(request_queue_t *queue, unsigned short max);

The maximum number of segments that the device itself can handle.
void blk_queue_max_segment_size(request_queue_t *queue, unsigned int max);

How large any individual segment of a request can be in bytes; the default is 65,536 bytes.
void blk_queue_segment_boundary(request_queue_t *queue, unsigned long mask);

Some devices cannot handle requests that cross a particular-size memory boundary; if your device is one of those, use this function to tell the kernel about that boundary. The default mask is 0xffffffff.

void blk_queue_dma_alignment(request_queue_t *queue, int mask);

Function that tells the kernel about the memory alignment constraints your device imposes on DMA transfers. The default mask is 0x1ff, which causes all requests to be aligned on 512-byte boundaries.


void blk_queue_hardsect_size(request_queue_t *queue, unsigned short max);

Tells the kernel about your device's hardware sector size. All communication between the block layer and the driver continues to be expressed in 512-byte sectors.
THE ANATOMY OF A REQUEST

Each request structure represents one block I/O request, although it may have been formed through a merger of several independent requests at a higher level. The sectors to be transferred for any particular request may be distributed throughout main memory, although they always correspond to a set of consecutive sectors on the block device. The request is represented as a set of segments, each of which corresponds to one in-memory buffer. The kernel may join multiple requests that involve adjacent sectors on the disk, but it never combines read and write operations within a single request structure. The kernel also makes sure not to combine requests if the result would violate any of the request queue limits described in the previous section.

A request structure is implemented, essentially, as a linked list of bio structures combined with some housekeeping information to enable the driver to keep track of its position. The bio structure is a low-level description of a portion of a block I/O request.
The bio structure

The bio structure, which is defined in <linux/bio.h>, contains:

sector_t bi_sector;

The first (512-byte) sector to be transferred for this bio.

unsigned int bi_size;

The size of the data to be transferred, in bytes. You can also use bio_sectors(bio), a macro that gives the size in sectors.
unsigned long bi_flags;

A set of flags describing the bio; the least significant bit is set if this is a write request.

unsigned short bi_phys_segments;

unsigned short bi_hw_segments;

The number of physical segments contained within this bio and the number of segments seen by the hardware after DMA mapping is done, respectively.

The core of a bio, however, is an array called bi_io_vec, which is made up of the following structure:


struct bio_vec {
    struct page *bv_page;
    unsigned int bv_len;
    unsigned int bv_offset;
};

To iterate over the segments, use bio_for_each_segment, which simply loops through every unprocessed entry in the bi_io_vec array. This macro should be used as follows:

int segno;
struct bio_vec *bvec;

bio_for_each_segment(bvec, bio, segno) {
    /* Do something with this segment */
}

bvec points to the current bio_vec entry, and segno is the current segment number. These values can be used to set up DMA transfers. If you need to access the pages directly, you should first ensure that a proper kernel virtual address exists; to that end, you can use:


char *__bio_kmap_atomic(struct bio *bio, int i, enum km_type type);
void __bio_kunmap_atomic(char *buffer, enum km_type type);

static int sbull_xfer_bio(struct sbull_dev *dev, struct bio *bio)
{
    int i;
    struct bio_vec *bvec;
    sector_t sector = bio->bi_sector;

    bio_for_each_segment(bvec, bio, i) {
        char *buffer = __bio_kmap_atomic(bio, i, KM_USER0);
        sbull_transfer(dev, sector, bio_cur_sectors(bio), buffer,
                       bio_data_dir(bio) == WRITE);
        sector += bio_cur_sectors(bio);
        __bio_kunmap_atomic(buffer, KM_USER0);
    }
    return 0;
}

The block layer also maintains a set of pointers within the bio structure to keep track of the current state of request processing.
struct page *bio_page(struct bio *bio);

Returns a pointer to the page structure representing the page to be transferred next.
int bio_offset(struct bio *bio);

Returns the offset within the page for the data to be transferred.
int bio_cur_sectors(struct bio *bio);

Returns the number of sectors to be transferred out of the current page.
char *bio_data(struct bio *bio);

Returns a kernel logical address pointing to the data to be transferred. This address is available only if the page in question is not located in high memory; calling it in other situations is a bug.


char *bio_kmap_irq(struct bio *bio, unsigned long *flags); void bio_kunmap_irq(char *buffer, unsigned long *flags);

bio_kmap_irq returns a kernel virtual address for any buffer, regardless of whether it resides in high or low memory. An atomic kmap is used, so your driver cannot sleep while this mapping is active. Use bio_kunmap_irq to unmap the buffer.


Request structure fields

sector_t hard_sector;

The first sector that has not been transferred is stored in hard_sector. This field is for use only within the block subsystem; drivers should not make use of it.
unsigned long hard_nr_sectors;

The total number of sectors yet to transfer is in hard_nr_sectors. For use only within the block subsystem; drivers should not make use of it.
unsigned int hard_cur_sectors;

The number of sectors remaining in the current bio is hard_cur_sectors. For use only within the block subsystem; drivers should not make use of it.
struct bio *bio;

bio is the linked list of bio structures for this request. Do not access this field directly; use rq_for_each_bio instead.
unsigned short nr_phys_segments;

The number of distinct segments occupied by this request in physical memory after adjacent pages have been merged.
struct list_head queuelist;

The linked list structure that links the request into the request queue. If you remove the request from the queue with blkdev_dequeue_request, you may use this list head to track the request in an internal list maintained by your driver.

Barrier Request

Some applications require guarantees that certain operations will complete before others are started. If the wrong operations are reordered, the result can be severe, undetected data corruption. The block layer addresses this problem with the concept of a barrier request. If a request is marked with the REQ_HARDBARRIER flag, it must be written to the drive before any following request is initiated. If your driver honors barrier requests, the first step is to inform the block layer of this fact.


void blk_queue_ordered(request_queue_t *queue, int flag);

To indicate that your driver implements barrier requests, set the flag parameter to a nonzero value.

The actual implementation of barrier requests is simply a matter of testing for the associated flag in the request structure. A macro has been provided to perform this test:


int blk_barrier_rq(struct request *req);

If this macro returns a nonzero value, the request is a barrier request.
Nonretryable requests

Block drivers often attempt to retry requests that fail the first time. This behavior can lead to a more reliable system and help to avoid data loss. The kernel, however, sometimes marks requests as not being retryable. Such requests should simply fail as quickly as possible if they cannot be executed on the first try. If your driver is considering retrying a failed request, it should first make a call to:


int blk_noretry_request(struct request *req);

If this macro returns a nonzero value, your driver should simply abort the request with an error code instead of retrying it.
REQUEST COMPLETION FUNCTIONS

When your device has completed transferring some or all of the sectors in an I/O request, it must inform the block subsystem with:

int end_that_request_first(struct request *req, int success, int count);

This function tells the block code that your driver has finished with the transfer of count sectors starting where you last left off. If the I/O was successful, pass success as 1; otherwise pass 0.

The return value from end_that_request_first is an indication of whether all sectors in this request have been transferred or not. A return value of 0 means that all sectors have been transferred and that the request is complete. At that point, you must dequeue the request with blkdev_dequeue_request and pass it to:


void end_that_request_last(struct request *req);

void end_request(struct request *req, int uptodate)
{
    if (!end_that_request_first(req, uptodate, req->hard_cur_sectors)) {
        add_disk_randomness(req->rq_disk);
        blkdev_dequeue_request(req);
        end_that_request_last(req);
    }
}

The function add_disk_randomness uses the timing of block I/O requests to contribute entropy to the system's random number pool; it should be called only if the disk's timing is truly random.


Working with bios

If the sbull driver is loaded with the request_mode parameter set to 1, it registers a bio-aware request function instead of the simple one.
static void sbull_full_request(request_queue_t *q)
{
    struct request *req;
    int sectors_xferred;
    struct sbull_dev *dev = q->queuedata;

    while ((req = elv_next_request(q)) != NULL) {
        if (!blk_fs_request(req)) {
            printk(KERN_NOTICE "Skip non-fs request\n");
            end_request(req, 0);
            continue;
        }
        sectors_xferred = sbull_xfer_request(dev, req);
        if (!end_that_request_first(req, 1, sectors_xferred)) {
            blkdev_dequeue_request(req);
            end_that_request_last(req);
        }
    }
}

This function simply takes each request, passes it to sbull_xfer_request, then completes it with end_that_request_first and, if necessary, end_that_request_last.


static int sbull_xfer_request(struct sbull_dev *dev, struct request *req)
{
    struct bio *bio;
    int nsect = 0;

    rq_for_each_bio(bio, req) {
        sbull_xfer_bio(dev, bio);
        nsect += bio->bi_size / KERNEL_SECTOR_SIZE;
    }
    return nsect;
}

rq_for_each_bio simply steps through each bio structure in the request, giving us a pointer that we can pass to sbull_xfer_bio for the transfer.
static int sbull_xfer_bio(struct sbull_dev *dev, struct bio *bio)
{
    int i;
    struct bio_vec *bvec;
    sector_t sector = bio->bi_sector;

    /* Do each segment independently. */
    bio_for_each_segment(bvec, bio, i) {
        char *buffer = __bio_kmap_atomic(bio, i, KM_USER0);
        sbull_transfer(dev, sector, bio_cur_sectors(bio), buffer,
                       bio_data_dir(bio) == WRITE);
        sector += bio_cur_sectors(bio);
        __bio_kunmap_atomic(buffer, KM_USER0);
    }
    return 0;  /* Always "succeed" */
}

This function simply steps through each segment in the bio structure, gets a kernel virtual address to access the buffer, then calls the same sbull_transfer function we saw earlier to copy the data over.
Block request and DMA

A block driver can certainly step through the bio structures, create a DMA mapping for each one, and pass the result to the device. The kernel can also build a scatterlist covering an entire request with blk_rq_map_sg:
int blk_rq_map_sg(request_queue_t *queue, struct request *req, struct scatterlist *list);

The return value is the number of entries in the list. The function also passes back, in its third argument, a scatterlist suitable for passing to dma_map_sg. Your driver must allocate the storage for the scatterlist before calling blk_rq_map_sg. The list must be able to hold at least as many entries as the request has physical segments; the struct request field nr_phys_segments holds that count, which cannot exceed the maximum number of physical segments specified with blk_queue_max_phys_segments.


Doing without a request queue

The block layer supports a "no queue" mode of operation. To make use of this mode, your driver must provide a "make request" function, rather than a request function.
typedef int (make_request_fn) (request_queue_t *q, struct bio *bio);

The make_request function takes as its main parameter a bio structure, which represents one or more buffers to be transferred. The make_request function can do one of two things: it can either perform the transfer directly, or it can redirect the request to another device. Since there is no request structure to work with, however, your function should signal completion directly to the creator of the bio structure with a call to bio_endio:


void bio_endio(struct bio *bio, unsigned int bytes, int error);

bytes is the number of bytes you have transferred so far. Errors are indicated by providing a nonzero value for the error parameter; this value is normally an error code such as -EIO. The make_request function should return 0, regardless of whether the I/O is successful.

static int sbull_make_request(request_queue_t *q, struct bio *bio)
{
    struct sbull_dev *dev = q->queuedata;
    int status;

    status = sbull_xfer_bio(dev, bio);
    bio_endio(bio, bio->bi_size, status);
    return 0;
}

BLOCK DRIVER CODE


Program 1:

// BLOCK DRIVER WITH REQUEST
#include <linux/init.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/genhd.h>
#include <linux/vmalloc.h>
#include <linux/blkdev.h>

MODULE_LICENSE("Dual BSD/GPL");

int major = 0;   /* signed, so a negative error from register_blkdev is visible */
unsigned int minor = 0;
unsigned int hardsect_size = 512;
unsigned int nsectors = 16912;
unsigned int KERNEL_SECTOR_SIZE = 512;

static void sbull_request(request_queue_t *q);
static int sbull_open(struct inode *inode, struct file *filp);
static int sbull_close(struct inode *inode, struct file *filp);

int i;

struct block_device_operations sbull_fops = {
    .owner   = THIS_MODULE,
    .open    = sbull_open,
    .release = sbull_close,
};

struct sbull_dev {
    int size;
    u8 *data;
    spinlock_t lock;
    struct request_queue *queue;
    struct gendisk *gd;
};

struct sbull_dev dev[4];

static void sbull_transfer(struct sbull_dev *dev, unsigned long sector,
                           unsigned long nsect, char *buffer, int write)
{
    unsigned long offset = sector * KERNEL_SECTOR_SIZE;
    unsigned long nbytes = nsect * KERNEL_SECTOR_SIZE;

    if ((offset + nbytes) > dev->size) {
        printk(KERN_NOTICE "Beyond-end write (%ld %ld)\n", offset, nbytes);
        return;
    }
    if (write)
        memcpy(dev->data + offset, buffer, nbytes);
    else
        memcpy(buffer, dev->data + offset, nbytes);
}

static void sbull_request(request_queue_t *q)
{
    struct request *req;

    while ((req = elv_next_request(q)) != NULL) {
        struct sbull_dev *dev = req->rq_disk->private_data;
        if (!blk_fs_request(req)) {
            printk(KERN_NOTICE "Skip non-fs request\n");
            end_request(req, 0);
            continue;
        }
        sbull_transfer(dev, req->sector, req->current_nr_sectors,
                       req->buffer, rq_data_dir(req));
        end_request(req, 1);
    }
}

static int sbull_setup(struct sbull_dev *dev)
{
    memset(dev, 0, sizeof(struct sbull_dev));
    dev->size = nsectors * hardsect_size;
    dev->data = vmalloc(dev->size);
    if (dev->data == NULL) {
        printk(KERN_NOTICE "vmalloc failure.\n");
        return -1;
    }
    spin_lock_init(&dev->lock);
    dev->queue = blk_init_queue(sbull_request, &dev->lock);
    dev->gd = alloc_disk(1);
    if (!dev->gd) {
        printk(KERN_NOTICE "alloc_disk failure\n");
        return -1;
    }
    dev->gd->major = major;
    dev->gd->first_minor = minor + i;
    dev->gd->fops = &sbull_fops;
    dev->gd->queue = dev->queue;
    dev->gd->private_data = dev;
    snprintf(dev->gd->disk_name, 32, "sbull%d", i);
    set_capacity(dev->gd, nsectors * (hardsect_size / KERNEL_SECTOR_SIZE));
    add_disk(dev->gd);
    return 0;
}

static int sbull_init(void)
{
    int result;

    major = register_blkdev(major, "block");
    if (major < 0)
        return -1;
    for (i = 0; i < 4; i++) {
        result = sbull_setup(&dev[i]);
        if (result < 0)
            return -1;
    }
    return 0;
}

static int sbull_open(struct inode *inode, struct file *filp)
{
    struct sbull_dev *dev = inode->i_bdev->bd_disk->private_data;

    filp->private_data = dev;
    spin_lock(&dev->lock);
    spin_unlock(&dev->lock);
    return 0;
}

static int sbull_close(struct inode *inode, struct file *filp)
{
    struct sbull_dev *dev = inode->i_bdev->bd_disk->private_data;

    spin_lock(&dev->lock);
    spin_unlock(&dev->lock);
    return 0;
}

static void sbull_exit(void)
{
    for (i = 0; i < 4; i++)
        del_gendisk(dev[i].gd);
    /* blk_cleanup_queue(dev[i].queue); */
    unregister_blkdev(major, "block");
}

module_init(sbull_init);
module_exit(sbull_exit);

Program 2:

// BLOCK DRIVER WITH BIO REQUEST
#include <linux/init.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/genhd.h>
#include <linux/vmalloc.h>
#include <linux/blkdev.h>
#include <linux/bio.h>

MODULE_LICENSE("Dual BSD/GPL");

int major = 0;
unsigned int minor = 0;
unsigned int hardsect_size = 512;
unsigned int nsectors = 16912;
unsigned int KERNEL_SECTOR_SIZE = 512;

static int open1(struct inode *inode, struct file *filp);
static int close1(struct inode *inode, struct file *filp);
static void sbull_full_request(request_queue_t *q);

struct block_device_operations sbull_fops = {
    .owner   = THIS_MODULE,
    .open    = open1,
    .release = close1,
};

struct sbull_dev {
    int size;
    u8 *data;
    spinlock_t lock;
    struct request_queue *queue;
    struct gendisk *gd;
};

struct sbull_dev dev;

static void sbull_transfer(struct sbull_dev *dev, unsigned long sector,
                           unsigned long nsect, char *buffer, int write)
{
    unsigned long offset = sector * KERNEL_SECTOR_SIZE;
    unsigned long nbytes = nsect * KERNEL_SECTOR_SIZE;

    if ((offset + nbytes) > dev->size) {
        printk(KERN_NOTICE "Beyond-end write (%ld %ld)\n", offset, nbytes);
        return;
    }
    if (write)
        memcpy(dev->data + offset, buffer, nbytes);
    else
        memcpy(buffer, dev->data + offset, nbytes);
}

static int sbull_xfer_bio(struct sbull_dev *dev, struct bio *bio)
{
    int i;
    struct bio_vec *bvec;
    sector_t sector = bio->bi_sector;

    bio_for_each_segment(bvec, bio, i) {
        char *buffer = __bio_kmap_atomic(bio, i, KM_USER0);
        sbull_transfer(dev, sector, bio_cur_sectors(bio), buffer,
                       bio_data_dir(bio) == WRITE);
        sector += bio_cur_sectors(bio);
        __bio_kunmap_atomic(buffer, KM_USER0);
    }
    return 0;
}

static int sbull_xfer_request(struct sbull_dev *dev, struct request *req)
{
    struct bio *bio;
    int nsect = 0;

    rq_for_each_bio(bio, req) {
        sbull_xfer_bio(dev, bio);
        nsect += bio->bi_size / KERNEL_SECTOR_SIZE;
    }
    return nsect;
}

static void sbull_full_request(request_queue_t *q)
{
    struct request *req;
    int sectors_xferred;

    while ((req = elv_next_request(q)) != NULL) {
        struct sbull_dev *dev = req->rq_disk->private_data;
        if (!blk_fs_request(req)) {
            printk(KERN_NOTICE "Skip non-fs request\n");
            end_request(req, 0);
            continue;
        }
        sectors_xferred = sbull_xfer_request(dev, req);
        end_request(req, 1);
    }
}

static int sbull_setup(struct sbull_dev *dev)
{
    memset(dev, 0, sizeof(struct sbull_dev));
    dev->size = nsectors * hardsect_size;
    dev->data = vmalloc(dev->size);
    if (dev->data == NULL) {
        printk(KERN_NOTICE "vmalloc failure.\n");
        return -1;
    }
    spin_lock_init(&dev->lock);
    dev->queue = blk_init_queue(sbull_full_request, &dev->lock); /* sbull makes its queue */
    dev->gd = alloc_disk(1);
    if (!dev->gd) {
        printk(KERN_NOTICE "alloc_disk failure\n");
        return -1;
    }
    dev->gd->major = major;
    dev->gd->first_minor = minor;
    dev->gd->fops = &sbull_fops;
    dev->gd->queue = dev->queue;
    dev->gd->private_data = dev;
    snprintf(dev->gd->disk_name, 32, "sbull1");
    set_capacity(dev->gd, nsectors * (hardsect_size / KERNEL_SECTOR_SIZE));
    add_disk(dev->gd);
    return 0;
}

static int hello(void)
{
    int result;

    major = register_blkdev(major, "block");
    if (major < 0)
        return -1;
    result = sbull_setup(&dev);
    if (result < 0)

return -1; return 0; } static int open1(struct inode *inode, struct file *filp) { struct sbull_dev *dev = inode->i_bdev->bd_disk->private_data; filp->private_data = dev; spin_lock(&dev->lock); spin_unlock(&dev->lock); return 0; } static int close1(struct inode *inode, struct file *filp) { struct sbull_dev *dev = inode->i_bdev->bd_disk->private_data; spin_lock(&dev->lock); spin_unlock(&dev->lock); return 0; } static void exit1(void) { del_gendisk(dev.gd); //blk_cleanup_queue(dev.queue); unregister_blkdev(major,"block"); } module_init(hello); module_exit(exit1); Program 3: //BLOCK DRIVER WITHOUT BIO REQUEST #include<linux/init.h> #include<linux/module.h> #include<linux/fs.h> #include<linux/genhd.h> #include<linux/vmalloc.h> #include<linux/blkdev.h> #include<linux/bio.h> MODULE_LICENSE("Dual BSD/GPL"); unsigned int major=0; unsigned int minor=0; unsigned int hardsect_size=512; unsigned int nsectors=16912; unsigned int KERNEL_SECTOR_SIZE=512; static int open1(struct inode *inode, struct file *filp); static int close1(struct inode *inode, struct file *filp); struct block_device_operations sbull_fops = { .owner = THIS_MODULE, .open = open1, .release=close1, }; struct sbull_dev {

	int size;
	u8 *data;
	spinlock_t lock;
	struct request_queue *queue;
	struct gendisk *gd;
};

struct sbull_dev dev;

static void sbull_transfer(struct sbull_dev *dev, unsigned long sector,
			   unsigned long nsect, char *buffer, int write)
{
	unsigned long offset = sector * KERNEL_SECTOR_SIZE;
	unsigned long nbytes = nsect * KERNEL_SECTOR_SIZE;

	if ((offset + nbytes) > dev->size) {
		printk(KERN_NOTICE "Beyond-end write (%ld %ld)\n", offset, nbytes);
		return;
	}
	if (write)
		memcpy(dev->data + offset, buffer, nbytes);
	else
		memcpy(buffer, dev->data + offset, nbytes);
}

static int sbull_xfer_bio(struct sbull_dev *dev, struct bio *bio)
{
	int i;
	struct bio_vec *bvec;
	sector_t sector = bio->bi_sector;

	bio_for_each_segment(bvec, bio, i) {
		char *buffer = __bio_kmap_atomic(bio, i, KM_USER0);
		sbull_transfer(dev, sector, bio_cur_sectors(bio), buffer,
			       bio_data_dir(bio) == WRITE);
		sector += bio_cur_sectors(bio);
		__bio_kunmap_atomic(bio, KM_USER0);
	}
	return 0;
}

static int sbull_make_request(request_queue_t *q, struct bio *bio)
{
	struct sbull_dev *dev = q->queuedata;
	int status;

	status = sbull_xfer_bio(dev, bio);
	bio_endio(bio, bio->bi_size, status);
	return 0;
}

static int sbull_setup(struct sbull_dev *dev)
{
	memset(dev, 0, sizeof(struct sbull_dev));
	dev->size = nsectors * hardsect_size;
	dev->data = vmalloc(dev->size);
	if (dev->data == NULL) {
		printk(KERN_NOTICE "vmalloc failure.\n");
		return -1;
	}
	spin_lock_init(&dev->lock);

	dev->queue = blk_alloc_queue(GFP_KERNEL);
	if (dev->queue == NULL)
		return -1;
	blk_queue_make_request(dev->queue, sbull_make_request);
	dev->gd = alloc_disk(1);
	if (!dev->gd) {
		printk(KERN_NOTICE "alloc_disk failure\n");
		return -1;
	}
	dev->queue->queuedata = dev;
	dev->gd->major = major;
	dev->gd->first_minor = minor;
	dev->gd->fops = &sbull_fops;
	dev->gd->queue = dev->queue;
	dev->gd->private_data = dev;
	snprintf(dev->gd->disk_name, 32, "sbull1");
	set_capacity(dev->gd, nsectors * (hardsect_size / KERNEL_SECTOR_SIZE));
	add_disk(dev->gd);
	return 0;
}

static int hello(void)
{
	int result;

	major = register_blkdev(major, "block");
	if (major < 0)
		return -1;
	result = sbull_setup(&dev);
	if (result < 0)
		return -1;
	return 0;
}

static int open1(struct inode *inode, struct file *filp)
{
	struct sbull_dev *dev = inode->i_bdev->bd_disk->private_data;

	filp->private_data = dev;
	spin_lock(&dev->lock);
	spin_unlock(&dev->lock);
	return 0;
}

static int close1(struct inode *inode, struct file *filp)
{
	struct sbull_dev *dev = inode->i_bdev->bd_disk->private_data;

	spin_lock(&dev->lock);
	spin_unlock(&dev->lock);
	return 0;
}

static void exit1(void)
{
	del_gendisk(dev.gd);
	//blk_cleanup_queue(dev.queue);
	unregister_blkdev(major, "block");
}

module_init(hello);
module_exit(exit1);

CHAPTER 17 :

NETWORK DRIVER
A block device registers its disks and methods with the kernel, and then transmits and receives blocks on request, by means of its request function. Similarly, a network interface must register itself within specific kernel data structures in order to be invoked when packets are exchanged with the outside world.

A disk exists as a special file in the /dev directory, whereas a network interface has no such entry point. The normal file operations (read, write, and so on) do not make sense when applied to network interfaces. Network interfaces exist in their own namespace and export a different set of operations.

Block drivers operate only in response to requests from the kernel, whereas network drivers receive packets asynchronously from the outside. Thus, while a block driver is asked to send a buffer toward the kernel, the network device asks to push incoming packets toward the kernel.

The network interfaces fit in with the rest of the Linux kernel and provide examples in the form of a memory-based modularized network interface, which is called (you guessed it) snull. The interface uses the Ethernet hardware protocol and transmits IP packets.

The term octet refers to a group of eight bits, which is generally the smallest unit understood by networking devices and protocols.

A header is a set of bytes (err, octets) prepended to a packet as it is passed through the various layers of the networking subsystem. When an application sends a block of data through a TCP socket, the networking subsystem breaks that data up into packets and puts a TCP header, describing where each packet fits within the stream, at the beginning. The lower levels then put an IP header, used to route the packet to its destination, in front of the TCP header. If the packet moves over an Ethernet-like medium, an Ethernet header, interpreted by the hardware, goes in front of the rest. Network drivers need not concern themselves with higher-level headers (usually), but they often must be involved in the creation of the hardware-level header.

HOW snull IS DESIGNED

Sample interfaces should remain independent of real hardware. snull is not a loopback interface; however, it simulates conversations with real remote hosts in order to better demonstrate the task of writing a network driver. A limitation of snull is that it supports only IP traffic. This is a consequence of the internal workings of the interface: snull has to look inside and interpret the packets to properly emulate a pair of hardware interfaces.

ASSIGNING IP NUMBERS

The snull module creates two interfaces. These interfaces are different from a simple loopback, in that whatever you transmit through one of the interfaces loops back to the other one, not to itself. It looks like you have two external links, but actually your computer is replying to itself. This effect can't be accomplished through IP number assignments alone, because the kernel wouldn't send out a packet through interface A that was directed to its own interface B. To be able to establish a communication through the snull interfaces, the source and destination addresses need to be modified during data transmission.

CONNECTING TO THE KERNEL

Device registration

The driver should probe for its device and its hardware location (I/O ports and IRQ line) but not register them. The way a network driver is registered by its module initialization function is different from char and block drivers. Since there is no equivalent of major and minor numbers for network interfaces, a network driver does not request such a number. Instead, the driver inserts a data structure for each newly detected interface into a global list of network devices.

Each interface is described by a struct net_device item, which is defined in <linux/netdevice.h>. The snull driver keeps pointers to two of these structures (for sn0 and sn1) in a simple array:


struct net_device *snull_devs[2];
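Returning to the header stacking described at the start of this chapter, the layering can be sketched with simple arithmetic. The sizes below are the minimum (option-free) header lengths for Ethernet, IPv4, and TCP; the helper function is our own illustration, not part of snull:

```c
#include <assert.h>

/* Minimum header sizes, in octets, for an option-free packet. */
#define ETH_HDR_LEN 14  /* destination MAC + source MAC + ethertype */
#define IP_HDR_LEN  20  /* IPv4 header without options */
#define TCP_HDR_LEN 20  /* TCP header without options */

/* Bytes that cross an Ethernet-like medium for one TCP segment
 * carrying 'payload' bytes of application data (FCS not counted). */
static unsigned long wire_bytes(unsigned long payload)
{
	return ETH_HDR_LEN + IP_HDR_LEN + TCP_HDR_LEN + payload;
}
```

For a full 1460-byte TCP payload this gives the familiar 1514-byte maximum Ethernet frame, before the hardware appends the 4-byte CRC.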

The net_device structure, like many other kernel structures, contains a kobject and is, therefore, reference-counted and exported via sysfs. As with other such structures, it must be allocated dynamically. The kernel function provided to perform this allocation is alloc_netdev, which has the following prototype:

struct net_device *alloc_netdev(int sizeof_priv, const char *name,
                                void (*setup)(struct net_device *));

Here, sizeof_priv is the size of the driver's private data area; with network devices, that area is allocated along with the net_device structure. In fact, the two are allocated together in one large chunk of memory, but driver authors should pretend that they don't know that. name is the name of this interface, as is seen by user space; this name can have a printf-style %d in it. The kernel replaces the %d with the next available interface number. Finally, setup is a pointer to an initialization function that is called to set up the rest of the net_device structure. snull allocates its two device structures in this way:


snull_devs[0] = alloc_netdev(sizeof(struct snull_priv), "sn%d", snull_init);
snull_devs[1] = alloc_netdev(sizeof(struct snull_priv), "sn%d", snull_init);
if (snull_devs[0] == NULL || snull_devs[1] == NULL)
	goto out;

As always, we must check the return value to ensure that the allocation succeeded. The networking subsystem provides a number of helper functions wrapped around alloc_netdev for various types of interfaces. The most common is alloc_etherdev, which is defined in <linux/etherdevice.h>:

struct net_device *alloc_etherdev(int sizeof_priv);

This function allocates a network device using eth%d for the name argument. It provides its own initialization function (ether_setup) that sets several net_device fields with appropriate values for Ethernet devices. Thus, there is no driver-supplied initialization function for alloc_etherdev; the driver should simply do its required initialization directly after a successful allocation. snull could use alloc_etherdev without trouble; we chose to use alloc_netdev instead. Once the net_device structure has been initialized, completing the process is just a matter of passing the structure to register_netdev. In snull, the call looks as follows:


for (i = 0; i < 2; i++)
	if ((result = register_netdev(snull_devs[i])))
		printk("snull: error %i registering device \"%s\"\n",
		       result, snull_devs[i]->name);

As soon as you call register_netdev, your driver may be called to operate on the device. Thus, we should not register the device until everything has been completely initialized.
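The %d name expansion mentioned above can be mimicked in user space with snprintf. The helper below is purely illustrative (the kernel's own logic lives in dev_alloc_name); it picks the first index whose expanded name is not already taken:

```c
#include <stdio.h>
#include <string.h>

/* Illustrative only: expand a "sn%d"-style template to the first name
 * not present in 'taken' (an array of 'n' existing interface names). */
static void pick_ifname(const char *template, const char *taken[], int n,
			char *out, size_t outlen)
{
	int idx, i, used;

	for (idx = 0; ; idx++) {
		snprintf(out, outlen, template, idx);
		used = 0;
		for (i = 0; i < n; i++)
			if (strcmp(out, taken[i]) == 0)
				used = 1;
		if (!used)
			return;
	}
}
```

With "sn0" already registered, expanding "sn%d" yields "sn1", just as registering the two snull devices produces sn0 and sn1 in turn.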
Initializing Each Device

We passed over the intermediate step of completely initializing the net_device structure. Note that struct net_device is always put together at runtime; it cannot be set up at compile time in the same manner as a file_operations or block_device_operations structure. This initialization must be complete before calling register_netdev. The net_device structure is large and complicated. The kernel takes care of some Ethernet-wide defaults through the ether_setup function (called by alloc_etherdev). Since snull uses alloc_netdev, it has a separate initialization function. The core of this function (snull_init) is as follows:


ether_setup(dev); /* assign some of the fields */
dev->open = snull_open;
dev->stop = snull_release;
dev->set_config = snull_config;
dev->hard_start_xmit = snull_tx;
dev->do_ioctl = snull_ioctl;
dev->get_stats = snull_stats;
dev->rebuild_header = snull_rebuild_header;
dev->hard_header = snull_header;
dev->tx_timeout = snull_tx_timeout;
dev->watchdog_timeo = timeout;
/* keep the default flags, just add NOARP */
dev->flags |= IFF_NOARP;
dev->features |= NETIF_F_NO_CSUM;
dev->hard_header_cache = NULL; /* Disable caching */

The above code is a fairly routine initialization of the net_device structure; it is mostly a matter of storing pointers to our various driver functions. The single unusual feature of the code is setting IFF_NOARP in the flags. This specifies that the interface cannot use the Address Resolution Protocol (ARP). ARP is a low-level Ethernet protocol; its job is to turn IP addresses into Ethernet medium access control (MAC) addresses. The assignment to hard_header_cache is there for a similar reason: it disables the caching of the (nonexistent) ARP replies on this interface. The initialization code also sets a couple of fields (tx_timeout and watchdog_timeo) that relate to the handling of transmission timeouts. The net_device field priv plays a role similar to that of the private_data pointer that we used for char drivers. Unlike fops->private_data, this priv pointer is allocated along with the net_device structure. When a driver needs to get access to the private data pointer, it should use the netdev_priv function.
struct snull_priv *priv = netdev_priv(dev);

The snull module declares a snull_priv data structure to be used for priv:
struct snull_priv {

	struct net_device_stats stats;
	int status;
	struct snull_packet *ppool;
	struct snull_packet *rx_queue;
	int rx_int_enabled;
	int tx_packetlen;
	u8 *tx_packetdata;
	struct sk_buff *skb;
	spinlock_t lock;
};

The structure includes, among other things, an instance of struct net_device_stats, which is the standard place to hold interface statistics. The following lines in snull_init allocate and initialize dev->priv:


priv = netdev_priv(dev);
memset(priv, 0, sizeof(struct snull_priv));
spin_lock_init(&priv->lock);
snull_rx_ints(dev, 1); /* enable receive interrupts */

Module Unloading

The module cleanup function simply unregisters the interfaces, performs whatever internal cleanup is required, and releases the net_device structure back to the system:
void snull_cleanup(void)
{
	int i;

	for (i = 0; i < 2; i++) {
		if (snull_devs[i]) {
			unregister_netdev(snull_devs[i]);
			snull_teardown_pool(snull_devs[i]);
			free_netdev(snull_devs[i]);
		}
	}
	return;
}

The call to unregister_netdev removes the interface from the system; free_netdev returns the net_device structure to the kernel. If a reference to that structure exists somewhere, it may continue to exist, but your driver need not care about that. Once you have unregistered the interface, the kernel no longer calls its methods. Our internal cleanup (done in snull_teardown_pool) cannot happen until the device has been unregistered. It must, however, happen before we return the net_device structure to the system; once we have called free_netdev, we cannot make any further references to the device or our private area.

THE net_device STRUCTURE IN DETAIL


Global Information

The first part of struct net_device is composed of the following fields:

char name[IFNAMSIZ];

The name of the device. If the name set by the driver contains a %d format string, register_netdev replaces it with a number to make a unique name; assigned numbers start at 0.
unsigned long state;

Device state. The field includes several flags.

struct net_device *next;

Pointer to the next device in the global linked list. This field shouldn't be touched by the driver.

int (*init)(struct net_device *dev);

An initialization function. If this pointer is set, the function is called by register_netdev to complete the initialization of the net_device structure. Drivers do not use this function any longer.

The following fields contain low-level hardware information for relatively simple devices.
Hardware Information

unsigned long rmem_end;
unsigned long rmem_start;
unsigned long mem_end;
unsigned long mem_start;

Device memory information. These fields hold the beginning and ending addresses of the shared memory used by the device. If the device has different receive and transmit memories, the mem fields are used for transmit memory and the rmem fields for receive memory.

unsigned long base_addr;

The I/O base address of the network interface. This field is assigned by the driver during the device probe. The ifconfig command can be used to display or modify the current value. The base_addr can be explicitly assigned on the kernel command line at system boot (via the netdev= parameter) or at module load time.


unsigned char irq;

The assigned interrupt number. The value of dev->irq is printed by ifconfig when interfaces are listed. This value can usually be set at boot or load time and modified later using ifconfig.

unsigned char if_port;

The port in use on multiport devices. This field is used with devices that support both coaxial (IF_PORT_10BASE2) and twisted-pair (IF_PORT_100BASET) Ethernet connections.
unsigned char dma;

The DMA channel allocated by the device. The field makes sense only with some peripheral buses, such as ISA.
Interface Information

Most of the information about the interface is correctly set up by the ether_setup function. Ethernet cards can rely on this general-purpose function for most of these fields, but the flags and dev_addr fields are device specific and must be explicitly assigned at initialization time.


void ltalk_setup(struct net_device *dev);

Sets up the fields for a LocalTalk device.

void fc_setup(struct net_device *dev);

Initializes fields for Fibre Channel devices.

void fddi_setup(struct net_device *dev);

Configures an interface for a Fiber Distributed Data Interface (FDDI) network.

void tr_setup(struct net_device *dev);

Handles setup for token ring network interfaces. If yours is something radically new and different, however, you need to assign the following fields by hand:


unsigned short hard_header_len;

The hardware header length. This is the number of octets that lead the transmitted packet before the IP header, or other protocol information. The value of hard_header_len is 14 (ETH_HLEN) for Ethernet interfaces.


unsigned mtu;

The maximum transfer unit (MTU). This field is used by the network layer to drive packet transmission. Ethernet has an MTU of 1500 octets (ETH_DATA_LEN). This value can be changed with ifconfig.


unsigned long tx_queue_len;

The maximum number of frames that can be queued on the device's transmission queue.
unsigned short type;

The hardware type of the interface. The proper value for Ethernet interfaces is ARPHRD_ETHER, and that is the value set by ether_setup.
unsigned char addr_len;

Hardware (MAC) address length and device hardware addresses (6 octets for Ethernet).
unsigned char broadcast[MAX_ADDR_LEN];

The broadcast address is made up of six 0xff octets; ether_setup arranges for these values to be correct.
unsigned char dev_addr[MAX_ADDR_LEN];

The device address must be read from the interface board in a device-specific way, and the driver should copy it to dev_addr. The hardware address is used to generate correct Ethernet headers before the packet is handed over to the driver for transmission.
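As a small illustration of handling dev_addr-style byte arrays (the helper is ours, not a kernel API), the six octets can be rendered in the usual colon-separated form:

```c
#include <stdio.h>
#include <string.h>

/* Format a 6-octet hardware address as "aa:bb:cc:dd:ee:ff".
 * 'out' must have room for at least 18 bytes (17 chars + NUL). */
static void format_mac(const unsigned char addr[6], char *out)
{
	snprintf(out, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
		 addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
}
```

Applied to the broadcast address described above (six 0xff octets), this prints ff:ff:ff:ff:ff:ff.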


unsigned short flags;
int features;

The flags field is a bit mask including the following bit values. The valid flags, which are defined in <linux/if.h>, are:
IFF_UP

This flag is read-only for the driver. The kernel turns it on when the interface is active and ready to transfer packets.

IFF_BROADCAST

This flag states that the interface allows broadcasting. Ethernet boards do.

IFF_DEBUG

This marks debug mode.

IFF_LOOPBACK

This flag should be set only in the loopback interface.

IFF_POINTOPOINT

This flag signals that the interface is connected to a point-to-point link. It is set by the driver.

IFF_NOARP

This means that the interface can't perform ARP.

IFF_MULTICAST

This flag is set by drivers to mark interfaces that are capable of multicast transmission. ether_setup sets IFF_MULTICAST by default, so if your driver does not support multicast, it must clear the flag at initialization time.

IFF_ALLMULTI

This flag tells the interface to receive all multicast packets.

The feature flags below are all ways of telling the kernel that it need not apply checksums to some or all packets leaving the system by this interface:

NETIF_F_IP_CSUM

Set this flag if your interface can checksum IP packets but not others.

NETIF_F_NO_CSUM

Set this flag if no checksums are ever required for this interface. Since packets are only transferred through system memory, there is no opportunity for them to be corrupted, and no need to check them.

NETIF_F_HW_CSUM

If your hardware does checksumming itself, set NETIF_F_HW_CSUM.

NETIF_F_HIGHDMA

Set this flag if your device can perform DMA to high memory. In the absence of this flag, all packet buffers provided to your driver are allocated in low memory.
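The flag handling described in this section is ordinary bit arithmetic. The numeric values below are the ones <linux/if.h> assigns to these flags; they are restated here so the sketch stands alone:

```c
/* Flag values as defined in <linux/if.h>. */
#define IFF_UP        0x1
#define IFF_NOARP     0x80
#define IFF_MULTICAST 0x1000

/* Mimic what snull_init does (keep defaults, add NOARP) plus what a
 * non-multicast driver must do (clear IFF_MULTICAST). */
static unsigned short fixup_flags(unsigned short flags)
{
	flags |= IFF_NOARP;      /* interface cannot perform ARP */
	flags &= ~IFF_MULTICAST; /* driver does not support multicast */
	return flags;
}
```

Note that clearing a flag uses AND with the complement, while setting one uses OR; testing a flag (as the kernel does with IFF_UP) is an AND followed by a zero check.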
The device methods

Device methods for a network interface can be divided into two groups: fundamental and optional. Fundamental methods include those that are needed to be able to use the interface; optional methods implement more advanced functionalities that are not strictly required.


int (*open)(struct net_device *dev);

Opens the interface. The open method should register any system resource it needs, turn on the hardware, and perform any other setup your device requires.

int (*stop)(struct net_device *dev);

Stops the interface. The interface is stopped when it is brought down. This function should reverse operations performed at open time.

int (*hard_start_xmit) (struct sk_buff *skb, struct net_device *dev);

Method that initiates the transmission of a packet. The full packet (protocol headers and all) is contained in a socket buffer (sk_buff) structure. Its job is to organize the information passed to it as arguments into an appropriate, device-specific hardware header.

int (*rebuild_header)(struct sk_buff *skb);

Function used to rebuild the hardware header after ARP resolution completes but before a packet is transmitted.

void (*tx_timeout)(struct net_device *dev);

Method called when a packet transmission fails to complete within a reasonable period, on the assumption that an interrupt has been missed or the interface has locked up. It should handle the problem and resume packet transmission.

struct net_device_stats *(*get_stats)(struct net_device *dev);

Whenever an application needs to get statistics for the interface, this method is called.

int (*set_config)(struct net_device *dev, struct ifmap *map);

Changes the interface configuration. This method is the entry point for configuring the driver. The I/O address for the device and its interrupt number can be changed at runtime using set_config.

The remaining device operations are optional:

int weight;
int (*poll)(struct net_device *dev, int *quota);

Operates the interface in a polled mode, with interrupts disabled.

void (*poll_controller)(struct net_device *dev);

Function that asks the driver to check for events on the interface in situations where interrupts are disabled.

int (*do_ioctl)(struct net_device *dev, struct ifreq *ifr, int cmd);

Performs interface-specific ioctl commands. The corresponding field in struct net_device can be left as NULL if the interface doesn't need any interface-specific commands.

void (*set_multicast_list)(struct net_device *dev);

Method called when the multicast list for the device changes and when the flags change.

int (*set_mac_address)(struct net_device *dev, void *addr);

Function that can be implemented if the interface supports the ability to change its hardware address.

eth_mac_addr

Only copies the new address into dev->dev_addr, and it does so only if the interface is not running. Drivers that use eth_mac_addr should set the hardware MAC address from dev->dev_addr in their open method.

int (*change_mtu)(struct net_device *dev, int new_mtu);

Function that takes action if there is a change in the maximum transfer unit (MTU) for the interface.
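A change_mtu implementation for an Ethernet-style device ultimately enforces a bound that follows from simple frame arithmetic. The macro values mirror ETH_HLEN and ETH_DATA_LEN from <linux/if_ether.h>, redefined here so the sketch stands alone:

```c
/* Values as defined for Ethernet in <linux/if_ether.h>. */
#define ETH_HLEN     14   /* hardware header length */
#define ETH_DATA_LEN 1500 /* maximum payload, i.e. the default MTU */

/* Largest frame the driver hands to the hardware for a given MTU,
 * excluding the 4-byte CRC the hardware appends on the wire. */
static int max_frame_len(int mtu)
{
	return ETH_HLEN + mtu;
}
```

With the default Ethernet MTU this gives the well-known 1514-byte maximum frame; a change_mtu method would reject any new_mtu whose resulting frame exceeds what the hardware buffers can hold.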
Utility Fields

Utility fields are used by the interface to hold useful status information. Some of the fields are used by ifconfig and netstat to provide the user with information about the current configuration.
unsigned long trans_start;
unsigned long last_rx;

The trans_start value is used by the networking subsystem to detect transmitter lockups. last_rx is currently unused.


int watchdog_timeo;

The minimum time (in jiffies) that should pass before the networking layer decides that a transmission timeout has occurred and calls the driver's tx_timeout function.

void *priv;

The equivalent of filp->private_data. This field is set by alloc_netdev and should not be accessed directly; use netdev_priv instead.


struct dev_mc_list *mc_list;
int mc_count;

Fields that handle multicast transmission. mc_count is the count of items in mc_list.
spinlock_t xmit_lock;
int xmit_lock_owner;

The xmit_lock is used to avoid multiple simultaneous calls to the driver's hard_start_xmit function. xmit_lock_owner is the number of the CPU that has obtained xmit_lock.

CHAPTER 12:

PCI DRIVERS
THE PCI INTERFACE

PCI is actually a complete set of specifications defining how different parts of a computer should interact. PCI is a high-speed bus used for communication between the CPU and I/O devices. The PCI specification enables transfer of 32 bits of data in parallel at 33 MHz or 66 MHz, yielding a peak throughput of 266 MBps.

The PCI architecture was designed as a replacement for the ISA standard, with three main goals: to get better performance when transferring data between the computer and its peripherals, to be as platform independent as possible, and to simplify adding and removing peripherals to the system.

PCI supports autodetection of interface boards. PCI devices are jumperless and are automatically configured at boot time. The device driver, then, must be able to access configuration information in the device in order to complete initialization. This happens without the need to perform any probing.
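The peak figure quoted above follows directly from the bus width and clock: four bytes transfer per clock cycle at peak. With the nominal 66 MHz clock (strictly 66.66 MHz, which is where the rounded 266 MBps figure comes from) the arithmetic looks like this:

```c
/* Peak PCI throughput in bytes per second: a 32-bit (4-byte) transfer
 * completes on every clock cycle at peak. */
static unsigned long pci_peak_bps(unsigned long clock_hz)
{
	return 4UL * clock_hz;
}
```

So a 33 MHz bus peaks at 132 MB/s and a 66 MHz bus at roughly 264-266 MB/s, depending on whether the exact 66.66 MHz clock is used.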
PCI ADDRESSING

PCI devices are addressed using bus, device, and function numbers, and they are identified via vendor IDs, device IDs, and class codes. Plugging more than one bus in a single system is accomplished by means of bridges, special-purpose PCI peripherals whose task is joining two buses.

bash> lspci
00:00.0 Host bridge: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02)
...
02:00.0 CardBus bridge: Texas Instruments PCI4510 PC card Cardbus Controller (rev 03)
...
03:00.0 Ethernet controller: Xircom Cardbus Ethernet 10/100 (rev 03)
03:00.1 Serial controller: Xircom Cardbus Ethernet + 56k Modem (rev 03)

Consider the tuple (XX:YY.Z) at the beginning of each entry in the preceding output. XX stands for the PCI bus number. A PCI domain can host up to 256 buses. In the laptop used previously, the CardBus bridge is connected to PCI bus 2. This bridge sources another PCI bus, numbered 3, that hosts the Xircom card.

YY is the PCI device number. Each bus can connect to a maximum of 32 PCI devices. Each device can, in turn, implement up to eight functions, represented by Z. The Xircom card can simultaneously perform two functions. Thus, 03:00.0 addresses the Ethernet function of the card, while 03:00.1 corresponds to its modem communication function. Issue lspci -t to elicit a tree-like layout of the PCI buses and devices on your system.

The hardware circuitry of each peripheral board answers queries pertaining to three address spaces: memory locations, I/O ports, and configuration registers. The first two address spaces are shared by all the devices on the same PCI bus (i.e., when you access a memory location, all the devices on that PCI bus see the bus cycle at the same time). The configuration space, on the other hand, exploits geographical addressing. Configuration queries address only one slot at a time, so they never collide.
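The device and function numbers of the XX:YY.Z tuple are packed by the kernel into a single devfn byte; the macros below reproduce PCI_DEVFN, PCI_SLOT, and PCI_FUNC from <linux/pci.h>:

```c
/* Encoding used by the kernel in <linux/pci.h>: the device (slot)
 * number occupies bits 7:3 of devfn, the function number bits 2:0. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)
```

For the Xircom modem function at 03:00.1, the devfn value is PCI_DEVFN(0, 1), i.e. 1; the bus number 3 is kept separately. The 5-bit slot and 3-bit function fields are exactly why a bus tops out at 32 devices of 8 functions each.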

PCI devices possess a 256-byte memory region that holds configuration registers. This space is the key to identifying the make and capabilities of PCI cards. Let's take a peek inside the configuration spaces of the CardBus controller and the Xircom dual-function card previously used. The Xircom card has two configuration spaces, one per supported function:

bash> lspci -x
...
03:00.0 Ethernet controller: Xircom Cardbus Ethernet 10/100 (rev 03)
00: 5d 11 03 00 07 00 10 02 03 00 00 02 00 40 80 00
10: 01 30 00 00 00 00 00 d2 00 08 00 d2 00 00 00 00
20: 00 00 00 00 00 00 00 00 07 01 00 00 5d 11 81 11
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 14 28
03:00.1 Serial controller: Xircom Cardbus Ethernet + 56k Modem (rev 03)
00: 5d 11 03 01 03 00 10 02 03 02 00 07 00 00 80 00
10: 81 30 00 00 00 10 00 d2 00 18 00 d2 00 00 00 00
20: 00 00 00 00 00 00 00 00 07 02 00 00 5d 11 81 11
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 00 00

The first two bytes contain the vendor ID, which identifies the company that manufactured the card. PCI vendor IDs are maintained and assigned globally.
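Because PCI registers are little-endian, the bytes 5d 11 at offset 0 of the dump decode to vendor ID 0x115d (Xircom). A user-space sketch of that decoding:

```c
#include <stdint.h>

/* Read a 16-bit little-endian value from a raw configuration-space dump:
 * the byte at the lower offset is the least significant one. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned int offset)
{
	return (uint16_t)(cfg[offset] | (cfg[offset + 1] << 8));
}
```

Applied to the first row of the Ethernet function above, cfg_read16 at offset 0 yields the vendor ID 0x115d and at offset 2 the device ID 0x0003.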
CONFIGURATION REGISTERS AND INITIALISATION

This section describes the configuration registers that PCI devices contain. All PCI devices feature at least a 256-byte address space. The first 64 bytes are standardized, while the rest are device dependent.

Some of the PCI configuration registers are required and some are optional. Every PCI device must contain meaningful values in the required registers, whereas the contents of the optional registers depend on the actual capabilities of the peripheral. The optional fields are not used unless the contents of the required fields indicate that they are valid.

The PCI registers are always little-endian. Usually, the technical documentation released with each device describes the supported registers. What we're interested in is how a driver can look for its device and how it can access the device's configuration space.

Three or five PCI registers identify a device: vendor ID, device ID, and class are the three that are always used. Every PCI manufacturer assigns proper values to these read-only registers, and the driver can use them to look for the device. Additionally, the fields subsystem vendor ID and subsystem device ID are sometimes set by the vendor to further differentiate similar devices.
vendorID

This 16-bit register identifies a hardware manufacturer.
deviceID

This is another 16-bit register, selected by the manufacturer; no official registration is required for the device ID. This ID is usually paired with the vendor ID to make a unique 32-bit identifier for a hardware device. We use the word signature to refer to the vendor and device ID pair. A device driver usually relies on the signature to identify its device.


class

Every peripheral device belongs to a class. The class register is a 16-bit value whose top 8 bits identify the base class (or group). For example, ethernet and token ring are two classes belonging to the network group, while the serial and parallel classes belong to the communication group.
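Splitting the class register into its base class and subclass is a shift and a mask; the value 0x0200 used below is the standard class for an Ethernet controller (base class 0x02, the network group):

```c
#include <stdint.h>

/* The top 8 bits of the 16-bit class register name the base class
 * (group); the low 8 bits name the class within that group. */
static uint8_t base_class(uint16_t class_reg)
{
	return (uint8_t)(class_reg >> 8);
}

static uint8_t sub_class(uint16_t class_reg)
{
	return (uint8_t)(class_reg & 0xff);
}
```

A driver matching on class rather than on a vendor/device signature would compare against these extracted fields, typically with a class_mask selecting only the bits it cares about.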


subsystem vendorID subsystem deviceID

These fields can be used for further identification of a device. Using these different identifiers, a PCI driver can tell the kernel what kind of devices it supports. The struct pci_device_id structure is used to define a list of the different types of PCI devices that a driver supports. This structure contains the following fields:


__u32 vendor;
__u32 device;

These specify the PCI vendor and device IDs of a device. If a driver can handle any vendor or device ID, the value PCI_ANY_ID should be used for these fields.

__u32 subvendor;
__u32 subdevice;

These specify the PCI subsystem vendor and subsystem device IDs of a device. If a driver can handle any type of subsystem ID, the value PCI_ANY_ID should be used for these fields.

__u32 class;
__u32 class_mask;

These two values allow the driver to specify that it supports a type of PCI class device. The different classes of PCI devices are described in the PCI specification. If a driver can handle any type of class, the value PCI_ANY_ID should be used for these fields.
kernel_ulong_t driver_data;

This value is not used to match a device but is used to hold information that the PCI driver can use to differentiate between different devices if it wants to.
PCI_DEVICE(vendor, device)

This creates a struct pci_device_id that matches only the specific vendor and device ID. The macro sets the subvendor and subdevice fields of the structure to PCI_ANY_ID.


PCI_DEVICE_CLASS(device_class, device_class_mask)

This creates a struct pci_device_id that matches a specific PCI class. We can define the type of device a driver can support in two ways:

1 type:
struct pci_device_id {
	__u32 vendor, device;       /* Vendor and Device IDs */
	__u32 subvendor, subdevice; /* Subvendor and Subdevice IDs */
	__u32 class, classmask;     /* Class and class mask */
	kernel_ulong_t driver_data; /* Private data */
};

2 type:
static struct pci_device_id i810_ids[ ] = {
	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82810_IG1) },
	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82810_IG3) },
	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82810E_IG) },
	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82815_CGC) },
	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82845G_IG) },
	{ 0, },
};

REGISTERING A PCI DRIVER

The main structure that all PCI drivers must create in order to be registered with the kernel properly is the struct pci_driver structure. This structure consists of a number of function callbacks and variables that describe the PCI driver to the PCI core.


const char *name;

The name of the driver. It must be unique among all PCI drivers in the kernel and is normally set to the same name as the module name of the driver.
const struct pci_device_id *id_table;

Pointer to the struct pci_device_id table.
int (*probe) (struct pci_dev *dev, const struct pci_device_id *id);

Pointer to the probe function in the PCI driver. This function is called by the PCI core when it has a struct pci_dev that it thinks this driver wants to control. A pointer to the struct pci_device_id that the PCI core used to make this decision is also passed to this function. If the PCI driver claims the struct pci_dev that is passed to it, it should initialize the device properly and return 0. If the driver does not want to claim the device, or an error occurs, it should return a negative error value.


void (*remove) (struct pci_dev *dev);

Pointer to the function that the PCI core calls when the struct pci_dev is being removed from the system, or when the PCI driver is being unloaded from the kernel.
int (*suspend) (struct pci_dev *dev, u32 state);

Pointer to the function that the PCI core calls when the struct pci_dev is being suspended. The suspend state is passed in the state variable.

int (*resume) (struct pci_dev *dev);

Pointer to the function that the PCI core calls when the struct pci_dev is being resumed. It is always called after suspend has been called. The array of IDs is used in the struct pci_driver, and it is also used to tell user space which devices this specific driver supports.

static struct pci_driver driver1 = {
	.name = "pci",
	.id_table = pci_table,
	.probe = probe1,
	.remove = remove1,
};
/* suspend() and resume() methods that implement
   power management are not used by this driver */

To register the struct pci_driver with the PCI core, a call to pci_register_driver is made with a pointer to the struct pci_driver.


static int init1(void)
{
	int i;

	i = pci_register_driver(&driver1);
	if (i == 0) {
		printk(KERN_NOTICE "PCI REGISTERED\n");
		return i;
	}
	return -1;
}

Note that the pci_register_driver function either returns a negative error number or 0 if everything was registered successfully. It does not return the number of devices that were bound to the driver or an error number. When the PCI driver is to be unloaded, the struct pci_driver needs to be unregistered from the kernel. This is done with a call to pci_unregister_driver. When this call happens, any PCI devices that were currently bound to this driver are removed, and the remove function for this PCI driver is called before the pci_unregister_driver function returns.


static void __exit pci_skel_exit(void)
{
	pci_unregister_driver(&pci_driver);
}

PROBING THE NETWORK FUNCTION

The probe() method for the network function does the following: it enables the PCI device, discovers resource information such as I/O base addresses and the IRQ, allocates and populates a networking data structure associated with this device, and registers itself with the kernel networking layer.

The PCI subsystem calls probe() with two arguments:

A pointer to pci_dev, the data structure that describes this PCI device. This structure, defined in include/linux/pci.h, is maintained by the PCI subsystem for each PCI device on your system.

A pointer to pci_device_id, the entry in the driver's pci_device_id table that matches the information found in the configuration space of the inserted card.

int probe1(struct pci_dev *pdev,const struct pci_device_id *id){ struct net_device *net_dev=NULL; printk("probe start\n"); if(pci_enable_device(pdev)!=0){ printk("ERROR0\n"); return -1; } pci_set_master(pdev); if(!pci_dma_supported(pdev,0xffffffff)) { printk("DMA NOT SUPPORTED\n"); return-1; } else pdev->dma_mask=0xffffffff; net_dev = alloc_etherdev(sizeof(struct prv_data)); adapter=(struct prv_data*)net_dev->priv; adapter=netdev_priv(net_dev); memset(adapter,0,sizeof(struct prv_data)); adapter->netdev=net_dev; adapter->pci_dev=pdev; pci_set_drvdata(pdev,net_dev); if(pci_request_regions(pdev,"pci")) { printk("ERROR\n"); return -1; } adapter->regaddr=ioremap(pci_resource_start(pdev,1),pci_resource_len(pdev,)); if(adapter->regaddr==NULL) { printk("ERROR1\n"); return -1; } else net_dev->base_addr=pci_resource_start(pdev,1); memcpy(net_dev->name,"myeth0",6); //setting interface name memcpy(net_dev->dev_addr,adapter->regaddr,6); //getting MAC addr net_dev->open=&open1; net_dev->stop=&close1;

        net_dev->hard_start_xmit = &transmit;
        net_dev->get_stats = pci_get_stats;
        net_dev->irq = pdev->irq;
        spin_lock_init(&adapter->lock);
        net_dev->mtu = 1000;
        SET_MODULE_OWNER(net_dev);
        if (register_netdev(net_dev)) {
                printk("ERROR2\n");
                return -1;
        }
        printk(KERN_NOTICE "NETDEV REGISTERED\n");
        printk("probe end\n");
        return 0;
}

In the kernel, the I/O regions of PCI devices have been integrated into the generic resource management. For this reason, you don't need to access the configuration variables in order to know where your device is mapped in memory or I/O space. The preferred interface for getting region information consists of the following functions:


unsigned long pci_resource_start(struct pci_dev *dev, int bar);

The function returns the first address (memory address or I/O port number) associated with one of the six PCI I/O regions.
unsigned long pci_resource_end(struct pci_dev *dev, int bar);

The function returns the last address that is part of the I/O region number bar.
unsigned long pci_resource_flags(struct pci_dev *dev, int bar);

This function returns the flags associated with this resource. All resource flags are defined in <linux/ioport.h>; the most important are:
IORESOURCE_IO IORESOURCE_MEM

If the associated I/O region exists, one and only one of these flags is set.
IORESOURCE_PREFETCH IORESOURCE_READONLY

These flags tell whether a memory region is prefetchable and/or write-protected. The latter flag is never set for PCI resources.
PCI INTERRUPTS

Interrupts are easy to handle in Linux. By the time Linux boots, the computer's firmware has already assigned a unique interrupt number to the device, and the driver just needs to use it. The interrupt number is stored in configuration register 60 (PCI_INTERRUPT_LINE), which is one byte wide. If the device doesn't support interrupts, register 61 (PCI_INTERRUPT_PIN) is 0; otherwise, it's nonzero.


result = pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &myirq); if (result) { /* deal with error */ }

A PCI connector has four interrupt pins, and peripheral boards can use any or all of them. Each pin is individually routed to the motherboard's interrupt controller, so interrupts can be shared without any electrical problems. The interrupt controller is then responsible for mapping the interrupt wires (pins) to the processor's hardware; this platform-dependent operation is left to the controller in order to achieve platform independence in the bus itself. The read-only configuration register located at PCI_INTERRUPT_PIN is used to tell the computer which single pin is actually used.


HARDWARE ABSTRACTION

The relevant structure for configuration register access includes only two fields; it is defined in <linux/pci.h> and used by drivers/pci/pci.c, where the actual public functions are defined:
struct pci_ops {
        int (*read)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val);
        int (*write)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);
};

CHAPTER 7:

TIME, DELAYS AND DEFERRED WORK

MEASURING TIME LAPSES

The kernel keeps track of the flow of time by means of timer interrupts. Timer interrupts are generated by the system's timing hardware at regular intervals; this interval is programmed at boot time by the kernel according to the value of HZ. This frequency, the number of timer ticks per second, is contained in the kernel variable HZ. Every time a timer interrupt occurs, the value of an internal kernel counter is incremented. The counter is initialized to 0 at system boot, so it represents the number of clock ticks since the last boot. The counter is a 64-bit variable (even on 32-bit architectures) and is called jiffies_64.


USING THE JIFFIES COUNTER

The counter and the utility functions to read it live in <linux/sched.h>, which automatically pulls jiffies.h in. Both jiffies and jiffies_64 must be considered read-only. Whenever your code needs to remember the current value of jiffies, it can simply access the unsigned long variable, which is declared as volatile to tell the compiler not to optimize memory reads. You need to read the current counter whenever your code needs to calculate a future time stamp. To compare your cached value against the current value, you should use one of the following macros:


#include <linux/jiffies.h> int time_after(unsigned long a, unsigned long b);

Evaluates true when a, as a snapshot of jiffies, represents a time after b (a > b).
int time_before(unsigned long a, unsigned long b);

Evaluates true when time a is before time b (a < b).
int time_after_eq(unsigned long a, unsigned long b); int time_before_eq(unsigned long a, unsigned long b);

The last two compare for "after or equal" and "before or equal." To compute the difference between two instances of jiffies in a safe way, you can use the same trick:
diff = (long)t2 - (long)t1;

User-space programs tend to represent time values with struct timeval and struct timespec. The two structures represent a precise time quantity with two numbers: seconds and microseconds are used in the older and popular struct timeval, and seconds and nanoseconds are used in the newer struct timespec.


#include <linux/time.h>
unsigned long timespec_to_jiffies(struct timespec *value);
void jiffies_to_timespec(unsigned long jiffies, struct timespec *value);

unsigned long timeval_to_jiffies(struct timeval *value);
void jiffies_to_timeval(unsigned long jiffies, struct timeval *value);

struct timeval uses seconds and microseconds; struct timespec uses seconds and nanoseconds.

PROCESSOR-SPECIFIC REGISTERS

If you need to measure very short time intervals or you need extremely high precision in your figures, you can resort to platform-dependent resources. CPU manufacturers introduced a way to count clock cycles as an easy and reliable way to measure time lapses. Most modern processors therefore include a counter register that is steadily incremented once at each clock cycle. This clock counter is the only reliable way to carry out high-resolution timekeeping tasks. After including <asm/msr.h> (an x86-specific header whose name stands for "machine-specific registers"), you can use one of these macros:


rdtsc(low32,high32); rdtscl(low32); rdtscll(var64);

Some of the other platforms offer similar functionality, and kernel headers offer an architecture-independent function that you can use instead of rdtsc. It is called get_cycles, defined in <asm/timex.h> (included by <linux/timex.h>). Its prototype is:


#include <linux/timex.h>
cycles_t get_cycles(void);

KNOWING THE CURRENT TIME

Kernel code can always retrieve a representation of the current time by looking at the value of jiffies. Drivers can use the current value of jiffies to calculate time intervals across events (for example, to tell double-clicks from single clicks in input device drivers, or to calculate timeouts). In short, looking at jiffies is almost always sufficient when you need to measure time intervals. There is a kernel function that turns a wall-clock time into a jiffies value, however:


#include <linux/time.h> unsigned long mktime (unsigned int year, unsigned int mon, unsigned int day, unsigned int hour, unsigned int min, unsigned int sec);

Sometimes you need to deal with absolute timestamps even in kernel space. To this aim, <linux/time.h> exports the do_gettimeofday function. When called, it fills a struct timeval pointer (the same structure used in the gettimeofday system call) with the familiar seconds and microseconds values. The prototype for do_gettimeofday is:

#include <linux/time.h> void do_gettimeofday(struct timeval *tv);

The source states that do_gettimeofday has "near microsecond resolution," because it asks the timing hardware what fraction of the current jiffy has already elapsed.

The jit module creates a file called /proc/currentime, which returns the following items in ASCII when read:

The current jiffies and jiffies_64 values as hex numbers
The current time as returned by do_gettimeofday
The timespec returned by current_kernel_time

DELAYING EXECUTION

Device drivers often need to delay the execution of a particular piece of code for a period of time, usually to allow the hardware to accomplish some task. We use the phrase "long delay" to refer to a multiple-jiffy delay, which can be as low as a few milliseconds on some platforms, but is still long as seen by the CPU and the kernel.
LONG DELAYS

Occasionally a driver needs to delay execution for relatively long periods, more than one clock tick.
Busy Waiting

If you want to delay execution by a multiple of the clock tick, allowing some slack in the value, the easiest (though not recommended) implementation is a loop that monitors the jiffy counter. The busy-waiting implementation usually looks like the following code, where j1 is the value of jiffies at the expiration of the delay:


while (time_before(jiffies, j1)) cpu_relax( );

The call to cpu_relax invokes an architecture-specific way of saying that you're not doing much with the processor at the moment. This busy loop severely degrades system performance. If you didn't configure your kernel for preemptive operation, the loop completely locks the processor for the duration of the delay; the scheduler never preempts a process that is running in kernel space, and the computer looks completely dead until time j1 is reached.

Yielding The Processor

Busy waiting imposes a heavy load on the system as a whole; we would like to find a better technique. The first change that comes to mind is to explicitly release the CPU when we're not interested in it. This is accomplished by calling the schedule function, declared in <linux/sched.h>:


while (time_before(jiffies, j1)) {
        schedule( );
}

Timeouts

The suboptimal delay loops shown up to now work by watching the jiffy counter without telling anyone. But the best way to implement a delay, as you may imagine, is usually to ask the kernel to do it for you. There are two ways of setting up jiffy-based timeouts, depending on whether your driver is waiting for other events or not. If your driver uses a wait queue to wait for some other event, but you also want to be sure that it runs within a certain period of time, it can use wait_event_timeout or wait_event_interruptible_timeout:


#include <linux/wait.h>
long wait_event_timeout(wait_queue_head_t q, condition, long timeout);
long wait_event_interruptible_timeout(wait_queue_head_t q, condition, long timeout);

These functions sleep on the given wait queue, but they return after the timeout expires. Thus, they implement a bounded sleep that does not go on forever. Note that the timeout value represents the number of jiffies to wait, not an absolute time value. The functions complain through a printk statement if the provided timeout is negative. If the timeout expires, the functions return 0; if the process is awakened by another event, they return the remaining delay expressed in jiffies. The /proc/jitqueue file shows a delay based on wait_event_interruptible_timeout, although the module has no event to wait for, and uses 0 as a condition:


wait_queue_head_t wait;
init_waitqueue_head(&wait);
wait_event_interruptible_timeout(wait, 0, delay);

wait_event_timeout and wait_event_interruptible_timeout were designed with a hardware driver in mind, where execution could be resumed in either of two ways: either somebody calls wake_up on the wait queue, or the timeout expires. To accommodate the situation where you want to delay execution waiting for no specific event, the kernel offers the schedule_timeout function, so you can avoid declaring and using a superfluous wait queue head:

#include <linux/sched.h> signed long schedule_timeout(signed long timeout);

Here, timeout is the number of jiffies to delay. The return value is 0 unless the function returns before the given timeout has elapsed (in response to a signal). schedule_timeout requires that the caller first set the current process state, so a typical call looks like:


set_current_state(TASK_INTERRUPTIBLE); schedule_timeout (delay);

The previous lines (from /proc/jitschedto) cause the process to sleep until the given time has passed. wait_event_interruptible_timeout relies on schedule_timeout as well.

SHORT DELAYS

When a device driver needs to deal with latencies in its hardware, the delays involved are usually a few dozen microseconds at most. In this case we can use short delays. The kernel functions ndelay, udelay, and mdelay serve well for short delays, delaying execution for the specified number of nanoseconds, microseconds, or milliseconds respectively. Their prototypes are:
#include <linux/delay.h>
void ndelay(unsigned long nsecs);
void udelay(unsigned long usecs);
void mdelay(unsigned long msecs);

To avoid integer overflows in loop calculations, udelay and ndelay impose an upper bound on the value passed to them. If your module fails to load and displays an unresolved symbol, __bad_udelay, it means you called udelay with too large an argument. There is another way of achieving millisecond (and longer) delays that does not involve busy waiting. The file <linux/delay.h> declares these functions:


void msleep(unsigned int millisecs);
unsigned long msleep_interruptible(unsigned int millisecs);
void ssleep(unsigned int seconds);

The first two functions put the calling process to sleep for the given number of millisecs. A call to msleep is uninterruptible; you can be sure that the process sleeps for at least the given number of milliseconds. If your driver is sitting on a wait queue and you want a wakeup to break the sleep, use msleep_interruptible. The return value from msleep_interruptible is normally 0; if, however, the process is awakened early, the return value is the number of milliseconds remaining in the originally requested sleep period. A call to ssleep puts the process into an uninterruptible sleep for the given number of seconds. If you can tolerate longer delays than requested, you should use schedule_timeout, msleep, or ssleep.

KERNEL TIMERS

Whenever you need to schedule an action to happen later, without blocking the current process until that time arrives, kernel timers are the tool for you. A kernel timer is used to schedule execution of a function at a particular time in the future, based on the clock tick, and can be used for a variety of tasks; for example, polling a device by checking its state at regular intervals when the hardware can't fire interrupts. A kernel timer is a data structure that instructs the kernel to execute a user-defined function with a user-defined argument at a user-defined time. The implementation resides in <linux/timer.h> and kernel/timer.c. One important feature of kernel timers is that a task can reregister itself to run again at a later time. This is possible because each timer_list structure is unlinked from the list of active timers before being run and can, therefore, be immediately relinked elsewhere. This is sometimes useful, for example when implementing the polling of devices.
The Timer API

The kernel provides drivers with a number of functions to declare, register, and remove kernel timers.
#include <linux/timer.h>
struct timer_list {
        /* ... */
        unsigned long expires;
        void (*function)(unsigned long);
        unsigned long data;
};
void init_timer(struct timer_list *timer);
struct timer_list TIMER_INITIALIZER(_function, _expires, _data);
void add_timer(struct timer_list *timer);
int del_timer(struct timer_list *timer);

The expires field represents the jiffies value when the timer is expected to run; at that time, the function function is called with data as an argument. If you need to pass multiple items in the argument, you can bundle them as a single data structure and pass a pointer cast to unsigned long. The structure must be initialized before use. Initialization can be performed by calling init_timer or assigning TIMER_INITIALIZER to a static structure, according to your needs. After initialization, you can change the three public fields before calling add_timer. To disable a registered timer before it expires, call del_timer.

int mod_timer(struct timer_list *timer, unsigned long expires);

Updates the expiration time of a timer, a common task for which a timeout timer is used. mod_timer can be called on inactive timers as well, where you normally use add_timer.


int del_timer_sync(struct timer_list *timer);

Works like del_timer, but also guarantees that when it returns, the timer function is not running on any CPU. del_timer_sync is used to avoid race conditions on SMP systems and is the same as del_timer in UP kernels.


int timer_pending(const struct timer_list * timer);

Returns true or false to indicate whether the timer is currently scheduled to run, by reading one of the opaque fields of the structure.

TASKLETS

Another kernel facility related to timing issues is the tasklet mechanism. It is mostly used in interrupt management. By scheduling a tasklet, you simply ask for it to be executed at a later time chosen by the kernel. This behavior is especially useful with interrupt handlers, where the hardware interrupt must be managed as quickly as possible, but most of the data management can be safely delayed to a later time. A tasklet exists as a data structure that must be initialized before use. Initialization can be performed by calling a specific function or by declaring the structure using certain macros:
#include <linux/interrupt.h>
struct tasklet_struct {
        /* ... */
        void (*func)(unsigned long);
        unsigned long data;
};
void tasklet_init(struct tasklet_struct *t,
                  void (*func)(unsigned long), unsigned long data);
DECLARE_TASKLET(name, func, data);
DECLARE_TASKLET_DISABLED(name, func, data);

Tasklets offer a number of interesting features:

A tasklet can be disabled and re-enabled later; it won't be executed until it is enabled as many times as it has been disabled.
Just like timers, a tasklet can reregister itself.
A tasklet can be scheduled to execute at normal priority or high priority. The latter group is always executed first.
Tasklets may be run immediately if the system is not under heavy load, but never later than the next timer tick.
A tasklet can be concurrent with other tasklets but is strictly serialized with respect to itself: the same tasklet never runs simultaneously on more than one processor. Also, as already noted, a tasklet always runs on the same CPU that schedules it.

The following list lays out in detail the kernel interface to tasklets after the tasklet structure has been initialized:
void tasklet_disable(struct tasklet_struct *t);

This function disables the given tasklet. Its execution is deferred until the tasklet has been enabled again. After calling tasklet_disable, you can be sure that the tasklet is not running anywhere in the system.


void tasklet_disable_nosync(struct tasklet_struct *t);

Disables the tasklet, but without waiting for any currently running function to exit. When it returns, the tasklet is disabled and won't be scheduled in the future until re-enabled, but it may still be running on another CPU when the function returns.


void tasklet_enable(struct tasklet_struct *t);

Enables a tasklet that had been previously disabled. If the tasklet has already been scheduled, it will run soon. A call to tasklet_enable must match each call to tasklet_disable, as the kernel keeps track of the disable count for each tasklet.


void tasklet_schedule(struct tasklet_struct *t);

Schedules the tasklet for execution. If a tasklet is scheduled again before it has a chance to run, it runs only once. However, if it is scheduled while it runs, it runs again after it completes.
void tasklet_hi_schedule(struct tasklet_struct *t);

Schedules the tasklet for execution with higher priority.
void tasklet_kill(struct tasklet_struct *t);

This function ensures that the tasklet is not scheduled to run again; it is usually called when a device is being closed or the module removed. If the tasklet is scheduled to run, the function waits until it has executed.

WORKQUEUES

Workqueues are, superficially, similar to tasklets; they allow kernel code to request that a function be called at some future time. There are, however, some significant differences between the two, including:

Tasklets run in software interrupt context, with the result that all tasklet code must be atomic. Instead, workqueue functions run in the context of a special kernel process; as a result, they have more flexibility. In particular, workqueue functions can sleep.
Tasklets always run on the processor from which they were originally submitted. Workqueues work in the same way, by default.
Kernel code can request that the execution of workqueue functions be delayed for an explicit interval.

The key difference between the two is that tasklets execute quickly, for a short period of time, and in atomic mode, while workqueue functions may have higher latency but need not be atomic. Each mechanism has situations where it is appropriate.

Workqueues have a type of struct workqueue_struct, which is defined in <linux/workqueue.h>. A workqueue must be explicitly created before use, using one of the following two functions:
struct workqueue_struct *create_workqueue(const char *name);
struct workqueue_struct *create_singlethread_workqueue(const char *name);

Each workqueue has one or more dedicated processes (kernel threads), which run functions submitted to the queue. If you use create_workqueue, you get a workqueue that has a dedicated thread for each processor on the system. In many cases, all those threads are simply overkill; if a single worker thread will suffice, create the workqueue with create_singlethread_workqueue instead. To submit a task to a workqueue, you need to fill in a work_struct structure. This can be done at compile time as follows:


DECLARE_WORK(name, void (*function)(void *), void *data);

Where name is the name of the structure to be declared, function is the function that is to be called from the workqueue, and data is a value to pass to that function. If you need to set up the work_struct structure at runtime, use the following two macros:


INIT_WORK(struct work_struct *work, void (*function)(void *), void *data);
PREPARE_WORK(struct work_struct *work, void (*function)(void *), void *data);

There are two functions for submitting work to a workqueue:

int queue_work(struct workqueue_struct *queue, struct work_struct *work);
int queue_delayed_work(struct workqueue_struct *queue,
                struct work_struct *work, unsigned long delay);

To cancel a pending workqueue entry, you may call:
int cancel_delayed_work(struct work_struct *work);

CHAPTER 8:

ALLOCATING MEMORY
THE REAL STORY OF KMALLOC

The kmalloc allocation engine is a powerful tool and easily learned because of its similarity to malloc.

The function is fast (unless it blocks) and doesn't clear the memory it obtains; the allocated region still holds its previous content. The allocated region is also contiguous in physical memory.
The Flags Argument

The prototype for kmalloc is:

#include <linux/slab.h>
void *kmalloc(size_t size, int flags);

The first argument to kmalloc is the size of the block to be allocated. The second argument, the allocation flags, is much more interesting, because it controls the behavior of kmalloc in a number of ways. The most commonly used flag, GFP_KERNEL, means that the allocation is performed on behalf of a process running in kernel space. GFP_KERNEL means that kmalloc can put the current process to sleep waiting for a page when called in low-memory situations. In case the current process should not be put to sleep, the driver should use the GFP_ATOMIC flag instead. The kernel normally tries to keep some free pages around in order to fulfill atomic allocations. When GFP_ATOMIC is used, kmalloc can use even the last free page. If that last page does not exist, however, the allocation fails. All the flags are defined in <linux/gfp.h>, and individual flags are prefixed with a double underscore, such as __GFP_DMA. Other macros are as follows:
GFP_ATOMIC

Used to allocate memory from interrupt handlers and other code outside of a process context. Never sleeps.
GFP_KERNEL

Normal allocation of kernel memory. May sleep.

GFP_USER

Used to allocate memory for user-space pages; it may sleep.

GFP_HIGHUSER

Like GFP_USER, but allocates from high memory.

GFP_NOIO GFP_NOFS

These flags function like GFP_KERNEL, but they add restrictions on what the kernel can do to satisfy the request. A GFP_NOFS allocation is not allowed to perform any filesystem calls, while GFP_NOIO disallows the initiation of any I/O at all. They are used primarily in the filesystem and virtual memory code, where an allocation may be allowed to sleep, but recursive filesystem calls would be a bad idea.


__GFP_DMA

This flag requests allocation to happen in the DMA-capable memory zone.
__GFP_HIGHMEM

This flag indicates that the allocated memory may be located in high memory.
__GFP_COLD

Normally, the memory allocator tries to return "cache warm" pages, pages that are likely to be found in the processor cache. Instead, this flag requests a "cold" page, which has not been used in some time. It is useful for allocating pages for DMA reads, where presence in the processor cache is not useful.


__GFP_NOWARN

This rarely used flag prevents the kernel from issuing warnings when an allocation cannot be satisfied.
__GFP_HIGH

This flag marks a high-priority request, which is allowed to consume even the last pages of memory set aside by the kernel for emergencies.
__GFP_NOFAIL __GFP_NORETRY

These flags modify how the allocator behaves when it has difficulty satisfying an allocation.
__GFP_REPEAT

Means "try a little harder" by repeating the attempt, but the allocation can still fail.
__GFP_NOFAIL

Tells the allocator never to fail; it works as hard as needed to satisfy the request. __GFP_NOFAIL is very strongly discouraged; there will probably never be a valid reason to use it in a device driver.


__GFP_NORETRY

Tells the allocator to give up immediately if the requested memory is not available.

Memory Zones

Both __GFP_DMA and __GFP_HIGHMEM have a platform-dependent role, although their use is valid for all platforms. The Linux kernel knows about a minimum of three memory zones: DMA-capable memory, normal memory, and high memory. While allocation normally happens in the normal zone, setting either of the bits just mentioned requires memory to be allocated from a different zone. This leads us to the following kernel memory zones:
1. ZONE_DMA (<16 MB), the zone used for Direct Memory Access (DMA). Because legacy ISA devices have 24 address lines and can access only the first 16 MB, the kernel tries to dedicate this area for such devices.
2. ZONE_NORMAL (16 MB to 896 MB), the normally addressable region, also called low memory. The "virtual" field in struct page for low-memory pages contains the corresponding logical addresses.
3. ZONE_HIGHMEM (>896 MB), the space that the kernel can access only after mapping resident pages to regions in ZONE_NORMAL (using kmap() and kunmap()). The corresponding kernel addresses are virtual and not logical. The "virtual" field in struct page for high-memory pages points to NULL if the page is not kmapped.

The Size Argument

kmalloc looks rather different from a typical user-space malloc implementation. A simple, heap-oriented allocation technique would quickly run into trouble. Allocation requests are handled by going to a pool that holds sufficiently large objects and handing an entire memory chunk back to the requester. If you ask for an arbitrary amount of memory, you're likely to get slightly more than you asked for, up to twice as much. Also, programmers should remember that the smallest allocation that kmalloc can handle is as big as 32 or 64 bytes, depending on the page size used by the system's architecture. Because memory returned by kmalloc() retains the contents from its previous incarnation, there could be a security risk if it's exposed to user space. To get zeroed kmalloced memory, use kzalloc().

Lookaside Caches

A device driver often ends up allocating many objects of the same size, over and over. For such cases, the kernel implements a facility to create pools of objects that are all the same size; this sort of pool is often called a lookaside cache. The USB and SCSI drivers in Linux 2.6 use such caches. The relevant functions and types are declared in <linux/slab.h>. The slab allocator implements caches that have a type of kmem_cache_t; they are created with a call to kmem_cache_create:
kmem_cache_t *kmem_cache_create(const char *name, size_t size, size_t offset,
                unsigned long flags,
                void (*constructor)(void *, kmem_cache_t *, unsigned long flags),
                void (*destructor)(void *, kmem_cache_t *, unsigned long flags));

The function creates a new cache object that can host any number of memory areas, all of the same size, specified by the size argument. The name argument is associated with this cache and functions as housekeeping information usable in tracking problems; usually, it is set to the name of the type of structure that is cached. The cache keeps a pointer to the name, rather than copying it, so the driver should pass in a pointer to a name in static storage (usually the name is just a literal string). The name cannot contain blanks. The offset is the offset of the first object in the page; it can be used to ensure a particular alignment for the allocated objects, but you most likely will use 0 to request the default value. flags controls how allocation is done and is a bit mask of the following flags:


SLAB_NO_REAP

Setting this flag protects the cache from being reduced when the system is looking for memory. Setting this flag is normally a bad idea; it is important to avoid restricting the memory allocator's freedom of action unnecessarily.


SLAB_HWCACHE_ALIGN

This flag requires each data object to be aligned to a cache line; actual alignment depends on the cache layout of the host platform. This option can be a good choice if your cache contains items that are frequently accessed on SMP machines. The padding required to achieve cache line alignment can end up wasting significant amounts of memory, however.


SLAB_CACHE_DMA

This flag requires each data object to be allocated in the DMA memory zone.

The constructor and destructor arguments to the function are optional functions; the former can be used to initialize newly allocated objects, and the latter can be used to clean up objects prior to their memory being released back to the system as a whole. A constructor is called when the memory for a set of objects is allocated; because that memory may hold several objects, the constructor may be called multiple times. Constructors and destructors may or may not be allowed to sleep, according to whether they are passed the SLAB_CTOR_ATOMIC flag (where CTOR is short for constructor). Once a cache of objects is created, you can allocate objects from it by calling kmem_cache_alloc:
void *kmem_cache_alloc(kmem_cache_t *cache, int flags);

Here, the cache argument is the cache you have created previously; the flags are the same as you would pass to kmalloc and are consulted if kmem_cache_alloc needs to go out and allocate more memory itself. To free an object, use kmem_cache_free:


void kmem_cache_free(kmem_cache_t *cache, const void *obj);

When driver code is finished with the cache, typically when the module is unloaded, it should free its cache as follows:
int kmem_cache_destroy(kmem_cache_t *cache);

A scull Based on the Slab Caches: scullc

scullc uses memory caches. The size of the quantum can be modified at compile time and at load time, but not at runtime.


/* declare one cache pointer: use it for all devices */
kmem_cache_t *scullc_cache;

/* scullc_init: create a cache for our quanta */
scullc_cache = kmem_cache_create("scullc", scullc_quantum, 0,
                SLAB_HWCACHE_ALIGN, NULL, NULL); /* no ctor/dtor */
if (!scullc_cache) {
        scullc_cleanup( );
        return -ENOMEM;
}

/* Allocate a quantum using the memory cache */
if (!dptr->data[s_pos]) {
        dptr->data[s_pos] = kmem_cache_alloc(scullc_cache, GFP_KERNEL);
        if (!dptr->data[s_pos])
                goto nomem;
        memset(dptr->data[s_pos], 0, scullc_quantum);
}

/* And these lines release memory: */
for (i = 0; i < qset; i++)
        if (dptr->data[i])
                kmem_cache_free(scullc_cache, dptr->data[i]);

/* Finally, at module unload time, we have to return the cache to the system: */
/* scullc_cleanup: release the cache of our quanta */
if (scullc_cache)
        kmem_cache_destroy(scullc_cache);

The main differences in passing from scull to scullc are a slight speed improvement and better memory use. Since quanta are allocated from a pool of memory fragments of exactly the right size, their placement in memory is as dense as possible, as opposed to scull quanta, which bring in unpredictable memory fragmentation.

MEMORY POOLS

There are places in the kernel where memory allocations cannot be allowed to fail. As a way of guaranteeing allocations in those situations, the kernel developers created an abstraction known as a memory pool (or "mempool"). A memory pool is really just a form of a lookaside cache that tries to always keep a list of free memory around for use in emergencies. A memory pool has a type of mempool_t (defined in <linux/mempool.h>); you can create one with mempool_create:


mempool_t *mempool_create(int min_nr, mempool_alloc_t *alloc_fn, mempool_free_t *free_fn, void *pool_data);

The min_nr argument is the minimum number of allocated objects that the pool should always keep around. The actual allocation and freeing of objects is handled by alloc_fn and free_fn, which have these prototypes:


typedef void *(mempool_alloc_t)(int gfp_mask, void *pool_data); typedef void (mempool_free_t)(void *element, void *pool_data);

The final parameter to mempool_create (pool_data) is passed to alloc_fn and free_fn.
/* code that sets up memory pools */
cache = kmem_cache_create(. . .);
pool = mempool_create(MY_POOL_MINIMUM, mempool_alloc_slab,
                mempool_free_slab, cache);

/* Objects can be allocated and freed with: */
void *mempool_alloc(mempool_t *pool, int gfp_mask);
void mempool_free(void *element, mempool_t *pool);

/* A mempool can be resized with: */
int mempool_resize(mempool_t *pool, int new_min_nr, int gfp_mask);

This call, if successful, resizes the pool to have at least new_min_nr objects. If you no longer need a memory pool, return it to the system with:
void mempool_destroy(mempool_t *pool);

get_free_page and Friends

If a module needs to allocate big chunks of memory, it is usually better to use a page-oriented technique.


get_zeroed_page(unsigned int flags);

Returns a pointer to a new page and fills the page with zeros.
__get_free_page(unsigned int flags);

Similar to get_zeroed_page, but doesn't clear the page.

__get_free_pages(unsigned int flags, unsigned int order);

Allocates and returns a pointer to the first byte of a memory area that is potentially several (physically contiguous) pages long, but doesn't zero the area. For the flags argument, either GFP_KERNEL or GFP_ATOMIC is used, with the possible addition of the __GFP_DMA flag (for memory that can be used for ISA direct-memory-access operations) or __GFP_HIGHMEM when high memory can be used. order is the base-two logarithm of the number of pages you are requesting or freeing (i.e., log2N). For example, order is 0 if you want one page. If order is too big (no contiguous area of that size is available), the page allocation fails. When a program is done with the pages, it can free them with one of the following functions. The first function is a macro that falls back on the second:
void free_page(unsigned long addr); void free_pages(unsigned long addr, unsigned long order);

If you try to free a different number of pages from what you allocated, the memory map becomes corrupted, and the system gets into trouble at a later time. The functions can fail to allocate memory in certain circumstances, particularly when GFP_ATOMIC is used. Although kmalloc(GFP_KERNEL) sometimes fails when there is no available memory, the kernel does its best to fulfill allocation requests.

VMALLOC AND FRIENDS

vmalloc allocates a contiguous memory region in the virtual address space. Although the pages are not consecutive in physical memory (each page is retrieved with a separate call to alloc_page), the kernel sees them as a contiguous range of addresses. vmalloc returns 0 (the NULL address) if an error occurs; otherwise, it returns a pointer to a linear memory area of size at least size. Use of vmalloc is discouraged in most situations. Memory obtained from vmalloc is slightly less efficient to work with, and, on some architectures, the amount of address space set aside for vmalloc is relatively small. If you need to allocate large memory buffers, and you don't require the memory to be physically contiguous, use vmalloc() rather than kmalloc():
void *vmalloc(unsigned long count);

Here count is the requested allocation size. The function returns kernel virtual addresses. vmalloc() enjoys bigger allocation-size limits than kmalloc() but is slower and can't be called from interrupt context. Moreover, you cannot use the physically discontiguous memory returned by vmalloc() to perform Direct Memory Access (DMA). High-performance network drivers commonly use vmalloc() to allocate large descriptor rings when the device is opened.

#include <linux/vmalloc.h>
void *vmalloc(unsigned long size);
void vfree(void * addr);
void *ioremap(unsigned long offset, unsigned long size);
void iounmap(void * addr);

vmalloc has more overhead than __get_free_pages, because it must both retrieve the memory and build the page tables. Therefore, it doesn't make sense to call vmalloc to allocate just one page.
One part of the kernel that uses vmalloc is the create_module system call, which uses vmalloc to get space for the module being created. Code and data of the module are later copied to the allocated space using copy_from_user. In this way, the module appears to be loaded into contiguous memory.
Like vmalloc, ioremap builds new page tables; unlike vmalloc, however, it doesn't actually allocate any memory. The virtual address obtained is eventually released by calling iounmap.
ioremap is most useful for mapping the (physical) address of a PCI buffer to (virtual) kernel space. For example, it can be used to access the frame buffer of a PCI video device; such buffers are usually mapped at high physical addresses, outside of the address range for which the kernel builds page tables at boot time.
For the sake of portability, you should not directly access addresses returned by ioremap as if they were pointers to memory. Rather, you should always use readb and the other I/O functions.
One minor drawback of vmalloc is that it can't be used in atomic context because, internally, it uses kmalloc(GFP_KERNEL) to acquire storage for the page tables, and therefore could sleep.

OBTAINING LARGE BUFFERS

Allocations of large, contiguous memory buffers are prone to failure. System memory fragments over time, and chances are that a truly large region of memory will simply not be available. The best way of performing large I/O operations is through scatter/gather operations.
Acquiring a Dedicated Buffer at Boot Time

If you really need a huge buffer of physically contiguous memory, the best approach is often to allocate it by requesting memory at boot time. Allocation at boot time is the only way to retrieve consecutive memory pages while bypassing the limits imposed by __get_free_pages on the buffer size, both in terms of maximum allowed size and limited choice of sizes. Allocating memory at boot time is a dirty technique, because it bypasses all memory management policies by reserving a private memory pool. This technique is inelegant and inflexible, but it is also the least prone to failure. A module can't allocate memory at boot time; only drivers directly linked to the kernel can do that. A device driver using this kind of allocation can be installed or replaced only by rebuilding the kernel and rebooting the computer.
When the kernel is booted, it gains access to all the physical memory available in the system. It then initializes each of its subsystems by calling that subsystem's initialization function, allowing initialization code to allocate a memory buffer for private use by reducing the amount of RAM left for normal system operation.
Boot-time memory allocation is performed by calling one of these functions:
#include <linux/bootmem.h>
void *alloc_bootmem(unsigned long size);
void *alloc_bootmem_low(unsigned long size);
void *alloc_bootmem_pages(unsigned long size);
void *alloc_bootmem_low_pages(unsigned long size);

It is rare to free memory allocated at boot time; you will almost certainly be unable to get it back later if you want it. There is an interface to free this memory, however:
void free_bootmem(unsigned long addr, unsigned long size);

CHAPTER 10:

INTERRUPT HANDLING
There must be a way for a device to let the processor know when something has happened. That way, of course, is interrupts. An interrupt is simply a signal that the hardware can send when it wants the processor's attention. Linux handles interrupts in much the same way that it handles signals in user space.

INSTALLING AN INTERRUPT HANDLER

If you want to actually see interrupts being generated, writing to the hardware device isn't enough; a software handler must be configured in the system. If the Linux kernel hasn't been told to expect your interrupt, it simply acknowledges and ignores it. The kernel keeps a registry of interrupt lines, similar to the registry of I/O ports. A module is expected to request an interrupt channel (or IRQ, for interrupt request) before using it and to release it when finished. In many situations, modules are also expected to be able to share interrupt lines with other drivers. The following functions, declared in <linux/interrupt.h>, implement the interrupt registration interface:
int request_irq(unsigned int irq,
                irqreturn_t (*handler)(int, void *, struct pt_regs *),
                unsigned long flags, const char *dev_name, void *dev_id);
void free_irq(unsigned int irq, void *dev_id);

The value returned from request_irq to the requesting function is either 0 to indicate success or a negative error code, as usual. It's not uncommon for the function to return -EBUSY to signal that another driver is already using the requested interrupt line. The arguments to the functions are as follows:


unsigned int irq

The interrupt number being requested.
irqreturn_t (*handler)(int, void *, struct pt_regs *)

The pointer to the handling function being installed.
unsigned long flags

A bit mask of options related to interrupt management.
const char *dev_name

The string passed to request_irq is used in /proc/interrupts to show the owner of the interrupt.
void *dev_id

Pointer used for shared interrupt lines. It is a unique identifier that is used when the interrupt line is freed and that may also be used by the driver to point to its own private data area (to identify which device is interrupting). If the interrupt is not shared, dev_id can be set to NULL, but it is a good idea anyway to use this item to point to the device structure. The bits that can be set in flags are as follows:


SA_INTERRUPT

This indicates a fast interrupt handler. Fast handlers are executed with interrupts disabled on the current processor.
SA_SHIRQ

This bit signals that the interrupt can be shared between devices.
SA_SAMPLE_RANDOM

This bit indicates that the generated interrupts can contribute to the entropy pool used by /dev/random and /dev/urandom. These devices return truly random numbers when read and are designed to help application software choose secure keys for encryption. If your device generates interrupts at truly random times, you should set this flag.
The interrupt handler can be installed either at driver initialization or when the device is first opened. Although installing the interrupt handler from within the module's initialization function might sound like a good idea, it often isn't, especially if your device does not share interrupts. Because the number of interrupt lines is limited, you don't want to waste them. If a module requests an IRQ at initialization, it prevents any other driver from using the interrupt, even if the device holding it is never used. Requesting the interrupt at device open, on the other hand, allows some sharing of resources.
The correct place to call request_irq is when the device is first opened, before the hardware is instructed to generate interrupts. The place to call free_irq is the last time the device is closed, after the hardware is told not to interrupt the processor anymore. The disadvantage of this technique is that you need to keep a per-device open count so that you know when interrupts can be disabled.
The interrupt requested by the following code is short_irq. short_base is the base I/O address of the parallel interface being used; register 2 of the interface is written to enable interrupt reporting.
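The per-device open count described above can be sketched in user space. This is a hedged model, not kernel code: dev_open_, dev_close_, and irq_requested_ are hypothetical names, and the comments mark where the real request_irq()/free_irq() calls would go.

```c
#include <assert.h>

/* Sketch of the open-count technique: request the IRQ on the first
 * open of the device, free it on the last close. */
static int open_count_;
static int irq_requested_;

static void dev_open_(void)
{
    if (open_count_++ == 0)
        irq_requested_ = 1;   /* real driver: call request_irq() here */
}

static void dev_close_(void)
{
    if (--open_count_ == 0)
        irq_requested_ = 0;   /* real driver: call free_irq() here */
}
```

With this scheme the interrupt line is held only while at least one process has the device open.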


if (short_irq >= 0) {
    result = request_irq(short_irq, short_interrupt,
                         SA_INTERRUPT, "short", NULL);
    if (result) {
        printk(KERN_INFO "short: can't get assigned irq %i\n", short_irq);
        short_irq = -1;
    } else { /* actually enable it -- assume this *is* a parallel port */
        outb(0x10, short_base + 2);
    }
}

The code shows that the handler being installed is a fast handler (SA_INTERRUPT), doesn't support interrupt sharing (SA_SHIRQ is missing), and doesn't contribute to system entropy (SA_SAMPLE_RANDOM is missing, too). The outb call then enables interrupt reporting for the parallel port.
The i386 and x86_64 architectures define a function for querying the availability of an interrupt line:


int can_request_irq(unsigned int irq, unsigned long flags);

This function returns a nonzero value if an attempt to allocate the given interrupt succeeds.

The /proc Interface


Whenever a hardware interrupt reaches the processor, an internal counter is incremented. Reported interrupts are shown in /proc/interrupts.

root@montalcino:/bike/corbet/write/ldd3/src/short# m /proc/interrupts
           CPU0       CPU1
  0:    4848108         34    IO-APIC-edge   timer
  2:          0          0         XT-PIC    cascade
  8:          3          1    IO-APIC-edge   rtc
 10:       4335          1    IO-APIC-level  aic7xxx
 11:       8903          0    IO-APIC-level  uhci_hcd
 12:         49          1    IO-APIC-edge   i8042
NMI:          0          0
LOC:    4848187    4848186
ERR:          0
MIS:          0

The first column is the IRQ number. The /proc/interrupts display shows how many interrupts have been delivered to each CPU on the system. The last two columns give information on the programmable interrupt controller that handles the interrupt, and the name(s) of the device(s) that have registered handlers for the interrupt (as specified in the dev_name argument to request_irq).

AUTODETECTING THE IRQ NUMBER


One of the most challenging problems for a driver at initialization time can be how to determine which IRQ line is going to be used by the device. The driver needs the information in order to correctly install the handler. Even though a programmer could require the user to specify the interrupt number at load time, this is a bad practice, because most of the time the user doesn't know the number, either because he didn't configure the jumpers or because the device is jumperless. Sometimes autodetection depends on the knowledge that some devices feature a default behavior that rarely, if ever, changes. In this case, the driver might assume that the default values apply.

if (short_irq < 0) /* not yet specified: force the default on */
    switch (short_base) {
        case 0x378: short_irq = 7; break;
        case 0x278: short_irq = 2; break;
        case 0x3bc: short_irq = 5; break;
    }

The code assigns the interrupt number according to the chosen base I/O address, while allowing the user to override the default at load time with something like:

insmod ./short.ko irq=x

Some devices are more advanced in design and simply announce which interrupt they're going to use. In this case, the driver retrieves the interrupt number by reading a status byte from one of the device's I/O ports or PCI configuration space. When the target device is one that has the ability to tell the driver which interrupt it is going to use, autodetecting the IRQ number just means probing the device, with no additional work required to probe the interrupt.
Unfortunately, not every device is programmer friendly, and autodetection might require some probing. The technique is quite simple: the driver tells the device to generate interrupts and watches what happens. If everything goes well, only one interrupt line is activated.
We look at two ways to perform the task: calling kernel-defined helper functions and implementing our own version.
Kernel-assisted probing

The Linux kernel offers a low-level facility for probing the interrupt number. It works only for nonshared interrupts. The facility consists of two functions, declared in <linux/interrupt.h>.


unsigned long probe_irq_on(void);

This function returns a bit mask of unassigned interrupts. The driver must preserve the returned bit mask, and pass it to probe_irq_off later. After this call, the driver should arrange for its device to generate at least one interrupt.
int probe_irq_off(unsigned long);

After the device has requested an interrupt, the driver calls this function, passing as its argument the bit mask previously returned by probe_irq_on. probe_irq_off returns the number of the interrupt that was issued after probe_irq_on. If no interrupts occurred, 0 is returned. If more than one interrupt occurred (ambiguous detection), probe_irq_off returns a negative value.
The programmer should be careful to enable interrupts on the device after the call to probe_irq_on and to disable them before calling probe_irq_off. Additionally, you must remember to service the pending interrupt in your device after probe_irq_off.


int count = 0;
do {
    unsigned long mask;

    mask = probe_irq_on( );
    outb_p(0x10, short_base + 2);  /* enable reporting */
    outb_p(0x00, short_base);      /* clear the bit */
    outb_p(0xFF, short_base);      /* set the bit: interrupt! */
    outb_p(0x00, short_base + 2);  /* disable reporting */
    udelay(5);                     /* give it some time */
    short_irq = probe_irq_off(mask);

    if (short_irq == 0) { /* none of them? */
        printk(KERN_INFO "short: no irq reported by probe\n");
        short_irq = -1;
    }
    /*
     * if more than one line has been activated, the result is
     * negative. We should service the interrupt (no need for lpt
     * port) and loop over again. Loop at most five times, then give up
     */
} while (short_irq < 0 && count++ < 5);
if (short_irq < 0)
    printk("short: probe failed %i times, giving up\n", count);

Note the use of udelay before calling probe_irq_off. Depending on the speed of your processor, you may have to wait for a brief period to give the interrupt time to actually be delivered.
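The return convention of probe_irq_off described above (0 for no interrupt, the IRQ number for exactly one, negative for several) can be modeled in user space. This is a sketch of the documented semantics only, assuming a 32-bit mask of lines; it is not the kernel implementation.

```c
#include <assert.h>

/* Model of probe_irq_off()'s return convention, given the bit mask
 * of interrupt lines that fired during the probe window. */
static int probe_irq_off_model(unsigned long fired)
{
    int irq = -1, i;

    for (i = 0; i < 32; i++) {
        if (!(fired & (1UL << i)))
            continue;
        if (irq >= 0)
            return -irq;  /* ambiguous: more than one line fired */
        irq = i;
    }
    return irq < 0 ? 0 : irq;
}
```

This is why the probing loop in the listing retries while short_irq is negative: a negative result means several lines fired and the probe must be repeated.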

Do It Yourself Probing
Probing can also be implemented in the driver itself without too much trouble. It is a rare driver that must implement its own probing. To that end, the short module performs do-it-yourself detection of the IRQ line if it is loaded with probe=2. The mechanism is the same as the one described earlier: enable all unused interrupts, then wait and see what happens. Often a device can be configured to use one IRQ number from a set of three or four; probing just those IRQs enables us to detect the right one, without having to test for all possible IRQs. The short implementation assumes that 3, 5, 7, and 9 are the only possible IRQ values.
The following code probes by testing all possible interrupts and looking at what happens. The trials array lists the IRQs to try and has 0 as the end marker; the tried array is used to keep track of which handlers have actually been registered by this driver.

int trials[ ] = {3, 5, 7, 9, 0};
int tried[ ]  = {0, 0, 0, 0, 0};
int i, count = 0;

/*
 * install the probing handler for all possible lines. Remember the
 * result (0 for success, or -EBUSY) in order to only free what has
 * been acquired
 */
for (i = 0; trials[i]; i++)
    tried[i] = request_irq(trials[i], short_probing,
                           SA_INTERRUPT, "short probe", NULL);

do {
    short_irq = 0; /* none got, yet */
    outb_p(0x10, short_base + 2); /* enable */
    outb_p(0x00, short_base);
    outb_p(0xFF, short_base);     /* toggle the bit */
    outb_p(0x00, short_base + 2); /* disable */
    udelay(5);                    /* give it some time */

    /* the value has been set by the handler */
    if (short_irq == 0) { /* none of them? */
        printk(KERN_INFO "short: no irq reported by probe\n");
    }
    /*
     * If more than one line has been activated, the result is
     * negative. We should service the interrupt (but the lpt port
     * doesn't need it) and loop over again. Do it at most 5 times
     */
} while (short_irq <= 0 && count++ < 5);

/* end of loop, uninstall the handler */
for (i = 0; trials[i]; i++)
    if (tried[i] == 0)
        free_irq(trials[i], NULL);

if (short_irq < 0)
    printk("short: probe failed %i times, giving up\n", count);

You might not know in advance what the possible IRQ values are. In that case, you need to probe all the free interrupts, instead of limiting yourself to a few trials[]. To probe for all interrupts, you have to probe from IRQ 0 to IRQ NR_IRQS-1, where NR_IRQS is defined in <asm/irq.h> and is platform dependent.
The handler's role is to update short_irq according to which interrupts are actually received. A 0 value in short_irq means "nothing yet," while a negative value means "ambiguous."


irqreturn_t short_probing(int irq, void *dev_id, struct pt_regs *regs)
{
    if (short_irq == 0)
        short_irq = irq;  /* found */
    if (short_irq != irq)
        short_irq = -irq; /* ambiguous */
    return IRQ_HANDLED;
}

FAST AND SLOW HANDLERS

Fast interrupts were those that could be handled very quickly, whereas handling slow interrupts took significantly longer. Slow interrupts could be sufficiently demanding of the processor that it was worthwhile to reenable interrupts while they were being handled.
Fast interrupts (those that were requested with the SA_INTERRUPT flag) are executed with all other interrupts disabled on the current processor. Note that other processors can still handle interrupts, although you will never see two processors handling the same IRQ at the same time.
The first thing do_IRQ does is to acknowledge the interrupt so that the interrupt controller can go on to other things. It then obtains a spinlock for the given IRQ number, thus preventing any other CPU from handling this IRQ. It clears a couple of status bits (including one called IRQ_WAITING) and then looks up the handler(s) for this particular IRQ. If there is no handler, there's nothing to do; the spinlock is released, any pending software interrupts are handled, and do_IRQ returns.
If a device is interrupting, there is at least one handler registered for its IRQ as well. The function handle_IRQ_event is called to actually invoke the handlers. If the handler is of the slow variety (SA_INTERRUPT is not set), interrupts are reenabled in the hardware, and the handler is invoked.

IMPLEMENTING A HANDLER

A handler can't transfer data to or from user space, because it doesn't execute in the context of a process. Handlers also cannot do anything that would sleep, such as calling wait_event, allocating memory with anything other than GFP_ATOMIC, or locking a semaphore. Finally, handlers cannot call schedule.
The role of an interrupt handler is to give feedback to its device about interrupt reception and to read or write data according to the meaning of the interrupt being serviced. The first step usually consists of clearing a bit on the interface board; most hardware devices won't generate other interrupts until their interrupt-pending bit has been cleared.
A typical task for an interrupt handler is awakening processes sleeping on the device if the interrupt signals the event they're waiting for, such as the arrival of new data.

irqreturn_t short_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    struct timeval tv;
    int written;

    do_gettimeofday(&tv);

    /* Write a 16 byte record. Assume PAGE_SIZE is a multiple of 16 */
    written = sprintf((char *)short_head, "%08u.%06u\n",
                      (int)(tv.tv_sec % 100000000), (int)(tv.tv_usec));
    BUG_ON(written != 16);
    short_incr_bp(&short_head, written);
    wake_up_interruptible(&short_queue); /* awake any reading process */
    return IRQ_HANDLED;
}
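The handler writes records into a page-sized circular buffer via short_incr_bp, and the read side consumes from the tail. The wrap arithmetic can be sketched in user space; this is a simplified model with head and tail as offsets into a PAGE_SZ_-byte buffer (PAGE_SZ_ and readable_bytes_ are stand-in names, not kernel API).

```c
#include <assert.h>

/* Model of the readable-byte computation for a circular buffer: a
 * negative head-minus-tail difference means the data wraps past the
 * end of the buffer, so only the tail-to-end run is returned; the
 * wrapped portion is picked up by the next read. */
#define PAGE_SZ_ 4096

static int readable_bytes_(int head, int tail)
{
    int count0 = head - tail;

    if (count0 < 0)                /* wrapped */
        count0 = PAGE_SZ_ - tail;  /* read only up to the buffer end */
    return count0;
}
```

Returning a partial count on wrap keeps the copy to user space a single contiguous range, which is why the read method below never copies across the buffer boundary in one call.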

The following code implements read and write for /dev/shortint:
ssize_t short_i_read(struct file *filp, char __user *buf, size_t count,
                     loff_t *f_pos)
{
    int count0;
    DEFINE_WAIT(wait);

    while (short_head == short_tail) {
        prepare_to_wait(&short_queue, &wait, TASK_INTERRUPTIBLE);
        if (short_head == short_tail)
            schedule( );
        finish_wait(&short_queue, &wait);
        if (signal_pending(current)) /* a signal arrived */
            return -ERESTARTSYS;     /* tell the fs layer to handle it */
    }
    /* count0 is the number of readable data bytes */
    count0 = short_head - short_tail;
    if (count0 < 0) /* wrapped */
        count0 = short_buffer + PAGE_SIZE - short_tail;
    if (count0 < count)
        count = count0;

    if (copy_to_user(buf, (char *)short_tail, count))
        return -EFAULT;
    short_incr_bp(&short_tail, count);
    return count;
}

ssize_t short_i_write(struct file *filp, const char __user *buf,
                      size_t count, loff_t *f_pos)
{
    int written = 0, odd = *f_pos & 1;
    unsigned long port = short_base; /* output to the parallel data latch */
    void *address = (void *)short_base;

    if (use_mem) {
        while (written < count)
            iowrite8(0xff * ((++written + odd) & 1), address);
    } else {
        while (written < count)
            outb(0xff * ((++written + odd) & 1), port);
    }
    *f_pos += count;
    return written;
}

ENABLING AND DISABLING INTERRUPTS

Disabling a single interrupt

Sometimes (but rarely!) a driver needs to disable interrupt delivery for a specific interrupt line. The kernel offers three functions for this purpose, all declared in <asm/irq.h>. Note that you cannot disable shared interrupt lines, and, on modern systems, shared interrupts are the norm.


void disable_irq(int irq);
void disable_irq_nosync(int irq);
void enable_irq(int irq);

Calls to these functions can be nested: if disable_irq is called twice in succession, two enable_irq calls are required before the IRQ is truly reenabled. If the thread calling disable_irq holds any resources (such as spinlocks) that the interrupt handler needs, the system can deadlock. disable_irq_nosync differs from disable_irq in that it returns immediately.

SOFTIRQS AND TASKLETS

Interrupt handlers have two conflicting requirements: They are responsible for the bulk of device-data processing, but they have to exit as fast as possible. To bail out of this situation, interrupt handlers are designed in two parts: a hurried and harried top half that interacts with the hardware, and a relaxed bottom half that does most of the processing with all interrupts enabled. Unlike interrupts, bottom halves are synchronous, because the kernel decides when to execute them. The following mechanisms are available in the kernel to defer work to a bottom half: softirqs, tasklets, and work queues. A primary difference between a softirq and a tasklet is that the former is reentrant whereas the latter isn't. Different instances of a softirq can run simultaneously on different processors, but that is not the case with tasklets.
tasklet_init() dynamically initializes a tasklet. The function does not allocate memory for a tasklet_struct; rather, you have to pass the address of an allocated one. tasklet_schedule() announces that the corresponding tasklet is pending execution. As for interrupts, the kernel offers a bunch of functions to control the execution state of tasklets on systems having multiple processors:
tasklet_enable() enables tasklets.

tasklet_disable() disables tasklets and waits until any currently executing tasklet instance has exited.
tasklet_disable_nosync() has semantics similar to disable_irq_nosync(). The function does not wait for active instances of the tasklet to finish execution.
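The enable/disable pairing above follows the same nesting rule as disable_irq()/enable_irq(): the kernel keeps a per-tasklet counter, and the tasklet is runnable only when that counter is zero. The sketch below models just this counting behavior in user space; struct tasklet_model and the helper names are simplified stand-ins, not the kernel structures.

```c
#include <assert.h>

/* Model of the per-tasklet disable count: every disable must be
 * matched by an enable before the tasklet can run again. */
struct tasklet_model {
    int count; /* 0 means enabled */
};

static void tasklet_disable_model(struct tasklet_model *t)
{
    t->count++;
}

static void tasklet_enable_model(struct tasklet_model *t)
{
    if (t->count > 0)
        t->count--;
}

static int tasklet_runnable_model(const struct tasklet_model *t)
{
    return t->count == 0;
}
```

Because disables nest, a driver can disable a tasklet from several code paths without coordinating; the tasklet only becomes runnable again once every path has re-enabled it.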

CHAPTER 13:

USB DRIVERS
USB was originally created to replace a wide range of slow and different buses (the parallel, serial, and keyboard connections) with a single bus type that all devices could connect to. The USB host controller is in charge of asking every USB device if it has any data to send. Because of this topology, a USB device can never start sending data without first being asked to by the host controller. This configuration allows for a very easy plug-and-play type of system, whereby devices can be automatically configured by the host computer. The Linux kernel supports two main types of USB drivers: drivers on a host system and drivers on a device.

USB DEVICE BASICS

The Linux kernel provides a subsystem called the USB core to handle most of the complexity.

Endpoints

A USB endpoint can carry data in only one direction, either from the host computer to the device (called an OUT endpoint) or from the device to the host computer (called an IN endpoint). Endpoints can be thought of as unidirectional pipes. A USB endpoint can be one of four different types:
CONTROL

Control endpoints are used to allow access to different parts of the USB device. They are commonly used for configuring the device, retrieving information about the device, sending commands to the device, or retrieving status reports about the device. These transfers are guaranteed by the USB protocol to always have enough reserved bandwidth to make it through to the device.


INTERRUPT

Interrupt endpoints transfer small amounts of data at a fixed rate every time the USB host asks the device for data. These transfers are guaranteed by the USB protocol to always have enough reserved bandwidth to make it through.


BULK

Bulk endpoints transfer large amounts of data. These endpoints are usually much larger (they can hold more characters at once) than interrupt endpoints. These transfers are not guaranteed by the USB protocol to always make it through in a specific amount of time. If there is not enough room on the bus to send the whole BULK packet, it is split up across multiple transfers to or from the device.


ISOCHRONOUS

Isochronous endpoints also transfer large amounts of data, but the data is not always guaranteed to make it through. These endpoints are used in devices that can handle loss of data, and rely more on keeping a constant stream of data flowing.

Control and bulk endpoints are used for asynchronous data transfers, whenever the driver decides to use them. Interrupt and isochronous endpoints are periodic. This means that these endpoints are set up to transfer data at fixed times continuously, which causes their bandwidth to be reserved by the USB core. USB endpoints are described in the kernel with the structure struct usb_host_endpoint. This structure contains the real endpoint information in another structure called struct usb_endpoint_descriptor.


bEndpointAddress

Also included in this 8-bit value is the direction of the endpoint. The bit masks USB_DIR_OUT and USB_DIR_IN can be placed against this field to determine if the data for this endpoint is directed to the device or to the host.


bmAttributes

The bit mask USB_ENDPOINT_XFERTYPE_MASK should be placed against this value in order to determine if the endpoint is of type USB_ENDPOINT_XFER_ISOC, USB_ENDPOINT_XFER_BULK, or of type USB_ENDPOINT_XFER_INT.
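The two descriptor tests just described can be demonstrated in user space. The constant values below are mirrored from <linux/usb/ch9.h> so the sketch is self-contained; in a real driver you would include that header rather than redefine them.

```c
#include <assert.h>

/* Values mirrored from <linux/usb/ch9.h> for a self-contained demo. */
#define USB_DIR_IN                 0x80
#define USB_ENDPOINT_XFERTYPE_MASK 0x03
#define USB_ENDPOINT_XFER_ISOC     1
#define USB_ENDPOINT_XFER_BULK     2
#define USB_ENDPOINT_XFER_INT      3

/* Direction test on bEndpointAddress: bit 7 set means an IN endpoint. */
static int ep_is_in_(unsigned char bEndpointAddress)
{
    return (bEndpointAddress & USB_DIR_IN) != 0;
}

/* Transfer-type test on bmAttributes: the low two bits carry the type. */
static int ep_xfer_type_(unsigned char bmAttributes)
{
    return bmAttributes & USB_ENDPOINT_XFERTYPE_MASK;
}
```

For example, an endpoint descriptor with bEndpointAddress 0x81 and bmAttributes 0x02 is a bulk IN endpoint.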


wMaxPacketSize

This is the maximum size in bytes that this endpoint can handle at once. Note that it is possible for a driver to send amounts of data to an endpoint that are bigger than this value, but the data will be divided up into wMaxPacketSize chunks when actually transmitted to the device.


bInterval

If this endpoint is of type interrupt, this value is the interval setting for the endpoint, that is, the time between interrupt requests for the endpoint.

Data exchange with a USB device can be one of four types:

Control transfers, used to carry configuration and control information
Bulk transfers that ferry large quantities of time-insensitive data
Interrupt transfers that exchange small quantities of time-sensitive data
Isochronous transfers for real-time data at predictable bit rates

USB URBS

The USB code in the Linux kernel communicates with all USB devices using a urb (USB request block), found in the include/linux/usb.h file. A urb is used to send or receive data to or from a specific USB endpoint on a specific USB device in an asynchronous manner. A USB device driver may allocate many urbs for a single endpoint or may reuse a single urb for many different endpoints, depending on the need of the driver. Every endpoint in a device can handle a queue of urbs, so that multiple urbs can be sent to the same endpoint before the queue is empty. The typical lifecycle of a urb is as follows:

Created by a USB device driver.
Assigned to a specific endpoint of a specific USB device.
Submitted to the USB core, by the USB device driver.
Submitted to the specific USB host controller driver for the specified device, by the USB core.
Processed by the USB host controller driver that makes a USB transfer to the device.
When the urb is completed, the USB host controller driver notifies the USB device driver.

Urbs can also be canceled any time by the driver that submitted the urb, or by the USB core if the device is removed from the system. Urbs are dynamically created and contain an internal reference count that enables them to be automatically freed when the last user of the urb releases it.
STRUCT URB

struct usb_device *dev

Pointer to the struct usb_device to which this urb is sent. This variable must be initialized by the USB driver before the urb can be sent to the USB core.
unsigned int pipe

Endpoint information for the specific struct usb_device that this urb is to be sent to. This variable must be initialized by the USB driver before the urb can be sent to the USB core. To set fields of this structure, the driver uses the following functions:


unsigned int usb_sndctrlpipe(struct usb_device *dev, unsigned int endpoint)

Specifies a control OUT endpoint for the specified USB device with the specified endpoint number.
unsigned int usb_rcvctrlpipe(struct usb_device *dev, unsigned int endpoint)

Specifies a control IN endpoint for the specified USB device with the specified endpoint number.
unsigned int usb_sndbulkpipe(struct usb_device *dev, unsigned int endpoint)

Specifies a bulk OUT endpoint for the specified USB device with the specified endpoint number.
unsigned int usb_rcvbulkpipe(struct usb_device *dev, unsigned int endpoint)

Specifies a bulk IN endpoint for the specified USB device with the specified endpoint number.
unsigned int usb_sndintpipe(struct usb_device *dev, unsigned int endpoint)

Specifies an interrupt OUT endpoint for the specified USB device with the specified endpoint number.
unsigned int usb_rcvintpipe(struct usb_device *dev, unsigned int endpoint)

Specifies an interrupt IN endpoint for the specified USB device with the specified endpoint number.
unsigned int usb_sndisocpipe(struct usb_device *dev, unsigned int endpoint)

Specifies an isochronous OUT endpoint for the specified USB device with the specified endpoint number.
unsigned int usb_rcvisocpipe(struct usb_device *dev, unsigned int endpoint)

Specifies an isochronous IN endpoint for the specified USB device with the specified endpoint number.
Each addressable unit in a USB device is called an endpoint. The address assigned to an endpoint is called an endpoint address. Each endpoint address has an associated data transfer type. If an endpoint is responsible for bulk data transfer, for example, it's called a bulk endpoint. Endpoint address 0 is used exclusively for device configuration. An endpoint can be associated with upstream or downstream data transfer. Data arriving upstream from a device is called an IN transfer, whereas data flowing downstream to a device is an OUT transfer. IN and OUT transfers own separate address spaces. So, you can have a bulk IN endpoint and a bulk OUT endpoint answering to the same address.
unsigned int transfer_flags

This variable can be set to a number of different bit values, depending on what the USB driver wants to happen to the urb.
URB_SHORT_NOT_OK

It specifies that any short read on an IN endpoint that might occur should be treated as an error by the USB core. This value is useful only for urbs that are to be read from the USB device, not for write urbs.


URB_ISO_ASAP

If the urb is isochronous, this bit can be set if the driver wants the urb to be scheduled as soon as the bandwidth utilization allows it to be, and to set the start_frame variable in the urb at that point.
URB_NO_TRANSFER_DMA_MAP

Should be set when the urb contains a DMA buffer to be transferred. The USB core uses the buffer pointed to by the transfer_dma variable and not the buffer pointed to by the transfer_buffer variable.


URB_ZERO_PACKET

If set, a bulk OUT urb finishes by sending a short packet containing no data when the data is aligned to an endpoint packet boundary.
URB_NO_INTERRUPT

If set, the hardware may not generate an interrupt when the urb is finished. The USB core functions use this in order to do DMA buffer transfers.
void *transfer_buffer

Pointer to the buffer to be used when sending data to the device (for an OUT urb) or when receiving data from the device (for an IN urb).
dma_addr_t transfer_dma

Buffer to be used to transfer data to the USB device using DMA.
int transfer_buffer_length

The length of the buffer pointed to by the transfer_buffer or the transfer_dma variable (as only one can be used for a urb). If this is 0, neither transfer buffer is used by the USB core. If the endpoint maximum size is smaller than the value specified in this variable, the transfer to the USB device is broken up into smaller chunks in order to properly transfer the data. This large transfer occurs in consecutive USB frames.
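The chunking just described can be quantified: a transfer longer than the endpoint's wMaxPacketSize is carried as the rounded-up quotient of the two. This is a user-space sketch of that arithmetic only (packets_needed_ is a hypothetical helper, not a kernel function).

```c
#include <assert.h>

/* Number of on-the-wire packets needed to carry transfer_buffer_length
 * bytes through an endpoint with the given wMaxPacketSize:
 * ceil(length / wMaxPacketSize), computed in integer arithmetic. */
static int packets_needed_(int transfer_buffer_length, int wMaxPacketSize)
{
    return (transfer_buffer_length + wMaxPacketSize - 1) / wMaxPacketSize;
}
```

So a 512-byte bulk transfer through a 64-byte endpoint is carried as eight packets, handled by the host controller without driver intervention.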


unsigned char *setup_packet

Pointer to the setup packet for a control urb. It is transferred before the data in the transfer buffer.
dma_addr_t setup_dma

DMA buffer for the setup packet for a control urb. It is transferred before the data in the normal transfer buffer.
usb_complete_t complete

Pointer to the completion handler function that is called by the USB core when the urb is completely transferred or when an error occurs to the urb.
int actual_length

When the urb is finished, this variable is set to the actual length of the data either sent by the urb (for OUT urbs) or received by the urb (for IN urbs).
int status

When the urb is finished, or being processed by the USB core, this variable is set to the current status of the urb. Valid values for this variable include:
0

The urb transfer was successful.
-ENOENT

The urb was stopped by a call to usb_kill_urb.
-EINPROGRESS

The urb is still being processed by the USB host controllers. If your driver ever sees this value, it is a bug in your driver.
-EPROTO

One of the following errors occurred with this urb:
A bitstuff error happened during the transfer.
No response packet was received in time by the hardware.


-EILSEQ

There was a CRC mismatch in the urb transfer.
-ECOMM

Data was received faster during the transfer than it could be written to system memory. This error value happens only for an IN urb.
-EOVERFLOW

A babble error happened to the urb. A babble error occurs when the endpoint receives more data than the endpoint's specified maximum packet size.
int start_frame

Sets or returns the initial frame number for isochronous transfers to use.

int interval

The interval at which the urb is polled. This is valid only for interrupt or isochronous urbs.
int number_of_packets

Valid only for isochronous urbs; specifies the number of isochronous transfer buffers to be handled by this urb.
int error_count

It specifies the number of isochronous transfers that reported any type of error.
struct usb_iso_packet_descriptor iso_frame_desc[0]

This variable is an array of the struct usb_iso_packet_descriptor structures that make up this urb.
unsigned int offset

The offset into the transfer buffer (starting at 0 for the first byte) where this packet's data is located.
unsigned int length

The length of the transfer buffer for this packet.
unsigned int actual_length

The length of the data received into the transfer buffer.
unsigned int status

The status of the individual isochronous transfer of this packet.

CREATING AND DESTROYING URBS

The struct urb structure must never be created statically in a driver or within another structure, because that would break the reference-counting scheme used by the USB core for urbs. It must be created with a call to the usb_alloc_urb function. This function has the prototype:


struct urb *usb_alloc_urb(int iso_packets, int mem_flags);

iso_packets is the number of isochronous packets this urb should contain. If you do not want to create an isochronous urb, this variable should be set to 0. mem_flags is the same type of flag that is passed to the kmalloc function call to allocate memory from the kernel. If the function is successful in allocating enough space for the urb, a pointer to the urb is returned to the caller. If the return value is NULL, some error occurred within the USB core, and the driver needs to clean up properly. After a urb has been created, it must be properly initialized before it can be used by the USB core.
In order to tell the USB core that the driver is finished with the urb, the driver must call the usb_free_urb function. This function has only one argument:


void usb_free_urb(struct urb *urb);

The argument is a pointer to the struct urb you want to release. After this function is called, the urb structure is gone, and the driver cannot access it anymore.
Interrupt urbs

The function usb_fill_int_urb is a helper function to properly initialize a urb to be sent to an interrupt endpoint of a USB device:


void usb_fill_int_urb(struct urb *urb, struct usb_device *dev,
                      unsigned int pipe, void *transfer_buffer,
                      int buffer_length, usb_complete_t complete,
                      void *context, int interval);

struct urb *urb

A pointer to the urb to be initialized.
struct usb_device *dev

The USB device to which this urb is to be sent.
unsigned int pipe

The specific endpoint of the USB device to which this urb is to be sent. This value is created with the previously mentioned usb_sndintpipe or usb_rcvintpipe functions.
void *transfer_buffer

A pointer to the buffer from which outgoing data is taken or into which incoming data is received. Note that this cannot be a static buffer and must be created with a call to kmalloc.
int buffer_length

The length of the buffer pointed to by the transfer_buffer pointer.
usb_complete_t complete

Pointer to the completion handler that is called when this urb is completed.
void *context

Pointer to the blob that is added to the urb structure for later retrieval by the completion handler function.

int interval

The interval at which this urb should be scheduled.
Bulk urbs

Bulk urbs are initialized much like interrupt urbs. The function that does this is usb_fill_bulk_urb, and it looks like:


void usb_fill_bulk_urb(struct urb *urb, struct usb_device *dev, unsigned int pipe, void *transfer_buffer, int buffer_length, usb_complete_t complete, void *context);

There is no interval parameter because bulk urbs have no interval value. The unsigned int pipe variable must be initialized with a call to the usb_sndbulkpipe or usb_rcvbulkpipe function. The usb_fill_bulk_urb function does not set the transfer_flags variable in the urb, so any modification to this field has to be done by the driver itself.


Control URBs

Control urbs are initialized with a call to the function usb_fill_control_urb:
void usb_fill_control_urb(struct urb *urb, struct usb_device *dev, unsigned int pipe, unsigned char *setup_packet, void *transfer_buffer, int buffer_length, usb_complete_t complete, void *context);

The unsigned char *setup_packet argument must point to the setup packet data that is to be sent to the endpoint. The unsigned int pipe variable must be initialized with a call to the usb_sndctrlpipe or usb_rcvctrlpipe function. The usb_fill_control_urb function does not set the transfer_flags variable in the urb, so any modification to this field has to be done by the driver itself.


Isochronous URBs

Isochronous urbs, unfortunately, do not have an initializer function like the interrupt, control, and bulk urbs do. So they must be initialized by hand in the driver before they can be submitted to the USB core.


urb->dev = dev;
urb->context = uvd;
urb->pipe = usb_rcvisocpipe(dev, uvd->video_endp - 1);
urb->interval = 1;
urb->transfer_flags = URB_ISO_ASAP;
urb->transfer_buffer = cam->sts_buf[i];
urb->complete = konicawc_isoc_irq;
urb->number_of_packets = FRAMES_PER_DESC;
urb->transfer_buffer_length = FRAMES_PER_DESC;
for (j = 0; j < FRAMES_PER_DESC; j++) {
    urb->iso_frame_desc[j].offset = j;
    urb->iso_frame_desc[j].length = 1;
}

SUBMITTING URBS

Once the urb has been properly created and initialized by the USB driver, it is ready to be submitted to the USB core to be sent out to the USB device. This is done with a call to the function usb_submit_urb:


int usb_submit_urb(struct urb *urb, int mem_flags);

The urb parameter is a pointer to the urb that is to be sent to the device. The mem_flags parameter is equivalent to the same parameter that is passed to the kmalloc call and is used to tell the USB core how to allocate any memory buffers at this moment in time. There are really only three valid values that should be used, depending on when usb_submit_urb is being called:


GFP_ATOMIC

This value should be used whenever the following are true:

The caller is within a urb completion handler, an interrupt, a bottom half, a tasklet, or a timer callback.

The caller is holding a spinlock or rwlock. If a semaphore is being held, this value is not necessary.

The current->state is not TASK_RUNNING. The state is always TASK_RUNNING unless the driver has changed the current state itself.
GFP_NOIO

This value should be used if the driver is in the block I/O path. It should also be used in the error handling path of all storage-type devices.
GFP_KERNEL

This should be used for all other situations that do not fall into one of the previously mentioned categories.

CANCELLING URBS

To stop a urb that has been submitted to the USB core, the function usb_kill_urb or usb_unlink_urb should be called:
int usb_kill_urb(struct urb *urb);
int usb_unlink_urb(struct urb *urb);

The urb parameter for both of these functions is a pointer to the urb that is to be canceled.

When the function is usb_kill_urb, the urb lifecycle is stopped. This function is usually used when the device is disconnected from the system, in the disconnect callback.

For some drivers, the usb_unlink_urb function should be used to tell the USB core to stop a urb. This function does not wait for the urb to be fully stopped before returning to the caller.

WRITING USB DRIVERS

The struct usb_device_id structure provides a list of the different types of USB devices that this driver supports. The struct usb_device_id structure is defined with the following fields:
__u16 match_flags

Determines which of the following fields in the structure the device should be matched against.
__u16 idVendor

The USB vendor ID for the device. This number is assigned by the USB forum to its members and cannot be made up by anyone else.
__u16 idProduct

The USB product ID for the device. All vendors that have a vendor ID assigned to them can manage their product IDs however they choose to.
__u16 bcdDevice_lo __u16 bcdDevice_hi

Define the low and high ends of the range of the vendor-assigned product version number. The bcdDevice_hi value is inclusive; its value is the number of the highest-numbered device. These variables, combined with the idVendor and idProduct, are used to define a specific version of a device.


__u8 bDeviceClass __u8 bDeviceSubClass __u8 bDeviceProtocol

Define the class, subclass, and protocol of the device, respectively. These numbers are defined in the USB specification. These values specify the behavior for the whole device, including all interfaces on this device.

__u8 bInterfaceClass __u8 bInterfaceSubClass __u8 bInterfaceProtocol

Much like the device-specific values, these define the class, subclass, and protocol of the individual interface, respectively. These numbers are defined in the USB specification.

As with PCI devices, there are a number of macros that are used to initialize this structure:


USB_DEVICE(vendor, product)

Creates a struct usb_device_id that can be used to match only the specified vendor and product ID values.
USB_DEVICE_INFO(class, subclass, protocol)

Creates a struct usb_device_id that can be used to match a specific class of USB devices.
USB_INTERFACE_INFO(class, subclass, protocol)

Creates a struct usb_device_id that can be used to match a specific class of USB interfaces.

REGISTERING A USB DRIVER

The main structure that all USB drivers must create is a struct usb_driver. This structure contains the following fields:


struct module *owner

Pointer to the module owner of this driver. This variable should be set to the THIS_MODULE macro.
const char *name

Pointer to the name of the driver. It must be unique among all USB drivers in the kernel and is normally set to the same name as the module name of the driver.
const struct usb_device_id *id_table

Pointer to the struct usb_device_id table that contains a list of all of the different kinds of USB devices this driver can accept.
int (*probe) (struct usb_interface *intf, const struct usb_device_id *id)

Pointer to the probe function in the USB driver. This function is called by the USB core when it thinks it has a struct usb_interface that this driver can handle. A pointer to the struct usb_device_id that the USB core used to make this decision is also passed to this function.
void (*disconnect) (struct usb_interface *intf)

Pointer to the disconnect function in the USB driver. This function is called by the USB core when the struct usb_interface has been removed from the system or when the driver is being unloaded from the USB core.


static struct usb_driver skel_driver = {
    .owner = THIS_MODULE,
    .name = "skeleton",
    .id_table = skel_table,
    .probe = skel_probe,
    .disconnect = skel_disconnect,
};

int (*ioctl) (struct usb_interface *intf, unsigned int code, void *buf)

Pointer to an ioctl function in the USB driver. In practice, only the USB hub driver uses this ioctl.
int (*suspend) (struct usb_interface *intf, u32 state)

Pointer to a suspend function in the USB driver. It is called when the device is to be suspended by the USB core.
int (*resume) (struct usb_interface *intf)

Pointer to a resume function in the USB driver. It is called when the device is being resumed by the USB core.

To register the struct usb_driver with the USB core, a call to usb_register_driver is made with a pointer to the struct usb_driver.


static int __init usb_skel_init(void)
{
    int result;

    /* register this driver with the USB subsystem */
    result = usb_register(&skel_driver);
    if (result)
        err("usb_register failed. Error number %d", result);
    return result;
}

When the USB driver is to be unloaded, the struct usb_driver needs to be unregistered from the kernel. This is done with a call to usb_deregister_driver.


static void __exit usb_skel_exit(void)
{
    /* deregister this driver with the USB subsystem */
    usb_deregister(&skel_driver);
}

PROBE AND DISCONNECT IN DETAIL

The probe function is called when a device is installed that the USB core thinks this driver should handle; the probe function should perform checks on the information passed to it about the device and decide whether the driver is really appropriate for that device. The disconnect function is called when the driver should no longer control the device for some reason and can do cleanup.

In the probe function callback, the USB driver should initialize any local structures that it might use to manage the USB device. It should also save any information that it needs about the device to the local structure.

If the USB driver is not associated with another type of subsystem that handles the user interaction with the device, the driver can use the USB major number in order to use the traditional char driver interface with user space. To do this, the USB driver must call the usb_register_dev function in the probe function when it wants to register a device with the USB core.
/* we can register the device now, as it is ready */
retval = usb_register_dev(interface, &skel_class);
if (retval) {
    /* something prevented us from registering this driver */
    err("Not able to get a minor for this device.");
    usb_set_intfdata(interface, NULL);
    goto error;
}

The usb_register_dev function requires a pointer to a struct usb_interface and a pointer to a struct usb_class_driver. This struct usb_class_driver is used to define a number of different parameters that the USB driver wants the USB core to know when registering for a minor number. This structure consists of the following variables:


char *name

The name that sysfs uses to describe the device. If the number of the device needs to be in the name, the characters %d should be in the name string (usb/foo%d).

struct file_operations *fops;

Pointer to the struct file_operations that this driver has defined to use to register as the character device.
mode_t mode;

The mode for the devfs file to be created for this driver; unused otherwise. A typical setting for this variable would be the value S_IRUSR combined with the value S_IWUSR, which would provide only read and write access by the owner of the device file.


int minor_base;

This is the start of the assigned minor range for this driver. Only 16 devices are allowed to be associated with this driver at any one time unless the CONFIG_USB_DYNAMIC_MINORS configuration option has been enabled for the kernel. If so, this variable is ignored, and all minor numbers for the device are allocated on a first-come, first-served basis.

In the disconnect function, it is also important to retrieve from the interface any data that was previously set with a call to usb_set_intfdata. Then set the data pointer in the struct usb_interface structure to NULL to prevent any further mistakes in accessing the data improperly:
static void skel_disconnect(struct usb_interface *interface)
{
    struct usb_skel *dev;
    int minor = interface->minor;

    /* prevent skel_open() from racing skel_disconnect() */
    lock_kernel();

    /* save this structure pointer between probe & open invocation threads */
    dev = usb_get_intfdata(interface);
    usb_set_intfdata(interface, NULL);

    /* give back our minor */
    usb_deregister_dev(interface, &skel_class);

    unlock_kernel();

    /* decrement our usage count */
    kref_put(&dev->kref, skel_delete);

    info("USB Skeleton #%d now disconnected", minor);
}

SUBMITTING AND CONTROLLING A URB

When the driver has data to send to the USB device, a urb must be allocated for transmitting the data to the device:
urb = usb_alloc_urb(0, GFP_KERNEL);
if (!urb) {
    retval = -ENOMEM;
    goto error;
}

After the urb is allocated successfully, a DMA buffer should also be created to send the data to the device in the most efficient manner, and the data that is passed to the driver should be copied into that buffer:


buf = usb_buffer_alloc(dev->udev, count, GFP_KERNEL, &urb->transfer_dma);
if (!buf) {
    retval = -ENOMEM;
    goto error;
}
if (copy_from_user(buf, user_buffer, count)) {
    retval = -EFAULT;
    goto error;
}

Once the data is properly copied from user space into the local buffer, the urb must be initialized correctly before it can be submitted to the USB core:
/* initialize the urb properly */
usb_fill_bulk_urb(urb, dev->udev,
                  usb_sndbulkpipe(dev->udev, dev->bulk_out_endpointAddr),
                  buf, count, skel_write_bulk_callback, dev);
urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;

Now that the urb is properly allocated, the data is properly copied, and the urb is properly initialized, it can be submitted to the USB core to be transmitted to the device:
/* send the data out the bulk port */
retval = usb_submit_urb(urb, GFP_KERNEL);
if (retval) {
    err("%s - failed submitting write urb, error %d",
        __FUNCTION__, retval);
    goto error;
}

After the urb is successfully transmitted to the USB device, the urb callback is called by the USB core. In our example, we initialized the urb to point to the function skel_write_bulk_callback, and that is the function that is called:


static void skel_write_bulk_callback(struct urb *urb, struct pt_regs *regs)
{
    /* sync/async unlink faults aren't errors */
    if (urb->status &&
        !(urb->status == -ENOENT ||
          urb->status == -ECONNRESET ||
          urb->status == -ESHUTDOWN)) {
        dbg("%s - nonzero write bulk status received: %d",
            __FUNCTION__, urb->status);
    }

    /* free up our allocated buffer */
    usb_buffer_free(urb->dev, urb->transfer_buffer_length,
                    urb->transfer_buffer, urb->transfer_dma);
}

The first thing the callback function does is check the status of the urb to determine if this urb completed successfully or not. The error values -ENOENT, -ECONNRESET, and -ESHUTDOWN are not real transmission errors, just reports about conditions accompanying a successful transmission. Then the callback frees up the allocated buffer that was assigned to this urb to transmit.

USB TRANSFERS WITHOUT URBS

Sometimes a USB driver just wants to send or receive some simple USB data. Two functions are available to provide a simpler interface.
usb_bulk_msg

usb_bulk_msg creates a USB bulk urb and sends it to the specified device, then waits for it to complete before returning to the caller. It is defined as:
int usb_bulk_msg(struct usb_device *usb_dev, unsigned int pipe,
                 void *data, int len, int *actual_length,
                 int timeout);

The parameters of this function are:
struct usb_device *usb_dev

A pointer to the USB device to send the bulk message to.

unsigned int pipe

The specific endpoint of the USB device to which this bulk message is to be sent. This value is created with a call to either usb_sndbulkpipe or usb_rcvbulkpipe.
void *data

A pointer to the data to send to the device if this is an OUT endpoint. If this is an IN endpoint, this is a pointer to where the data should be placed after being read from the device.
int len

The length of the buffer that is pointed to by the data parameter.

int *actual_length

A pointer to where the function places the actual number of bytes that have either been transferred to the device or received from the device, depending on the direction of the endpoint.

int timeout

The amount of time, in jiffies, that should be waited before timing out. If this value is 0, the function waits forever for the message to complete. If the function is successful, the return value is 0; otherwise, a negative error number is returned.


usb_control_msg

The usb_control_msg function works just like the usb_bulk_msg function, except it allows a driver to send and receive USB control messages:
int usb_control_msg(struct usb_device *dev, unsigned int pipe,
                    __u8 request, __u8 requesttype, __u16 value,
                    __u16 index, void *data, __u16 size,
                    int timeout);

The parameters of this function are almost the same as usb_bulk_msg, with a few important differences:
struct usb_device *dev

A pointer to the USB device to send the control message to.

unsigned int pipe

The specific endpoint of the USB device that this control message is to be sent to. This value is created with a call to either usb_sndctrlpipe or usb_rcvctrlpipe.
__u8 request

The USB request value for the control message.

__u8 requesttype

The USB request type value for the control message.

__u16 value

The USB message value for the control message.

__u16 index

The USB message index value for the control message.

void *data

A pointer to the data to send to the device if this is an OUT endpoint. If this is an IN endpoint, this is a pointer to where the data should be placed after being read from the device.
__u16 size

The size of the buffer that is pointed to by the data parameter.

int timeout

The amount of time, in jiffies, that should be waited before timing out. If this value is 0, the function will wait forever for the message to complete. If the function is successful, it returns the number of bytes that were transferred to or from the device. If it is not successful, it returns a negative error number.

PCI CODING

#include <linux/init.h>
#include <linux/types.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/spinlock.h>
#include <linux/netdevice.h>     /* struct net_device items */
#include <linux/etherdevice.h>
#include <linux/pci.h>           /* struct pci_driver is defined */
#include <linux/sockios.h>
#include <linux/if.h>
#include <asm/io.h>
#include <linux/delay.h>
#include <linux/interrupt.h>
#include <linux/dma-mapping.h>

#define TX_FIFO_THRESH 256
#define RX_BUF_LEN (8192 << 2)
#define RX_BUF_PAD 16
#define RX_BUF_WRAP_PAD 2048
#define RX_BUF_TOT_LEN (RX_BUF_LEN + RX_BUF_PAD + RX_BUF_WRAP_PAD)
#define TX_BUF_SIZE 1348
#define TX_BUF_TOT_LEN (TX_BUF_SIZE * 4)
#define RX_FIFO_THRESH 7
#define RX_DMA_BURST 7
#define TX_DMA_BURST 6
#define TX_RETRY 8
#define atnbits 0x1000

MODULE_LICENSE("Dual BSD/GPL");

/* THE SET OF PCI CARDS THAT THIS DRIVER SUPPORTS */
static struct pci_device_id pci_table[] = {
    { PCI_VENDOR_ID_REALTEK, PCI_DEVICE_ID_REALTEK_8139,
      PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, },
    {0},
};

enum RTL8139_registers {
    ChipCmd       = 0x37,
    IntrMask      = 0x3C,
    Cfg9346       = 0x50,
    BasicModeCtrl = 0x62,
    RxBuf         = 0x30,
    RxConfig      = 0x44,
    TxConfig      = 0x40,
    TxAddr0       = 0x20,
    TxStatus0     = 0x10,
    IntrStatus    = 0x3E,
    RxBufPtr      = 0x38,
    Mpc           = 0x4c,
    Misr          = 0x5c,
};

enum Cfg9346Bits {
    Cfg9346_Lock   = 0x00,
    Cfg9346_Unlock = 0xC0,
};

enum TxStatusBits {
    TxHostOwns    = 0x2000,
    TxUnderrun    = 0x4000,
    TxStatOK      = 0x8000,
    TxOutOfWindow = 0x20000000,
    TxAborted     = 0x40000000,
    TxCarrierLost = 0x80000000,
};

enum ChipCmdBits {
    CmdReset   = 0x10,
    CmdRxEnb   = 0x08,
    CmdTxEnb   = 0x04,
    RxBufEmpty = 0x01,
};

enum IntBits {
    MultiIntrClear = 0xF000,
    DisInt         = 0,
    PCIErr         = 0x8000,
    PCSTimeout     = 0x4000,
    RxFIFOOver     = 0x40,
    RxUnderrun     = 0x20,
    RxOverflow     = 0x10,
    TxErr          = 0x08,
    TxOK           = 0x04,
    RxErr          = 0x02,
    RxOK           = 0x01,
    RxAckBits      = RxFIFOOver | RxOverflow | RxOK,
};

enum RxConfigBits {
    RxCfgFIFOShift    = 13,
    RxCfgFIFONone     = (7 << RxCfgFIFOShift),
    RxCfgDMAShift     = 8,
    RxCfgDMAUnlimited = (7 << RxCfgDMAShift),
    RxCfgRcv8K        = 0,
    RxCfgRcv32K       = (1 << 12),
    RxNoWrap          = (1 << 7),
};

enum rx_mode_bits {
    AcceptErr       = 0x20,
    AcceptRunt      = 0x10,
    AcceptBroadcast = 0x08,
    AcceptMulticast = 0x04,
    AcceptMyPhys    = 0x02,
    AcceptAllPhys   = 0x01,
};

enum RxStatusBits {
    RxMulticast = 0x8000,
    RxPhysical  = 0x4000,
    RxBroadcast = 0x2000,
    RxBadSymbol = 0x0020,
    RxRunt      = 0x0010,
    RxTooLong   = 0x0008,
    RxCRCErr    = 0x0004,
    RxBadAlign  = 0x0002,
    RxStatusOK  = 0x0001,
};

enum tx_config_bits {
    TxClearAbt   = (1 << 0),
    TxDMAShift   = 8,
    TxRetryShift = 4,
};

static const u16 rtl_norx_intr_mask =
    PCIErr | PCSTimeout | RxUnderrun | TxErr | TxOK | RxErr;
static const u16 rtl_intr_mask =
    PCIErr | PCSTimeout | RxOverflow | RxUnderrun | RxFIFOOver |
    TxErr | TxOK | RxErr | RxOK;
static const unsigned int Rx_config =
    (RxCfgFIFONone | RxCfgDMAUnlimited | RxCfgRcv32K | RxNoWrap);
static const unsigned int Tx_config = (TX_DMA_BURST << TxDMAShift);

struct prv_data {
    struct net_device *netdev;
    struct net_device_stats stats;
    struct pci_dev *pci_dev;
    void *regaddr;
    unsigned int tx_flag;
    unsigned char *tx_buf[4];
    unsigned char *tx_bufs;
    unsigned char *rx_ring;
    unsigned int rx_curr;
    unsigned long cur_tx;
    unsigned long dirty_tx;
    u32 rx_reg;
    dma_addr_t tx_bufs_dma;
    dma_addr_t rx_ring_dma;
    spinlock_t lock;
    spinlock_t rx_lock;
};

struct recieve_packet {
    void *data;
    unsigned int datalen;
};

/* FUNCTION DECLARATIONS */
struct prv_data *adapter;
struct recieve_packet packet;
int probe1(struct pci_dev *pdev, const struct pci_device_id *id);
void remove1(struct pci_dev *pdev);
int close1(struct net_device *net_dev);
int open1(struct net_device *net_dev);
static int transmit(struct sk_buff *skb, struct net_device *net_dev);
static irqreturn_t interrupt1_handler(int irq, void *dev_id, struct pt_regs *);
static int rtl_recieve(struct net_device *dev);
void rtl_rx_error(u32 rx_status, struct net_device *dev);
struct net_device_stats *pci_get_stats(struct net_device *dev);

MODULE_DEVICE_TABLE(pci, pci_table);

/* PCI DRIVER REGISTRATION */
static struct pci_driver driver1 = {
    .name = "pci",
    .id_table = pci_table,
    .probe = probe1,
    .remove = remove1,
};

/* ETHERNET DRIVER INITIALIZATION */
static int init1(void)
{
    int i;

    /* after registration, probe is called if vendor and device IDs match */
    i = pci_register_driver(&driver1);
    if (i == 0) {
        printk(KERN_NOTICE "PCI REGISTERED\n");
        return i;
    }
    return -1;
}

/* ETHERNET DRIVER EXIT */
static void exit1(void)
{
    pci_unregister_driver(&driver1);
    printk(KERN_NOTICE "PCI UNREGISTERED\n");
}

/* PROBE METHOD */
int probe1(struct pci_dev *pdev, const struct pci_device_id *id)
{
    /* NET DEVICE STRUCTURE */
    struct net_device *net_dev = NULL;

    printk("probe start\n");

    /* Ask low-level PCI code to enable I/O and memory regions for this device. */
    if (pci_enable_device(pdev) != 0) {
        printk("ERROR0\n");
        return -1;
    }

    /* Use this device in bus mastering mode; this card is capable of DMA */
    pci_set_master(pdev);
    if (!pci_dma_supported(pdev, 0xffffffff)) {
        printk("DMA NOT SUPPORTED\n");
        return -1;
    } else
        pdev->dma_mask = 0xffffffff;

    /* Allocate an Ethernet interface and fill in generic values
     * in the net_dev structure. */
    net_dev = alloc_etherdev(sizeof(struct prv_data));
    adapter = netdev_priv(net_dev);
    memset(adapter, 0, sizeof(struct prv_data));
    adapter->netdev = net_dev;

    adapter->pci_dev = pdev;
    pci_set_drvdata(pdev, net_dev);

    /* Claim I/O region */
    if (pci_request_regions(pdev, "pci")) {
        printk("ERROR\n");
        return -1;
    }

    /* device memory is mapped to a kernel virtual address */
    adapter->regaddr = ioremap(pci_resource_start(pdev, 1),
                               pci_resource_len(pdev, 1));
    if (adapter->regaddr == NULL) {
        printk("ERROR1\n");
        return -1;
    } else
        net_dev->base_addr = pci_resource_start(pdev, 1);

    memcpy(net_dev->name, "myeth0", 6);             /* setting interface name */
    memcpy(net_dev->dev_addr, adapter->regaddr, 6); /* getting MAC addr */
    net_dev->open = &open1;
    net_dev->stop = &close1;
    net_dev->hard_start_xmit = &transmit;
    net_dev->get_stats = pci_get_stats;
    net_dev->irq = pdev->irq;                       /* getting interrupt line */
    spin_lock_init(&adapter->lock);
    net_dev->mtu = 1000;
    SET_MODULE_OWNER(net_dev);

    /* Register the driver with the network layer.
     * This will allot an unused ethX interface. */
    if (register_netdev(net_dev)) {
        printk("ERROR2\n");
        return -1;
    }
    printk(KERN_NOTICE "NETDEV REGISTERED\n");
    printk("probe end\n");
    return 0;
}

/* REMOVE METHOD */
void remove1(struct pci_dev *pdev)
{
    struct net_device *net_dev = pci_get_drvdata(pdev); /* gets private str */

    pci_release_regions(pdev);    /* release the allocated regions */
    pci_disable_device(pdev);     /* disable the pci dev */
    unregister_netdev(net_dev);   /* unregister the netdev str */
    printk(KERN_NOTICE "NETDEV UNREGISTERED\n");
}

int open1(struct net_device *net_dev)
{
    int i;
    u8 b;
    u16 f;
    u32 k;
    struct prv_data *adapter = netdev_priv(net_dev);

    printk("open start\n");

    /* reset */
    iowrite8(CmdReset, adapter->regaddr + ChipCmd);

    wmb();
    for (i = 1000; i > 0; i--) {
        barrier();
        if ((ioread8(adapter->regaddr + ChipCmd) & CmdReset) == 0)
            break;
        udelay(10);
    }

    /* disable interrupts */
    f = ioread16(adapter->regaddr + IntrMask);
    rmb();
    printk("\nBEFORE INTERRUPT DISABLING: %x\n", f);
    iowrite16(DisInt, adapter->regaddr + IntrMask);
    wmb();
    udelay(50);
    f = ioread16(adapter->regaddr + IntrMask);
    rmb();
    printk("\nAFTER INTERRUPT DISABLING: %x\n", f);

    /* unlock */
    b = ioread8(adapter->regaddr + Cfg9346);
    rmb();
    printk("\nBEFORE unlock: %x\n", b);
    iowrite8(Cfg9346_Unlock, adapter->regaddr + Cfg9346);
    wmb();
    udelay(50);
    b = ioread8(adapter->regaddr + Cfg9346);
    rmb();
    printk("\nAFTER unlock: %x\n", b);

    /* auto-negotiation */
    f = ioread16(adapter->regaddr + BasicModeCtrl);
    rmb();
    printk("\nBEFORE AUTONEGOTIATION: %x\n", f);
    iowrite16(atnbits, adapter->regaddr + BasicModeCtrl);
    wmb();
    udelay(50);
    f = ioread16(adapter->regaddr + BasicModeCtrl);
    rmb();
    printk("\nAFTER AUTONEGOTIATION: %x\n", f);

    /* lock */
    b = ioread8(adapter->regaddr + Cfg9346);
    rmb();
    printk("\nBEFORE lock: %x\n", b);
    iowrite8(Cfg9346_Lock, adapter->regaddr + Cfg9346);
    wmb();
    udelay(50);
    b = ioread8(adapter->regaddr + Cfg9346);
    rmb();
    printk("\nAFTER lock: %x\n", b);

    /* register the interrupt handler */
    if (request_irq(net_dev->irq, interrupt1_handler, IRQF_SHARED,
                    net_dev->name, net_dev)) {
        printk("INTERRUPT LINE REQUEST FAILED\n");
        return -1;
    }

    /* initialise tx/rx buffers */
    adapter->tx_bufs = dma_alloc_coherent(&adapter->pci_dev->dev,
                                          TX_BUF_TOT_LEN,
                                          &adapter->tx_bufs_dma,
                                          GFP_KERNEL);
    adapter->rx_ring = dma_alloc_coherent(&adapter->pci_dev->dev,
                                          RX_BUF_TOT_LEN,
                                          &adapter->rx_ring_dma,
                                          GFP_KERNEL);
    if (adapter->tx_bufs == NULL || adapter->rx_ring == NULL)
        return -1;
    adapter->rx_curr = 0;
    adapter->cur_tx = 0;
    adapter->dirty_tx = 0;
    for (i = 0; i < 4; i++)
        adapter->tx_buf[i] = &adapter->tx_bufs[i * TX_BUF_SIZE];

    /* enable tx/rx */
    b = ioread8(adapter->regaddr + ChipCmd);
    rmb();
    printk("\nBEFORE RESETTING TX AND RX REG: %x\n", b);
    iowrite8(CmdRxEnb | CmdTxEnb, adapter->regaddr + ChipCmd);
    wmb();
    udelay(50);
    b = ioread8(adapter->regaddr + ChipCmd);
    rmb();
    printk("\nAFTER RESETTING TX AND RX REG: %x\n", b);

    /* init rx config reg */
    k = ioread32(adapter->regaddr + RxConfig);
    rmb();
    printk("\nBEFORE RX CONFIG REG: %x\n", k);
    iowrite32(Rx_config | AcceptBroadcast | AcceptMyPhys,
              adapter->regaddr + RxConfig);
    wmb();
    udelay(50);
    k = ioread32(adapter->regaddr + RxConfig);
    rmb();
    printk("\nAFTER RX CONFIG REG: %x\n", k);

    /* init tx config reg */
    k = ioread32(adapter->regaddr + TxConfig);
    rmb();
    printk("\nBEFORE TX CONFIG REG: %x\n", k);
    iowrite32(Tx_config, adapter->regaddr + TxConfig);
    wmb();
    udelay(50);
    k = ioread32(adapter->regaddr + TxConfig);
    rmb();
    printk("\nAFTER TX CONFIG REG: %x\n", k);

    /* init Rx ring buffer DMA address */
    iowrite32(adapter->rx_ring_dma, adapter->regaddr + RxBuf);
    wmb();
    udelay(50);
    printk("\nAFTER RX RING DMA\n");

    /* init tx buffer DMA addresses */
    for (i = 0; i < 4; i++) {
        iowrite32(adapter->tx_bufs_dma +
                  (adapter->tx_buf[i] - adapter->tx_bufs),
                  adapter->regaddr + TxAddr0 + (i * 4));
        wmb();

        udelay(50);
    }
    printk("\nAFTER TX RING DMA\n");

    /* missed packet counter */
    iowrite32(0x0000, adapter->regaddr + Mpc);
    wmb();

    /* setting multiple interrupt register to 0 */
    iowrite32(0x0000, adapter->regaddr + Misr);
    wmb();

    /* enable interrupts */
    f = ioread16(adapter->regaddr + IntrMask);
    rmb();
    printk("\nBEFORE INTERRUPT ENABLING: %x\n", f);
    iowrite16(rtl_intr_mask, adapter->regaddr + IntrMask);
    wmb();
    udelay(50);
    f = ioread16(adapter->regaddr + IntrMask);
    rmb();
    printk("\nAFTER INTERRUPT ENABLING: %x\n", f);

    netif_start_queue(net_dev);
    printk("open end\n");
    return 0;
}

static void tx_interrupt(struct net_device *dev)
{
    struct prv_data *adapter = netdev_priv(dev);
    unsigned long dirty_tx, tx_left;

    dirty_tx = adapter->dirty_tx;
    tx_left = adapter->cur_tx - dirty_tx;
    while (tx_left > 0) {
        int entry = dirty_tx % 4;
        int txstatus;

        txstatus = ioread32(adapter->regaddr + TxStatus0 +
                            (entry * sizeof(u32)));
        if (!(txstatus & (TxStatOK | TxUnderrun | TxAborted)))
            break;
        if (txstatus & (TxOutOfWindow | TxAborted)) {
            adapter->stats.tx_errors++;
            if (txstatus & TxAborted) {
                adapter->stats.tx_aborted_errors++;
                iowrite32(TxClearAbt, adapter->regaddr + TxConfig);
                iowrite32(TxErr, adapter->regaddr + IntrStatus);
                wmb();
            }
            if (txstatus & TxCarrierLost)
                adapter->stats.tx_carrier_errors++;
            if (txstatus & TxOutOfWindow)
                adapter->stats.tx_window_errors++;
        } else {

            if (txstatus & TxUnderrun)
                adapter->stats.tx_fifo_errors++;
            adapter->stats.collisions += (txstatus >> 24) & 15;
            adapter->stats.tx_bytes += txstatus & 0x7ff;
            adapter->stats.tx_packets++;
        }
        dirty_tx++;
        tx_left--;
    }
    if (adapter->cur_tx - dirty_tx > 4)
        dirty_tx += 4;
    if (adapter->dirty_tx != dirty_tx) {
        adapter->dirty_tx = dirty_tx;
        mb();
        netif_wake_queue(dev);
    }
    printk("Tx_interrupt is called\n");
}

static irqreturn_t interrupt1_handler(int irq, void *dev_id,
                                      struct pt_regs *regs)
{
    u16 status;
    struct net_device *dev = (struct net_device *)dev_id;
    struct prv_data *adapter = netdev_priv(dev);
    int handled = 0;

    printk("Interrupt Handler\n");
    spin_lock(&adapter->lock);
    status = ioread16(adapter->regaddr + IntrStatus);
    printk("\nintr status=: %x\n", status);
    rmb();
    if (status == 0)
        goto out;
    handled = 1;
    if (status == 0xffff)
        goto out;
    if (status & (RxOK | RxOverflow | RxFIFOOver)) {
        rtl_recieve(dev);
        iowrite16(RxOK, adapter->regaddr + IntrStatus);
        ioread16(adapter->regaddr + IntrStatus);
    }
    if (status & (TxOK | TxErr)) {
        tx_interrupt(dev);
        if (status & TxOK) {
            iowrite16(TxOK, adapter->regaddr + IntrStatus);
            ioread16(adapter->regaddr + IntrStatus);
        }
        if (status & TxErr) {
            iowrite16(TxErr, adapter->regaddr + IntrStatus);
            ioread16(adapter->regaddr + IntrStatus);

        }
    }
out:
    spin_unlock(&adapter->lock);
    printk("\n--------------------------------------------------------------\n");
    return IRQ_RETVAL(handled);
}

static int transmit(struct sk_buff *skb, struct net_device *net_dev)
{
    unsigned int entry;
    unsigned int len = skb->len;
    unsigned long flags;
    u32 b;
    u16 txstatus;
    struct prv_data *adapter = netdev_priv(net_dev);

    entry = adapter->cur_tx % 4;
    printk("TRANSMIT CALLED with desc %d\n", entry);
    if (len < TX_BUF_SIZE) {
        if (len < ETH_ZLEN)
            memset(adapter->tx_buf[entry], 0, ETH_ZLEN);
        skb_copy_and_csum_dev(skb, adapter->tx_buf[entry]);
        dev_kfree_skb(skb);
    } else {
        dev_kfree_skb(skb);
        return NETDEV_TX_OK;
    }
    spin_lock_irqsave(&adapter->lock, flags);
    b = entry * (sizeof(u32));
    len = max(len, (unsigned int)ETH_ZLEN);
    printk("\npacket len=: %x\n", len);
    adapter->tx_flag = 0x10000;
    iowrite32(adapter->tx_flag | len, adapter->regaddr + TxStatus0 + b);
    wmb();
    txstatus = ioread32(adapter->regaddr + TxStatus0 + b);
    printk("\ntxstatus=: %x\n", txstatus);
    adapter->stats.tx_bytes += len;
    adapter->cur_tx++;
    adapter->stats.tx_packets++;
    if ((adapter->cur_tx - 4) == adapter->dirty_tx)
        netif_stop_queue(net_dev);
    spin_unlock_irqrestore(&adapter->lock, flags);
    printk("TRANSMIT END with desc %d\n", entry);
    return 0;
}

static int rtl_recieve(struct net_device *dev)
{
    struct prv_data *adapter = netdev_priv(dev);
    int recieved = 0;
    unsigned int rx_curr = adapter->rx_curr;
    unsigned int rx_size = 0;
    u32 rx_status;
    u16 status;

    unsigned int pkt_size;

    printk("RECEIVE CALLED\n");
    while ((ioread8(adapter->regaddr + ChipCmd) & 0x01) == 0) {
        struct sk_buff *skb;
        u32 ring_offset = rx_curr % RX_BUF_LEN;

        rmb();
        rx_status = le32_to_cpu(*(u32 *)(adapter->rx_ring + ring_offset));
        rmb();
        rx_size = rx_status >> 16;
        pkt_size = rx_size - 4;
        printk(KERN_NOTICE "\nstatus %x pktsize %x", rx_status, rx_size);
        if ((rx_size > (TX_BUF_SIZE + 4)) || (rx_size < 8) ||
            (!(rx_status & RxStatusOK))) {
            rtl_rx_error(rx_status, dev);
            return -1;
        }
        skb = netdev_alloc_skb(dev, pkt_size + 2);
        if (!skb) {
            printk(KERN_NOTICE "\nLow memory");
            adapter->stats.rx_dropped++;
            goto out;
        }
        memcpy(skb_put(skb, pkt_size),
               adapter->rx_ring + ring_offset + 4, pkt_size);
        skb->dev = dev;
        skb->protocol = eth_type_trans(skb, dev);
        adapter->stats.rx_bytes += pkt_size;
        adapter->stats.rx_packets++;
        netif_rx(skb);
        recieved++;
        rx_curr = (rx_curr + rx_size + 4 + 3) & ~3;
        iowrite16((u16)(rx_curr - 16), adapter->regaddr + RxBufPtr);
        wmb();
        status = ioread16(adapter->regaddr + IntrStatus) & RxAckBits;
        rmb();
        if (status != 0) {
            if (status & (RxFIFOOver | RxOverflow)) {
                adapter->stats.rx_errors++;
                if (status & RxFIFOOver)
                    adapter->stats.rx_fifo_errors++;
            }
            iowrite16(RxAckBits, adapter->regaddr + IntrStatus);
            wmb();
        }
    }
    adapter->rx_curr = rx_curr;
    printk("RECEIVE END\n");
out:
    return recieved;
}

void rtl_rx_error(u32 rx_status, struct net_device *dev)
{
    u8 reg8;
    int wait = 200;

    struct prv_data *adapter = netdev_priv(dev);

    adapter->stats.rx_errors++;
    if (!(rx_status & RxStatusOK)) {
        /* Received packet is more than 4k (bit 3) */
        if (rx_status & RxTooLong)
            printk("\nlong packet");
        /* Rx packet with invalid symbol or frame alignment error
         * (bits 5, 1) */
        if (rx_status & (RxBadSymbol | RxBadAlign))
            adapter->stats.rx_frame_errors++;
        /* Rx packets with length < 64 (bits 4, 3) */
        if (rx_status & (RxRunt | RxTooLong))
            adapter->stats.rx_length_errors++;
        /* CRC error in the Rx packet (bit 2) */
        if (rx_status & RxCRCErr)
            adapter->stats.rx_crc_errors++;
    }

    /* Disable receive */
    iowrite8(CmdTxEnb, adapter->regaddr + ChipCmd);
    wmb();
    while (--wait > 0) {
        udelay(1);
        reg8 = ioread8(adapter->regaddr + ChipCmd);
        if (!(reg8 & CmdRxEnb))
            break;
    }
    wait = 200;

    /* Enable receive */
    while (--wait > 0) {
        iowrite8(CmdRxEnb | CmdTxEnb, adapter->regaddr + ChipCmd);
        wmb();
        udelay(1);
        reg8 = ioread8(adapter->regaddr + ChipCmd);
        if ((reg8 & CmdRxEnb) && (reg8 & CmdTxEnb))
            break;
    }

    iowrite8(Cfg9346_Unlock, adapter->regaddr + Cfg9346);
    wmb();
    /* Must enable Tx/Rx before setting transfer thresholds */
    iowrite8(CmdTxEnb | CmdRxEnb, adapter->regaddr + ChipCmd);
    adapter->rx_reg = Rx_config | AcceptBroadcast | AcceptMyPhys;
    iowrite32(adapter->rx_reg, adapter->regaddr + RxConfig);
    adapter->rx_curr = 0;
    iowrite8(Cfg9346_Lock, adapter->regaddr + Cfg9346);
    wmb();
    /* Set Rx DMA address */
    iowrite32(adapter->rx_ring_dma, adapter->regaddr + RxBuf);
    return;
}

struct net_device_stats *pci_get_stats(struct net_device *dev)
{
    struct prv_data *adapter = netdev_priv(dev);

    return &adapter->stats;

} int close1(struct net_device *net_dev) { unsigned long flags; struct prv_data *adapter=netdev_priv(net_dev); netif_stop_queue(net_dev); spin_lock_irqsave(&adapter->lock, flags); iowrite16(DisInt,adapter->regaddr+IntrMask); wmb(); iowrite8(0x00,adapter->regaddr+ChipCmd); wmb(); spin_unlock_irqrestore(&adapter->lock, flags); free_irq(net_dev->irq,net_dev); adapter->cur_tx=0; adapter->rx_curr=0; dma_free_coherent(&adapter->pci_dev->dev,RX_BUF_TOT_LEN,adapter->rx_ring,adapter->rx_ring_dma); dma_free_coherent(&adapter->pci_dev->dev,TX_BUF_TOT_LEN,adapter->tx_bufs,adapter->tx_bufs_dma); adapter->tx_bufs=NULL; adapter->rx_ring=NULL; printk("close success\n"); return 0; } module_init(init1); module_exit(exit1);

USB CODING

/* A simple USB network driver for the Davicom DM9601 USB-to-Ethernet chip.
 * This contains license under dual GNU GPL and BSD. */
#include <linux/init.h>
#include <linux/types.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/usb.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/sockios.h>
#include <linux/if.h>
#include <asm/io.h>
#include <linux/delay.h>
#include <linux/spinlock.h>

MODULE_LICENSE("Dual BSD/GPL");

#define USB_VENOR_ID    0x0A46
#define USB_PRODUCT_ID  0x9601

#define READ_REGS   0x00
#define WRITE_REGS  0x01
#define WRITE_REG   0x03
#define READ_MEMS   0x02
#define WRITE_MEMS  0x05
#define WRITE_MEM   0x07

#define NCR   0x00
#define NSR   0x01
#define TCR   0x02
#define TSR   0x03
#define RCR   0x05
#define RSR   0x06
#define BPTR  0x08
#define FCTR  0x09
#define FCR   0x0A
#define DPA   0x10
#define MAR   0x16
#define GPCR  0x1E
#define USBDA 0xF0
#define USBC  0xF4

#define TX_OVERHEAD 2
#define RX_OVERHEAD 7
#define MAX_MTU     1536

//#define MALIGN(x) x __attribute__((aligned(L1_CACHE_BYTES)))

typedef struct usb_skel {
	__u8 bulk_in_endpointAddr;     /* IN endpoint address */
	__u8 bulk_out_endpointAddr;    /* OUT endpoint address */
	unsigned char *rx_buff;        /* Receive data buffer */
	unsigned char *tx_buff;        /* Transmit data buffer */
	struct usb_device *udev;       /* Device representation */
	struct usb_interface *intf;    /* Interface representation */
	struct net_device *netdev;
	struct net_device_stats stats; /* Statistics */
	struct urb tx_urb, rx_urb;     /* URBs for bulk transfers */
	spinlock_t lock;               /* Lock */
} USB_DEV;

static int usb_probe(struct usb_interface *intf, const struct usb_device_id *id);
static void usb_disconnect(struct usb_interface *intf);
static int unet_open(struct net_device *dev);
static int unet_close(struct net_device *dev);
static int unet_xmit(struct sk_buff *skb, struct net_device *dev);
//void unet_multi_cast_list(struct net_device *dev);
static struct net_device_stats *unet_stats(struct net_device *dev);
static int read_mac(USB_DEV *dev, u8 reg, u16 length, void *data);
static int reset(USB_DEV *dev, u8 reg);
static void tx_complete(struct urb *urb);
static void rx_complete(struct urb *urb);
static int setNCR(USB_DEV *dev, u8 reg);
static int setFCTR(USB_DEV *dev, u8 reg);
static int setFCR(USB_DEV *dev, u8 reg);
static int setBPTR(USB_DEV *dev, u8 reg);
static int setRCR(USB_DEV *dev, u8 reg);
static int disRCR(USB_DEV *dev, u8 reg);

/* USB ID table specifying the devices that this driver supports */
static struct usb_device_id usb_table[] = {
	{
		/* Creates a usb_device_id from the vendor and product IDs
		   supplied to it */
		USB_DEVICE(USB_VENOR_ID, USB_PRODUCT_ID),
	},
	{ 0 },
};
MODULE_DEVICE_TABLE(usb, usb_table);

/* USB driver structure (entry points) */
static struct usb_driver usb_dm9601 = {
	.name       = "usb_skell",
	.id_table   = usb_table,
	.probe      = usb_probe,
	.disconnect = usb_disconnect,
};

static int usb_init(void)
{
	int retval = usb_register(&usb_dm9601);
	if (retval == 0)
		printk(KERN_NOTICE "USB registered successfully\n");
	return retval;
}

/* Terminate */
static void usb_exit(void)
{
	usb_deregister(&usb_dm9601);
	printk(KERN_NOTICE "USB unregistered successfully\n");
}

static int usb_probe(struct usb_interface *intf, const struct usb_device_id *id)
{
	int i;
	struct usb_device *udev = interface_to_usbdev(intf);
	struct usb_host_interface *iface_desc = intf->cur_altsetting;
	struct net_device *net_dev = NULL;
	USB_DEV *dev = NULL;

	printk(KERN_NOTICE "Probe has been started\n");
	usb_get_dev(udev);
	net_dev = alloc_etherdev(sizeof(USB_DEV));
	if (net_dev == NULL) {
		printk(KERN_NOTICE "Error in netdev allocation\n");
		return -1;
	}
	dev = netdev_priv(net_dev);
	memset(dev, 0, sizeof(USB_DEV));

	/* Fill the usb_device and usb_interface */
	dev->netdev = net_dev;
	dev->intf = intf;
	dev->udev = udev;
	usb_init_urb(&dev->rx_urb);
	usb_init_urb(&dev->tx_urb);

	for (i = 0; i < iface_desc->desc.bNumEndpoints; ++i) {
		struct usb_endpoint_descriptor *endpoint =
			&iface_desc->endpoint[i].desc;
		if (!dev->bulk_in_endpointAddr &&
		    (endpoint->bEndpointAddress & USB_DIR_IN) &&
		    ((endpoint->bmAttributes & USB_ENDPOINT_XFERTYPE_MASK) ==
		     USB_ENDPOINT_XFER_BULK)) {
			printk(KERN_NOTICE "\n bulk in");
			dev->bulk_in_endpointAddr = endpoint->bEndpointAddress;
		}
		if (!dev->bulk_out_endpointAddr &&
		    !(endpoint->bEndpointAddress & USB_DIR_IN) &&
		    ((endpoint->bmAttributes & USB_ENDPOINT_XFERTYPE_MASK) ==
		     USB_ENDPOINT_XFER_BULK)) {
			printk(KERN_NOTICE "\n bulk out");
			dev->bulk_out_endpointAddr = endpoint->bEndpointAddress;
		}
	}
	if (!(dev->bulk_in_endpointAddr && dev->bulk_out_endpointAddr)) {
		printk(KERN_NOTICE "Could not find the bulk in and bulk out\n");
		goto error;
	}

	net_dev->open = unet_open;
	net_dev->stop = unet_close;
	net_dev->hard_start_xmit = unet_xmit;
	net_dev->get_stats = unet_stats;
	memcpy(net_dev->name, "rtl%d", 4);
	read_mac(dev, DPA, ETH_ALEN, net_dev->dev_addr);
	SET_MODULE_OWNER(net_dev);
	if (register_netdev(net_dev)) {
		printk(KERN_NOTICE "Netdev registration failed\n");
		goto error;
	}
	usb_set_intfdata(intf, dev);  /* Save data pointer in this interface */
	spin_lock_init(&dev->lock);
	netif_device_attach(net_dev);
	printk(KERN_NOTICE "Device attached\n");
	return 0;
error:
	printk(KERN_NOTICE "Error in attaching\n");
	free_netdev(net_dev);
	return -1;
}

/* Disconnect method. Called when the device is unplugged or when the
   module is unloaded */
static void usb_disconnect(struct usb_interface *intf)
{
	USB_DEV *dev = usb_get_intfdata(intf);
	struct usb_device *udev = interface_to_usbdev(intf);

	unregister_netdev(dev->netdev);
	usb_put_dev(udev);
	//free_netdev(dev->netdev);
	//usb_set_intfdata(intf, NULL);  /* Zero out interface data */
	kfree(dev);
	dev = NULL;
	printk(KERN_NOTICE "Disconnected Successfully\n");
	return;
}

static int unet_open(struct net_device *net_dev)
{
	USB_DEV *dev = netdev_priv(net_dev);

	/* Allocate Tx buffer */
	dev->tx_buff = usb_buffer_alloc(dev->udev, MAX_MTU, GFP_KERNEL,
					&dev->tx_urb.transfer_dma);
	if (!dev->tx_buff)
		return -1;
	dev->tx_urb.transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
	/* Allocate Rx buffer */
	dev->rx_buff = usb_buffer_alloc(dev->udev, MAX_MTU, GFP_KERNEL,
					&dev->rx_urb.transfer_dma);
	if (!dev->rx_buff)
		return -1;
	dev->rx_urb.transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
	usb_fill_bulk_urb(&dev->rx_urb, dev->udev,
			  usb_rcvbulkpipe(dev->udev, dev->bulk_in_endpointAddr),
			  dev->rx_buff, MAX_MTU, rx_complete, dev);
	if (usb_submit_urb(&dev->rx_urb, GFP_ATOMIC)) {
		printk(KERN_NOTICE "Error in submitting urb\n");
		return -1;
	}
	reset(dev, NCR);       /* Device reset */
	udelay(10);
	//setNCR(dev, NCR);
	//setBPTR(dev, BPTR);  /* Set 200us */
	//setFCTR(dev, FCTR);
	//setFCR(dev, FCR);
	setRCR(dev, RCR);      /* Activate Rx */
	netif_start_queue(net_dev);
	printk(KERN_NOTICE "Open called\n");
	return 0;
}

static int unet_close(struct net_device *net_dev)
{
	USB_DEV *dev = netdev_priv(net_dev);

	netif_stop_queue(net_dev);
	disRCR(dev, RCR);  /* Disable receive */
	//usb_unlink_urb(&dev->rx_urb);  /* Cancels pending URB operations */
	//usb_unlink_urb(&dev->tx_urb);
	/*usb_free_urb(&dev->rx_urb);      Frees a reference to a completed URB
	usb_free_urb(&dev->tx_urb);*/
	module_put(net_dev->dev.class->owner);
	printk(KERN_NOTICE "Close called");
	return 0;
}

static int unet_xmit(struct sk_buff *skb, struct net_device *net_dev)
{
	USB_DEV *dev = netdev_priv(net_dev);
	__u16 length = skb->len;
	int count = skb->len + TX_OVERHEAD;
	int retval = NET_XMIT_SUCCESS;

	netif_stop_queue(net_dev);
	if (!(count & 0x3f)) {
		count++;
		length++;
	}
	((__u16 *)dev->tx_buff)[0] = cpu_to_le16(length);
	memcpy(dev->tx_buff + 2, skb->data, skb->len);
	usb_fill_bulk_urb(&dev->tx_urb, dev->udev,
			  usb_sndbulkpipe(dev->udev, dev->bulk_out_endpointAddr),
			  dev->tx_buff, count, tx_complete, dev);
	spin_lock(&dev->lock);
	if (usb_submit_urb(&dev->tx_urb, GFP_KERNEL)) {
		printk(KERN_NOTICE "Submission error\n");
		dev->stats.tx_errors++;
		netif_start_queue(net_dev);
	} else {
		dev->stats.tx_packets++;
		dev->stats.tx_bytes += skb->len;
	}
	dev_kfree_skb(skb);
	spin_unlock(&dev->lock);
	return retval;
}

static struct net_device_stats *unet_stats(struct net_device *ndev)
{
	USB_DEV *dev = netdev_priv(ndev);
	return &dev->stats;
}

static int read_mac(USB_DEV *dev, u8 reg, u16 length, void *data)
{
	void *buf;
	u16 len;

	buf = kmalloc(length, GFP_KERNEL);
	if (!buf)
		goto out;
	len = usb_control_msg(dev->udev, usb_rcvctrlpipe(dev->udev, 0),
			      READ_REGS,
			      USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE,
			      0, reg, buf, length, USB_CTRL_SET_TIMEOUT);
	if (len == length)
		memcpy(data, buf, length);

	kfree(buf);
	return 0;
out:
	return -1;
}

static int reset(USB_DEV *dev, u8 reg)
{
	u8 val = 0x01;
	u8 retval;
	retval = usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0),
				 WRITE_REGS, 0x40, 0x00, reg, &val, 1, 0);
	printk(KERN_NOTICE "reset=%d", retval);
	return retval;
}

static int setNCR(USB_DEV *dev, u8 reg)
{
	u8 val = 0x00;
	u8 retval;
	retval = usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0),
				 WRITE_REGS, 0x40, 0x00, reg, &val, 1, 0);
	printk(KERN_NOTICE "NCR=%d", retval);
	return retval;
}

static int setBPTR(USB_DEV *dev, u8 reg)
{
	u8 val = 0x37;
	u8 retval;
	retval = usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0),
				 WRITE_REGS, 0x40, 0x00, reg, &val, 1, 0);
	printk(KERN_NOTICE "BPTR=%d", retval);
	return retval;
}

static int setFCTR(USB_DEV *dev, u8 reg)
{
	u8 val = 0x38;
	u8 retval;
	retval = usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0),
				 WRITE_REGS, 0x40, 0x00, reg, &val, 1, 0);
	printk(KERN_NOTICE "FCTR=%d", retval);
	return retval;
}

static int setFCR(USB_DEV *dev, u8 reg)
{
	u8 val = 0xff;
	u8 retval;
	retval = usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0),
				 WRITE_REGS, 0x40, 0x00, reg, &val, 1, 0);
	printk(KERN_NOTICE "FCR=%d", retval);
	return retval;
}

static int setRCR(USB_DEV *dev, u8 reg)
{
	u8 val = 0x01;
	u8 retval;
	retval = usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0),
				 WRITE_REGS, 0x40, 0x00, reg, &val, 1, 0);
	printk(KERN_NOTICE "RCR=%d", retval);
	return retval;
}

static int disRCR(USB_DEV *dev, u8 reg)
{
	u8 val = 0x00;
	u8 retval;
	retval = usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0),
				 WRITE_REGS, 0x40, 0x00, reg, &val, 1, 0);
	printk(KERN_NOTICE "dis RCR=%d", retval);
	return retval;
}

static void tx_complete(struct urb *urb)
{
	USB_DEV *dev = urb->context;

	if (urb->status)
		printk(KERN_NOTICE "\n%s: TX status %x",
		       dev->netdev->name, urb->status);
	netif_wake_queue(dev->netdev);
}

static void rx_complete(struct urb *urb)
{
	USB_DEV *dev = urb->context;
	int count = urb->actual_length;
	__u8 rx_status;
	struct sk_buff *skb;
	__u16 pkt_len;
	unsigned char *buf;

	if (!count)
		goto goon;
	//printk(KERN_NOTICE "\n%d bytes coming.....", count);
	buf = dev->rx_buff;
	rx_status = *(__u8 *)(buf);
	pkt_len = *(__u16 *)(buf + 1) - 4;
	dev->stats.rx_bytes += pkt_len;
	if ((rx_status & 0xbf) || (pkt_len > MAX_MTU)) {
		dev->stats.rx_errors++;
		goto goon;
	}
	if (!(skb = dev_alloc_skb(pkt_len + 2)))
		goto goon;
	skb->dev = dev->netdev;
	skb_reserve(skb, 2);
	memcpy(skb_put(skb, pkt_len), buf + 3, pkt_len);
	skb->protocol = eth_type_trans(skb, dev->netdev);
	netif_rx(skb);
	dev->stats.rx_packets++;
	dev->stats.rx_bytes += pkt_len;
	//printk(KERN_NOTICE "\n%d bytes received.....", count);
goon:
	usb_fill_bulk_urb(&dev->rx_urb, dev->udev,
			  usb_rcvbulkpipe(dev->udev, dev->bulk_in_endpointAddr),
			  dev->rx_buff, MAX_MTU, rx_complete, dev);
	if (usb_submit_urb(&dev->rx_urb, GFP_ATOMIC)) {
		printk(KERN_NOTICE "Receive error in submitting urb\n");
	}
}

module_init(usb_init);
module_exit(usb_exit);

USB DEVICE: dm9601 CODING

dm9601.h

#ifndef DM9601_DEV

#define HAS_HOME_PNA        0x40000000

#define DM9601_MTU          1500
#define DM9601_MAX_MTU      1536

#define EPROM_WRITE         0x01
#define EPROM_READ          0x02
#define EPROM_LOAD          0x20

#define MII_BMCR            0x00
#define MII_BMSR            0x01
#define PHY_READ            0x40
#define PHY_WRITE           0x20

#define DM9601_PRESENT      0x00000001
#define DM9601_RUNNING      0x00000002
#define DM9601_TX_BUSY      0x00000004
#define DM9601_RX_BUSY      0x00000008
#define CTRL_URB_RUNNING    0x00000010
#define CTRL_URB_SLEEP      0x00000020
#define DM9601_UNPLUG       0x00000040
#define DM9601_RESET_WAIT   0x00800000
#define NET_CTRL_CHANGE     0x04000000
#define NET_CTRL_CHANGED    0x08000000
#define RX_CTRL_CHANGE      0x10000000
#define RX_CTRL_CHANGED     0x20000000
#define HASH_REGS_CHANGE    0x40000000
#define HASH_REGS_CHANGED   0x80000000

#define ALL_REGS_CHANGE     (NET_CTRL_CHANGE | RX_CTRL_CHANGE | HASH_REGS_CHANGE)
#define ALL_REGS_CHANGED    (NET_CTRL_CHANGED | RX_CTRL_CHANGED | HASH_REGS_CHANGED)

#define DEFAULT_GPIO_RESET  0x24
#define LINKSYS_GPIO_RESET  0x24
#define DEFAULT_GPIO_SET    0x26

#define RX_PASS_MULTICAST   8
#define RX_PROMISCUOUS      2

#define REG_TIMEOUT         (HZ)
#define DM9601_TX_TIMEOUT   (HZ*10)

#define TX_UNDERRUN         0x80
#define EXCESSIVE_COL       0x40
#define LATE_COL            0x20
#define NO_CARRIER          0x10
#define LOSS_CARRIER        0x08
#define JABBER_TIMEOUT      0x04

#define DM9601_REQT_READ    0xc0
#define DM9601_REQ_GET_REGS 0x00
#define DM9601_REQ_GET_MEMS 0x02
#define DM9601_REQT_WRITE   0x40
#define DM9601_REQ_SET_REGS 0x01
#define DM9601_REQ_SET_REG  0x03
#define DM9601_REQ_SET_MEMS 0x05
#define DM9601_REQ_SET_MEM  0x07

#define DM9601_10MHF        0
#define DM9601_100MHF       1
#define DM9601_10MFD        4
#define DM9601_100MFD       5
#define DM9601_AUTO         8
#define DM9601_1M_HPNA      0x10

#define DM9601_REG5         0x30
#define DM9601_REG8         0x27
#define DM9601_REG9         0x38
#define DM9601_REGA         0xff

#define DM9801_NOISE_FLOOR  0x08
#define DM9802_NOISE_FLOOR  0x05

enum DM9601_NIC_TYPE {
	FASTETHER_NIC = 0,
	HOMERUN_NIC = 1,
	LONGRUN_NIC = 2
};

enum DM9601_MII_TYPE {
	MII_TYPE_INT = 0,
	MII_TYPE_EXT = 1
};

/* edit */
#define DMALIGN(x) x __attribute__((aligned(L1_CACHE_BYTES)))

typedef struct dm9601_board_info {
	struct usb_device *usb;        /* Device representation */
	struct net_device *net;
	struct net_device_stats stats;
	unsigned long rx_longf_errors,
		      rx_runtf_errors,
		      rx_lc_errors,
		      rx_wdt_errors,
		      rx_ple_errors;
	unsigned flags;
	unsigned features;
	int dev_index;
	int intr_interval;
	struct urb ctrl_urb;           /* Communicates with device configuration registers */
	struct urb rx_urb, tx_urb, intr_urb, dump_urb;
	struct usb_ctrlrequest dr;
	wait_queue_head_t ctrl_wait;
	struct semaphore ctrl_sem;
	unsigned char DMALIGN(rx_buff[DM9601_MAX_MTU]);
	unsigned char DMALIGN(rx_buff2[DM9601_MAX_MTU]);
	unsigned char DMALIGN(tx_buff[DM9601_MAX_MTU]);
	unsigned char DMALIGN(intr_buff[8]);
	unsigned char DMALIGN(dump_buff[8]);
	__u16 hash_table[4];
	__u8 rx_ctrl_reg, net_ctrl_reg, reg08, reg09, reg0a;
	__u8 phy;
	__u8 gpio_res;
	__u8 rx_buf_flag;
	__u8 nic_type;
	__u8 op_mode;
} dm9601_board_info_t;

struct usb_eth_dev {
	char *name;
	__u16 vendor;
	__u16 device;
	__u32 private;                 /* LSB is gpio reset value */
};

#define VENDOR_ACCTON     0x083a
#define VENDOR_ADMTEK     0x07a6
#define VENDOR_BILLIONTON 0x08dd
#define VENDOR_COREGA     0x07aa
#define VENDOR_DLINK1     0x2001
#define VENDOR_DLINK2     0x07b8
#define VENDOR_IODATA     0x04bb
#define VENDOR_LANEED     0x056e
#define VENDOR_LINKSYS    0x066b
#define VENDOR_MELCO      0x0411
#define VENDOR_SMC        0x0707
#define VENDOR_SOHOWARE   0x15e8

#else /* DM9601_DEV */

DM9601_DEV( "Accton USB 10/100 Ethernet Adapter", VENDOR_ACCTON, 0x1046, DEFAULT_GPIO_RESET )
DM9601_DEV( "ADMtek AN986 \"Pegasus\" USB Ethernet (eval board)", VENDOR_ADMTEK, 0x0986, DEFAULT_GPIO_RESET | HAS_HOME_PNA )
DM9601_DEV( "Davicom USB-100", 0x0a46, 0x9601, DEFAULT_GPIO_RESET )
DM9601_DEV( "Davicom USB-100", 0x3334, 0x1701, DEFAULT_GPIO_RESET )
DM9601_DEV( "Billionton USB-100", VENDOR_BILLIONTON, 0x0986, DEFAULT_GPIO_RESET )
DM9601_DEV( "Billionton USBLP-100", VENDOR_BILLIONTON, 0x0987, DEFAULT_GPIO_RESET | HAS_HOME_PNA )
DM9601_DEV( "Billionton USBEL-100", VENDOR_BILLIONTON, 0x0988, DEFAULT_GPIO_RESET )
DM9601_DEV( "Corega FEter USB-TX", VENDOR_COREGA, 0x0004, DEFAULT_GPIO_RESET )
DM9601_DEV( "Corega FEter USB-TXC", VENDOR_COREGA, 0x9601, DEFAULT_GPIO_RESET )
DM9601_DEV( "D-Link DSB-650TX", VENDOR_DLINK1, 0x4001, LINKSYS_GPIO_RESET )
DM9601_DEV( "D-Link DSB-650TX", VENDOR_DLINK1, 0x4002, LINKSYS_GPIO_RESET )
DM9601_DEV( "D-Link DSB-650TX(PNA)", VENDOR_DLINK1, 0x4003, DEFAULT_GPIO_RESET | HAS_HOME_PNA )
DM9601_DEV( "D-Link DSB-650", VENDOR_DLINK1, 0xabc1, DEFAULT_GPIO_RESET )
DM9601_DEV( "D-Link DU-E10", VENDOR_DLINK2, 0xabc1, DEFAULT_GPIO_RESET )
DM9601_DEV( "D-Link DU-E100", VENDOR_DLINK2, 0x4002, DEFAULT_GPIO_RESET )
DM9601_DEV( "IO DATA USB ET/TX", VENDOR_IODATA, 0x0904, DEFAULT_GPIO_RESET )
DM9601_DEV( "LANEED USB Ethernet LD-USB/TX", VENDOR_LANEED, 0x4002, DEFAULT_GPIO_RESET )
DM9601_DEV( "Linksys USB10TX", VENDOR_LINKSYS, 0x2202, LINKSYS_GPIO_RESET )
DM9601_DEV( "Linksys USB100TX", VENDOR_LINKSYS, 0x2203, LINKSYS_GPIO_RESET )
DM9601_DEV( "Linksys USB100TX", VENDOR_LINKSYS, 0x2204, LINKSYS_GPIO_RESET | HAS_HOME_PNA )
DM9601_DEV( "Linksys USB Ethernet Adapter", VENDOR_LINKSYS, 0x2206, LINKSYS_GPIO_RESET )
DM9601_DEV( "MELCO/BUFFALO LUA-TX", VENDOR_MELCO, 0x0001, DEFAULT_GPIO_RESET )
DM9601_DEV( "SMC 202 USB Ethernet", VENDOR_SMC, 0x0200, DEFAULT_GPIO_RESET )
DM9601_DEV( "SOHOware NUB100 Ethernet", 0x0a46, 0x9601, DEFAULT_GPIO_RESET )

#endif /* _DEV */

dm9601.c

#include <linux/configfs.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/init.h>
#include <linux/delay.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/usb.h>
#include <linux/module.h>
#include <linux/crc32.h>
#include "dm9601.h"

extern int usb_set_configuration(struct usb_device *dev, int configuration);

#define DM9601_USE_INTR

static struct usb_eth_dev usb_dev_id[] = {
#define DM9601_DEV(pn, vid, pid, flags) {name:pn, vendor:vid, device:pid, private:flags},
#include "dm9601.h"
#undef DM9601_DEV
	{NULL, 0, 0, 0}
};

static struct usb_device_id dm9601_ids[] = {
#define DM9601_DEV(pn, vid, pid, flags) \
	{match_flags: USB_DEVICE_ID_MATCH_DEVICE, idVendor:vid, idProduct:pid},
#include "dm9601.h"
#undef DM9601_DEV
	{}
};

/* For module input parameters */
static int mode = DM9601_10MFD, dm9601_mode;  /* edit: mode DM9601_AUTO -> DM9601_10MFD */
static u8 reg5 = DM9601_REG5, reg8 = DM9601_REG8, reg9 = DM9601_REG9,
	  rega = DM9601_REGA, nfloor = 0;

MODULE_DESCRIPTION("DAVICOM DM9601 USB Fast Ethernet driver");
MODULE_LICENSE("Dual BSD/GPL");

#define USB_ST_NOERROR           0              // no error
#define USB_ST_CRC               (-EILSEQ)      // CRC mismatch in the urb transfer
#define USB_ST_BITSTUFF          (-EPROTO)      // no response packet was received
#define USB_ST_NORESPONSE        (-ETIMEDOUT)   // device not responding/handshaking
#define USB_ST_DATAOVERRUN       (-EOVERFLOW)   // endpoint received more data than max packet size
#define USB_ST_DATAUNDERRUN      (-EREMOTEIO)   // full data requested was not received
#define USB_ST_BUFFEROVERRUN     (-ECOMM)       // data was received faster
#define USB_ST_BUFFERUNDERRUN    (-ENOSR)
#define USB_ST_INTERNALERROR     (-EPROTO)      // unknown error
#define USB_ST_SHORT_PACKET      (-EREMOTEIO)
#define USB_ST_PARTIAL_ERROR     (-EXDEV)       // ISO transfer only partially completed
#define USB_ST_URB_KILLED        (-ENOENT)      // URB canceled by user
#define USB_ST_URB_PENDING       (-EINPROGRESS) // urb is still being processed
#define USB_ST_REMOVED           (-ENODEV)      // device not existing or removed
#define USB_ST_TIMEOUT           (-ETIMEDOUT)   // communication timed out, also in urb->status
#define USB_ST_NOTSUPPORTED      (-ENOSYS)
#define USB_ST_BANDWIDTH_ERROR   (-ENOSPC)      // too much bandwidth used
#define USB_ST_URB_INVALID_ERROR (-EINVAL)      // invalid value/transfer type
#define USB_ST_URB_REQUEST_ERROR (-ENXIO)       // invalid endpoint
#define USB_ST_STALL             (-EPIPE)       // pipe stalled, also in urb->status

/* edit */
module_param(mode, int, 0644);
module_param(reg5, int, 0644);
module_param(reg8, int, 0644);
module_param(reg9, int, 0644);
module_param(rega, int, 0644);
module_param(nfloor, int, 0644);
MODULE_PARM_DESC(mode, "Media mode select: 0:10MH 1:100MHF 4:10MF 5:100MF 8:AUTO");
MODULE_DEVICE_TABLE(usb, dm9601_ids); /* Marks dm9601_ids in the module image so
                                         that the module can be loaded on demand */

static int write_eprom_word(dm9601_board_info_t *, __u8, __u16);
static int update_eth_regs_async(dm9601_board_info_t *);

/* static void ctrl_callback(struct urb *urb, struct pt_regs *regs) */
/* Callback function */
static void ctrl_callback(struct urb *urb)
{
	dm9601_board_info_t *dbi = urb->context;

	if (!dbi)
		return;
	switch (urb->status & 0xff) {
	case USB_ST_NOERROR:
	case 0x92:
		if (dbi->flags & ALL_REGS_CHANGE) {
			update_eth_regs_async(dbi);
			return;
		}
		break;
	case USB_ST_URB_PENDING:
	case 0x8d:
		return;
	case USB_ST_URB_KILLED:
		break;
	default:
		warn("%s: status %x", __FUNCTION__, urb->status);
	}
	dbi->flags &= ~ALL_REGS_CHANGED;
	if (dbi->flags & CTRL_URB_SLEEP) {
		dbi->flags &= ~CTRL_URB_SLEEP;
		wake_up_interruptible(&dbi->ctrl_wait);
	}
}

static int get_registers(dm9601_board_info_t *dbi, __u16 indx, __u16 size, void *data)
{
	int ret;
	DECLARE_WAITQUEUE(wait, current);

	while (dbi->flags & ALL_REGS_CHANGED) {
		dbi->flags |= CTRL_URB_SLEEP;
		interruptible_sleep_on(&dbi->ctrl_wait);
	}
	/* Qualifies the request by encoding the data transfer direction */
	dbi->dr.bRequestType = DM9601_REQT_READ;
	dbi->dr.bRequest = DM9601_REQ_GET_REGS; /* Vendor defined value */
	dbi->dr.wValue = cpu_to_le16(0);        /* Holds data to be written to the register */
	dbi->dr.wIndex = cpu_to_le16p(&indx);   /* Desired offset into the register space */
	dbi->dr.wLength = cpu_to_le16p(&size);  /* Number of bytes to be transferred */
	dbi->ctrl_urb.transfer_buffer_length = size;

	usb_fill_control_urb(&dbi->ctrl_urb, dbi->usb,
			     usb_rcvctrlpipe(dbi->usb, 0),
			     (char *)&dbi->dr, data, size, ctrl_callback, dbi);
	add_wait_queue(&dbi->ctrl_wait, &wait);
	set_current_state(TASK_INTERRUPTIBLE);
	dbi->flags |= CTRL_URB_SLEEP;
	if ((ret = usb_submit_urb(&dbi->ctrl_urb, GFP_ATOMIC))) {
		err("%s: BAD CTRLs %d", __FUNCTION__, ret);
		goto out;
	}
	schedule();
	remove_wait_queue(&dbi->ctrl_wait, &wait);
out:
	return ret;
}

static int set_registers(dm9601_board_info_t *dbi, __u16 indx, __u16 size, void *data)
{
	int ret;
	DECLARE_WAITQUEUE(wait, current);

	while (dbi->flags & ALL_REGS_CHANGED) {
		dbi->flags |= CTRL_URB_SLEEP;
		interruptible_sleep_on(&dbi->ctrl_wait);
	}
	/* Qualifies the request by encoding the data transfer direction */
	dbi->dr.bRequestType = DM9601_REQT_WRITE;
	dbi->dr.bRequest = DM9601_REQ_SET_REGS; /* Vendor defined value */
	dbi->dr.wValue = cpu_to_le16(0);
	dbi->dr.wIndex = cpu_to_le16p(&indx);   /* Desired offset into the register space */
	dbi->dr.wLength = cpu_to_le16p(&size);  /* Number of bytes to be transferred */
	dbi->ctrl_urb.transfer_buffer_length = size;

	/* Control urb is initialized; populate the URB */
	usb_fill_control_urb(&dbi->ctrl_urb, dbi->usb,
			     usb_sndctrlpipe(dbi->usb, 0),
			     (char *)&dbi->dr, data, size, ctrl_callback, dbi);
	add_wait_queue(&dbi->ctrl_wait, &wait);
	set_current_state(TASK_INTERRUPTIBLE);
	dbi->flags |= CTRL_URB_SLEEP;
	/* Submit the URB to the USB core */
	if ((ret = usb_submit_urb(&dbi->ctrl_urb, GFP_ATOMIC))) {
		err("%s: BAD CTRL %d", __FUNCTION__, ret);
		return ret;
	}
	schedule();
	remove_wait_queue(&dbi->ctrl_wait, &wait);
	return ret;
}

static int set_register(dm9601_board_info_t *dbi, __u16 indx, __u8 data)
{
	int ret;
	__u16 dat = data;
	DECLARE_WAITQUEUE(wait, current);

	while (dbi->flags & ALL_REGS_CHANGED) {
		dbi->flags |= CTRL_URB_SLEEP;
		interruptible_sleep_on(&dbi->ctrl_wait);
	}
	/* Qualifies the request by encoding the data transfer direction */
	dbi->dr.bRequestType = DM9601_REQT_WRITE;
	dbi->dr.bRequest = DM9601_REQ_SET_REG;  /* Vendor defined value */
	dbi->dr.wValue = cpu_to_le16p(&dat);    /* Holds data to be written to the register */
	dbi->dr.wIndex = cpu_to_le16p(&indx);   /* Desired offset into the register space */
	dbi->dr.wLength = cpu_to_le16(0);       /* Number of bytes to be transferred */
	dbi->ctrl_urb.transfer_buffer_length = 0;

	usb_fill_control_urb(&dbi->ctrl_urb, dbi->usb,
			     usb_sndctrlpipe(dbi->usb, 0),
			     (char *)&dbi->dr, &data, 0, ctrl_callback, dbi);
	add_wait_queue(&dbi->ctrl_wait, &wait);
	set_current_state(TASK_INTERRUPTIBLE);
	dbi->flags |= CTRL_URB_SLEEP;
	/* Submit the URB to the USB core */
	if ((ret = usb_submit_urb(&dbi->ctrl_urb, GFP_ATOMIC))) {
		err("%s: BAD CTRL %d", __FUNCTION__, ret);
		return ret;
	}
	schedule();
	remove_wait_queue(&dbi->ctrl_wait, &wait);
	return ret;
}

static int update_eth_regs_async(dm9601_board_info_t *dbi)
{
	int ret;

	if (dbi->flags & HASH_REGS_CHANGE) {
		dbi->flags &= ~HASH_REGS_CHANGE;
		dbi->flags |= HASH_REGS_CHANGED;
		/* Qualifies the request by encoding the data transfer direction */
		dbi->dr.bRequestType = DM9601_REQT_WRITE;
		dbi->dr.bRequest = DM9601_REQ_SET_REGS; /* Vendor defined value */
		dbi->dr.wValue = cpu_to_le16(0);
		dbi->dr.wIndex = cpu_to_le16(0x16);     /* Desired offset into register space */
		dbi->dr.wLength = cpu_to_le16(8);       /* Number of bytes to be transferred */
		dbi->ctrl_urb.transfer_buffer_length = 8;

		/* Control urb is initialized; populate the URB */
		usb_fill_control_urb(&dbi->ctrl_urb, dbi->usb,
				     usb_sndctrlpipe(dbi->usb, 0),
				     (char *)&dbi->dr, dbi->hash_table, 8,
				     ctrl_callback, dbi);
	} else if (dbi->flags & RX_CTRL_CHANGE) {
		dbi->flags &= ~RX_CTRL_CHANGE;
		dbi->flags |= RX_CTRL_CHANGED;
		/* Qualifies the request by encoding the data transfer direction */
		dbi->dr.bRequestType = DM9601_REQT_WRITE;
		dbi->dr.bRequest = DM9601_REQ_SET_REG;  /* Vendor defined value */
		dbi->dr.wValue = cpu_to_le16(dbi->rx_ctrl_reg);
		dbi->dr.wIndex = cpu_to_le16(0x5);      /* Desired offset into register space */
		dbi->dr.wLength = cpu_to_le16(0);       /* Number of bytes to be transferred */
		dbi->ctrl_urb.transfer_buffer_length = 0;

		usb_fill_control_urb(&dbi->ctrl_urb, dbi->usb,
				     usb_sndctrlpipe(dbi->usb, 0),
				     (char *)&dbi->dr, &dbi->rx_ctrl_reg, 0,
				     ctrl_callback, dbi);
	} else {
		dbi->flags &= ~NET_CTRL_CHANGE;
		dbi->flags |= NET_CTRL_CHANGED;
		/* Qualifies the request by encoding the data transfer direction */
		dbi->dr.bRequestType = DM9601_REQT_WRITE;
		dbi->dr.bRequest = DM9601_REQ_SET_REG;  /* Vendor defined value */
		dbi->dr.wValue = cpu_to_le16(dbi->net_ctrl_reg);
		dbi->dr.wIndex = cpu_to_le16(0x0);      /* Desired offset into register space */
		dbi->dr.wLength = cpu_to_le16(0);       /* Number of bytes to be transferred */
		dbi->ctrl_urb.transfer_buffer_length = 0;

		usb_fill_control_urb(&dbi->ctrl_urb, dbi->usb,
				     usb_sndctrlpipe(dbi->usb, 0),
				     (char *)&dbi->dr, &dbi->net_ctrl_reg, 0,
				     ctrl_callback, dbi);
	}
	if ((ret = usb_submit_urb(&dbi->ctrl_urb, GFP_KERNEL)))
		err("%s: BAD CTRL %d, flags %x", __FUNCTION__, ret, dbi->flags);
	return ret;
}

static int read_mii_word(dm9601_board_info_t *dbi, __u8 phy, __u8 index, __u16 *regd)
{
	set_register(dbi, 0x0c, index | 0x40);
	set_register(dbi, 0x0b, 0x0c);
	udelay(100);
	set_register(dbi, 0x0b, 0x0);
	get_registers(dbi, 0xd, 2, regd);
	return 0;
}

static int write_mii_word(dm9601_board_info_t *dbi, __u8 phy, __u8 index, __u16 regd)
{
	set_register(dbi, 0x0c, index | 0x40);
	set_registers(dbi, 0xd, 2, &regd);
	set_register(dbi, 0x0b, 0x0a);
	udelay(100);
	set_register(dbi, 0x0b, 0x0);
	return 0;
}

static int read_eprom_word(dm9601_board_info_t *dbi, __u8 index, __u16 *retdata)
{
	set_register(dbi, 0x0c, index);
	set_register(dbi, 0x0b, 0x4);
	udelay(100);
	set_register(dbi, 0x0b, 0x0);
	get_registers(dbi, 0xd, 2, retdata);
	return 0;
}

static int write_eprom_word(dm9601_board_info_t *dbi, __u8 index, __u16 data)
{
	set_register(dbi, 0x0c, index);
	set_registers(dbi, 0x0d, 2, &data);
	set_register(dbi, 0x0b, 0x12);
	udelay(100);
	set_register(dbi, 0x0b, 0x0);
	return 0;
}

/* Read callback */
static void read_bulk_callback(struct urb *urb /*, struct pt_regs *regs */)
{
	/* Get the address of the dm9601 device */
	dm9601_board_info_t *dbi = urb->context;
	struct net_device *net = dbi->net;
	int count = urb->actual_length, res;
	__u8 rx_status;
	struct sk_buff *skb;
	__u16 pkt_len;
	unsigned char *bufptr;

	if (!dbi || !(dbi->flags & DM9601_RUNNING))

return; if ( !netif_device_present(net) ) return; if ( dbi->flags & DM9601_RX_BUSY ) { dbi->stats.rx_errors++; dbg("DM9601 Rx busy"); return; } dbi->flags |= DM9601_RX_BUSY; switch ( urb->status ) { case USB_ST_NOERROR: break; case USB_ST_NORESPONSE: dbg( "reset MAC" ); dbi->flags &= ~DM9601_RX_BUSY; break; default: #ifdef RX_IMPROVE dbg("%s: RX status %d",net->name, urb->status ); goto goon; #endif ; } /* For RX improve ---------------------------*/ #ifdef RX_IMPROVE if (dbi->rx_buf_flag) { bufptr = dbi->rx_buff; /* Bulk urbs are initialized */ usb_fill_bulk_urb( &dbi->rx_urb, dbi->usb, usb_rcvbulkpipe(dbi->usb, 1), dbi->rx_buff2, DM9601_MAX_MTU, read_bulk_callback, dbi ); } else { bufptr = dbi->rx_buff2; usb_fill_bulk_urb( &dbi->rx_urb, dbi->usb, usb_rcvbulkpipe(dbi->usb, 1), dbi->rx_buff, DM9601_MAX_MTU, read_bulk_callback, dbi ); } if ( (res = usb_submit_urb(&dbi->rx_urb,GFP_ATOMIC)) ) warn("%s: failed submit rx_urb %d",__FUNCTION__,res); dbi->flags &= ~DM9601_RX_BUSY; dbi->rx_buf_flag = dbi->rx_buf_flag ? 0:1; #else bufptr = dbi->rx_buff; #endif /* ----------------------------------------------------------*/ if ( !count ) goto goon; rx_status = *(__u8 *)(bufptr); pkt_len = *(__u16 *)(bufptr + 1) - 4; dbi->stats.rx_bytes += pkt_len; if ( (rx_status & 0xbf) || (pkt_len > 1518) ) { dbi->stats.rx_errors++;

if (pkt_len > 1518) dbi->rx_longf_errors++; if (rx_status & 0x80) dbi->rx_runtf_errors++; if (rx_status & 0x20) dbi->rx_lc_errors++; if (rx_status & 0x10) dbi->rx_wdt_errors++; if (rx_status & 0x08) dbi->rx_ple_errors++; if (rx_status & 0x04) dbi->stats.rx_frame_errors++; if (rx_status & 0x02) dbi->stats.rx_crc_errors++; if (rx_status & 0x1) dbi->stats.rx_fifo_errors++; goto goon; } /* Allocates memory for an sk_buff and associates it with a packet payload buffer */ if ( !(skb = dev_alloc_skb(pkt_len + 2)) ) goto goon; skb->dev = net; skb_reserve(skb, 2); memcpy(skb_put(skb, pkt_len), bufptr + 3, pkt_len); skb->protocol = eth_type_trans(skb, net); netif_rx(skb); dbi->stats.rx_packets++; dbi->stats.rx_bytes += pkt_len; goon: #ifndef RX_IMPROVE usb_fill_bulk_urb( &dbi->rx_urb, dbi->usb,usb_rcvbulkpipe(dbi->usb, 1), dbi->rx_buff, DM9601_MAX_MTU, read_bulk_callback, dbi ); if ( (res = usb_submit_urb(&dbi->rx_urb,GFP_ATOMIC)) ) warn("%s: failed submit rx_urb %d",__FUNCTION__,res); dbi->flags &= ~DM9601_RX_BUSY; #endif ; } static void write_bulk_callback( struct urb *urb /*, struct pt_regs* reg */) { dm9601_board_info_t *dbi = urb->context; if ( !dbi || !(dbi->flags & DM9601_RUNNING) ) return; if ( !netif_device_present(dbi->net) ) return; if ( urb->status ) info("%s: TX status %d", dbi->net->name, urb->status); dbi->net->trans_start = jiffies; netif_wake_queue( dbi->net ); } #ifdef DM9601_USE_INTR static void intr_callback( struct urb *urb /* ,struct pt_regs* pt */) { dm9601_board_info_t *dbi = urb->context; struct net_device *net; __u8 *d; if ( !dbi ) return; switch ( urb->status ) {

case USB_ST_NOERROR: break; case USB_ST_URB_KILLED: return; default: info("intr status %d", urb->status); } d = urb->transfer_buffer; net = dbi->net; if ( !(d[6] & 0x04) && (d[0] & 0x10) ) { printk("<WARN> TX FULL %x %x\n", d[0], d[6]); dbi->flags |= DM9601_RESET_WAIT; } /* Auto Sense Media Policy: Fast EtherNet NIC: don't need to do. Force media mode: don't need to do. HomeRun/LongRun NIC and AUTO_Mode: INT_MII not link, select EXT_MII EXT_MII not link, select INT_MII */ if (!(d[0] & 0x40) && (dbi->nic_type != FASTETHER_NIC) && (dbi->op_mode == DM9601_AUTO) ) { dbi->net_ctrl_reg ^= 0x80; netif_stop_queue(net); dbi->flags |= NET_CTRL_CHANGE; ctrl_callback(&dbi->ctrl_urb /*, NULL*/ ); netif_wake_queue(net); } if ( (d[1] | d[2]) & 0xf4 ) { dbi->stats.tx_errors++; if ( (d[0] | d[1]) & 0x84) /* EXEC & JABBER */ dbi->stats.tx_aborted_errors++; if ( (d[0] | d[1]) & 0x10 ) /* LATE COL */ dbi->stats.tx_window_errors++; if ( (d[0] | d[1]) & 0x60 ) /* NO or LOST CARRIER */ dbi->stats.tx_carrier_errors++; } } #endif static void dm9601_tx_timeout( struct net_device *net ) { dm9601_board_info_t *dbi = net->priv; if ( !dbi ) return; warn("%s: Tx timed out.", net->name); dbi->tx_urb.transfer_flags |= URB_NO_FSBR; usb_unlink_urb( &dbi->tx_urb ); dbi->stats.tx_errors++; } static int dm9601_start_xmit( struct sk_buff *skb, struct net_device *net ) { dm9601_board_info_t *dbi = net->priv; //USB_ASYNC_UNLINK;

	int count = skb->len + 2;
	int res;
	__u16 l16 = skb->len;

	netif_stop_queue( net );

	if (!(count & 0x3f)) {
		count++;
		l16++;
	}

	((__u16 *)dbi->tx_buff)[0] = cpu_to_le16(l16);
	memcpy(dbi->tx_buff + 2, skb->data, skb->len);

#if 0
	FILL_BULK_URB_TO( &dbi->tx_urb, dbi->usb,
			  usb_sndbulkpipe(dbi->usb, 2),
			  dbi->tx_buff, count,
			  write_bulk_callback, dbi, jiffies + HZ );
#else
	usb_fill_bulk_urb( &dbi->tx_urb, dbi->usb,
			   usb_sndbulkpipe(dbi->usb, 2),
			   dbi->tx_buff, count,
			   write_bulk_callback, dbi);
#endif
	if ((res = usb_submit_urb(&dbi->tx_urb, GFP_KERNEL))) {
		warn("failed tx_urb %d", res);
		dbi->stats.tx_errors++;
		netif_start_queue( net );
	} else {
		dbi->stats.tx_packets++;
		dbi->stats.tx_bytes += skb->len;
		net->trans_start = jiffies;
	}
	dev_kfree_skb(skb);
	return 0;
}

/* Enable user land to collect network statistics */
static struct net_device_stats *dm9601_netdev_stats( struct net_device *dev )
{
	return &((dm9601_board_info_t *)dev->priv)->stats;
}

static inline void disable_net_traffic( dm9601_board_info_t *dbi )
{
	__u8 reg5;

	write_mii_word(dbi, 1, 0, 0x8000);	/* RESET PHY */
	get_registers(dbi, 0x5, 1, &reg5);
	reg5 &= 0xfe;
	set_register(dbi, 0x5, reg5);		/* RX disable */
	set_register(dbi, 0x1f, 0x01);		/* PHY power down */
}

static void set_phy_mode(dm9601_board_info_t *dbi)
{
	__u16 phy_reg0 = 0x1000, phy_reg4 = 0x01e1;

	/* PHY media mode setting */
	if ( !(dbi->op_mode & DM9601_AUTO) ) {
		switch(dbi->op_mode) {
		case DM9601_10MHF:  phy_reg4 = 0x0021; break;
		case DM9601_10MFD:  phy_reg4 = 0x0041; break;
		case DM9601_100MHF: phy_reg4 = 0x0081; break;

		case DM9601_100MFD: phy_reg4 = 0x0101; break;
		default:            phy_reg0 = 0x8000; break;
		}
		write_mii_word(dbi, 1, 4, phy_reg4);	/* Set PHY capability */
		write_mii_word(dbi, 1, 0, phy_reg0);
	}

	/* Active PHY */
	set_register( dbi, 0x1e, 0x01 );	/* Let GPIO0 output */
	set_register( dbi, 0x1f, 0x00 );	/* Power_on PHY */
}

/* Init HomeRun DM9801 */
static void program_dm9801(dm9601_board_info_t *dbi, u16 HPNA_rev)
{
	__u16 reg16, reg17, reg24, reg25;

	if ( !nfloor )
		nfloor = DM9801_NOISE_FLOOR;

	read_mii_word(dbi, 1, 16, &reg16);
	read_mii_word(dbi, 1, 17, &reg17);
	read_mii_word(dbi, 1, 24, &reg24);
	read_mii_word(dbi, 1, 25, &reg25);

	switch(HPNA_rev) {
	case 0xb900:				/* DM9801 E3 */
		reg16 |= 0x1000;
		reg25 = ( (reg24 + nfloor) & 0x00ff) | 0xf000;
		break;
	case 0xb901:				/* DM9801 E4 */
		reg25 = ( (reg24 + nfloor) & 0x00ff) | 0xc200;
		reg17 = (reg17 & 0xfff0) + nfloor + 3;
		break;
	case 0xb902:				/* DM9801 E5 */
	case 0xb903:				/* DM9801 E6 */
	default:
		reg16 |= 0x1000;
		reg25 = ( (reg24 + nfloor - 3) & 0x00ff) | 0xc200;
		reg17 = (reg17 & 0xfff0) + nfloor;
		break;
	}

	write_mii_word(dbi, 1, 16, reg16);
	write_mii_word(dbi, 1, 17, reg17);
	write_mii_word(dbi, 1, 25, reg25);
}

/* Init LongRun DM9802 */

static void program_dm9802(dm9601_board_info_t *dbi)
{
	__u16 reg25;

	if ( !nfloor )
		nfloor = DM9802_NOISE_FLOOR;

	read_mii_word(dbi, 1, 25, &reg25);
	reg25 = (reg25 & 0xff00) + nfloor;
	write_mii_word(dbi, 1, 25, reg25);
}

/* Identify NIC type */

static void identify_nic(dm9601_board_info_t* dbi)
{
	__u16 phy_tmp;

	/* Select EXT_MII */
	dbi->net_ctrl_reg |= 0x80;
	set_register(dbi, 0x00, dbi->net_ctrl_reg);	/* EXT-MII */

	read_mii_word(dbi, 1, 3, &phy_tmp);
	switch(phy_tmp & 0xfff0) {
	case 0xb900:
		read_mii_word(dbi, 1, 31, &phy_tmp);
		if (phy_tmp == 0x4404) {
			dbi->nic_type = HOMERUN_NIC;
			program_dm9801(dbi, phy_tmp);
		} else {
			dbi->nic_type = LONGRUN_NIC;
			program_dm9802(dbi);
		}
		break;
	default:
		dbi->nic_type = FASTETHER_NIC;
	}

	/* Select INT_MII */
	dbi->net_ctrl_reg &= ~0x80;
	set_register(dbi, 0x00, dbi->net_ctrl_reg);
}

static void init_dm9601(struct net_device *net)
{
	dm9601_board_info_t *dbi = (dm9601_board_info_t *)net->priv;

	/* User passed argument */
	dbi->rx_ctrl_reg = reg5 | 0x01;
	dbi->net_ctrl_reg = 0x00;
	dbi->reg08 = reg8;
	dbi->reg09 = reg9;
	dbi->reg0a = rega;

	/* RESET device */
	set_register(dbi, 0x00, 0x01);	/* Reset */
	udelay(100);

	/* NIC type: FASTETHER, HOMERUN, LONGRUN */
	identify_nic(dbi);

	/* Set PHY */
	dbi->op_mode = dm9601_mode;
	set_phy_mode(dbi);

	/* MII selection */
	if ( (dbi->nic_type != FASTETHER_NIC) &&
	     (dbi->op_mode == DM9601_1M_HPNA) )
		dbi->net_ctrl_reg |= 0x80;

	/* Program operating register */
	set_register(dbi, 0x00, dbi->net_ctrl_reg);
	set_register(dbi, 0x08, dbi->reg08);
	set_register(dbi, 0x09, dbi->reg09);
	set_register(dbi, 0x0a, dbi->reg0a);
	set_register(dbi, 0xf4, 0x26);			/* Reset EP1/EP2, INT always return */
	set_registers(dbi, 0x10, 0x06, net->dev_addr);	/* MAC addr */
	dbi->hash_table[3] = 0x8000;			/* Broadcast Address */
	set_registers(dbi, 0x16, 0x08, dbi->hash_table);/* Hash Table */
	set_register(dbi, 0x05, dbi->rx_ctrl_reg);	/* Active RX */
}

static int dm9601_open(struct net_device *net)
{
	dm9601_board_info_t *dbi = (dm9601_board_info_t *)net->priv;
	int res;
	int owner;

	down(&dbi->ctrl_sem);

	// MOD_INC_USE_COUNT;
	owner = try_module_get(net->dev.class->owner);

	usb_fill_bulk_urb( &dbi->rx_urb, dbi->usb,
			   usb_rcvbulkpipe(dbi->usb, 1),
			   dbi->rx_buff, DM9601_MAX_MTU,
			   read_bulk_callback, dbi );
	if ( (res = usb_submit_urb(&dbi->rx_urb, GFP_ATOMIC)) )
		warn("%s: failed rx_urb %d", __FUNCTION__, res);
	dbi->rx_buf_flag = 1;

#ifdef DM9601_USE_INTR
	usb_fill_int_urb( &dbi->intr_urb, dbi->usb,
			  usb_rcvintpipe(dbi->usb, 3), dbi->intr_buff,
			  sizeof(dbi->intr_buff), intr_callback,
			  dbi, dbi->intr_interval );
	if ( (res = usb_submit_urb(&dbi->intr_urb, GFP_ATOMIC)) )
		warn("%s: failed intr_urb %d", __FUNCTION__, res);
#endif

	init_dm9601(net);
	netif_start_queue( net );
	dbi->flags |= DM9601_RUNNING;
	up(&dbi->ctrl_sem);

	return 0;
}

static int dm9601_close( struct net_device *net )
{
	dm9601_board_info_t *dbi = net->priv;

	dbi->flags &= ~DM9601_RUNNING;
	netif_stop_queue(net);
	if ( !(dbi->flags & DM9601_UNPLUG) )
		disable_net_traffic(dbi);
	usb_unlink_urb(&dbi->rx_urb);
	usb_unlink_urb(&dbi->tx_urb);
	usb_unlink_urb(&dbi->ctrl_urb);
#ifdef DM9601_USE_INTR
	usb_unlink_urb(&dbi->intr_urb);
#endif

	// MOD_DEC_USE_COUNT;
	module_put(net->dev.class->owner);

#ifdef STS_DBUG
	printk("<DM9601> rx errors: %lx \n", dbi->stats.rx_errors);
	printk("<DM9601> fifo over errors: %lx \n", dbi->stats.rx_fifo_errors);
	printk("<DM9601> crc errors: %lx \n", dbi->stats.rx_crc_errors);
	printk("<DM9601> alignment errors: %lx \n", dbi->stats.rx_frame_errors);
	printk("<DM9601> physical layer errors: %lx \n", dbi->rx_ple_errors);
	printk("<DM9601> watchdog errors: %lx \n", dbi->rx_wdt_errors);
	printk("<DM9601> late collision errors: %lx \n", dbi->rx_lc_errors);
	printk("<DM9601> runt frame errors: %lx \n", dbi->rx_runtf_errors);
	printk("<DM9601> long frame errors: %lx \n", dbi->rx_longf_errors);
#endif
	return 0;
}

static int dm9601_ioctl( struct net_device *net, struct ifreq *rq, int cmd )
{
	__u16 *data = (__u16 *)&rq->ifr_data;
	dm9601_board_info_t *dbi = net->priv;

	switch(cmd) {
	case SIOCDEVPRIVATE:
		data[0] = dbi->phy;
		/* fall through */
	case SIOCDEVPRIVATE+1:
		read_mii_word(dbi, data[0], data[1] & 0x1f, &data[3]);
		return 0;
	case SIOCDEVPRIVATE+2:
		if ( !capable(CAP_NET_ADMIN) )
			return -EPERM;
		write_mii_word(dbi, dbi->phy, data[1] & 0x1f, data[2]);
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}

/* Calculate the CRC value of the Rx packet
   flag = 1 : return the reverse CRC (for the received packet CRC)
          0 : return the normal CRC (for Hash Table index) */
static unsigned long cal_CRC(unsigned char * Data, unsigned int Len, u8 flag)
{
	u32 crc = ether_crc_le(Len, Data);

	if (flag)
		return ~crc;
	return crc;
}

static void dm9601_set_multicast( struct net_device *net )
{
	dm9601_board_info_t *dbi = net->priv;
	struct dev_mc_list *mcptr = net->mc_list;
	int count = net->mc_count, i, hash_val;

	netif_stop_queue(net);

	if (net->flags & IFF_PROMISC) {
		dbi->rx_ctrl_reg |= RX_PROMISCUOUS;

		info("%s: Promiscuous mode enabled", net->name);
	} else if (net->flags & IFF_ALLMULTI) {
		dbi->rx_ctrl_reg &= ~RX_PROMISCUOUS;
		info("%s set allmulti", net->name);
	} else {
		dbi->rx_ctrl_reg &= ~RX_PASS_MULTICAST;
		dbi->rx_ctrl_reg &= ~RX_PROMISCUOUS;

		/* Clear Hash Table */
		for (i = 0; i < 4; i++)
			dbi->hash_table[i] = 0;

		/* Set Broadcast Address */
		dbi->hash_table[3] = 0x8000;

		/* the multicast address in Hash Table : 64 bits */
		for (i = 0; i < count; i++, mcptr = mcptr->next) {
			hash_val = cal_CRC((char *)mcptr->dmi_addr, 6, 0) & 0x3f;
			dbi->hash_table[hash_val / 16] |= (u16) 1 << (hash_val % 16);
		}
		info("%s: set Rx mode", net->name);
	}

	dbi->flags |= HASH_REGS_CHANGE | RX_CTRL_CHANGE;
	ctrl_callback(&dbi->ctrl_urb /*, NULL*/);
	netif_wake_queue(net);
}

static int dm9601_probe( struct usb_interface *udev,
			 const struct usb_device_id *id)
{
	struct net_device *net;
	dm9601_board_info_t *dbi;
	int dev_index = id - dm9601_ids;
	struct usb_device *dev = interface_to_usbdev (udev);
	int status;

#if 0
	if (usb_set_configuration(dev, dev->config[0].desc.bConfigurationValue)) {
		err("usb_set_configuration() failed");
		return -ENODEV;
	}
#endif

	// Allocates memory for the device-specific structure
	if (!(dbi = kmalloc(sizeof(dm9601_board_info_t), GFP_KERNEL))) {
		err("out of memory allocating device structure");
		return -ENOMEM;
	}

	// usb_inc_dev_use( dev );
	printk("dev_index %d dbi %x\n", dev_index, dbi);
	usb_get_dev(dev);
	memset(dbi, 0, sizeof(dm9601_board_info_t));

	// initialize dbi struct
	{
		usb_init_urb(&dbi->ctrl_urb);
		usb_init_urb(&dbi->rx_urb);
		usb_init_urb(&dbi->tx_urb);
		usb_init_urb(&dbi->intr_urb);
		usb_init_urb(&dbi->dump_urb);
	}

	dbi->dev_index = dev_index;
	init_waitqueue_head( &dbi->ctrl_wait );

	// net = init_etherdev( NULL, 0 );
	net = alloc_etherdev(0);
	if ( !net ) {
		kfree( dbi );
		return -ENOMEM;
	}

	init_MUTEX(&dbi->ctrl_sem);
	down(&dbi->ctrl_sem);

	/* Fill the usb_device and usb_interface */
	dbi->usb = dev;
	dbi->net = net;
	net->priv = dbi;

	net->open = dm9601_open;
	net->stop = dm9601_close;
	net->watchdog_timeo = DM9601_TX_TIMEOUT;
	net->tx_timeout = dm9601_tx_timeout;
	net->do_ioctl = dm9601_ioctl;
	net->hard_start_xmit = dm9601_start_xmit;
	net->set_multicast_list = dm9601_set_multicast;
	net->get_stats = dm9601_netdev_stats;
	net->mtu = DM9601_MTU;
	dbi->intr_interval = 0xff;	/* Default is 0x80 */

	/* Get Node Address */
	read_eprom_word(dbi, 0, (__u16 *)net->dev_addr);
	read_eprom_word(dbi, 1, (__u16 *)(net->dev_addr + 2));
	read_eprom_word(dbi, 2, (__u16 *)(net->dev_addr + 4));

	dbi->features = usb_dev_id[dev_index].private;
	info( "%s: %s", net->name, usb_dev_id[dev_index].name );

	usb_set_intfdata (udev, dbi);	// save data pointer in this interface
	SET_NETDEV_DEV(net, &udev->dev);
	status = register_netdev (net);
	up(&dbi->ctrl_sem);
	if (status)
		return status;

	// start as if the link is up
	netif_device_attach (net);

	return 0;

}

static void dm9601_disconnect( struct usb_interface *intf /*struct usb_device *dev, void *ptr*/ )
{
	struct usb_device *dev = interface_to_usbdev (intf);
	// save this structure pointer bet-n probe & open invocation threads
	dm9601_board_info_t *dbi = usb_get_intfdata(intf);

	if ( !dbi ) {
		warn("unregistering non-existent device");
		return;
	}

	dbi->flags |= DM9601_UNPLUG;
	unregister_netdev( dbi->net );
	// usb_dec_dev_use( dev );
	usb_put_dev(dev);
	kfree( dbi );
	dbi = NULL;
}

static struct usb_driver dm9601_driver = {
	name:		"dm9601",
	probe:		dm9601_probe,
	disconnect:	dm9601_disconnect,
	id_table:	dm9601_ids,
};

int __init dm9601_init(void)	// Module initialisation
{
	//info( "%s", version );

	switch(mode) {
	case DM9601_10MHF:
	case DM9601_100MHF:
	case DM9601_10MFD:
	case DM9601_100MFD:
	case DM9601_1M_HPNA:
		dm9601_mode = mode;
		break;
	default:
		dm9601_mode = DM9601_AUTO;
	}

	nfloor = (nfloor > 15) ? 0 : nfloor;
	return usb_register( &dm9601_driver );	// Register with USB core
}

void __exit dm9601_exit(void)	// Module Exit
{
	usb_deregister( &dm9601_driver );	// Unregister from the USB core
}

module_init( dm9601_init );
module_exit( dm9601_exit );

NAMITA USB DEVICE : dm9601 CODING with single URB


#include <linux/init.h>
#include <linux/module.h>
#include <linux/types.h>
#include <linux/errno.h>
#include <asm/io.h>
#include <linux/kernel.h>
#include <linux/usb.h>
#include <linux/etherdevice.h>
#include <linux/gfp.h>
#include <linux/slab.h>
#include <asm/uaccess.h>

#include <linux/delay.h>

MODULE_LICENSE("Dual BSD/GPL");

#define USB_SKEL_VENDOR_ID	0x0A46
#define USB_SKEL_PRODUCT_ID	0x9601
#define MAX_MTU			1536

/* Request */
#define READ_REGISTER		0x00
#define WRITE_REGISTERS		0x01
#define WRITE_REGISTER		0x03
#define READ_MEMORY		0x02
#define WRITE_MEMORYS		0x05
#define WRITE_MEMORY		0x07

/* Message value */
#define READ			0
#define WRITE			1

/* Device Registers. */
#define NWCTRLREG		0x00
#define NWSTATREG		0x01
#define TXCTRLREG		0x02
#define RXCTRLREG		0x05
#define RXSTATREG		0x06
#define BPTHRSREG		0x08
#define FCTRLTHSREG		0x09
#define RXTXFLCTRLREG		0x0A
#define PHYCTRLREG		0x0B
#define WKCTRLREG		0x0F
#define PHYADDREG		0x10
#define GENPRCTRLREG		0x1E
#define GENPRREG		0x1F

/* Structure to describe the types of usb devices this driver support. */
static struct usb_device_id skel_table[] = {
	{ USB_DEVICE(USB_SKEL_VENDOR_ID, USB_SKEL_PRODUCT_ID) },
	{}
};

/* Export the devices to the user space with which the module works. */
MODULE_DEVICE_TABLE(usb, skel_table);

/* Private structure of the usb device. */
struct usb_skel {
	struct usb_device *udev;	/* Device representation */
	struct net_device *netdev;
	struct net_device_stats stats;
	struct usb_interface *interface;
	unsigned int bulk_in_endpointAddr;
	unsigned int bulk_out_endpointAddr;
	struct urb transmit_urb;
	struct urb receive_urb;
	unsigned char *tx_buf;
	unsigned char *rx_buf;
	spinlock_t lock;

}; /* Methods to implement in private device. */ static int skel_probe(struct usb_interface *intf,const struct usb_device_id *id); static struct net_device_stats *skel_get_stats (struct net_device *dev); static int skel_open(struct net_device *ndev); static int skel_transmit(struct sk_buff *skb,struct net_device *dev); static int skel_close(struct net_device *ndev); static void skel_disconnect(struct usb_interface *intf); static void skel_transmit_complete(struct urb *urb,struct pt_regs *regs); static void skel_receive_complete(struct urb *urb,struct pt_regs *regs); /* Initialising usb_driver structure. */ static struct usb_driver skel_driver= { .name = "skeleton", .id_table = skel_table, .probe = skel_probe, .disconnect = skel_disconnect, }; /* INITIALIZATION METHOD */ static int skel_init(void) { int result; result = usb_register(&skel_driver); //registration with usb core. if(result) printk(KERN_NOTICE "Module registration failed.\n"); else printk(KERN_NOTICE "Module loaded successfully.\n"); return result; } /* EXIT FUNCTION */ static void skel_exit(void) { usb_deregister(&skel_driver); printk(KERN_NOTICE "Module unloaded successfully.\n"); return; } /* Perform checks of information passed to it about the device and decide whether the driver is appropriate for this device. */ static int skel_probe(struct usb_interface *intf,const struct usb_device_id *id) { struct usb_host_interface *iface_desc; int i; struct usb_skel *dev; struct net_device *ndev = NULL; struct usb_device *udev; /* Device representation */ printk(KERN_NOTICE "Probe called.\n"); //allocation of netdevice ndev = alloc_etherdev(sizeof(struct usb_skel)); if(ndev == NULL) { printk(KERN_NOTICE "Allocation error.\n"); return -ENOMEM;

	}

	dev = (struct usb_skel *)ndev->priv;
	memset(dev, 0, sizeof(struct usb_skel));
	dev->netdev = ndev;
	dev->interface = intf;

	/* Fill the usb_device and usb_interface */
	udev = interface_to_usbdev(intf);
	dev->udev = udev;
	iface_desc = intf->cur_altsetting;
	usb_get_dev(udev);

	// walk through every endpoint of the interface
	for (i = 0; i < iface_desc->desc.bNumEndpoints; ++i) {
		// assigning local pointer to the endpoint structure for easier access.
		struct usb_endpoint_descriptor *endpoint =
					&iface_desc->endpoint[i].desc;

		if (!dev->bulk_in_endpointAddr &&
		    (endpoint->bEndpointAddress & USB_DIR_IN) &&	// check in or out endpoint.
		    ((endpoint->bmAttributes & USB_ENDPOINT_XFERTYPE_MASK)	// check endpoint type.
		      == USB_ENDPOINT_XFER_BULK)) {			// if bulk in endpoint.
			printk(KERN_NOTICE "set up in endpoint address.\n");
			dev->bulk_in_endpointAddr = endpoint->bEndpointAddress;
		}

		if (!dev->bulk_out_endpointAddr &&
		    !(endpoint->bEndpointAddress & USB_DIR_IN) &&
		    ((endpoint->bmAttributes & USB_ENDPOINT_XFERTYPE_MASK)
		      == USB_ENDPOINT_XFER_BULK)) {
			printk(KERN_NOTICE "set up out endpoint address.\n");
			dev->bulk_out_endpointAddr = endpoint->bEndpointAddress;
		}
	}

	if (!(dev->bulk_in_endpointAddr && dev->bulk_out_endpointAddr)) {
		printk(KERN_NOTICE "Could not find bulk in and bulk out endpoint.\n");
		free_netdev(ndev);
		return -ENODEV;
	}

	// initialising the net_device structure
	ndev->open = skel_open;
	ndev->stop = skel_close;
	ndev->hard_start_xmit = skel_transmit;
	ndev->get_stats = skel_get_stats;
	SET_MODULE_OWNER(ndev);
	spin_lock_init(&dev->lock);
	usb_init_urb(&dev->transmit_urb);
	usb_init_urb(&dev->receive_urb);
	memcpy(ndev->name, "usb%d", 4);

	// setting the MAC address
	usb_control_msg(dev->udev, usb_rcvctrlpipe(dev->udev, 0), READ_REGISTER,
			(USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE), READ,
			PHYADDREG, ndev->dev_addr, 6, USB_CTRL_SET_TIMEOUT);

	// registering the net_device
	if (register_netdev(ndev)) {
		skel_disconnect(intf);
		return -ENODEV;
	}

	// save data pointer in this interface bet-n open & probe
	usb_set_intfdata(intf, dev);
	netif_device_attach(ndev);
	return 0;
}

/* To return device's statistics. */
static struct net_device_stats *skel_get_stats(struct net_device *dev)
{
	struct usb_skel *udev = netdev_priv(dev);
	return &udev->stats;
}

/* To open device. */
static int skel_open(struct net_device *ndev)
{
	struct usb_skel *dev = netdev_priv(ndev);
	int retval;
	u8 data = 0x01;

	// provide a facade of synchronous URB submission
	retval = usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0),
				 WRITE_REGISTERS, 0x40, 0x00, NWCTRLREG,
				 &data, 1, 0);
	if (retval == 1)
		printk(KERN_NOTICE "Resetting done.\n");
	udelay(10);

	retval = usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0),
				 WRITE_REGISTERS, 0x40, 0x00, RXCTRLREG,
				 &data, 1, 0);
	if (retval == 1)
		printk(KERN_NOTICE "Receiver enable.\n");

	// allocating transmit buffer
	dev->tx_buf = usb_buffer_alloc(dev->udev, MAX_MTU, GFP_KERNEL,
				       &dev->transmit_urb.transfer_dma);
	if (!dev->tx_buf)
		return -ENOMEM;
	dev->transmit_urb.transfer_flags |= URB_NO_TRANSFER_DMA_MAP;

	// allocating receive buffer
	dev->rx_buf = usb_buffer_alloc(dev->udev, MAX_MTU, GFP_KERNEL,
				       &dev->receive_urb.transfer_dma);
	if (!dev->rx_buf)
		return -ENOMEM;
	dev->receive_urb.transfer_flags |= URB_NO_TRANSFER_DMA_MAP;

	// filling the receive urb
	usb_fill_bulk_urb(&dev->receive_urb, dev->udev,
			  usb_rcvbulkpipe(dev->udev, dev->bulk_in_endpointAddr),
			  dev->rx_buf, MAX_MTU, skel_receive_complete, dev);

	// submitting the receive urb to the usb core
	if (usb_submit_urb(&dev->receive_urb, GFP_ATOMIC))

{ printk(KERN_NOTICE"Failed submitting receive urb.\n"); return -EFAULT; } //start the transmit queue netif_start_queue(ndev); printk(KERN_NOTICE "Open called.\n"); return 0; } /* To transmit data between devices. */ static int skel_transmit(struct sk_buff *skb,struct net_device *ndev) { struct usb_skel *dev = netdev_priv(ndev); int size = skb->len + 2; int retval; printk(KERN_NOTICE "Transmit called.\n"); netif_stop_queue(ndev); dev->tx_buf[0] = skb->len; dev->tx_buf[1] = (skb->len) >> 8; //copying user space data to the DMA buffer if(memcpy(dev->tx_buf + 2,skb->data,skb->len) == NULL) { printk(KERN_NOTICE "Failed to copy.\n"); return -EFAULT; } //initialising the transmit urb usb_fill_bulk_urb(&dev->transmit_urb,dev->udev, usb_sndbulkpipe(dev->udev,dev->bulk_out_endpointAddr), dev->tx_buf,size,skel_transmit_complete,dev); spin_lock(&dev->lock); //submit the transmit urb to the usb core if((retval = usb_submit_urb(&dev->transmit_urb,GFP_KERNEL))) { printk(KERN_NOTICE "Failed submitting transmit urb"); dev->stats.tx_errors++; netif_start_queue(ndev); } else { dev->stats.tx_packets++; dev->stats.tx_bytes += skb->len; } dev_kfree_skb(skb); spin_unlock(&dev->lock); return NET_XMIT_SUCCESS; } /* Transmit callback. */ static void skel_transmit_complete(struct urb *urb,struct pt_regs *regs) { struct usb_skel *dev = urb->context; if(urb->status)

		printk(KERN_NOTICE "Tx status : %x\n", urb->status);
	netif_wake_queue(dev->netdev);
	printk(KERN_NOTICE "Transmit callback called.\n");
}

/* Receive callback. */
static void skel_receive_complete(struct urb *urb, struct pt_regs *regs)
{
	struct usb_skel *dev = urb->context;
	struct sk_buff *skb;
	__u8 rx_status;
	__u16 pkt_size;
	unsigned char *buf;
	int len = urb->actual_length;

	buf = dev->rx_buf;
	printk(KERN_NOTICE "recv com \n");
	if (!len)
		goto out;

	rx_status = *(__u8 *)(buf);
	pkt_size = *(__u16 *)(buf + 1) - 4;
	printk(KERN_NOTICE "Receive status : %x\n", rx_status);

	if ( (rx_status & 0xbf) || (pkt_size > MAX_MTU) ) {
		dev->stats.rx_errors++;
		if (rx_status & 0x02)
			dev->stats.rx_crc_errors++;
		if (rx_status & 0x1)
			dev->stats.rx_fifo_errors++;
		goto out;
	}

	if ( !(skb = dev_alloc_skb(pkt_size + 2)) )
		goto out;
	skb->dev = dev->netdev;
	skb_reserve(skb, 2);
	memcpy(skb_put(skb, pkt_size), buf + 3, pkt_size);
	skb->protocol = eth_type_trans(skb, dev->netdev);
	netif_rx(skb);
	dev->stats.rx_packets++;
	dev->stats.rx_bytes += pkt_size;
	printk(KERN_NOTICE "Received %d bytes.\n", len);

out:
	usb_fill_bulk_urb(&dev->receive_urb, dev->udev,
			  usb_rcvbulkpipe(dev->udev, dev->bulk_in_endpointAddr),
			  dev->rx_buf, MAX_MTU, skel_receive_complete, dev);
	if (usb_submit_urb(&dev->receive_urb, GFP_ATOMIC)) {
		printk(KERN_NOTICE "Failed submitting receive urb(rcom).\n");
	}
}

/* To close the device. */
static int skel_close(struct net_device *ndev)
{
	struct usb_skel *dev = netdev_priv(ndev);
	u8 data = 0x00;

	int retval;

	netif_stop_queue(ndev);

	// disabling the receiver
	retval = usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0),
				 WRITE_REGISTERS, 0x40, 0x00, RXCTRLREG,
				 &data, 1, 0);
	if (retval == 1)
		printk(KERN_NOTICE "Receiver disable.\n");

	// clearing the network control register
	retval = usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0),
				 WRITE_REGISTERS, 0x40, 0x00, NWCTRLREG,
				 &data, 1, 0);

	// freeing the transmit buffer
	usb_buffer_free(dev->udev, MAX_MTU, dev->tx_buf,
			dev->transmit_urb.transfer_dma);

	// freeing the receive buffer
	usb_buffer_free(dev->udev, MAX_MTU, dev->rx_buf,
			dev->receive_urb.transfer_dma);

	usb_unlink_urb(&dev->transmit_urb);
	usb_unlink_urb(&dev->receive_urb);

	printk(KERN_NOTICE "Close called.\n");
	return 0;
}

/* To clean up the device. */
static void skel_disconnect(struct usb_interface *intf)
{
	// save this structure pointer bet-n probe & open invocation threads
	struct usb_skel *dev = usb_get_intfdata(intf);
	int minor = intf->minor;
	struct net_device *ndev = dev->netdev;
	struct usb_device *udev;	/* Device representation */

	udev = interface_to_usbdev (intf);
	usb_set_intfdata(intf, NULL);	/* Zero out interface data */
	unregister_netdev(ndev);

	// free_netdev(ndev);
	usb_put_dev (udev);
	kfree(dev);
	dev = NULL;
	printk(KERN_NOTICE "USB skeleton %d disconnected.\n", minor);
}

module_init(skel_init);
module_exit(skel_exit);

multiple URB
#include <linux/init.h>
#include <linux/module.h>

#include <linux/types.h> #include <linux/errno.h> #include <asm/io.h> #include <linux/kernel.h> #include <linux/usb.h> #include <linux/etherdevice.h> #include <linux/gfp.h> #include <linux/slab.h> #include <asm/uaccess.h> #include <linux/delay.h> MODULE_LICENSE("Dual BSD/GPL"); #define USB_SKEL_VENDOR_ID 0x0A46 #define USB_SKEL_PRODUCT_ID 0x9601 #define MAX_MTU 1536 /* Request */ #define READ_REGISTER 0x00 #define WRITE_REGISTERS 0x01 #define WRITE_REGISTER 0x03 #define READ_MEMORY 0x02 #define WRITE_MEMORYS 0x05 #define WRITE_MEMORY 0x07 /* Request Type */ #define READREG 0xC0 #define WRITEREG 0x40 /* Message value */ #define READ 0 #define MSGVAL 0x00 /* Device Registers. */ #define NWCTRLREG 0x00 #define NWSTATREG 0x01 #define TXCTRLREG 0x02 #define RXCTRLREG 0x05 #define RXSTATREG 0x06 #define BPTHRSREG 0x08 #define FCTRLTHSREG 0x09 #define RXTXFLCTRLREG 0x0A #define PHYCTRLREG 0x0B #define WKCTRLREG 0x0F #define PHYADDREG 0x10 #define GENPRCTRLREG 0x1E #define GENPRREG 0x1F #define USBCTRLREG 0xF4 #define RXCRCERR 0x02 #define RXFIOVFLERR 0x01 #define RXSTATUS 0xBF #define DVCRESET 0x01 #define BPTHRS 0x37 #define FLCTRLTHRS 0x38 #define RXTXFLCTRL 0xFF #define USBCTRL 0x26 #define RCVENB 0x01 /* Structure to describe the types of usb devices this driver support. */ static struct usb_device_id skel_table[]=

{
	{ USB_DEVICE(USB_SKEL_VENDOR_ID, USB_SKEL_PRODUCT_ID) },
	{}
};

/* Export the devices to the user space with which the module works. */
MODULE_DEVICE_TABLE(usb, skel_table);

/* Private structure of the usb device. */
struct usb_skel {
	struct usb_device *udev;	/* Device representation */
	struct net_device *netdev;
	struct net_device_stats stats;
	struct usb_interface *interface;
	unsigned int bulk_in_endpointAddr;
	unsigned int bulk_out_endpointAddr;
	spinlock_t lock;
};

/* Methods to implement in private device. */ static int skel_probe(struct usb_interface *intf,const struct usb_device_id *id); static struct net_device_stats *skel_get_stats (struct net_device *dev); static int skel_open(struct net_device *ndev); static int skel_transmit(struct sk_buff *skb,struct net_device *dev); static int skel_close(struct net_device *ndev); static void skel_disconnect(struct usb_interface *intf); static void skel_transmit_complete(struct urb *urb,struct pt_regs *regs); static void skel_receive_complete(struct urb *urb,struct pt_regs *regs); /* Initialising usb_driver structure. */ static struct usb_driver skel_driver= { .name = "skeleton", .id_table = skel_table, .probe = skel_probe, .disconnect = skel_disconnect, }; static int skel_init(void) { int result; result = usb_register(&skel_driver);//registration with usb core. if(result) printk(KERN_NOTICE "Module registration failed.\n"); else printk(KERN_NOTICE "Module loaded successfully.\n"); return result; } static void skel_exit(void) { usb_deregister(&skel_driver); printk(KERN_NOTICE "Module unloaded successfully.\n"); return; } /* Perform checks of information passed to it about the device

and decide whether the driver is appropriate for this device. */ static int skel_probe(struct usb_interface *intf,const struct usb_device_id *id) { struct usb_host_interface *iface_desc; int i; struct usb_skel *dev; struct net_device *ndev = NULL; struct usb_device *udev; /* Device representation */ printk(KERN_NOTICE "Probe called.\n"); //allocation of netdevice ndev = alloc_etherdev(sizeof(struct usb_skel)); if(ndev == NULL) { printk(KERN_NOTICE "Allocation error.\n"); return -ENOMEM; } dev = (struct usb_skel *)ndev->priv; memset(dev,0,sizeof(struct usb_skel)); dev->netdev = ndev; dev->interface = intf; udev = interface_to_usbdev(intf); dev->udev = udev; iface_desc = intf->cur_altsetting; usb_get_dev(udev); /* Fill the usb_device and usb_interface */ printk(KERN_NOTICE "No of end points : %d \n",iface_desc->desc.bNumEndpoints); for(i = 0;i < iface_desc->desc.bNumEndpoints;++i) //walk through every endpoints of interf. { //assigning local pointer to the endpoint structure for easier access. struct usb_endpoint_descriptor *endpoint = &iface_desc->endpoint[i].desc; if(!dev->bulk_in_endpointAddr && (endpoint->bEndpointAddress & USB_DIR_IN) && //check in or out endpoint. ((endpoint->bmAttributes & USB_ENDPOINT_XFERTYPE_MASK) //check endpoint type. == USB_ENDPOINT_XFER_BULK))//if bulk in endpoint. { printk(KERN_NOTICE "set up in endpoint address.\n"); dev->bulk_in_endpointAddr = endpoint->bEndpointAddress; } if(!dev->bulk_out_endpointAddr && !(endpoint->bEndpointAddress & USB_DIR_IN) &&((endpoint->bmAttributes & USB_ENDPOINT_XFERTYPE_MASK) == USB_ENDPOINT_XFER_BULK)) { printk(KERN_NOTICE "set up out endpoint address.\n"); dev->bulk_out_endpointAddr = endpoint->bEndpointAddress; } } if(!(dev->bulk_in_endpointAddr && dev->bulk_out_endpointAddr)) { printk(KERN_NOTICE "Could not find bulk in and bulk out endpoint.\n"); free_netdev(ndev); return -ENODEV; } //initialising the net_device structure

ndev->open = skel_open; ndev->stop = skel_close; ndev->hard_start_xmit = skel_transmit; ndev->get_stats = skel_get_stats; SET_MODULE_OWNER(ndev); spin_lock_init(&dev->lock); memcpy(ndev->name,"usb%d",4); //setting the MAC address usb_control_msg(dev->udev,usb_rcvctrlpipe(dev->udev,0),READ_REGISTER, (USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE),MSGVAL, PHYADDREG,ndev->dev_addr,ETH_ALEN,USB_CTRL_SET_TIMEOUT); //registering the net_device if(register_netdev(ndev)) { skel_disconnect(intf); return -ENODEV; } // save data pointer in this interface usb_set_intfdata(intf,dev); netif_device_attach(ndev); return 0; } /* To return device's statistics. */ static struct net_device_stats *skel_get_stats(struct net_device *dev) { struct usb_skel *udev = netdev_priv(dev); return &udev->stats; } /* To open device. */ static int skel_open(struct net_device *ndev) { struct usb_skel *dev = netdev_priv(ndev); int retval; u8 data = DVCRESET; unsigned char *rx_buf1,*rx_buf2; struct urb *rx_urb1,*rx_urb2; retval = usb_control_msg(dev->udev,usb_sndctrlpipe(dev->udev,0), WRITE_REGISTERS, WRITEREG,MSGVAL, NWCTRLREG, &data, 1,USB_CTRL_SET_TIMEOUT); if(retval == 1) printk(KERN_NOTICE "Resetting done.\n"); udelay(10); //configuring the device registers data = BPTHRS; usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev,0),WRITE_REGISTERS,WRITEREG, MSGVAL, BPTHRSREG,&data,1,USB_CTRL_SET_TIMEOUT); data = FLCTRLTHRS; usb_control_msg(dev->udev,usb_sndctrlpipe(dev->udev,0),WRITE_REGISTERS, WRITEREG,MSGVAL,FCTRLTHSREG,&data,1,USB_CTRL_SET_TIMEOUT);

data = RXTXFLCTRL; usb_control_msg(dev->udev,usb_sndctrlpipe(dev->udev,0),WRITE_REGISTERS, WRITEREG, MSGVAL,RXTXFLCTRLREG,&data,1, USB_CTRL_SET_TIMEOUT); data = USBCTRL; usb_control_msg(dev->udev,usb_sndctrlpipe(dev->udev,0),WRITE_REGISTERS, WRITEREG, MSGVAL,USBCTRLREG,&data,1, USB_CTRL_SET_TIMEOUT); data = RCVENB; retval = usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev,0),WRITE_REGISTERS, WRITEREG, MSGVAL, RXCTRLREG,&data,1,USB_CTRL_SET_TIMEOUT); if(retval == 1) printk(KERN_NOTICE "Receiver enable.\n"); //allocating receive urbs rx_urb1 = usb_alloc_urb(0,GFP_KERNEL); // function allocates and 0s-out URB memory rx_urb1->transfer_flags |=URB_NO_TRANSFER_DMA_MAP; rx_urb2 = usb_alloc_urb(0,GFP_KERNEL); // function allocates and 0s-out URB memory rx_urb2->transfer_flags |=URB_NO_TRANSFER_DMA_MAP; //allocating receive buffers rx_buf1=usb_buffer_alloc(dev->udev,MAX_MTU,GFP_KERNEL,&rx_urb1-> transfer_dma); if(!rx_buf1) return -ENOMEM; rx_buf2=usb_buffer_alloc(dev->udev,MAX_MTU,GFP_KERNEL,&rx_urb2-> transfer_dma); if(!rx_buf2) return -ENOMEM; //filling the receive urbs usb_fill_bulk_urb(rx_urb1, dev->udev, usb_rcvbulkpipe(dev->udev,dev->bulk_in_endpointAddr), rx_buf1,MAX_MTU,skel_receive_complete,dev); usb_fill_bulk_urb(rx_urb2, dev->udev, usb_rcvbulkpipe(dev->udev,dev->bulk_in_endpointAddr), rx_buf2,MAX_MTU,skel_receive_complete,dev); //submitting the receive urbs to the usb core if (usb_submit_urb(rx_urb1,GFP_ATOMIC)) { printk(KERN_NOTICE"Failed submitting receive urb.\n"); return -EFAULT; } //submitting the receive urbs to the usb core

if (usb_submit_urb(rx_urb2,GFP_ATOMIC)) { printk(KERN_NOTICE"Failed submitting receive urb.\n"); return -EFAULT; } //start the transmit queue netif_start_queue(ndev); printk(KERN_NOTICE "Open called.\n"); return 0; } /* To transmit data between devices. */ static int skel_transmit(struct sk_buff *skb,struct net_device *ndev) { struct usb_skel *dev = netdev_priv(ndev); struct urb *tx_urb; unsigned char *tx_buf; int size = skb->len + 2; int retval; printk(KERN_NOTICE "Transmit called.\n"); //allocating transmit urb tx_urb = usb_alloc_urb(0,GFP_KERNEL); // function allocates and 0s-out URB memory if(!tx_urb) { return -ENOMEM; } tx_urb->transfer_flags |=URB_NO_TRANSFER_DMA_MAP; //allocating transmit buffer tx_buf = usb_buffer_alloc(dev->udev,size,GFP_KERNEL,&tx_urb->transfer_dma); if(!tx_buf) { return -ENOMEM; } //if packet size is multiple of 64,last pkt will be of zero size //so in such case increase the size by 1 to avoid empty packet transfer if(!(size & 0x3f)) { size++; skb->len++; } tx_buf[0] = skb->len; tx_buf[1] = (skb->len) >> 8; //copying user space data to the DMA buffer if(memcpy(tx_buf + 2,skb->data,skb->len) == NULL) { printk(KERN_NOTICE "Failed to copy.\n"); return -EFAULT; } //initialising the transmit urb, Populate the URB usb_fill_bulk_urb(tx_urb,dev->udev,

                  usb_sndbulkpipe(dev->udev, dev->bulk_out_endpointAddr),
                  tx_buf, size, skel_transmit_complete, dev);

spin_lock(&dev->lock);
/* Submit the transmit URB to the USB core. GFP_ATOMIC, since a spinlock
   is held */
if ((retval = usb_submit_urb(tx_urb, GFP_ATOMIC))) {
        spin_unlock(&dev->lock);
        printk(KERN_NOTICE "Failed submitting transmit urb.\n");
        usb_buffer_free(dev->udev, size, tx_buf, tx_urb->transfer_dma);
        usb_free_urb(tx_urb);
        return -EFAULT;
}
dev_kfree_skb(skb);
spin_unlock(&dev->lock);
usb_free_urb(tx_urb); /* Release our reference; the USB core holds its own */
return NET_XMIT_SUCCESS;
}

/* Transmit callback */
static void skel_transmit_complete(struct urb *urb, struct pt_regs *regs)
{
/* The completion context is the device structure passed to
   usb_fill_bulk_urb(), not a socket buffer */
struct usb_skel *dev = urb->context;

if (urb->status == 0) {
        dev->stats.tx_bytes += urb->transfer_buffer_length;
        dev->stats.tx_packets++;
} else
        dev->stats.tx_errors++;
printk(KERN_NOTICE "Tx status : %x\n", urb->status);
usb_buffer_free(urb->dev, urb->transfer_buffer_length,
                urb->transfer_buffer, urb->transfer_dma);
printk(KERN_NOTICE "Transmit callback called.\n");
}

/* Receive callback */
static void skel_receive_complete(struct urb *urb, struct pt_regs *regs)
{
struct usb_skel *dev = urb->context;
struct sk_buff *skb;
__u8 rx_status;
__u16 pkt_size;
int len = urb->actual_length;

printk(KERN_NOTICE "Receive callback called.\n");
if (!len)
        goto out;

/* The first byte holds the receive status, the next two the packet size */
rx_status = *(__u8 *)(urb->transfer_buffer);
pkt_size = *(__u16 *)(urb->transfer_buffer + 1) - 4;
printk(KERN_NOTICE "Receive status : %x\n", rx_status);

if ((rx_status & RXSTATUS) || (pkt_size > MAX_MTU)) {
        dev->stats.rx_errors++;
        if (rx_status & RXCRCERR)
                dev->stats.rx_crc_errors++;
        if (rx_status & RXFIOVFLERR)
                dev->stats.rx_fifo_errors++;
        goto out;
}

if (!(skb = dev_alloc_skb(pkt_size + 2)))
        goto out;
skb->dev = dev->netdev;
skb_reserve(skb, 2); /* Align the IP header on a 16-byte boundary */
memcpy(skb_put(skb, pkt_size), urb->transfer_buffer + 3, pkt_size);
skb->protocol = eth_type_trans(skb, dev->netdev);
netif_rx(skb);
dev->stats.rx_packets++;
dev->stats.rx_bytes += pkt_size;
printk(KERN_NOTICE "Received %d bytes.\n", len);
out:
/* Resubmit the receive URB to the USB core */
if (usb_submit_urb(urb, GFP_ATOMIC))
        printk(KERN_NOTICE "Failed submitting receive urb (rcom).\n");
}

/* Close the device */
static int skel_close(struct net_device *ndev)
{
struct usb_skel *dev = netdev_priv(ndev);
u8 data = 0x00;
int retval;

netif_stop_queue(ndev);

/* Disable the receiver */
retval = usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0),
                         WRITE_REGISTERS, WRITEREG, MSGVAL, RXCTRLREG,
                         &data, 1, USB_CTRL_SET_TIMEOUT);
if (retval == 1)
        printk(KERN_NOTICE "Receiver disabled.\n");

/* Clear the network control register */
retval = usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0),
                         WRITE_REGISTERS, WRITEREG, MSGVAL, NWCTRLREG,
                         &data, 1, USB_CTRL_SET_TIMEOUT);
printk(KERN_NOTICE "Close called.\n");
return 0;
}

/* Disconnect method. Called when the device is unplugged or when the
   module is unloaded */
static void skel_disconnect(struct usb_interface *intf)
{
/* Retrieve the structure pointer saved between the probe and open
   invocations */
struct usb_skel *dev = usb_get_intfdata(intf);
int minor = intf->minor;
struct net_device *ndev = dev->netdev;
struct usb_device *udev;

udev = interface_to_usbdev(intf); /* Device representation */
usb_set_intfdata(intf, NULL);     /* Zero out interface data */


unregister_netdev(ndev);
//free_netdev(ndev);
usb_put_dev(udev);
kfree(dev);
dev = NULL;
printk(KERN_NOTICE "USB skeleton %d disconnected.\n", minor);
}

module_init(skel_init);
module_exit(skel_exit);
