
Numbers

Word - size of a pointer. Some computers are 32-bit (4-byte words) and some are 64-bit (8-byte words).

char - 1 byte, short - 2 bytes, int - 4 bytes, long - 8 bytes, float - 4 bytes, double - 8 bytes (typical sizes on a 64-bit x86-64 machine).

Little endian stores the least significant byte first, so the bytes appear "backwards" in memory. A value is still addressed by the address of its first (lowest) byte in memory.
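A minimal sketch of how to observe byte order from C. The helper name first_byte_in_memory is hypothetical; it copies the raw bytes of a 32-bit value and returns the one at the lowest address.

```c
#include <stdint.h>
#include <string.h>

/* Returns the byte stored at the lowest address of a 32-bit value
   (hypothetical helper, for illustration only). */
unsigned char first_byte_in_memory(uint32_t value) {
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value); /* copy the raw in-memory representation */
    return bytes[0];
}
```

For the value 0x01234567, a little-endian machine stores 0x67 first and a big-endian machine stores 0x01 first.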

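Logical right shift fills the leftmost bits with zeros; arithmetic right shift fills them with copies of the sign bit (ones for a negative number). In C, >> is logical on unsigned values and, on almost all machines, arithmetic on signed values.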

When you cast between signed and unsigned, you keep the same bits but reinterpret them.

C implicitly casts when you compare signed and unsigned data: it converts the signed value to unsigned. This can give surprising results.
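A small sketch of the implicit conversion. The function name signed_less_than_unsigned is made up for this example; the comparison inside is plain C.

```c
/* Compares a signed int with an unsigned int. The int is implicitly
   converted to unsigned before the comparison takes place. */
int signed_less_than_unsigned(int a, unsigned int b) {
    return a < b;
}
```

Calling signed_less_than_unsigned(-1, 0u) yields 0 (false), because -1 is reinterpreted as the largest unsigned value before being compared with 0.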


Addition has same bit-level behavior on unsigned and signed data.



Floats: Represent a number as (-1)^S * M * 2^E. There are three fields: sign bit, exponent, and fraction. In a float these are 1, 8, and 23 bits; in a double they are 1, 11, and 52 bits.

Bias = 2^(k-1) - 1, where k is the number of bits in the exponent field (127 for float, 1023 for double).

There are three cases:

1. Normalized: the exponent field is neither all zeroes nor all ones. E = exponent - bias. M = 1 + fraction.

2. Denormalized: the exponent field is all zeroes. E = 1 - bias. M = fraction.

3. Special values: the exponent field is all ones. If the fraction field is all zeroes, the value is infinity (positive if the sign bit is zero, negative if it is one). If the fraction field is nonzero, the value is NaN.
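The three fields can be pulled out of a float with shifts and masks. This is a sketch that assumes float is the 32-bit IEEE single-precision format; the type float_fields and both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t sign;     /* 1 bit */
    uint32_t exponent; /* 8-bit biased exponent field */
    uint32_t fraction; /* 23-bit fraction field */
} float_fields;

/* Split a float into its sign, exponent, and fraction fields. */
float_fields decode_float(float f) {
    uint32_t bits;
    float_fields out;
    memcpy(&bits, &f, sizeof bits); /* same bits, reinterpreted as an integer */
    out.sign = bits >> 31;
    out.exponent = (bits >> 23) & 0xFF;
    out.fraction = bits & 0x7FFFFF;
    return out;
}

/* 0 = normalized, 1 = denormalized, 2 = special (infinity or NaN) */
int classify_fields(float_fields x) {
    if (x.exponent == 0) return 1;
    if (x.exponent == 0xFF) return 2;
    return 0;
}
```

For 1.0f the exponent field is 127 (so E = 127 - 127 = 0) and the fraction is 0 (so M = 1).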

Machine programming

First 6 arguments are passed in registers: %rdi, %rsi, %rdx, %rcx, %r8, %r9. More than 6 arguments: allocate stack space as needed

The program counter, %rip, contains the address of the next instruction

$0x4 is an immediate (constant); 0x4 is a memory reference to address 0x4. %rax is the value in the register; (%rax) is a memory reference to the address held in %rax.

D(Rb, Ri, S) = Mem[Reg[Rb] + S*Reg[Ri] + D]
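The address arithmetic in this formula can be written out as a sketch. The function effective_address is hypothetical and only computes the address; it does not access memory. In real operands S must be 1, 2, 4, or 8; an omitted index register is modeled here as ri = 0.

```c
#include <stdint.h>

/* Address named by the operand D(Rb, Ri, S): Reg[Rb] + S*Reg[Ri] + D. */
uint64_t effective_address(int64_t d, uint64_t rb, uint64_t ri, unsigned s) {
    return rb + (uint64_t)s * ri + (uint64_t)d;
}
```

With %rdx = 0xf000 and %rcx = 0x100, the operand (%rdx,%rcx,4) corresponds to effective_address(0, 0xf000, 0x100, 4), which is 0xf400.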

jmp = 1 = unconditional
je = ZF = equal / zero
jne = ~ZF = not equal / not zero
js = SF = negative
jns = ~SF = nonnegative
jg = ~(SF^OF) & ~ZF = greater (signed)
jge = ~(SF^OF) = greater or equal (signed)
jl = (SF^OF) = less (signed)
jle = (SF^OF) | ZF = less or equal (signed)
ja = ~CF & ~ZF = above (unsigned)
jb = CF = below (unsigned)

Condition codes: CF is set on unsigned overflow (carry out). ZF is set if the result of the operation is 0. SF is set if the result < 0 (as signed). OF is set on two's-complement (signed) overflow.

cmpl b,a is like computing a-b without setting destination. ZF set if a==b, SF set if (a-b) < 0.
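To see how the flags combine into jump conditions, here is a sketch that models the flags a 32-bit compare of a against b would produce. The type flags_t and all function names are hypothetical; this is a model in C, not how the hardware is implemented.

```c
#include <stdint.h>

typedef struct { int zf, sf, cf, of; } flags_t;

/* Model of the flags after comparing a with b (computing a - b). */
flags_t cmp_flags(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua - ub; /* wraps around like the hardware subtraction */
    flags_t f;
    f.zf = (r == 0);
    f.sf = (int)((r >> 31) & 1);
    f.cf = (ua < ub); /* borrow: unsigned a is below unsigned b */
    /* signed overflow: a and b differ in sign, and the result's sign differs from a's */
    f.of = (int)((((ua ^ ub) & (ua ^ r)) >> 31) & 1);
    return f;
}

int jl_taken(flags_t f) { return f.sf ^ f.of; } /* less (signed) */
int jb_taken(flags_t f) { return f.cf; }        /* below (unsigned) */
```

Comparing INT32_MIN with 1 overflows, so SF is 0 even though the true result is negative; OF is 1, and SF^OF still reports "less". This is why the signed jumps test SF^OF rather than SF alone.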

testl b,a is like computing a & b without setting a destination. ZF is set if a&b == 0, SF if a&b < 0. testq %rax, %rax checks whether %rax is positive, zero, or negative

Convention: %rax return value. %rbx callee saved. %rcx 4th argument. %rdx 3rd argument. %rsi 2nd argument. %rdi 1st argument. %rbp callee saved. %rsp stack pointer. %r8 5th argument. %r9 6th argument. %r10, %r11 caller saved. %r12, %r13, %r14, %r15 callee saved.

Switch statements use jump tables

Alignment - 1 byte (char): no restriction. 2 bytes (short): lowest bit of the address must be 0. 4 bytes (int, float): lowest two bits must be 00. 8 bytes (double, long, char *): lowest three bits must be 000. A struct has the alignment requirement of its most strictly aligned element

A union in C is allocated according to its largest element; all elements share the same storage.
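The alignment and union rules can be checked with offsetof and _Alignof (C11). The types record and value and the helper functions are hypothetical names for this sketch; the exact numbers depend on the platform.

```c
#include <stddef.h>

struct record {
    char c;   /* 1 byte, followed by padding so that i is aligned */
    int i;
    double d;
};

union value {
    char c;
    int i;
    double d; /* the largest member determines the size of the union */
};

size_t record_alignment(void) { return _Alignof(struct record); }
size_t offset_of_i(void) { return offsetof(struct record, i); }
size_t offset_of_d(void) { return offsetof(struct record, d); }
```

On x86-64 Linux the expected layout is c at offset 0, i at offset 4, d at offset 8, with sizeof(struct record) equal to 16 and sizeof(union value) equal to 8.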

pushq Src fetches the operand at Src, decrements %rsp by 8, and writes the operand at the address in %rsp.

call pushes return address and jumps to label. ret pops address and jumps to that address.


Caches

Static RAM (SRAM) is faster and more expensive than DRAM. It retains its value indefinitely as long as it is kept powered.

DRAM is slower, cheaper, and denser than SRAM. Value must be constantly refreshed and is sensitive to disturbances.

Both lose information if powered off.

Caches are organized so that a cache holds S * E * B data bytes. There are S = 2^s sets, E lines per set, and B = 2^b bytes per block.

The address of a word will have (from left to right) a tag of t bits, a set index of s bits, and a block offset of b bits.

Each line of the cache will have a valid bit, followed by the tag, followed by the data
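Splitting an address into tag, set index, and block offset is a matter of shifts and masks. A sketch, assuming s set-index bits and b block-offset bits with s + b < 64; the type cache_address and function split_address are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint64_t tag;
    uint64_t set_index;
    uint64_t block_offset;
} cache_address;

/* Split addr into its tag, set index (s bits), and block offset (b bits). */
cache_address split_address(uint64_t addr, unsigned s, unsigned b) {
    cache_address out;
    out.block_offset = addr & ((1ULL << b) - 1);     /* low b bits */
    out.set_index = (addr >> b) & ((1ULL << s) - 1); /* next s bits */
    out.tag = addr >> (s + b);                       /* remaining high bits */
    return out;
}
```

With 4 sets (s = 2) and 2-byte blocks (b = 1), address 13 (binary 1101) has tag 1, set index 2, and block offset 1.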


Optimization

{ *xp += *yp; *xp += *yp; } is less efficient than { *xp += 2 * *yp; }. You would think the compiler would make them the same, but it can't: if yp and xp point to the same location, the first form yields 4 times the original *xp and the second yields 3 times the original *xp. Similarly, consider { x = 1000; y = 3000; *q = y; *p = x; t1 = *q; }. t1 will equal 3000 unless q and p point to the same location, in which case it will equal 1000.
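The two forms can be written as functions to make the aliasing difference concrete. The names twiddle1 and twiddle2 are chosen for this sketch.

```c
/* Adds *yp to *xp twice: two reads of *yp and two writes to *xp. */
void twiddle1(long *xp, long *yp) {
    *xp += *yp;
    *xp += *yp;
}

/* Adds 2 * *yp to *xp once: fewer memory references. */
void twiddle2(long *xp, long *yp) {
    *xp += 2 * *yp;
}
```

If x holds 2, then twiddle1(&x, &x) leaves 8 in x while twiddle2(&x, &x) leaves 6, so the compiler cannot replace one with the other unless it can prove the pointers do not alias.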


Another case: the compiler can't treat x = f() + f() as the same as x = 2 * f(), because f() might have side effects such as updating a global variable.
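A sketch of such a function. Here f modifies a global counter, so calling it twice is observably different from calling it once and doubling; all names below are hypothetical.

```c
static long counter = 0;

/* Has a side effect: returns the current counter, then increments it. */
long f(void) { return counter++; }

long call_twice(void) { return f() + f(); }
long call_once_doubled(void) { return 2 * f(); }
void reset_counter(void) { counter = 0; }
```

Starting from counter == 0, call_twice() returns 0 + 1 = 1, whereas call_once_doubled() returns 2 * 0 = 0.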

Eliminating loop inefficiencies with code motion: Instead of e.g. checking a vector's length every iteration, check it once and store it in an integer
if that result does not change.

Eliminating unneeded memory references: If you have a loop like while (blah blah) { *a += i; }, change it to something like int d = *a; while (blah blah) { d += i; } *a = d;

Loop unrolling: Step through a loop several elements at a time

Enhancing parallelism: Basically combining loop unrolling with several local accumulators, so that independent operations can proceed in parallel
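The last three ideas can be combined in one loop. A sketch, assuming an int array summed into a long: the loop is unrolled by 2, and each step feeds its own local accumulator so the two additions do not depend on each other. The function name sum_unrolled is hypothetical.

```c
#include <stddef.h>

/* Sums n ints using 2x unrolling and two independent local accumulators. */
long sum_unrolled(const int *a, size_t n) {
    long acc0 = 0, acc1 = 0; /* locals, so no memory write per iteration */
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        acc0 += a[i];
        acc1 += a[i + 1];
    }
    for (; i < n; i++) { /* handle a leftover element when n is odd */
        acc0 += a[i];
    }
    return acc0 + acc1;
}
```

Whether this actually runs faster depends on the compiler and the processor; the sketch only shows the shape of the transformation.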


Linking

C source files are individually compiled into relocatable object files (.o), then a linker combines them into an executable program.

1. Symbol resolution. Each object file has a symbol table, which is an array of structs holding the name, size, and location of each symbol. During symbol resolution, the linker associates each symbol reference with exactly one symbol definition.

2. Relocation. Merges code and data sections. Relocates symbols into the executable.

Three kinds of object file:

Relocatable (.o) comes from just one .c file

Executable

Shared object file (.so), which can be loaded into memory and linked dynamically


ELF = standard binary format for object files, including all three of the above

ELF header, .text (code), .rodata (read-only data, jump tables), .data (initialized global variables), .bss (uninitialized global variables), .symtab (symbol table: procedure and static variable names), .rel.text (relocation info for the .text section), .rel.data (relocation info for the .data section), .debug (debugging info), section header table

In a symbol table of a module m:



Global symbols defined by m and can be referenced by other modules

Global symbols referenced by m but defined by some other module

Local symbols defined and referenced exclusively by module m (these are not the same as local variables)


Local non-static C variables: stored on stack



Local static C variables: stored in .data or .bss, like globals

A symbol is either strong or weak

Strong: procedures and initialized globals

Weak: uninitialized globals

Rules - Rule 1: Multiple strong symbols are not allowed



Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol

Rule 3: If there are multiple weak symbols, pick an arbitrary one


Static libraries: the linker scans the object files in the .a archive and links into the executable only those that resolve a currently unresolved symbol

When you run gcc main.o whatever.a, you have to list the files in that order, because the linker scans them from left to right: a library must come after the object files that reference its symbols

typedef names are not part of the symbol table.

Shared libraries (DLLs on Windows) are linked dynamically: either at load time by the dynamic linker, or at run time by the program itself using the dlopen() function

ECF

Exceptional control flow exists at all levels of a computer system

1. Exceptions - implemented using combination of hardware and OS software

2. Process context switch - implemented by OS software and hardware timer

3. Signals - implemented by OS software

4. Nonlocal jumps - implemented by C runtime library


An exception transfers control to the kernel in response to some event



There is an exception table with an entry for each exception number. Exception handler k is called each time exception k occurs.

Asynchronous exceptions (interrupts) are caused by events external to the processor, such as a timer interrupt (so the kernel can take back control) or an I/O interrupt from an external device. The handler returns to the next instruction

Synchronous exceptions:



Traps - intentional, in order to request services from the kernel (system calls). Returns control to the next instruction



Faults - unintentional but possibly recoverable. Either reexecutes or aborts



Aborts - unrecoverable, aborts program

A process can be running, stopped, or terminated.


A process is terminated by receiving a signal whose default action is to terminate, by returning from the main function, or by calling the exit function.

Fork function is called once but returns twice. It returns 0 to the child, and child's PID to the parent.

Even after a process terminates, it still consumes system resources (it is a zombie) until it is reaped.

You must reap it by using wait or waitpid

pid_t wait(int *child_status)

pid_t waitpid(pid_t pid, int *child_status, int options)

The int that child_status points to will be set to a value that shows the reason the child terminated. You can use WIFEXITED and WEXITSTATUS to get this info
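Putting fork, waitpid, and the status macros together, a minimal sketch on a POSIX system might look like this. The function run_child_and_reap is a hypothetical helper: the child exits immediately with the given code and the parent reaps it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with `code`, reaps it, and returns the
   exit status the parent observed (or -1 on any failure). */
int run_child_and_reap(int code) {
    int status;
    pid_t pid = fork(); /* returns 0 in the child, the child's PID in the parent */
    if (pid < 0) return -1;
    if (pid == 0) _exit(code);                    /* child: terminate right away */
    if (waitpid(pid, &status, 0) < 0) return -1;  /* parent: reap the child */
    if (WIFEXITED(status)) return WEXITSTATUS(status);
    return -1; /* child did not terminate normally */
}
```

The child uses _exit rather than exit here so that it does not flush stdio buffers inherited from the parent.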


(come back to this later)



execve is called once and never returns (unless there is an error)

SIGKILL cannot be overridden or ignored

A signal is pending if it has been sent but not yet received

You can block signals; a blocked signal stays pending and is received once it is unblocked

Signals are not queued: there is at most one pending signal of each type

Use sigprocmask to block signals

The kernel blocks any pending signals of the type currently being handled
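A sketch of blocking and the "not queued" rule, assuming a POSIX system. SIGUSR1 is blocked, raised twice, and then unblocked; the handler counts how many times it runs. The names on_sigusr1 and deliveries_after_two_blocked_raises are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t deliveries = 0;

static void on_sigusr1(int sig) { (void)sig; deliveries++; }

/* Returns how many times the handler ran for two signals sent while
   SIGUSR1 was blocked, or -1 on failure. */
int deliveries_after_two_blocked_raises(void) {
    struct sigaction sa;
    sigset_t mask;

    deliveries = 0;
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) < 0) return -1;

    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0) return -1;

    raise(SIGUSR1); /* becomes pending, not received */
    raise(SIGUSR1); /* same type already pending, so it is not queued */
    if (deliveries != 0) return -1; /* handler must not run while blocked */

    if (sigprocmask(SIG_UNBLOCK, &mask, NULL) < 0) return -1; /* received here */
    return (int)deliveries;
}
```

On Linux the handler is expected to run once, not twice, which is why a handler cannot count events by counting signals.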


A function is async-signal-safe if it is either reentrant or non-interruptible by signals


Malloc

Performance goal #1: Throughput, number of completed requests per unit time. If there are 5000 malloc calls and 5000 free calls in 10 seconds
the throughput is 1000 operations/second.

Performance goal #2: Memory utilization. Maximize the ratio of the aggregate payload to the size of the heap.


Internal fragmentation occurs if payload is smaller than block size.



External fragmentation occurs if there is enough aggregate heap memory, but no single free block is large enough

The 5 questions

1. Given just a pointer, how much memory do we free?

Standard method is to keep the length of a block in the word preceding the block

2. How do we keep track of the free blocks?

4 methods: Implicit list, explicit list, segregated free list, blocks sorted by size

3. When allocating a structure that is smaller than the free block it is placed in, what do we do with the extra space?

4. How do we pick a block to use for allocation?

5. How do we reinsert freed block?

Method 1: implicit list



You need size and allocation status

Instead of storing these in two words, use the low-order bits of the size word for the allocated/free flag. Block sizes are aligned, so those bits of the size would otherwise always be zero.

Q4: When finding a free block, you can use first fit, next fit, or best fit (fewest bytes left over).

Q3: When allocating, you split the block if there's enough left over

Q5: When freeing clear the allocated flag and then coalesce with nearby blocks. You need a header and a footer if you want bidirectional
coalescing
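The header encoding can be sketched as follows, assuming block sizes are multiples of 8 so the low 3 bits of the size are free. The helper names pack, block_size, and is_allocated are hypothetical.

```c
#include <stddef.h>

/* Combine a block size (a multiple of 8) and an allocated flag into one word. */
size_t pack(size_t size, int allocated) {
    return size | (allocated ? 1u : 0u);
}

/* Recover the size by masking off the low 3 bits. */
size_t block_size(size_t header) {
    return header & ~(size_t)0x7;
}

/* Recover the allocated flag from the lowest bit. */
int is_allocated(size_t header) {
    return (int)(header & 0x1);
}
```

For a 24-byte allocated block the header word is 25; freeing it only requires clearing the low bit. A footer would hold the same word at the end of the block.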


Method 2: explicit free list



Free blocks contain pointers to the next free block and the previous free block

Method 3: segregated free list

Each size class of blocks has its own free list

Virtual memory

The CPU sends a virtual address to the MMU which converts it into a physical address in main memory.

Virtual memory is stored on disk and cached in DRAM (main memory).

DRAM is about 10x slower than SRAM; disk is about 10,000 times slower than DRAM

The miss penalty is therefore enormous, so the DRAM cache is fully associative

A page table is an array of page table entries (PTEs) which map virtual pages to physical pages

Each process has its own virtual address space. Virtual pages of different processes can map to the same physical page in main memory, e.g. for read-only code

Virtual address space looks like (from high addresses to low): kernel virtual memory, user stack (grows down), memory-mapped region for shared libraries, heap (grows up), read/write segment (.data, .bss), read-only segment (.init, .text, .rodata).

PTEs have permission bits of SUP, READ, WRITE, and EXEC


Address translation:

P = 2^p: page size

N = 2^n: number of addresses in virtual address space

M = 2^m: number of addresses in physical address space


TLB: Cache of PTEs in MMU



T = 2^t sets in TLB

Virtual address consists of: the virtual page number (VPN), made up of the TLB tag (bits n-1 to p+t) and the TLB index (bits p+t-1 to p), followed by the virtual page offset (VPO, bits p-1 to 0).

Physical address consists of: Physical page offset (same as VPO), Physical page number
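The field boundaries above translate directly into shifts and masks. A sketch with hypothetical names (va_fields, split_virtual_address, physical_address), where p is the number of page-offset bits and t the number of TLB-index bits:

```c
#include <stdint.h>

typedef struct {
    uint64_t vpn;       /* virtual page number: bits n-1 .. p */
    uint64_t vpo;       /* virtual page offset: bits p-1 .. 0 */
    uint64_t tlb_index; /* low t bits of the VPN */
    uint64_t tlb_tag;   /* remaining high bits of the VPN */
} va_fields;

va_fields split_virtual_address(uint64_t va, unsigned p, unsigned t) {
    va_fields out;
    out.vpo = va & ((1ULL << p) - 1);
    out.vpn = va >> p;
    out.tlb_index = out.vpn & ((1ULL << t) - 1);
    out.tlb_tag = out.vpn >> t;
    return out;
}

/* The physical address is the PPN followed by the unchanged page offset. */
uint64_t physical_address(uint64_t ppn, uint64_t vpo, unsigned p) {
    return (ppn << p) | vpo;
}
```

For example, with 64-byte pages (p = 6) and a 4-set TLB (t = 2), virtual address 0x03d4 has VPN 0x0f, VPO 0x14, TLB index 3, and TLB tag 3. If the page table maps that VPN to PPN 0x0d, the physical address is 0x354.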

TLB hit: CPU sends virtual address to MMU. MMU sends VPN to TLB. TLB sends PTE back to MMU. MMU sends physical address to
memory. Memory sends data to CPU.

TLB miss: MMU has to get the PTE from main memory before it can get the physical address.


The page table is indexed by VPN; each PTE holds a PPN and a valid bit



Threads

Difference between thread and process:

Each thread has its own - Stack - Data registers - Condition codes - Stack pointer - Program counter

Threads share - Shared libraries - Heap - Read/write data - Read only code - Kernel context

Processes don't share any of that

Threads have pool of peers, processes form hierarchy

Threads are less expensive than processes

Pthreads interface:



pthread_create(thread id, thread attributes (usually NULL), thread routine, thread arguments)



pthread_join(thread id, return value) < reaps



pthread_self < determines one's thread id



pthread_cancel(), pthread_exit() < terminates a thread

exit() < terminates all threads
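A minimal sketch of creating and reaping one thread with the Pthreads calls listed above. The routine square_routine and the wrapper square_in_thread are hypothetical; depending on the toolchain the program may need to be built with -pthread.

```c
#include <pthread.h>
#include <stddef.h>

/* Thread routine: squares the int its argument points to and
   returns that same pointer as the thread's return value. */
static void *square_routine(void *arg) {
    int *p = arg;
    *p = *p * *p;
    return arg;
}

/* Runs square_routine in a new thread and returns the result, or -1 on failure. */
int square_in_thread(int x) {
    pthread_t tid;
    void *ret;
    if (pthread_create(&tid, NULL, square_routine, &x) != 0) return -1;
    if (pthread_join(tid, &ret) != 0) return -1; /* waits for and reaps the thread */
    return *(int *)ret;
}
```

Passing the address of the local x is safe here only because the caller joins the thread before x goes out of scope.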


Global variables are variables declared outside of a function. Virtual memory contains one instance of any global variable.

Local variables are declared inside a function without the static attribute. Each thread's stack contains one instance.

Local static variables are like globals: virtual memory contains one instance, shared by all threads.

Semaphores: integer variables manipulated by P and V operations

P(s):

- If s is nonzero, then decrement s by 1 and return immediately

- If s is zero, then suspend the thread until s becomes nonzero and the thread is restarted by a V operation

- After restarting, the P operation decrements s and returns control to the caller

V(s):

- Increment s by 1

- If there are any threads blocked in a P operation, restart exactly one of those threads

P locks mutex, V unlocks
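With POSIX semaphores, P corresponds to sem_wait and V to sem_post. The sketch below uses a semaphore initialized to 1 as a mutex around a shared counter; the names worker and count_with_two_threads are hypothetical, and unnamed semaphores (sem_init) are assumed to be available, as they are on Linux.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t mutex;         /* binary semaphore protecting counter */
static long shared_counter; /* shared by all threads */

static void *worker(void *arg) {
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        sem_wait(&mutex); /* P: lock */
        shared_counter++;
        sem_post(&mutex); /* V: unlock */
    }
    return NULL;
}

/* Two threads each increment the counter n times; returns the final
   count, or -1 on failure. */
long count_with_two_threads(long n) {
    pthread_t t1, t2;
    shared_counter = 0;
    if (sem_init(&mutex, 0, 1) != 0) return -1;
    if (pthread_create(&t1, NULL, worker, &n) != 0) return -1;
    if (pthread_create(&t2, NULL, worker, &n) != 0) return -1;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&mutex);
    return shared_counter;
}
```

Without the P and V calls around the increment, the two threads could interleave their load, add, and store steps and the final count could come out below 2n.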


Thread safety: A function is thread-unsafe if - it does not protect shared variables - it keeps state across multiple invocations - it returns a pointer to a static variable - it calls a thread-unsafe function

A function is reentrant if it accesses no shared variables when called by multiple threads. Reentrant functions are a subset of thread-safe functions. Another way to describe reentrant: all variables are stored in the function's own stack frame
