Word - size of a pointer: 4 bytes on a 32-bit machine, 8 bytes on a 64-bit machine.
Typical x86-64 sizes: char - 1 byte, short - 2 bytes, int - 4 bytes, long - 8 bytes, float - 4 bytes, double - 8 bytes.
Little endian stores the bytes "backwards": the least significant byte comes first in memory. A value is still addressed by the address of its first (lowest) byte in memory.
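A minimal sketch of checking byte order by looking at the first byte of a value in memory. The function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Returns the byte stored at the lowest address of a 32-bit value. */
unsigned char first_byte_in_memory(uint32_t value) {
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);  /* copy out the in-memory layout */
    return bytes[0];
}

/* 1 if the least significant byte comes first (little endian), else 0. */
int is_little_endian(void) {
    return first_byte_in_memory(0x01020304u) == 0x04;
}
```

On a little-endian machine such as x86-64, first_byte_in_memory(0x01020304) should be 0x04; on a big-endian machine it should be 0x01.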
Logical right shift >> fills the leftmost bits with zeros; arithmetic right shift fills them with copies of the sign bit (ones only if the value is negative).
When you cast between signed and unsigned, you keep the same bits but reinterpret them.
C implicitly casts when you compare signed and unsigned data: it casts the signed value to unsigned. This is confusing.
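A small sketch of why the implicit cast is confusing, assuming two's complement ints. The function names are hypothetical.

```c
#include <limits.h>

/* Same bits, different interpretation: -1 becomes UINT_MAX. */
unsigned int reinterpret_as_unsigned(int x) {
    return (unsigned int)x;
}

/* Mixed comparison: a is converted to unsigned before comparing. */
int less_than_mixed(int a, unsigned int b) {
    return a < b;
}

/* Both operands signed: ordinary signed comparison. */
int less_than_signed(int a, int b) {
    return a < b;
}
```

With these, less_than_signed(-1, 0) is true but less_than_mixed(-1, 0u) is false, because -1 is compared as UINT_MAX.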
Machine programming
First 6 arguments are passed in registers: rdi, rsi, rdx, rcx, r8, r9. More than 6 arguments: allocate stack space as needed.
The program counter, called rip, contains the address of the next instruction.
$0x4 is a constant (immediate); 0x4 is a memory address. %rax is the value held in the register; (%rax) is the memory at the address held in %rax.
D(Rb, Ri, S) = Mem[Reg[Rb] + S*Reg[Ri] + D]
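The address formula above can be sketched in C. The function name and parameter types are assumptions; the arithmetic is just Reg[Rb] + S*Reg[Ri] + D with 64-bit wraparound.

```c
#include <stdint.h>

/* Address named by the operand D(Rb, Ri, S).
 * d: displacement, rb: base register value,
 * ri: index register value, s: scale (1, 2, 4, or 8). */
uint64_t effective_address(int64_t d, uint64_t rb, uint64_t ri, unsigned s) {
    return rb + (uint64_t)s * ri + (uint64_t)d;
}
```

For example, with rb = 0xf000 and ri = 0x100, the operand 8(Rb, Ri, 4) names address 0xf000 + 4*0x100 + 8 = 0xf408.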
jmp = 1 = unconditional
je = ZF = equal/zero
jne = ~ZF = not equal/not zero
js = SF = negative
jns = ~SF = nonnegative
jg = ~(SF^OF) & ~ZF = greater (signed)
jge = ~(SF^OF) = greater or equal (signed)
jl = (SF^OF) = less (signed)
jle = (SF^OF) | ZF = less or equal (signed)
ja = ~CF & ~ZF = above (unsigned)
jb = CF = below (unsigned)
CF set on unsigned overflow (carry or borrow). ZF set if the result of the operation is 0. SF set if the result < 0. OF set on two's complement (signed) overflow.
cmpl b,a is like computing a-b without setting a destination. ZF set if a==b, SF set if (a-b) < 0.
testl b,a is like computing a&b without setting a destination. ZF set if a&b == 0, SF set if a&b < 0. testq %rax,%rax checks whether %rax is positive, zero, or negative.
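A rough C model of the flags that cmpl b,a would set, and of a few jump conditions built from them. The struct and function names are invented for this sketch; it models 32-bit operands only.

```c
#include <stdint.h>

struct flags { int cf, zf, sf, of; };

/* Flags of the 32-bit result a - b, as cmpl b,a would set them. */
struct flags cmp_flags(uint32_t a, uint32_t b) {
    uint32_t r = a - b;                      /* wraps modulo 2^32 */
    struct flags f;
    f.cf = a < b;                            /* unsigned borrow */
    f.zf = (r == 0);
    f.sf = (r >> 31) & 1;                    /* sign bit of the result */
    /* signed overflow: a and b differ in sign, and r's sign differs from a's */
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1;
    return f;
}

int jl(struct flags f) { return f.sf ^ f.of; }             /* less, signed */
int jg(struct flags f) { return !(f.sf ^ f.of) && !f.zf; } /* greater, signed */
int jb(struct flags f) { return f.cf; }                    /* below, unsigned */
int ja(struct flags f) { return !f.cf && !f.zf; }          /* above, unsigned */
```

Comparing a = 0xffffffff with b = 1 shows the signed/unsigned split: jl holds (since -1 < 1 as signed) and ja also holds (since 0xffffffff > 1 as unsigned).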
Convention: %rax return value. %rbx callee saved. %rcx 4th argument. %rdx 3rd argument. %rsi 2nd argument. %rdi 1st argument. %rbp callee saved. %rsp stack pointer. %r8 5th argument. %r9 6th argument. %r10, %r11 caller saved. %r12, %r13, %r14, %r15 callee saved.
Switch statements use jump tables
Alignment - 1 byte: char, no restriction. 2 bytes: short, lowest bit of the address must be 0. 4 bytes: int, float, lowest two bits must be 00. 8 bytes: double, long, char *, lowest 3 bits must be 000. A struct has the alignment requirement of its most strictly aligned element.
Union in C allocates according to largest element.
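A small sketch for inspecting padding with offsetof and sizeof. The struct, union, and function names are made up for this example.

```c
#include <stddef.h>

struct mixed {
    char   c;   /* 1 byte, followed by padding so that d is aligned */
    double d;   /* must start at a multiple of its alignment */
    int    i;   /* followed by tail padding to keep arrays aligned */
};

union either {
    char   c;
    double d;   /* the largest member decides the size of the union */
};

size_t offset_of_d(void)   { return offsetof(struct mixed, d); }
size_t size_of_mixed(void) { return sizeof(struct mixed); }
size_t size_of_union(void) { return sizeof(union either); }
```

On x86-64 Linux one would expect 8, 24, and 8 respectively, although the exact numbers depend on the platform ABI.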
pushq Src fetches the operand at Src, decrements %rsp by 8, and writes the operand at the address given by %rsp.
call pushes return address and jumps to label. ret pops address and jumps to that address.
Caches
Static RAM (SRAM) is faster and more expensive than DRAM. It retains its value indefinitely as long as it is kept powered.
DRAM is slower, cheaper, and denser than SRAM. Its value must be constantly refreshed and it is sensitive to disturbances.
Both lose information if powered off.
A cache holds S * E * B data bytes: there are S = 2^s sets, E lines per set, and B = 2^b bytes per block.
The address of a word will have (from left to right) a tag of t bits, a set index of s bits, and a block offset of b bits.
Each line of the cache will have a valid bit, followed by the tag, followed by the data
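Splitting an address into tag, set index, and block offset can be sketched with shifts and masks. The function names are assumptions, and the sketch assumes s + b < 64.

```c
#include <stdint.h>

/* s = number of set index bits, b = number of block offset bits. */

/* Low b bits of the address. */
uint64_t block_offset(uint64_t addr, unsigned b) {
    return addr & ((1ULL << b) - 1);
}

/* The s bits just above the block offset. */
uint64_t set_index(uint64_t addr, unsigned s, unsigned b) {
    return (addr >> b) & ((1ULL << s) - 1);
}

/* Everything above the set index. */
uint64_t tag_bits(uint64_t addr, unsigned s, unsigned b) {
    return addr >> (s + b);
}
```

For address 0x1234 with s = 4 and b = 4, the block offset is 0x4, the set index is 0x3, and the tag is 0x12.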
Optimization
{ *xp += *yp; *xp += *yp; } is less efficient than { *xp += 2 * *yp; }. You would think the compiler would make them the same, but it can't, because if yp and xp point to the same location, the former leaves 4 times the original *xp and the latter 3 times. Similarly, consider { x = 1000; y = 3000; *q = y; *p = x; t1 = *q; }: t1 will equal 3000 unless q and p are the same, in which case it will equal 1000.
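The aliasing case can be checked directly. The names twiddle1 and twiddle2 are chosen here for illustration.

```c
/* Two reads of *yp and two updates of *xp. */
void twiddle1(long *xp, long *yp) {
    *xp += *yp;
    *xp += *yp;
}

/* One read of *yp and one update of *xp. */
void twiddle2(long *xp, long *yp) {
    *xp += 2 * *yp;
}
```

With distinct pointers both add twice *yp to *xp. When both arguments point to the same long holding 1, twiddle1 leaves 4 and twiddle2 leaves 3, which is why the compiler cannot substitute one for the other.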
Another case: you can't consider that x = f() + f() is the same as x = 2 * f() because what if f() has side effects like updating a global variable?
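A tiny sketch of the side-effect case, assuming f() bumps a global counter. The helper names are hypothetical.

```c
static int counter = 0;

/* Side effect: returns the current count, then increments it. */
int f(void) {
    return counter++;
}

void reset_counter(void)   { counter = 0; }
int sum_of_two_calls(void) { return f() + f(); }  /* calls f twice */
int twice_one_call(void)   { return 2 * f(); }    /* calls f once */
```

Starting from counter == 0, sum_of_two_calls() yields 0 + 1 = 1 while twice_one_call() yields 2 * 0 = 0, so the two forms are not interchangeable.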
Eliminating loop inefficiencies with code motion: Instead of e.g. checking a vector's length every iteration, check it once and store it in an integer
if that result does not change.
Eliminating unneeded memory references: If you have a loop like while (blah blah) { *a += i; } change it to something like int d = *a; while (blah blah) { d += i; } *a = d;
Loop unrolling: Step through a loop several elements at a time.
Enhancing parallelism: Combine loop unrolling with several local accumulators, so that independent operations can overlap.
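A sketch that puts these loop optimizations together on a simple sum. The function names and the choice of 2x unrolling are assumptions for illustration only.

```c
#include <stddef.h>

/* Baseline: re-reads *len and updates *dest in memory every iteration. */
void sum_baseline(const long *v, const size_t *len, long *dest) {
    *dest = 0;
    for (size_t i = 0; i < *len; i++)
        *dest += v[i];
}

/* Code motion + local accumulators + 2x unrolling. */
void sum_optimized(const long *v, const size_t *len, long *dest) {
    size_t n = *len;           /* code motion: read the length once */
    long acc0 = 0, acc1 = 0;   /* accumulate in locals, not through dest */
    size_t i = 0;
    for (; i + 1 < n; i += 2) {   /* two elements per iteration */
        acc0 += v[i];
        acc1 += v[i + 1];
    }
    for (; i < n; i++)            /* leftover element when n is odd */
        acc0 += v[i];
    *dest = acc0 + acc1;          /* single store at the end */
}
```

Both versions give the same sum as long as dest and len do not point into v; if they did, the aliasing problem described earlier would apply.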
Linking
C source files are individually compiled into .o relocatable object files, then a linker turns them into an executable program.
1. Symbol resolution. Each object file has a symbol table, which is an array of structs with the name, size, and location of each symbol. During symbol resolution, the linker associates each symbol reference with exactly one symbol definition.
2. Relocation. Merges the code and data sections and relocates symbols from their positions in the .o files to their final addresses in the executable.
Three kinds of object file:
Relocatable (.o) comes from just one .c file
Executable
Shared object file (.so), which can be loaded into memory and linked dynamically
ELF = standard binary format for object files, including all three of the above
ELF header, .text (code), .rodata (read-only data, jump tables), .data (initialized global variables), .bss (uninitialized global variables), .symtab (symbol table, procedure and static variable names), .rel.text (relocation info for .text section), .rel.data (relocation info for .data section), .debug, section header table
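Static libraries: The linker looks through the member files of the .a archive and links into the executable each one that resolves a still-undefined symbol.
When you run gcc main.o whatever.a you have to list them in that order, because the linker scans the files left to right and only pulls in archive members that resolve references it has already seen.
typedef names are not part of the symbol table.
Shared libraries (aka DLLs) are linked dynamically, either when the program is loaded or at run time using the dlopen() function.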
ECF
Exceptional control flow exists at all levels of a computer system
1. Exceptions - implemented using combination of hardware and OS software
2. Process context switch - implemented by OS software and hardware timer
3. Signals - implemented by OS software
4. Nonlocal jumps - implemented by C runtime library
A process will be terminated by receiving a signal whose default action is to terminate, returning from the main function, or calling the exit function.
Fork function is called once but returns twice. It returns 0 to the child, and child's PID to the parent.
Even when a process terminates, it still consumes system resources until it is reaped (a terminated but unreaped process is a zombie).
The parent must reap it by using wait or waitpid:
pid_t wait(int *child_status)
pid_t waitpid(pid_t pid, int *child_status, int options)
The int that child_status points to will be set to a value that shows the reason the child terminated. You can use WIFEXITED and WEXITSTATUS to get this info.
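A minimal sketch of fork followed by reaping with waitpid, assuming a POSIX system. The function name run_child_and_reap is made up for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with `code`, reaps it, and returns the
 * child's exit status, or -1 on error or abnormal termination. */
int run_child_and_reap(int code) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */
    if (pid == 0)
        _exit(code);                /* child: fork returned 0 */

    /* parent: fork returned the child's PID */
    int child_status;
    if (waitpid(pid, &child_status, 0) < 0)
        return -1;
    if (WIFEXITED(child_status))
        return WEXITSTATUS(child_status);
    return -1;                      /* e.g. the child was killed by a signal */
}
```

Calling run_child_and_reap(7) should return 7 once the child has been reaped, leaving no zombie behind.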
Malloc
Performance goal #1: Throughput, number of completed requests per unit time. If there are 5000 malloc calls and 5000 free calls in 10 seconds
the throughput is 1000 operations/second.
Performance goal #2: Memory utilization, the ratio of the aggregate payload to the size of the heap. Higher is better.
3. When allocating a structure that is smaller than the free block it is placed in, what do we do with the extra space?
4. How do we pick a block to use for allocation?
5. How do we reinsert a freed block?
Virtual memory
The CPU sends a virtual address to the MMU which converts it into a physical address in main memory.
Virtual memory is stored on disk and cached in DRAM (physical memory).
DRAM is about 10x slower than SRAM; disk is about 10,000 times slower than DRAM.
The miss penalty is therefore enormous, so the DRAM cache is fully associative.
A page table is an array of page table entries (PTEs) which map virtual pages to physical pages
Each process has its own virtual address space; different processes can map the same physical page, for example for read-only code.
Virtual address space, from highest address to lowest: kernel virtual memory, user stack (grows down), memory-mapped region for shared libraries, heap (grows up), read/write segment (.data, .bss), read-only segment (.init, .text, .rodata).
PTEs have permission bits of SUP, READ, WRITE, and EXEC
Address translation:
P = 2^p: page size
N = 2^n: number of addresses in virtual address space
M = 2^m: number of addresses in physical address space
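With page size P = 2^p, a virtual address splits into a virtual page number (upper bits) and a page offset (lower p bits). A sketch of that split follows; the function names, and the ppn argument standing in for the physical page number found in the PTE, are assumptions.

```c
#include <stdint.h>

/* Virtual page number: the address bits above the low p bits. */
uint64_t vpn(uint64_t va, unsigned p) {
    return va >> p;
}

/* Virtual page offset: the low p bits, unchanged by translation. */
uint64_t vpo(uint64_t va, unsigned p) {
    return va & ((1ULL << p) - 1);
}

/* Physical address: physical page number joined with the page offset. */
uint64_t physical_address(uint64_t ppn, uint64_t va, unsigned p) {
    return (ppn << p) | vpo(va, p);
}
```

With p = 12 (4 KB pages), virtual address 0x12345678 has VPN 0x12345 and offset 0x678; if the page table mapped that page to PPN 0xabc, the physical address would be 0xabc678.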
TLB hit: CPU sends virtual address to MMU. MMU sends VPN to TLB. TLB sends PTE back to MMU. MMU sends physical address to
memory. Memory sends data to CPU.
TLB miss: MMU has to get the PTE from main memory before it can get the physical address.
Pthreads interface:
pthread_create(thread id, thread attributes (usually NULL), thread routine, thread arguments)
pthread_join(thread id, return value) < reaps
pthread_self() < determines one's thread id
pthread_cancel(), pthread_exit() < terminate a thread; exit() < terminates all threads
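A minimal create/join sketch using the calls listed above. The routine square, the wrapper square_all, and the limit of 16 threads are invented for this example. Older toolchains may need -pthread when compiling.

```c
#include <pthread.h>

/* Thread routine: squares the long that its argument points to. */
static void *square(void *arg) {
    long *value = arg;
    *value = *value * *value;
    return NULL;
}

/* Squares each of n values in its own thread, then reaps every thread.
 * Returns 0 on success, -1 on error. */
int square_all(long *values, int n) {
    pthread_t tids[16];
    if (n < 0 || n > 16)
        return -1;
    int created = 0, rc = 0;
    for (; created < n; created++) {
        if (pthread_create(&tids[created], NULL, square, &values[created]) != 0) {
            rc = -1;
            break;
        }
    }
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);   /* reap; the return value is ignored */
    return rc;
}
```

Each thread works on its own array element, so no synchronization is needed here beyond the joins.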
Global variables are variables declared outside of a function. Virtual memory contains exactly one instance of any global variable.
Local variables are declared inside a function without the static attribute. Each thread stack contains one instance.
Local static variables are declared inside a function with the static attribute. Like globals, virtual memory contains exactly one instance.
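Semaphores: non-negative integer variables manipulated by P and V operations
P(s): wait until s is nonzero, then decrement s by 1.
V(s): increment s by 1, waking up one thread blocked in P(s), if any.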
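A sketch of a semaphore used as a mutex around a shared counter, assuming POSIX unnamed semaphores as found on Linux, where sem_wait plays the role of P and sem_post the role of V. All names below are hypothetical.

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t mutex;          /* binary semaphore, initialized to 1 */
static long shared_count;    /* global: one instance shared by all threads */

/* Thread routine: increments shared_count n times under the semaphore. */
static void *add_many(void *arg) {
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        sem_wait(&mutex);    /* P(mutex) */
        shared_count++;
        sem_post(&mutex);    /* V(mutex) */
    }
    return NULL;
}

/* Runs two threads that each add n; returns the final count or -1 on error. */
long count_with_two_threads(long n) {
    pthread_t a, b;
    shared_count = 0;
    if (sem_init(&mutex, 0, 1) != 0)
        return -1;
    if (pthread_create(&a, NULL, add_many, &n) != 0) {
        sem_destroy(&mutex);
        return -1;
    }
    if (pthread_create(&b, NULL, add_many, &n) != 0) {
        pthread_join(a, NULL);
        sem_destroy(&mutex);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    sem_destroy(&mutex);
    return shared_count;
}
```

Without the P/V pair around shared_count++, the two threads could interleave their load, add, and store steps and lose updates.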
Thread safety: A function is thread-unsafe if
- it does not protect shared variables
- it keeps state across multiple invocations
- it returns a pointer to a static variable
- it calls a thread-unsafe function
A function is reentrant if it accesses no shared variables. Reentrant functions are a subcategory of thread-safe functions. Another way to describe a reentrant function: all of its variables are stored in its stack frame.
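A sketch contrasting a function that keeps state across invocations with a reentrant version where the caller owns the state. The generator constants and function names are illustrative only, not any particular library's implementation.

```c
/* Thread-unsafe: keeps state in a static variable across invocations. */
static unsigned int hidden_state = 1;

unsigned int next_unsafe(void) {
    hidden_state = hidden_state * 1103515245u + 12345u;
    return (hidden_state / 65536u) % 32768u;
}

/* Reentrant: no shared variables; the state is supplied by the caller. */
unsigned int next_reentrant(unsigned int *state) {
    *state = *state * 1103515245u + 12345u;
    return (*state / 65536u) % 32768u;
}
```

Two threads that each keep their own state variable can call next_reentrant concurrently and get reproducible sequences, whereas concurrent calls to next_unsafe race on hidden_state.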