The Complete Friday Q&A: Volume III
About this ebook

The Complete Friday Q&A is a collection of articles on advanced topics in macOS and iOS programming. With articles on multithreading, assembly language, debugging, Objective-C, and more, this book is your gateway to becoming fluent in complicated, obscure, and arcane corners of Mac and iOS programming.
Language: English
Publisher: Lulu.com
Release date: Sep 19, 2017
ISBN: 9781387241026

    About The Complete Friday Q&A: Volume III

    Friday Q&A is a biweekly series on Mac programming. It can be found online at https://mikeash.com/pyblog/. Volume III is a full archive of all posts from December 2012 to April 2016.

    The author gratefully acknowledges all of the topic and comment contributions to Friday Q&A from its readers.

    The Complete Friday Q&A: Volume III Copyright © 2012-2017 by Michael Ash

    Mike Ash

    mike@mikeash.com

    https://mikeash.com/

    Introduction

    It's been a long time since the release of The Complete Friday Q&A: Volume I, and the result is a massive backlog of articles. The contents of this volume were originally intended to be part of Volume II, but the result would have been too large. Instead, I'm releasing Volumes II and III simultaneously, which together contain the articles in question.

    Like Volume II, Volume III contains some articles from guest authors. Landon Fuller, Matthew Elton, and Gwynne Raskind contributed articles for this volume, and I'm delighted to present their articles next to mine. Their articles are indicated by bylines under the article title. Articles without bylines are my own.

    I hope you enjoy the unusual and occasionally absurd programming content collected here. As always, if you have an idea for a topic that you'd like to see covered in Friday Q&A, send it in!

    Acknowledgements

    Special thanks go to Landon Fuller, Matthew Elton, and Gwynne Raskind, who contributed articles included in this book. Letting someone write for your blog is like letting them stay in your house and borrow your car, and they lived up to the trust I placed in them and then some.

    I would like to thank my reviewers, whose valuable input dramatically improved this book. They are: Harry Jordan, Steven Vandeweghe, Matthias Neeracher, Phil Holland, Alex Blewitt, Landon Fuller, Joshua Pokotilow, and Cédric Luthi.

    I would also like to thank everyone who contributed the topic ideas used throughout this book. Their names can be found at the beginning of each chapter.

    Finally, I would like to thank everyone who has commented on one of my posts, e-mailed about Friday Q&A, or merely read it. No matter what your contribution, it is appreciated.

    Dedication

    The Complete Friday Q&A Volumes II and III are dedicated to the memory of my friend and fellow glider club member Steve Zaboji. Steve was killed in a plane crash the same afternoon that I received the proofs of these books. He was a central part of the club and will be deeply missed.

    Friday Q&A 2012-12-14:

    Objective-C Pitfalls

    Related Articles

    Windows and Window Controllers

    Proper Use of Asserts

    Let's Build stringWithFormat:

    Swifty Target/Action

    Performance Comparisons of Common Operations, 2016 Edition

    Objective-C is a powerful and extremely useful language, but it's also a bit dangerous. For today's article, my colleague Chris Denter suggested that I talk about pitfalls in Objective-C and Cocoa, inspired by Cay S. Horstmann's article on C++ pitfalls.

    Introduction

    I'll use the same definition as Horstmann: a pitfall is code that compiles, links, and runs, but doesn't do what you might expect it to. He provides this example, which is just as problematic in Objective-C as it is in C++:

        if(-0.5 <= x <= 0.5)
            return 0;

    A naive reading of this code would be that it checks to see whether x is in the range [-0.5, 0.5]. However, that's not the case. Instead, the comparison gets evaluated like this:

        if((-0.5 <= x) <= 0.5)

    In C, the value of a comparison expression is an int, either 0 or 1, a legacy from when C had no built-in boolean type. It is that 0 or 1, not the value of x, that is compared with 0.5. In effect, the second comparison works as an extremely weirdly phrased negation operator, such that the if statement's body will execute if and only if x is less than -0.5.
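
    For completeness, the check the author of that code presumably wanted spells out both comparisons with an explicit &&:

        if(-0.5 <= x && x <= 0.5)
            return 0;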

    Nil Comparison

    Objective-C is highly unusual in that sending messages to nil does nothing and simply returns 0. In nearly every other language you're likely to encounter, the equivalent is either prohibited by the type system or produces a runtime error. This can be both good and bad. Given the subject of the article, we'll concentrate on the bad.

    First, let's look at equality testing:

        [nil isEqual: @"string"]

    Messaging nil returns 0, which in this case is equivalent to NO. That happens to be the correct answer here, so we're off to a good start! However, consider this:

        [nil isEqual: nil]

    This also returns NO. It doesn't matter that the argument is the exact same value. The argument's value doesn't matter at all, because messages to nil always return 0 no matter what. So going by isEqual:, nil never equals anything, including itself. Mostly right, but not always.

    Finally, consider one more permutation with nil:

        [@"string" isEqual: nil]

    What does this do? Well, we can't be sure. It may return NO. It may throw an exception. It may simply crash. Passing nil to a method that doesn't explicitly say it's allowed is a bad idea, and isEqual: doesn't say that it accepts nil.

    Many Cocoa classes also include a compare: method. This takes another object of the same class and returns either NSOrderedAscending, NSOrderedSame, or NSOrderedDescending, to indicate less than, equal, or greater than.

    What happens if we compare with nil?

        [nil compare: nil]

    This returns 0, which happens to be equal to NSOrderedSame. Unlike isEqual:, compare: thinks nil equals nil. Handy! However:

        [nil compare: @"string"]

    This also returns NSOrderedSame, which is definitely the wrong answer. compare: will consider nil to be equal to anything and everything.

    Finally, like isEqual:, passing nil as the parameter is a bad idea:

        [@"string" compare: nil]

    In short, be careful with nil and comparisons. It really doesn't work right. If there's any chance your code will encounter nil, you must check for and handle it separately before you start doing isEqual: or compare:.
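
    One way to package that check is a small nil-safe helper, sketched here (MAEqualObjects is a made-up name, not a Cocoa function):

        static BOOL MAEqualObjects(id a, id b)
        {
            // Identical pointers, including nil and nil, are always equal.
            if(a == b)
                return YES;

            // Only send isEqual: when both sides are known to be non-nil.
            return a != nil && b != nil && [a isEqual: b];
        }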

    Hashing

    You write a little class to contain some data. You have multiple equivalent instances of this class, so you implement isEqual: so that those instances will be treated as equal. Then you start adding your objects to an NSSet and things start behaving strangely. The set claims to hold multiple objects after you just added one. It can't find stuff you just added. It may even crash or corrupt memory.

    This can happen if you implement isEqual: but don't implement hash. A lot of Cocoa code requires that if two objects compare as equal, they will also have the same hash. If you only override isEqual:, you violate that requirement. Any time you override isEqual:, always override hash at the same time. For more information, see my article on Implementing Equality and Hashing.
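
    As a minimal sketch, a hypothetical MAPair class with two integer ivars, _first and _second, might pair the two methods like this. The key point is that hash is derived from exactly the fields isEqual: compares:

        - (BOOL)isEqual: (id)other
        {
            if(![other isKindOfClass: [MAPair class]])
                return NO;

            MAPair *pair = other;
            return _first == pair->_first && _second == pair->_second;
        }

        - (NSUInteger)hash
        {
            // Equal objects must return equal hashes, so combine the same fields.
            return _first * 31 + _second;
        }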

    Macros

    Imagine you're writing some unit tests. You have a method that's supposed to return an array containing a single object, so you write a test to verify that:

        STAssertEqualObjects([obj method], @[ @"expected" ], @"Didn't get the expected array");

    This uses the new literals syntax to keep things short. Nice, right?

    Now we have another method that returns two objects, so we write a test for that:

        STAssertEqualObjects([obj methodTwo], @[ @"expected1", @"expected2" ], @"Didn't get the expected array");

    Suddenly, the code fails to compile and produces completely bizarre errors. What's going on?

    What's going on is that STAssertEqualObjects is a macro. Macros are expanded by the preprocessor, and the preprocessor is an ancient and fairly dumb program that doesn't know anything about modern Objective-C syntax, or for that matter modern C syntax. The preprocessor splits macro arguments on commas. It's smart enough to know that parentheses can nest, so this is seen as three arguments:

        Macro(a, (b, c), d)

    Where the first argument is a, the second is (b, c), and the third is d. However, the preprocessor has no idea that it should do the same thing for [] and {}. With the above macro, the preprocessor sees four arguments:

        [obj methodTwo]
        @[ @"expected1"
        @"expected2" ]
        @"Didn't get the expected array"

    This results in completely mangled code that not only doesn't compile, but confuses the compiler so much that it can't provide understandable diagnostics. The solution is easy, once you know what the problem is. Parenthesize the literal so the preprocessor treats it as one argument:

        STAssertEqualObjects([obj methodTwo], (@[ @"expected1", @"expected2" ]), @"Didn't get the expected array");

    Unit tests are where I've run into this most frequently, but it can pop up any time there's a macro. Objective-C literals will fall victim, as will C compound literals. Blocks can also be problematic if you use the comma operator within them, which is rare but legal. You can see that Apple thought about this problem with their Block_copy and Block_release macros in /usr/include/Block.h:

    #define Block_copy(...) ((__typeof(__VA_ARGS__))_Block_copy((const void *)(__VA_ARGS__)))

    #define Block_release(...) _Block_release((const void *)(__VA_ARGS__))

    These macros conceptually take a single argument, but they're declared to take variable arguments to avoid this problem. By taking ... and using __VA_ARGS__ to refer to the argument, multiple arguments with commas are reproduced in the macro's output. You can take the same approach to make your own macros safe from this problem, although it only works on the last argument of a multi-argument macro.
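
    For example, a homemade assertion macro (MyAssertNotNil is a made-up name) can absorb embedded commas by taking ... as its final, conceptually single argument:

        #define MyAssertNotNil(...) \
            do { \
                if((__VA_ARGS__) == nil) \
                    NSLog(@"Assertion failed: %s is nil", #__VA_ARGS__); \
            } while(0)

        // Compiles fine even though the literal contains a comma:
        MyAssertNotNil(@[ @"a", @"b" ]);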

    Property Synthesis

    Take the following class:

    @interface

    MyClass

    :

    NSObject

    {

    NSString

    *

    _myIvar

    ;

    }

    @property

    (

    copy

    )

    NSString

    *

    myIvar

    ;

    @end

    @implementation

    MyClass

    @synthesize

    myIvar

    ;

    @end

    Nothing wrong with this, right? The ivar declaration and @synthesize are a little redundant in this modern age, but do no harm.

    Unfortunately, this code will silently ignore _myIvar and synthesize a new variable called myIvar, without the leading underscore. If you have code that uses the ivar directly, it will see a different value from code that uses the property. Confusion!

    The rules for @synthesize variable names are a little weird. If you specify a variable name with @synthesize myIvar = _myIvar;, then of course it uses whatever you specify. If you leave out the variable name, then it synthesizes a variable with the same name as the property. If you leave out @synthesize altogether, then it synthesizes a variable with the same name as the property, but with a leading underscore.

    Unless you need to support 32-bit Mac, your best bet these days is to avoid explicitly declaring backing ivars for properties. Let @synthesize create the variable, and if you get the name wrong, you'll get a nice compiler error instead of mysterious behavior.
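
    Rewriting the example class that way is a small change (a sketch; the myIvarLength method exists only to show direct ivar access):

        @interface MyClass : NSObject

        @property (copy) NSString *myIvar;

        @end

        @implementation MyClass

        // No explicit ivar and no @synthesize: the compiler generates _myIvar,
        // and misspelling that name anywhere is a compile-time error.
        - (NSUInteger)myIvarLength
        {
            return [_myIvar length];
        }

        @end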

    Interrupted System Calls

    Cocoa code usually sticks to higher level constructs, but sometimes it's useful to drop down a bit and do some POSIX. For example, this code will write some data to a file descriptor:

        int fd;
        NSData *data = ...;

        const char *cursor = [data bytes];
        NSUInteger remaining = [data length];
        while(remaining > 0)
        {
            ssize_t result = write(fd, cursor, remaining);
            if(result < 0)
            {
                NSLog(@"Failed to write data: %s (%d)", strerror(errno), errno);
                return;
            }

            remaining -= result;
            cursor += result;
        }

    However, this can fail, and it will fail strangely and intermittently. POSIX calls like this can be interrupted by signals. Even harmless signals handled elsewhere in the app like SIGCHLD or SIGINFO can cause this. SIGCHLD can occur if you're using NSTask or are otherwise working with subprocesses. When write is interrupted by a signal, it returns -1 and sets errno to EINTR to indicate that the call was interrupted. The above code treats all errors as fatal and will bail out, even though the call just needs to be tried again. The correct code checks for that separately and retries the call:

        while(remaining > 0)
        {
            ssize_t result = write(fd, cursor, remaining);
            if(result < 0 && errno == EINTR)
            {
                continue;
            }
            else if(result < 0)
            {
                NSLog(@"Failed to write data: %s (%d)", strerror(errno), errno);
                return;
            }

            remaining -= result;
            cursor += result;
        }
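
    If you do this in more than one place, the pattern is worth wrapping in a small helper. This is a sketch; write_all is a made-up name, not a system call:

        #include <errno.h>
        #include <unistd.h>

        // Write the entire buffer, retrying whenever the call is interrupted.
        // Returns 0 on success, or -1 on a real error with errno set.
        static int write_all(int fd, const void *buf, size_t len)
        {
            const char *cursor = buf;
            while(len > 0)
            {
                ssize_t result = write(fd, cursor, len);
                if(result < 0)
                {
                    if(errno == EINTR)
                        continue;
                    return -1;
                }

                len -= result;
                cursor += result;
            }
            return 0;
        }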

    String Lengths

    The same string, represented differently, can have different lengths. This is a relatively common but incorrect pattern:

        write(fd, [string UTF8String], [string length]);

    The problem is that NSString computes length in terms of UTF-16 code units, while write wants a count of bytes. While the two numbers are equal when the string only contains ASCII (which is why people so frequently get away with writing this incorrect code), they're no longer equal once the string contains non-ASCII characters such as accented characters. Always compute the length of the same representation you're manipulating:

        const char *cStr = [string UTF8String];
        write(fd, cStr, strlen(cStr));
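
    Another option is to go through NSData, which carries its byte count along with its bytes:

        NSData *utf8Data = [string dataUsingEncoding: NSUTF8StringEncoding];
        write(fd, [utf8Data bytes], [utf8Data length]);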

    Casting to BOOL

    Take this bit of code that checks to see whether an object pointer is nil:

        - (BOOL)hasObject
        {
            return (BOOL)_object;
        }

    This works... usually. However, roughly 6% of the time, it will return NO even though _object is not nil. What gives?

    The BOOL type is, unfortunately, not a boolean. Here's how it's defined:

        typedef signed char BOOL;

    This is another bit of legacy from the days when C had no boolean type. Cocoa predates C99's _Bool, so it defines its boolean type as a signed char, which is an 8-bit integer. When you cast a pointer to an integer, you get the numeric value of that pointer. When you cast a pointer to a small integer, you just get the numeric value of the lower bits of that pointer. When the pointer looks like this:

        ...110011001110000

    The BOOL gets this:

        01110000

    This is not 0, meaning that it evaluates as true, so what's the problem? The problem is when the pointer looks like this:

        ...110011000000000

    Then the BOOL gets this:

        00000000

    This is 0, also known as NO, even though the pointer wasn't nil. Oops!

    How often does this happen? There are 256 possible values in the BOOL, only one of which is NO, so we'd naively expect it to happen about 1/256 of the time. However, Objective-C objects are placed on aligned addresses, normally aligned to 16 bytes. This means that the bottom four bits of the pointer are always zero (something that tagged pointers take advantage of) and there are only four bits of freedom in the resulting BOOL. The odds of getting all zeroes there are about 1/16, or about 6%.

    To safely implement this method, perform an explicit comparison against nil:

        - (BOOL)hasObject
        {
            return _object != nil;
        }

    If you want to get clever and unreadable, you can also use the ! operator twice. This !! construct is sometimes referred to as C's convert to boolean operator, although it's built from parts:

        - (BOOL)hasObject
        {
            return !!_object;
        }

    The first ! produces 1 or 0 depending on whether _object is nil, but backwards. The second ! then puts it right, resulting in 1 if _object is not nil, and 0 if it is.

    You should probably stick to the != nil version.

    Missing Method Argument

    Let's say you're implementing a table view data source. You add this to your class's methods:

        - (id)tableView: (NSTableView *)objectValueForTableColumn: (NSTableColumn *)aTableColumn row: (NSInteger)rowIndex
        {
            return [dataArray objectAtIndex: rowIndex];
        }

    Then you run your app and NSTableView complains that you haven't implemented this method. But it's right there!

    As usual, the computer is correct. The computer is your friend.

    Look closer. The first parameter is missing. Why does this even compile?

    It turns out that Objective-C allows empty selector segments. The above does not declare a method named tableView:objectValueForTableColumn:row: with a missing argument name. It declares a method named tableView::row:, and the first argument is named objectValueForTableColumn. This is a particularly nasty way to typo the name of a method, and if you do it in a context where the compiler can't warn you about the missing method, you may be trying to debug it for a long time.
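
    For reference, the intended data source method gives every parameter a name:

        - (id)tableView: (NSTableView *)aTableView objectValueForTableColumn: (NSTableColumn *)aTableColumn row: (NSInteger)rowIndex
        {
            return [dataArray objectAtIndex: rowIndex];
        }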

    Conclusion

    Objective-C and Cocoa have plenty of pitfalls ready to trap the unwary programmer. The above is just a sampling. However, it's a good list of things to be careful of.

    Friday Q&A 2012-12-28:

    What Happens When You Load a Byte of Memory

    Related Articles

    ARM64 and You

    Why Registers Are Fast and RAM Is Slow

    When an Autorelease Isn't

    A Heartbleed-Inspired Paranoid Memory Allocator

    Let's Build NSZombie

    Swift Struct Storage

    The hardware and software that our apps run on is almost frighteningly complicated, and there's no better place to see that than in the contortions that the system goes through when we load data from memory. What exactly happens when we load a byte of memory? Reader and friend of the blog Guy English suggested I dedicate an article to answering that question.

    Code

    Let's start with the code that loads the byte of memory. In C, it would look something like this:

    char

    *

    addr

    =

    ...;

    char

    value

    =

    *

    addr

    ;

    On x86-64, this compiles to something like:

        movsbl (%rdi), %eax

    This instructs the CPU to load the byte located at the address stored in %rdi into the %eax register. On ARM, the compiler produces:

        ldrsb.w r0, [r0]

    Although the instruction name is different, the effect is the same. It loads the byte located at the address stored in r0, and puts the value into r0. (The compiler is reusing r0 here, since the address isn't needed anymore.)

    Now that the CPU has its instruction, the software is done. Well, maybe.

    Instruction Decoding and Execution

    I don't want to go too in depth with how the CPU actually executes code in general. In short, the CPU loads the above instruction from memory and decodes it to figure out the opcode and operands. Once the CPU sees that incoming instruction is a load, it issues the memory load for the appropriate address.

    Virtual Memory

    On most hardware you're likely to program for today, and on any Apple platform from the past couple of decades, the system uses virtual memory. In short, virtual memory disconnects the memory addresses seen by your program from the physical memory addresses of the actual RAM in your computer. In other words, when your program accesses address 42, that might actually access the physical RAM address 977305.

    This mapping is done by page. Each page is a 4kB chunk of memory. The overhead of tracking virtual address mappings for every byte in memory would be far too great, so pages are mapped instead. They're small enough to provide decent granularity, but large enough to not incur too much overhead in maintaining the mapping.

    Modern virtual memory systems also have the ability to set permissions on a page. A page may be readable, writeable, or executable, or some combination thereof. If the program tries to do something with a page that it isn't allowed to do, or tries to access a page that has no mapping at all, the program is suspended and a fault is raised with the operating system. The OS can then take further action, such as killing the program and generating a crash report, which is what happens when you experience the common EXC_BAD_ACCESS error.
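
    As a small demonstration (a sketch using the standard POSIX mmap call), mapping a page read-only and then writing to it provokes exactly this kind of fault:

        #include <sys/mman.h>

        int main(void)
        {
            // Map one anonymous page with read-only permission.
            char *page = mmap(NULL, 4096, PROT_READ, MAP_ANON | MAP_PRIVATE, -1, 0);

            char c = page[0]; // reading is allowed
            (void)c;
            page[0] = 'x';    // writing is not: the MMU raises a fault, which
                              // the OS reports as EXC_BAD_ACCESS
            return 0;
        }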

    The hardware that handles this work is called the Memory Management Unit, or MMU. The MMU intercepts all memory accesses and remaps the address according to the current page mappings.

    The first thing that happens when the CPU loads a byte of memory is to hand the address to the MMU for translation. (This is not always true. On some CPUs, there is a layer of cache that comes before the MMU. However, the overall principle remains.)

    The first thing the MMU does with the address is slice off the bottom 12 bits, leaving a plain page address. 2¹² equals 4096, so the bottom 12 bits describe the address's location within its page. Once the rest of the address is remapped, the bottom 12 bits can be added on to generate the full physical address.
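
    In code terms, the split looks like this (an illustration of the arithmetic, not anything the MMU literally runs):

        #include <stdint.h>

        // Split a virtual address into its 4kB page number and the offset
        // within that page.
        static void split_address(uintptr_t addr, uintptr_t *page, uintptr_t *offset)
        {
            *offset = addr & 0xFFF; // low 12 bits: position within the page
            *page = addr >> 12;     // remaining bits: virtual page number
        }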

    With the page address in hand, the MMU consults the Translation Lookaside Buffer, or TLB. The TLB is a cache for page mappings. If the page in question has been accessed recently, the TLB will remember the mapping, and quickly return the physical page address, at which point the MMU's work is done.

    When the TLB does not contain an entry for the given page, this is called a TLB miss, and the entry must be found by searching the entire page table. The page table is a chunk of memory that describes every page mapping in the current process. Most commonly, the page table is laid out in memory by the OS in a special format that the MMU can understand directly. Following a TLB miss, the MMU searches the page table for the appropriate entry. If it finds one, it loads it into the TLB and performs the remapping.

    On some architectures, the page table mapping is left entirely up to the OS. When a TLB miss occurs, the CPU passes control to the OS, which is then responsible for looking up the mapping and filling the TLB with it. This is more flexible but much slower, and isn't found much in modern hardware.

    If no entry is found in the page table, that means the given address doesn't exist in RAM at all. The CPU informs the OS, which then decides how to handle the situation. If the OS doesn't think that address is valid, it terminates the program and you get an EXC_BAD_ACCESS. In some cases, the OS does think the address is valid, but just doesn't have the data in RAM. This can happen if the data has been swapped out to disk, is part of a memory mapped file, or is freshly allocated with backing memory being provided on demand. In these cases, the OS loads the appropriate data into RAM, adds an entry to the page table, and then lets the MMU translate the virtual address into a physical address now that the backing data is available.

    Cache

    With the address in hand, the CPU consults its memory cache. In days of yore, the CPU would talk directly to RAM. However, CPU speeds have increased faster than memory speeds, and that's no longer practical. If a modern CPU had to talk directly to modern RAM for every memory access, our computers would slow to a relative crawl.

    The cache is a hardware map from a set of memory addresses to memory contents. Caches are organized into cache lines, which are typically in the region of 32-128 bytes each. Each entry in the cache holds an address and a single cache line corresponding to that address. When loading data, the CPU checks to see if the requested address exists in the cache, and if so, returns the appropriate data from that address's cache line.

    There are typically several levels of cache. Due to hardware design constraints, larger caches are necessarily slower. By having multiple levels, a small, fast cache can be checked first, with slower, larger caches used later to avoid the cost of fetching from RAM. The CPU first checks with the L1 cache, which is the first level. This cache is small, typically around 16-64kB. If it contains the data in question, then the memory load is complete! Since that's boring, we'll assume the caches don't contain the data being loaded here.

    Next up is the L2 cache. This is bigger, generally anywhere from 256kB to several megabytes. In some CPUs, the L2 cache is the last level, and these typically have larger L2 caches. Other CPUs have an L3 cache as well, in which case the L2 is usually smaller, and it's supplemented by a large L3 cache, usually several megabytes, with some high performance chips having up to 20MB of L3 cache.

    Once all levels of cache have been tried, if none of them contain the necessary data, it's time to try main memory. Because caches work with entire cache lines, the entire cache line is loaded from main memory at once, even though we're only loading a single byte. This greatly increases efficiency in the common case of accessing other nearby memory, since subsequent nearby loads can come from cache, at the cost of wasting time when memory use is scattered.

    Memory

    It's finally time to start querying RAM. The CPU has been waiting quite a while by this point, and will have to wait a long time more before it gets the data it wants.

    The load is handed off to the memory controller, which is the bit of hardware that actually knows how to talk to RAM. On a lot of modern hardware, the memory controller is integrated directly into the CPU, while on some systems it's part of a separate chip called the northbridge.

    The memory controller then starts loading data from RAM. Modern SDRAM transfers 64 bits of data at a time, so several transfers have to be done to fill the entire cache line being requested.

    The memory controller places the load address on the address pins of the RAM and waits for the data to be returned. Internally, the RAM uses the values on the address pins to activate a row of memory cells, whose contents are then exposed on the RAM's output pins.

    RAM is not instantaneous, and there's an appreciable delay between when the memory controller requests an address and when the data is available, on the order of 10 nanoseconds in current hardware. It takes more time to perform the subsequent loads needed for the cache line, but the loads can be pipelined, so total transfer time is maybe 50% more.

    As the memory controller obtains data from RAM, it hands that data back to the caches, which store it in case other data from the same cache line is needed soon. Finally, the requested byte is handed to the CPU, which places the data into the register requested by the instruction. At last, after all of this work, the CPU can get on with running the code that needed that byte of data.

    Consequences

    There are a lot of practical consequences that result from how all of this stuff works. In particular, memory access is slow, relatively speaking. It's amazing that your computer can do all of the above work literally tens of millions of times per second, but it can do other things literally billions of times per second. Everything is relative.

    The total time required for all of this, assuming a TLB hit (the fast case for the MMU) is a couple of dozen nanoseconds. On a 2GHz CPU, that could mean something like 50 clock cycles with the potential to execute perhaps 150 instructions in that time. That's a lot. A TLB miss may double or triple this latency number.

    Modern CPUs are pipelined and parallelized. This means that they will likely see the need for the memory read ahead of time and initiate the load at that point, softening the blow. Parallel execution means that the CPU will probably be able to continue executing some code after the load instruction while waiting for the load, especially code that doesn't depend on the loaded value. However, this stuff has limits, and finding 150 instructions that can be executed while waiting for RAM is a tall order. You're almost certain to hit a point where program execution has to stop and wait for the memory load to complete.

    Incidentally, this is where hyperthreading gains its advantage. Instead of having an entire CPU core just idle while waiting for RAM, hyperthreading lets it switch over to a completely different thread of execution and run code from that instead, so that it can still get useful work done while it waits.

    Access patterns are key to performance. Discussions about micro-optimization tend to center on using some instructions rather than others, avoiding divisions, etc. Relatively few talk about memory access patterns. However, it doesn't matter how optimized your individual instructions are if they're operating on memory that's loaded in a way that isn't kind to the memory system. Saving a few cycles here and there is meaningless if you're waiting dozens of cycles for every new piece of data to load. For example, this is why, although it's the more natural way to express it, you should never write loops to access image data like this:

        for(int x = 0; x < width; x++)
            for(int y = 0; y < height; y++)
                // use the pixel at x, y

    Images are typically laid out in contiguous rows, and this loop does not take advantage of that fact. It accesses columns, only coming back to the next pixel in the first row after loading the entire first column. This causes cache and TLB misses. This loop will be vastly slower than if you iterate over rows first, then columns:

        for(int y = 0; y < height; y++)
            for(int x = 0; x < width; x++)
                // use the pixel at x, y

    In many cases, the top loop with fast code in the loop body will be massively outperformed by the bottom loop with slow code in the loop body, simply because memory access delays can be so punishing.

    To make things even worse, profilers, such as Apple's Time Profiler in Instruments, aren't good at showing these delays. They'll tell you what instructions took time, but because of the pipelined, parallel nature of modern CPUs, the instruction that takes the hit of the memory load may not be the actual load instruction. The CPU will hit the load instruction, mark its destination register as not having its data yet, and move on. When the CPU hits an instruction that actually needs that register's value, then it will stop and wait. The clue here is when the first instruction in a sequence of manipulations on the same value takes far longer than the rest, and far longer than it should. For example, if you have code that does load, add, mul, add, and the profiler says that the first add takes the vast majority of the time, this is likely to be a memory delay, not actually a slow add.

    Conclusion

    Modern computers operate on time scales that are difficult to envision. To a human, the time required for a single CPU cycle and the time required to perform a hard disk seek are both indistinguishably instantaneous, yet they vary by many orders of magnitude. The computer is an incredibly complicated system that requires a huge number of things to happen in order to load a single chunk of data from memory. Knowing what goes on in the hardware when this happens is fascinating and can even help write better code. It's even more incredible once you think that this complicated set of operations happens literally millions of times every second in the computer you're using to read this.

    Friday Q&A 2013-01-11:

    Mach Exception Handlers

    by Landon Fuller

    Related Articles

    Swift Name Mangling

    Preprocessor Abuse and Optional Parentheses

    This is my first guest Friday Q&A article, dear readers, and I hope it will withstand your scrutiny. Today's topic is on Mach exception handlers, something I've recently spent some time exploring on Mac OS X and iOS for the purpose of crash reporting. While there is surprisingly little documentation available about Mach exception handlers, and they're considered by some to be a mystical source of mystery and power, the fact is that they're actually pretty simple to understand at a high level - something I hope to elucidate here. Unfortunately, they're also partially private API on iOS, despite being used in a number of new crash reporting solutions - something I'll touch on in the conclusion.

    Signals vs. Exceptions

    On most UNIX systems, the only mechanism available for handling crashes (such as dereferencing NULL, or writing to an unwritable page) is the standard UNIX signal handler. When a fatal machine exception is generated, it is caught by the kernel, which then executes a user-space trampoline within the failing process, executing any function previously registered by that process via sigaction(2) or signal(3).

    On OS X, however, a much more versatile API exists: Mach exceptions. Dating back to Avie Tevanian's work on the Mach OS (yes, that Avie Tevanian), Mach exceptions build on Mach IPC/RPC to provide an alternative to the UNIX signal handler API. The original design of the Mach exception handling facility was first described, as far as I'm aware, in a 1988 paper authored by Avie Tevanian, among others. It remains fairly accurate to this day, and I'd recommend reading it for more details (after finishing this post, of course).

    Mach exceptions differ from UNIX signals in three significant ways:

    Exception information is delivered as a Mach message via a Mach IPC port, rather than by the kernel calling into a userspace trampoline.

    Exception handlers may be registered by any process that has the appropriate mach port rights for the target process.

    Exception handlers may be registered for a specific thread, a specific task (process), or for the entire host. The kernel will search for handlers in that order.

    These differences introduce a number of properties that can be useful when implementing debuggers and crash reporters, and are what make the Mach API interesting as an alternative to BSD signals.

    Exceptions are Messages

    The Mach exception API is based on Mach RPC (which is, in itself, based on Mach IPC). There's a lot of confusion around Mach IPC, but at a high-level, it's not too dissimilar to UNIX sockets or other well-known IPC mechanisms that allow one to read/write messages between processes. Mach IPC communication occurs over mach ports, rather than via socket or other traditional UNIX mechanism; mach ports have unique names, and can be shared with other processes. They can be used to send and receive messages containing arbitrary data. There's a bit more complexity involved in their actual use, but conceptually, that's about all you need to know.

    To write a Mach exception handler using raw Mach IPC, you would need to wait for a new exception message by calling mach_msg() on a Mach port previously registered as an exception handler (how to do this is covered below). The call to mach_msg() will block until an exception message is received, or the thread is interrupted. Once a message is received, you are free to introspect it for the state of the thread that generated the exception. You can even correct the cause of the crash and restart the failing thread, if you feel like hacking register state at runtime.

    Since exceptions are provided as messages, rather than by calling a local function, exception messages can be forwarded to the previously registered Mach exception handler, even if that existing handler is completely out-of-process. This means that you can insert an exception handler without disturbing an existing one, whether it's the debugger or Apple's crash reporter. To forward the message to an existing handler, you also use mach_msg() to send the original message to a previously registered handler's mach port, using the MACH_SEND_MSG flag.

    However, if you wish to respond to the Mach RPC request yourself, rather than forwarding it, you would need to reply to the message, informing the sender whether or not you handled the exception. Mach considers an exception handled if the crashing thread's state has been corrected such that its execution can be resumed. In this case, the kernel does not attempt to find any other exception handler, and considers the matter settled. However, if you reply to the RPC request informing the sender (usually the kernel) that the exception has not been handled, the sender will then try to find the next applicable Mach exception handler. Remember that the kernel attempts to send exceptions to thread-specific, task-specific, and host-global exception handlers, in that order.

    The fact that a reply is expected from the exception request can be used for interesting purposes. For example, if a debugger has its exception handler called when a breakpoint is hit, it can simply wait to reply to the Mach exception message until (and only if) you request that the debugger continue execution.

    Mach RPC, not IPC

    While I described above how one might implement Mach exception handling with raw Mach IPC, that is not how the interfaces are actually defined in Mach. Instead, Mach RPC uses an interface description language (called matchmaker in the original 1989 paper) to describe the format of Mach RPC requests and their replies, and automatically generates code to handle received messages and produce replies.

    On OS X, the Mach RPC interface descriptions for exception handling - mach_exc.defs and exc.defs - are available via /usr/include/mach. If you include these files in your Xcode project, it will automatically run the mig(1) tool (Mach Interface Generator), generating the headers and C source files necessary to receive and handle Mach exception messages. The exc.defs file provides an API for working with 32-bit exceptions, whereas the mach_exc.defs file provides an API for working with 64-bit exceptions. Unfortunately, the Mach RPC defs are not provided on iOS, and only a subset of the necessary generated headers are provided. As a result, it's not possible to implement a fully correct Mach exception handler on iOS without relying on undocumented functionality.

    The code generated by MIG handles two things:

    Interpreting incoming RPC messages and calling out to an existing handler function with the decoded data.

    Initializing a response to the RPC messages using the return values from the handler function.

    The generated code does not handle registering a Mach exception handler, receiving the Mach message, or actually sending the reply. That is the implementor's responsibility. In addition, there are multiple supported exception behaviors that provide different sets of information about an exception; it is the implementor's responsibility to provide callback functions for all of them.

    This is best illustrated in the following 64-bit safe code, intended to work with RPC code generated by mach_exc.defs (I've left out error handling for simplicity):

        // Handle EXCEPTION_DEFAULT behavior
        kern_return_t catch_mach_exception_raise(
            mach_port_t exception_port,
            mach_port_t thread,
            mach_port_t task,
            exception_type_t exception,
            mach_exception_data_t code,
            mach_msg_type_number_t codeCnt)
        {
            // Do smart stuff here.
            fprintf(stderr, "My exception handler was called by exception_raise()\n");

            // Inform the kernel that we haven't handled the exception, and the
            // next handler should be called.
            return KERN_FAILURE;
        }

        extern boolean_t mach_exc_server(mach_msg_header_t *msg, mach_msg_header_t *reply);

        static void exception_server(mach_port_t exceptionPort)
        {
            mach_msg_return_t rt;
            mach_msg_header_t *msg;
            mach_msg_header_t *reply;

            msg = malloc(sizeof(union __RequestUnion__mach_exc_subsystem));
            reply = malloc(sizeof(union __ReplyUnion__mach_exc_subsystem));

            while(1)
            {
                rt = mach_msg(msg, MACH_RCV_MSG, 0, sizeof(union __RequestUnion__mach_exc_subsystem), exceptionPort, 0, MACH_PORT_NULL);
                assert(rt == MACH_MSG_SUCCESS);

                // Call out to the mach_exc_server generated by mig and mach_exc.defs.
                // This will in turn invoke one of:
                //   mach_catch_exception_raise()
                //   mach_catch_exception_raise_state()
                //   mach_catch_exception_raise_state_identity()
                // ...depending on the behavior specified when registering the Mach
                // exception port.
                mach_exc_server(msg, reply);

                // Send the now-initialized reply
                rt = mach_msg(reply, MACH_SEND_MSG, reply->msgh_size, 0, MACH_PORT_NULL, 0, MACH_PORT_NULL);
                assert(rt == MACH_MSG_SUCCESS);
            }
        }

    You'll note from the example code that our exception handler is called a server. In Mach RPC parlance, the kernel would be the client: it issues RPC requests to our exception server, and waits for our reply.

    Exception Behaviors

    As described above, exception messages come in multiple formats, containing varying types of data. It's the implementor's responsibility to register for the correct behavior; the mig-generated RPC code will interpret the messages and hand them off to a user-defined function for the specific type. There are three basic behaviors defined by the Mach Exception API:

    EXCEPTION_DEFAULT: Exception messages will contain a reference to the thread that triggered the exception. Handled by catch_exception_raise().

    EXCEPTION_STATE: Exception messages will contain the register state of the triggering thread, but not a reference to the thread itself. Handled by catch_exception_raise_state().

    EXCEPTION_STATE_IDENTITY: Exception messages will contain the register state of the triggering thread, as well as a reference to the triggering thread. Handled by catch_exception_raise_state_identity().

    In addition to the above behaviors, an additional variant was added in later OS X releases to support 64-bit safety. The MACH_EXCEPTION_CODES flag may be set by OR'ing it with any of the listed behaviors, in which case 64-bit safe exception messages will be provided. This flag is used by LLDB/GDB even when targeting 32-bit processes. When using the MACH_EXCEPTION_CODES flag, one must also use the RPC functions generated by mach_exc.defs; these use the mach_ prefix for all functions and types.

    Generally speaking, EXCEPTION_DEFAULT or EXCEPTION_STATE_IDENTITY are sufficient for most purposes. Since EXCEPTION_DEFAULT behavior provides a reference to the triggering thread, you can also fetch the thread state that would normally be provided via EXCEPTION_STATE_IDENTITY via the Mach thread_get_state() API.

    When registering your exception handler, you are responsible for requesting the MACH_EXCEPTION_CODES behavior that matches the RPC implementation (exc.defs or mach_exc.defs) that you intend to use.

    Putting it Together

    It's time to get down to brass tacks: actually registering a Mach port to receive exception messages. As noted above, handlers can be registered for threads, tasks, and the host, with an identical set of APIs for each:

    (thread|task|host)_get_exception_ports: Returns the currently registered set of exception ports.

    (thread|task|host)_set_exception_ports: Sets the exception port that will be used for all future exceptions.

    (thread|task|host)_swap_exception_ports: Atomically sets a new exception port and returns the current ports. This can be used to avoid race conditions that could otherwise occur if multiple handlers are registered concurrently.

    To register your handler, you'll need to first allocate a Mach port to receive the messages, insert a send right to permit sending responses, and then call one of the exception port set() or swap() functions to register it as a receiver of exception messages.

    For example (error handling again elided for conciseness):

        mach_port_t server_port;
        kern_return_t kr = mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &server_port);
        assert(kr == KERN_SUCCESS);

        kr = mach_port_insert_right(mach_task_self(), server_port, server_port, MACH_MSG_TYPE_MAKE_SEND);
        assert(kr == KERN_SUCCESS);

        kr = task_set_exception_ports(task, EXC_MASK_BAD_ACCESS, server_port, EXCEPTION_DEFAULT | MACH_EXCEPTION_CODES, THREAD_STATE_NONE);

    If you wish to preserve the previous exception handlers, task_swap_exception_ports() should be used in place of task_set_exception_ports().
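
    A sketch of that variant, using the real task_swap_exception_ports() signature; the old handlers come back in parallel arrays so they can be forwarded to later:

        exception_mask_t old_masks[EXC_TYPES_COUNT];
        mach_msg_type_number_t old_count = EXC_TYPES_COUNT;
        mach_port_t old_ports[EXC_TYPES_COUNT];
        exception_behavior_t old_behaviors[EXC_TYPES_COUNT];
        thread_state_flavor_t old_flavors[EXC_TYPES_COUNT];

        kr = task_swap_exception_ports(mach_task_self(), EXC_MASK_BAD_ACCESS, server_port, EXCEPTION_DEFAULT | MACH_EXCEPTION_CODES, THREAD_STATE_NONE, old_masks, &old_count, old_ports, old_behaviors, old_flavors);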

    Conclusion

    Mach exception handlers are a very useful tool, and using them involves a fair number of moving pieces, but hopefully they don't seem dauntingly complex. At the end of the day, mach exceptions are just a simple exception message, coupled with a reply, sent over Mach ports.

    There are some significant advantages of the Mach API over signal handlers, including the ability to forward exceptions out-of-process, and to handle all exceptions on a completely different stack - something that can be useful when handling an exception triggered by a stack overflow on the target thread.

    If you plan on implementing your own mach exception handler, there are certainly more details worth further investigation:

    When forwarding mach exceptions, you need to send an exception message that matches the previously registered handler's exception flavor. This may mean populating a new Mach exception message with additional thread state.

    It's not strictly necessary to use the MIG-generated exc_server() or mach_exc_server() functions for interpreting Mach messages (though it is probably a good idea). Since mig(1) generates structures that describe the layout of Mach exception messages, you can interpret the messages directly yourself.

    If you forward exception messages for exceptions that occur in your own process, you need to be sure that the target for the reply is not also your own process. Single-stepping debuggers will only resume the thread they wish to step; that means that they won't resume your exception handler's thread, you'll never receive the reply, and the interrupted thread will never resume.

    Lastly, I should highlight that the headers and mach interfaces required to implement a correct mach exception handler on iOS are not available (though they are available and public on Mac OS X). I filed a radar requesting their addition (rdar://12939497), as well as an Apple DTS support incident to clarify the situation. The radar is still open, but DTS provided the following guidance:

    Our engineers have reviewed your request and have determined that this would be best handled as a bug report, which you have already filed. There is no documented way of accomplishing this, nor is there a workaround possible.

    In the meantime, as far as I can determine through my own work, and as per DTS's feedback, it's not possible to implement Mach exception handling on iOS using only public API. Hopefully this will be resolved in a future release of iOS, such that we can safely adopt Mach exceptions.

    Friday Q&A 2013-01-25:

    Let's Build NSObject

    Related Articles

    Let's Build Key-Value Coding

    Let's Build UITableView

    Let's Build NSInvocation, Part I

    Let's Build NSInvocation, Part II

    Let's Build stringWithFormat:

    Let's Build Dispatch Groups

    Let's Build Swift Notifications

    Let's Build @synchronized

    Let's Build Swift.Array

    Let's Build dispatch_queue

    The NSObject class lies at the root of (almost) all classes we build and use as part of Cocoa programming. What does it actually do, though, and how does it do it? Today, I'm going to rebuild NSObject from scratch, as suggested by friend of the blog and occasional guest author Gwynne Raskind.

    Components of a Root Class

    What exactly does a root class do? In terms of Objective-C itself, there is precisely one requirement: the root class's first instance variable must be isa, which is a pointer to the object's class. The isa is used to figure out what class an object is when dispatching messages. That's all there has to be, from a strict language standpoint.

    A root class that only provides that wouldn't be very useful, of course. NSObject provides a lot more. The functionality it provides can be broken down into three categories:

    Memory management: standard memory management methods like retain and release are implemented in NSObject. The alloc and dealloc methods are also implemented there.

    Introspection: NSObject provides a bunch of methods that are essentially wrappers around Objective-C runtime functionality, such as class, respondsToSelector:, and isKindOfClass:.

    Default implementations of miscellaneous methods: there are a bunch of methods that we count on every object implementing, such as isEqual: and description. In order to ensure that every object has an implementation, NSObject provides a default implementation that every subclass gets if it doesn't bring its own.
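
    To give a taste of the introspection category, a root class can implement respondsToSelector: as a thin wrapper over the runtime. This sketch uses the real class_respondsToSelector() and object_getClass() functions, though it isn't necessarily how NSObject itself is written:

        #import <objc/runtime.h>

        - (BOOL)respondsToSelector: (SEL)selector
        {
            // Ask the runtime whether this object's class implements the selector.
            return class_respondsToSelector(object_getClass(self), selector);
        }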

    Code

    I'll be reimplementing NSObject functionality as MAObject. I've posted the full code for this article on GitHub:

    https://github.com/mikeash/MAObject

    Note that this code is built without ARC. Although ARC is great and should be used whenever possible, it really gets in the way when implementing a root class, because a root class needs to implement memory management and ARC prefers that you leave memory management up to the compiler.

    Instance Variables

    MAObject has two instance variables. The first is the isa pointer. The second is the object's reference count:

        @implementation MAObject
        {
            Class isa;
            volatile int32_t retainCount;
        }

    The reference count will be managed using functions from OSAtomic.h to ensure thread safety, which is why it has a somewhat unusual definition rather than using NSUInteger or similar.
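
    To sketch what that management looks like (an approximation of the approach, not necessarily the exact MAObject code), retain and release reduce to atomic adjustments of that field:

        #import <libkern/OSAtomic.h>

        - (id)retain
        {
            OSAtomicIncrement32(&retainCount);
            return self;
        }

        - (oneway void)release
        {
            // Once the count drops to zero, nothing owns the object anymore,
            // so destroy it.
            if(OSAtomicDecrement32(&retainCount) == 0)
                [self dealloc];
        }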

    NSObject actually holds reference counts externally. There's a global table which maps an object's address to its reference count. This saves memory, because most objects have a reference count of 1, which the
