zygy"
First, let's define map:
Key = URL, Value = webpage as a string
map(k, v)
{
    String[] words = v.split(" ")
    for each (w : words)
        if (w == "syzygy")
            emit(w, k)    // k is the URL of the page
}
Then we define reduce:
Key = word
Value = set of URLs
reduce(k, v)
{
    for each (u : v)
        emit(u, "found")
}
But the issue here is that we aren't really using reduce. We really just want map.
KEY TAKEAWAY: YOU DON'T ALWAYS HAVE TO USE MAP-REDUCE
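As a rough illustration of the map-only version, here is a minimal Python sketch. The names map_syzygy and run_map_only, the in-memory dictionary of pages, and the example URLs are all assumptions made for this example; a real framework would supply the driver. Note that Python's split() breaks on any whitespace, which is slightly more lenient than split(" ") in the pseudocode.

```python
def map_syzygy(url, page):
    """Map step: emit (word, url) for each occurrence of the target word."""
    for word in page.split():
        if word == "syzygy":
            yield (word, url)


def run_map_only(pages):
    """Hypothetical driver: apply the map step to every page, with no reduce."""
    output = []
    for url, page in pages.items():
        output.extend(map_syzygy(url, page))
    return output


# Made-up input: URL -> page text.
pages = {
    "http://a.example": "the syzygy of three bodies",
    "http://b.example": "nothing to see here",
}
matches = run_map_only(pages)
print(matches)
```

The map output already contains every matching URL, so the job can stop there.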
Example 3:
Inverted Index
Map: key = URL, value = String
map(k, v)
{
    String[] words = v.split(" ")
    for each (w : words)
        emit(w, k)
}
Reduce: key = String, value = set of URLs
reduce(k, v)
{
    emit(k, v)
}
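The inverted index above can be tried out with a small in-memory sketch. The run_job driver, which stands in for the framework's shuffle phase, and the sample pages are assumptions for illustration only. The reduce step here also deduplicates and sorts the URLs so that the value really is a set of URLs.

```python
from collections import defaultdict


def map_index(url, page):
    """Map step: emit (word, url) for every word on the page."""
    for word in page.split():
        yield (word, url)


def reduce_index(word, urls):
    """Reduce step: emit the word with its deduplicated, sorted URL list."""
    yield (word, sorted(set(urls)))


def run_job(pages):
    """Hypothetical driver: map, group by key (shuffle), then reduce."""
    groups = defaultdict(list)
    for url, page in pages.items():
        for word, u in map_index(url, page):
            groups[word].append(u)
    index = {}
    for word, urls in groups.items():
        for key, value in reduce_index(word, urls):
            index[key] = value
    return index


# Made-up input: URL -> page text.
pages = {"u1": "red fish blue fish", "u2": "blue sky"}
index = run_job(pages)
print(index["blue"])
```

Unlike the "syzygy" search, this job does need the grouping between map and reduce, because all URLs for one word must end up together.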
Briefly, (R+W>N) ensures that any read quorum and any write quorum will overlap
in at least one node. Without this condition, it could happen that, say, one
client writes to servers A, B, and C (W=3), and then another client reads from
servers D and E (R=2). The second client wouldn't see the data that the first
client has written, so this approach doesn't provide strong consistency.
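The A, B, C versus D, E scenario can be checked with a few lines of Python. This is only a sketch: it assumes N=5 (the five servers named in the example; the text does not state N explicitly), and the function names are made up for illustration.

```python
def quorums_overlap(read_quorum, write_quorum):
    """True if at least one server is in both quorums."""
    return len(set(read_quorum) & set(write_quorum)) > 0


def overlap_guaranteed(n, r, w):
    """The condition from the text: R + W > N."""
    return r + w > n


# Assumed cluster of N = 5 servers: A, B, C, D, E.
write_quorum = {"A", "B", "C"}   # W = 3
read_quorum = {"D", "E"}         # R = 2

stale_read_possible = not quorums_overlap(read_quorum, write_quorum)
print(stale_read_possible)           # True: the reader misses the write
print(overlap_guaranteed(5, 2, 3))   # False: 2 + 3 is not greater than 5
print(overlap_guaranteed(5, 3, 3))   # True: raising R to 3 forces an overlap
```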
As for conflicting writes, you'll want 2*W>N, so that any two write quorums will
overlap in at least one node. If two clients write concurrently, they'll at
least discover the conflict, and one or both of them can then retry.
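To see why 2*W>N is the right threshold, the following sketch brute-forces every pair of write quorums on a small cluster and reports the smallest overlap. The function name and the cluster sizes are assumptions for illustration; the enumeration is exponential and only meant for tiny N.

```python
from itertools import combinations


def min_write_overlap(n, w):
    """Smallest number of nodes shared by any two write quorums of size w."""
    quorums = [set(q) for q in combinations(range(n), w)]
    return min(len(a & b) for a in quorums for b in quorums)


print(min_write_overlap(5, 3))   # 1: two writers always meet on some node
print(min_write_overlap(5, 2))   # 0: two writers can miss each other entirely
```

For these small cases the minimum overlap works out to max(0, 2*W - N), so it is at least one node exactly when 2*W>N.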
The larger your R or W, the longer the corresponding reads or writes will take,
since more nodes have to be contacted, and the chances that at least one
straggler is in the set will increase. So a small R is good for fast reads, and
a small W is good for fast writes. If you also want strong consistency, you
can't have both, so fast reads will need to be paid for with slow writes, or
vice versa.
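The trade-off can be made concrete by listing, for each R, the smallest W that still satisfies R+W>N. The sketch below assumes N=5 and uses made-up function names; it only looks at the read/write overlap condition and ignores the separate 2*W>N condition for conflicting writes.

```python
def smallest_w_for(n, r):
    """Smallest W such that R + W > N."""
    return n - r + 1


def tradeoff(n):
    """All (R, minimal W) pairs for a cluster of n nodes."""
    return [(r, smallest_w_for(n, r)) for r in range(1, n + 1)]


for r, w in tradeoff(5):
    print(f"R={r} needs W>={w}")
# R=1 needs W>=5
# R=2 needs W>=4
# R=3 needs W>=3
# R=4 needs W>=2
# R=5 needs W>=1
```

The cheapest reads (R=1) force every write to reach all five nodes, and the cheapest writes (W=1) force every read to do so.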
To prevent data loss, you'll want as many copies of your most recent data as
possible, so you'll want a large W (in absolute terms, which requires a large N
as well).
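As a back-of-the-envelope illustration of why the absolute value of W matters, suppose (purely as an assumption, not something stated above) that each node loses its data independently with probability p. The most recent write is then lost only if all W copies are lost. The value p = 0.1 below is a made-up number for the example.

```python
def loss_probability(p, w):
    """Chance that all w copies are lost, assuming independent failures."""
    return p ** w


p = 0.1  # hypothetical per-node loss probability
for w in (1, 2, 3):
    print(f"W={w}: loss probability {loss_probability(p, w):.4f}")
```

Under this simplified model, each additional copy multiplies the risk by p, which is why a larger W helps even though it slows writes down.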