5/13/2011

Flume Users | Google Groups

Re: how to send separate files separately? flows?


Jonathan Hsieh <j...@cloudera.com>

Wes,

The auto stuff works in conjunction with the fully isolated flows. Here's an example (as a series of commands):

# set some agents
exec config node1a foo src1 autoBEChain
exec config node1b bar src2 autoDFOChain
exec config node2a foo src1 autoBEChain
exec config node2b bar src2 autoDFOChain

# set some collectors
exec config collectorA foo autoCollectorSource 'collectorSink(xxx)'
exec config collectorB bar autoCollectorSource 'collectorSink(yyy)'

# physical node 3 and 4 are collectors
exec spawn physicalNode3 collectorA
exec spawn physicalNode4 collectorB

# physical node 1 and node 2 both have two logical nodes
exec spawn physicalNode1 node1a
exec spawn physicalNode1 node1b
exec spawn physicalNode2 node2a
exec spawn physicalNode2 node2b

Since node1a, node2a and collectorA are all in the 'foo' flow group, traffic from node1a and node2a only goes to collectorA.

Since node1b, node2b and collectorB are all in the 'bar' flow group, traffic from node1b and node2b only goes to collectorB.

Thus we have kept different data isolated, obviating the need to sort it out later.

Jon.

On Mon, Aug 23, 2010 at 10:30 PM, Wes <wchengta...@gmail.com> wrote:
> Thanks Jon! I'm excited about all those coming changes!
> Also, for the first alternative, fully isolated flows, is it something
> suggested by Henry in the post
> (https://groups.google.com/a/cloudera.org/group/flume-user/browse_thread/thread/79051d79fb114f82/29085e233aed8bf8?lnk=gst&q=flow#29085e233aed8bf8),
> which specifies the nodes as:
> config logical_node [optional flowid] source sink
> Or are the "fully isolated nodes" still something under development?
> Its advantages sound delicious though :)
> Wes
> On Aug 23, 9:47 pm, Jonathan Hsieh <j...@cloudera.com> wrote:
groups.google.com//cb17dce47817b0 1/3

> > Wes,
> >
> > Cool!
> >
> > We'll try to make this easier in the future -- making all this simpler was
> > the original intent of all this auto* stuff. I generally take a
> > first-cut-and-iterate approach, and it often takes a few passes to get all
> > the kinks worked out. I think the short version of the story is that since
> > the logical node wasn't assigned to a physical node yet, or since the master
> > didn't know the IP of the physical node yet, it couldn't infer the
> > connection.
> >
> > Also, the main difference between 'flume node' and 'flume node_nowatch' is
> > that console input works properly with the nowatch version. In general the
> > nowatch version is good for interactive mode or one-shot modes, while
> > 'flume node' is more for daemon mode.

> > Jon.
> >
> > On Mon, Aug 23, 2010 at 5:35 PM, Wes <wchengta...@gmail.com> wrote:
> > > Hey Jon,
> > >
> > > Man, it finally works now. Thanks a lot for your Friday post!! Just a
> > > note: there were some glitches on my side that I needed to clear up
> > > before it worked. One of them to share: when starting a physical node,
> > > even after spawning a logical node on it, it shows "logical node not
> > > found" messages. It works after doing an unconfig in the flume shell
> > > with "exec unconfig PhysNode1" to let the master pick it up. Also some
> > > minor stuff (not sure if it matters), like using flume node_nowatch
> > > instead of sudo flume or flume node... Correct me if I'm wrong. Anyway,
> > > it's working now :)

> > > Wes
> > >
> > > On Aug 20, 3:51 pm, Jonathan Hsieh <j...@cloudera.com> wrote:
> > > > Hey Wes,
> > > >
> > > > There is a workaround for this issue. The short story is you need to
> > > > run a 'refresh' on that agent (exec refresh agentA), or alternatively
> > > > 'refreshAll'.

> > > > Basically, there are a few translations that happen, and the
> > > > fail("logicalSink(\"collectorA\")") is the result of the second one
> > > > failing.
> > > >
> > > > The first translation takes all the registered autoCollectorSources and
> > > > translates auto*Chain into the longer, more complicated config with the
> > > > failovers selected from the nodes. Each agent node should have the order
> > > > of collectors randomly selected. These end up being logicalSinks in the
> > > > configuration. A logicalSink is a sink that sends data from a node to
> > > > the specified logicalNode.

> > > > There is another translation that takes logicalSinks and converts them
> > > > to rpcSinks that have actual IPs and ports. If the master doesn't know
> > > > the IP, it translates to the fail sink. The issue is that the logical
> > > > node must heartbeat so the master knows its IP and can then translate
> > > > the logicalSink to an rpcSink with IP and ports. Ideally, when a node
> > > > shows up with a new IP, we would automatically tell the translation
> > > > mechanism to update the translation. refresh / refreshAll does this
> > > > manually.

> > > > Jon.
> > > >
> > > > On Thu, Aug 19, 2010 at 8:30 PM, Wes <wchengta...@gmail.com> wrote:
> > > > > It seems that it cannot find the corresponding collectors (as it says
> > > > > fail( "logicalSink( \"collectorB\" )" )). Did I miss something in the
> > > > > settings for those two collectors?

> > > > --
> > > > // Jonathan Hsieh (shay)
> > > > // j...@cloudera.com

> > --
> > // Jonathan Hsieh (shay)
> > // j...@cloudera.com

--
// Jonathan Hsieh (shay)
// j...@cloudera.com
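
For readers following the thread, the two translations Jon describes (auto*Chain to logicalSinks within a flow, then logicalSink to rpcSink or the fail sink) can be modeled with a small Python sketch. This is a toy model, not Flume's implementation: the function names, the dictionary layout, the address 10.0.0.3, and the port number are all assumptions made for illustration, and the output strings only imitate the config syntax quoted in the thread.

```python
import random

# Toy model only: every name and data structure here is hypothetical.

def translate_auto_chain(agent_flow, collectors, rng):
    """First translation: take the collectors registered in the agent's flow,
    put them in a random order, and express each one as a logicalSink."""
    names = sorted(name for name, flow in collectors.items() if flow == agent_flow)
    rng.shuffle(names)
    return ['logicalSink("%s")' % name for name in names]

def translate_logical_sink(collector, known_addrs):
    """Second translation: resolve a logical node to an rpcSink, or fall back
    to the fail sink when the master has no address for that node yet."""
    if collector not in known_addrs:
        return 'fail("logicalSink(\\"%s\\")")' % collector
    host, port = known_addrs[collector]
    return 'rpcSink("%s", %d)' % (host, port)

# Collectors mapped to their flow groups, as in the example at the top of the thread.
collectors = {"collectorA": "foo", "collectorB": "bar"}

# An agent in flow 'foo' only ever gets collectors from flow 'foo'.
chain = translate_auto_chain("foo", collectors, random.Random(0))
print(chain)

# Before collectorA has sent a heartbeat, the master has no address for it.
known_addrs = {}
print(translate_logical_sink("collectorA", known_addrs))

# After a heartbeat the address is known; re-running the translation (what
# refresh / refreshAll triggers manually) now yields an rpcSink.
known_addrs["collectorA"] = ("10.0.0.3", 35853)  # made-up address and port
print(translate_logical_sink("collectorA", known_addrs))
```

The point of the sketch is the ordering problem: the second translation depends on state (the heartbeat-reported address) that may not exist when the config is first translated, which is why a manual refresh fixes the fail sink.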
