
Making Reg[Ee]x Your Buddy

(?i)(mi(chael|ke) wilde), Splunk Ninja


Thursday, August 18, 11

August 15, 2011

Hi, I'm Michael Wilde


You may know me from:

Splunk Worldwide Users Conference


Copyright Splunk 2011

What is RegEx
Finite Automata

Regular Expression invented in the 1950s by mathematician Stephen Cole Kleene
Implemented by ed and grep creator Ken Thompson in 1973
Pattern matching language for text processing
Has slightly different implementations (PERL, POSIX)
Way cryptic at first sight


Why should you care


Field extraction is a requirement for reporting
Index-time filtering & routing
You'll seem smart
It will be useful beyond Splunk
You might score with the (ladies|dudes) at (Maker\sFaire|ComiCon).


Thinking Regex
Log events are a great place to start: they have structure
Don't overthink it. The pattern is there waiting to be discovered
Don't be lazy and use wildcards too much
Learn to love NOT regexes: \S+ \D+ \W+ [^,]+
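
To make the "NOT regex" idea concrete, here is a small Python sketch. The log line and request string are invented for illustration, and Python's `re` module stands in for Splunk's regex engine, so treat it as a demonstration of the idea rather than Splunk behavior.

```python
import re

# Hypothetical comma-separated log line, invented for illustration.
line = "2011-08-15,alice,login,success"

# Lazy habit: the greedy wildcard runs to the end of the line and then
# backtracks to the LAST comma, so it grabs far more than the first field.
greedy = re.match(r"(.*),", line).group(1)

# Negated class: "anything that is not a comma" stops at the first comma.
first_field = re.match(r"([^,]+),", line).group(1)

# \S+ does the same job for whitespace-delimited tokens.
first_token = re.match(r"(\S+)", "GET /index.html 200").group(1)

print(greedy)       # 2011-08-15,alice,login
print(first_field)  # 2011-08-15
print(first_token)  # GET
```

The negated class says exactly where the field ends, so the engine never has to overshoot and walk back.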


Be nice to your RegEx engine


MS-DOS taught us to be laaaaaaaaaaaaaaaaazy with *.*
A regex engine matches character by character, and then does backtracking.
Match in as few steps as possible


Regexes in Splunk
Search Language: rex, erex, regex
Indexing: Filtering data (in|out), line breaking, timestamp extraction
Field Extraction

IFX
Splunk has a built-in "interactive field extractor" (IFX)
It can be useful. Give it samples of data, and it will attempt to learn a regex and persist a single field
It limits the number of events it displays in its viewer, so you might not see your search results when using it. Huh?


what if we could use that "intelligent" stuff IFX was doing, but in the search language?


meet "erex"
Allows you to give it examples, but it works on your search results
Allows you to give it counterexamples of stuff you don't want to match on
Builds you a proper rex command


...there's an app for that. right?


Field Extractor App


Imagine you could use your mouse, highlight fields, name them, persist them, go home early and never write regex.
David Carasso's Field Extractor app is like a "workbench for field extraction"
Download it from SplunkBase


searching with regex

the | regex search command


Did you know Splunk crushes all terms to lower case?
If you need to look for specific patterns or even words and respect the case the original events are in, use | regex
index=splunktv | regex _raw="(MP3|M4A)"   <-- notice this is a case-sensitive pattern match
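
The difference between a case-folded term search and a case-sensitive regex filter can be sketched in Python. The three events below are invented, and the lower-casing comparison is only a rough stand-in for how term search behaves, not a model of Splunk's index.

```python
import re

# Hypothetical raw events, invented for illustration.
events = [
    "GET /media/show01.MP3 200",
    "GET /media/show02.mp3 200",
    "GET /media/show03.M4A 200",
]

# Same pattern as on the slide; regexes are case sensitive by default.
pattern = re.compile(r"(MP3|M4A)")

# Rough stand-in for a case-insensitive term search: everything matches.
term_hits = [e for e in events if "mp3" in e.lower() or "m4a" in e.lower()]

# The regex keeps only events whose original text uses the upper-case form.
regex_hits = [e for e in events if pattern.search(e)]

print(len(term_hits))  # 3
print(regex_hits)      # the show01.MP3 and show03.M4A events only
```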


What about good ole Rex?

Search-time field extractions via your own regexes -- in the search language
Name your fields
Reuse everyone else's work!


a few more tricks for you


host extraction irritates me


regex in host extraction

Splunk will attempt to do the right thing.
Log source will likely make it hard for Splunk -- and you'll blame Splunk
props.conf & transforms.conf are needed to properly extract hostnames in some cases (F5 Big-IP and HP networking gear)
Use default settings in props.conf and use your own settings as well


priority boarding in props.conf


[source::...a...]
TRANSFORMS-ahosts = ahostextraction
priority = 1

[source::...z...]
TRANSFORMS-zhosts = zhostextraction
priority = 99

What if the source we were matching against had the word "arizona" in it? It will match both, right?
Use "priority" to control matching. 99 is higher than 1, so 99 is a higher priority. Yeah, I know... weird.

Basic Training Complete!

Let's do something more difficult

Splunk is so smart
except when it's not

<policy id="3">Finjan HTTPS policy</policy>
<cp id="5" name="Active Content" display_name="Active Content"/>
<group id="5002" cp_id="5" type="0">Full profile - Binary Behavior</group>
<item id="28015">Format error in CRL lastUpdate field</item>
<item id="3265747">*.served.com/*</item>
<rule_comment id="2" name="Block certificate validation errors">&lt;![CDATA[Block HTTPS content without a valid certificate]]&gt;</rule_comment>

AUTO-KV pulled the id field out of every event. Yay!!!



id is not the field name

look closer Agent Starling

<policy id="3">Finjan HTTPS policy</policy>
<cp id="5" name="Active Content" display_name="Active Content"/>
<group id="5002" cp_id="5" type="0">Full profile - Binary Behavior</group>
<item id="28015">Format error in CRL lastUpdate field</item>
<rule_comment id="2" name="Block certificate validation errors">&lt;![CDATA[Block HTTPS content without a valid certificate]]&gt;</rule_comment>

We can educate Splunk on dynamically pulling the KEY and VALUE with...


Dynamic Key Value Extraction

...but tailored for our needs

REGEX for the KEY is \<([^\=]+)\=
Less than, followed by (anything that is not an equal sign--greedy match), followed by an equal sign

REGEX for the VALUE is \"([^\"]+)\"\>
A quote (followed by anything that is not a quote--greedy match), followed by a quote, followed by a greater than sign

Examples:
<policy id="3">
<cp id="5"
<item id="28015">

keep going dude!



Persist your sweet dynamic KV patterns

props.conf & transforms.conf required

Create an entry in props.conf like this:
    [m86_dynamic_kv]
    REPORT-m86fields = mym86kv

Create an entry in transforms.conf like this:
    [mym86kv]
    REGEX = \<([^\=]+)\=\"([^\"]+)\"\>
    FORMAT = $1::$2

<policy id="3">Finjan HTTPS policy</policy>
$1 = policy id, $2 = 3
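
As a quick sanity check outside Splunk, the same key/value pattern can be tried in Python. This is only a sketch: Python's `re` stands in for Splunk's engine, the backslash escapes Python does not need are dropped, and any key-name normalization Splunk may apply afterwards is not modeled.

```python
import re

# Sample events taken from the slides.
events = [
    '<policy id="3">Finjan HTTPS policy</policy>',
    '<item id="28015">Format error in CRL lastUpdate field</item>',
]

# Group 1 plays the role of $1 (the key), group 2 the role of $2 (the value).
kv = re.compile(r'<([^=]+)="([^"]+)">')

pairs = [kv.search(event).groups() for event in events]
print(pairs)  # [('policy id', '3'), ('item id', '28015')]
```

Each event yields a different key, which is the point of the dynamic extraction: the field name comes from the data rather than from the config.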



Dang it! It wasn't perfect

Some of our events don't finish their XML tag right after a quote

Create an entry in props.conf like this:
    [m86_dynamic_kv]
    REPORT-m86fields = mym86kv

Create an entry in transforms.conf like this:
    [mym86kv]
    REGEX = \<([^\=]+)\=\"([^\"]+)[^\>]+\>
    FORMAT = $1::$2

<rule_comment id="690" name="Log everythin Image les">&lt;![CDATA[Logs all content passin the system except for ......
$1 = rule_comment id, $2 = 690
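
The difference between the strict and the relaxed pattern can be checked with a short Python sketch. The tag below is a shortened, hypothetical version of the slide's event (the `name` value is invented), and Python's `re` again stands in for Splunk's engine.

```python
import re

# Strict: the tag must close immediately after the first quoted value.
strict = re.compile(r'<([^=]+)="([^"]+)">')
# Relaxed: after the first value, skip anything that is not ">" up to the ">".
relaxed = re.compile(r'<([^=]+)="([^"]+)[^>]+>')

# Hypothetical tag with a second attribute after the first quoted value.
event = '<rule_comment id="690" name="Log everything">'

strict_match = strict.search(event)
relaxed_groups = relaxed.search(event).groups()

print(strict_match)    # None
print(relaxed_groups)  # ('rule_comment id', '690')
```

The relaxed pattern still handles single-attribute tags, because `[^>]+` can consume just the closing quote.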

Think you're good?

Try extracting the service field
2011/07/21 19:27:22.071 [(ninja-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)[ninja-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] recoverSubject(V1.21.47,OSM:1t7Dg201000:i: 1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig, 172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i: 1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms

Your job is to create a multi-valued field, as the service field exists multiple times in each event

Look for the obvious patterns


2011/07/21 19:27:22.071 [(ela4-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)[ela4-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] recoverSubject(V1.21.47,OSM:1t7Dg201000:i: 1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig, 172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i: 1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms

Your brain will tell you to look for anything after the first comma after that left bracket and before the second comma

...and your brain was wrong.


2011/07/21 19:27:22.071 [(ela4-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)[ela4-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] recoverSubject(V1.21.47,OSM:1t7Dg201000:i: 1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig, 172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i: 1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms
This is NOT a service

Dang... what are we gonna do now?



What is common with services


2011/07/21 19:27:22.071 [(ela4-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)[ela4-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] recoverSubject(V1.21.47,OSM:1t7Dg201000:i: 1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig, 172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i: 1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms

They're all alphanumeric or word characters: 0-9A-Za-z_



But what about the preceding text


2011/07/21 19:27:22.071 [(ela4-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)[ela4-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] recoverSubject(V1.21.47,OSM:1t7Dg201000:i: 1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig, 172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i: 1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms

Left bracket followed by some stuff, followed by a comma... but it's not consistent. Sometimes a ( left paren is in there.


This is a better match


2011/07/21 19:27:22.071 [(ela4-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)[ela4-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] recoverSubject(V1.21.47,OSM:1t7Dg201000:i: 1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig, 172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i: 1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms

\[[\(\-a-zA-Z0-9]+,([a-zA-Z]+),

Say the matching pattern out loud. It will help

Left bracket, followed by anything in this character list (greedy). Followed by a comma, and then create a capturing group of text that matches upper or lower case roman alphabet -- greedy (as many times as possible). End capturing group, then followed by a comma.
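
Reading the pattern out loud maps directly onto a quick test. The sketch below runs the slide's regex in Python against the front part of the sample event; Python's `re` is a stand-in for Splunk here, and the event is shortened for readability.

```python
import re

# Front part of the sample event from the slides.
event = ("2011/07/21 19:27:22.071 [(ela4-fe96,opensocial,/makeRequest,"
         "2011/07/21 19:27:21.978)[ela4-be04,auth,Auth2Service.recoverSubject]] "
         "[] [Auth2Service]")

# Left bracket, one or more of "( - letters digits", a comma,
# then capture a run of letters, then another comma.
pattern = re.compile(r"\[[\(\-a-zA-Z0-9]+,([a-zA-Z]+),")

first_service = pattern.search(event).group(1)
print(first_service)  # opensocial
```

A single search stops at the first hit, so only the first service comes back.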


Can't be too hard to extend it, right?


2011/07/21 19:27:22.071 [(ela4-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)[ela4-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] recoverSubject(V1.21.47,OSM:1t7Dg201000:i: 1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig, 172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i: 1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms

\[[\(\-a-zA-Z0-9]+,([a-zA-Z]+),[^\[]+\[[\(\-a-zA-Z0-9]+,([a-zA-Z]+),

Left bracket, followed by anything in this character list (greedy). Followed by a comma, and then create a capturing group of text that matches upper or lower case roman alphabet -- greedy (as many times as possible). End capturing group, then followed by a comma. Followed by anything that is NOT a left bracket, followed by.....

Sad Trombone
This one has four services
2011/07/21 19:27:27.596 [(ninja4-fe29,genie,/handle,131292312,2011/07/21 19:27:27.310)[ninja4-be716,lmt,PbContentService.write<tetherAccountData;default>][ninja4-be05,tether,TetherAccountService.bindAccount][ninja4-be393,auth,Auth2Service.upgradeSubject]] [] [Auth2Service] upgradeSubject(V1.21.49,"INT",[LIM:131292312:s: 1311276361:b8f677d957eb3f7b9622247b72374c791720bc17,true], {internalAppName=twitter-sync},"tether",null)=[Principal[2],[INT: 131292312/twitter-sync: 1311276447:df9dd0175bd2e6107c2dfae36dfd9a9dc11f0631,false,20y]] in 15ms


Remember rex?
He devours data

But you can make rex very hungry and control how much lunch he eats. By default, he only gets one helping of meat


Using max_match with rex


You limit or expand the number of times it runs
rex max_match=20 "\[[\(\-a-zA-Z0-9]+,(?<service>[a-zA-Z]+),"
Instead of that last regex that matched two services, let's just match one, and tell rex to repeat our pattern matching
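
The idea of "match one, then repeat" can be approximated in Python. This is a sketch, not Splunk's implementation: `finditer` plus a slice plays the role of `max_match=20`, the event is the bracketed front part of the four-service sample with line-wrap spaces removed, and Python spells the named group `(?P<service>...)` rather than `(?<service>...)`.

```python
import re

# Bracketed front part of the four-service event from the slides.
event = ("[(ninja4-fe29,genie,/handle,131292312,2011/07/21 19:27:27.310)"
         "[ninja4-be716,lmt,PbContentService.write<tetherAccountData;default>]"
         "[ninja4-be05,tether,TetherAccountService.bindAccount]"
         "[ninja4-be393,auth,Auth2Service.upgradeSubject]] [] [Auth2Service]")

pattern = re.compile(r"\[[\(\-a-zA-Z0-9]+,(?P<service>[a-zA-Z]+),")

# One helping: a single search returns only the first service.
first_only = pattern.search(event).group("service")

# Repeated matching, capped at 20 hits like max_match=20.
services = [m.group("service") for m in pattern.finditer(event)][:20]

print(first_only)  # genie
print(services)    # ['genie', 'lmt', 'tether', 'auth']
```

The single-service pattern stays simple, and the repetition handles events with two, four, or any number of services up to the cap.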


You can persist this in config files

props.conf & transforms.conf required

Create an entry in props.conf like this:
    [ninjasocial]
    REPORT-ninjafields = myepicregex

Create an entry in transforms.conf like this:
    [myepicregex]
    REGEX = \[[\(\-a-zA-Z0-9]+,(?<service>[a-zA-Z]+),
    MV_ADD = TRUE


And now for something difficult


gaming logs - Team Fortress

L 08/02/2011 - 11:46:05: "The Administrator<61><BOT><Red>" killed "MoreGun<56><BOT><Blue>" with "flamethrower" (attacker_position "-2677 2177 -127") (victim_position "-2555 2323 -127")

I need the data


gaming logs - Team Fortress

L 08/02/2011 - 11:46:05: "The Administrator<61><BOT><Red>" killed "MoreGun<56><BOT><Blue>" with "flamethrower" (attacker_position "-2677 2177 -127") (victim_position "-2555 2323 -127")

Who's who?
How do we know who did what to whom?

L 08/02/2011 - 11:46:05: "The Administrator<61><BOT><Red>" killed "MoreGun<56><BOT><Blue>" with "flamethrower" (attacker_position "-2677 2177 -127") (victim_position "-2555 2323 -127")

actor   actor_id   actor_type   actor_team

L 08/02/2011 - 11:46:05: "The Administrator<61><BOT><Red>" killed "MoreGun<56><BOT><Blue>" with "flamethrower" (attacker_position "-2677 2177 -127") (victim_position "-2555 2323 -127")

actee   actee_id   actee_type   actee_team


Didn't we see this slide before?


How do we know who did what to whom?

L 08/02/2011 - 11:46:05: "The Administrator<61><BOT><Red>" killed "MoreGun<56><BOT><Blue>" with "flamethrower" (attacker_position "-2677 2177 -127") (victim_position "-2555 2323 -127")

See that pattern? Remember max_match?


L 08/02/2011 - 11:46:05: "The Administrator<61><BOT><Red>" killed "MoreGun<56><BOT><Blue>" with "flamethrower" (attacker_position "-2677 2177 -127") (victim_position "-2555 2323 -127")


See that pattern? Remember max_match?

"The Administrator<61><BOT><Red>"
"MoreGun<56><BOT><Blue>"

Using rex / mv_add, let's capture it into some temporary multi-value fields


Temporary Multi-Value Fields

actor_name_z    The Administrator,MoreGun
actor_id_z      61,56
actor_type_z    BOT,BOT
actor_team_z    Red,Blue

Using rex / mv_add, let's capture it into some temporary multi-value fields
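
The slides do not show the regex used for this step, so the pattern below is an assumption: it treats each player as a quoted `name<id><type><team>` token. The Python sketch shows how repeated matching of one such pattern could fill the four temporary multi-value fields.

```python
import re

# Sample event from the slides (positions omitted for brevity).
event = ('L 08/02/2011 - 11:46:05: "The Administrator<61><BOT><Red>" killed '
         '"MoreGun<56><BOT><Blue>" with "flamethrower"')

# Assumed pattern, not taken from the slides: "name<id><type><team>"
player = re.compile(r'"([^"<]+)<(\d+)><([^>]+)><([^>]+)>"')

matches = player.findall(event)

# One list per temporary multi-value field, in order of appearance.
actor_name_z = [m[0] for m in matches]
actor_id_z = [m[1] for m in matches]
actor_type_z = [m[2] for m in matches]
actor_team_z = [m[3] for m in matches]

print(actor_name_z)  # ['The Administrator', 'MoreGun']
print(actor_id_z)    # ['61', '56']
print(actor_type_z)  # ['BOT', 'BOT']
print(actor_team_z)  # ['Red', 'Blue']
```

The quoted weapon name is skipped because it has no `<...>` parts, so only the two players are captured.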


Evaluate & Transform with mvindex


Multi-value fields have a position value in the array

mvindex          0                    1
actor_name_z     The Administrator    MoreGun
actor_id_z       61                   56
actor_type_z     BOT                  BOT
actor_team_z     Red                  Blue


It's time for our fields to split up!

Multi-value fields have a position value in the array

| eval actor_name = mvindex(actor_name_z,0)
| eval actee_name = mvindex(actor_name_z,1)

actor_name = The Administrator
actee_name = MoreGun
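
The position-based split is easy to picture with plain lists. In this Python sketch, `mvindex` is a hypothetical helper named after the eval function on the slide; it is a toy stand-in for illustration, not Splunk's implementation.

```python
def mvindex(values, index):
    """Toy helper: return the value at a zero-based position, or None if out of range."""
    return values[index] if -len(values) <= index < len(values) else None

# Temporary multi-value field from the previous step.
actor_name_z = ["The Administrator", "MoreGun"]

actor_name = mvindex(actor_name_z, 0)  # first player: the one doing the killing
actee_name = mvindex(actor_name_z, 1)  # second player: the one on the receiving end

print(actor_name)  # The Administrator
print(actee_name)  # MoreGun
```

The same two calls, applied to the id, type, and team lists, would produce the remaining actor and actee fields.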

Resources

regexlib.com
regular-expressions.info
gskinner.com/RegExr
Reggy / RegExhibit
RegexBuddy (JGSoft.com)

Questions? Just ask!
Michael Wilde, Splunk Ninja
ninja@splunk.com
