Table 1

timestamp | memory error | cpu error | … | reboot | undiagnosed repair | diagnosed repair | tool
0:00      | 3            | 0         | … | 0      | 0                  | 0                | NULL
0:15      | 0            | 0         | … | 1      | 0                  | 0                | A
0:20      | 1            | 0         | … | 0      | 0                  | 0                | NULL
0:21      | 0            | 0         | … | 1      | 0                  | 0                | B
0:30      | 0            | 0         | … | 1      | 1                  | 0                | C
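As a sketch of what logging through one shared entry point could look like (the API name and fields here are hypothetical, mirroring the columns of Table 1 rather than any real implementation):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RepairEvent:
    """One structured row, mirroring the columns of Table 1."""
    timestamp: str
    memory_error: int = 0
    cpu_error: int = 0
    reboot: int = 0
    undiagnosed_repair: int = 0
    diagnosed_repair: int = 0
    tool: Optional[str] = None  # NULL when no remediation tool is involved

LOG: List[RepairEvent] = []

def log_event(event: RepairEvent) -> None:
    # Single entry point: every tool logs through the same canonical schema,
    # so "no diagnosis" vs. "undiagnosed" wording ambiguities cannot arise.
    LOG.append(event)

# The 0:15 row of Table 1: tool A reboots a server.
log_event(RepairEvent(timestamp="0:15", reboot=1, tool="A"))
```

Because every field is typed and enumerated up front, downstream consumers can count events with simple equality checks instead of regular expressions over free text.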
group of engineers from a different background, logs it as a verb "AC Cycle". This could even happen to the same word; for example, "no diagnosis" and "undiagnosed" in the last two messages mean the same condition, but would impose huge difficulty when one tries to parse this log and count the events with regular expressions.

With a pre-defined structure, i.e. a list of fields to put the information in, structured logging requires a canonical way to log events. For example, in a structured table, the messages above can be logged in the format shown in Table 1. In this conversion, engineers could decide not to log "no diagnosis found in tool A" in the structured table because it does not fit in the pre-defined structure. The structure of the table is flexible and can be tailored to the downstream application; for example, instead of having multiple columns for memory error, cpu error, etc., we can use one "error" column and choose a value from a pre-defined list such as memory, cpu, etc., to represent the same information.

In addition to removing the ambiguity in the logs, enforcing structured logging through a single API also helps developers use and improve the existing architecture of the program, instead of adding ad-hoc functionalities for edge cases, which introduces unnecessary complexity that makes the code base much harder to maintain. In this example, if there is only one API for logging a reboot, developers from tools A and B would likely reuse or improve a common reboot service instead of rebooting the servers in their own code bases. A common reboot service would be much easier to maintain and would likely have a better-designed flow to handle reboots in different scenarios.

3.3 Frequent Pattern Mining and Filtering

Our proposed RCA framework involves two steps: frequent pattern mining and filtering. Frequent patterns in the dataset are first reported, followed by an evaluation of how strongly each frequent pattern correlates with the target failures.

In frequent pattern mining, each item should be a binary variable representing whether a characteristic (or a transaction in the classical transaction database use-cases) exists. In a production structured dataset, however, a column would usually represent one feature, which could have multiple distinct values, one for each entity. Therefore the structured log needs to first be transformed into a schema that fits the frequent pattern mining formulation. The transformation is done by applying one-hot encoding [12] on each of the columns in the structured table. For a column f in the structured table, which has k possible values in a dataset, one-hot encoding "explodes" the schema and generates k columns {f_0, f_1, …, f_{k-1}}, each containing a binary value indicating whether the entity satisfies f = f_i.

Apriori is a classical algorithm designed to identify frequent item-sets. As illustrated in Algorithm 1, starting from frequent items, i.e. item-sets at length = 1, the algorithm generates candidate item-sets by adding one item at a time, known as the candidate generation process. At each length k, candidate generation is done and all the candidate item-sets are scanned to increment the count of their occurrences. Then the item-sets that meet the min-support threshold are kept and returned as the frequent item-sets L_k. We add a practical constraint, max-length, on the maximum length of the item-sets that we are interested in. The limit on max-length stops the algorithm from exploring item-sets that are too descriptive and specific to the samples, which are typically less useful in production investigation.

Algorithm 1: Apriori Algorithm
    let C_k be the candidate item-sets at length = k
    let L_k be the frequent item-sets at length = k
    L_1 = frequent items
    k = 1
    while L_k ≠ ∅ and k ≤ max-length do
        C_{k+1} = candidate item-sets generated from L_k
        foreach transaction t in database do
            foreach item-set c covered by t do
                increment the count of c
            end
        end
        L_{k+1} = item-sets in C_{k+1} that meet min-support
        k++
    end
    return ∪_k L_k

By generating a large set of candidates and scanning through the database many times, Apriori suffers from exponential runtime and memory complexity (O(2^D)), making it impractical for many production datasets. The FP-Growth algorithm, based on a special data structure called the FP-Tree, was introduced to deal with these performance issues by leveraging a data structure that allows bypassing the expensive candidate generation step [11]. FP-Growth uses divide-and-conquer by mining short patterns recursively and then combining them into longer item-sets.

Frequent item-set mining through FP-Growth is done in two phases: FP-Tree construction and item-set generation. Algorithm 2 shows the process of FP-Tree construction. The FP-Tree construction process takes two inputs: 1) the set of samples in the target
failure state (equivalent to a transaction database in the classical frequent pattern mining literature), and 2) a min-support threshold, based on which a pattern is classified as frequent or not. Each node in the tree consists of three fields: item-name, count, and node-link. item-name stores the item that the node represents, count represents the number of transactions covered by the portion of the path reaching the node, and node-link links to the next node with the same item-name. The FP-Tree is constructed in two scans of the dataset. The first scan finds the frequent items and sorts them, and the second scan constructs the tree.

Algorithm 2: FP-Tree Construction
    Scan data and find frequent items
    Order frequent items in decreasing order with respect to support, F
    Create root node T, labeled as NULL
    foreach transaction t in database do
        foreach frequent item p in F do
            if T has a child N such that N.item-name = p.item-name then
                N.count++
            else
                Create N, link parent-link to T, and set N.count = 1
                Link N's node-link to nodes with the same item-name
            end
        end
    end

Algorithm 3 illustrates the process for generating the frequent item-sets, based on the lemmas and properties Han et al. proposed in [11]. A conditional pattern base is a sub-database which contains the set of frequent items co-occurring with the suffix pattern. The process is initiated by calling FP-Growth(Tree, NULL), then recursively building the conditional FP-Trees.

Algorithm 3: Frequent Item-set Generation
    Function FP-Growth(Tree, α):
        if Tree contains a single path P then
            foreach combination β of nodes in path P do
                Generate pattern β ∪ α with support = min support of nodes in β
            end
        else
            foreach α_i in Tree do
                Generate pattern β = α_i ∪ α with support = α_i.support
                Construct β's conditional pattern base and β's conditional FP-Tree T_β
                if T_β ≠ ∅ then
                    call FP-Growth(T_β, β)
                end
            end
        end

After finding the frequent item-sets in the dataset, we examine how strongly the item-sets can differentiate positive samples (e.g. failed hardware/jobs) from negative ones. We use lift, defined in Section 3.1, to filter out item-sets that are frequent in the failure state but not particularly useful in deciding whether a sample will fail. For example, an item-set can be frequent in both the non-failure and failure states, and the evaluation based on lift would help us remove this item-set from the output because it is not very useful in deciding whether samples in that item-set would fail or not.

3.4 Pre- and Post-Processing for Performance Optimization

We incorporated multiple optimizations as pre- and post-processing steps to scale the RCA framework to accommodate near-realtime investigations, which are important for responding to urgent system issues in a timely manner. Many entities in a production log are identical, except for the columns that are unique identifiers of the entities, such as the timestamps, hostnames, or job IDs. Utilizing Scuba's scalable infrastructure [1], we query pre-aggregated data which are already grouped by the distinct combinations of column values, with an additional weight column that records the count of the identical entities. To handle this compact representation of the dataset, we modified the algorithms to account for the weights. This pre-aggregation significantly reduces the amount of data that we need to process in memory and saves > 100X of runtime in our production analyses.

Columns with specific information about the entities, e.g. the host name of a machine or the job ID of a task, are excluded before the Scuba query. The aggregation in Scuba is only meaningful after excluding these columns; otherwise the aggregation would return one entity per row due to the distinct values per entity. In addition to allowing the users to specify columns to be excluded in the dataset, we also implement an automatic check to exclude columns whose number of distinct values is greater than a portion D of the number of samples. Empirically, we use 2% as D in one of our applications, and the proper setting of D highly depends on the nature of the dataset.

Adding multithreading support speeds up the algorithm. Specifically, the Apriori algorithm generates a large number of combinations, which need to be tested against the data. By testing these combinations in parallel, we can scale up with the number of available cores. However, we found that FP-Growth outperforms Apriori even when Apriori is optimized with multithreading.

3.5 Interpretability Optimization

In production datasets, it is common that there exist a large number of distinct items, and the lengths of the findings are typically much smaller than the number of the one-hot encoded feature columns (as discussed in Section 3.3). As a result, there can be multiple findings describing the same group of samples. To improve the quality of the result, we implemented two filtering criteria for removing uninteresting results, as described below:
Filter 1: An item-set T is dropped if there is a proper subset U (U ⊂ T) such that supp(U) > supp(T) and lift(U) ≥ lift(T).

If there exist shorter rules which have higher lift, longer rules are pruned because they are less interesting. For example, consider two rules:

    {kernel A, server type B} => failure Y with lift 5
    {kernel A, server type B, datacenter C} => failure Y with lift 1.5

It is likely that describing the server and kernel interaction is more significant than filtering by datacenter; therefore the second rule is pruned, even though the lift values from both rules meet our threshold on the minimum lift.

Filter 2: An item-set T is dropped if there is a proper superset S (S ⊃ T) such that supp(S) = supp(T) and lift(S) ≥ lift(T).

If a rule has a superset which describes the same number of samples (same support), and the lift is equal or better, the superset might be more interesting. For example, consider two rules:

    {datacenter A} => failure Y with support 0.8 and lift 2
    {datacenter A, cluster B, rack C} => failure Y with support 0.8 and lift 2

In this scenario, all of the samples in datacenter A are also in cluster B and rack C. When trying to understand the root cause of the failures, knowing that our samples are in datacenter A might not be as interesting as knowing which rack the failures belong to.

[Figure 1: The proposed framework for the Fast Dimensional Analysis. Pipeline: Query and Dedup (Scuba) → One-Hot Encoding → Frequent Pattern Mining (Apriori / FP-Growth) → Initial Pattern Filtering Using Lift → Further Filtering for Interpretability, with User-Specified Filters and Variability-Based Filters as additional inputs to the filtering steps.]

tests and large based on support for it to have child nodes [4]. On the other hand, as FP-Growth mines frequent item-sets only based on support, as long as {A} ⇒ Y, {B} ⇒ Y, and {A, B} ⇒ Y have enough support, they would all be reported as frequent item-sets. Then {A} ⇒ Y and {B} ⇒ Y will be filtered out due to low lift, while {A, B} ⇒ Y will be reported in the final result.
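The two filtering criteria above can be sketched as a stand-alone helper (a simplified toy, not our production bookkeeping: rules are represented as frozensets mapped to their (support, lift) pairs):

```python
def interesting(rules):
    """Apply Filter 1 and Filter 2 to a dict mapping an item-set
    (frozenset) to its (support, lift) pair; return the survivors."""
    kept = []
    for t, (supp_t, lift_t) in rules.items():
        dropped = False
        for u, (supp_u, lift_u) in rules.items():
            # Filter 1: a proper subset that is strictly more frequent
            # and at least as lifted makes the longer rule uninteresting.
            if u < t and supp_u > supp_t and lift_u >= lift_t:
                dropped = True
            # Filter 2: a proper superset with identical support and
            # equal-or-better lift is more specific at no cost.
            if u > t and supp_u == supp_t and lift_u >= lift_t:
                dropped = True
        if not dropped:
            kept.append(t)
    return kept

# The four example rules from Filters 1 and 2 (supports for the first
# pair are made up for illustration; only their ordering matters).
rules = {
    frozenset({"kernel A", "server type B"}): (0.5, 5.0),
    frozenset({"kernel A", "server type B", "datacenter C"}): (0.3, 1.5),
    frozenset({"datacenter A"}): (0.8, 2.0),
    frozenset({"datacenter A", "cluster B", "rack C"}): (0.8, 2.0),
}
survivors = interesting(rules)
```

On this input, Filter 1 prunes the datacenter-C rule and Filter 2 prunes the bare datacenter-A rule, leaving the two rules the text argues are more interesting.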
[Figure 2: Relationship between number of mined item-sets and min-support and max-length; panel (b), the SR dataset, shows the number of mined item-sets (1000–100000, log scale) vs. min-support (0.2–1).]

each length. The optimal parallelism level depends on the number of candidates, since each thread induces additional overhead. Figure 3 illustrates the runtime of Apriori at different levels of parallelism, based on the ASE dataset. The runtime is reported based on a machine with approximately 50 GB memory and 25 processors. As shown in the figure, 9-thread parallelism resulted in the shortest runtime, and every parallelism level up to 24 threads outperforms the single-threaded execution.

[Figure 3: Apriori runtime (0–200 seconds) vs. number of item-sets (10000–60000) at parallelism levels of 1, 5, 9, and 24 threads.]

In Figure 4, we demonstrate the runtime at different numbers of item-sets. Note that the 9-thread configuration from Figure 3 is used as the multi-threaded case here. It is clear that FP-Growth outperforms single-threaded and multi-threaded Apriori, except when the number of item-sets is small, as the overhead of setting up
the FP-Growth algorithm (see Section 3.3) is larger than the benefit
of not running Apriori’s candidate generation step.
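To make the trade-off concrete, here is a minimal single-threaded Python sketch of the Apriori loop outlined in Algorithm 1 (a toy illustration with min-support as a fraction and a max-length cap, not our production implementation):

```python
def apriori(transactions, min_support, max_length):
    """Toy Apriori (Algorithm 1): grow item-sets one item at a time,
    keeping those whose support (fraction of transactions containing
    the item-set) meets min_support, up to max_length items."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = {i for t in transactions for i in t}

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # L1: frequent single items.
    frequent = {frozenset([i]) for i in items
                if support(frozenset([i])) >= min_support}
    result = set(frequent)
    k = 1
    while frequent and k < max_length:
        # Candidate generation: extend each frequent item-set by one item.
        candidates = {s | {i} for s in frequent for i in items if i not in s}
        # Scan the database and keep the candidates meeting min-support.
        frequent = {c for c in candidates if support(c) >= min_support}
        result |= frequent
        k += 1
    return result

# Example: {kernel A, server type B} co-occur in half of the samples.
txns = [{"kernel A", "server type B"}, {"kernel A", "server type B"},
        {"kernel A"}, {"server type B"}]
mined = apriori(txns, min_support=0.5, max_length=2)
```

Counting every candidate with a full scan is what makes Apriori's cost blow up as the candidate set grows; that candidate-generation-and-scan loop is exactly what FP-Growth avoids at the price of building the FP-Tree first.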
For use-cases around periodic checks and alerting on new rules,
we mine a greater number of item-sets at the expense of runtime,
so FP-Growth is a better choice. For urgent investigations about
major issues in the applications, a smaller number of item-sets
is of interest and multi-threaded Apriori is chosen. In Figure 4b,
we see that for the SR dataset, Apriori can be faster when the
number of item-sets is smaller than 10000, which happens when
max-length < 4 or (max-length= 4 and min-support ≥ 0.8). For
the ASE dataset, multi-threaded Apriori is faster than FP-Growth
when the number of item-sets is smaller than 2000, which happens
when min-support ≥ 0.4. For a given dataset, running an initial
scan over the algorithms, lengths, and supports can help optimize the choice of algorithm and min-support.
In practice, the relationships between these parameters are highly dependent on the nature of the datasets. Therefore we present methods for optimizing the performance and interpretability. First of all, min-support is a variable that controls how granular the reported rules would be. In a real-time analysis or debug, a lower max-length can be set to reduce runtime, and a higher min-support can be applied to report only the most dominant issues in the dataset. In a more thorough analysis that is less time-sensitive, a higher max-length can be applied to generate a set of rules that are overall more interpretable, based on the filtering criteria in Section 3.5.

If there exists a clear convergence point given different max-lengths, a lower max-length should be used to avoid unnecessary computation. If the RCA application is very sensitive to runtime and the number of item-sets is small, one could first run an analysis similar to the one presented in Figure 4 and use multi-threaded Apriori in the region where it outperforms FP-Growth.

The advantage of support and lift is that they are very interpretable and intuitive metrics that any service engineer can adjust. One intuition behind the lift value is to make sure we handle the edge case where a label value, e.g. a specific failure state, has attribution X, and no other label value has attribution X.
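As a concrete sketch of the two metrics (using the common definitions of support and lift; the exact formulation in Section 3.1 may differ), including the edge case above where only one label value has attribution A:

```python
def support(itemset, transactions):
    """Fraction of samples containing every item in the item-set."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def lift(itemset, label, transactions):
    """lift(X => Y) = supp(X ∪ Y) / (supp(X) * supp(Y)); values well
    above 1 mean X and the failure label Y co-occur more than chance."""
    return support(itemset | label, transactions) / (
        support(itemset, transactions) * support(label, transactions))

# Toy dataset: failure label Y appears only on samples with attribution A.
txns = [frozenset(t) for t in (
    {"A", "Y"}, {"A", "Y"}, {"A"}, {"B"}, {"B"}, {"C"})]
# lift({A} => {Y}) = (2/6) / ((3/6) * (2/6)) = 2.0
```

Here {A} gets a lift of 2.0 on failure Y while {B} gets 0.0, so an engineer can raise or lower the min-lift threshold and reason directly about what it excludes.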
[Figure 5: Number of reported association rules vs. min-lift threshold, for (a) the ASE dataset and (b) the SR dataset. The triangle marks when a target rule appears in the use cases discussed in Section 5.]

5 USE-CASE STUDY

The method discussed in this paper has been productionized in multiple hardware, software, and tooling applications in our large-scale service infrastructure. Deployment of this framework allows for fast root cause analysis as well as automatic alerting on new correlations in the datasets, which may indicate unexpected changes in the systems. In this section we present some of the applications and the insights (after sanitizing the data) that were extracted by the proposed framework.

5.1 Anomalous Hardware and Software Configurations
In large infrastructures, there is continuous maintenance activity
[Figure 8: Number of association rules (FDA) at different min-lift thresholds (20000–100000) based on the SSH dataset. The triangle marks when a target rule appears in the use case.]

5.5 Text Report-Based Root Cause Analysis

Another important RCA use case is to analyze a large text corpus. Suppose that we are given a large number of text reports and each report can be represented by relevant features. The goal for this use case is to identify features that can represent the different topics in the reports. Due to the large number of text-based reports and features, it is typically very challenging for humans to do this manually. Below we present a generic example of applying the proposed fast dimensional analysis framework to text-based reports for RCA.

To utilize RCA to surface features for different topics, the text reports first need to be labeled with topics. A very generic example of this can be: a report is about the topic of "speech recognition". To label the topics, both supervised [23] and unsupervised [5] methods can be applied. An advantage of a supervised model is that we can easily measure the quality of the inference and the topics are interpretable; however, it requires labeled data for training. An unsupervised approach does not require labeled data, but the topics are

6 CONCLUSIONS AND FUTURE WORKS

In this paper we have explored the problem of root cause analysis on structured logs and we have presented our scalable approach based on frequent item-set mining. We have also discussed a key change to support and the motivation for lift, which are important frequent item-set metrics for our use-case. Furthermore, we presented various optimizations to the core Apriori and FP-Growth frequent item-set algorithms, including parallelism and pre- and post-processing methods. To our knowledge, this is the first work that utilizes the frequent item-set paradigm at the scale of a large internet company for root cause analysis on structured logs.

As part of our future work we aim to focus on temporal analysis and on gathering more continuous feedback. Specifically, for temporal analysis, we are working on understanding the trend of association rules seen over time to discover seasonality and longer-term trends, in order to catch degradation of services or hardware failures quicker. To overcome the lack of labels, we will focus on continuous feedback by leveraging our production system to understand which findings are more or less relevant to engineers, in order to learn which findings to promote and which findings need to be hidden.

REFERENCES

[1] Lior Abraham, John Allen, Oleksandr Barykin, Vinayak Borkar, Bhuwan Chopra, Ciprian Gerea, Dan Merl, Josh Metzler, David Reiss, Subbu Subramanian, Janet Wiener, and Okay Zed. 2013. Scuba: Diving into Data at Facebook. In International Conference on Very Large Data Bases (VLDB).
[2] R. Agrawal, T. Imielinski, and A. Swami. 1993. Mining association rules between
sets of items in large databases. In ACM SIGMOD International Conference on
Management of Data.
[3] S.D. Bay and M.J. Pazzani. 1999. Detecting change in categorical data: mining
contrast sets. In SIGKDD International Conference on Knowledge Discovery and
Data Mining. ACM.
[4] S.D. Bay and M.J. Pazzani. 2001. Detecting group differences: mining contrast
sets. Data Mining and Knowledge Discovery 5, 3 (2001).
[5] David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3 (Jan 2003), 993–1022.
[6] Dhruba Borthakur. 2019. HDFS Architecture Guide. https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
[7] Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping Qian, Ming Wu, and Lidong Zhou. 2014. Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing. In USENIX Symposium on Operating Systems Design and Implementation.
[8] Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur. 1997. Dynamic
itemset counting and implication rules for market basket data. In ACM SIGMOD
International Conference on Management of Data.
[9] Marco Castelluccio, Carlo Sansone, Luisa Verdoliva, and Giovanni Poggi. 2017.
Automatically analyzing groups of crashes for finding correlations. In ESEC/FSE
Joint Meeting on Foundations of Software Engineering. ACM.
[10] Albert Greenberg, James Hamilton, David A. Maltz, and Parveen Patel. 2009. The
Cost of a Cloud: Research Problems in Data Center Networks. In ACM SIGCOMM
Computer Communication Review.
[11] Jiawei Han, Jian Pei, and Yiwen Yin. 2000. Mining Frequent Patterns Without Candidate Generation. In ACM SIGMOD International Conference on Management of Data.
[12] David Harris and Sarah Harris. 2012. Digital design and computer architecture
(second ed.). Morgan Kaufmann.
[13] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D.
Joseph, Randy Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A Platform
for Fine-Grained Resource Sharing in the Data Center. In USENIX conference on
Networked systems design and implementation.
[14] M. Isard. 2007. Autopilot: Automatic Data Center Management. In ACM SIGOPS
Operating System Review.
[15] Fan (Fred) Lin, Matt Beadon, Harish Dattatraya Dixit, Gautham Vunnam, Amol
Desai, and Sriram Sankar. 2018. Hardware Remediation At Scale. In IEEE/IFIP
International Conference on Dependable Systems and Networks Workshops.
[16] MySQL. 2019. MySQL Customer: Facebook. https://www.mysql.com/customers/
view/?id=757
[17] Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Ning Zhang, Suresh Antony, Hao Liu, and Raghotham Murthy. 2010. Hive - A Petabyte Scale Data Warehouse Using Hadoop. In IEEE International Conference on Data Engineering (ICDE).
[18] Martin Traverso. 2013. Presto: Interacting with petabytes of data at Face-
book. https://www.facebook.com/notes/facebook-engineering/presto-interacting-
with-petabytes-of-data-at-facebook/10151786197628920/ (2013).
[19] Vinod Kumar Vavilapalli, Arun C Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, Bikas Saha, Carlo Curino, Owen O'Malley, Sanjay Radia, Benjamin Reed, and Eric Baldeschwieler. 2013. Apache Hadoop YARN: Yet Another Resource Negotiator. In ACM Symposium on Cloud Computing.
[20] A. Verma, L. Pedrosa, M. Korupolu, D. Oppenheimer, E. Tune, and J. Wilkes. 2015.
Large-scale Cluster Management At Google with Borg. In European Conference
on Computer Systems (EuroSys).
[21] Xuerui Wang, Andrew McCallum, and Xing Wei. 2007. Topical n-grams: Phrase
and topic discovery, with an application to information retrieval. In Seventh IEEE
International Conference on Data Mining (ICDM 2007). IEEE, 697–702.
[22] Kenny Yu and Chunqiang (CQ) Tang. 2019. Efficient, reliable cluster man-
agement at scale with Tupperware. https://engineering.fb.com/data-center-
engineering/tupperware/ (2019).
[23] Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional
networks for text classification. In Advances in neural information processing
systems. 649–657.
[24] Zhuo Zhang, Chao Li, Yangyu Tao, Renyu Yang, Hong Tang, and Jie Xu. 2014.
Fuxi: a Fault-Tolerant Resource Management and Job Scheduling System at
Internet Scale. In International Conference on Very Large Data Bases (VLDB).