Cheat Sheet: Building a KNIME Workflow for Beginners

Getting started with KNIME Analytics Platform

● Read through the installation guide at knime.com/installation
● Check out the 7 things you should do after installing KNIME Analytics Platform at knime.com/blog/seven-things
● Take the E-Learning Course at knime.com/knime-introductory-course
● Browse the workflows on the public EXAMPLES Server available in the KNIME Explorer

Understanding the traffic light system:
Not configured: Node is not yet configured and cannot be executed with its current settings
Configured: Node has been correctly configured and may be executed at any time
Executed: Node has been successfully executed and results can be viewed and used in downstream nodes

[Figure: workflow stages labeled Explore, Read, Transform, Analyze, Deploy]

READ

File Reader: Reads all text files, particularly character separated files, such as CSV files. The File Reader is the workhorse for reading text data.

Table Reader: Reads data from a .table file. .table files are organized using a KNIME proprietary format, including the full file structure and are optimized for space and speed - providing maximum performance with minimum configuration!

Excel Reader (XLS): Reads content from sheets in Excel files (XLS, XLSX). Sheet and cells to be read can be defined in the configuration window.

Google Sheets Reader: Reads data from a Google Sheet file. Authentication occurs on the Google site. Google credentials are not saved within the KNIME workflow.

Table Creator: Allows users to manually create a data table in its configuration window as a data sheet. Data cells can be copied and pasted in the sheet. Perfect for generating small data sets.

Model Reader: Reads machine learning models generated with any of the Learner nodes. Models are usually saved after training and reused in deployment.

knime:// protocol: References a file path relative to some key location of the current KNIME installation like knime://knime.workflow/../<filename> or knime://<knime.server.mountpoint>/<path>/<filename>

EXPLORE

(All visualizations are interactive)

Scatter Plot: Represents input data rows as points in a two dimensional plot. Input dimensions (columns) on the x-y axis plot and graphical properties can be changed in the configuration window or interactively in the node view.

Sunburst Chart: Displays categorical columns through a hierarchy of rings. Each ring is sliced according to the nominal values in the corresponding column and to the selected hierarchy. This is a powerful chart for multivariate analysis.

Stacked Area Chart: Plots multiple numerical data columns on top of each other using the previous line as the base reference. The areas in between lines are colored for easier comparison. This chart is commonly used to visualize trending topics.

Line Plot: Plots numerical values in data columns (y-axis) against values in a reference column (x-axis). Data points are connected via colored lines. If the reference column on the x-axis contains sorted time values, the line plot graphically represents the evolution of a time series.

Color Manager: Assigns a color property to each input row based on the row’s value in a selected column. This color property affects the graphical representation in the upcoming views.

Pie Chart: Visualizes one aggregated metric for different data partitions with colored slices on a circle where the areas are proportional to the metric values. The partitions are defined by a categorical column.

Data Explorer: Provides an interactive view to summarize the statistics of the input data via statistical measures and histograms - for both numerical and nominal columns.

Box Plot: Visualizes numeric columns using the quartile statistics. Watch out for the points at the end of the whiskers - they might mark outliers!

Bar Chart: Visualizes one or more aggregated metrics for different data partitions with rectangular bars where the heights are proportional to the metric values. The partitions are defined by a categorical column.

ANALYZE

Learner Nodes: Supervised algorithms in KNIME Analytics Platform have a Learner node to train a model on a previously labelled training set.

Predictor Nodes: Used for applying models. The two inputs are the trained model and the data to process. The output contains the original data and the model predictions.

Decision Tree: The Learner node trains a C4.5 or a CART decision tree. The configuration window includes options for pruning, early stopping, information measures, splitting values, and more. Both the Learner and the Predictor node provide an interactive view where the decision tree is displayed together with the input data propagation.

k-Means: Implements the k-Means clustering algorithm. Number of clusters must be set prior to node execution. This node builds the clusters. The Cluster Assigner node finds the closest cluster and assigns it to the input data row. Being an unsupervised algorithm, this node pair doesn’t follow the classic Learner - Predictor scheme.

Logistic Regression: The Learner node trains a logistic regression model to predict categorical target values. The configuration window includes options for solver, input feature choice, regularization functions to avoid overfitting, & more.

Integrations to many open source data analytics tools are also available. Some use the KNIME node GUI (H2O, Weka, Keras, Spark MLlib). Others offer nodes with a development environment for scripting and debugging (R, Python, Java).

ROC Curve: Displays the Receiver Operating Characteristic (ROC) curve of a classifier working on a binary class problem. One of the two classes is arbitrarily chosen as the positive class and the ROC curve is built on the probabilities/scores produced for that class on the input data set.

Scorer: Calculates a number of performance measures such as accuracy, F1-score, or Cohen’s Kappa, to quantify the quality of a classifier.

Numeric Scorer: Calculates a number of numerical error measures, such as root mean squared error, mean absolute error, or R^2, to quantify the quality of a numerical predictor model.
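
To make the measures named for the Scorer and Numeric Scorer nodes concrete, here is a minimal Python sketch of the textbook formulas for accuracy, F1-score, Cohen's Kappa, root mean squared error, mean absolute error, and R^2. It is an illustration only, not KNIME's implementation; the function names `classification_scores` and `numeric_scores` and their parameters are hypothetical.

```python
import math
from collections import Counter


def classification_scores(actual, predicted, positive):
    """Return (accuracy, F1 for the `positive` class, Cohen's kappa).

    Hypothetical helper using textbook definitions; not the Scorer node's code.
    """
    n = len(actual)
    pairs = list(zip(actual, predicted))

    # Accuracy: share of rows where the prediction equals the actual label.
    accuracy = sum(a == p for a, p in pairs) / n

    # F1 for one class, written in terms of true/false positives and negatives.
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from the marginal label frequencies.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    expected = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    return accuracy, f1, kappa


def numeric_scores(actual, predicted):
    """Return (RMSE, MAE, R^2) for two equally long lists of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # R^2 is undefined when the actual values have no variance.
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return rmse, mae, r2
```

Calling `classification_scores(actual, predicted, "yes")` on two label columns, or `numeric_scores(actual, predicted)` on two numeric columns, mirrors comparing an actual column with a prediction column after a Predictor node.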

TRANSFORM

GroupBy: Groups the rows of a table by the unique values in selected columns and calculates aggregation and statistical measures for the defined groups. Despite its simple name, it offers powerful functionality and has many unsuspected usages. For example - row deduplication.

Pivoting: Extends the aggregation functionality of the GroupBy node by creating an output data table with columns and rows for the unique values in selected input columns. Note: the unique values of the grouping column become rows and the unique values of the pivoting column become columns.

Rule Engine: Applies a set of rules to each row of the input data table. All Rule Engine operators are also available in the Column Expressions node.

Partitioning: Splits data into two subsets according to a sampling strategy. This node is generally used to produce a training and a test set to train and evaluate a machine learning model.

Row Filter: Filters rows in or out from the input data table according to a filtering rule. The filtering rule can match a value in a selected column or numbers in a numerical range.

Math Formula: Implements a number of math operations across multiple input columns, from simple sum and average, to logarithms and exponentials. All Math Formula operators are also available in the Column Expressions node.

String to Date&Time: Converts values in a String column into Date&Time values. The Date&Time format contained in the String values can be manually defined or auto guessed.

Cell Splitter: Splits values in a selected column into two or more substrings, as defined by a delimiter match. Delimiter is a set character, such as a comma, space, or any other character or character sequence.

Column Filter: Filters columns in or out from the input data table according to a filtering rule. Columns to be retained can be manually picked or selected according to their type, or of a regex expression matching their name.

Column Rename: Assigns new names and types to selected columns, as configured in the dialog.

Joiner: Joins rows from two data tables based on common values in one or more key columns. The most common join types are possible: inner join, left outer join, right outer join, and full outer join.

Sorter: Sorts the table in ascending or descending order based on the values of a chosen column. In addition, it is possible to sort based on multiple columns.

Concatenate: Merges vertically two data tables, by piling up cells in columns with the same name. Cells in uncommon columns are filled with missing values. The Concatenate (Optional in) node merges vertically up to four data tables.

Missing Value: Defines a strategy to deal with missing values in the input data table - either globally on all columns, or individually for each single column.

String Manipulation: Performs operations on String values in columns, such as combining two or more Strings together, extracting one or more substrings, trimming blank spaces, and so on. All operators are also available in the Column Expressions node.

DEPLOY

Data to Report: Marks the data table to be exported to BIRT - a partially open source reporting tool integrated within KNIME. When switching from KNIME to BIRT, the marked data sets are imported into BIRT. The Image To Report node marks the input images to be exported to BIRT.

Excel Writer (XLS): Writes the input data table to a sheet in an Excel file (XLS or XLSX).

Table Writer: Writes the input data table to a file using the .table KNIME proprietary format. This format includes the full file structure and is optimized for space and speed. Including the table structure in the file is a great advantage - especially when exchanging data files among users.

CSV Writer: Writes the input data table to a CSV file.

Google Sheets Writer: Writes the input data table into a Google Sheet file. Authentication occurs on the Google site. Google credentials are not saved within the KNIME workflow.

Send to Tableau Server (Connectors to Tableau): Export input data table into a Tableau file or server for reporting.

Resources

● KNIME Forum: Join our global community and engage in conversations at forum.knime.com
● KNIME Books: More tips, ideas, and lessons from knime.com/knimepress
● KNIME Events: Take a course, attend a workshop, or join a meetup at knime.com/events
● KNIME Blog: Engaging topics, challenges, industry news, and knowledge nuggets at knime.com/blog
● Workflow Hub: Browse our example workflows and/or share your own workflows. Show appreciation for others by adding ratings, or comments at workflows.knime.com
● More Guides: Still using SAS or Excel? Transition to KNIME Analytics Platform with these handy guides at knime.com/knimepress
● KNIME Server: For team-based collaboration, automation, management, and deployment check out KNIME Server at knime.com/server
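
As a closing illustration of the GroupBy and Pivoting descriptions in the TRANSFORM section, the following Python sketch mimics their row-level behavior on a small list of dictionaries. It is an analogy under stated assumptions, not how the KNIME nodes are implemented; the helper names (`group_by`, `deduplicate`, `pivot`) and the sample columns (`region`, `year`, `sales`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample table: one dictionary per row.
ROWS = [
    {"region": "north", "year": 2020, "sales": 10},
    {"region": "north", "year": 2021, "sales": 5},
    {"region": "south", "year": 2020, "sales": 7},
    {"region": "north", "year": 2020, "sales": 10},  # exact duplicate of row 1
]


def group_by(rows, group_cols, value_col, aggregate=sum):
    """Aggregate `value_col` once per unique combination of `group_cols`."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        groups[key].append(row[value_col])
    return {key: aggregate(values) for key, values in groups.items()}


def deduplicate(rows, cols):
    """Row deduplication as a grouping: keep the first row of each group."""
    first_seen = {}
    for row in rows:
        first_seen.setdefault(tuple(row[c] for c in cols), row)
    return list(first_seen.values())


def pivot(rows, group_col, pivot_col, value_col, aggregate=sum):
    """Unique `group_col` values become rows, unique `pivot_col` values columns."""
    cells = defaultdict(lambda: defaultdict(list))
    for row in rows:
        cells[row[group_col]][row[pivot_col]].append(row[value_col])
    return {
        group: {col: aggregate(values) for col, values in by_col.items()}
        for group, by_col in cells.items()
    }
```

Grouping on every column, as `deduplicate(ROWS, ["region", "year", "sales"])` does, is the row deduplication use mentioned for GroupBy, while `pivot(ROWS, "region", "year", "sales")` turns the grouping values into rows and the pivoting values into columns.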
