Symantec Data Insight SDK Guide

Symantec Data Insight
Programmer's Reference
Guide
4.0
June 2013
Symantec Proprietary and Confidential
Symantec Data Insight Programmer's Reference Guide

The software described in this book is furnished under a license agreement and may be used
only in accordance with the terms of the agreement.
4.0
Documentation version: 4.0.0
Legal Notice
Copyright 2013 Symantec Corporation. All rights reserved.
Symantec, the Symantec Logo, and the Checkmark Logo are trademarks or registered
trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other
names may be trademarks of their respective owners.
This Symantec product may contain third party software for which Symantec is required
to provide attribution to the third party (Third Party Programs). Some of the Third Party
Programs are available under open source or free software licenses. The License Agreement
accompanying the Software does not alter any rights or obligations you may have under
those open source or free software licenses. Please see the Third Party Legal Notice Appendix
to this Documentation or TPIP ReadMe File accompanying this Symantec product for more
information on the Third Party Programs.
The product described in this document is distributed under licenses restricting its use,
copying, distribution, and decompilation/reverse engineering. No part of this document
may be reproduced in any form by any means without prior written authorization of
Symantec Corporation and its licensors, if any.
THE DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS,
REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT,
ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO
BE LEGALLY INVALID. SYMANTEC CORPORATION SHALL NOT BE LIABLE FOR INCIDENTAL
OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH THE FURNISHING,
PERFORMANCE, OR USE OF THIS DOCUMENTATION. THE INFORMATION CONTAINED
IN THIS DOCUMENTATION IS SUBJECT TO CHANGE WITHOUT NOTICE.
The Licensed Software and Documentation are deemed to be commercial computer software
as defined in FAR 12.212 and subject to restricted rights as defined in FAR Section 52.227-19
"Commercial Computer Software - Restricted Rights" and DFARS 227.7202, "Rights in
Commercial Computer Software or Commercial Computer Software Documentation", as
applicable, and any successor regulations. Any use, modification, reproduction release,
performance, display or disclosure of the Licensed Software and Documentation by the U.S.
Government shall be solely in accordance with the terms of this Agreement.
Symantec Corporation
350 Ellis Street
Mountain View, CA 94043
http://www.symantec.com
Technical Support
Symantec Technical Support maintains support centers globally. Technical
Support's primary role is to respond to specific queries about product features
and functionality. The Technical Support group also creates content for our online
Knowledge Base. The Technical Support group works collaboratively with the
other functional areas within Symantec to answer your questions in a timely
fashion. For example, the Technical Support group works with Product Engineering
and Symantec Security Response to provide alerting services and virus definition
updates.
Symantec's support offerings include the following:
A range of support options that give you the flexibility to select the right amount of service for any size organization
Telephone and/or Web-based support that provides rapid response and up-to-the-minute information
Upgrade assurance that delivers software upgrades
Global support purchased on a regional business hours or 24 hours a day, 7 days a week basis
Premium service offerings that include Account Management Services
For information about Symantec's support offerings, you can visit our website at
the following URL:
www.symantec.com/business/support/
All support services will be delivered in accordance with your support agreement
and the then-current enterprise technical support policy.
Contacting Technical Support

Customers with a current support agreement may access Technical Support
information at the following URL:
Before contacting Technical Support, make sure you have satisfied the system
requirements that are listed in your product documentation. Also, you should be
at the computer on which the problem occurred, in case it is necessary to replicate
the problem.
When you contact Technical Support, please have the following information
available:
Product release level
Hardware information
Available memory, disk space, and NIC information
Operating system
Version and patch level
Network topology
Router, gateway, and IP address information
Problem description:
    Error messages and log files
    Troubleshooting that was performed before contacting Symantec
    Recent software configuration changes and network changes
Licensing and registration

If your Symantec product requires registration or a license key, access our technical
support Web page at the following URL:
Customer service
Customer service information is available at the following URL:
Customer Service is available to assist with non-technical questions, such as the
following types of issues:
Questions regarding product licensing or serialization
Product registration updates, such as address or name changes
General product information (features, language availability, local dealers)
Latest information about product updates and upgrades
Information about upgrade assurance and support contracts
Information about the Symantec Buying Programs
Advice about Symantec's technical support options
Nontechnical presales questions
Issues that are related to CD-ROMs, DVDs, or manuals
Support agreement resources

If you want to contact Symantec regarding an existing support agreement, please
contact the support agreement administration team for your region as follows:
Asia-Pacific and Japan
customercare_apac@symantec.com
Europe, Middle-East, and Africa
semea@symantec.com
North America and Latin America
supportsolutions@symantec.com
Contents

Technical Support

Chapter 1  About this guide
    How this guide is organized

Chapter 2  DataInsight Query Language (DQL)
    About Data Insight Query Language (DQL)
    DQL Objects/Tables
    About DQL Columns
        device Columns
        msu Columns
        user Columns
        groups Columns
        path Columns
        dfspath Columns
        owner Columns
        activity Columns
        permission Columns
        custodian Columns
    DQL Query Syntax
        FROM clause
        GET clause
        FORMAT clause
        IF clause
        USING clause
        HAVING clause
        GROUPBY clause
        SORTBY clause
        LIMIT clause
    DQL functions
    Example DQL queries

Chapter 3  Web API Specification
    Web API specification for generic Collector service

Chapter 4  Creating custom scripts for remediation actions
    About custom scripts

Chapter 5  Data Inventory Report schema
    Data Inventory report schema
        file_inventory table
        lob table
        user_lob table
        user_totals table
        user_interval_totals table
        lob_totals table
        lob_interval_totals table
        intervals table
        msu_info table
        dashboard_info table
        Report configuration parameters

Chapter 1
About this guide

This chapter includes the following topics:
How this guide is organized

How this guide is organized

This document contains a general description of the content and usage of the
Data Insight Software Developer's Kit (SDK). Each chapter introduces a Data
Insight feature, discusses its possible uses, and describes how to use the
application programming interface for custom operations. The SDK contains
specific programming examples using these interfaces.
This guide provides an overview of the following Data Insight features that are
accessible with the SDK:
DataInsight Query Language (DQL) - Use DQL to write queries that produce
customized reports.
See About Data Insight Query Language (DQL).
The generic device web API - Use the API to extend platform support for the
storage devices that Data Insight monitors.
See Web API specification for generic Collector service.
For information about configuring a generic device in Data Insight and the
credentials required to monitor the device, see the Symantec Data Insight
Administrator's Guide.
Custom scripts - Create scripts to define specific actions to handle remediation.
See About custom scripts.
To configure Data Insight to invoke these scripts to complete the custom
actions, see the Symantec Data Insight Administrator's Guide.
Schema of the Data Inventory Report.

Chapter 2
DataInsight Query Language (DQL)

This chapter includes the following topics:
About Data Insight Query Language (DQL)
DQL Objects/Tables
About DQL Columns
DQL Query Syntax
DQL functions
Example DQL queries

About Data Insight Query Language (DQL)

Data Insight Query Language (DQL) is a structured language for retrieving the
information that is stored in the Data Insight indices. Indices are the proprietary
internal data stores that Data Insight uses for storing information. DQL does not
provide the full functional capability of SQL, but it is expressive enough to let
users easily extract, group, sort, and aggregate data.
DQL is a query-only language. You cannot use DQL to modify the Data Insight
indices. DQL queries are also protected by role-based access control, which means
that you can only see the information that you have access to.
DQL Objects/Tables
With DQL, you can run a query on objects and retrieve other objects as results. If
you are familiar with the SQL language, an object in DQL is similar to a table in
SQL. The attributes of an object are similar to the columns of the table. The output
of a DQL query is a relational database table with attribute values as column
values.
The complete list of DQL tables, with a brief description of each, is shown below:
device
    Describes the details of the storage devices or content repository
    servers that Data Insight monitors. For example, a NetApp or EMC
    filer, a Windows File Server, or a SharePoint web application.
msu
    Describes the details of the Data Insight storage units. An msu is a
    unit of storage space which can be a file share (in case of CIFS or
    NFS) or a site-collection (in case of SharePoint).
path
    Describes the details of the file or directory paths to individual msus.
dfspath
    Describes the details of the DFS file or directory paths to individual
    msus.
owner
    Describes the details of the computed owners of file or directory
    paths.
user
    Describes the details of the users that are listed in directory services
    such as Active Directory, LDAP, or NIS+ directory server.
groups
    Describes the details of the groups that are listed in directory services
    such as Active Directory, LDAP, or NIS+.
activity
    Describes the details of the activity events on specific paths that
    are made by specific users at specific times. For example, an activity
    object can describe the following: file \\netapp1\mydocs\Market
    Research.doc was read by user John Smith at 1334123700 (Wed, 11th
    April 2012 05:55:00 GMT).
permission
    Describes the details of the NTFS or UNIX permissions that are set
    on directory or file paths.
custodian
    Describes the details of the custodians that are assigned to devices,
    msus, directories, or files.
In the list above, the owner object differs from the rest of the objects because
it is a computed object. Owner objects are not first-class objects that
are stored in the Data Insight indices. They are computed at run-time depending
on the method that is to be used to calculate file ownership.

About DQL Columns

Unlike a SQL table whose columns can only contain a single value, a DQL table
can have columns with multiple values. For example, the group Domain Users has
multiple values for its column memberusers. A pair of square brackets around the
column name is used to indicate that the column is multi-valued.
With DQL, you can have a table with the columns that refer to other tables. For
example, the table groups has a column memberusers which refers to rows from
the user table. When you retrieve such reference columns, you need to specify
what columns you want to retrieve from the referred table. For example, you
cannot retrieve memberusers from groups without specifying which columns of
the user table you are interested in. So, you can select memberusers.name or
memberusers.sid but not just memberusers.
device Columns
Column                   Type                 Description
id                       Integer              Unique identifier for this device.
name                     String               Name of this device.
type                     String               Type of device (NetApp, Celerra, WinNAS, SharePoint).
collector                String               Name of Collector node.
indexer                  String               Name of Indexer node.
[custodians]             [Custodian Object]   List of custodians for this device.
capacity                 Integer              Storage capacity of this device.
used_space               Integer              The total amount of space that all files and folders on this device consume.
share_count              Integer              Number of shares on this device.
open_share_count         Integer              Number of shares on this device that are marked as open.
open_share_data_size     Integer              Total size of all open shares on this device.
open_share_file_count    Integer              Total file count of all open shares on this device.
file_count               Integer              Total file count of all shares on this device.
sensitive_file_count     Integer              Number of sensitive files across all shares on this device.
folder_count             Integer              Total folder count of all shares on this device.
activity_count           Integer              Total activity count on this device (the activity count is calculated for the last six months).

msu Columns
Column                   Type                 Description
id                       Integer              Unique identifier for this msu.
name                     String               Name of this msu.
type                     String               Type of msu (CIFS, NFSv3, SharePoint).
device                   Device Object        Device that this msu belongs to.
indexer                  String               Name of Indexer node.
indexdir                 String               Path to index directory.
[custodians]             [Custodian Object]   List of custodians for this msu.
[permissions]            [Permission Object]  List of permissions for this msu (share-level permissions).
isopen                   Integer              1 if msu is open, otherwise 0.
activity_count           Integer              Total activity count in the last six months.
active_user_count        Integer              Number of users who were active in the last six months.
last_activity_time       Integer              Time of last recorded activity on this msu.
size                     Integer              Total size of this msu.
active_data_size         Integer              Total size of all active files on this msu.
file_count               Integer              Number of files on this msu.
sensitive_file_count     Integer              Number of sensitive files on this msu.
folder_count             Integer              Number of folders on this msu.
most_active_user         User Object          User who is most active on this msu.

user Columns
Column                   Type                 Description
sid                      String               Unique identifier of the user.
name                     String               Full name of the user (e.g., John Smith).
login                    String               Login of the user.
domain                   String               Domain that the user belongs to.
firstname                String               First name of the user.
lastname                 String               Last name of the user.
isdisabled               Integer              1 if the user is disabled, 0 otherwise.
isdeleted                Integer              1 if the user is deleted from AD/LDAP, 0 otherwise.
buname                   String               Name of the business unit that this user belongs to.
buowner                  String               Owner of the business unit that this user belongs to.
[memberof]               [Group Object]       Groups of which this user is a member.
<custom-attr>            [String]             Custom attribute of the user. Replace <custom-attr> with the name of the custom attribute, for example, department. If the name contains special characters such as -, *, %, ^, or /, enclose the name in quotes. For example, "E-mail".

groups Columns
Column                   Type                 Description
sid                      String               Unique identifier of this group.
name                     String               Name of this group.
domain                   String               Domain of this group.
isdisabled               Integer              1 if the group is disabled, 0 otherwise.
isdeleted                Integer              1 if the group is deleted, 0 otherwise.
[memberof]               [Group Object]       Groups of which this group is a member.
[memberusers]            [User Object]        Users who are members of this group.
[membergroups]           [Group Object]       Groups that are members of this group.
<custom-attr>            [String]             Custom attribute of the group. Replace <custom-attr> with the name of the custom attribute, for example, location. If the name contains special characters such as -, *, %, ^, or /, enclose the name in quotes. For example, "E-mail".

path Columns
Column                   Type                 Description
name                     String               Name of path relative to the msu.
absname                  String               Absolute name of the path containing device and share names, for example, \\filer1\share100\a\b.
id                       Integer              Unique identifier for this path within the msu.
parent                   Path Object          Parent path of this path.
type                     String               DIR for directory, FILE for file.
device                   Device Object        The device to which this path belongs.
msu                      msu Object           The msu to which this path belongs.
size                     Integer              Size of path in bytes. For directories, it is the size of all files under the entire subtree.
last_accessed            Integer              Timestamp of when this path was last accessed. Timestamp is measured as the number of seconds that have elapsed since midnight UTC, January 1st, 1970.
last_modified            Integer              Timestamp of when this path was last modified.
created_on               Integer              Timestamp of when this path was created.
last_accessor            User Object          User who last accessed this path.
last_modifier            User Object          User who last modified this path.
creator                  User Object          User who created this path.
creator_group            Group Object         Group creator of this path.
owner                    Owner Object         Computed owner of this path.
isdeleted                Integer              1 if the path is deleted, 0 otherwise.
depth                    Integer              Depth of the path from the root of the share. For example, / has a depth of 0, /a has a depth of 1, /a/b has a depth of 2.
activity_count           Integer              Total activity count on this path.
isopen                   Integer              1 if the path is open, 0 otherwise.
[open_reasons]           [String]             Reasons why the path is considered open.
[permissions]            [Permission Object]  List of permissions on this path.
[custodians]             [Custodian Object]   List of custodians for this path.
issensitive              Integer              1 if the path is sensitive, 0 otherwise.
[filegroups]             [String]             List of filegroups for this path.
extension                String               File extension for this path. For example, PST, DOC, etc.
[dfsnames]               [String]             List of DFS names for this path.
[permitted_users]        [User Object]        List of users who have permissions to access this path.
permitted_users_count    Integer              Number of users who have permissions to access this path.
[active_users]           [User Object]        List of users who are active on this path.
active_users_count       Integer              Number of users who are active on this path.
[inactive_users]         [User Object]        List of users who are inactive on this path.
inactive_users_count     Integer              Number of users who are inactive on this path.
[dlp_policies]           [String]             List of DLP policies violated by this path.
iscontrol_point          Integer              1 if the path is a control point, 0 otherwise.
[control_point_reasons]  [String]             Reasons why the path is considered a control point.
filesystem_owner         User Object          Owner specified by the NTFS file system.

dfspath Columns
Column                   Type                 Description
name                     String               Name of DFS path relative to the msu.
absname                  String               Absolute name of the DFS path containing device and share names, for example, \\dfsfiler1\dfsshare100\a\b.
id                       Integer              Unique identifier for this path within the msu.
parent                   DFS Path Object      Parent DFS path of this DFS path.
physicalname             String               Absolute name of the physical path that this DFS path maps to.
type                     String               DIR for directory, FILE for file.
device                   Device Object        The device to which this DFS path belongs.
msu                      msu Object           The msu to which this DFS path belongs.
size                     Integer              Size of path in bytes. For directories, it is the size of all files under the entire subtree.
last_accessed            Integer              Timestamp of when this path was last accessed. Timestamp is measured as the number of seconds that have elapsed since midnight UTC, January 1st, 1970.
last_modified            Integer              Timestamp of when this path was last modified.
created_on               Integer              Timestamp of when this path was created.
last_accessor            User Object          User who last accessed this path.
last_modifier            User Object          User who last modified this path.
creator                  User Object          User who created this path.
creator_group            Group Object         Group creator of this path.
owner                    Owner Object         Computed owner of this path.
isdeleted                Integer              1 if the path is deleted, 0 otherwise.
depth                    Integer              Depth of the path from the root of the share. For example, / has a depth of 0, /a has a depth of 1, /a/b has a depth of 2.
activity_count           Integer              Total activity count on this path.
isopen                   Integer              1 if the path is open, 0 otherwise.
[open_reasons]           [String]             Reasons why the path is considered open.
[permissions]            [Permission Object]  List of permissions on this path.
[custodians]             [Custodian Object]   List of custodians for this path.
issensitive              Integer              1 if the path is sensitive, 0 otherwise.
[filegroups]             [String]             List of filegroups for this path.
extension                String               File extension for this path. For example, PST, DOC, etc.
[permitted_users]        [User Object]        List of users who have permissions to access this path.
permitted_users_count    Integer              Number of users who have permissions to access this path.
[active_users]           [User Object]        List of users who are active on this path.
active_users_count       Integer              Number of users who are active on this path.
[inactive_users]         [User Object]        List of users who are inactive on this path.
inactive_users_count     Integer              Number of users who are inactive on this path.
[dlp_policies]           [String]             List of DLP policies violated by this path.
iscontrol_point          Integer              1 if the path is a control point, 0 otherwise.
[control_point_reasons]  [String]             Reasons why the path is considered a control point.
filesystem_owner         User Object          Owner specified by the NTFS file system.

owner Columns
Column                   Type                 Description
path                     Path Object          Path for which the owner is computed.
dfspath                  DFS Path Object      DFS path for which the owner is computed.
user                     User Object          The computed owner of the path.
read_count               Integer              Number of read accesses made by this user.
write_count              Integer              Number of write accesses made by this user.
method                   String               The method that was used to compute this owner. Possible values are creator, read_count, write_count, rw_count, last_accessor, last_modifier, and parent_owner.

activity Columns
Column                   Type                 Description
timestamp                Integer              Timestamp of the activity. Timestamp is measured as the number of seconds that have elapsed since midnight UTC, January 1st, 1970.
timerange                Integer              Number of seconds since timestamp that this activity event might have happened.
user                     User Object          User who initiated this activity event.
path                     Path Object          Path on which this activity event occurred.
dfspath                  DFS Path Object      DFS path on which this activity event occurred.
opcode                   Integer              Integer representing the activity event.
operation                String               String notation of the activity event (e.g., read, write, create, delete, mkdir, rmdir, etc.).
count                    Integer              Number of times this operation was performed in the timerange.
ipaddr                   String               IP address from where the operation was performed.
rename_target            Path Object          For rename or move operations, the target path to which this path was renamed.
dfs_rename_target        DFS Path Object      For rename or move operations, the target DFS path to which this DFS path was renamed.
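
The timestamp and timerange columns can be interpreted with a few lines of Python. The following is a minimal sketch, not SDK code: the function names are hypothetical, the 3600-second timerange is an invented sample value, and reading timerange as the length of a window that starts at timestamp is an assumption based on the column description. The epoch value 1334123700 is the one used in the activity example earlier in this chapter.

```python
from datetime import datetime, timedelta, timezone


def to_utc(timestamp):
    """Convert a DQL timestamp (seconds since midnight UTC, 1 Jan 1970)."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def activity_window(timestamp, timerange):
    """Return the (start, end) interval in which the event may have happened.

    Assumption: the event occurred between timestamp and
    timestamp + timerange seconds.
    """
    start = to_utc(timestamp)
    return start, start + timedelta(seconds=timerange)


# Sample value taken from the activity example in this guide.
print(to_utc(1334123700).isoformat())  # 2012-04-11T05:55:00+00:00

# Hypothetical one-hour timerange.
start, end = activity_window(1334123700, 3600)
print(start.isoformat(), "to", end.isoformat())
```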

permission Columns
Column                   Type                 Description
object_type              String               Type of object on which this permission is set (msu, DIR).
path                     Path Object          Path on which the permission is set.
dfspath                  DFS Path Object      DFS path on which the permission is set.
msu                      msu Object           The msu on which the permission is set.
trustee_type             String               Type of trustee (user, group).
user_trustee             User Object          Trustee of this permission.
group_trustee            Group Object         Trustee of this permission.
permission_mask          Integer              Permission bitmask.
readable_permission      String               List of readable permissions (read, write, full control, etc.).
type                     String               Type of permission (GRANT, DENY).
isinherited              Integer              1 if the permission is inherited from parent.
inheriting_type          String               Type of object from which this permission is inherited (msu, DIR).
inheriting_path          Path Object          Path from which this permission is inherited.
inheriting_dfspath       DFS Path Object      DFS path from which this permission is inherited.
inheriting_msu           msu Object           msu from which this permission is inherited.
appliesto                String               Inheritance settings for this permission (e.g., this folder, all subfolders, only immediate files).

custodian Columns
Column                   Type                 Description
path                     Path Object          Path on which the custodian is assigned.
dfspath                  DFS Path Object      DFS path on which the custodian is assigned.
msu                      msu Object           msu on which the custodian is assigned.
device                   Device Object        Device on which the custodian is assigned.
dfslink                  String               DFS link on which the custodian is assigned.
user                     User Object          The assigned custodian of the path.
isinherited              Integer              1 if the custodian is inherited from a parent (device, msu, dir, dfslink).
inheriting_type          String               Type of object from which the custodian is inherited (device, msu, dir, dfslink).
inheriting_path          Path Object          Path from which the custodian is inherited.
inheriting_dfspath       DFS Path Object      DFS path from which the custodian is inherited.
inheriting_msu           msu Object           msu from which the custodian is inherited.
inheriting_device        Device Object        Device from which the custodian is inherited.
inheriting_dfslink       String               DFS link from which the custodian is inherited.

DQL Query Syntax

The DQL query syntax and top-level grammatical constructs are as shown:
FROM      <table>
GET       <column expression> [AS alias], <column expression> [AS alias], ...
[IF       <condition>]
[USING    <definition>]
[FORMAT   <column> AS (CSV|TABLE <tablename>) [<count>]]
[GROUPBY  <column expression>, <column expression>, ...]
[HAVING   <aggregate-condition>]
[SORTBY   <column expression> [ASC|DESC]]
[LIMIT    [<offset>,]<count>];
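
The clause order in the grammar can be made concrete with a small helper that assembles query text. This is only a sketch: build_dql and its parameter names are hypothetical and not part of the Data Insight SDK, and joining several FORMAT specifications with AND is an assumption based on the nested multi-valued column example in this chapter. The helper simply emits the clauses in the order the grammar lists them.

```python
def build_dql(table, columns, condition=None, using=None, formats=None,
              group_by=None, having=None, sort_by=None, limit=None):
    """Assemble DQL query text. Hypothetical helper, not part of the SDK."""
    if not columns:
        raise ValueError("GET needs at least one column expression")
    # FROM and GET are mandatory; every other clause is optional.
    clauses = ["FROM " + table, "GET " + ", ".join(columns)]
    if condition:
        clauses.append("IF " + condition)
    if using:
        clauses.append("USING " + using)
    if formats:
        # Assumption: several FORMAT specifications are joined with AND.
        clauses.append("FORMAT " + " AND ".join(formats))
    if group_by:
        clauses.append("GROUPBY " + ", ".join(group_by))
    if having:
        clauses.append("HAVING " + having)
    if sort_by:
        clauses.append("SORTBY " + sort_by)
    if limit is not None:
        # limit is either a count or an (offset, count) pair.
        if isinstance(limit, tuple):
            clauses.append("LIMIT {},{}".format(*limit))
        else:
            clauses.append("LIMIT {}".format(limit))
    return "\n".join(clauses) + ";"


query = build_dql(
    "groups",
    ["name", "memberusers.sid", "memberusers.name"],
    formats=["memberusers AS CSV"],
    limit=10,
)
print(query)
```

The helper does no validation of table or column names; it only guarantees that optional clauses appear in the documented order.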
FROM clause
The FROM clause specifies the table from which DQL retrieves the data. DQL does not
support joins as in SQL. You can only specify one table in the FROM clause.
GET clause
The GET clause specifies the columns (or expressions on columns) that you want
to retrieve from the table that you specify in the FROM clause.
DQL tables can have columns that refer to other tables. For example, the table
groups has a column memberusers which refers to rows from the user table. When
you retrieve such reference columns, you need to specify what columns you want
to retrieve from the referred table. For example, you cannot retrieve memberusers
from groups without specifying which columns of the user table you are interested
in. So, you can select memberusers.name or memberusers.sid but not just
memberusers.
The column names in the output table are decided by the expressions used in the
GET clause. While displaying the output, DQL may optionally replace the period
( . ) with the underscore ( _ ). For example, for GET path.name, the output column
name in the SQLite database becomes path_name.
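
The naming rule above can be expressed in one line of Python. This is a sketch under stated assumptions: output_column_name is a hypothetical function, it only covers plain dotted column references, and the guide says that DQL "may optionally" apply the replacement, so actual output names can differ (for example, when an AS alias is used).

```python
def output_column_name(expression):
    """Map a GET expression such as 'path.name' to 'path_name'.

    Hypothetical helper: handles plain dotted column references only.
    """
    return expression.replace(".", "_")


for expr in ("path.name", "memberusers.sid", "active_users.memberof.name"):
    print(expr, "->", output_column_name(expr))
```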
FORMAT clause
Data Insight tables can contain multi-valued columns. For example, path contains
a multivalued column permissions. When you specify the columns in the GET
clause, you also need to specify the manner in which you want their values to
appear in the output database table. Use the FORMAT clause to control the format
of the output in case of multi-valued columns. You can use two formatting options
as shown below:
FORMAT <column> AS CSV
The above syntax displays the output values for a multi-valued column as a
comma-separated list in a single column.
FORMAT <column> AS TABLE <tablename>
The above syntax displays the output values for a multi-valued column in a
separate table. Each row of this table contains a reference to its corresponding
row in the parent table.
The default format is TABLE. If you do not provide a FORMAT clause in your
query, DQL displays the contents of each multi-valued column in a separate
table, and the name of the multi-valued column is used as the default name of
that table. For example, if you retrieve path permissions without specifying a
FORMAT clause, DQL displays the permissions of a path in a separate table
called permissions.
Consider this example:
FROM    groups
GET     name, memberusers.sid, memberusers.name
FORMAT  memberusers AS CSV
Since memberusers is a multi-valued column, the FORMAT clause on memberusers
needs to be specified.
The above query creates an output table groups containing four columns:
groups_rowid, name, memberusers_sid, and memberusers_name. The column
groups_rowid is a default column present in all DQL output tables, containing an
identification number for each row. The columns memberusers_sid and
memberusers_name contain comma-separated lists of member user sids and
names.
An example output table is shown below:
groups
groups_rowid  name          memberusers_sid    memberusers_name
              Domain Users  S-1,S-2,S-10,S-11  John,Jim,Paul,Steve
              HR_Global     S-10,S-12          Paul,Jane
              HR_US         S-10               Paul
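
Because the CSV format packs several values into one cell, a consumer of the output has to split them again. The sketch below rebuilds the example table in an in-memory SQLite database as a stand-in for a real DQL output database and pairs each sid with its name. The groups_rowid values 1 to 3 and the helper name members_by_group are assumptions for illustration, and the split is only safe if no value contains a comma.

```python
import sqlite3

# In-memory stand-in for a DQL output database (row ids are assumed values).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE groups (groups_rowid INTEGER, name TEXT, "
    "memberusers_sid TEXT, memberusers_name TEXT)"
)
conn.executemany(
    "INSERT INTO groups VALUES (?, ?, ?, ?)",
    [
        (1, "Domain Users", "S-1,S-2,S-10,S-11", "John,Jim,Paul,Steve"),
        (2, "HR_Global", "S-10,S-12", "Paul,Jane"),
        (3, "HR_US", "S-10", "Paul"),
    ],
)


def members_by_group(connection):
    """Split the CSV cells and pair each member sid with its name."""
    result = {}
    rows = connection.execute(
        "SELECT name, memberusers_sid, memberusers_name "
        "FROM groups ORDER BY groups_rowid"
    )
    for name, sids, names in rows:
        result[name] = list(zip(sids.split(","), names.split(",")))
    return result


members = members_by_group(conn)
for group, pairs in members.items():
    print(group, pairs)
```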
Suppose that you change the query to:
FROM    groups
GET     name, memberusers.sid, memberusers.name
FORMAT  memberusers AS TABLE memberusers
In this case, the output database contains two tables: groups and memberusers.
The groups table has two columns: groups_rowid and name. The memberusers
table has three columns: groups_rowid, memberusers_sid, and memberusers_name.
The groups_rowid column in the memberusers table is a reference to the
groups_rowid column from the groups table.
Example output tables are shown below:
groups
groups_rowid  name
              Domain Users
              HR_Global
              HR_US
memberusers
groups_rowid  memberusers_sid  memberusers_name
              S-1              John
              S-2              Jim
              S-10             Paul
              S-11             Steve
              S-10             Paul
              S-12             Jane
              S-10             Paul
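When a multi-valued column is written to a separate table, the groups_rowid reference can be used to join the rows back to their parent rows. The following Python sketch rebuilds the two example tables in an in-memory SQLite database and joins them. The groups_rowid values 1 to 3 are assumed for illustration only, since the example tables above do not show them.

```python
import sqlite3

# Rebuild the example output tables in memory. The groups_rowid values
# (1, 2, 3) are assumptions; only the names and SIDs come from the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE groups (groups_rowid INTEGER, name TEXT);
CREATE TABLE memberusers (groups_rowid INTEGER,
                          memberusers_sid TEXT, memberusers_name TEXT);
INSERT INTO groups VALUES (1, 'Domain Users'), (2, 'HR_Global'), (3, 'HR_US');
INSERT INTO memberusers VALUES
  (1, 'S-1', 'John'), (1, 'S-2', 'Jim'), (1, 'S-10', 'Paul'), (1, 'S-11', 'Steve'),
  (2, 'S-10', 'Paul'), (2, 'S-12', 'Jane'),
  (3, 'S-10', 'Paul');
""")


def members_of(group_name):
    """Return the member names of a group by joining on groups_rowid."""
    rows = conn.execute(
        "SELECT m.memberusers_name FROM groups g "
        "JOIN memberusers m ON m.groups_rowid = g.groups_rowid "
        "WHERE g.name = ? ORDER BY m.rowid",
        (group_name,),
    )
    return [name for (name,) in rows]


print(members_of("HR_Global"))
```

The same join applies to any output database produced with FORMAT ... AS TABLE, with the parent table's rowid column name substituted.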
By default, DQL lists all memberusers of a group. Optionally, you can limit the
number of memberusers listed using the FORMAT clause. This is as shown in the
following query:
FROM   groups
GET
FORMAT memberusers AS CSV 4
This limits the output table to a maximum of four member user values for each
group. These four values are the first four members of the list.
Nested multi-valued columns

There may be situations where you need to specify nested multi-valued columns.
For example, the path table has a multi-valued column active_users, which is a
reference to user table. The table user, in turn, has a multi-valued column
memberof which indicates the groups that a user belongs to. If you want to get all
active users for a path and the groups that each active user belongs to, write your
query as shown below.
FROM   path
GET    name, active_users.name, active_users.memberof.name
FORMAT active_users AS CSV AND
       active_users.memberof AS CSV;
In this query's output table, the third column, active_users_memberof_name,
lists all the groups of all the path's active users. For example, suppose that
path /foo has active users Joe and Jane. Suppose that Joe belongs to groups HR
and ALL-Employees, while Jane belongs to groups Finance and ALL-Employees. The
output column for this query will then be HR, ALL-Employees, Finance,
ALL-Employees.
Notice that you have a flat list of all group names in this column. You have lost
the information about which groups each of the active users belongs to. You only
know that there is one active user who belongs to HR, two who belong to
ALL-Employees, and one who belongs to Finance.
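The flattening described above can be illustrated with a short Python sketch. It is not how DQL is implemented; it only models the /foo example to show why the per-user association is lost once the nested lists are concatenated.

```python
# Data from the /foo example: each active user and the groups it belongs to.
active_users = {
    "Joe": ["HR", "ALL-Employees"],
    "Jane": ["Finance", "ALL-Employees"],
}


def flatten_memberof(users):
    """Concatenate the group lists of all users into one flat list."""
    return [group for groups in users.values() for group in groups]


flat = flatten_memberof(active_users)
# The flat list no longer records which user contributed which group.
print(", ".join(flat))
```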
IF clause
The IF clause is an optional clause that you can use to specify a set of conditions
on the rows that you want to retrieve. It is similar to the WHERE clause of SQL.
DQL retrieves only those rows whose columns satisfy the condition(s) provided
under the IF clause.
Operators
DQL supports the following binary operators that you can use to specify a
condition:
Comparison operators: >, <, >=, <=, =, ==, !=, <>
Logical operators: AND, &&, OR, ||
Arithmetic operators: +, -, *, /, %
List containment operators: IN, NOT IN
Constants
DQL's IF clause supports constants in operations. Constants can be either
numeric or string. Some examples of supported column-related operations are
shown below:
IF size/1024 > 10
IF size = 10
IF name IN ("John", "Joe")
Note that string comparisons are case insensitive by default. To specify
case-sensitive or case-insensitive comparisons explicitly, use the CASE
SENSITIVE and CASE INSENSITIVE keywords.
IF name IN ("John", "Joe") CASE SENSITIVE
IF name = "John" CASE INSENSITIVE
Conditions on multi-valued columns

You can use the EACH or ANY prefixes to specify conditions on multi-valued
columns. EACH specifies that each value of the multi-valued column should
satisfy the condition, while ANY specifies that at least one value of the
multi-valued column should satisfy the condition.
Suppose that you want to retrieve only those paths on which the user John is
active. You can write a query as shown below.
FROM   path
GET    name, active_users.name
IF     ANY active_users.name = "John"
FORMAT active_users AS CSV;
Suppose that you want to retrieve paths on which either John or Joe is active.
You can write a query (query a) as shown below.
FROM   path
GET
IF     ANY active_users.name IN ("John","Joe")
FORMAT
The above query retrieves the paths on which either John is one of the active users
and/or Joe is one of the active users.
Suppose that you want to retrieve the paths that only have John and Joe as active
users. You can write a query (query b) as shown below.
FROM   path
GET
IF     EACH active_users.name IN ("John","Joe")
FORMAT
The above query retrieves paths where the only active users are John and/or Joe.
Note that in query (a), you get the paths on which John or Joe is one of the active
users whereas in query (b), you get the paths on which John and/or Joe are the
only active users.
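The difference between query (a) and query (b) can be sketched in Python with any() and all(). The paths and the user Mary below are hypothetical sample data, and the sketch only models the semantics described above. Note that Python's all() is true for an empty list; how DQL treats a path with no active users under EACH is not stated here.

```python
def any_in(values, allowed):
    """ANY: at least one value of the multi-valued column satisfies the condition."""
    return any(value in allowed for value in values)


def each_in(values, allowed):
    """EACH: every value of the multi-valued column satisfies the condition."""
    return all(value in allowed for value in values)


# Hypothetical paths and their active users.
paths = {
    "/a": ["John", "Mary"],
    "/b": ["John", "Joe"],
    "/c": ["Mary"],
}
allowed = {"John", "Joe"}

query_a = [path for path, users in paths.items() if any_in(users, allowed)]
query_b = [path for path, users in paths.items() if each_in(users, allowed)]
print(query_a, query_b)
```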
Conditions on nested multi-valued columns

Since nested multi-valued columns evaluate to a flat list, you can specify conditions
on them using the ANY and EACH constructs as above.
For example, suppose that you want to retrieve those paths containing at least
one active user belonging to group HR. You can write a query as shown below.
FROM   path
GET    name, active_users.memberof.name
IF     ANY active_users.memberof.name = "HR"
FORMAT
Suppose that you want to retrieve those paths containing active users who belong
only to groups HR and/or FINANCE. You can write a query as shown below.
FROM   path
GET    name, active_users.memberof.name
IF     EACH active_users.memberof.name IN ("HR", "FINANCE")
FORMAT
Note that DQL by default uses the ANY construct if you do not specify an
ANY/EACH construct.
USING clause
Values of certain columns like owner are computed at run-time based on some
criteria. For example, to compute an owner of a file, you need to specify what
methods (like read_count, rw_count, parent_owner etc.) you want to use to
determine the owner. When you determine active users of a path, you need to
specify the time range you want to consider for the activity.
You can use the USING clause to specify such functions that can be applied to
obtain a column value.
The details of the DQL USING functions are as shown below.
Calculating the owner

calc_owner(start_time TEXT, end_time TEXT, date_format TEXT,
ordered_list_of_owner_methods TEXT)
Example usage in query:
FROM   path
GET    name, owner.user.name, owner.method, owner.read_count
USING  owner AS calc_owner("2012-01-01", "2012-06-01", "YYYY-MM-DD",
       "rw_count, read_count, last_accessor");
If you don't specify a USING function for owner, DQL uses a default time range
of the last 6 months and a data owner ordering of rw_count, write_count,
read_count, last_modifier, last_accessor, creator, parent_owner.
Calculating the active_users

calc_active_users(start_time TEXT, end_time TEXT, date_format TEXT)
FROM   path
GET
USING  active_users AS
       calc_active_users("2012-01-01", "2012-06-01", "YYYY-MM-DD")
FORMAT
If you don't specify a USING function for active_users, DQL uses a default time
range of the last 6 months.
Calculating the active_users_count

get_active_users_count(start_time TEXT, end_time TEXT, date_format TEXT)
FROM   path
GET    name, active_users_count
USING  active_users_count AS
       get_active_users_count("2012-01-01", "2012-06-01", "YYYY-MM-DD");
If you don't specify a USING function for active_users_count, DQL uses a default
time range of the last 6 months.
Calculating the inactive_users

calc_inactive_users(start_time TEXT, end_time TEXT, date_format TEXT)
FROM   path
GET    name, inactive_users.name
USING  inactive_users AS
       calc_inactive_users("2012-01-01", "2012-06-01", "YYYY-MM-DD")
FORMAT inactive_users AS CSV;
If you don't specify a USING function for inactive_users, DQL uses a default time
range of the last 6 months for calculating inactivity.
Calculating the inactive_users_count

get_inactive_users_count(start_time TEXT, end_time TEXT, date_format TEXT)
FROM   path
GET    name, inactive_users_count
USING  inactive_users_count AS
       get_inactive_users_count("2012-01-01", "2012-06-01", "YYYY-MM-DD");
If you don't specify a USING function for inactive_users_count, DQL uses a
default time range of the last 6 months for calculating inactivity.
Calculating the activity_count

get_activity_count(start_time TEXT, end_time TEXT, date_format TEXT)
FROM   path
GET    name, activity_count
USING  activity_count AS
       get_activity_count("2012-01-01 10:00", "2012-01-01 15:00",
       "YYYY-MM-DD HH:mm");
If you don't specify a USING function for activity_count, DQL uses a default time
range of the last 6 months for calculating activity.
HAVING clause
The HAVING clause is similar to the SQL HAVING clause and allows specification
of conditions on aggregate functions. The syntax of conditions that can be specified
in the HAVING clause is the same as that of the DQL IF clause.
Suppose that you want to retrieve the sum of the sizes of all shares for each filer.
You can write a query for this as shown below:
FROM    msu
GET     filer.name, sum(size)
GROUPBY filer.name;
Now suppose that you want to select only those filers whose sum of share sizes
is greater than 1 GB (1,073,741,824 bytes). Then you need to modify the previous
query as:
FROM    msu
GET     filer.name, sum(size)
GROUPBY filer.name
HAVING  sum(size) > 1073741824;
GROUPBY clause
The GROUPBY clause is similar to the SQL GROUP BY clause. It enables you to
aggregate the output rows into groups. Suppose that you want to retrieve the sum
of the sizes of all shares for each filer. You can write a query for this as shown
below.
FROM    msu
GET     filer.name, sum(size)
GROUPBY filer.name;
DQL supports the following aggregation functions:
sum
count
max
min
SORTBY clause
The SORTBY clause is similar to the SQL ORDER BY clause. It enables you to sort
the rows of the output table based on their column values.
FROM   msu
GET    name, size
SORTBY size DESC;
If no sort order is specified, DQL defaults to ASC.
LIMIT clause
The LIMIT clause is similar to the SQL LIMIT clause and is used to limit the number
of output rows.
LIMIT count            [This will retrieve the first "count" rows]
LIMIT offset, count    [This will retrieve "count" rows starting from "offset"]
offset values start from 1.
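As a rough illustration of the two LIMIT forms, the Python sketch below models them as list slicing. It assumes that an offset of 1 refers to the first output row, which is one reading of "offset values start from 1"; the row values are placeholders.

```python
def limit(rows, count, offset=1):
    """Return `count` rows starting at the 1-based position `offset`."""
    if offset < 1:
        raise ValueError("offset values start from 1")
    start = offset - 1  # convert the 1-based offset to a 0-based index
    return rows[start:start + count]


rows = ["r1", "r2", "r3", "r4", "r5"]
print(limit(rows, 2))            # models LIMIT 2
print(limit(rows, 2, offset=3))  # models LIMIT 3, 2
```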
DQL functions
DQL supports the following built-in functions:
upper(X)
Converts string X to uppercase.
lower(X)
Converts string X to lowercase.
strlen(X)
Returns length of string X.
length(X)
Returns number of items in list X.
substr(X, Y)
Returns true if Y is a substring of X. The comparison is case-sensitive.
substri(X, Y)
Returns true if Y is a substring of X. The comparison is case insensitive.
match(X, P)
Returns true if X matches the regular expression pattern P. Regular expression
matching is case-sensitive. Pattern P can be specified as patterns matching a
single character or patterns matching multiple characters. You can refer to the
following URLs for information on pattern matching:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_01
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_02
matchi(X, P)
Returns true if X matches the regular expression pattern P. Regular expression
matching is case insensitive. Pattern P can be specified as patterns matching a
single character or patterns matching multiple characters. You can refer to the
following URLs for information on pattern matching:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_01
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_02
datetime(D, F)
Returns time in epoch for the string date D. The format in which date D is
specified is indicated by the format string F. The options for F are:
YYYY - 4 digit year
MM - month of year (01-12)
DD - date of month
HH - hour (00-24)
mm - minutes (00-59)
ss - seconds (00-59)
Z - timezone
Example: datetime("2012-01-10 -0800", "YYYY-MM-DD Z")
formatdate(T, F)
Converts time T in epoch to a string whose format is specified with string F.
The options for F are the same as those used by datetime(D, F).
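To make the format options concrete, the following Python sketch emulates datetime(D, F) and formatdate(T, F) outside of DQL. The helper names and the mapping to Python strftime codes are assumptions for illustration. The sketch also assumes that epoch values are in seconds and that a date without a Z component is interpreted as UTC, neither of which is stated above. Python's %H accepts hours 00 to 23 only.

```python
import re
from datetime import datetime, timezone

# Assumed mapping from the DQL format options to Python strftime codes.
_TOKENS = {"YYYY": "%Y", "MM": "%m", "DD": "%d",
           "HH": "%H", "mm": "%M", "ss": "%S", "Z": "%z"}
_TOKEN_RE = re.compile("YYYY|MM|DD|HH|mm|ss|Z")


def to_strftime(dql_format):
    """Translate a DQL format string into a strftime pattern in one pass."""
    return _TOKEN_RE.sub(lambda match: _TOKENS[match.group(0)], dql_format)


def dql_datetime(date_string, dql_format):
    """Parse date_string and return seconds since the UNIX epoch."""
    parsed = datetime.strptime(date_string, to_strftime(dql_format))
    if parsed.tzinfo is None:
        # Assumption: treat dates without a timezone as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def dql_formatdate(epoch_seconds, dql_format):
    """Format an epoch value (assumed seconds) as a UTC date string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime(to_strftime(dql_format))


epoch = dql_datetime("2012-01-10 -0800", "YYYY-MM-DD Z")
print(epoch, dql_formatdate(epoch, "YYYY/MM/DD HH:mm"))
```

A single regular-expression pass is used for the translation so that the minutes option (mm) cannot collide with the month code that replaces MM.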
Example DQL queries
Get the name, size, active data size, percentage of data size that is active,
openness, and number of active users for each share.
FROM msu
GET  name, size, active_data_size,
     (active_data_size*100/size) AS active_data_percent,
     isopen, active_user_count;
Get the activity for all paths of share, share1, on March 4, 2012 between 9:00
A.M. and 5:00 P.M.
FROM activity
GET  path.name, user.name, operation,
     formatdate(timestamp, "YYYY/MM/DD HH:mm")
IF   path.msu.name = "share1" AND
     timestamp >= datetime("2012/03/04 09:00", "YYYY/MM/DD HH:mm")
     AND timestamp <= datetime("2012/03/04 17:00",
     "YYYY/MM/DD HH:mm");
Since the timestamp column of activity is epoch, convert it to a readable format
using formatdate().
Get a list of all sensitive files from all shares of filer, filer1, sorted by size.
FROM   path
GET    name, issensitive, size
IF     issensitive = 1 AND type = "FILE" AND device.name = "filer1"
SORTBY size DESC;
Get a list of all open paths and the reason why they are marked as open.
FROM   path
GET    name, msu.name, isopen, open_reasons
IF     isopen = 1
FORMAT open_reasons AS CSV;
Get a list of all open paths and the reason why they are marked as open. Also,
list the permissions on each open path.
FROM   path
GET    name, msu.name, isopen, open_reasons,
       permissions.user_trustee.name, permissions.group_trustee.name,
       permissions.readable_permission, permissions.isinherited,
       permissions.inheriting_path.name
IF     isopen = 1
FORMAT permissions AS TABLE permissions
       AND open_reasons AS CSV;
Get a list of all directories and their owners.
FROM   path
GET    name, msu.name, owner.user.name, owner.method,
       owner.read_count, owner.write_count
IF     type = "DIR"
USING  owner AS calc_owner("2012-01-01", "2012-06-01",
       "YYYY-MM-DD","rw_count, last_modifier");
Get a list of all open paths and their inactive users.
FROM   path
GET    name, msu.name, isopen, inactive_users.name
IF     isopen = 1
USING  inactive_users AS calc_inactive_users("2012-01-01",
       "2012-06-01","YYYY-MM-DD")
Get a list of all users, their e-mail and department (custom attributes) and the
groups that they belong to.
FROM   user
GET    name, sid, login, domain, "E-mail", department,
       memberof.sid, memberof.name
FORMAT memberof AS TABLE memberof_groups;
For each share, get the count of paths that have permissions set on Everyone.
FROM    permissions
GET     msu.name, count(path.id) AS risk_path_count
IF      object_type = "DIR" AND group_trustee.name = "Everyone"
        AND isinherited = 0
GROUPBY msu.name
SORTBY  risk_path_count DESC;
The condition isinherited = 0 ensures that the query returns only the paths that
have permissions explicitly defined on Everyone, and not all the paths that
simply inherit those permissions.
Chapter
Web API Specification

Web API specification for generic Collector service

The web API for the Data Insight generic collector allows web clients to push
events for the generic device filers configured in the Data Insight deployment. It
also provides a method to add shares for the configured filers.
The web client communicates with the Data Insight Collector node using HTTPS
requests. The HTTPS communication is based on one-way SSL authentication.
The HTTP server runs with its unique self-signed SSL certificate. The SSL
certificate is created on the server when DataInsightGenericCollector service is
configured on it. The authentication is complete when the Data Insight Collector
node verifies the identity of the web client.
Data Insight Collector node uses the following mechanism to communicate with
the web client:
1. The Data Insight server identifies the client using a login API request.
2. On successful login, the Data Insight server returns an authentication token
as the response. The same token is inserted into an HTTP cookie called
MATRIX_AUTH, which is valid for 30 minutes. If the login attempt is
unsuccessful, an HTTP response code 401 is returned.
3. You must include the authentication token in each subsequent request to the
Data Insight server, either in an HTTP request header called MATRIX_AUTH,
or in a cookie with the same name, or as an HTTP request input parameter
with the same name.
4. Each token has an inactivity timeout interval of 30 minutes. The token expires
if the client does not send a request for 30 minutes. If the Data Insight
server restarts, the client must obtain a new authentication token by using the
login API. Data Insight uses the standard HTTP status code 401 to convey
that login is required. Data Insight also returns the HTTP status code 401
(Unauthorized) if the client does not have the correct privileges.
5. The user principal against which login is performed can be any valid Data
Insight user with the Server Administrator role.
All URLs referenced in the documentation have the following base:
https://<hostname>:<port>/api
where <port> is the port number for the DataInsightGenericCollector service. The
default port is 8585, and the port number is configurable through the Data
Insight Management Console.
Use the following request calls to push events to the Data Insight Collector node
and to add the shares that you want Data Insight to monitor:
1. Login
POST /api?function=LOGIN
Request parameters
Name      Description                           Comment
username  Data Insight user name
domain    The domain to which the user belongs
password  The user's password
format    Format of the response output         Optional format=json
Request body
Do not supply a request body for this method.
Response
Login Success
If format=json is specified, the authentication token is written to the HTTP
response output in JSON format.
HTTP/1.1 200 OK
Content-Type: application/json
Status: 200 OK
{"auth_token":"A2360DD2D9BB7284EF8BEB40E8DBA63F"}
If no format is specified, the authentication token is written to the HTTP
response output.
If login fails, HTTP status code 401 (Unauthorized) is returned.
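A client-side login call might be prepared as in the following Python sketch, which uses only the standard library. The function names are hypothetical. Passing the parameters in the query string is an assumption based on the instruction not to supply a request body. The host name and credentials in the usage lines are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(host, username, domain, password, port=8585):
    """Build (but do not send) a POST request for function=LOGIN."""
    # Assumption: parameters travel in the query string, since the method
    # takes no request body.
    query = urlencode({
        "function": "LOGIN",
        "username": username,
        "domain": domain,
        "password": password,
        "format": "json",
    })
    return Request("https://%s:%d/api?%s" % (host, port, query), method="POST")


def parse_auth_token(body):
    """Extract the token from a format=json login response body."""
    return json.loads(body)["auth_token"]


request = build_login_request("collector.example.com", "admin", "CORP", "secret")
print(request.get_method(), request.full_url.split("?")[0])
print(parse_auth_token('{"auth_token":"A2360DD2D9BB7284EF8BEB40E8DBA63F"}'))
```

To send the request, pass it to urllib.request.urlopen together with an ssl.SSLContext that trusts the Collector's self-signed certificate. The returned token can then be supplied in a MATRIX_AUTH header on later requests.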
2. Upload Events
POST /api?function=COLLECTOR&cmd=upload_events_sqlite&event_type=<type>
Request parameters
Name         Description                                    Comment
MATRIX_AUTH  Authentication token
event_type   The type of events in the file that is         Optional (cifs|nfs)
             uploaded on the Collector.
Request body
The request can be an HTTP multi-part request, or the request body can contain
the contents of the file.
Response
If the file upload is successful, returns a response with the following structure:
HTTP/1.1 200 OK
Status: 200 OK
{"status_code":<code>,"status_msg":"<msg>"}
Status code 0 indicates success.
On failure, returns status code 500 (Internal Server Error) in case of an
unexpected error.
Details of the file to be uploaded
The events file must be a SQLite DB file that has a single table, named events.
Table schema
Column name  Type     Constraints  Description
filer        TEXT     NOT NULL     Filer's address as added to the Data Insight configuration.
opcode       INTEGER  NOT NULL     An integer describing the event operation (for example, READ=3, WRITE=4). Refer to the Protobuf format for a complete set of values.
username     TEXT                  Username of the user for CIFS (optional). UID of the user in case of an NFS event.
domainname   TEXT                  Domain of the user for CIFS (optional). Blank for NFS.
sid          TEXT                  SID of the user for CIFS. Blank in case of NFS.
pathname     TEXT     NOT NULL     Path where the event occurred. Refer to the note below for the format.
renamepath   TEXT                  Applicable in case of a rename event.
type         TEXT                  Type of path (FOLDER=1, FILE=2).
ipaddr       TEXT                  IP address from where the path was accessed (optional).
timestamp    INTEGER  NOT NULL     Timestamp of the event in seconds as UNIX epoch.
CREATE TABLE events (filer TEXT NOT NULL, opcode INTEGER NOT NULL, username TEXT,
domainname TEXT, sid TEXT,
pathname TEXT NOT NULL, renamepath TEXT, type TEXT, ipaddr TEXT,
timestamp INTEGER NOT NULL);
Note: For CIFS events, the SID value is mandatory; user name and domain
name are optional.
For NFS events, SID should be blank, user name should be the UID, and domain
name should be blank.
For CIFS events, pathname should be the UNC path.
For NFS events, the pathname should be the absolute path of the file or the
folder.
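An events file of this shape could be produced with Python's built-in sqlite3 module, as sketched below. The function name is hypothetical and the sample row is illustrative: the file name report.txt is made up, and storing the path type as the text value "2" is an assumption about how the TEXT type column is populated.

```python
import sqlite3

# Schema as given in the table description above.
EVENTS_SCHEMA = """
CREATE TABLE events (filer TEXT NOT NULL, opcode INTEGER NOT NULL,
    username TEXT, domainname TEXT, sid TEXT,
    pathname TEXT NOT NULL, renamepath TEXT, type TEXT, ipaddr TEXT,
    timestamp INTEGER NOT NULL);
"""


def write_events(conn, events):
    """Create the events table and insert one row per 10-field tuple."""
    conn.execute(EVENTS_SCHEMA)
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", events)
    conn.commit()


# Connect to a file name such as "events.db" instead of ":memory:" to
# produce a file that can be uploaded.
conn = sqlite3.connect(":memory:")
write_events(conn, [
    # CIFS READ (opcode 3): SID is set, user name and domain are omitted.
    ("10.209.89.3", 3, None, None,
     "S-1-5-21-617441397-4198358099-2716562547-104771",
     r"\\NAMEMCCIFS1\TestShare\DI3.0RU1\Data1\report.txt",  # hypothetical file
     None, "2", "172.31.163.29", 1340003837),
])
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])
```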
3. Push events in JSON or Google Protocol Buffers format
POST /api?function=COLLECTOR&cmd=push_events&input_format=<format>
Request parameters
Name          Description                                   Comment
MATRIX_AUTH   Authentication token
input_format  The format in which the events are pushed     json|proto
              to the Collector.
Request body
The request body must contain the events list in the specified format, Google
Protocol Buffers or JSON.
Response
Returns a response with the following structure:
HTTP/1.1 200 OK
Status: 200 OK
{"status_code":<code>,"status_msg":"<msg>"}
Status code 0 indicates success.
On failure, returns status code 400 (BAD_REQUEST) for an incorrect input format
parameter.
Returns the status code 500 (Internal Server Error) in case of an unexpected
error.
Google Protocol Buffers format for pushing events to the Collector
message AuditEventsListMessage {
  optional int64 device_id = 1;
  optional string device_name = 2;
  repeated CifsEventMessage cifs_events = 3;
  repeated NfsEventMessage nfs_events = 4;
}
message CifsEventMessage {
  required AccessType opcode = 1;
  required string unc_path = 2;
  optional string rename_path = 3;
  required PathType path_type = 4;
  optional string sid = 5;
  optional string username = 6;
  optional string domain = 7;
  required uint64 timestamp_msec = 8;
  optional string ip_address = 9;
}
message NfsEventMessage {
  required AccessType opcode = 1;
  required string path = 2;
  optional string rename_path = 3;
  required PathType path_type = 4;
  required int64 uid = 5;
  optional int64 gid = 6;
  optional string domain = 7;
  required uint64 timestamp_msec = 8;
  optional string ip_address = 9;
}
enum PathType {
  UNKNOWN_PATHTYPE = -1;
  FOLDER = 1;
  FILE = 2;
}
enum AccessType {
  CREATE = 1;
  DELETE = 2;
  READ = 3;
  WRITE = 4;
  RENAME = 5;
  MKDIR = 8;
  RMDIR = 9;
  RENAMEDIR = 10;
  SECURITY = 18;
  SYMLINK = 19;
  LINK = 20;
  READLINK = 21;
  OPEN = 200000;
}
Note: device_name is the name of the filer as added in the Data Insight
configuration.
CIFS and NFS events for a filer can be pushed by a single
AuditEventsListMessage.
For CIFS event, SID is mandatory; user name, and domain name are optional.
For NFS event, SID should be blank, UID is mandatory, and domain should
be blank.
LINK, SYMLINK, and READLINK are specific to NFS events only.
The AccessType parameter for events like permission change or ACL change
is SECURITY.
JSON format for pushing events to the Collector
{
  "deviceId": <Number>,
  "deviceName": <String>,
  "cifsEvents": [
    <CIFS Event>
  ],
  "nfsEvents": [
    <NFS Event>
  ]
}
<CIFS Event>
{
  "opcode": <String>,
  "uncPath": <String>,
  "renamePath": <String>,
  "pathType": <String>,
  "sid": <String>,
  "username": <String>,
  "domain": <String>,
  "timestampMsec": <Number>,
  "ipAddress": <String>
}
<NFS Event>
{
  "opcode": <String>,
  "path": <String>,
  "renamePath": <String>,
  "pathType": <String>,
  "uid": <Number>,
  "gid": <Number>,
  "domain": <String>,
  "timestampMsec": <Number>,
  "ipAddress": <String>
}
Note: opcode and pathType fields can take only a specific set of values. Refer
to the protobuf enums for a description of values for each field; enum
AccessType for the field opcode and enum PathType for the field pathType.
Example
{
"deviceId": 0,
"deviceName": "10.209.89.3",
"cifsEvents": [
{

"opcode": "RENAME",
"uncPath": "\\\\NAMEMCCIFS1\\TestShare\\DI3.0RU1\\Data1",
"renamePath": "\\\\NAMEMCCIFS1\\TestShare\\DI3.0RU1\\Data2 ",
"pathType": "FOLDER",
"sid": "S-1-5-21-617441397-4198358099-2716562547-104771",
"timestampMsec": 1340003837,
"ipAddress": "172.31.163.29"
},
{
"opcode": "CREATE",
"uncPath": "\\\\NAMEMCCIFS1\\TestShare\\DI3.0RU1\\New Folder",
"sid": "S-1-5-21-617441397-4198358099-2716562547-104771",
"timestampMsec": 1340003847,
"ipAddress": "172.31.163.29"
}
],
"nfsEvents": [
{
"opcode": "MKDIR",
"path": "\/openldaphome\/DIRU1",
"uid": 0,
"gid": 0,
"domain": "0",
"timestampMsec": 1339680545
}
]
}
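A request body in this JSON format could be assembled as in the Python sketch below. The function name and the client-side opcode check are assumptions added for illustration; the opcode names come from the AccessType enum, and the sample NFS event mirrors the example above.

```python
import json

# Opcode names from the AccessType enum; the last set is NFS-only.
ACCESS_TYPES = {"CREATE", "DELETE", "READ", "WRITE", "RENAME", "MKDIR", "RMDIR",
                "RENAMEDIR", "SECURITY", "SYMLINK", "LINK", "READLINK", "OPEN"}
NFS_ONLY = {"SYMLINK", "LINK", "READLINK"}


def build_push_payload(device_name, cifs_events=(), nfs_events=(), device_id=0):
    """Return a JSON body for cmd=push_events with input_format=json."""
    for event in cifs_events:
        if event["opcode"] not in ACCESS_TYPES - NFS_ONLY:
            raise ValueError("invalid CIFS opcode: %s" % event["opcode"])
    for event in nfs_events:
        if event["opcode"] not in ACCESS_TYPES:
            raise ValueError("invalid NFS opcode: %s" % event["opcode"])
    return json.dumps({
        "deviceId": device_id,
        "deviceName": device_name,
        "cifsEvents": list(cifs_events),
        "nfsEvents": list(nfs_events),
    })


payload = build_push_payload("10.209.89.3", nfs_events=[{
    "opcode": "MKDIR",
    "path": "/openldaphome/DIRU1",
    "uid": 0,
    "gid": 0,
    "domain": "0",
    "timestampMsec": 1339680545,
}])
print(payload)
```

Rejecting unknown opcodes before sending is a design choice of this sketch; the Collector's own validation behavior is not described here beyond the status codes listed earlier.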
4. Add shares
POST /api?function=COLLECTOR&cmd=add_shares&format=<format>
Request parameters
Name         Description                    Comment
MATRIX_AUTH  Authentication token
format       Format of the response output  proto|json
Request body
Supply a JSON or Google Protocol Buffers formatted list of shares as input.
Response
On success, HTTP status code 200 is returned.
On failure to add shares, HTTP status code 500 (Internal Server Error) is
returned.
ProtoBuf format for adding shares
message SharesListMessage
{
optional int64 device_id = 1;
optional string device_name = 2;
repeated ShareMessage shares = 3;
}
message ShareMessage
{
enum ShareType
{
CIFS = 0;
NFS = 1;
}
optional string shareName = 1;
optional string sharePath = 2;
optional ShareType shareType = 3 [default = CIFS];
}
JSON format for adding shares
Shares list
{
  "deviceId": {number},
  "deviceName": {string},
  "shares": [
  ]
}
Each entry in the shares list has the following format:
{
  "shareName": {string},
  "sharePath": {string},
  "shareType": {string}
}
Note: The shareType parameter accepts only a specific set of values. For the
possible values, refer to the enum ShareType in the Protobuf definition.
Example
{
"deviceId": 0,
"deviceName": "10.209.111.193",
"shares": [
{
"shareName": "/openldaphome",
"sharePath": "/openldaphome",
"shareType": "NFS"
},
{
"shareName": "/nfstest",
"sharePath": "/nfstest",
"shareType": "NFS"
}
]
}
Note: Data Insight scans the shares that are added only when the user enables
scanning and provides the Scanner credentials for the filer.
Chapter
Creating custom scripts for remediation actions
About custom scripts

You can use custom scripts to extend Data Insight functionality. You can use the
custom scripts to perform the following actions:
To create a remediation ticket.
To apply remediation actions based on Data Insight recommendations.
To define actions to manage data.
Data is supplied to the scripts via command line arguments. Arguments vary
based on what the script is used for. The scripts can be created in the .exe, .bat,
.pl, or .vbs formats.
Data Insight handles custom scripts differently depending on the type of operation.
The following list shows how Data Insight handles the various types of scripts:
Custom scripts to create a remediation request.

Data Insight invokes the script by passing in two arguments:
custom_script.pl file_name <path_to_file_with_recommendation>.
For example,
ticketing.pl file_name
C:\DataInsight\data\workflow\tmp\PR_ticketing_1.txt
The second argument is the full path to a text file containing the permission
recommendations. Each line in the text file contains one action and the
variables required to perform that action. Lines are separated by a newline
character. The script should read each line of the input file and open one or
more remediation tickets as needed. If the script exits with a non-zero exit
code, the action is considered to have failed. Each line in the file has the
following format:
OP:<OPCODE> PARAM:VALUE; PARAM:VALUE; ...
For example,
OP:REMOVE_ACE USER:foouser@domain.com;
PATH:\\fileserver1\share1\path;
Refer to the next section for possible values for opcodes and their parameters.
Custom scripts to apply permission recommendations.

You can specify custom scripts to directly commit changes to Active Directory
and CIFS file systems. You need to specify one script to make changes to Active
Directory and one script to make changes to CIFS permissions. The
recommendation is passed to the custom script as command line arguments
in the following format: script.pl OP:<OPCODE> PARAM:VALUE
PARAM:VALUE... The exact PARAM and VALUE pairs depend on the opcode being
passed. If the script exits with a non-zero exit code, Data Insight considers
the operation to have failed. For this release, Data Insight recommendations
consist only of removing user or group ACEs from paths, and removing users or
members from AD groups. More operations will be supported in future releases.
For example,
AD.pl OP:DEL_GROUP_MEMBER AD_USER:user@domain
TARGET_GROUP:group@domain;.
Data Insight will supply the following opcode and arguments for Active
Directory remediation:
OP:DEL_GROUP_MEMBER AD_GROUP:<group@domain>|AD_USER:<user@domain>
TARGET_GROUP:<target_group@domain>
Data Insight will supply the following opcode and arguments for CIFS
remediation:
OP:REMOVE_ACE GROUP:<group@domain>|USER:<user@domain>
PATH:<unc_path>
Custom scripts to define specific tasks to manage data.

Data Insight invokes the script directly passing the operation and variables
as a part of command line arguments. Path is the mandatory argument passed
to the script. Other parameters passed to the script depend on how the Custom
Action has been configured in the Management Console. Format of command
line arguments passed to the custom script is:
script.pl path:<path> prop:val prop:val ....
For example,
archive_files.pl path:\\filer\share\path.txt size:25KB

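Data Insight supports the following properties that can be passed to the custom
scripts:
Properties        Format
size              NNN KB|MB|GB. E.g. 34 KB
size_on_disk      NNN KB|MB|GB. E.g. 34 KB
created_by        user@domain. SID if user name cannot be resolved
created_on        milliseconds since Jan 1st 1970
last_modified_by  user@domain. SID if user name cannot be resolved
last_modified_on  milliseconds since Jan 1st 1970
last_accessed_by  user@domain. SID if user name cannot be resolved
last_accessed_on  milliseconds since Jan 1st 1970
data_owner        user@domain. SID if user name cannot be resolved
custodian         user@domain. SID if user name cannot be resolved. Multiple custodians are comma-separated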
For detailed information about how to use custom scripts for data and permission
remediation, see the Symantec Data Insight Administrator's Guide.
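The line format used in the recommendation file (OP:<OPCODE> PARAM:VALUE; PARAM:VALUE; ...) can be parsed as in the following Python sketch. Python is used here only to illustrate the parsing logic; an actual custom script must be delivered in one of the supported formats. The function name is hypothetical, and the sketch assumes that parameters are terminated by semicolons, as in the ticketing example.

```python
def parse_recommendation(line):
    """Split 'OP:<OPCODE> PARAM:VALUE; PARAM:VALUE;' into (opcode, params)."""
    op_token, _, rest = line.strip().partition(" ")
    if not op_token.startswith("OP:"):
        raise ValueError("line does not start with OP:<OPCODE>")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if item:
            # Split on the first colon only, so values may contain colons.
            name, _, value = item.partition(":")
            params[name] = value
    return op_token[len("OP:"):], params


opcode, params = parse_recommendation(
    r"OP:REMOVE_ACE USER:foouser@domain.com; PATH:\\fileserver1\share1\path;")
print(opcode, params["USER"], params["PATH"])
```

A ticketing script would call such a function once per line of the input file and exit with a non-zero code if any line cannot be processed.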
Chapter
Data Inventory Report schema
Data Inventory report schema

The Data Inventory Report is used to extract information about paths from the
Data Insight index. The output of this report is a SQLite database, which can be
used for post-processing as needed. When configuring a report of this type, you
can choose to have the output database copied to an external location where you
plan to post-process the output.
file_inventory table
In this table, there is one row for each matching file that is found in the specified
index dbs.
CREATE TABLE file_inventory (
  xid           INTEGER,
  sid           TEXT,
  user_id       INTEGER,
  owner_account TEXT,
  displayname   TEXT,
  owner_method  TEXT,
  bu_name       TEXT,
  bu_owner      TEXT,
  filer         TEXT,
  share         TEXT,
  dfs_server    TEXT,
  dfs_share     TEXT,
  dfs_path      TEXT,
  fid           INTEGER,
  path          TEXT,
  msu_type      INTEGER,
  interval      INTEGER,
  sensitive     INTEGER,
  msu_id        INTEGER,
  read_count    INTEGER,
  write_count   INTEGER,
  file_size     INTEGER,
  atime         INTEGER,
  ctime         INTEGER,
  mtime         INTEGER,
  fs_sid        TEXT);
The xid column can be ignored, and should always be 1.
The sid is typically the Windows SID of the calculated owner of the file.
The owner_method column indicates the owner method that Data Insight used
to calculate the owner.
The user_id is the foreign key into the fileuser table of the current version of
the users.db stored in the DataInsight\Data\users folder. This is used for debug
purposes only.
The owner_account, displayname, bu_name and bu_owner columns are other

columns from the fileuser table.
The filer, share, path, dfs_server, dfs_share and dfs_path columns combine to
give the path to the file. The fid column is the foreign key into the fentry table
of the latest version of the index.db for this share. fid is used for debug purposes
only.
The msu_type is an integer value describing the type of share. There are four
possible values:
1 - CIFS
2 - SharePoint
3 - NFS
8 - DFS

The interval column is the foreign key into the intervals table below, based on
the last access time of the file.
The msu_id is the foreign key into the msu table of the latest version of the
config.db stored in the DataInsight\Data\conf folder.
The read_count and write_count columns are the aggregate numbers of audit events
of each type over the total time period specified for this run of the report.
The file_size column is the logical file size from the file system. The atime,
ctime, and mtime columns are metadata for the file, also pulled from the file
system.
The fs_sid is the SID of the file system owner value from the file system
metadata.
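As an example of post-processing, the Python sketch below totals file counts and sizes per owner from the file_inventory table. For brevity it builds an in-memory table with only three of the columns and fills it with hypothetical rows. Against a real report output you would connect to the output database file instead.

```python
import sqlite3


def bytes_per_owner(conn):
    """Return (owner_account, file count, total bytes), largest total first."""
    return conn.execute(
        "SELECT owner_account, COUNT(*), SUM(file_size) "
        "FROM file_inventory "
        "GROUP BY owner_account "
        "ORDER BY SUM(file_size) DESC"
    ).fetchall()


# Stand-in for a report output database: a subset of the file_inventory
# columns and made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_inventory (owner_account TEXT, path TEXT, file_size INTEGER)")
conn.executemany(
    "INSERT INTO file_inventory VALUES (?, ?, ?)",
    [("alice@corp", "/a", 100), ("bob@corp", "/b", 50), ("alice@corp", "/c", 25)],
)
totals = bytes_per_owner(conn)
print(totals)
```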
lob table
This table consists of a list of distinct Lines of Business (LOBs). Other tables
reference this table through a foreign key.
CREATE TABLE lob (
  lob_id   INTEGER PRIMARY KEY,
  lob_name TEXT);
user_lob table
This table gives the mapping from users to the associated LOBs.
CREATE TABLE user_lob (
  user_id INTEGER,
  lob_id  INTEGER);
user_totals table
This table gives the total numbers of files, sensitive files etc. for each user. In the
final output, the msu_id column is displayed as empty. The user_id is the foreign
key into the fileuser table of the current version of the users.db stored in the
DataInsight\Data\users folder.
CREATE TABLE user_totals (
    user_id         INTEGER,
    msu_id          INTEGER,
    total_files     INTEGER,
    total_bytes     INTEGER,
    sensitive_files INTEGER,
    sensitive_bytes INTEGER);
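
One possible use of these totals is to compute, for each user, the fraction of
files that are sensitive. The sketch below assumes the output database is
readable with Python's sqlite3 module. The function name sensitive_share and
the sample rows are hypothetical.

```python
import sqlite3


def sensitive_share(conn):
    """Return {user_id: sensitive_files / total_files} from user_totals.

    Rows are summed per user; users with no files get a share of 0.0.
    """
    rows = conn.execute(
        "SELECT user_id, SUM(total_files), SUM(sensitive_files) "
        "FROM user_totals GROUP BY user_id"
    )
    return {uid: (sens / total if total else 0.0) for uid, total, sens in rows}


# Demonstration against an in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_totals (
    user_id INTEGER, msu_id INTEGER,
    total_files INTEGER, total_bytes INTEGER,
    sensitive_files INTEGER, sensitive_bytes INTEGER);
INSERT INTO user_totals VALUES (10, NULL, 200, 1000, 50, 300);
INSERT INTO user_totals VALUES (11, NULL, 0, 0, 0, 0);
""")
```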
user_interval_totals table
This table breaks out the information from the user_totals table over each interval
specified from the input database. The interval_id is a foreign key to the intervals
table.
CREATE TABLE user_interval_totals (
    user_id     INTEGER,
    msu_id      INTEGER,
    interval_id INTEGER,
    total_files INTEGER,
    total_bytes INTEGER,
lob_totals table
Based on the mapping specified in the user_lob table, this table gives the
totals for each LOB. In the final output, the msu_id column will be empty.
CREATE TABLE lob_totals (
    lob_id      INTEGER,
    msu_id      INTEGER,
    total_files INTEGER,
    total_bytes INTEGER,
lob_interval_totals table
This table breaks out the information from the lob_totals table over each interval
specified from the input database. The interval_id is a foreign key into the intervals
table.
CREATE TABLE lob_interval_totals (
    lob_id          INTEGER,
    msu_id          INTEGER,
    interval_id     INTEGER,
    total_files     INTEGER,
    total_bytes     INTEGER,
    sensitive_files INTEGER,
    sensitive_bytes INTEGER);
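
The per-interval LOB totals become easier to read when joined with the lob
table and with the intervals table described later in this section. The sketch
below assumes that interval_id matches the interval column of the intervals
table and that the output database is readable with Python's sqlite3 module.
The function name lob_breakdown and the sample rows are hypothetical.

```python
import sqlite3


def lob_breakdown(conn):
    """Return (lob_name, start, end, total_files, sensitive_files) rows,
    ordered by LOB name and then by descending interval number."""
    return conn.execute(
        'SELECT l.lob_name, i.start, i."end", t.total_files, t.sensitive_files '
        "FROM lob_interval_totals AS t "
        "JOIN lob AS l ON l.lob_id = t.lob_id "
        "JOIN intervals AS i ON i.interval = t.interval_id "
        "ORDER BY l.lob_name, i.interval DESC"
    ).fetchall()


# Demonstration against an in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lob (lob_id INTEGER PRIMARY KEY, lob_name TEXT);
CREATE TABLE intervals (interval INTEGER, start INTEGER, "end" INTEGER);
CREATE TABLE lob_interval_totals (
    lob_id INTEGER, msu_id INTEGER, interval_id INTEGER,
    total_files INTEGER, total_bytes INTEGER,
    sensitive_files INTEGER, sensitive_bytes INTEGER);
INSERT INTO lob VALUES (1, 'Finance');
INSERT INTO intervals VALUES (4, 1364774400, 1367366400);
INSERT INTO intervals VALUES (3, 1362096000, 1364774400);
INSERT INTO lob_interval_totals VALUES (1, NULL, 4, 120, 4096, 5, 512);
INSERT INTO lob_interval_totals VALUES (1, NULL, 3, 80, 2048, 1, 128);
""")
```

The column name end is quoted in the query because END is also an SQL keyword.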
intervals table
This table gives the beginning and end of each interval as specified in the input
database. The beginning and end times are specified as epoch numbers. For
example, time 0 is midnight UTC on January 1, 1970, and each increment of 1 is
one second later.
CREATE TABLE IF NOT EXISTS intervals(
    interval INTEGER,  ///< 4 => most recent
                       ///< 0 => before interval
    start    INTEGER,  ///< start month of interval
    end      INTEGER); ///< end month of interval
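
The epoch numbers in the start and end columns can be converted to readable
timestamps. The following sketch uses only the Python standard library; the
function name epoch_to_utc is hypothetical.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert an epoch number (seconds since 1970-01-01 00:00:00 UTC)
    to a timezone-aware datetime in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Time 0 is the start of the epoch; 86400 is exactly one day later.
print(epoch_to_utc(0).isoformat())      # 1970-01-01T00:00:00+00:00
print(epoch_to_utc(86400).isoformat())  # 1970-01-02T00:00:00+00:00
```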
msu_info table
This table copies the data from the Dashboard database to specify if the msu is
open. The msu_id column is a foreign key to the msu table of the latest version
of the config.db stored in the DataInsight\Data\conf folder.
CREATE TABLE msu_info (
    msu_id  INTEGER,
    is_open INTEGER);
dashboard_info table
This table is similar to the msu_info table in that it copies information from the
latest version of the Dashboard database into the report output database. There
may be a slight mismatch between the numbers here and the totals from the
user_totals table. The difference arises because the two sets of numbers are
calculated at different times.
CREATE TABLE dashboard_info (
    msu_id          INTEGER,
    dir_files       INTEGER,
    dir_sens_files  INTEGER,
    dash_files      INTEGER,
    dash_sens_files INTEGER);
Report configuration parameters

One important setting for the Data Inventory report is the separate_dbs
configuration setting. The separate_dbs setting forces the report to close and
rename the current db file and start a new one each time the specified number
of rows has been inserted into the detail table of the report output database.
If the output file name specified to the report process is report_output.db,
the report creates files named report_output.db.0, report_output.db.1, and so
on, each time the limit specified in the setting is reached. The db file
currently being written is always report_output.db, and all of the summary data
is written to this file. When merge_rpt runs, it no longer copies rows from the
file_inventory table into the final output db. It copies rows only from the
user_totals and related tables, and then creates the lob_totals and related
tables. As with the log_level setting, report.separate_dbs is checked first; if
it is not found, the separate_dbs setting is checked.
You must set this property for each Indexer node, including the Management
Server node. For example, if the ID of your Indexer node is 3, issue the following
commands on your Management Server to set these properties for each node:
configdb -o -T node -k <nodeid> -J report.separate_dbs -j true
configdb -o -T node -k <nodeid> -J report.chunk_size -j 1000000
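
A script that consumes the split output needs to locate the numbered db files
next to report_output.db. The following sketch lists them in numeric order. It
relies only on the naming pattern described above; the function name
chunk_files is hypothetical.

```python
from pathlib import Path


def chunk_files(output_db):
    """Return the files <output_db>.0, <output_db>.1, ... in numeric order.

    Files whose suffix is not a plain number (for example a .bak copy)
    and the output db itself are skipped.
    """
    base = Path(output_db)
    prefix = base.name + "."
    chunks = []
    for path in base.parent.iterdir():
        if path.name.startswith(prefix):
            suffix = path.name[len(prefix):]
            if suffix.isdigit():
                chunks.append((int(suffix), path))
    return [path for _, path in sorted(chunks)]


# Example usage (the path is illustrative):
# for db in chunk_files("report_output.db"):
#     print(db)
```

Sorting on the numeric suffix keeps report_output.db.10 after
report_output.db.2, which a plain string sort would not.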
