Symantec Data Insight

Programmer's Reference Guide
4.0

June 2013

Symantec Proprietary and Confidential

Symantec Data Insight Programmer's Reference Guide


The software described in this book is furnished under a license agreement and may be used
only in accordance with the terms of the agreement.
Product version: 4.0
Documentation version: 4.0.0

Legal Notice
Copyright 2013 Symantec Corporation. All rights reserved.
Symantec, the Symantec Logo, and the Checkmark Logo are trademarks or registered
trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other
names may be trademarks of their respective owners.
This Symantec product may contain third party software for which Symantec is required
to provide attribution to the third party (Third Party Programs). Some of the Third Party
Programs are available under open source or free software licenses. The License Agreement
accompanying the Software does not alter any rights or obligations you may have under
those open source or free software licenses. Please see the Third Party Legal Notice Appendix
to this Documentation or TPIP ReadMe File accompanying this Symantec product for more
information on the Third Party Programs.
The product described in this document is distributed under licenses restricting its use,
copying, distribution, and decompilation/reverse engineering. No part of this document
may be reproduced in any form by any means without prior written authorization of
Symantec Corporation and its licensors, if any.
THE DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS,
REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT,
ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO
BE LEGALLY INVALID. SYMANTEC CORPORATION SHALL NOT BE LIABLE FOR INCIDENTAL
OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH THE FURNISHING,
PERFORMANCE, OR USE OF THIS DOCUMENTATION. THE INFORMATION CONTAINED
IN THIS DOCUMENTATION IS SUBJECT TO CHANGE WITHOUT NOTICE.
The Licensed Software and Documentation are deemed to be commercial computer software
as defined in FAR 12.212 and subject to restricted rights as defined in FAR Section 52.227-19
"Commercial Computer Software - Restricted Rights" and DFARS 227.7202, "Rights in
Commercial Computer Software or Commercial Computer Software Documentation", as
applicable, and any successor regulations. Any use, modification, reproduction release,
performance, display or disclosure of the Licensed Software and Documentation by the U.S.
Government shall be solely in accordance with the terms of this Agreement.

Symantec Corporation
350 Ellis Street
Mountain View, CA 94043
http://www.symantec.com

Technical Support
Symantec Technical Support maintains support centers globally. Technical
Support's primary role is to respond to specific queries about product features
and functionality. The Technical Support group also creates content for our online
Knowledge Base. The Technical Support group works collaboratively with the
other functional areas within Symantec to answer your questions in a timely
fashion. For example, the Technical Support group works with Product Engineering
and Symantec Security Response to provide alerting services and virus definition
updates.
Symantec's support offerings include the following:

A range of support options that give you the flexibility to select the right
amount of service for any size organization

Telephone and/or Web-based support that provides rapid response and
up-to-the-minute information

Upgrade assurance that delivers software upgrades

Global support purchased on a regional business hours or 24 hours a day, 7
days a week basis

Premium service offerings that include Account Management Services

For information about Symantec's support offerings, you can visit our website at
the following URL:
www.symantec.com/business/support/
All support services will be delivered in accordance with your support agreement
and the then-current enterprise technical support policy.

Contacting Technical Support


Customers with a current support agreement may access Technical Support
information at the following URL:
www.symantec.com/business/support/
Before contacting Technical Support, make sure you have satisfied the system
requirements that are listed in your product documentation. Also, you should be
at the computer on which the problem occurred, in case it is necessary to replicate
the problem.
When you contact Technical Support, please have the following information
available:

Product release level

Hardware information

Available memory, disk space, and NIC information

Operating system

Version and patch level

Network topology

Router, gateway, and IP address information

Problem description:

Error messages and log files

Troubleshooting that was performed before contacting Symantec

Recent software configuration changes and network changes

Licensing and registration


If your Symantec product requires registration or a license key, access our technical
support Web page at the following URL:
www.symantec.com/business/support/

Customer service
Customer service information is available at the following URL:
www.symantec.com/business/support/
Customer Service is available to assist with non-technical questions, such as the
following types of issues:

Questions regarding product licensing or serialization

Product registration updates, such as address or name changes

General product information (features, language availability, local dealers)

Latest information about product updates and upgrades

Information about upgrade assurance and support contracts

Information about the Symantec Buying Programs

Advice about Symantec's technical support options

Nontechnical presales questions

Issues that are related to CD-ROMs, DVDs, or manuals

Support agreement resources


If you want to contact Symantec regarding an existing support agreement, please
contact the support agreement administration team for your region as follows:
Asia-Pacific and Japan             customercare_apac@symantec.com
Europe, Middle-East, and Africa    semea@symantec.com
North America and Latin America    supportsolutions@symantec.com

Contents

Technical Support

Chapter 1   About this guide
    How this guide is organized

Chapter 2   DataInsight Query Language (DQL)
    About Data Insight Query Language (DQL)
    DQL Objects/Tables
    About DQL Columns
        device Columns
        msu Columns
        user Columns
        groups Columns
        path Columns
        dfspath Columns
        owner Columns
        activity Columns
        permission Columns
        custodian Columns
    DQL Query Syntax
        FROM clause
        GET clause
        FORMAT clause
        IF clause
        USING clause
        HAVING clause
        GROUPBY clause
        SORTBY clause
        LIMIT clause
    DQL functions
    Example DQL queries

Chapter 3   Web API Specification
    Web API specification for generic Collector service

Chapter 4   Creating custom scripts for remediation actions
    About custom scripts

Chapter 5   Data Inventory Report schema
    Data Inventory report schema
        file_inventory table
        lob table
        user_lob table
        user_totals table
        user_interval_totals table
        lob_totals table
        lob_interval_totals table
        intervals table
        msu_info table
        dashboard_info table
        Report configuration parameters

Chapter 1

About this guide


This chapter includes the following topics:

How this guide is organized

How this guide is organized


This document contains a general description of the content and usage of the
Data Insight Software Developer's Kit (SDK). Each chapter introduces a Data
Insight feature, discusses its possible uses, and describes how to use the
application programming interface for custom operations. The SDK contains
specific programming examples using these interfaces.
This guide provides an overview of the following Data Insight features that are
accessible with the SDK:

DataInsight Query Language (DQL) - Use DQL to create queries that produce
customized reports.
See "About Data Insight Query Language (DQL)".

The generic device web API - Use the API to extend platform support for the
storage devices that Data Insight monitors.
See "Web API specification for generic Collector service".
For information about configuring a generic device in Data Insight and the
credentials required to monitor the device, see the Symantec Data Insight
Administrator's Guide.

Custom scripts - Create scripts to define specific actions to handle remediation.
See "About custom scripts".
To configure Data Insight to invoke these scripts to complete the custom
actions, see the Symantec Data Insight Administrator's Guide.

Schema of the Data Inventory Report.


Chapter 2

DataInsight Query Language (DQL)

This chapter includes the following topics:

About Data Insight Query Language (DQL)

DQL Objects/Tables

About DQL Columns

DQL Query Syntax

DQL functions

Example DQL queries

About Data Insight Query Language (DQL)


Data Insight Query Language (DQL) is a structured language for retrieving the
information that is stored in the Data Insight indices. Indices are the proprietary
internal data stores that Data Insight uses for storing information. DQL does not
provide the full functional capability of SQL, but it is expressive enough to let
users easily extract, group, sort, and aggregate data.
DQL is a query-only language. You cannot use DQL to modify the Data Insight
indices. DQL queries are also protected by role-based access control, which means
that you can only see the information that you have access to.

DQL Objects/Tables
With DQL, you can run a query on objects and retrieve other objects as results. If
you are familiar with the SQL language, an object in DQL is similar to a table in
SQL. The attributes of an object are similar to the columns of the table. The output
of a DQL query is a relational database table with attribute values as column
values.
The DQL tables and a brief description of each are as follows:
device        Describes the details of the storage devices or content repository
              servers that Data Insight monitors. For example, a NetApp or EMC
              filer, a Windows File Server, or a SharePoint web application.

msu           Describes the details of the Data Insight storage units. An msu is a
              unit of storage space, which can be a file share (in case of CIFS or
              NFS) or a site-collection (in case of SharePoint).

path          Describes the details of the file or directory paths to individual msus.

dfspath       Describes the details of the DFS file or directory paths to individual
              msus.

owner         Describes the details of the computed owners of file or directory
              paths.

user          Describes the details of the users that are listed in directory services
              such as Active Directory, LDAP, or NIS+.

groups        Describes the details of the groups that are listed in directory services
              such as Active Directory, LDAP, or NIS+.

activity      Describes the details of the activity events on specific paths that
              are made by specific users at specific times. For example, an activity
              object can describe the following: file \\netapp1\mydocs\Market
              Research.doc was read by user John Smith at 1334123700 (Wed, 11th
              April 2012 05:55:00 GMT).

permission    Describes the details of the NTFS or UNIX permissions that are set
              on directory or file paths.

custodian     Describes the details of the custodians that are assigned to devices,
              msus, directories, or files.

In the above list of objects, the owner object differs from the rest because it is
a computed object. Owner objects are not first-class objects that are stored in the
Data Insight indices. They are computed at run time, depending on the method that
is used to calculate file ownership.
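
The activity example in the list above gives the event time as an integer. The following Python sketch is not part of the SDK. It only illustrates how such a value, read as seconds since midnight UTC on January 1st, 1970, maps to the calendar date quoted in that example. The variable names are chosen for illustration.

```python
from datetime import datetime, timezone

# Timestamp taken from the activity example in the object list.
event_time = 1334123700

# Interpret the value as seconds since midnight UTC, January 1st, 1970.
as_utc = datetime.fromtimestamp(event_time, tz=timezone.utc)

# Render it in a form close to the one used in the guide
# (day and month names assume the default C locale).
formatted = as_utc.strftime("%a, %d %B %Y %H:%M:%S GMT")
print(formatted)
```

Passing an explicit time zone keeps the result independent of the local time zone of the machine that runs the script.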

About DQL Columns


Unlike a SQL table whose columns can only contain a single value, a DQL table
can have columns with multiple values. For example, the group Domain Users has
multiple values for its column memberusers. A pair of square brackets around the
column name is used to indicate that the column is multi-valued.
With DQL, you can have a table with the columns that refer to other tables. For
example, the table groups has a column memberusers which refers to rows from
the user table. When you retrieve such reference columns, you need to specify
what columns you want to retrieve from the referred table. For example, you
cannot retrieve memberusers from groups without specifying which columns of
the user table you are interested in. So, you can select memberusers.name or
memberusers.sid but not just memberusers.
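
The rule about reference columns can be pictured with a small Python model. This is only a sketch under stated assumptions: the dictionaries, the SID values, and the helper get_reference_column are invented for illustration and are not how Data Insight stores or resolves data.

```python
# Hypothetical rows of the user table, keyed by sid (values are invented).
users = {
    "S-1-5-21-1": {"sid": "S-1-5-21-1", "name": "John Smith"},
    "S-1-5-21-2": {"sid": "S-1-5-21-2", "name": "Jane Doe"},
}

# Hypothetical row of the groups table; memberusers is multi-valued and
# refers to rows of the user table.
groups = [
    {"name": "Domain Users", "memberusers": ["S-1-5-21-1", "S-1-5-21-2"]},
]


def get_reference_column(row, expression):
    """Resolve '<reference column>.<column>' against the user table."""
    if "." not in expression:
        # Mirrors the rule above: a bare reference column cannot be retrieved.
        raise ValueError("specify a column of the referred table, e.g. memberusers.name")
    ref_column, sub_column = expression.split(".", 1)
    return [users[sid][sub_column] for sid in row[ref_column]]


names = get_reference_column(groups[0], "memberusers.name")
print(names)
```

Asking the helper for "memberusers" alone raises an error, which matches the restriction described above.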

device Columns

Column                   Type                 Description
id                       Integer              Unique identifier for this device.
name                     String               Name of this device.
type                     String               Type of device (NetApp, Celerra, WinNAS, SharePoint).
collector                String               Name of Collector node.
indexer                  String               Name of Indexer node.
[custodians]             [Custodian Object]   List of custodians for this device.
capacity                 Integer              Storage capacity of this device.
used_space               Integer              The total amount of space that all files and folders on this device consume.
share_count              Integer              Number of shares on this device.
open_share_count         Integer              Number of shares on this device that are marked as open.
open_share_data_size     Integer              Total size of all open shares on this device.
open_share_file_count    Integer              Total file count of all open shares on this device.
file_count               Integer              Total file count of all shares on this device.
sensitive_file_count     Integer              Number of sensitive files across all shares on this device.
folder_count             Integer              Total folder count of all shares on this device.
activity_count           Integer              Total activity count on this device (the activity count is calculated for the last six months).

msu Columns

Column                   Type                 Description
id                       Integer              Unique identifier for this msu.
name                     String               Name of this msu.
type                     String               Type of msu (CIFS, NFSv3, SharePoint).
device                   Device Object        Device that this msu belongs to.
indexer                  String               Name of Indexer node.
indexdir                 String               Path to index directory.
[custodians]             [Custodian Object]   List of custodians for this msu.
[permissions]            [Permission Object]  List of permissions for this msu (Share-level permissions).
isopen                   Integer              1 if msu is open, otherwise 0.
activity_count           Integer              Total activity count in the last six months.
active_user_count        Integer              Number of users who were active in the last six months.
last_activity_time       Integer              Time of last recorded activity on this msu.
size                     Integer              Total size of this msu.
active_data_size         Integer              Total size of all active files on this msu.
file_count               Integer              Number of files on this msu.
sensitive_file_count     Integer              Number of sensitive files on this msu.
folder_count             Integer              Number of folders on this msu.
most_active_user         User Object          User who is most active on this msu.

user Columns

Column                   Type                 Description
sid                      String               Unique identifier of the user.
name                     String               Full name of the user (e.g., John Smith).
login                    String               Login of the user.
domain                   String               Domain that the user belongs to.
firstname                String               First name of the user.
lastname                 String               Last name of the user.
isdisabled               Integer              1 if the user is disabled, 0 otherwise.
isdeleted                Integer              1 if the user is deleted from AD/LDAP, 0 otherwise.
buname                   String               Name of the business unit that this user belongs to.
buowner                  String               Owner of the business unit that this user belongs to.
[memberof]               [Group Object]       Groups of which this user is a member.
<custom-attr>            [String]             Custom attribute of the user. Replace <custom-attr> with the name of the custom attribute, for example, department. If the name contains special characters like -,*,%,^,/, etc., enclose the name in quotes. For example, "E-mail".
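
The quoting rule for custom attribute names can be sketched in Python as follows. The helper custom_attr_column is hypothetical. The guide lists only some of the special characters, so the sketch assumes that any name containing something other than letters, digits, or underscores needs quotes.

```python
import re

# Assumption: a name made only of letters, digits, and underscores needs no quotes.
_PLAIN_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def custom_attr_column(name):
    """Return the column name to use in a query for a custom attribute."""
    if _PLAIN_NAME.match(name):
        return name
    # Names with special characters such as '-' are enclosed in double quotes.
    return '"' + name + '"'


print(custom_attr_column("department"))
print(custom_attr_column("E-mail"))
```

The sketch does not handle names that themselves contain a double quote, because the guide does not say how those are written.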

groups Columns

Column                   Type                 Description
sid                      String               Unique identifier of this group.
name                     String               Name of this group.
domain                   String               Domain of this group.
isdisabled               Integer              1 if the Group is disabled, 0 otherwise.
isdeleted                Integer              1 if the Group is deleted, 0 otherwise.
[memberof]               [Group Object]       Groups of which this group is a member.
[memberusers]            [User Object]        Users who are members of this group.
[membergroups]           [Group Object]       Groups that are members of this group.
<custom-attr>            [String]             Custom attribute of Group. Replace <custom-attr> with the name of the custom attribute, for example, location. If the name contains special characters like -,*,%,^,/, etc., enclose the name in quotes. For example, "E-mail".
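
Because a group can list both users (memberusers) and other groups (membergroups), a report that needs every user under a group has to follow the nesting. The following Python sketch shows one way to do that on data already retrieved from the groups table. The group names, user names, and the function effective_members are invented for illustration. Whether DQL itself expands nested membership is not stated here.

```python
# Hypothetical groups keyed by name; each row mirrors the memberusers and
# membergroups columns (reduced to plain names for brevity).
groups = {
    "Domain Users": {"memberusers": ["John Smith"], "membergroups": ["Finance"]},
    "Finance": {"memberusers": ["Jane Doe"], "membergroups": ["Auditors"]},
    "Auditors": {"memberusers": ["John Smith"], "membergroups": []},
}


def effective_members(group_name, seen=None):
    """Collect direct and nested user members, guarding against membership cycles."""
    seen = set() if seen is None else seen
    if group_name in seen:
        return set()
    seen.add(group_name)
    row = groups[group_name]
    members = set(row["memberusers"])
    for nested in row["membergroups"]:
        members |= effective_members(nested, seen)
    return members


print(sorted(effective_members("Domain Users")))
```

The seen set prevents endless recursion if two groups are members of each other.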

path Columns

Column                   Type                 Description
name                     String               Name of path relative to the msu.
absname                  String               Absolute name of the path, including device and share names. For example, \\filer1\share100\a\b.
id                       Integer              Unique identifier for this path within the msu.
parent                   Path Object          Parent path of this path.
type                     String               DIR for directory, FILE for file.
device                   Device Object        The device to which this path belongs.
msu                      msu Object           The msu to which this path belongs.
size                     Integer              Size of path in bytes. For directories it is the size of all files under the entire subtree.
last_accessed            Integer              Timestamp of when this path was last accessed. Timestamp is measured as the number of seconds that have elapsed since midnight UTC, January 1st, 1970.
last_modified            Integer              Timestamp of when this path was last modified.
created_on               Integer              Timestamp of when this path was created.
last_accessor            User Object          User who last accessed this path.
last_modifier            User Object          User who last modified this path.
creator                  User Object          User who created this path.
creator_group            Group Object         Group creator of this path.
owner                    Owner Object         Computed Owner of this path.
isdeleted                Integer              1 if the path is deleted, 0 otherwise.
depth                    Integer              Depth of the path from the root of the share. For example, / has a depth of 0, /a has a depth of 1, /a/b has a depth of 2.
activity_count           Integer              Total activity count on this path.
isopen                   Integer              1 if the path is open, 0 otherwise.
[open_reasons]           [String]             Reasons why the path is considered open.
[permissions]            [Permission Object]  List of permissions on this path.
[custodians]             [Custodian Object]   List of custodians for this path.
issensitive              Integer              1 if the path is sensitive, 0 otherwise.
[filegroups]             [String]             List of filegroups for this path.
extension                String               File extension for this path. For example, PST, DOC, etc.
[dfsnames]               [String]             List of DFS names for this path.
[permitted_users]        [User Object]        List of users who have permissions to access this path.
permitted_users_count    Integer              Number of users who have permissions to access this path.
[active_users]           [User Object]        List of users who are active on this path.
active_users_count       Integer              Number of users who are active on this path.
[inactive_users]         [User Object]        List of users who are inactive on this path.
inactive_users_count     Integer              Number of users who are inactive on this path.
[dlp_policies]           [String]             List of DLP policies violated by this path.
iscontrol_point          Integer              1 if the path is a control point, 0 otherwise.
[control_point_reasons]  [String]             Reasons why the path is considered a control point.
filesystem_owner         User Object          Owner specified by the NTFS file system.
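
The depth column follows a simple counting rule, which the Python sketch below reproduces for the three examples given in the table. The function path_depth is hypothetical, and it assumes forward slashes as the separator, as in those examples.

```python
def path_depth(name):
    """Count the path components below the share root."""
    # Splitting "/" yields only empty strings, so the root has depth 0.
    return len([part for part in name.split("/") if part])


for example in ("/", "/a", "/a/b"):
    print(example, path_depth(example))
```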

dfspath Columns

Column                   Type                 Description
name                     String               Name of DFS path relative to the msu.
absname                  String               Absolute name of the DFS path, including device and share names. For example, \\dfsfiler1\dfsshare100\a\b.
id                       Integer              Unique identifier for this path within the msu.
parent                   DFS Path Object      Parent DFS path of this DFS path.
physicalname             String               Absolute name of the physical path that this DFS path maps to.
type                     String               DIR for directory, FILE for file.
device                   Device Object        The device to which this DFS path belongs.
msu                      msu Object           The msu to which this DFS path belongs.
size                     Integer              Size of path in bytes. For directories it is the size of all files under the entire subtree.
last_accessed            Integer              Timestamp of when this path was last accessed. Timestamp is measured as the number of seconds that have elapsed since midnight UTC, January 1st, 1970.
last_modified            Integer              Timestamp of when this path was last modified.
created_on               Integer              Timestamp of when this path was created.
last_accessor            User Object          User who last accessed this path.
last_modifier            User Object          User who last modified this path.
creator                  User Object          User who created this path.
creator_group            Group Object         Group creator of this path.
owner                    Owner Object         Computed Owner of this path.
isdeleted                Integer              1 if the path is deleted, 0 otherwise.
depth                    Integer              Depth of the path from the root of the share. For example, / has a depth of 0, /a has a depth of 1, /a/b has a depth of 2.
activity_count           Integer              Total activity count on this path.
isopen                   Integer              1 if the path is open, 0 otherwise.
[open_reasons]           [String]             Reasons why the path is considered open.
[permissions]            [Permission Object]  List of permissions on this path.
[custodians]             [Custodian Object]   List of custodians for this path.
issensitive              Integer              1 if the path is sensitive, 0 otherwise.
[filegroups]             [String]             List of filegroups for this path.
extension                String               File extension for this path. For example, PST, DOC, etc.
[permitted_users]        [User Object]        List of users who have permissions to access this path.
permitted_users_count    Integer              Number of users who have permissions to access this path.
[active_users]           [User Object]        List of users who are active on this path.
active_users_count       Integer              Number of users who are active on this path.
[inactive_users]         [User Object]        List of users who are inactive on this path.
inactive_users_count     Integer              Number of users who are inactive on this path.
[dlp_policies]           [String]             List of DLP policies violated by this path.
iscontrol_point          Integer              1 if the path is a control point, 0 otherwise.
[control_point_reasons]  [String]             Reasons why the path is considered a control point.
filesystem_owner         User Object          Owner specified by the NTFS file system.

owner Columns

Column                   Type                 Description
path                     Path Object          Path for which the owner is computed.
dfspath                  DFS Path Object      DFS path for which the owner is computed.
user                     User Object          The computed owner of the path.
read_count               Integer              Number of read accesses made by this user.
write_count              Integer              Number of write accesses made by this user.
method                   String               The method that was used to compute this owner. Possible values are creator, read_count, write_count, rw_count, last_accessor, last_modifier, and parent_owner.

activity Columns

Column                   Type                 Description
timestamp                Integer              Timestamp of the activity. Timestamp is measured as the number of seconds that have elapsed since midnight UTC, January 1st, 1970.
timerange                Integer              Number of seconds since timestamp that this activity event might have happened.
user                     User Object          User who initiated this activity event.
path                     Path Object          Path on which this activity event occurred.
dfspath                  DFS Path Object      DFS path on which this activity event occurred.
opcode                   Integer              Integer representing the activity event.
operation                String               String notation of the activity event (e.g., read, write, create, delete, mkdir, rmdir, etc.).
count                    Integer              Number of times this operation was performed in the timerange.
ipaddr                   String               IP address from where the operation was performed.
rename_target            Path Object          For rename or move operations, the target path to which this path was renamed.
dfs_rename_target        DFS Path Object      For rename or move operations, the target DFS path to which this DFS path was renamed.
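
The timestamp and timerange columns together describe a window rather than an exact instant. The Python sketch below shows one reading of that definition: the event happened somewhere between timestamp and timestamp plus timerange. The function names, the 60-second timerange, and the choice to treat both ends of the window as inclusive are assumptions made for illustration.

```python
def activity_window(timestamp, timerange):
    """Return the (earliest, latest) epoch seconds in which the event may have happened."""
    return timestamp, timestamp + timerange


def may_have_occurred_at(timestamp, timerange, moment):
    """Check whether a given epoch second falls inside the activity window."""
    start, end = activity_window(timestamp, timerange)
    return start <= moment <= end


# Hypothetical record: the timestamp from the activity example in the
# DQL Objects/Tables section, with an assumed timerange of 60 seconds.
window = activity_window(1334123700, 60)
print(window)
```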

permission Columns

Column                   Type                 Description
object_type              String               Type of object on which this permission is set (msu, DIR).
path                     Path Object          Path on which the permission is set.
dfspath                  DFS Path Object      DFS path on which the permission is set.
msu                      msu Object           The msu on which the permission is set.
trustee_type             String               Type of trustee (user, group).
user_trustee             User Object          Trustee of this permission.
group_trustee            Group Object         Trustee of this permission.
permission_mask          Integer              Permission bitmask.
readable_permission      String               List of readable permissions (read, write, full control, etc.).
type                     String               Type of permission (GRANT, DENY).
isinherited              Integer              1 if the permission is inherited from parent.
inheriting_type          String               Type of object from which this permission is inherited (msu, DIR).
inheriting_path          Path Object          Path from which this permission is inherited.
inheriting_dfspath       DFS Path Object      DFS path from which this permission is inherited.
inheriting_msu           msu Object           msu from which this permission is inherited.
appliesto                String               Inheritance settings for this permission (e.g., this folder, all subfolders, only immediate files).

custodian Columns

Column                   Type                 Description
path                     Path Object          Path on which the custodian is assigned.
dfspath                  DFS Path Object      DFS path on which the custodian is assigned.
msu                      msu Object           msu on which the custodian is assigned.
device                   Device Object        Device on which the custodian is assigned.
dfslink                  String               DFS link on which the custodian is assigned.
user                     User Object          The assigned custodian of the path.
isinherited              Integer              1 if the custodian is inherited from a parent (device, msu, dir, dfslink).
inheriting_type          String               Type of object from which the custodian is inherited (device, msu, dir, dfslink).
inheriting_path          Path Object          Path from which the custodian is inherited.
inheriting_dfspath       DFS Path Object      DFS path from which the custodian is inherited.
inheriting_msu           msu Object           msu from which the custodian is inherited.
inheriting_device        Device Object        Device from which the custodian is inherited.
inheriting_dfslink       String               DFS link from which the custodian is inherited.

DQL Query Syntax


The DQL query syntax and top-level grammatical constructs are as shown:

FROM <table>
GET <column expression> [AS alias], <column expression> [AS alias], ...
[IF <condition>]
[USING <definition>]
[FORMAT <column> AS (CSV|TABLE <tablename>) [<count>]]
[GROUPBY <column expression>, <column expression>, ...]
[HAVING <aggregate-condition>]
[SORTBY <column expression> [ASC|DESC]]
[LIMIT [<offset>,]<count>];

FROM clause
The FROM clause specifies the table from which DQL retrieves the data. Unlike
SQL, DQL does not support joins. You can specify only one table in the FROM clause.

GET clause
The GET clause specifies the columns (or expressions on columns) that you want
to retrieve from the table that you specify in the FROM clause.
DQL tables can have columns that refer to other tables. For example, the table
groups has a column memberusers, which refers to rows from the user table. When
you retrieve such reference columns, you need to specify which columns you want
to retrieve from the referred table. For example, you cannot retrieve memberusers
from groups without specifying which columns of the user table you are interested
in. So, you can select memberusers.name or memberusers.sid, but not just
memberusers.
The column names in the output table are determined by the expressions used in the
GET clause. While displaying the output, DQL may optionally replace the period
( . ) with the underscore ( _ ). For example, for GET path.name, the output column
name in the SQLite database becomes path_name.

FORMAT clause
Data Insight tables can contain multi-valued columns. For example, path contains
a multivalued column permissions. When you specify the columns in the GET
clause, you also need to specify the manner in which you want their values to
appear in the output database table. Use the FORMAT clause to control the format
of the output in case of multi-valued columns. You can use two formatting options
as shown below:
FORMAT <column> AS CSV

The above syntax displays the output values for a multi-valued column as a
comma-separated list in a single column.
FORMAT <column> AS TABLE <tablename>

The above syntax displays the output values for a multi-valued column in a
separate table. Each row of this table contains a reference to its corresponding
row in the parent table.
The default format is TABLE. If you do not provide a FORMAT clause in your
query, DQL displays the contents of the multi-valued columns in separate tables,
and the name of each multi-valued column is used as the default name of its
table. For example, if you retrieve path permissions and you do not specify the
FORMAT clause, DQL displays the permissions of a path in a separate table called
permissions.
Consider this example:

FROM    groups
GET     name, memberusers.sid, memberusers.name
FORMAT  memberusers AS CSV

Since memberusers is a multi-valued column, the FORMAT clause on memberusers
needs to be specified.
The above query creates an output table groups containing four columns:
groups_rowid, name, memberusers_sid, and memberusers_name. The column
groups_rowid is a default column present in all DQL output tables, containing an
identification number for each row. The columns memberusers_sid and
memberusers_name contain comma-separated lists of member user SIDs and
names.
An example output table is shown below:

groups
groups_rowid  name          memberusers_sid    memberusers_name
              Domain Users  S-1,S-2,S-10,S-11  John,Jim,Paul,Steve
              HR_Global     S-10,S-12          Paul,Jane
              HR_US         S-10               Paul

Suppose that you change the query to:

FROM    groups
GET     name, memberusers.sid, memberusers.name
FORMAT  memberusers AS TABLE memberusers

In this case, the output database contains two tables groups and memberusers.
The groups table has two columns groups_rowid and name. The memberusers
table has three columns groups_rowid, memberusers_sid, memberusers_name.
The groups_rowid column in the memberusers table is a reference to the
groups_rowid column from the groups table.
Example output tables are shown below:

groups
groups_rowid  name
              Domain Users
              HR_Global
              HR_US

memberusers
groups_rowid  memberusers_sid  memberusers_name
              S-1              John
              S-2              Jim
              S-10             Paul
              S-11             Steve
              S-10             Paul
              S-12             Jane
              S-10             Paul
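Because the output of a DQL query is stored in a SQLite database, the two tables produced by FORMAT ... AS TABLE can be recombined with an ordinary SQL join on groups_rowid. The following Python sketch illustrates this on an in-memory stand-in for the output database. The rowid values 1 to 3 are assumptions made for this illustration (the example tables above do not show them), and the helper name members_by_group is hypothetical.

```python
import sqlite3

# In-memory stand-in for a DQL output database. The groups_rowid values
# (1, 2, 3) are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "groups" (groups_rowid INTEGER, name TEXT);
    CREATE TABLE memberusers (groups_rowid INTEGER,
                              memberusers_sid TEXT,
                              memberusers_name TEXT);
""")
conn.executemany('INSERT INTO "groups" VALUES (?, ?)',
                 [(1, "Domain Users"), (2, "HR_Global"), (3, "HR_US")])
conn.executemany("INSERT INTO memberusers VALUES (?, ?, ?)",
                 [(1, "S-1", "John"), (1, "S-2", "Jim"),
                  (1, "S-10", "Paul"), (1, "S-11", "Steve"),
                  (2, "S-10", "Paul"), (2, "S-12", "Jane"),
                  (3, "S-10", "Paul")])


def members_by_group(connection):
    """Map each group name to its member names by joining on groups_rowid."""
    rows = connection.execute("""
        SELECT g.name, m.memberusers_name
        FROM "groups" AS g
        JOIN memberusers AS m ON m.groups_rowid = g.groups_rowid
        ORDER BY g.groups_rowid, m.rowid
    """)
    result = {}
    for group_name, member_name in rows:
        result.setdefault(group_name, []).append(member_name)
    return result


print(members_by_group(conn))
```

Unlike the CSV format, the separate table keeps one row per member, so each member's SID and name stay paired.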

By default, DQL lists all memberusers of a group. Optionally, you can limit the
number of memberusers listed using the FORMAT clause. This is as shown in the
following query:

FROM    groups
GET     name, memberusers.sid, memberusers.name
FORMAT  memberusers AS CSV 4

This limits the output table to a maximum of four member user values for each
group. These four values are the first four members of the list.

Nested multi-valued columns


There may be situations where you need to specify nested multi-valued columns.
For example, the path table has a multi-valued column active_users, which is a
reference to user table. The table user, in turn, has a multi-valued column
memberof which indicates the groups that a user belongs to. If you want to get all
active users for a path and the groups that each active user belongs to, write your
query as shown below.

FROM    path
GET     name, active_users.name, active_users.memberof.name
FORMAT  active_users AS CSV AND
        active_users.memberof AS CSV;

In this query's output table, the third column, active_users_memberof_name, lists
all the groups of all the path's active users. For example, suppose that path /foo
has active users Joe and Jane. Suppose that Joe belongs to groups HR and
ALL-Employees, while Jane belongs to groups Finance and ALL-Employees. The
output column for this query will then be HR, ALL-Employees, Finance,
ALL-Employees.
Notice that you have a flat list of all group names in this column. You have lost
information about which groups each of the active users belongs to. You only know
that there is one active user who belongs to HR, two who belong to ALL-Employees,
and one who belongs to Finance.
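The flattening described above can be illustrated with a few lines of Python. This is only a model of the behavior using hypothetical in-memory dictionaries for the /foo example; it is not how DQL is implemented.

```python
from collections import Counter

# Hypothetical data mirroring the /foo example.
active_users = {"/foo": ["Joe", "Jane"]}
memberof = {
    "Joe": ["HR", "ALL-Employees"],
    "Jane": ["Finance", "ALL-Employees"],
}


def flattened_groups(path):
    """Return one flat list of group names across all active users of a path."""
    return [group for user in active_users[path] for group in memberof[user]]


flat = flattened_groups("/foo")
print(", ".join(flat))   # the value of the nested CSV column
print(Counter(flat))     # only per-group counts can be recovered
```

The flat list no longer records which user contributed which group, which is why only the per-group counts survive.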

IF clause
The IF clause is an optional clause that you can use to specify a set of conditions
on the rows that you want to retrieve. It is similar to the WHERE clause of SQL.
DQL retrieves only those rows whose columns satisfy the condition(s) provided
under the IF clause.

Operators
DQL supports the following binary operators that you can use to specify a
condition:

Comparison operators: >, <, >=, <=, =, ==, !=, <>

Logical operators: AND, &&, OR, ||

Arithmetic operators: +, -, *, /, %

List containment operators: IN, NOT IN

Constants
DQL's IF clause supports specification of constants in operations. Constants can
be either numeric or string. Some examples of supported column-related operations
are shown below:

IF size/1024 > 10

IF size = 10

IF name IN ("John", "Joe")

Note that string comparisons are case insensitive by default. To specify case
sensitive or case insensitive comparisons, you can use the CASE SENSITIVE and
CASE INSENSITIVE keywords.

IF name IN ("John", "Joe") CASE SENSITIVE

IF name = "John" CASE INSENSITIVE

Conditions on multi-valued columns


You can use the EACH or ANY prefixes to specify conditions on multi-valued
columns. EACH specifies that every value of the multi-valued column must satisfy
the condition, while ANY specifies that at least one value of the multi-valued
column must satisfy the condition.
Suppose that you want to retrieve only those paths on which the user John is
active. You can write a query as shown below.

FROM    path
GET     name, active_users.name
IF      ANY active_users.name = "John"
FORMAT  active_users AS CSV;

Suppose that you want to retrieve paths on which either John or Joe is active.
You can write a query (query a) as shown below.

FROM    path
GET     name, active_users.name
IF      ANY active_users.name IN ("John","Joe")
FORMAT  active_users AS CSV;

The above query retrieves the paths on which either John is one of the active users
and/or Joe is one of the active users.
Suppose that you want to retrieve the paths that only have John and Joe as active
users. You can write a query (query b) as shown below.

FROM    path
GET     name, active_users.name
IF      EACH active_users.name IN ("John","Joe")
FORMAT  active_users AS CSV;

The above query retrieves paths where the only active users are John and/or Joe.
Note that in query (a), you get the paths on which John or Joe is one of the active
users whereas in query (b), you get the paths on which John and/or Joe are the
only active users.
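The difference between query (a) and query (b) corresponds to the difference between "at least one value matches" and "every value matches". The Python sketch below models that distinction on hypothetical sample data. The path names and user lists are invented, the case-insensitive comparison follows the default described earlier, and how DQL treats a path with no active users under EACH is not stated in this guide (the sketch returns True for an empty list, which is an assumption).

```python
def any_value_in(values, allowed):
    """Model of ANY <column> IN (...): at least one value matches."""
    allowed_folded = {a.casefold() for a in allowed}
    return any(v.casefold() in allowed_folded for v in values)


def each_value_in(values, allowed):
    """Model of EACH <column> IN (...): every value matches."""
    allowed_folded = {a.casefold() for a in allowed}
    return all(v.casefold() in allowed_folded for v in values)


# Hypothetical active users per path.
active_users_by_path = {
    "/share/a": ["John", "Paul"],
    "/share/b": ["John", "Joe"],
    "/share/c": ["Steve"],
}
wanted = ["John", "Joe"]

query_a = [p for p, users in active_users_by_path.items()
           if any_value_in(users, wanted)]
query_b = [p for p, users in active_users_by_path.items()
           if each_value_in(users, wanted)]

print(query_a)  # ['/share/a', '/share/b']
print(query_b)  # ['/share/b']
```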

Conditions on nested multi-valued columns


Since nested multi-valued columns evaluate to a flat list, you can specify conditions
on them using the ANY and EACH constructs as above.
For example, suppose that you want to retrieve those paths containing at least
one active user belonging to group HR. You can write a query as shown below.

FROM    path
GET     name, active_users.memberof.name
IF      ANY active_users.memberof.name = "HR"
FORMAT  active_users.memberof AS CSV;

Suppose that you want to retrieve those paths containing active users who belong
only to groups HR and/or FINANCE. You can write a query as shown below.

FROM    path
GET     name, active_users.memberof.name
IF      EACH active_users.memberof.name IN ("HR", "FINANCE")
FORMAT  active_users.memberof AS CSV;

Note that DQL by default uses the ANY construct if you do not specify an
ANY/EACH construct.

USING clause
Values of certain columns like owner are computed at run-time based on some
criteria. For example, to compute an owner of a file, you need to specify what
methods (like read_count, rw_count, parent_owner etc.) you want to use to
determine the owner. When you determine active users of a path, you need to
specify the time range you want to consider for the activity.
You can use the USING clause to specify such functions that can be applied to
obtain a column value.
The details of the DQL USING functions are as shown below.

Calculating the owner


calc_owner(start_time TEXT, end_time TEXT, date_format TEXT,
ordered_list_of_owner_methods TEXT)

Example usage in query:

FROM   path
GET    name, owner.user.name, owner.method, owner.read_count
USING  owner AS calc_owner("2012-01-01", "2012-06-01", "YYYY-MM-DD",
       "rw_count, read_count, last_accessor");

If you don't specify a USING function for owner, DQL uses a default time range
of the last 6 months and a data owner ordering of rw_count, write_count,
read_count, last_modifier, last_accessor, creator, parent_owner.

Calculating the active_users


calc_active_users(start_time TEXT, end_time TEXT, date_format TEXT)

Example usage in query:

FROM    path
GET     name, active_users.name
USING   active_users AS
        calc_active_users("2012-01-01", "2012-06-01", "YYYY-MM-DD")
FORMAT  active_users AS CSV;

If you don't specify a USING function for active_users, DQL uses a default time
range of the last 6 months.

Calculating the active_users_count


get_active_users_count(start_time TEXT, end_time TEXT, date_format TEXT)

Example usage in query:

FROM   path
GET    name, active_users_count
USING  active_users_count AS
       get_active_users_count("2012-01-01", "2012-06-01", "YYYY-MM-DD");

If you don't specify a USING function for active_users_count, DQL uses a default
time range of the last 6 months.

Calculating the inactive_users


calc_inactive_users(start_time TEXT, end_time TEXT, date_format TEXT)

Example usage in query:

FROM    path
GET     name, inactive_users.name
USING   inactive_users AS
        calc_inactive_users("2012-01-01", "2012-06-01", "YYYY-MM-DD")
FORMAT  inactive_users AS CSV;

If you don't specify a USING function for inactive_users, DQL uses a default time
range of the last 6 months for calculating inactivity.

Calculating the inactive_users_count


get_inactive_users_count(start_time TEXT, end_time TEXT, date_format TEXT)

Example usage in query:

FROM   path
GET    name, inactive_users_count
USING  inactive_users_count AS
       get_inactive_users_count("2012-01-01", "2012-06-01", "YYYY-MM-DD");

If you don't specify a USING function for inactive_users_count, DQL uses a default
time range of the last 6 months for calculating inactivity.

Calculating the activity_count


get_activity_count(start_time TEXT, end_time TEXT, date_format TEXT)

Example usage in query:

FROM   path
GET    name, activity_count
USING  activity_count AS
       get_activity_count("2012-01-01 10:00", "2012-01-01 15:00",
       "YYYY-MM-DD HH:mm");

If you don't specify a USING function for activity_count, DQL uses a default time
range of the last 6 months for calculating activity.

HAVING clause
The HAVING clause is similar to the SQL HAVING clause and allows specification
of conditions on aggregate functions. The syntax of conditions that can be specified
in the HAVING clause is the same as that of the DQL IF clause.
Suppose that you want to retrieve the sum of the sizes of all shares for each filer.
You can write a query for this as shown below:

FROM     msu
GET      filer.name, sum(size)
GROUPBY  filer.name;

Now suppose that you want to select only those filers whose sum of share sizes
is greater than 1 GB (1,073,741,824 bytes). Then you need to modify the previous
query as:

FROM     msu
GET      filer.name, sum(size)
GROUPBY  filer.name
HAVING   sum(size) > 1073741824;

GROUPBY clause
The GROUPBY clause is similar to the SQL GROUP BY clause. It enables you to
aggregate the output rows into groups. Suppose that you want to retrieve the sum
of the sizes of all shares for each filer. You can write a query for this as shown
below.

FROM     msu
GET      filer.name, sum(size)
GROUPBY  filer.name;

DQL supports the following aggregation functions:

sum

count

max

min
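As a rough model of what the GROUPBY and HAVING examples compute, the following Python sketch sums share sizes per filer and then keeps only the filers whose total exceeds a threshold. The filer names and sizes are invented sample data, and total_size_by_filer is a hypothetical helper, not part of DQL.

```python
from collections import defaultdict

ONE_GB = 1073741824  # bytes, the threshold used in the HAVING example

# Hypothetical (filer name, share size in bytes) rows from the msu table.
shares = [
    ("filer1", 800_000_000),
    ("filer1", 500_000_000),
    ("filer2", 200_000_000),
]


def total_size_by_filer(rows, min_total=None):
    """Sum sizes per filer (GROUPBY); optionally keep totals > min_total (HAVING)."""
    totals = defaultdict(int)
    for filer_name, size in rows:
        totals[filer_name] += size
    if min_total is None:
        return dict(totals)
    return {name: total for name, total in totals.items() if total > min_total}


print(total_size_by_filer(shares))
print(total_size_by_filer(shares, min_total=ONE_GB))
```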


SORTBY clause
The SORTBY clause is similar to the SQL ORDER BY clause. It enables you to sort
the rows of the output table based on their column values.

FROM    msu
GET     name, size
SORTBY  size DESC;

If no sort order is specified, DQL defaults to ASC.

LIMIT clause
The LIMIT clause is similar to the SQL LIMIT clause and is used to limit the number
of output rows.

LIMIT count            [This will retrieve the first "count" rows]
LIMIT offset, count    [This will retrieve "count" rows starting from "offset"]

Offset values start from 1.
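Because offsets are 1-based, LIMIT 2, 3 would start at the second row rather than the third. The following Python sketch models that arithmetic on a plain list. It assumes that "offset" names the first row returned, which is how the description above reads, and apply_limit is a hypothetical helper.

```python
def apply_limit(rows, count, offset=1):
    """Return `count` rows starting at the 1-based position `offset`."""
    if offset < 1 or count < 0:
        raise ValueError("offset must be >= 1 and count must be >= 0")
    start = offset - 1  # convert the 1-based offset to a 0-based index
    return rows[start:start + count]


rows = ["row1", "row2", "row3", "row4", "row5"]
print(apply_limit(rows, 2))            # LIMIT 2    -> ['row1', 'row2']
print(apply_limit(rows, 3, offset=2))  # LIMIT 2, 3 -> ['row2', 'row3', 'row4']
```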

DQL functions
DQL supports the following built-in functions:
upper(X)          Converts string X to uppercase.

lower(X)          Converts string X to lowercase.

strlen(X)         Returns length of string X.

length(X)         Returns number of items in list X.

substr(X, Y)      Returns true if Y is a substring of X. The comparison is
                  case-sensitive.

substri(X, Y)     Returns true if Y is a substring of X. The comparison is
                  case insensitive.

match(X, P)       Returns true if X matches the regular expression pattern
                  P. Regular expression matching is case-sensitive.
                  Pattern P can be specified as Patterns matching a single
                  character or Patterns matching multiple characters.
                  You can refer to the following URLs for information on
                  pattern matching:
                  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_01
                  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_02

matchi(X, P)      Returns true if X matches the regular expression pattern
                  P. Regular expression matching is case insensitive.
                  Pattern P can be specified as Patterns matching a single
                  character or Patterns matching multiple characters.
                  You can refer to the following URLs for information on
                  pattern matching:
                  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_01
                  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_02

datetime(D, F)    Returns time in epoch for the string date D. The format
                  in which date D is specified is indicated by the format
                  string F. The options for F are:
                  YYYY - 4 digit year
                  MM - month of year (01-12)
                  DD - date of month
                  HH - hour (00-24)
                  mm - minutes (00-59)
                  ss - seconds (00-59)
                  Z - timezone
                  Example: datetime("2012-01-10 -0800", "YYYY-MM-DD Z")

formatdate(T, F)  Converts time T in epoch to a string whose format is
                  specified with string F. The options for F are the same
                  as those used by datetime(D, F).
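When post-processing DQL output outside Data Insight, it can help to reproduce the datetime() and formatdate() conversions. The Python sketch below maps the format options listed above onto Python's strptime/strftime directives. The mapping, the function names, and the choice to treat a date without a Z component as UTC are all assumptions made for this illustration; the guide does not state which timezone DQL applies when none is given.

```python
import re
from datetime import datetime, timezone

# Assumed equivalents between DQL format options and Python directives.
_TOKENS = {"YYYY": "%Y", "MM": "%m", "DD": "%d",
           "HH": "%H", "mm": "%M", "ss": "%S", "Z": "%z"}
_TOKEN_RE = re.compile("YYYY|MM|DD|HH|mm|ss|Z")


def to_python_format(dql_format):
    """Translate a DQL format string in a single left-to-right pass."""
    return _TOKEN_RE.sub(lambda m: _TOKENS[m.group(0)], dql_format)


def dql_datetime(date_string, dql_format):
    """Model of datetime(D, F): return epoch seconds for a date string."""
    parsed = datetime.strptime(date_string, to_python_format(dql_format))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: UTC
    return int(parsed.timestamp())


def dql_formatdate(epoch_seconds, dql_format):
    """Model of formatdate(T, F): format epoch seconds (rendered in UTC here)."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime(to_python_format(dql_format))


epoch = dql_datetime("2012-01-10 -0800", "YYYY-MM-DD Z")
print(epoch)
print(dql_formatdate(epoch, "YYYY/MM/DD HH:mm"))
```

The single regular-expression pass avoids the pitfall of chained replacements, where a directive produced by one substitution could be altered by the next.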


Example DQL queries

Get the name, size, active data size, percentage of data size that is active,
openness, and number of active users for each share.

FROM  msu
GET   name, size, active_data_size,
      (active_data_size*100/size) AS active_data_percent,
      isopen, active_user_count;

Get the activity for all paths of share, share1, on March 4, 2012 between 9:00
A.M. and 5:00 P.M.

FROM  activity
GET   path.name, user.name, operation,
      formatdate(timestamp, "YYYY/MM/DD HH:mm")
IF    path.msu.name = "share1" AND
      timestamp >= datetime("2012/03/04 09:00", "YYYY/MM/DD HH:mm")
      AND timestamp <= datetime("2012/03/04 17:00",
      "YYYY/MM/DD HH:mm");

Since the timestamp column of activity is epoch, convert it to a readable format
using formatdate().

Get a list of all sensitive files from all shares of filer, filer1, sorted by size.

FROM    path
GET     name, issensitive, size
IF      issensitive = 1 AND type = "FILE" AND device.name = "filer1"
SORTBY  size DESC;

Get a list of all open paths and the reason why they are marked as open.

FROM    path
GET     name, msu.name, isopen, open_reasons
IF      isopen = 1
FORMAT  open_reasons AS CSV;

Get a list of all open paths and the reason why they are marked as open. Also,
list the permissions on each open path.

FROM    path
GET     name, msu.name, isopen, open_reasons,
        permissions.user_trustee.name, permissions.group_trustee.name,
        permissions.readable_permission, permissions.isinherited,
        permissions.inheriting_path.name
IF      isopen = 1
FORMAT  permissions AS TABLE permissions
        AND open_reasons AS CSV;

Get a list of all open paths and their inactive users.

FROM    path
GET     name, msu.name, isopen, inactive_users.name
IF      isopen = 1
USING   inactive_users AS calc_inactive_users("2012-01-01",
        "2012-06-01","YYYY-MM-DD");

Get a list of all directories and their owners.

FROM    path
GET     name, msu.name, owner.user.name, owner.method,
        owner.read_count, owner.write_count
IF      type = "DIR"
USING   owner AS calc_owner("2012-01-01", "2012-06-01",
        "YYYY-MM-DD","rw_count, last_modifier");

Get a list of all users, their e-mail and department (custom attributes) and the
groups that they belong to.

FROM    user
GET     name, sid, login, domain, "E-mail", department,
        memberof.sid, memberof.name
FORMAT  memberof AS table memberof_groups;

For each share, get the count of paths that have permissions set on Everyone.

FROM     permissions
GET      msu.name, count(path.id) AS risk_path_count
IF       object_type = "DIR" AND group_trustee.name = "Everyone"
         AND isinherited = 0
GROUPBY  msu.name
SORTBY   risk_path_count DESC;

The condition isinherited = 0 ensures that we get only the paths that have
permissions explicitly defined on Everyone, and not all paths that
simply inherit those permissions.


Chapter

Web API Specification


This chapter includes the following topics:

Web API specification for generic Collector service

Web API specification for generic Collector service


The web API for the Data Insight generic collector allows web clients to push
events for the generic device filers configured in the Data Insight deployment. It
also provides a method to add shares for the configured filers.
The web client communicates with the Data Insight Collector node using HTTPS
requests. The HTTPS communication is based on one-way SSL authentication.
The HTTP server runs with its unique self-signed SSL certificate. The SSL
certificate is created on the server when DataInsightGenericCollector service is
configured on it. The authentication is complete when the Data Insight Collector
node verifies the identity of the web client.
Data Insight Collector node uses the following mechanism to communicate with
the web client:
1. The Data Insight server identifies the client using a login API request.

2. On successful login, the Data Insight server returns an authentication token
   as the response. The same token is inserted into an HTTP cookie called
   MATRIX_AUTH, which is valid for 30 minutes. If the login attempt is
   unsuccessful, an HTTP response code 401 is returned.

3. You must include the authentication token in each subsequent request to the
   Data Insight server either in an HTTP request header called MATRIX_AUTH,
   or in a cookie with the same name, or as an HTTP request input parameter
   with the same name.

4. Each token has an inactivity timeout interval of 30 minutes. The token expires
   if the client does not send a request for 30 minutes. In case the Data Insight
   server restarts, the client must obtain the authentication token by using the
   login API. Data Insight uses the standard HTTP status code 401 to convey
   that login is required. Data Insight returns the HTTP status code 401
   (Unauthorized), if the client does not have the correct privileges.

5. The user principal against which login is performed can be any valid Data
   Insight user with the Server Administrator role.

All URLs referenced in the documentation have the following base:
https://<hostname>:<port>/api
where <port> is the port number for the DataInsightGenericCollector service. The
default value for port is 8585, and the port number is configurable through the Data
Insight Management Console.
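As an illustration of the base URL and of one of the three ways to present the token, the following Python sketch builds (but does not send) a POST request that carries the token in a MATRIX_AUTH request header. The host name, the token value, and the helper name build_request are placeholders made up for this example. Note that Python's urllib stores the header name as "Matrix_auth"; HTTP header names are case-insensitive, but whether the Collector accepts that spelling has not been verified here.

```python
import urllib.parse
import urllib.request

DEFAULT_PORT = 8585  # default port of the DataInsightGenericCollector service


def build_request(hostname, query, token, port=DEFAULT_PORT, body=None):
    """Build a POST request against https://<hostname>:<port>/api."""
    url = "https://{}:{}/api?{}".format(
        hostname, port, urllib.parse.urlencode(query))
    request = urllib.request.Request(url, data=body, method="POST")
    # The token could instead be sent as a cookie or as an input parameter.
    request.add_header("MATRIX_AUTH", token)
    return request


request = build_request(
    "collector.example.com",  # placeholder host name
    {"function": "COLLECTOR", "cmd": "push_events", "input_format": "json"},
    token="A2360DD2D9BB7284EF8BEB40E8DBA63F",
)
print(request.get_method(), request.full_url)
```

Because the server uses a self-signed certificate, a real client would also need to decide how to trust that certificate before sending the request; that step is outside this sketch.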
Use the following request calls to push events to the Data Insight Collector node
and to add the shares that you want Data Insight to monitor:
1. Login
POST /api?function=LOGIN

Request parameters
Name      Description                           Comment
username  Data Insight user name
domain    The domain to which the user belongs
password  The user's password
format    Format of the response output         Optional format=json

Request body
Do not supply a request body for this method.
Response
Login Success
If format=json is specified, then the authentication token is written on HTTP
response output in JSON format.
HTTP/1.1 200 OK
Content-Type: application/json
Status: 200 OK
{"auth_token":"A2360DD2D9BB7284EF8BEB40E8DBA63F"}

If no format is specified, the authentication token is written on HTTP response
output.
If login fails, HTTP status code 401 (Unauthorized) is returned.
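A client-side view of the login call can be sketched in Python as follows. Since the method takes no request body, this sketch assumes that the parameters travel in the query string; the guide does not say so explicitly. The helper names, host name, and credentials are placeholders, and no network call is made.

```python
import json
import urllib.parse


def login_url(hostname, username, domain, password, port=8585):
    """Build the LOGIN URL, requesting a JSON response."""
    query = urllib.parse.urlencode({
        "function": "LOGIN",
        "username": username,
        "domain": domain,
        "password": password,
        "format": "json",
    })
    return "https://{}:{}/api?{}".format(hostname, port, query)


def parse_login_response(status, body):
    """Return the token from a JSON login response, or raise on HTTP 401."""
    if status == 401:
        raise PermissionError("login failed (HTTP 401 Unauthorized)")
    return json.loads(body)["auth_token"]


print(login_url("collector.example.com", "admin", "CORP", "p@ss"))
token = parse_login_response(
    200, '{"auth_token":"A2360DD2D9BB7284EF8BEB40E8DBA63F"}')
print(token)
```

The returned token is what subsequent requests present as MATRIX_AUTH.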
2. Upload Events
POST /api?function=COLLECTOR&cmd=upload_events_sqlite&event_type=<type>

Request parameters
Name         Description                             Comment
MATRIX_AUTH  Authentication token
event_type   The type of events in the file that is  Optional (cifs|nfs)
             uploaded on the Collector.

Request body
The request can be an HTTP multi-part request or the request body can have
the contents of the file.
Response
If the file upload is successful, returns a response with the following structure:
HTTP/1.1 200 OK
Content-Type: application/json
Status: 200 OK
{"status_code":<code>,"status_msg":"<msg>"}

Status code 0 indicates success.
On failure, returns status code 500 (Internal Server Error) in case of an
unexpected error.
Details of the file to be uploaded
The events file must be a SQLite DB file that has a single table, named events.
Table schema

Column name  Type     Constraints  Description
filer        TEXT     NOT NULL     Filer's address as added to the Data Insight
                                   configuration.
opcode       INTEGER  NOT NULL     An integer describing the event operation (for
                                   example, READ=3, WRITE=4). Refer to the Protobuf
                                   format for a complete set of values.
username     TEXT                  Username of the user for CIFS (optional). UID of
                                   the user in case of an NFS event.
domainname   TEXT                  Domain of the user for CIFS (optional). Blank
                                   for NFS.
sid          TEXT                  SID of the user for CIFS. Blank in case of NFS.
pathname     TEXT     NOT NULL     Path where the event occurred. Refer to the note
                                   below for format.
renamepath   TEXT                  Applicable in case of rename event.
type         TEXT                  Type of path (FOLDER=1, FILE=2).
ipaddr       TEXT                  IP address from where the path was accessed
                                   (optional).
timestamp    INTEGER  NOT NULL     Timestamp of event in seconds as UNIX epoch.

CREATE TABLE events (filer TEXT NOT NULL, opcode INTEGER NOT NULL, username TEXT,
domainname TEXT, sid TEXT,
pathname TEXT NOT NULL, renamepath TEXT, type TEXT, ipaddr TEXT,
timestamp INTEGER NOT NULL);

Note: For CIFS events, the SID value is mandatory; user name and domain
name are optional.
For NFS events, SID should be blank, user name should be the UID, and domain
name should be blank.
For CIFS events, pathname should be the UNC path.
For NFS events, the pathname should be the absolute path of the file or the
folder.
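An events file matching the schema above can be produced with any SQLite library. The following Python sketch creates such a file with the standard sqlite3 module and inserts one CIFS READ event. The column list is taken from the table schema above; the filer name, path, file location, and the helper name write_events_db are invented for the example, and storing the path type as the string "2" is an assumption based on the TEXT column type.

```python
import os
import sqlite3
import tempfile

COLUMNS = ("filer", "opcode", "username", "domainname", "sid",
           "pathname", "renamepath", "type", "ipaddr", "timestamp")

EVENTS_DDL = """
CREATE TABLE events (
    filer TEXT NOT NULL, opcode INTEGER NOT NULL, username TEXT,
    domainname TEXT, sid TEXT, pathname TEXT NOT NULL, renamepath TEXT,
    type TEXT, ipaddr TEXT, timestamp INTEGER NOT NULL)
"""


def write_events_db(db_path, events):
    """Create db_path with a single events table and insert the event dicts."""
    insert_sql = "INSERT INTO events ({}) VALUES ({})".format(
        ", ".join(COLUMNS), ", ".join(":" + c for c in COLUMNS))
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(EVENTS_DDL)
            # Columns missing from an event dict are stored as NULL.
            conn.executemany(
                insert_sql,
                [{c: event.get(c) for c in COLUMNS} for event in events])
    finally:
        conn.close()


cifs_read = {
    "filer": "filer1.example.com",   # placeholder filer address
    "opcode": 3,                     # READ
    "sid": "S-1-5-21-617441397-4198358099-2716562547-104771",
    "pathname": r"\\filer1.example.com\share1\reports\q1.xlsx",  # UNC path
    "type": "2",                     # FILE
    "timestamp": 1340003837,         # seconds since the UNIX epoch
}
db_file = os.path.join(tempfile.mkdtemp(), "events.db")
write_events_db(db_file, [cifs_read])
print("wrote", db_file)
```

The resulting file is what would be sent as the body of the upload_events_sqlite request.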
3. Push events in JSON or Google Protocol Buffers format
POST /api?function=COLLECTOR&cmd=push_events&input_format=<format>

Request parameters
Name          Description                               Comment
MATRIX_AUTH   Authentication token
input_format  The format in which the events are        json|proto
              pushed to the Collector.

Request body
The request body must contain the events list in the specified format, Google
Protocol Buffers or JSON.
Response
Returns a response with the following structure:
HTTP/1.1 200 OK
Content-Type: application/json
Status: 200 OK
{"status_code":<code>,"status_msg":"<msg>"}

Status code 0 indicates success.
On failure, returns status code 400 (BAD_REQUEST) for an incorrect input format
parameter.
Returns the status code 500 (Internal Server Error) in case of an unexpected
error.
Google Protocol Buffers format for pushing events to the Collector
message AuditEventsListMessage {
  optional int64 device_id = 1;
  optional string device_name = 2;
  repeated CifsEventMessage cifs_events = 3;
  repeated NfsEventMessage nfs_events = 4;
}
message CifsEventMessage {
  required AccessType opcode = 1;
  required string unc_path = 2;
  optional string rename_path = 3;
  required PathType path_type = 4;
  optional string sid = 5;
  optional string username = 6;
  optional string domain = 7;
  required uint64 timestamp_msec = 8;
  optional string ip_address = 9;
}
message NfsEventMessage {
  required AccessType opcode = 1;
  required string path = 2;
  optional string rename_path = 3;
  required PathType path_type = 4;
  required int64 uid = 5;
  optional int64 gid = 6;
  optional string domain = 7;
  required uint64 timestamp_msec = 8;
  optional string ip_address = 9;
}
enum PathType {
  UNKNOWN_PATHTYPE = -1;
  FOLDER = 1;
  FILE = 2;
}
enum AccessType {
  CREATE = 1;
  DELETE = 2;
  READ = 3;
  WRITE = 4;
  RENAME = 5;
  MKDIR = 8;
  RMDIR = 9;
  RENAMEDIR = 10;
  SECURITY = 18;
  SYMLINK = 19;
  LINK = 20;
  READLINK = 21;
  OPEN = 200000;
}

Note: device_name is the name of the filer as added in Data Insight
configuration.
CIFS and NFS events for a filer can be pushed by a single
AuditEventsListMessage.
For CIFS event, SID is mandatory; user name, and domain name are optional.
For NFS event, SID should be blank, UID is mandatory, and domain should
be blank.
LINK, SYMLINK, and READLINK are specific to NFS events only.
The AccessType parameter for events like permission change or ACL change
is SECURITY.
JSON format for pushing events to the Collector
{
  "deviceId": <Number>,
  "deviceName": <String>,
  "cifsEvents": [
    <CIFS Event>
  ],
  "nfsEvents": [
    <NFS Event>
  ]
}
<CIFS Event>
{
  "opcode": <String>,
  "uncPath": <String>,
  "renamePath": <String>,
  "pathType": <String>,
  "sid": <String>,
  "username": <String>,
  "domain": <String>,
  "timestampMsec": <Number>,
  "ipAddress": <String>
}
<NFS Event>
{
  "opcode": <String>,
  "path": <String>,
  "renamePath": <String>,
  "pathType": <String>,
  "uid": <Number>,
  "gid": <Number>,
  "domain": <String>,
  "timestampMsec": <Number>,
  "ipAddress": <String>
}

Note: opcode and pathType fields can take only a specific set of values. Refer
to the protobuf enums for a description of values for each field; enum
AccessType for the field opcode and enum PathType for the field pathType.
Example

{
  "deviceId": 0,
  "deviceName": "10.209.89.3",
  "cifsEvents": [
    {
      "opcode": "RENAME",
      "uncPath": "\\\\NAMEMCCIFS1\\TestShare\\DI3.0RU1\\Data1",
      "renamePath": "\\\\NAMEMCCIFS1\\TestShare\\DI3.0RU1\\Data2 ",
      "pathType": "FOLDER",
      "sid": "S-1-5-21-617441397-4198358099-2716562547-104771",
      "timestampMsec": 1340003837,
      "ipAddress": "172.31.163.29"
    },
    {
      "opcode": "CREATE",
      "uncPath": "\\\\NAMEMCCIFS1\\TestShare\\DI3.0RU1\\New Folder",
      "pathType": "FOLDER",
      "sid": "S-1-5-21-617441397-4198358099-2716562547-104771",
      "timestampMsec": 1340003847,
      "ipAddress": "172.31.163.29"
    }
  ],
  "nfsEvents": [
    {
      "opcode": "MKDIR",
      "path": "\/openldaphome\/DIRU1",
      "pathType": "FOLDER",
      "uid": 0,
      "gid": 0,
      "domain": "0",
      "timestampMsec": 1339680545
    }
  ]
}
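A request body in the JSON format above can be assembled programmatically. The following Python sketch builds one with the standard json module and rejects opcode and pathType values that are not names in the AccessType and PathType enums. The helper names (cifs_event, nfs_event, push_events_body) are hypothetical, and the client-side validation is a convenience of this sketch rather than documented server behavior.

```python
import json

# Allowed names, copied from the AccessType and PathType enums.
ACCESS_TYPES = {"CREATE", "DELETE", "READ", "WRITE", "RENAME", "MKDIR",
                "RMDIR", "RENAMEDIR", "SECURITY", "SYMLINK", "LINK",
                "READLINK", "OPEN"}
PATH_TYPES = {"UNKNOWN_PATHTYPE", "FOLDER", "FILE"}


def _checked(event):
    """Raise ValueError if opcode or pathType is not an allowed enum name."""
    if event["opcode"] not in ACCESS_TYPES:
        raise ValueError("unknown opcode: {!r}".format(event["opcode"]))
    if event["pathType"] not in PATH_TYPES:
        raise ValueError("unknown pathType: {!r}".format(event["pathType"]))
    return event


def cifs_event(opcode, unc_path, path_type, sid, timestamp_msec, **optional):
    """Build a CIFS event; optional keys include renamePath, ipAddress."""
    return _checked({"opcode": opcode, "uncPath": unc_path,
                     "pathType": path_type, "sid": sid,
                     "timestampMsec": timestamp_msec, **optional})


def nfs_event(opcode, path, path_type, uid, timestamp_msec, **optional):
    """Build an NFS event; optional keys include gid, renamePath, ipAddress."""
    return _checked({"opcode": opcode, "path": path, "pathType": path_type,
                     "uid": uid, "timestampMsec": timestamp_msec, **optional})


def push_events_body(device_name, cifs_events=(), nfs_events=(), device_id=0):
    """Serialize the events list for cmd=push_events with input_format=json."""
    return json.dumps({
        "deviceId": device_id,
        "deviceName": device_name,
        "cifsEvents": list(cifs_events),
        "nfsEvents": list(nfs_events),
    })


body = push_events_body(
    "10.209.89.3",
    cifs_events=[cifs_event(
        "CREATE", r"\\NAMEMCCIFS1\TestShare\DI3.0RU1\New Folder", "FOLDER",
        "S-1-5-21-617441397-4198358099-2716562547-104771", 1340003847,
        ipAddress="172.31.163.29")],
    nfs_events=[nfs_event("MKDIR", "/openldaphome/DIRU1", "FOLDER", 0,
                          1339680545, gid=0)],
)
print(body)
```

json.dumps takes care of escaping the backslashes in UNC paths, which is easy to get wrong when the body is written by hand.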

4. Add shares
POST /api?function=COLLECTOR&cmd=add_shares&format=<format>

Request parameters
Name         Description                    Comment
MATRIX_AUTH  Authentication token
format       Format of the response output  proto|json

Request body
Supply JSON or Google Protocol Buffers formatted list of shares as input.
Response
On success, HTTP status code 200 is returned.
On failure to add shares, HTTP status code 500 (Internal Server Error) is
returned.
ProtoBuf format for adding shares
message SharesListMessage
{
optional int64 device_id = 1;
optional string device_name = 2;
repeated ShareMessage shares = 3;
}
message ShareMessage
{
enum ShareType
{
CIFS = 0;
NFS = 1;
}
optional string shareName = 1;
optional string sharePath = 2;
optional ShareType shareType = 3 [default = CIFS];
}

JSON format for adding shares
Shares list
{
  "deviceId": {number},
  "deviceName": {string},
  "shares": [
  ]
}

Each element of the shares array has the following format:
{
  "shareName": {string},
  "sharePath": {string},
  "shareType": {string}
}

Note: The shareType parameter accepts only a specific set of values. For the
possible set of values, refer to enum ShareType in the Protobuf definition.
Example
{
"deviceId": 0,
"deviceName": "10.209.111.193",
"shares": [
{
"shareName": "/openldaphome",
"sharePath": "/openldaphome",
"shareType": "NFS"
},
{
"shareName": "/nfstest",
"sharePath": "/nfstest",
"shareType": "NFS"
}
]
}

Note: Data Insight scans the shares that are added only when the user enables
scanning and provides the Scanner credentials for the filer.
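The shares list can be generated in the same way as an events list. The short Python sketch below builds the JSON body for add_shares and checks shareType against the two names in enum ShareType. The helper name add_shares_body is hypothetical.

```python
import json

SHARE_TYPES = ("CIFS", "NFS")  # names from enum ShareType


def add_shares_body(device_name, shares, device_id=0):
    """Serialize (share_name, share_path, share_type) tuples as a shares list."""
    entries = []
    for share_name, share_path, share_type in shares:
        if share_type not in SHARE_TYPES:
            raise ValueError("unknown shareType: {!r}".format(share_type))
        entries.append({"shareName": share_name,
                        "sharePath": share_path,
                        "shareType": share_type})
    return json.dumps({"deviceId": device_id,
                       "deviceName": device_name,
                       "shares": entries}, indent=2)


shares_body = add_shares_body(
    "10.209.111.193",
    [("/openldaphome", "/openldaphome", "NFS"),
     ("/nfstest", "/nfstest", "NFS")],
)
print(shares_body)
```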


Chapter

Creating custom scripts for remediation actions
This chapter includes the following topics:

About custom scripts

About custom scripts


You can use custom scripts to extend Data Insight functionality. Custom scripts
can perform the following actions:

Create a remediation ticket.

Apply remediation actions based on Data Insight recommendations.

Define actions to manage data.

Data is supplied to the scripts through command line arguments. The arguments
vary based on what the script is used for. The scripts can be created in the
.exe, .bat, .pl, or .vbs formats.
Data Insight handles custom scripts differently depending on the type of operation.
The following list shows how Data Insight handles the various types of scripts:

Custom scripts to create a remediation request.


Data Insight invokes the script by passing in two arguments:
custom_script.pl file_name <path_to_file_with_recommendation>

For example,
ticketing.pl file_name C:\DataInsight\data\workflow\tmp\PR_ticketing_1.txt

The second argument is the full path to a text file that contains the permission
recommendations. Each line in the text file contains one action and the
variables required to perform that action. Lines are separated by a newline
character. The script should read each line of the input file and open one or
more remediation tickets as needed. If the script exits with a non-zero exit
code, the action is considered to have failed. Each line in the file has the
following format:
OP:<OPCODE> PARAM:VALUE; PARAM:VALUE; ...

For example,
OP:REMOVE_ACE USER:foouser@domain.com; PATH:\\fileserver1\share1\path;

Refer to the next section for possible values for opcodes and their parameters.
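
The line format described above can be parsed with a few string operations. The following Python sketch is one possible approach, not the product's implementation. The names parse_recommendation_line and main are hypothetical, the file is assumed to be readable as UTF-8, and values are assumed not to contain semicolons. Because Data Insight lists the .exe, .bat, .pl, and .vbs formats, a Python script would need to be launched through a wrapper in one of those formats.

```python
def parse_recommendation_line(line):
    """Split 'OP:<OPCODE> PARAM:VALUE; PARAM:VALUE; ...' into (opcode, params)."""
    op_token, _, rest = line.strip().partition(" ")
    prefix, _, opcode = op_token.partition(":")
    if prefix != "OP" or not opcode:
        raise ValueError(f"line does not start with OP:<OPCODE>: {line!r}")
    params = {}
    for chunk in rest.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after a trailing ';'
        # Split on the first ':' only, so a value may itself contain ':'.
        name, sep, value = chunk.partition(":")
        if not sep:
            raise ValueError(f"malformed parameter: {chunk!r}")
        params[name] = value
    return opcode, params


def main(argv):
    """Return an exit code; argv is [script, 'file_name', <path_to_file>]."""
    if len(argv) != 3 or argv[1] != "file_name":
        return 2  # a non-zero exit code marks the action as failed
    try:
        # Assumption: the recommendation file is UTF-8 text.
        with open(argv[2], encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    opcode, params = parse_recommendation_line(line)
                    # Hypothetical placeholder: open a remediation ticket here.
                    print(f"ticket requested: {opcode} {params}")
    except (OSError, ValueError):
        return 1
    return 0

# When run as a script: import sys; sys.exit(main(sys.argv))
```

Returning the exit code from main keeps the failure signaling in one place: any parsing or I/O problem results in a non-zero code, which Data Insight treats as a failed action.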

Custom scripts to apply permission recommendations.


You can specify custom scripts to commit changes directly to Active Directory
and CIFS file systems. You need to specify one script to make changes to Active
Directory and one script to make changes to CIFS permissions. The
recommendation is passed to the custom script as command line arguments
in the following format:
script.pl OP:<OPCODE> PARAM:VALUE PARAM:VALUE...

The exact PARAM and VALUE pairs depend on the opcode being passed.
If the script exits with a non-zero exit code, Data Insight considers the
operation to have failed. For this release, Data Insight recommendations
consist only of removing user or group ACEs for paths, and removing users or
members from AD groups. More operations will be supported in future releases.
For example,
AD.pl OP:DEL_GROUP_MEMBER AD_USER:user@domain TARGET_GROUP:group@domain;

Data Insight supplies the following opcode and arguments for Active
Directory remediation:
OP:DEL_GROUP_MEMBER AD_GROUP:<group@domain>|AD_USER:<user@domain> TARGET_GROUP:<target_group@domain>

Data Insight supplies the following opcode and arguments for CIFS
remediation:
OP:REMOVE_ACE GROUP:<group@domain>|USER:<user@domain> PATH:<unc_path>
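
The argument formats listed above can be validated before any change is committed. The following Python sketch is a hedged illustration: parse_remediation_args and main are hypothetical names, the reading of "|" as "exactly one of" is an assumption, and trailing semicolons are stripped only because the example invocation shows one.

```python
# Parameters per opcode, as listed above. Assumption: exactly one name from
# each inner tuple must be present ("|" is read as "one or the other").
REQUIRED = {
    "DEL_GROUP_MEMBER": [("AD_GROUP", "AD_USER"), ("TARGET_GROUP",)],
    "REMOVE_ACE": [("GROUP", "USER"), ("PATH",)],
}


def parse_remediation_args(args):
    """Parse ['OP:<OPCODE>', 'PARAM:VALUE', ...] into (opcode, params)."""
    if not args or not args[0].startswith("OP:"):
        raise ValueError("first argument must be OP:<OPCODE>")
    opcode = args[0][len("OP:"):].rstrip(";")
    if opcode not in REQUIRED:
        raise ValueError(f"unsupported opcode: {opcode!r}")
    params = {}
    for arg in args[1:]:
        name, sep, value = arg.partition(":")
        if not sep:
            raise ValueError(f"malformed argument: {arg!r}")
        # The example invocation ends with ';', so drop trailing semicolons.
        params[name] = value.rstrip(";")
    for alternatives in REQUIRED[opcode]:
        present = [name for name in alternatives if name in params]
        if len(present) != 1:
            raise ValueError(f"expected exactly one of {alternatives}")
    return opcode, params


def main(argv):
    """Return 0 on success; any non-zero value marks the operation as failed."""
    try:
        opcode, params = parse_remediation_args(argv[1:])
    except ValueError:
        return 1
    # Hypothetical placeholder: apply the AD or CIFS change for `opcode` here.
    return 0

# When run as a script: import sys; sys.exit(main(sys.argv))
```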

Custom scripts to define specific tasks to manage data.


Data Insight invokes the script directly, passing the operation and variables
as part of the command line arguments. The path argument is mandatory and is
always passed to the script. The other parameters passed to the script depend
on how the Custom Action has been configured in the Management Console. The
format of the command line arguments passed to the custom script is:
script.pl path:<path> prop:val prop:val ...
For example,
archive_files.pl path:\\filer\share\path.txt size:25KB

Data Insight supports the following properties that can be passed to the custom
scripts:

Properties          Format
size                NNN KB|MB|GB. E.g. 34 KB
size_on_disk        NNN KB|MB|GB. E.g. 34 KB
created_by          user@domain. SID if user name cannot be resolved
created_on          milliseconds since Jan 1st 1970
last_modified_by    user@domain. SID if user name cannot be resolved
last_modified_on    milliseconds since Jan 1st 1970
last_accessed_by    user@domain. SID if user name cannot be resolved
last_accessed_on    milliseconds since Jan 1st 1970
data_owner          user@domain. SID if user name cannot be resolved
custodian           user@domain. SID if user name cannot be resolved. Multiple custodians are comma-separated
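
A custom action script typically needs to turn these property strings into usable values. The following Python sketch shows one way to do that, under stated assumptions: each property arrives as a single prop:val argument, KB, MB, and GB are binary multiples (1 KB = 1024 bytes), and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: units are binary multiples; the guide does not say which.
_UNIT_BYTES = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
_SIZE_RE = re.compile(r"^(\d+(?:\.\d+)?)\s*(KB|MB|GB)$")
_SIZE_PROPS = {"size", "size_on_disk"}
_TIME_PROPS = {"created_on", "last_modified_on", "last_accessed_on"}
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_size(text):
    """Convert 'NNN KB|MB|GB' (with or without a space) to a byte count."""
    match = _SIZE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(float(match.group(1)) * _UNIT_BYTES[match.group(2)])


def parse_epoch_millis(text):
    """Convert milliseconds since Jan 1st 1970 to an aware UTC datetime."""
    return _EPOCH + timedelta(milliseconds=int(text))


def parse_action_args(args):
    """Parse ['path:<path>', 'prop:val', ...] into a dict of typed values."""
    props = {}
    for arg in args:
        # Split on the first ':' only; UNC paths and SIDs stay intact.
        name, sep, value = arg.partition(":")
        if not sep:
            raise ValueError(f"malformed argument: {arg!r}")
        if name in _SIZE_PROPS:
            value = parse_size(value)
        elif name in _TIME_PROPS:
            value = parse_epoch_millis(value)
        elif name == "custodian":
            value = value.split(",")  # multiple custodians are comma-separated
        props[name] = value
    if "path" not in props:
        raise ValueError("the mandatory path argument is missing")
    return props
```

For the example invocation above, parse_action_args(sys.argv[1:]) would yield the path as a string and the size as a number of bytes.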

For detailed information about how to use custom scripts for data and permission
remediation, see the Symantec Data Insight Administrator's Guide.


Chapter

Data Inventory Report schema
This chapter includes the following topics:

Data Inventory report schema

Data Inventory report schema


The Data Inventory Report is used to extract information about paths from the
Data Insight index. The output of this report is a SQLite database, which can be
used for post-processing as needed. When you configure a report of this type,
you can choose to have the output database copied to an external location where
you plan to post-process the output.

file_inventory table
In this table, there is one row for each matching file that is found in the specified
index dbs.

CREATE TABLE file_inventory (
    xid            INTEGER,
    sid            TEXT,
    user_id        INTEGER,
    owner_account  TEXT,
    displayname    TEXT,
    owner_method   TEXT,
    bu_name        TEXT,
    bu_owner       TEXT,
    filer          TEXT,
    share          TEXT,
    dfs_server     TEXT,
    dfs_share      TEXT,
    dfs_path       TEXT,
    fid            INTEGER,
    path           TEXT,
    msu_type       INTEGER,
    interval       INTEGER,
    sensitive      INTEGER,
    msu_id         INTEGER,
    read_count     INTEGER,
    write_count    INTEGER,
    file_size      INTEGER,
    atime          INTEGER,
    ctime          INTEGER,
    mtime          INTEGER,
    fs_sid         TEXT);

The xid column can be ignored, and should always be 1.

The sid is typically the Windows SID of the calculated owner of the file.

The owner_method column indicates the owner method that Data Insight used
to calculate the owner.

The user_id is the foreign key into the fileuser table of the current version of
the users.db stored in the DataInsight\Data\users folder. This is used for debug
purposes only.

The owner_account, displayname, bu_name and bu_owner columns are other
columns from the fileuser table.

The filer, share, path, dfs_server, dfs_share and dfs_path columns combine to
give the path to the file. The fid column is the foreign key into the fentry table
of the latest version of the index.db for this share. fid is used for debug purposes
only.

The msu_type is an integer value describing the type of share. There are four
possible values:

1 CIFS

2 SharePoint

3 NFS

8 DFS


The interval column is the foreign key into the intervals table below, based on
the last access time of the file.

The msu_id is the foreign key into the msu table of the latest version of the
config.db stored in the DataInsight\Data\conf folder.

read_count and write_count are the aggregate numbers of audit events of each
type over the total time period specified for this run of the report.

file_size is the logical file size from the file system. atime, ctime, and mtime
are metadata for the file, also pulled from the file system.

The fs_sid is the SID of the file system owner value from the file system
metadata.
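
As an illustration of post-processing, the file_inventory table can be queried with any SQLite client. The following Python sketch uses the standard sqlite3 module to count files and sum file_size per share type, using the msu_type values listed above. The function name summarize_by_share_type is hypothetical.

```python
import sqlite3

# msu_type codes as documented for the file_inventory table.
MSU_TYPE_NAMES = {1: "CIFS", 2: "SharePoint", 3: "NFS", 8: "DFS"}


def summarize_by_share_type(conn):
    """Return {share type name: (file count, total bytes)} from file_inventory."""
    rows = conn.execute(
        "SELECT msu_type, COUNT(*), COALESCE(SUM(file_size), 0) "
        "FROM file_inventory GROUP BY msu_type"
    )
    return {
        MSU_TYPE_NAMES.get(msu_type, f"unknown({msu_type})"): (count, total)
        for msu_type, count, total in rows
    }
```

A typical call would be summarize_by_share_type(sqlite3.connect("report_output.db")), where report_output.db stands for whatever output file name was given to the report.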

lob table
This table consists of a list of distinct Lines of Business (LOBs). Other tables
reference this table through a foreign key.

CREATE TABLE lob (
    lob_id    INTEGER PRIMARY KEY,
    lob_name  TEXT);

user_lob table
This table gives the mapping from users to the associated LOBs.

CREATE TABLE user_lob (
    user_id  INTEGER PRIMARY KEY,
    lob_id   INTEGER);
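
The lob and user_lob tables are meant to be joined on lob_id. The following Python sketch, with the hypothetical helper name lob_names_by_user, resolves each user_id to its LOB name using the standard sqlite3 module.

```python
import sqlite3


def lob_names_by_user(conn):
    """Return {user_id: lob_name} by joining user_lob to lob on lob_id."""
    rows = conn.execute(
        "SELECT ul.user_id, l.lob_name "
        "FROM user_lob AS ul JOIN lob AS l ON l.lob_id = ul.lob_id"
    )
    # user_id is the primary key of user_lob, so each user maps to one LOB.
    return dict(rows)
```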

user_totals table
This table gives the total number of files, sensitive files, and so on for each
user. In the final output, the msu_id column is empty. The user_id is the foreign
key into the fileuser table of the current version of the users.db stored in the
DataInsight\Data\users folder.

CREATE TABLE user_totals (
    user_id          INTEGER,
    msu_id           INTEGER,
    total_files      INTEGER,
    total_bytes      INTEGER,
    sensitive_files  INTEGER,
    sensitive_bytes  INTEGER);

user_interval_totals table
This table breaks out the information from the user_totals table over each interval
specified from the input database. The interval_id is a foreign key to the intervals
table.

CREATE TABLE user_interval_totals (
    user_id          INTEGER,
    msu_id           INTEGER,
    interval_id      INTEGER,
    total_files      INTEGER,
    total_bytes      INTEGER,
    sensitive_files  INTEGER,
    sensitive_bytes  INTEGER);

lob_totals table
Based on the mapping specified in the user_lob table, this table gives the total
numbers for each LOB. In the final output, the msu_id column is empty.

CREATE TABLE lob_totals (
    lob_id           INTEGER PRIMARY KEY,
    msu_id           INTEGER,
    total_files      INTEGER,
    total_bytes      INTEGER,
    sensitive_files  INTEGER,
    sensitive_bytes  INTEGER);

lob_interval_totals table
This table breaks out the information from the lob_totals table over each interval
specified from the input database. The interval_id is a foreign key into the intervals
table.

CREATE TABLE lob_interval_totals (
    lob_id           INTEGER,
    msu_id           INTEGER,
    interval_id      INTEGER,
    total_files      INTEGER,
    total_bytes      INTEGER,
    sensitive_files  INTEGER,
    sensitive_bytes  INTEGER);

intervals table
This table gives the beginning and end of each interval as specified in the input
database. The beginning and end times are specified as epoch numbers. For
example, the time 0 is midnight on January 1, 1970 (UTC), and each increment of
1 is one second later.

CREATE TABLE IF NOT EXISTS intervals (
    interval  INTEGER,  -- 4 => most recent
                        -- 0 => before interval
    start     INTEGER,  -- start month of interval
    end       INTEGER); -- end month of interval
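
Because start and end are epoch numbers in seconds, they usually need converting before they are shown to a person. The following Python sketch reads the intervals table and converts both columns to UTC datetimes. The function name load_intervals is hypothetical, and the column names are double-quoted in the query because end is an SQL keyword.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def load_intervals(conn):
    """Return [(interval, start, end)] with start and end as UTC datetimes."""
    rows = conn.execute(
        'SELECT "interval", "start", "end" FROM intervals ORDER BY "interval"'
    )
    return [
        (
            interval,
            _EPOCH + timedelta(seconds=start),
            _EPOCH + timedelta(seconds=end),
        )
        for interval, start, end in rows
    ]
```

The same interval numbers appear in the interval column of file_inventory and in the interval_id columns of the totals tables, so the returned list can serve as a lookup when labeling those rows.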

msu_info table
This table copies the data from the Dashboard database to specify whether the
msu is open. The msu_id column is a foreign key into the msu table of the latest
version of the config.db stored in the DataInsight\Data\conf folder.

CREATE TABLE msu_info (
    msu_id   INTEGER PRIMARY KEY,
    is_open  INTEGER);

dashboard_info table
This table is similar to the msu_info table in that it copies information from the
latest version of the Dashboard database into the report output database. There
may be a slight mismatch between the numbers here and the totals from the
user_totals table. This difference arises because the two sets of numbers are
calculated at different times.

CREATE TABLE dashboard_info (
    msu_id           INTEGER PRIMARY KEY,
    dir_files        INTEGER,
    dir_sens_files   INTEGER,
    dash_files       INTEGER,
    dash_sens_files  INTEGER);

Report configuration parameters


One important setting for the Data Inventory report is the separate_dbs
configuration setting. The separate_dbs setting forces the report to close the
current db file, rename it, and start a new db file after the specified number of
rows has been inserted into the detail table of the report output database. If
the output file name specified to the report process is report_output.db, the
separate_dbs parameter creates files named report_output.db.0,
report_output.db.1, and so on, every time the limit specified in the setting is
reached. The db file currently being written to is always report_output.db, and
all of the summary data is written to this file. When merge_rpt runs, it no
longer copies rows from the file_inventory table into the final output db. It
copies rows only from the user_totals and similar tables, and then creates the
lob_totals and similar tables. As with the log_level setting,
report.separate_dbs is checked first; if it is not found, the separate_dbs
setting is checked.
You need to set this property for each Indexer node, including the Management
Server node. For example, if the ID of your Indexer node is 3, replace <nodeid>
with 3 and issue the following commands on your Management Server. The commands
set report.separate_dbs to true and report.chunk_size to 1000000:
configdb -o -T node -k <nodeid> -J report.separate_dbs -j true
configdb -o -T node -k <nodeid> -J report.chunk_size -j 1000000
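
When the output is split this way, post-processing code has to visit every db file. The following Python sketch collects the numbered files and the main db in order, based only on the naming scheme described above (report_output.db.0, report_output.db.1, and so on). The function name list_report_dbs is hypothetical, and the sketch assumes that all files sit in the same folder.

```python
import re
from pathlib import Path


def list_report_dbs(output_path):
    """Return the numbered chunk files in order, then the main db file."""
    output = Path(output_path)
    # Match '<name>.<digits>' exactly, for example report_output.db.12.
    pattern = re.compile(re.escape(output.name) + r"\.(\d+)$")
    chunks = []
    for candidate in output.parent.iterdir():
        match = pattern.match(candidate.name)
        if match and candidate.is_file():
            chunks.append((int(match.group(1)), candidate))
    # Sort by numeric suffix so that .10 comes after .2.
    ordered = [path for _, path in sorted(chunks)]
    if output.is_file():
        ordered.append(output)  # the file currently being written to
    return ordered
```

Each returned path can then be opened with sqlite3.connect to read its file_inventory rows, while the summary tables are read from the last entry only.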
