
SSAS:

There are three data modelling modes:

1) Tabular model

2) Multidimensional model

3) Power Pivot for SharePoint

A cube is a combination of dimensions and measures (facts).

Dimension tables have surrogate keys, whereas fact tables hold the measures.

To create an SSAS cube, do the following:

Open SSDT, create an Analysis Services project, and provide the path where you
want to create the project.
Give the project and the solution proper names.

Then select the Analysis Services Multidimensional and Data Mining project type.

Once the project opens, you can see the following items:

Data sources - these include the impersonation information, which tells the SSAS
engine how to connect to the databases through your data sources.

Data source views

Cubes

Dimensions

Mining structures

Assemblies

Miscellaneous

SQL queries:

What is Collation?
-----------------------------------------------------------------------------------

Collations in SQL Server provide sorting rules, case, and accent sensitivity
properties for your data. Collations that are used with character data types such
as char and varchar dictate the code page and corresponding characters that can be
represented for that data type. Whether you are installing a new instance of SQL
Server, restoring a database backup, or connecting server to client databases, it
is important that you understand the locale requirements, sorting order, and case
and accent sensitivity of the data that you are working with.

What is Database?
-----------------------------------------------------------------------------------

A database in SQL Server is made up of a collection of tables that stores a
specific set of structured data. A table contains a collection of rows, also
referred to as records or tuples, and columns, also referred to as attributes. Each
column in the table is designed to store a certain type of information, for
example, dates, names, dollar amounts, and numbers.

How many types of indexes are there? Explain them.
-----------------------------------------------------------------------------------

1. Hash:
With a hash index, data is accessed through an in-memory hash table. Hash
indexes consume a fixed amount of memory, which is a function of the bucket count.
2. Memory-optimized nonclustered:
For memory-optimized nonclustered indexes, memory consumption is a function
of the row count and the size of the index key columns.
3. Clustered:
A clustered index sorts and stores the data rows of the table or view in
order based on the clustered index key. The clustered index is implemented as a
B-tree index structure that supports fast retrieval of the rows, based on their
clustered index key values.
4. Nonclustered:
A nonclustered index can be defined on a table or view with a clustered index
or on a heap. Each index row in the nonclustered index contains the nonclustered
key value and a row locator. This locator points to the data row in the
clustered index or heap having the key value. The rows in the index are stored in
the order of the index key values, but the data rows are not guaranteed to be in
any particular order unless a clustered index is created on the table.
5. Unique:
A unique index ensures that the index key contains no duplicate values and
therefore every row in the table or view is in some way unique.

Uniqueness can be a property of both clustered and nonclustered indexes.


6. Columnstore:
An in-memory columnstore index stores and manages data by using column-based
data storage and column-based query processing.

Columnstore indexes work well for data warehousing workloads that primarily
perform bulk loads and read-only queries. Use the columnstore index to achieve up
to 10x query performance gains over traditional row-oriented storage, and up
to 7x data compression over the uncompressed data size.

Why should I use a columnstore index?

A columnstore index can provide a very high level of data compression,
typically by 10 times, to significantly reduce your data warehouse storage cost.
For analytics, a columnstore index offers an order of magnitude better
performance than a B-tree index. Columnstore indexes are the preferred data
storage format for data warehousing and analytics workloads. Starting with
SQL Server 2016 (13.x), you can use columnstore indexes for real-time
analytics on your operational workload.

Reasons why columnstore indexes are so fast:

Columns store values from the same domain and commonly have
similar values, which results in high compression rates. I/O bottlenecks in your
system are minimized or eliminated, and memory footprint is reduced
significantly.

High compression rates improve query performance by using a
smaller in-memory footprint. In turn, query performance can improve because SQL
Server can perform more query and data operations in memory.

Batch execution improves query performance, typically by two to
four times, by processing multiple rows together.

Queries often select only a few columns from a table, which
reduces total I/O from the physical media.

When should I use a columnstore index?

Recommended use cases:

Use a clustered columnstore index to store fact tables and large
dimension tables for data warehousing workloads. This method improves query
performance and data compression by up to 10 times. For more information, see
Columnstore indexes for data warehousing.

Use a nonclustered columnstore index to perform analysis in real
time on an OLTP workload. For more information, see Get started with
columnstore for real-time operational analytics.

How do I choose between a rowstore index and a columnstore index?

Rowstore indexes perform best on queries that seek into the data, when
searching for a particular value, or for queries on a small range of values. Use
rowstore indexes with transactional workloads because they tend to require
mostly table seeks instead of table scans.

Columnstore indexes give high performance gains for analytic queries
that scan large amounts of data, especially on large tables. Use columnstore
indexes on data warehousing and analytics workloads, especially on fact
tables, because they tend to require full table scans rather than table seeks.
7. Index with included columns:
A nonclustered index that is extended to include nonkey columns in addition
to the key columns.
8. Index on computed columns:
An index on a column that is derived from the value of one or more other
columns, or certain deterministic inputs.
9. Filtered:
An optimized nonclustered index, especially suited to cover queries that
select from a well-defined subset of data. It uses a filter predicate to index a
portion of rows in the table. A well-designed filtered index can improve
query performance, reduce index maintenance costs, and reduce index storage costs
compared with full-table indexes.

10. Spatial:
A spatial index provides the ability to perform certain operations more
efficiently on spatial objects (spatial data) in a column of the geometry data
type. The spatial index reduces the number of objects on which relatively
costly spatial operations need to be applied.
11. XML:
A shredded, and persisted, representation of the XML binary large objects
(BLOBs) in the xml data type column.
12. Full-text:
A special type of token-based functional index that is built and maintained
by the Microsoft Full-Text Engine for SQL Server. It provides efficient support for
sophisticated word searches in character string data.
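
As a rough illustration of the columnstore entry (item 6) above, the sketch below shows why storing values column by column tends to compress well: values from one column repeat, so consecutive runs can be collapsed. The sample rows and the `run_length_encode` helper are hypothetical, and run-length encoding is only a stand-in here; this is not a description of how SQL Server actually compresses columnstore data.

```python
from itertools import groupby

# Hypothetical fact rows: (order_id, region, status)
rows = [
    (1, "EU", "shipped"),
    (2, "EU", "shipped"),
    (3, "EU", "open"),
    (4, "US", "open"),
    (5, "US", "open"),
    (6, "US", "open"),
]


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Column-oriented layout: one list per column instead of one tuple per row.
columns = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "status": [r[2] for r in rows],
}

# A query that needs only "region" reads one list, not every field of every row.
encoded_region = run_length_encode(columns["region"])
encoded_status = run_length_encode(columns["status"])

print(encoded_region)
print(encoded_status)
```

Six region values shrink to two pairs, while the unique `order_id` column would not shrink at all, which is one reason the benefit depends on the data.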

What is Index Fragmentation?
-----------------------------------------------------------------------------------

The SQL Server Database Engine automatically modifies indexes whenever insert,
update, or delete operations are made to the underlying data. Over time these
modifications can cause the information in the index to become scattered in the
database (fragmented). Fragmentation exists when indexes have pages in which the
logical ordering, based on the key value, does not match the physical ordering
inside the data file. Heavily fragmented indexes can degrade query performance and
cause your application to respond slowly, especially scan operations. You can
remedy index fragmentation by reorganizing or rebuilding an index.
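
The mismatch between logical and physical order can be sketched with a small model. The function below is a simplified, hypothetical measure (the page numbers are made up): it walks the leaf pages in key order and counts how often the next page is not the physically adjacent one. SQL Server reports its own figure through `sys.dm_db_index_physical_stats`; this sketch only mimics the idea.

```python
def fragmentation_percent(physical_page_ids):
    """Percent of leaf pages that are not followed by the physically next page.

    physical_page_ids lists the pages in logical (key) order; each entry is
    the page's physical position in the data file.
    """
    if len(physical_page_ids) < 2:
        return 0.0
    out_of_order = sum(
        1
        for current, following in zip(physical_page_ids, physical_page_ids[1:])
        if following != current + 1
    )
    return 100.0 * out_of_order / len(physical_page_ids)


# Hypothetical layouts: a freshly built index, then the same index after
# page splits moved some pages to the end of the file.
freshly_built = [100, 101, 102, 103, 104, 105, 106, 107]
after_page_splits = [100, 250, 101, 102, 251, 103, 104, 105]

print(fragmentation_percent(freshly_built))
print(fragmentation_percent(after_page_splits))
```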

What is Index rebuilding and Index reorganizing?
-----------------------------------------------------------------------------------

Rebuilding an index drops and re-creates the index. This removes fragmentation,
reclaims disk space by compacting the pages based on the specified or existing fill
factor setting, and reorders the index rows in contiguous pages. When ALL is
specified in the ALTER INDEX ... REBUILD statement, all indexes on the table are
dropped and rebuilt in a single transaction.

Reorganizing an index uses minimal system resources. It defragments the leaf level
of clustered and nonclustered indexes on tables and views by physically reordering
the leaf-level pages to match the logical, left to right, order of the leaf nodes.
Reorganizing also compacts the index pages. Compaction is based on the existing
fill factor value.

What is Fill Factor?
-----------------------------------------------------------------------------------

The fill-factor option is provided for fine-tuning index data storage and
performance. When an index is created or rebuilt, the fill-factor value determines
the percentage of space on each leaf-level page to be filled with data, reserving
the remainder on each page as free space for future growth. For example, specifying
a fill-factor value of 80 means that 20 percent of each leaf-level page will be
left empty, providing space for index expansion as data is added to the underlying
table. The empty space is reserved between the index rows rather than at the end of
the index.
The fill-factor value is a percentage from 1 to 100, and the server-wide default is
0 which means that the leaf-level pages are filled to capacity.
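
The effect of the fill factor on page count can be worked through with a little arithmetic. The sketch below is only an estimate under stated assumptions: `PAGE_DATA_BYTES` is an assumed usable size per 8 KB page, rows are treated as fixed-size, and per-row overhead is ignored. The function names are hypothetical.

```python
PAGE_DATA_BYTES = 8096  # assumed usable bytes per 8 KB page (illustrative)


def rows_per_leaf_page(row_bytes, fill_factor):
    """Rows placed on each leaf page when the index is created or rebuilt."""
    if not 0 <= fill_factor <= 100:
        raise ValueError("fill factor must be between 0 and 100")
    # 0 (the server-wide default) and 100 both fill the page to capacity.
    effective = 100 if fill_factor == 0 else fill_factor
    usable_bytes = PAGE_DATA_BYTES * effective // 100
    return max(1, usable_bytes // row_bytes)


def leaf_pages_needed(row_count, row_bytes, fill_factor):
    """Leaf pages required to hold row_count rows at the given fill factor."""
    per_page = rows_per_leaf_page(row_bytes, fill_factor)
    return -(-row_count // per_page)  # ceiling division


print(rows_per_leaf_page(100, 0))
print(rows_per_leaf_page(100, 80))
print(leaf_pages_needed(10_000, 100, 100))
print(leaf_pages_needed(10_000, 100, 80))
```

With 100-byte rows, a fill factor of 80 leaves room on every page for later inserts at the cost of more pages to read during scans.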

What is Filestream?
-----------------------------------------------------------------------------------

FILESTREAM enables SQL Server-based applications to store unstructured data, such
as documents and images, on the file system. FILESTREAM integrates the SQL Server
Database Engine with an NTFS or ReFS file system by storing varbinary(max) binary
large object (BLOB) data as files on the file system.

FILESTREAM is not automatically enabled when you install or upgrade SQL Server. You
must enable FILESTREAM by using SQL Server Configuration Manager and SQL Server
Management Studio. To use FILESTREAM, you must create or modify a database to
contain a special type of filegroup. Then, create or modify a table so that it
contains a varbinary(max) column with the FILESTREAM attribute. After you complete
these tasks, you can use Transact-SQL and Win32 to manage the FILESTREAM data.

What is Sequence?
-----------------------------------------------------------------------------------
https://docs.microsoft.com/en-us/sql/relational-databases/sequence-numbers/sequence-numbers?view=sql-server-2017

A sequence is a user-defined schema-bound object that generates a sequence of
numeric values according to the specification with which the sequence was created.
The sequence of numeric values is generated in an ascending or descending order at
a defined interval and may cycle (repeat) as requested. Sequences, unlike identity
columns, are not associated with tables. An application refers to a sequence object
to receive its next value. The relationship between sequences and tables is
controlled by the application. User applications can reference a sequence object
and coordinate the values across multiple rows and tables.
A sequence is created independently of the tables by using the CREATE SEQUENCE
statement. Options enable you to control the increment, maximum and minimum values,
starting point, automatic restarting capability, and caching to improve
performance.
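
The behaviour described above (start, increment, bounds, optional cycling, and independence from any table) can be modelled in a few lines. The `Sequence` class below is a hypothetical toy, not SQL Server's implementation; it omits caching and assumes that a cycling sequence restarts from its minimum (ascending) or maximum (descending) bound.

```python
class Sequence:
    """Toy model of a sequence object: start, increment, bounds, optional cycling."""

    def __init__(self, start=1, increment=1, minvalue=1, maxvalue=10, cycle=False):
        if increment == 0:
            raise ValueError("increment must not be zero")
        self.increment = increment
        self.minvalue = minvalue
        self.maxvalue = maxvalue
        self.cycle = cycle
        self._next = start

    def next_value(self):
        if not self.minvalue <= self._next <= self.maxvalue:
            if not self.cycle:
                raise OverflowError("sequence exhausted")
            # Restart from the minimum (ascending) or maximum (descending) bound.
            self._next = self.minvalue if self.increment > 0 else self.maxvalue
        value = self._next
        self._next += self.increment
        return value


# A cycling sequence wraps around once it passes its maximum.
small = Sequence(start=1, increment=1, minvalue=1, maxvalue=3, cycle=True)
drawn = [small.next_value() for _ in range(5)]
print(drawn)

# One sequence can feed keys to several (hypothetical) tables.
shared = Sequence(start=100, increment=10, minvalue=100, maxvalue=1000)
orders_key = shared.next_value()
invoices_key = shared.next_value()
print(orders_key, invoices_key)
```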

What is a spatial data type?
-----------------------------------------------------------------------------------

https://docs.microsoft.com/en-us/sql/relational-databases/spatial/spatial-data-sql-server?view=sql-server-2017

Spatial data represents information about the physical location and shape of
geometric objects. These objects can be point locations or more complex objects
such as countries, roads, or lakes.
SQL Server supports two spatial data types: the geometry data type and the
geography data type.
The geometry type represents data in a Euclidean (flat) coordinate system.
The geography type represents data in a round-earth coordinate system.
Both data types are implemented as .NET common language runtime (CLR) data types in
SQL Server.
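
The difference between a flat and a round-earth coordinate system shows up as soon as you measure distance. The sketch below contrasts a Euclidean distance with a great-circle (haversine) distance on a sphere. Both functions and the `EARTH_RADIUS_KM` constant are illustrative assumptions; this is a spherical approximation and not the calculation SQL Server performs for the geography type.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def planar_distance(p, q):
    """Euclidean distance between two (x, y) points in a flat coordinate system."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def great_circle_distance_km(p, q):
    """Haversine distance between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Flat coordinates: a 3-4-5 right triangle.
print(planar_distance((0, 0), (3, 4)))

# Round-earth coordinates: a quarter of the way around the equator.
print(great_circle_distance_km((0.0, 0.0), (0.0, 90.0)))
```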

What is a stored procedure and why do we use it?
-----------------------------------------------------------------------------------
https://docs.microsoft.com/en-us/sql/relational-databases/stored-procedures/stored-procedures-database-engine?view=sql-server-2017

A stored procedure in SQL Server is a group of one or more Transact-SQL statements
or a reference to a Microsoft .NET Framework common language runtime (CLR) method.
Procedures resemble constructs in other programming languages because they can:
1) Accept input parameters and return multiple values in the form of output
parameters to the calling program.
2) Contain programming statements that perform operations in the database. These
include calling other procedures.
3) Return a status value to a calling program to indicate success or failure (and
the reason for failure).

There are four types of procedures:

1) User-defined procedures.
2) Temporary procedures, of which there are two kinds: local and global
temporary procedures.
3) System procedures, which are stored in the Resource database and in msdb.
4) Extended stored procedures: these are DLLs that are called from a SQL Server
instance. They should not be used anymore, as they may be removed in future
versions of SQL Server. Use CLR procedures instead.

How many types of tables are there in SQL Server?
-----------------------------------------------------------------------------------

1. User Defined Table:

Tables are database objects that contain all the data in a database. In tables,
data is logically organized in a row-and-column format similar to a spreadsheet.
Each row represents a unique record, and each column represents a field in the
record.

2. Temporary Table:

Temporary tables are stored in tempdb. There are two types of temporary tables:
local and global. They differ from each other in their names, their visibility, and
their availability. Local temporary tables have a single number sign (#) as the
first character of their names; they are visible only to the current connection for
the user, and they are deleted when the user disconnects from the instance of SQL
Server. Global temporary tables have two number signs (##) as the first characters
of their names; they are visible to any user after they are created, and they are
deleted when all users referencing the table disconnect from the instance of SQL
Server.

3. System Table:

SQL Server stores the data that defines the configuration of the server and all its
tables in a special set of tables known as system tables. Users cannot directly
query or update the system tables. The information in the system tables is made
available through the system views.

4. Wide Table:

https://docs.microsoft.com/en-us/sql/relational-databases/tables/tables?view=sql-server-2017

Wide tables use sparse columns to increase the total number of columns that a table
can have to 30,000. Sparse columns are ordinary columns that have an optimized storage
for null values. Sparse columns reduce the space requirements for null values at
the cost of more overhead to retrieve nonnull values. A wide table has defined a
column set, which is an untyped XML representation that combines all the sparse
columns of a table into a structured output. The number of indexes and statistics
is also increased to 1,000 and 30,000, respectively. The maximum size of a wide
table row is 8,019 bytes. Therefore, most of the data in any particular row should
be NULL. The maximum number of nonsparse columns plus computed columns in a wide
table remains 1,024.

5. Partitioned table:

https://docs.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-2017

Partitioned tables are tables whose data is horizontally divided into units which
may be spread across more than one filegroup in a database. Partitioning makes
large tables or indexes more manageable by letting you access or manage subsets of
data quickly and efficiently, while maintaining the integrity of the overall
collection. By default, SQL Server 2017 supports up to 15,000 partitions.

SQL Server supports table and index partitioning. The data of partitioned tables
and indexes is divided into units that can be spread across more than one filegroup
in a database. The data is partitioned horizontally, so that groups of rows are
mapped into individual partitions. All partitions of a single index or table must
reside in the same database. The table or index is treated as a single logical
entity when queries or updates are performed on the data.

Benefits of partition:

Partitioning large tables or indexes can have the following manageability and
performance benefits.

You can transfer or access subsets of data quickly and efficiently, while
maintaining the integrity of a data collection. For example, an operation such as
loading data from an OLTP to an OLAP system takes only seconds, instead of the
minutes and hours the operation takes when the data is not partitioned.

You can perform maintenance operations on one or more partitions more quickly. The
operations are more efficient because they target only these data subsets, instead
of the whole table. For example, you can choose to compress data in one or more
partitions or rebuild one or more partitions of an index.

You may improve query performance, based on the types of queries you frequently run
and on your hardware configuration. For example, the query optimizer can process
equi-join queries between two or more partitioned tables faster when the
partitioning columns in the tables are the same, because the partitions themselves
can be joined.

When SQL Server performs data sorting for I/O operations, it sorts the data first
by partition. SQL Server accesses one drive at a time, and this might reduce
performance. To improve data sorting performance, stripe the data files of your
partitions across more than one disk by setting up a RAID. In this way, although
SQL Server still sorts data by partition, it can access all the drives of each
partition at the same time.

In addition, you can improve performance by enabling lock escalation at the
partition level instead of a whole table. This can reduce lock contention on the
table. To reduce lock contention by allowing lock escalation to the partition, set
the LOCK_ESCALATION option of the ALTER TABLE statement to AUTO.

Components of partition:

The following terms are applicable to table and index partitioning.

Partition function:

A database object that defines how the rows of a table or index are mapped to a set
of partitions based on the values of a certain column, called a partitioning column.
That is, the partition function defines the number of partitions that the table
will have and how the boundaries of the partitions are defined. For example, given
a table that contains sales order data, you may want to partition the table into
twelve (monthly) partitions based on a datetime column such as a sales date.
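
The mapping a partition function performs can be sketched as a lookup against sorted boundary values. The `partition_number` helper below is hypothetical, and the year 2017 is chosen only for illustration. The `range_right` flag mirrors the RANGE RIGHT and RANGE LEFT options of CREATE PARTITION FUNCTION, which decide whether a boundary value itself falls into the partition on its right or on its left.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def partition_number(value, boundaries, range_right=True):
    """Return the 1-based partition for value, given sorted boundary values.

    range_right=True puts each boundary value in the partition to its right;
    range_right=False puts it in the partition to its left.
    """
    if range_right:
        return bisect_right(boundaries, value) + 1
    return bisect_left(boundaries, value) + 1


# Eleven boundaries (the first day of February through December) give
# twelve monthly partitions on a sales date column.
boundaries = [date(2017, month, 1) for month in range(2, 13)]

print(len(boundaries) + 1)
print(partition_number(date(2017, 1, 15), boundaries))
print(partition_number(date(2017, 2, 1), boundaries))
print(partition_number(date(2017, 2, 1), boundaries, range_right=False))
print(partition_number(date(2017, 12, 31), boundaries))
```

Note that N boundary values always produce N + 1 partitions, which is why twelve monthly partitions need only eleven boundaries.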

Partition scheme:

A database object that maps the partitions of a partition function to a set of
filegroups. The primary reason for placing your partitions on separate filegroups
is to make sure that you can independently perform backup operations on partitions.
This is because you can perform backups on individual filegroups.

Partitioning column:

The column of a table or index that a partition function uses to partition the
table or index. Computed columns that participate in a partition function must be
explicitly marked PERSISTED. All data types that are valid for use as index columns
can be used as a partitioning column, except timestamp. The ntext, text, image,
xml, varchar(max), nvarchar(max), or varbinary(max) data types cannot be specified.
Also, Microsoft .NET Framework common language runtime (CLR) user-defined type and
alias data type columns cannot be specified.

Aligned index:

An index that is built on the same partition scheme as its corresponding table.
When a table and its indexes are in alignment, SQL Server can switch partitions
quickly and efficiently while maintaining the partition structure of both the table
and its indexes. An index does not have to participate in the same named partition
function to be aligned with its base table. However, the partition function of the
index and the base table must be essentially the same, in that:
1) The arguments of the partition functions have the same data type.
2) They define the same number of partitions.
3) They define the same boundary values for partitions.

Partitioning clustered indexes:

When partitioning a clustered index, the clustering key must contain the
partitioning column. When partitioning a nonunique clustered index, and the
partitioning column is not explicitly specified in the clustering key, SQL Server
adds the partitioning column by default to the list of clustered index keys. If the
clustered index is unique, you must explicitly specify that the clustered index key
contain the partitioning column.

Partitioning nonclustered indexes:

When partitioning a unique nonclustered index, the index key must contain the
partitioning column. When partitioning a nonunique, nonclustered index, SQL Server
adds the partitioning column by default as a nonkey (included) column of the index
to make sure the index is aligned with the base table. SQL Server does not add the
partitioning column to the index if it is already present in the index.

Non-aligned index:

An index partitioned independently from its corresponding table. That is, the index
has a different partition scheme or is placed on a separate filegroup from the base
table. Designing a non-aligned partitioned index can be useful in the following
cases:
1) The base table has not been partitioned.
2) The index key is unique and it does not contain the partitioning column of the
table.
3) You want the base table to participate in collocated joins with more tables using
different join columns.

Partition elimination:

The process by which the query optimizer accesses only the relevant partitions to
satisfy the filter criteria of the query.
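
Partition elimination can be pictured as working out, from the filter alone, which partitions could possibly contain matching rows. The sketch below is a hypothetical model, not the optimizer's algorithm: `partitions_for_range` and the quarterly boundaries are made up, and it assumes each boundary value belongs to the partition on its right.

```python
from bisect import bisect_right


def partitions_for_range(low, high, boundaries):
    """1-based partitions that can hold values in the closed range [low, high].

    Assumes sorted boundaries where each boundary value belongs to the
    partition on its right.
    """
    first = bisect_right(boundaries, low) + 1
    last = bisect_right(boundaries, high) + 1
    return list(range(first, last + 1))


# Hypothetical quarterly boundaries on a month-number column: partitions 1..4.
boundaries = [4, 7, 10]

all_partitions = list(range(1, len(boundaries) + 2))
needed = partitions_for_range(5, 6, boundaries)  # filter: month between 5 and 6
skipped = [p for p in all_partitions if p not in needed]

print(needed)
print(skipped)
```

A filter on months 5 to 6 touches only the second partition, so the other three never have to be read.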

What is a sparse column?
-----------------------------------------------------------------------------------

https://docs.microsoft.com/en-us/sql/relational-databases/tables/use-sparse-columns?view=sql-server-2017

Sparse columns are ordinary columns that have an optimized storage for null values.
Sparse columns reduce the space requirements for null values at the cost of more
overhead to retrieve nonnull values. Consider using sparse columns when the space
saved is at least 20 percent to 40 percent. Sparse columns and column sets are
defined by using the CREATE TABLE or ALTER TABLE statements.
Sparse columns can be used with column sets and filtered indexes.

What is a filtered index?
-----------------------------------------------------------------------------------

https://docs.microsoft.com/en-us/sql/relational-databases/indexes/create-filtered-indexes?view=sql-server-2017

A filtered index is an optimized nonclustered index especially suited to cover
queries that select from a well-defined subset of data. It uses a filter predicate
to index a portion of rows in the table. A well-designed filtered index can improve
query performance as well as reduce index maintenance and storage costs compared
with full-table indexes.
You cannot create a filtered index on a view. However, the query optimizer can
benefit from a filtered index defined on a table that is referenced in a view. The
query optimizer considers a filtered index for a query that selects from a view if
the query results will be correct.

How to track data changes in SQL Server?
-----------------------------------------------------------------------------------

https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/track-data-changes-sql-server?view=sql-server-2017

SQL Server 2017 provides two features that track changes to data in a database:
change data capture and change tracking. These features enable applications to
determine the DML changes (insert, update, and delete operations) that were made to
user tables in a database. Change data capture and change tracking can be enabled
on the same database; no special considerations are required.

What is the difference between change data capture and change tracking?
-----------------------------------------------------------------------------------

The following table lists the feature differences between change data capture and
change tracking. The tracking mechanism in change data capture involves an
asynchronous capture of changes from the transaction log so that changes are
available after the DML operation. In change tracking, the tracking mechanism
involves synchronous tracking of changes in line with DML operations so that change
information is available immediately.

Feature                         Change data capture   Change tracking
Tracked changes
  DML changes                   Yes                   Yes
Tracked information
  Historical data               Yes                   No
  Whether column was changed    Yes                   Yes
  DML type                      Yes                   Yes