This document is primarily intended for SAP HANA developers and business analysts as a general guideline for best practices. It presents the considerations to keep in mind when building native HANA views, flows and other objects. This document covers HANA Studio specifically and no other tools.
2 General
All BW tables can be found in the SAP<SID> schema. Additional authorization might be required to access this schema or the tables it contains. Be aware that on table level only full or no access is possible; there are no row-level authorizations for tables (these are only possible for views).
The Star Join in Calculation Views of type CUBE with Star Join is a node type rather than a join type. It can contain different kinds of joins. While the fact table (data source) of a star join can be any type of input node, only Calculation Views of Data Category Dimension are allowed as input nodes for dimensions.
2.2 Joins
The following join types exist in HANA: inner join, left outer join, right outer join, full outer join, referential join, and text join.
Source: SAP
3 Best practices
Before handing over your solution to the operations team, you should be able to answer all of the following questions with yes. There can be exceptional cases where this is not possible; these should be raised sufficiently far ahead of the handover session so that the operations team can review them and ask detailed questions during the handover session.
3.1 General
Have you only used calculation views and no analytic & attribute views?
Do not create any attribute or analytic views unless you find a performance issue or missing feature when using calculation views (this can only be the case before SPS12). From HANA SPS12 onwards, all attribute and analytic views should be migrated to calculation views. This is even mandatory when you want to move to XSA/HDI. A migration tool is available.
Have you only used graphical calculation views and no script-based calculation views?
Avoid the usage of script-based calculation views, as they cannot be optimized as much as graphical calculation views. From SPS12 onwards, script-based calculation views should be migrated to user-defined table functions. This is even mandatory when you want to move to XSA/HDI. A migration tool is available.
Did you use projection nodes when adding views/tables to a calculation view?
All views/tables should be used with a projection node. Projection nodes improve performance by
narrowing the data set (columns). Further optimization can be done by applying filters at projection
nodes.
Are you using external / generated HANA views in your CV instead of directly including the active
data table?
This is SAP's recommended approach. Only by using external HANA views can you ensure that data from NLS/SAP IQ is also extracted and that new fields are added automatically. Be aware of bugs before SPS11, especially in relation to NLS (e.g. filters/WHERE clauses not being pushed down to the NLS layer). In addition, we experienced much worse performance in some cases on HANA Rev. 97.02 compared to directly using the active data tables of a DSO.
When using external / generated HANA views of a DSO in your CV, have you activated SID
generation during activation?
This is SAP's recommendation to ensure optimal performance.
When comparing two columns frequently within a query, have you ensured that they have the
same data type?
For columns of different types, SAP HANA uses implicit type casting to enable comparison in HANA
Models. However, implicit type casting has a negative effect on performance.
In case of performance issues, have you considered turning on caching?
It is possible to cache the results of a calculation view (under its properties), which should speed up queries against the CV. However, this only makes sense when the query is time-consuming. Caching can occupy a significant amount of space on the database system, so please do not turn this on without talking to the operations team. (Hint: caching also needs to be turned on initially by the database administrator.)
3.2 Engines
Have you used the “Explain Plan” feature to check if the complete view got unfolded?
From HANA SPS9 onwards, HANA implicitly tries to optimize all calculation views in the SQL engine (see SAP Note 2223597). This happens independently of the setting “Execute in SQL Engine”.
While a lot of restrictions still exist on SPS9 - SPS11, SAP reports “no prominent blockers so far” for SPS12 (see SAP Note 1857202). If the CV cannot be fully translated into relational algebra, the CV will not be unfolded at all. You can check this via “Explain Plan”: if the result consists of only 1-2 rows using the column view as input, the CV did not get unfolded. An unfolded CV will consist of many single operations/rows using the base objects of the view. If the plan contains a line where TABLE_NAME is the calculation view's name and TABLE_TYPE is 'CALCULATION VIEW', the view is not unfolded.
In general, better performance can be expected if the view can be unfolded, since the SQL engine can then be used for optimization.
Features that cannot be translated into relational algebra and should therefore be avoided in CVs:
- Attribute Views
- Analytic Views
- Scripted Calculation Views
- possibly other ‘non-prominent blockers’ (clear documentation is missing)
If the view cannot be unfolded, have you compared the performance when explicitly setting the
“Execute in SQL Engine” setting?
When the view can be completely unfolded, it makes no difference whether “Execute in SQL Engine” is set or not. If the view cannot be completely unfolded, however, enabling the “Execute in SQL Engine” setting will cause the view to get partly unfolded instead of not unfolded at all (which is what happens when the setting is not set). The downside is that data might then have to be moved between engines. Depending on the view, this can lead in some cases to better, in other cases to worse performance. For this reason, it makes sense to at least compare the performance of both approaches.
Have you used the “Explain Plan” feature to check if only one engine is used in the execution?
Depending on how your model is built and which functions are used, different execution engines (CE, Join, OLAP) are called. Try to avoid having multiple engines called within one view, since data then needs to be passed between them all the time, which can lead to suboptimal performance.
Have you created all necessary calculations directly on the database layer?
Create all calculations within your calculation views and avoid creating any calculations in the
reporting layer (Universe & Front end tools).
Have you tested potential problems when using multiplications in calculated columns in
combination with aggregations?
Be careful when using multiplications in calculated columns in combination with aggregations: depending on whether the multiplication happens before or after aggregation, the results can differ.
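The pitfall can be sketched outside HANA. This is plain Python for illustration only (not HANA SQLScript), with made-up price/quantity figures:

```python
# Illustrative Python sketch (not HANA code): why the order of
# multiplication and aggregation matters for calculated columns.
rows = [
    {"price": 10, "qty": 2},  # line value 20
    {"price": 5,  "qty": 4},  # line value 20
]

# Calculation before aggregation: sum of row-level products (correct revenue).
revenue_correct = sum(r["price"] * r["qty"] for r in rows)   # 40

# Calculation after aggregation: product of the aggregated columns (wrong).
revenue_wrong = sum(r["price"] for r in rows) * sum(r["qty"] for r in rows)  # 90

print(revenue_correct, revenue_wrong)  # prints "40 90"
```

Whether "before" or "after" is correct depends on the measure; the point is that the two are not interchangeable, so the placement of the calculated column must be a conscious decision.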
Are you sure that all calculations before aggregation are actually required?
By analyzing your reporting requirements, you can decide at which stage a calculation should be performed. Try to minimize calculations before aggregation, as they slow down performance. Calculation before aggregation is also not required if the calculations are just additions or subtractions and there are no multiplications/divisions etc.
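The reason additions and subtractions are safe can be shown in a few lines of plain Python (illustrative only, with made-up figures): they commute with SUM, so the calculation can be pushed behind the aggregation.

```python
# Illustrative Python sketch: additions/subtractions commute with SUM,
# so they do not need to happen before aggregation.
rows = [{"a": 3, "b": 1}, {"a": 7, "b": 2}]

# Calculate per row, then aggregate:
before = sum(r["a"] - r["b"] for r in rows)
# Aggregate each column, then calculate:
after = sum(r["a"] for r in rows) - sum(r["b"] for r in rows)

assert before == after == 7  # identical for additive operations
```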
Are you sure that all data in the calculation view is required by the users?
Unnecessary data can significantly slow down execution runtimes. If a large amount of data stored in the base tables is never used for reporting because users apply WHERE clause filters in the reporting tools anyway, introduce those filters into the information models in order to speed up query execution.
Is the data returned to the end user in the most aggregated way possible?
A primary goal during modeling is to minimize data transfers between engines. This statement
holds true both internally, that is, in the database between engines, but also between SAP HANA
and the end user client application. For example, an end user will never need to display a million rows of data. Such a large amount of information just cannot be consumed in a meaningful way.
Whenever possible, data should be aggregated and filtered to a manageable size before it leaves
the data layer. When deciding which records should be reported upon, a best practice is to think at
a “set level” not a “record level”.
In case of performance issues, have you considered creating client/country-specific views?
Think about aggregating by a region, a date, or some other group in order to minimize the amount of data passed between views. For performance purposes, it might make sense to create client-dependent views.
3.3 Calculated/Restricted columns
(from SPS10/11 onwards) If possible, have you validated your calculated columns, restricted
columns, filter & default value expressions for variables and input parameters against plain SQL
instead of SQLScript?
Use plain SQL instead of the Column Engine to ensure optimal performance. This means that expressions like IF…ELSE need to be avoided wherever possible.
Have you avoided type casting of dates to string and vice versa to extract time dimensions?
This can cause performance issues. At least compare the performance against using the component
function “component(date, int)” if this also satisfies your requirements.
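The difference can be sketched in plain Python (illustrative only; Python's `date.year` stands in for HANA's component-style access, and the sample date is made up):

```python
# Illustrative Python sketch: extract date components directly instead of
# casting to string and parsing substrings (analogous to using HANA's
# component(date, int) function instead of a string round-trip).
from datetime import date

d = date(2017, 3, 15)

# String round-trip: works, but the cast/parse costs performance in HANA.
year_via_string = int(str(d)[:4])

# Direct component access: the cheaper equivalent.
year_direct = d.year

assert year_via_string == year_direct == 2017
```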
Have you avoided implicit type casting?
Statements like IF ‘1’=1 THEN … will work but have a performance impact due to casting INT to
NVARCHAR.
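The fix is to normalize the type once instead of letting every comparison cast. Sketched in plain Python (illustrative only; the column values are made up, and Python's explicit `int()` stands in for an explicit SQL cast):

```python
# Illustrative Python sketch: cast a column's type once up front instead of
# on every comparison (the analog of HANA implicitly casting INT to
# NVARCHAR in expressions like IF '1' = 1).
raw_status = ["1", "0", "1", "1"]        # values stored as strings

# Per-row casting on every comparison (what implicit casting effectively does):
matches_implicit = [int(s) == 1 for s in raw_status]

# Cast the column once, then compare natively typed values:
status = [int(s) for s in raw_status]
matches_explicit = [s == 1 for s in status]

assert matches_implicit == matches_explicit == [True, False, True, True]
```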
3.4 Joins/Unions
Have you checked that each join is actually required?
Avoid unnecessary JOIN nodes in calculation views, as they are very costly from a computing standpoint. Even an unused left outer join (where no column from the right side is requested) can impact performance. Also make sure that you do not use a JOIN when a UNION would be more appropriate (a union provides much better performance).
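The union-instead-of-join pattern can be sketched in plain Python (illustrative only; the actuals/plan sources and figures are hypothetical): stack the two sources with aligned, zero-padded columns, then aggregate per key.

```python
# Illustrative Python sketch of the "union instead of join" pattern:
# combine two fact sources by stacking them with aligned columns and
# aggregating per key, instead of joining them.
actuals = [("P1", 100), ("P2", 50)]
plan = [("P1", 120), ("P2", 60)]

# Union: pad each source with zeros for the other source's measure.
stacked = [(k, v, 0) for k, v in actuals] + [(k, 0, v) for k, v in plan]

# Aggregate the stacked rows per key.
result = {}
for key, actual, planned in stacked:
    a, p = result.get(key, (0, 0))
    result[key] = (a + actual, p + planned)

print(result)  # {'P1': (100, 120), 'P2': (50, 60)}
```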
Have you tried to join on key and indexed columns wherever possible?
This ensures the best performance compared to joins on non-indexed columns.
When using a referential join, have you ensured that the referential integrity is always given?
Referential joins should be used with caution since it assumes that referential integrity is ensured
(could for example be valid for master data tables). The only valid scenario for the Referential Join
is that
(a) it is 110% guaranteed that for each row in one table, there is at least one join partner in the
other table, and
(b) that holds true in both directions
(c) at all times.
If that is not the case, referential joins can produce incorrect results (-> different results depending on whether you select columns from the right table or not). Example: if a delivery header is created but the items are not processed until a later stage, then any calculations that use referential joins will be incorrect.
(Background: A referential join is semantically an inner join that assumes referential integrity is given, which means that the left table always has a corresponding entry in the right table. It can be seen as an optimized or faster inner join where the right table is not checked if no field from the right table is requested. That means the referential join will only be executed when fields from both tables are requested. Therefore, if a field is selected from the right table it will act similar to an inner join, and if no field from the right table is selected it will act similar to a left outer join.
From a performance perspective, the left outer join is almost as fast as the referential join, while the inner join is usually slower due to the fact that the join is always executed.
In Calculation Views, referential joins are executed in the following way:
o If at least one field is selected from the right table, it will behave as an inner join.
o If no field from the right table is selected, the execution of the referential join depends on the cardinality:
- If the cardinality is 1..1 or n..1, the join is not executed. This corresponds to the most frequent situation, in particular when you join a (left) fact table or view with a (right) view or table that contains only dimensions. This is where the optimization occurs.
- In the rare case where you would use a 1..n cardinality, the join is executed as an inner join. Indeed, this is a requirement to get the correct number of rows in the output, which depends on the number of matching rows in the right table or view.)
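These execution rules can be simulated in plain Python (illustrative only; the fact and dimension contents are made up, and the n..1 case is assumed): with broken referential integrity, the row count changes depending on whether a right-table column is requested.

```python
# Illustrative Python sketch of referential-join pruning: the join is only
# executed when columns from the right table are requested, so broken
# referential integrity yields different row counts per query.
facts = [("D1", 10), ("D2", 20)]   # left/fact table
dims = {"D1": "Germany"}           # right/dim table; entry for "D2" is missing!

def referential_join(request_right_columns):
    if not request_right_columns:
        # Join pruned (n..1 cardinality): all fact rows survive,
        # like a left outer join.
        return [(k, v) for k, v in facts]
    # A right-table column is requested: executed as an inner join.
    return [(k, v, dims[k]) for k, v in facts if k in dims]

assert len(referential_join(False)) == 2  # all fact rows returned
assert len(referential_join(True)) == 1   # "D2" silently dropped
```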
Have you reviewed the Explain Plan to check if the number of records is as expected?
In case of performance issues, have you analyzed the CV with the CV debugger?
In case of performance issues, have you checked the cardinality, number of rows and partitions
of the source tables via the performance analysis mode?
It might make sense to partition the table to support parallelization. You can even create several partitions on one node. Joins have the best performance if both tables have the same number of partitions and the partitions are defined on the same columns (in a BW on HANA system, this is decided automatically based on the TABLE_PLACEMENT table).
The performance analysis mode can provide additional input on this.
In case of a scale-out scenario, have you checked in the plan visualizer if the network transfer
between nodes is responsible for bad performance?
In some cases, it might be beneficial to move the involved tables to the same node (see TABLE_PLACEMENT table).
3.6 Variables
Do you use a table instead of a view as input help for variables?
The attribute selection for “View/Table value for help” is important for performance. If you select an attribute from a table rather than from the view itself, the provision of the list of values will be faster as there is no need to execute the view. This also ensures that the list of values is consistent from view to view. By default, the view itself is used; this default should be avoided if possible.
Please be aware that this approach is usually not feasible if a CV is based on data coming from BW, because such CVs should be based on external HANA views and not on the actual BW tables. It might make sense, though, for CVs built on data that was imported into an actual HANA table, e.g. via Data Services.
3.7 Tables
Did you use CDS to create new tables?
Never create a table with the SQL statement 'CREATE TABLE'. Instead, use design time objects (.hdbdd). This also makes the objects transportable, and the objects are not bound to a user (runtime objects get deleted when their user gets deleted). One .hdbdd file can contain several table definitions. If tables use the same structures or are within the same context, it makes sense to use the same .hdbdd file for their definitions.
Have you used the column store for all new tables? The row store should only be considered if:
a. You expect an extremely high number of updates and barely any select queries.
b. You always report on all columns without aggregation.
Have you checked if it is possible storing the primary keys in INT format?
This can lead to faster joins and better compression ratios.
3.8 SDA
When using virtual tables via SDA, have you created statistics on the virtual tables?
Statistics on virtual tables need to be created manually and can have a significant impact on the
performance. The statistics should be recreated when the table has changed significantly (e.g. 30%
data increase/decrease).
3.9 SQLScript
Did you follow SAP's security recommendations for SQL Script procedures?
● Mark each parameter using the keywords IN or OUT. Avoid using the INOUT keyword as this can
cause security vulnerabilities.
● Use the INVOKER keyword when you want the user to have the assigned privileges to start a
procedure. The default keyword, DEFINER, allows only the owner of the procedure to start it.
● Mark read-only procedures using READS SQL DATA whenever possible. This ensures that the data and the structure of the database are not altered. Tip: another advantage of using READS SQL DATA is that it optimizes performance.
● Ensure that the types of parameters and variables are as specific as possible. Avoid using
VARCHAR, for example. By reducing the length of variables you can reduce the risk of injection
attacks.
● Perform validation on input parameters within the procedure.
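The last point, validating input parameters, can be sketched in plain Python (illustrative only; the function name, the region values, and the filter string are hypothetical, not a HANA API). Allow-list validation rejects unexpected values outright instead of trying to escape them:

```python
# Illustrative Python sketch of allow-list validation for a procedure
# parameter that ends up in a query fragment (all names are hypothetical).
ALLOWED_REGIONS = {"EMEA", "APAC", "AMER"}

def build_region_filter(region: str) -> str:
    # Reject anything outside the known value set instead of escaping it;
    # this blocks injection payloads outright.
    if region not in ALLOWED_REGIONS:
        raise ValueError(f"invalid region: {region!r}")
    return f"REGION = '{region}'"

print(build_region_filter("EMEA"))      # prints "REGION = 'EMEA'"
# build_region_filter("EMEA' OR '1'='1") would raise ValueError
```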
Have you avoided extremely large intermediate result sets, even if they will be reduced or
aggregated before the final result is returned to the client?
They can lead to high memory requirements.
Do you perform calculations at the last node possible in the data flow?
This should ideally be done in final aggregation or projection to reduce the amount of data rows on
which you are performing the calculations and also reduce the number of columns being
transferred from one node to another.
Have you split complex queries into logical sub queries wherever possible?
This can help the optimizer to identify common sub expressions and to derive more efficient
execution plans.
Have you at least once had a look at the explain and visualize plans to understand the costs of
your statements?
Look at the explain plan to investigate the performance impact of different SQL queries used in
scripts.
When writing SQL statements, have you checked if any of SAP's query tuning tips is relevant and
can be implemented?
Not following SAP's recommendations described under chapter 3.9.12: SQL Query tuning tips can
have a major negative impact on the performance.
4. Hit ‘Execute’
5. Select the object(s) you would like to transport and check them. In the below screenshot the folder
& the object are transported, but you can also choose individual objects only:
6. For new objects: Assign it to package (same as normal BW objects) and hit ‘save’.
7. Create a new request or assign it to an existing request. In case the HANA object is dependent on
BW objects, all objects should be bundled in the same request, so they are activated at the same
time. Then save.
9. Go to SE10 and open the request. Depending on your selection, you can either see a full package or
a single object. It will also state SAP HANA Transport for ABAP.
10. Now you can treat the transport as a normal ABAP transport (release and import via STMS). Once
the transport is imported, the object/folder etc. is automatically created in the HANA repository.
Additional notes:
1. As soon as you change the native HANA object in HANA Studio, you need to synchronize the
object again. In this case, the status also switches from green to yellow in transaction
SCTS_HTA. Otherwise, the ABAP transport does not contain the newest version of the HANA
object. If you execute the synchronization again and the corresponding request is not released
yet, the object will be updated in the request after synchronization. Otherwise, a new request
needs to be created.
2. When you delete an object, the ABAP system recognizes that an object is still activated in the
ABAP system that is not active in HANA. It can then be added to a transport request the same
way as a new/changed file.
3. When importing an external/generated HANA View and a dependent HANA calculation view,
the first import of the transport might fail. To fix this, you can either:
a. Reimport the transport a second time (the external HANA view is then activated and
can be found by the HANA calculation view)
b. Create a first transport including the generated/external HANA view and a second
transport including the dependent calculation view.
5 Sources
SAP courses HA100 and HA300
SAP HANA Modeling Learning Room (SAP Learning Hub)
SAP HANA International Focus Group (SAP Jam)
HTA Introduction: http://sapassets.edgesuite.net/sapcom/docs/2016/05/80f9d984-737c-0010-82c7-eda71af511fa.pdf