
Tips for Creating Solid Data Warehouses

Five Integration Services Tips

Top 10

1.

Profiling culprits is not just for the FBI

Data Profiling can save a developer lots of research headaches and improve customer perception regarding ETL and Cube quality. If you're about to bring in a new fact table and profiling reveals that 25 percent of the source records have a null primary key, you can report this to the client in your Data Profile. This advance understanding of the data profile accomplishes two things. First, it gives your data supplier a chance to improve the data in parallel with your ETL work. Second, if the data source is not improved or is only improved slightly, your client is already aware of the extent of the problem and its origin. This advance heads-up helps you quickly answer questions about cube data quality, saving you hours of researching the root of the problem.
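As a rough illustration of the kind of check a data profile reports, the sketch below computes the share of source records with a null key. It is plain Python rather than the SSIS Data Profiling Task, and the function name `null_key_rate`, the column `order_id`, and the sample rows are all hypothetical.

```python
def null_key_rate(rows, key):
    """Return the fraction of rows whose key column is missing or None."""
    rows = list(rows)
    if not rows:
        return 0.0
    nulls = sum(1 for row in rows if row.get(key) is None)
    return nulls / len(rows)


# Hypothetical sample: one of the four source records has no primary key.
source_rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": None, "amount": 5.0},
    {"order_id": 3, "amount": 7.5},
    {"order_id": 4, "amount": 2.0},
]

rate = null_key_rate(source_rows, "order_id")
print(f"{rate:.0%} of source records have a null primary key")
```

A figure like this, reported before ETL work begins, is what lets the data supplier start fixing the source in parallel.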

2.

Write T-SQL your mother could understand

SSIS is designed to be modular: it breaks the application of Business Rules and data cleansing activities into discrete, easy-to-read, maintainable chunks of code. Writing complex T-SQL select statements, sticking them into your Data Source component, and connecting that directly to a Destination component circumvents all that is great about SSIS. Start with the goal of making SSIS Packages a series of simple, step-wise, modular code units, and use your advanced T-SQL skills only when they are truly the only way you can think of to accomplish your goals.

3.

SSIS self-documentation is magical unless you forget to recite your spells

"Lookup 1", "Derived Column 6", "Union All 14". What kind of names are these for SSIS components? Horrible ones. SSIS can be an incredible little self-documenting machine. Why let a component be named "Lookup 1" when it could be named "Lookup Job Code using Job ID" instead? Don't waste the potential of SSIS by neglecting to make your flows easy to view and understand.

4.

Be a pack rat

Remember the transactional model adage "If you can calculate it on the fly, don't store it"? That adage is heresy in the Business Intelligence world. We save everything and we leave a trail. If we have a Business Rule for E that says "Add A to B and divide by C times D", we don't store just E. We store A, B, C, D and E. We might also store the intermediate results A+B and C*D just prior to storing E. This approach not only gives us flexibility if the Business Rule changes in the future, but it also makes testing and troubleshooting much faster and more straightforward.
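The trail described above might look like the following sketch. It assumes the rule reads as E = (A + B) / (C * D); the function name `apply_rule_e` and the output column names are hypothetical, and a real package would do this in a Derived Column step rather than in Python.

```python
def apply_rule_e(a, b, c, d):
    """Apply the rule E = (A + B) / (C * D), keeping every input and intermediate."""
    a_plus_b = a + b
    c_times_d = c * d
    # Leave E empty rather than fail when the divisor is zero (an assumption).
    e = a_plus_b / c_times_d if c_times_d != 0 else None
    return {
        "A": a,
        "B": b,
        "C": c,
        "D": d,
        "A_plus_B": a_plus_b,
        "C_times_D": c_times_d,
        "E": e,
    }


row = apply_rule_e(6.0, 4.0, 2.0, 5.0)
print(row)
```

With the inputs and intermediates stored next to E, a tester can see at a glance which step produced an unexpected value.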

5.

Give props to Union All

Some people hate the SSIS component known as Union All. They avoid it because it belongs to a class of components known as Partially Blocking Asynchronous Transformations. In plainer language, that means it can be slow sometimes. Generally speaking, you should use Union All wherever it makes sense, unless you have problems with performance. Only then should you look to remove it in favor of other options. As long as you're using Union All components, give them the proper respect and use them to their fullest. Use them to clean things up and keep things tidy downstream. A Union All can be the perfect place to jettison fields once they have served their purpose in a lookup, or to rename them if they will eventually require a different name in the Destination mapping.

Five Analysis Services Tips

6.

Recognize dimensions as low-hanging fruit

Dimensions are usually easy to develop in the cube. They can often be coded more quickly than they could be written up in a formal Requirements document. For this reason, it makes sense to use Named Queries in the Data Source View (DSV) to prototype your dimensions. The Named Query can point directly at the source data. Once the customer approves the dimension presentation, the SQL from the Named Query can be passed to the ETL developer and used as a guideline for the table schema required in the final dimension table.

7.

Call them what they are

Distinguish between Unknown and Not Applicable members in your dimensions. For example, if you have a Sales Orders Fact table and an Employee Fact table in the same cube, you might have an Invoice Type dimension that links to Sales Orders but has no meaning in reference to Employees. Don't mark the Employee Fact records with your default Unknown value for the Invoice Type. Instead, set up a separate Not Applicable value for each dimension. This will help with customer satisfaction, because customers don't like Unknowns but they often don't mind Not Applicable.
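One way to picture the distinction is the key-assignment sketch below. The surrogate key values -1 and -2, the fact table name "SalesOrders", and the lookup contents are illustrative assumptions only; in practice this logic would sit in the ETL that loads the fact tables.

```python
UNKNOWN_KEY = -1         # hypothetical key for the Unknown member
NOT_APPLICABLE_KEY = -2  # hypothetical key for the Not Applicable member


def invoice_type_key(fact_table, source_code, lookup):
    """Pick the Invoice Type dimension key for one fact record."""
    # Invoice Type has no meaning outside Sales Orders: Not Applicable.
    if fact_table != "SalesOrders":
        return NOT_APPLICABLE_KEY
    # It should have a value but the source code did not match: Unknown.
    return lookup.get(source_code, UNKNOWN_KEY)


# Hypothetical dimension lookup: source code -> surrogate key.
invoice_types = {"STD": 1, "CREDIT": 2}

print(invoice_type_key("SalesOrders", "STD", invoice_types))
print(invoice_type_key("SalesOrders", "???", invoice_types))
print(invoice_type_key("Employee", None, invoice_types))
```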

8.

Keep aggregations in your back pocket

The Aggregation wizard in SSAS is easy to use and straightforward, but if you establish the aggregations early, you slow down processing during development in exchange for faster end-user queries that nobody is running yet. It's better to create the aggregations later in the project, often after several iterations have gone into production. This not only gives you real data for Usage Based Optimization but also provides a quick win when the size and complexity reach a point where aggregations can actually deliver noticeable performance gains.

9.

Scope this

Enter Scope statements on the Calculations tab of the Cube file to suppress values that don't make any sense. Scope statements aren't an option you'll learn about by browsing the Cube file interface, but they have simple MDX syntax and are powerful for creating a clean user experience. For example, percentages sometimes look silly when they appear as 100% at the rolled-up level, and they are often confusing to users when they appear. You can use Scope statements to suppress percentage measures at the All level of your dimensions.

10.

Shake the MDX bugaboo

MDX can be scary. Many clients don't realize that a cube can be made much more powerful through the use of MDX functions in Calculated Measures. Maybe you hope they don't find out so you don't have to learn MDX. Decide today to create a Calculated Member to show off for your client. Create a Previous Period calculation. You can quickly and easily use the CurrentMember and PrevMember functions on your Date dimension and couple them with your most popular measure. The MDX is simple. Here's a sample Previous Period calculation: ([Date].[YMD].CurrentMember.PrevMember, [Sales Count]). Date is the dimension, YMD is our date hierarchy, and Sales Count is our most popular measure.
