
How can we update a record in target table without using Update strategy?

A target table can be updated without using an Update Strategy transformation. To do this, define the key of the target table at the Informatica level and connect both the key and the field we want to update in the mapping target. At the session level, set the target property to "Update as Update" and check the "Update" check-box. For example, assume we have a target table "Customer" with the fields "Customer ID", "Customer Name" and "Customer Address", and we want to update "Customer Address" without an Update Strategy. We then define "Customer ID" as the primary key at the Informatica level and connect the Customer ID and Customer Address fields in the mapping. With the session properties set as described above, the mapping will update the Customer Address field for all matching Customer IDs.
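With these settings, the Integration Service effectively issues an UPDATE keyed on the Informatica-level key for every incoming row. A rough SQL equivalent is sketched below; it is only an illustration of what the writer does, not a statement you need to write anywhere:
UPDATE CUSTOMER SET CUSTOMER_ADDRESS = <incoming address> WHERE CUSTOMER_ID = <incoming id>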

Under what condition selecting Sorted Input in aggregator may fail the session?

If the input data is not sorted correctly, the session will fail. Even if the input data is properly sorted, the session may still fail if the sort-order ports and the group-by ports of the Aggregator are not in the same order.

How to configure a Lookup as Active?


To use this option, we must set the Lookup transformation property "Lookup Policy on Multiple Match" to "Use All Values" while creating the transformation. Once the transformation is created, we cannot switch it between passive and active: whenever the "Lookup Policy on Multiple Match" attribute is set to "Use All Values", the property becomes read-only.

Implementing a Lookup As Active


Scenario: Suppose we have customer order data in a relational table, and each customer has multiple orders. We can configure the Lookup transformation to return all the orders placed by a customer. Now consider the simple mapping below, where we want to return all employees in each department. Go to Transformation and click Create, select the transformation type as Lookup and enter a name for the transformation.

Next check the option Return All Values on Multiple Match.

Here our source is the DEPT table and the EMP table is used as the lookup. The lookup condition is based on the department number.

Basically we try to achieve the same result as the SQL select below:
SELECT DEPT.DEPTNO, DEPT.DNAME, DEPT.LOC, EMP.ENAME, EMP.SAL FROM DEPT LEFT OUTER JOIN EMP ON DEPT.DEPTNO = EMP.DEPTNO

Active Lookup Transformation Restrictions:


1. We cannot return multiple rows from an unconnected Lookup transformation.
2. We cannot enable dynamic cache for an active Lookup transformation.
3. An active Lookup transformation that returns multiple rows cannot share a cache with a similar passive Lookup transformation that returns one matching row for each input row.

What is Informatica Metadata


The term "metadata" is often used for the purpose of denoting "data about data". Although this definition does not apply strictly for Informatica PowerCentre, a better suggestion can be "structural metadata" which specifically apply to the data about the structures in Informatica. Informatica stores the data transformation logic in the form of PowerCentre Designer Mapping and the physical connection details etc. in the form of PowerCentre Manager session. Apart from that, Informatica PowerCentre also stores the information about Workflows, Worklets, Repositories, Folders etc. All these information is collectively called Informatica Metadata and are stored in a structured data model called Informatica Repository.

Tables and Views Under Informatica Metadata Repository


There are over 500 tables and views in the Informatica 8.5.x repository. All table names start with OPB_ and the view names start with REP_. The objective here is not to go through the full table and view list; instead, it is to give an overview of the important tables and views only and to show how to write simple SQL queries to obtain useful information quickly from the metadata repository. How to get folder and mapping names from an Informatica metadata query: we can use the OPB_MAPPING and OPB_SUBJECT tables residing in the Informatica repository to obtain information about all the mappings under each Informatica folder. The following SQL query shows the names of all the folders in the repository and the mappings contained in them, along with the last saved date, mapping version number and versioning comments, if any:
SELECT S.SUBJ_NAME FOLDER, M.MAPPING_NAME MAPPING, M.VERSION_NUMBER VERSION_NUMBER, CASE WHEN M.IS_VALID = 1 THEN 'YES' ELSE 'NO' END IS_VALID, M.LAST_SAVED SAVED_ON, M.CHECKOUT_USER_ID, M.COMMENTS FROM OPB_MAPPING M, OPB_SUBJECT S WHERE M.SUBJECT_ID = S.SUBJ_ID AND is_visible = 1 ORDER BY 1, 2, 3;

When we run a session, the Integration Service may create a reject file for each target instance in the mapping to store the rejected target records. With the help of the session log and the reject file we can identify the cause of data rejection in the session. Eliminating the cause of rejection will lead to rejection-free loads in subsequent session runs. If the Informatica writer or the target database rejects data for any valid reason, the Integration Service logs the rejected records into the reject file. Every time we run the session, the Integration Service appends the rejected records to the reject file.

Working with Informatica Bad Files or Reject Files


By default the Integration Service creates the reject files, or bad files, in the $PMBadFileDir process variable directory. It writes the entire rejected row to the bad file, even though the problem may be in only one of the columns. The reject files have a default naming convention of [target_instance_name].bad. If we open the reject file in an editor we will see comma-separated values with some tags/indicators along with the data values. There are two types of indicators in the reject file: the row indicator and the column indicator. The easiest way to read a bad file is to copy its contents, save them as a CSV (Comma Separated Values) file and open it, which gives a spreadsheet-like look and feel. The first column in the reject file is the row indicator, which determines whether the row was destined for insert, update, delete or reject; it is basically a flag that holds the update strategy for the data row. When the commit type of the session is configured as user-defined, the row indicator also indicates whether the transaction was rolled back due to a non-fatal error, or whether the committed transaction was in a failed target connection group.

List of Values of Row Indicators:

Row Indicator   Indicator Significance   Rejected By
0               Insert                   Writer or target
1               Update                   Writer or target
2               Delete                   Writer or target
3               Reject                   Writer
4               Rolled-back insert       Writer
5               Rolled-back update       Writer
6               Rolled-back delete       Writer
7               Committed insert         Writer
8               Committed update         Writer
9               Committed delete         Writer

After the row indicator come the column data values, each followed by its column indicator, which describes the data quality of the corresponding column.

List of Values of Column Indicators:


D (Valid data or good data): Writer passes it to the target database. The target accepts it unless a database error occurs, such as finding a duplicate key while inserting.
O (Overflowed numeric data): Numeric data exceeded the specified precision or scale for the column. Bad data, if you configured the mapping target to reject overflow or truncated data.
N (Null value): The column contains a null value. Good data. Writer passes it to the target, which rejects it if the target database does not accept null values.
T (Truncated string data): String data exceeded a specified precision for the column, so the Integration Service truncated it. Bad data, if you configured the mapping target to reject overflow or truncated data.

Also note that the second column of each record in the reject file contains the column indicator 'D', which signifies that the row indicator value itself is valid.
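As an illustration, a single record in a bad file might look like the line below (the data values are made up for this example). The leading 0 is the row indicator (insert), the D after it validates the row indicator, and each data value is followed by its own column indicator, with the missing address flagged as N (null):
0,D,1234,D,John Smith,D,,N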

Using incremental aggregation, we apply captured changes in the source data (the CDC part) to aggregate calculations in a session. If the source changes incrementally and we can capture those changes, we can configure the session to process only the changes. This allows the Integration Service to update the target incrementally, rather than forcing it to delete the previously loaded data, process the entire source and recalculate the same aggregates on every session run.

Incremental Aggregation
When the session runs with incremental aggregation enabled for the first time, say in the 1st week of Jan, we use the entire source. This allows the Integration Service to read and store the necessary aggregate data information. In the 2nd week of Jan, when we run the session again, we process only the CDC records from the source, i.e. the records loaded after the initial load. The Integration Service then processes this new data and updates the target accordingly. Use incremental aggregation when the changes do not significantly alter the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from incremental aggregation; in that case, drop and recreate the target with the entire source data and recalculate the aggregations. Incremental aggregation can be helpful, for example, when we need to load a monthly fact on a weekly basis.

Sample Mapping
Let us see a sample mapping to implement incremental aggregation:

Look at the Source Qualifier query to fetch the CDC part, using a BATCH_LOAD_CONTROL table that stores the last successful load date for the particular mapping; a sketch of such a query is shown below.
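A minimal sketch of such a Source Qualifier SQL override is given below; the source table name SALES_SRC, the control columns LAST_LOAD_DATE and MAPPING_NAME, and the mapping name literal are assumptions for illustration only:
SELECT CUSTOMER_KEY, INVOICE_KEY, AMOUNT, LOAD_DATE FROM SALES_SRC WHERE LOAD_DATE > (SELECT LAST_LOAD_DATE FROM BATCH_LOAD_CONTROL WHERE MAPPING_NAME = 'm_incremental_agg')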

Look at the ports tab of Expression transformation.

Look at the ports tab of Aggregator Transformation.

Now the most important part: the session properties configuration to implement incremental aggregation.

If we want to reinitialize the aggregate cache, say during the first week of every month, we can configure the same session in a new workflow with the "Reinitialize aggregate cache" property checked at the session level.

Example with Data


Now have a look at the source table data:
CUSTOMER_KEY  INVOICE_KEY  AMOUNT  LOAD_DATE
1111          5001         100     01/01/2010
2222          5002         250     01/01/2010
3333          5003         300     01/01/2010
1111          6007         200     07/01/2010
1111          6008         150     07/01/2010
2222          6009         250     07/01/2010
4444          1234         350     07/01/2010
5555          6157         500     07/01/2010

After the first Load on 1st week of Jan 2010, the data in the target is as follows:

CUSTOMER_KEY  INVOICE_KEY  MON_KEY  AMOUNT
1111          5001         201001   100
2222          5002         201001   250
3333          5003         201001   300

Now, during the 2nd week load, the session processes only the incremental data in the source, i.e. those records having a load date greater than the last session run date. After the 2nd week's load, incremental aggregation of the incremental source data with the aggregate cache file data updates the target table with the following data set:

CUSTOMER_KEY  INVOICE_KEY  MON_KEY  AMOUNT  Remarks/Operation
1111          6008         201001   450     The cache file is updated after aggregation
2222          6009         201001   500     The cache file is updated after aggregation
3333          5003         201001   300     The cache file remains the same as before
4444          1234         201001   350     New group row inserted in the cache file
5555          6157         201001   500     New group row inserted in the cache file

Understanding Incremental Aggregation Process


The first time we run an incremental aggregation session, the Integration Service processes the entire source. At the end of the session, the Integration Service stores aggregate data for that session run in two files, the index file and the data file. The Integration Service creates the files in the cache directory specified in the Aggregator transformation properties.

Each subsequent time we run the session with incremental aggregation, we use only the incremental source changes in the session. For each input record, the Integration Service checks the historical information in the index file for a corresponding group. If it finds a corresponding group, the Integration Service performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental change. If it does not find a corresponding group, the Integration Service creates a new group and saves the record data. When writing to the target, the Integration Service applies the changes to the existing target and saves the modified aggregate data in the index and data files to be used as historical data the next time you run the session. Each subsequent time we run a session with incremental aggregation, the Integration Service creates a backup of the incremental aggregation files, so the cache directory for the Aggregator transformation must contain enough disk space for two sets of the files. The Integration Service creates new aggregate data, instead of using historical data, when we configure the session to reinitialize the aggregate cache, delete the cache files, etc. When the Integration Service rebuilds the incremental aggregation files, the data in the previous files is lost.

Pushdown optimization, a newer concept in Informatica PowerCenter, allows developers to balance data transformation load among servers. The following sections describe pushdown techniques.

What is Pushdown Optimization?


Pushdown optimization is a way of load-balancing among servers in order to achieve optimal performance. Veteran ETL developers often face the question of where to perform ETL logic. Suppose some ETL logic needs to filter out data based on a condition: one can either do it in the database, by using a WHERE condition in the SQL query, or inside Informatica, by using a Filter transformation. Sometimes we can even "push" some transformation logic to the target database instead of doing it on the source side (especially in the case of ELT rather than ETL). Such optimization is crucial for overall ETL performance.

How does Push-Down Optimization work?


One can push transformation logic to the source or target database using pushdown optimization. The Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the source or the target database which executes the SQL queries to process the transformations. The amount of transformation logic one can push to the database depends on the database, transformation logic, and mapping and session configuration. The Integration Service analyzes the transformation logic it can push to the database and executes the SQL statement generated against the source or target tables, and it processes any transformation logic that it cannot push to the database.

Using Pushdown Optimization


Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database. You can also use the Pushdown Optimization Viewer to view the messages related to pushdown optimization. Let us take an example:

Suppose a mapping contains a Filter transformation that filters out all employees except those with a DEPTNO greater than 40; the filter condition used in this mapping is DEPTNO > 40. The Integration Service can push this transformation logic to the database, and it generates the following SQL statement to process it:
INSERT INTO EMP_TGT(EMPNO, ENAME, SAL, COMM, DEPTNO) SELECT EMP_SRC.EMPNO, EMP_SRC.ENAME, EMP_SRC.SAL, EMP_SRC.COMM, EMP_SRC.DEPTNO FROM EMP_SRC WHERE (EMP_SRC.DEPTNO >40)

The Integration Service generates an INSERT ... SELECT statement that filters the data using a WHERE clause; it does not extract data from the database at this time. We can configure pushdown optimization as source-side, target-side or full pushdown optimization. For example, consider a mapping that contains the following transformations:
SourceDefn -> SourceQualifier -> Aggregator -> Rank -> Expression -> TargetDefn
Aggregator: SUM(SAL), SUM(COMM), group by DEPTNO
Rank: rank port on SAL
Expression: TOTAL = SAL + COMM

The Rank transformation cannot be pushed to the database. If the session is configured for full pushdown optimization, the Integration Service pushes the Source Qualifier transformation and the Aggregator transformation to the source database, processes the Rank transformation itself, and pushes the Expression transformation and the target to the target database. When we use pushdown optimization, the Integration Service converts the expression in the transformation or in the workflow link by determining equivalent operators, variables and functions in the database. If there is no equivalent operator, variable or function, the Integration Service itself processes the transformation logic. The Integration Service logs a message in the workflow log and in the Pushdown Optimization Viewer when it cannot push an expression to the database; use the message to determine the reason why the expression could not be pushed.
Informatica Scenarios

Design a mapping to load the last 3 rows from a flat file into a target.

Solution: Consider a source with the following data:
col
a
b
c
d
e

Step 1: Assign row numbers to each record. Generate the row numbers using an Expression transformation and call the generated row-number port O_count. Create a DUMMY output port in the same Expression transformation and assign 1 to it, so that the DUMMY output port always returns 1 for each row.
In the Expression transformation, the ports are:
V_count = V_count + 1
O_count = V_count
O_dummy = 1

The output of the Expression transformation will be:
col, o_count, o_dummy
a, 1, 1
b, 2, 1
c, 3, 1
d, 4, 1
e, 5, 1

Step 2: Pass the output of the Expression transformation to an Aggregator and do not specify any group-by condition. Create an output port O_total_records in the Aggregator and assign the O_count port to it. The Aggregator returns the last row by default. The output of the Aggregator contains the DUMMY port, which has value 1, and the O_total_records port, which holds the total number of records in the source.
In the Aggregator transformation, the ports are:
O_dummy
O_count
O_total_records = O_count

The output of the Aggregator transformation will be:
O_total_records, O_dummy
5, 1

Step 3: Pass the outputs of the Expression transformation and the Aggregator transformation to a Joiner transformation and join on the DUMMY port. In the Joiner transformation check the property "Sorted Input"; only then can you connect both the Expression and the Aggregator to the Joiner transformation. In the Joiner transformation, the join condition will be:
O_dummy (port from the Aggregator transformation) = O_dummy (port from the Expression transformation)

The output of the Joiner transformation will be:
col, o_count, o_total_records
a, 1, 5
b, 2, 5
c, 3, 5
d, 4, 5
e, 5, 5

Step 4: Now pass the output of the Joiner transformation to a Filter transformation and specify the filter condition as O_total_records (port from the Aggregator) - O_count (port from the Expression) <= 2.
In the Filter transformation, the filter condition will be:
O_total_records - O_count <= 2

The output of the Filter transformation will be:
col, o_count, o_total_records
c, 3, 5
d, 4, 5
e, 5, 5

Design a mapping to load the first record from a flat file into table A, the last record into table B and the remaining records into table C.

Solution: This is similar to the problem above; the first 3 steps are the same. In the last step, instead of using a Filter transformation, use a Router transformation. In the Router transformation create two output groups. In the first group the condition should be O_count = 1; connect the corresponding output group to table A. In the second group the condition should be O_count = O_total_records; connect the corresponding output group to table B. The output of the default group should be connected to table C.

Consider the following products data, which contains duplicate rows:
A
B
C
C
B
D
B

Q1. Design a mapping to load all unique products into one table and the duplicate rows into another table.
The first table should contain the following output:
A
D
The second target should contain the following output:
B
B
B
C
C

Solution: Use a Sorter transformation and sort the products data. Pass the output to an Expression transformation and create a dummy port O_dummy and assign 1 to it, so that the DUMMY output port always returns 1 for each row.
The output of the Expression transformation will be:
Product, O_dummy
A, 1
B, 1
B, 1
B, 1
C, 1
C, 1
D, 1

Pass the output of the Expression transformation to an Aggregator transformation and check "group by" on the Product port. In the Aggregator, create an output port O_count_of_each_product and write the expression COUNT(product). The output of the Aggregator will be:
Product, O_count_of_each_product
A, 1
B, 3
C, 2
D, 1

Now pass the outputs of the Expression transformation and the Aggregator transformation to a Joiner transformation and join on the Product port. In the Joiner transformation check the property "Sorted Input"; only then can you connect both the Expression and the Aggregator to the Joiner transformation.

The output of the Joiner will be:
product, O_dummy, O_count_of_each_product
A, 1, 1
B, 1, 3
B, 1, 3
B, 1, 3
C, 1, 2
C, 1, 2
D, 1, 1

Now pass the output of the Joiner to a Router transformation, create one group and specify the group condition as O_dummy = O_count_of_each_product. Connect this group to one table and the output of the default group to another table.

Q2. Design a mapping to load each product once into one table and the remaining duplicated products into another table.
The first table should contain the following output:
A
B
C
D
The second table should contain the following output:
B
B
C

Solution: Use a Sorter transformation and sort the products data. Pass the output to an Expression transformation and create a variable port V_curr_product and assign the product port to it. Then create a V_count port and in the expression editor write IIF(V_curr_product = V_prev_product, V_count + 1, 1). Create one more variable port V_prev_product and assign the product port to it. Now create an output port O_count and assign the V_count port to it.
In the Expression transformation, the ports are:
Product
V_curr_product = product
V_count = IIF(V_curr_product = V_prev_product, V_count + 1, 1)
V_prev_product = product
O_count = V_count

The output of the Expression transformation will be:
Product, O_count
A, 1
B, 1
B, 2
B, 3
C, 1
C, 2
D, 1

Now pass the output of the Expression transformation to a Router transformation, create one group and specify the condition as O_count = 1. Connect this group to one table and the output of the default group to another table.

Design a mapping to get the previous row's salary for the current row. If no previous row exists for the current row, the previous row salary should be displayed as null. The output should look like this:
employee_id, salary, pre_row_salary
10, 1000, Null
20, 2000, 1000
30, 3000, 2000
40, 5000, 3000

Solution: Connect the Source Qualifier to an Expression transformation. In the Expression transformation, create a variable port V_count and increment it by one for each row entering the transformation. Also create a V_salary variable port and assign the expression IIF(V_count = 1, NULL, V_prev_salary) to it. Then create one more variable port V_prev_salary and assign salary to it. Now create an output port O_prev_salary and assign V_salary to it. Connect the Expression transformation to the target ports.
In the Expression transformation, the ports will be:
employee_id
salary
V_count = V_count + 1
V_salary = IIF(V_count = 1, NULL, V_prev_salary)
V_prev_salary = salary
O_prev_salary = V_salary

Design a mapping to get the next row's salary for the current row. If there is no next row for the current row, the next row salary should be displayed as null. The output should look like this:
employee_id, salary, next_row_salary
10, 1000, 2000
20, 2000, 3000
30, 3000, 5000
40, 5000, Null

Solution:
Step 1: Connect the Source Qualifier to two Expression transformations. In each Expression transformation, create a variable port V_count and in the expression editor write V_count + 1. Now create an output port O_count in each Expression transformation. In the first Expression transformation assign V_count to O_count; in the second assign V_count - 1 to O_count.
In the first Expression transformation, the ports will be:
employee_id
salary
V_count = V_count + 1
O_count = V_count
In the second Expression transformation, the ports will be:
employee_id
salary
V_count = V_count + 1
O_count = V_count - 1

Step 2: Connect both Expression transformations to a Joiner transformation and join them on the port O_count. Consider the first Expression transformation as the master and the second one as the detail. In the Joiner, specify the join type as Detail Outer Join. In the Joiner transformation check the property "Sorted Input"; only then can you connect both Expression transformations to the Joiner transformation.

Step 3: Pass the output of the Joiner transformation to a target table. From the Joiner, connect the employee_id and salary obtained from the first Expression transformation to the employee_id and salary ports in the target table. Then, from the Joiner, connect the salary obtained from the second Expression transformation to the next_row_salary port in the target table.

Design a mapping to find the sum of salaries of all employees, where this sum is repeated for all rows. The output should look like this:
employee_id, salary, salary_sum
10, 1000, 11000
20, 2000, 11000
30, 3000, 11000
40, 5000, 11000

Solution:
Step 1: Connect the Source Qualifier to an Expression transformation. In the Expression transformation, create a dummy port and assign the value 1 to it.
In the Expression transformation, the ports will be:
employee_id
salary
O_dummy = 1

Step 2: Pass the output of the Expression transformation to an Aggregator. Create a new port O_sum_salary and in the expression editor write SUM(salary). Do not specify group by on any port.
In the Aggregator transformation, the ports will be:
salary
O_dummy
O_sum_salary = SUM(salary)

Step 3: Pass the outputs of the Expression transformation and the Aggregator transformation to a Joiner transformation and join on the DUMMY port. In the Joiner transformation check the property "Sorted Input"; only then can you connect both the Expression and the Aggregator to the Joiner transformation.

Step 4: Pass the output of the Joiner to the target table.

2. Consider the following employees table as source:
department_no, employee_name
20, R
10, A
10, D
20, P
10, B
10, C
20, Q
20, S

Q1. Design a mapping to load a target table with the following values from the above source:
department_no, employee_list
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, A,B,C,D,P
20, A,B,C,D,P,Q
20, A,B,C,D,P,Q,R
20, A,B,C,D,P,Q,R,S

Solution:
Step 1: Use a Sorter transformation and sort the data using department_no as the sort key, then pass the output to an Expression transformation. In the Expression transformation, the ports will be:
department_no
employee_name
V_employee_list = IIF(ISNULL(V_employee_list), employee_name, V_employee_list || ',' || employee_name)
O_employee_list = V_employee_list

Step 2: Now connect the Expression transformation to a target table.

Q2. Design a mapping to load a target table with the following values from the above source:
department_no, employee_list
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, P
20, P,Q
20, P,Q,R
20, P,Q,R,S

Solution:
Step 1: Use a Sorter transformation and sort the data using department_no as the sort key, then pass the output to an Expression transformation. In the Expression transformation, the ports will be:
department_no
employee_name
V_curr_deptno = department_no
V_employee_list = IIF(V_curr_deptno != V_prev_deptno, employee_name, V_employee_list || ',' || employee_name)
V_prev_deptno = department_no
O_employee_list = V_employee_list

Step 2: Now connect the Expression transformation to a target table.

Q3. Design a mapping to load a target table with the following values from the above source:
department_no, employee_names
10, A,B,C,D
20, P,Q,R,S

Solution: The first step is the same as in the problem above. Pass the output of the Expression transformation to an Aggregator transformation and specify group by on department_no, as sketched below. Then connect the Aggregator transformation to a target table.
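A minimal port sketch for this Aggregator, assuming it is fed by the Expression ports from Q2: since no aggregate function is applied to O_employee_list, the Aggregator returns the last row of each group, which carries the complete employee list for that department.
department_no (group by checked)
O_employee_list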

How to solve PCSF_10342 when enabling the Integration Service in Informatica 8.6.1?
Answered by Sudarshan on May 2nd, 2012: Log in to the admin console, delete the repository content and recreate it, then restart the Integration Service. It works!

Hi, I am learning Informatica 8.1 (which is what I could get my hands on) and I am connecting to Oracle 10g. I created 2 connections to the DB using Connection --> Relational Connection Browser. The source and target are the same DB in this case, just different table names, but I created 2 separate connections. I...
Answered by Raghu on Mar 9th, 2012: Delete it.
Answered by Lokesh M on Dec 20th, 2011: Try these and see if it helps:
- Delete statistics and try to retrieve.
- Try to export with INDEXES=n STATISTICS=none.
- Disable auditing with "noaudit session":
SQL> noaudit session;

1. Junk Dimension: contains miscellaneous data such as flags and indicators.
2. Degenerate Dimension: derived from the fact table and does not have a dimension table of its own.
3. Conformed Dimension: a dimension that is shared by more than one fact.
Index cache: the Integration Service stores all condition values in the index cache and all output values in the data cache.

Unix
How to print/display the first line of a file?
There are many ways to do this. However the easiest way to display the first line of a file is using the [head] command.
$> head -1 file.txt

No prize for guessing that if you specify [head -2] it will print the first 2 records of the file. Another way is to use the [sed] command. [sed] is a very powerful stream editor that can be used for various text manipulation purposes like this.
$> sed '2,$ d' file.txt

How does the above command work? The 'd' parameter tells [sed] to delete all the records from line 2 to the last line of the file from the display (the last line is represented by the $ symbol). Of course it does not actually delete those lines from the file; it just does not show them in the standard output. So you only see the remaining line, which is the 1st line.

How to print/display the last line of a file?


The easiest way is to use the [tail] command.
$> tail -1 file.txt

If you want to do it using [sed] command, here is what you should write:
$> sed -n '$ p' file.txt

From our previous answer, we already know that '$' stands for the last line of the file. So '$ p' basically prints (p for print) the last line in standard output screen. '-n' switch takes [sed] to silent mode so that [sed] does not print anything else in the output.

How to display n-th line of a file?


The easiest way to do it will be by using [sed] I guess. Based on what we already know about [sed] from our previous examples, we can quickly deduce this command:
$> sed -n '<n> p' file.txt

You need to replace <n> with the actual line number. So if you want to print the 4th line, the command will be
$> sed -n '4 p' file.txt

Of course you can do it by using [head] and [tail] command as well like below:
$> head -<n> file.txt | tail -1

You need to replace <n> with the actual line number. So if you want to print the 4th line, the command will be
$> head -4 file.txt | tail -1

How to remove the first line / header from a file?


We already know how [sed] can be used to delete a certain line from the output by using the 'd' switch. So if we want to delete the first line, the command should be:
$> sed '1 d' file.txt

But the issue with the above command is, it just prints out all the lines except the first line of the file on the standard output. It does not really change the file in-place. So if you want to delete the first line from the file itself, you have two options. Either you can redirect the output of the file to some other file and then rename it back to original file like below:
$> sed '1 d' file.txt > new_file.txt
$> mv new_file.txt file.txt

Or, you can use the inbuilt [sed] switch '-i', which changes the file in place. See below:
$> sed -i '1 d' file.txt

How to remove the last line/ trailer from a file in Unix script?
Always remember that [sed] switch '$' refers to the last line. So using this knowledge we can deduce the below command:
$> sed -i '$ d' file.txt

How to remove certain lines from a file in Unix?


If you want to remove line <m> to line <n> from a given file, you can do it in a similar way to the method shown above. Here is an example:
$> sed -i '5,7 d' file.txt

The above command will delete line 5 to line 7 from the file file.txt

How to remove the last n-th line from a file?


This is a bit tricky. Suppose your file contains 100 lines and you want to remove the last 5 lines. If you know how many lines are in the file, then you can simply use the method shown above and remove lines 96 to 100 like below:
$> sed -i '96,100 d' file.txt # alternative to the command [head -95 file.txt]

But you will not always know the number of lines in the file (the file may be generated dynamically, etc.). In that case there are many different ways to solve the problem, some of them quite complex and fancy. But let's first do it in a way that we can understand and remember easily. Here is how it goes:
$> tt=`wc -l file.txt | cut -f1 -d' '`; sed -i "`expr $tt - 4`,$tt d" file.txt

As you can see there are two commands. The first one (before the semi-colon) calculates the total number of lines present in the file and stores it in a variable called tt. The second command (after the semi-colon) uses that variable and works exactly as shown in the previous example.
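On systems with GNU coreutils, a simpler alternative is the negative line count of [head] (this is GNU-specific, so it may not work on every Unix):
$> head -n -5 file.txt > new_file.txt
$> mv new_file.txt file.txt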

How to check the length of any line in a file?


We already know how to print one line from a file which is this:
$> sed -n '<n> p' file.txt

Where <n> is to be replaced by the actual line number you want to print. Once you know this, it is easy to print the length of that line by using the [wc] command with the '-c' switch.
$> sed -n '35 p' file.txt | wc -c

The above command will print the length of the 35th line in file.txt.

How to get the nth word of a line in Unix?


Assuming the words in the line are separated by space, we can use the [cut] command. [cut] is a very powerful and useful command and it's real easy. All you have to do to get the n-th word from the line is issue the following command:
cut -f<n> -d' '

The '-d' switch tells [cut] what the delimiter (or separator) is, which is a space ' ' in this case. If the separator were a comma, we would have written -d','. So, suppose we want to find the 4th word from the string "A quick brown fox jumped over the lazy cat"; we would do something like this:
$> echo A quick brown fox jumped over the lazy cat | cut -f4 -d' '

And it will print fox

How to reverse a string in unix?


Pretty easy. Use the [rev] command.
$> echo "unix" | rev xinu

How to get the last word from a line in Unix file?


We will make use of two commands that we learnt above to solve this. The commands are [rev] and [cut]. Here we go. Let's imagine the line is: C for Cat. We need Cat. First we reverse the line. We get taC rof C. Then we cut the first word, we get 'taC'. And then we reverse it again.
$>echo "C for Cat" | rev | cut -f1 -d' ' | rev Cat

How to get the n-th field from a Unix command output?


We know we can do it with [cut]. For example, the command below extracts the first field from the output of the [wc -c] command:

$> wc -c file.txt | cut -d' ' -f1
109

But I want to introduce one more command for doing this: the [awk] command. [awk] is a very powerful command for text pattern scanning and processing. Here we will see how we may use [awk] to extract the first field (or first column) from the output of another command. As above, suppose we want to print the first column of the [wc -c] output. Here is how it goes:
$> wc -c file.txt | awk ' ''{print $1}'
109

The basic syntax of [awk] is like this:


awk 'pattern space''{action space}'

The pattern space can be left blank or omitted, like below:


$> wc -c file.txt | awk '{print $1}'
109

In the action space, we have asked [awk] to take the action of printing the first column ($1). More on [awk] later.
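To see the pattern space in action, here is a small sketch (the file name and the word ERROR are just placeholders for this example): it prints the first field only of those lines that contain the word ERROR.
$> awk '/ERROR/ {print $1}' file.txt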

How to replace the n-th line in a file with a new line in Unix?
This can be done in two steps. The first step is to remove the n-th line. And the second step is to insert a new line in n-th line position. Here we go. Step 1: remove the n-th line
$>sed -i'' '10 d' file.txt # d stands for delete

Step 2: insert a new line at n-th line position


$>sed -i'' '10 i This is the new line' file.txt # i stands for insert

How to show the non-printable characters in a file?


Open the file in VI editor. Go to VI command mode by pressing [Escape] and then [:]. Then type [set list]. This will show you all the non-printable characters, e.g. Ctrl-M characters (^M) etc., in the file.
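If you prefer to stay on the command line instead of opening VI, one alternative (a different technique from the VI method above) is the -v switch of [cat], which displays most non-printing characters using ^ notation:
$> cat -v file.txt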

How to zip a file in Linux?


Use the inbuilt [zip] command in Linux:
$> zip -j file.zip file.txt

How to unzip a file in Linux?

Use inbuilt [unzip] command in Linux.


$> unzip -j file.zip

How to test if a zip file is corrupted in Linux?


Use -t switch with the inbuilt [unzip] command
$> unzip -t file.zip

How to check if a file is zipped in Unix?


In order to know the file type of a particular file use the [file] command like below:
$> file file.txt
file.txt: ASCII text

If you want to know the technical MIME type of the file, use -i switch.
$> file -i file.txt
file.txt: text/plain; charset=us-ascii

If the file is zipped, following will be the result


$> file -i file.zip
file.zip: application/x-zip

How to connect to Oracle database from within shell script?


You use the same [sqlplus] command to connect to the database that you would normally use outside the shell script. To understand this, let's take an example: we will connect to the database, fire a query and get the output printed in the Unix shell. Here we go:
$> res=`sqlplus -s username/password@database_name <<EOF
SET HEAD OFF;
select count(*) from dual;
EXIT;
EOF`
$> echo $res
1

If you connect to the database in this way, the advantage is that you can pass the values of Unix-side shell variables to the database. See the example below:
$> res=`sqlplus -s username/password@database_name <<EOF
SET HEAD OFF;
select count(*) from student_table t where t.last_name='$1';
EXIT;
EOF`
$> echo $res
12

How to execute a database stored procedure from Shell script?

$> SqlReturnMsg=`sqlplus -s username/password@database <<EOF
BEGIN
Proc_Your_Procedure( your-input-parameters );
END;
/
EXIT;
EOF`
$> echo $SqlReturnMsg

How to check the command line arguments in a UNIX command in Shell Script?
In a bash shell, you can access the command line arguments using the $0, $1, $2, ... variables, where $0 holds the command name, $1 the first input parameter of the command, $2 the second, and so on. A small sketch is shown below.
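Here is a minimal sketch of such a script (the script name args.sh and the sample arguments are hypothetical):
#!/bin/bash
# args.sh - print the command name and the first two arguments
echo "Command name   : $0"
echo "First argument : $1"
echo "Second argument: $2"
echo "Total number of arguments: $#"
Running it as [./args.sh apple banana] would print the script name, apple, banana and 2.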

How to fail a shell script programmatically?


Just put an [exit] command in the shell script with a return value other than 0. This is because the exit code of a successful Unix program is zero. So, if you write
exit -1

inside your program, then your program will throw an error and exit immediately.

How to list down file/folder lists alphabetically?


Normally the [ls -lt] command lists files/folders sorted by modification time. If you want to list them alphabetically, simply specify [ls -l].

How to check if the last command was successful in Unix?


To check the status of last executed command in UNIX, you can check the value of an inbuilt bash variable [$?]. See the below example:
$> echo $?

How to check if a file is present in a particular directory in Unix?


We can do this in many ways. Based on what we have learnt so far, we can make use of the [ls] command and [$?]. See below:
$> ls -l file.txt; echo $?

If the file exists, the [ls] command will succeed and [echo $?] will print 0. If the file does not exist, the [ls] command will fail and [echo $?] will print a non-zero value.
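An alternative sketch using the shell's [test] command with the -f switch (a different approach from the [ls] method above):
$> if [ -f file.txt ]; then echo "file exists"; else echo "file not found"; fi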

How to check all the running processes in Unix?

The standard command to see this is [ps]. But [ps] only shows you the snapshot of the processes at that instance. If you need to monitor the processes for a certain period of time and need to refresh the results in each interval, consider using the [top] command.
$> ps -ef

If you wish to see the % of memory usage and CPU usage, then consider the below switches
$> ps aux

If you wish to use this command inside a shell script, or if you want to customize the output of [ps], you may use the -o switch as below. With -o, you can specify the columns that you want [ps] to print out.
$>ps -e -o stime,user,pid,args,%mem,%cpu

How to tell if my process is running in Unix?


You can list down all the running processes using [ps] command. Then you can grep your user name or process name to see if the process is running. See below:
$> ps -e -o stime,user,pid,args,%mem,%cpu | grep "opera"
14:53 opera 29904 sleep 60 0.0 0.0
14:54 opera 31536 ps -e -o stime,user,pid,arg 0.0 0.0
14:54 opera 31538 grep opera 0.0 0.0

How to get the CPU and Memory details in Linux server?


In Linux-based systems, you can easily access the CPU and memory details from /proc/cpuinfo and /proc/meminfo, like this:
$> cat /proc/meminfo
$> cat /proc/cpuinfo
